JPH1173419A

JPH1173419A - Method and device for retrieving electronic document

Info

Publication number: JPH1173419A
Application number: JP9232004A
Authority: JP
Inventors: Otoya Shirotsuka; 音也城塚
Original assignee: N T T DATA KK; NTT Data Corp
Current assignee: N T T DATA KK; NTT Data Group Corp
Priority date: 1997-08-28
Filing date: 1997-08-28
Publication date: 1999-03-16

Abstract

PROBLEM TO BE SOLVED: To provide a document retrieving device capable of retrieving, with high accuracy, an electronic document suited to the retrieving purpose from a voice input. SOLUTION: A recognition likelihood and a recognition time are attached to each of one or more keyword candidates recognized from an input voice. A weighting information applying part 22 calculates a retrieving weight for each keyword candidate from the recognition likelihood and the time elapsed since the recognition time. A word merging part 24 merges the retrieving weights of identical keyword candidates. A word determination part 25 specifies the keyword candidate having the highest retrieving weight as the retrieving keyword and outputs it to a document management part 30. The document management part 30 calculates a retrieving-weighted word importance for each electronic document based on the retrieving keyword and retrieves the electronic document having the highest retrieving-weighted word importance from an electronic document DB 40.

Description

DETAILED DESCRIPTION OF THE INVENTION

[0001]

[Technical Field of the Invention] The present invention relates to a method for retrieving electronic documents based on the recognition results of continuously input speech, characters, and the like, and to a document retrieval apparatus suitable for carrying out the method.

[0002]

[Prior Art] Document retrieval apparatuses are known that show, on a display or the like, one or more electronic documents related to a keyword entered through a keyboard or a pen. Such apparatuses are applied to file search on personal computers and to page search on the WWW (World Wide Web) on the Internet. As shown in FIG. 10, a document retrieval apparatus of this kind comprises an electronic document database (hereinafter, electronic document DB) 40 storing a plurality of electronic documents to be searched, a keyword input unit 60 that accepts from the operator a keyword used for the search (hereinafter, search keyword), and a document management unit 70 that retrieves the corresponding electronic document from the electronic document DB 40 based on the search keyword. The document management unit 70 includes a search information table 31 that functions as an index file of the electronic documents stored in the electronic document DB 40, and a document input unit 33 for storing new electronic documents from outside in the electronic document DB 40.

[0003]

[Problems to be Solved by the Invention] With the conventional document retrieval apparatus 2, the search keyword had to be entered with a keyboard or a pen. Consequently, when a user in the middle of some manual task needed to search for an electronic document as an aid, the original task was interrupted.

[0004] To improve on this point, search techniques have been proposed that use speech recognition to input keyword information, making input via a keyboard or the like unnecessary. In these techniques, sentence speech, that is, speech reading out a grammatically correct sentence or speech of equivalent content, is recognized, its content is analyzed linguistically, and the analysis result is converted into a search expression written in the query language of the electronic document DB 40, which is then used to search for documents. In some cases, a technique is adopted that detects, in continuously input speech, search keywords identical to words registered in advance.

[0005] However, identifying search keywords by recognizing sentence speech requires that every word that may be uttered be recognized accurately, which is a problem in terms of practicality. In addition, when sentence speech is input continuously, a search keyword may be identified even though words in the speech have not been recognized correctly, so search accuracy cannot be improved.

[0006] Accordingly, an object of the present invention is to provide an improved search method that accurately recognizes search keywords from continuously input keyword information and thereby raises the accuracy of electronic document search. Another object of the present invention is to provide a document retrieval apparatus suitable for carrying out this search method.

[0007]

[Means for Solving the Problems] The search method of the present invention that solves the above problem is characterized by including the following processing steps in an electronic device such as a computer:
(1) recognizing a plurality of words from keyword information such as sentence speech;
(2) attaching to each recognized word a search weight based on a recognition likelihood, which expresses the certainty of the recognition, and on the time elapsed since the recognition, and, when the same word is recognized more than once, merging the search weights of those words;
(3) specifying a word whose search weight is relatively large as a search keyword;
(4) determining, for each of a plurality of electronic documents to be searched, a search-weighted word importance based on the search weight of the specified search keyword and a word importance expressing how important the search keyword is, and identifying the electronic document with the highest search-weighted word importance.

[0008] The search weight increases in proportion to the recognition likelihood and decreases as the elapsed time grows longer.

[0009] A document retrieval apparatus of the present invention that solves the other problem above comprises: electronic document storage means storing electronic documents to be searched; word recognition means that recognizes a plurality of words from keyword information input at search time and outputs each word with a recognition likelihood, expressing the certainty of the recognition, and a recognition time attached; keyword specifying means that specifies at least one of the recognized words as a search keyword; and search means that retrieves the corresponding electronic document from the electronic document storage means using the specified search keyword. The keyword specifying means is configured to create a search weight suited to the search purpose based on at least one of the recognition likelihood and the time elapsed since the recognition, and to preferentially take words whose created search weight is relatively high as the search keyword.

[0010] Another document retrieval apparatus of the present invention comprises, in addition to the electronic document storage means, word recognition means, and keyword specifying means described above, search means that multiplies the search weight of the specified search keyword by a word importance, which expresses how important the search keyword is to each electronic document, to calculate a search-weighted word importance for each electronic document, and that retrieves from the electronic document storage means an electronic document whose calculated search-weighted word importance is higher.

[0011] Another document retrieval apparatus of the present invention is characterized in that, in each of the above document retrieval apparatuses, the keyword information is continuously uttered sentence speech and the word recognition means is configured to perform speech recognition repeatedly, using the silent sections of the sentence speech as delimiters. In this case, the word recognition means is configured to compile in advance a list of words contained in the electronic documents and to perform speech recognition using the listed words.

[0012]

[Embodiments of the Invention] An embodiment in which the present invention is applied to a continuous-speech-input document retrieval apparatus is described in detail below. FIG. 1 is a block diagram of a document retrieval apparatus according to one embodiment of the present invention. The document retrieval apparatus 1 comprises a word recognition unit 10 that recognizes continuously input sentence speech, a keyword specifying unit 20 that specifies a search keyword based on the output of the word recognition unit 10, a document management unit 30 that performs electronic document search processing based on the search keyword, an electronic document DB 40 storing a large number of electronic documents to be searched, and an output unit 50 that presents search results to the user. Components having the same functions as those of the conventional apparatus shown in FIG. 10 are given the same reference numerals.

[0013] The word recognition unit 10 comprises a preprocessing unit 11 that detects silent sections in the input speech and extracts voiced sections, a word list 13 that lists in advance words representative of each of the document groups to be searched, and a speech recognition unit 12 that performs speech recognition using the output of the preprocessing unit 11 and the word list 13. By performing speech recognition repeatedly with silent sections as delimiters, the speech recognition unit 12 can continue to recognize and output words for speech that is input after earlier speech (utterances such as additional explanation, elaboration, and correction).
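The silence-delimited segmentation performed by the preprocessing unit can be illustrated with a minimal sketch. The source does not say how silence is detected; the per-frame energy values, the `threshold`, and the `min_silence` run length below are all assumptions introduced purely for illustration.

```python
from typing import List, Tuple


def voiced_sections(energies: List[float], threshold: float,
                    min_silence: int) -> List[Tuple[int, int]]:
    """Split a sequence of frame energies into voiced sections.

    Hypothetical helper: a frame counts as silent when its energy is below
    `threshold`, and a run of at least `min_silence` silent frames closes
    the current voiced section. Returns (start, end) frame indices with
    `end` exclusive.
    """
    sections: List[Tuple[int, int]] = []
    start = None      # index where the current voiced section began
    silent_run = 0    # consecutive silent frames seen inside a section
    for i, energy in enumerate(energies):
        if energy >= threshold:
            if start is None:
                start = i
            silent_run = 0
        elif start is not None:
            silent_run += 1
            if silent_run >= min_silence:
                # Close the section just before the silent run began.
                sections.append((start, i - silent_run + 1))
                start = None
                silent_run = 0
    if start is not None:
        # Input ended inside a section; drop any trailing silent frames.
        sections.append((start, len(energies) - silent_run))
    return sections
```

Each returned section would then be passed to the recognizer separately, which is what allows follow-up utterances to be recognized as they arrive.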

[0014] The keyword specifying unit 20 comprises a parameter acquisition unit 21 that receives the processing results output from the word recognition unit 10, a weight information assigning unit 22 that assigns search weights to the processing results, a word storage unit 23 that stores the words assigned search weights by the weight information assigning unit 22, a word merging unit 24 that merges words and their search weights, and a word specifying unit 25 that specifies one or more search keywords by comparing the search weights assigned to the words.

[0015] The document management unit 30 has a search information table 31 for searching the electronic document DB 40 for an appropriate electronic document given the specified search keyword, an importance determination unit 32 that processes the search-weighted word importance, which is the information used to identify the electronic document to be retrieved, and a document input unit 33 for inputting electronic documents. It outputs the electronic document retrieved by a search engine (not shown) to the output unit 50.

[0016] Next, the processing procedure of each unit is described with reference to FIGS. 2 to 9. FIG. 2 shows the processing procedure in the word recognition unit 10. Here, sentence speech with the content shown in FIGS. 4(a) and 4(b) is assumed to be input continuously, separated by a pause. In the word recognition unit 10, when the sentence speech with the content of FIG. 4(a) is input (step S101: Yes), the preprocessing unit 11 first detects the silent sections in the speech (step S102) and extracts the voiced sections (step S103). It then extracts the speech features in the extracted voiced sections (step S104). The speech recognition unit 12 matches each extracted speech feature against the word list 13, recognizes words that may be used as search keywords (hereinafter, keyword candidates), and calculates the recognition likelihood of each keyword candidate and the time elapsed since its recognition (steps S105, S106).

[0017] FIG. 5(a) shows an example of the keyword candidates recognized from the sentence speech with the content of FIG. 4(a), together with their recognition likelihoods and elapsed times. The sentence speech contains the words "snowboard", "ski resort", "expressway", and "Chuo Expressway", but only "snowboard" and "expressway" were correctly recognized as keyword candidates: "ski resort" was dropped, and "volume" was recognized through an insertion error. In addition, "Chuo Expressway" was misrecognized, yielding "Chubu region". The recognition likelihoods are 0.96 for "snowboard", 0.64 for "volume", 0.70 for "expressway", and 0.72 for "Chubu region". The elapsed times are measured from the recognition of "snowboard", which serves as the reference.

[0018] The above processing is repeated for the sentence speech with the content of FIG. 4(b) (step S107: Yes). The result is the recognition result shown in FIG. 5(b). As FIG. 5(b) shows, the second sentence speech was recognized entirely correctly, yielding the keyword candidates "Suwa Interchange" and "snowboard". The elapsed times here are again measured from "snowboard" in the first sentence speech.

[0019] When recognition of the keyword candidates is finished, the word recognition unit 10 attaches the recognition likelihood and the recognition time to each keyword candidate and outputs them to the keyword specifying unit 20 (step S108).
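One possible shape for the record passed from the word recognition unit to the keyword specifying unit is sketched below. The class name, field names, and the use of seconds as the time unit are assumptions made for illustration; the source only states that a recognition likelihood and a recognition time accompany each keyword candidate.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class KeywordCandidate:
    """Hypothetical record for one recognized keyword candidate."""
    word: str
    likelihood: float      # recognition likelihood of this candidate
    recognized_at: float   # recognition time, assumed to be in seconds

    def elapsed(self, now: float) -> float:
        """Time elapsed since recognition, clamped so it is never negative."""
        return max(0.0, now - self.recognized_at)
```

Storing the recognition time rather than a precomputed elapsed time lets the elapsed time be recomputed whenever weights are refreshed for previously recognized candidates.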

[0020] FIG. 3 shows the processing procedure in the keyword specifying unit 20. In the keyword specifying unit 20, the parameter acquisition unit 21 acquires the recognition likelihood and elapsed time of each keyword candidate output from the word recognition unit 10 and inputs them to the weight information assigning unit 22 (step S201).

[0021] The weight information assigning unit 22 calculates the search weight of each search keyword (step S202). Specifically, where L is the recognition likelihood of a keyword candidate, the search weight SW based on L is calculated by multiplying L by a weight coefficient α determined in advance according to the search purpose. In addition, based on the elapsed time attached to each keyword candidate, weighting is applied so that the search weight decreases the longer the time from recognition to the present. This weighting is applied again not only to newly recognized keyword candidates but also to the search weights of keyword candidates recognized in the past. Letting BW and AW be the search weights before and after weighting and T the time, the weighting can be computed by a formula such as AW = (β·BW)^(−γT), where β and γ are weight coefficients, or AW = β·BW/γT.
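The two weighting steps described above can be sketched as follows. This is a minimal illustration, not the patented implementation: the function names are hypothetical, the second formula is read as AW = β·BW/(γ·T) with the whole product γ·T as the denominator (an assumption about the notation), and the guard against non-positive T or γ is added here because that form is undefined at T = 0.

```python
def initial_search_weight(likelihood: float, alpha: float) -> float:
    """SW = alpha * L: search weight from the recognition likelihood."""
    return alpha * likelihood


def reweight_by_elapsed_time(bw: float, elapsed: float,
                             beta: float, gamma: float) -> float:
    """AW = beta * BW / (gamma * T), assuming gamma*T is the denominator.

    `bw` is the weight before re-weighting and `elapsed` is T. This form
    needs T > 0 and gamma > 0, so other values are rejected.
    """
    if elapsed <= 0 or gamma <= 0:
        raise ValueError("elapsed time and gamma must be positive")
    return beta * bw / (gamma * elapsed)
```

With this form the weight shrinks as T grows, but it only falls below the previous weight once γ·T exceeds β, so the coefficients would have to be chosen with the expected time scale in mind.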

[0022] FIG. 6(a) shows the search weights of the keyword candidates recognized from the first sentence speech, and FIG. 6(b) shows those of the keyword candidates recognized from the second sentence speech. FIG. 7 shows the result of re-weighting the search weights of the previously recognized keyword candidates. As is clear from FIG. 7, because a certain time has passed between the first sentence speech and the recognition of keyword candidates from the second, the search weights of the four keyword candidates recognized from the first sentence speech are all uniformly smaller than those shown in FIG. 6(a).

[0023] The word merging unit 24 checks whether any previously recognized keyword candidate is identical to a newly recognized one. If so (step S203: Yes), it merges those keyword candidates and recalculates the search weight (steps S204, S205); that is, it makes the search weight larger still. In this example, "snowboard" was recognized in both utterances, so it is merged. The merged search weight is obtained, for example, by multiplying the sum of the search weights of the identical keyword candidates by a predetermined weight coefficient. In this example, if W1 is the search weight of "snowboard" recognized from the first sentence speech and W2 is that of the second "snowboard", the merged search weight MW can be obtained from the formula MW = δ(W1 + W2), where δ is a weight coefficient. When the recalculation is finished, or when there was no identical keyword candidate in step S203 (step S203: No), each keyword candidate with its assigned search weight is stored in the word storage unit 23 (step S206).
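The merge rule MW = δ(W1 + W2) and the subsequent storing step can be sketched as below. The dictionary standing in for the word storage unit and the function name are assumptions for illustration only.

```python
from typing import Dict


def merge_candidate(stored: Dict[str, float], word: str,
                    new_weight: float, delta: float) -> Dict[str, float]:
    """Return a copy of the store after adding one new keyword candidate.

    If `word` is already stored, the two search weights are merged as
    MW = delta * (W1 + W2); otherwise the new weight is stored unchanged.
    """
    merged = dict(stored)  # leave the caller's store untouched
    if word in merged:
        merged[word] = delta * (merged[word] + new_weight)
    else:
        merged[word] = new_weight
    return merged
```

A repeated word thus ends up with a weight larger than either single recognition whenever δ(W1 + W2) exceeds both W1 and W2, which holds for δ = 1 and positive weights.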

[0024] FIG. 8 shows the keyword candidates stored in this example and their search weights. The word specifying unit 25 evaluates the search weights of the stored keyword candidates (step S207), specifies a keyword candidate whose search weight is relatively large as the search keyword, and outputs it to the document management unit 30 (step S208). The search keyword specified in this way is the one presumed to fit the search purpose best.

[0025] For example, in this example "snowboard" was recognized in both speech inputs, from which it can be inferred that the user is particularly interested in it. Its search weight rises from the initial value of 0.92 to 1.57 on the second input through the merge processing described above. By contrast, "volume" and "Chubu region", which the word recognition unit 10 recognized erroneously, had recognition likelihoods of 0.62 and 0.72 respectively at the first speech input but were not recognized at the second. Their search weights therefore remain lowered, from 0.62 to 0.56 for "volume" and from 0.72 to 0.68 for "Chubu region", which reduces the adverse effect of erroneous word recognition on the search results. In this way, by using the time elapsed since recognition, the results of multiple recognitions, and so on as parameters, the search weights are automatically learned toward the search purpose, and it becomes easy to achieve the intended search purpose even if not every keyword candidate is recognized accurately.

[0026] In the end, as FIG. 8 shows, the search weight of "snowboard" in this example is 1.57, which is the highest search weight relative to the others, so "snowboard" is output to the document management unit 30 as the search keyword.
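Selecting the search keyword from the stored candidates amounts to ranking them by search weight. The sketch below uses only the three weights quoted in the text (1.57, 0.56, 0.68); the function name and the `top_n` parameter, which covers the case of specifying more than one keyword, are assumptions for illustration.

```python
from typing import Dict, List


def pick_search_keywords(weights: Dict[str, float], top_n: int = 1) -> List[str]:
    """Return the `top_n` candidates with the largest search weights."""
    ranked = sorted(weights.items(), key=lambda item: item[1], reverse=True)
    return [word for word, _ in ranked[:top_n]]


# Weights quoted in the text for the example utterances.
stored = {"snowboard": 1.57, "volume": 0.56, "Chubu region": 0.68}
selected = pick_search_keywords(stored)
```

Here `selected` holds the single candidate with the largest weight, matching the outcome described for this example.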

[0027] FIG. 4 shows the processing procedure in the document management unit 30. The document management unit 30 acquires "snowboard", which the keyword specifying unit 20 specified as the search keyword (step S301), and searches for electronic documents matching it. At this time, the importance determination unit 32 obtains the search-weighted word importance for each of the electronic documents to be searched.

[0028] Specifically, the first electronic document is taken from the electronic document DB 40 through the search information table 31 (step S302), and a word importance expressing how important the search keyword is to that electronic document is calculated (step S303). The word importance can be determined from, for example, the frequency with which the word appears in the electronic document. The word importance is then multiplied by the search weight assigned to the search keyword in the keyword specifying unit 20 to obtain the search-weighted word importance (step S304). Where WI is the word importance of the search keyword and SW is the search weight assigned to it, the search-weighted word importance WWI can be calculated by, for example, the formula WWI = e(WI·SW), where e is a weight coefficient.

[0029] After this processing has been repeated for all electronic documents to be searched (step S305), the search-weighted word importances are evaluated (step S306). The electronic document with the highest search-weighted word importance is then retrieved and output to the output unit 50 (step S307). Performing the search on the basis of the search-weighted word importance in this way makes it easy to retrieve an electronic document that closely serves the search purpose.
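Steps S302 to S307 can be sketched as a loop over the documents that computes WWI = e·(WI·SW) for each and keeps the best one. The source says only that the word importance can be derived from word frequency, so the relative-frequency definition below, the pre-tokenized documents, and all names are assumptions made for illustration.

```python
from typing import Dict, List


def word_importance(keyword: str, tokens: List[str]) -> float:
    """WI taken as the keyword's relative frequency in one document (assumed)."""
    if not tokens:
        return 0.0
    return tokens.count(keyword) / len(tokens)


def best_document(docs: Dict[str, List[str]], keyword: str,
                  search_weight: float, e: float = 1.0) -> str:
    """Return the id of the document with the highest WWI = e * WI * SW."""
    if not docs:
        raise ValueError("no documents to search")
    scores = {
        doc_id: e * word_importance(keyword, tokens) * search_weight
        for doc_id, tokens in docs.items()
    }
    return max(scores, key=scores.get)
```

With a single search keyword the search weight scales every document's score equally, so the ranking is driven by the word importance; the weight would start to affect the ranking if scores for several keywords were combined, which the source does not describe.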

[0030] As described above, the document retrieval apparatus 1 of this embodiment recognizes sentence speech uttered continuously, in the way one would speak to a person to describe the electronic document needed, and automatically retrieves the target electronic document from the electronic document DB 40, in which a large number of electronic documents are stored. The user therefore no longer needs to think of effective search keywords and enter them from a keyboard or the like, and can search while carrying out other work.

[0031] In addition, documents are searched and search results presented from moment to moment according to the content of the input sentence speech. So when the search results are disappointing and a new search is to be made, simply continuing to input speech explaining the information needed yields search results that reflect the past recognition results.

[0032] Furthermore, a mode of use becomes possible in which conversational speech between people is input to the document retrieval apparatus 1, and electronic documents related to the conversation are retrieved and automatically presented to the other party, which helps conversations between people go smoothly. Such a mode of use can be applied, for example, to automatic help output in consultation services conducted face to face or online using voice input/output equipment.

[0033] Although this embodiment shows an example in which a plurality of keyword candidates are recognized from continuously input sentence speech, the present invention is equally applicable not only to sentence speech but also to cases where keyword candidates are identified by ordinary character recognition or pattern recognition.

[0034]

[Effects of the Invention] As is clear from the above description, according to the present invention, even when the first keyword information could not be recognized accurately, a search keyword effective for achieving the search purpose is accurately specified simply by supplementing it with subsequent related keyword information, which makes electronic document search easier.

[0035] In addition, the search-weighted word importance is evaluated for each of the electronic documents to be searched and the document with the highest evaluation result is retrieved, which raises the accuracy of electronic document search.

[0036] In addition, because the document retrieval apparatus of the present invention can also use continuously uttered sentence speech as keyword information, its range of application can be extended to service systems using voice input/output equipment and to network services using communication means. Furthermore, because speech recognition is performed repeatedly using the word list, with the silent sections of the sentence speech as delimiters, detection errors are also suppressed.

[Brief description of the drawings]

FIG. 1 is a functional block diagram of a voice-input document retrieval apparatus according to an embodiment of the present invention.

FIG. 2 is an explanatory diagram of the processing procedure of the word recognition unit in the embodiment.

FIG. 3 is an explanatory diagram of the processing procedure of the keyword specifying unit in the embodiment.

FIG. 4 is a processing procedure diagram of the document management unit in the embodiment.

FIG. 5 is an explanatory diagram in which (a) shows the content of the first input speech and (b) shows the content of the second input speech.

FIG. 6 is a table in which (a) shows the recognition result of the first input speech and (b) shows the recognition result of the second input speech.

FIG. 7 is a table showing the search weights of the first recognition result (past keyword candidates) as recalculated when the search weights for the second recognition result are calculated.

FIG. 8 is a table showing the keyword candidates stored in the word storage unit and their search weights.

FIG. 9 is a processing procedure diagram of the document management unit in the embodiment.

FIG. 10 is a functional block diagram of a conventional document retrieval apparatus.

[Explanation of symbols]

1 Speech input type document search device according to the present embodiment
2 Conventional document search device
10 Word recognition unit
11 Preprocessing unit
12 Speech recognition unit
13 Word list
20 Keyword specification unit
21 Parameter acquisition unit
22 Weight information addition unit
23 Word storage unit
24 Word merge unit
25 Word determination unit
30, 70 Document management unit
31 Search information table
32 Importance determination unit
33 Document input unit
40 Electronic document database (DB)
50 Output unit
60 Keyword input unit

Continued from the front page: (51) Int. Cl.⁶, identification code, FI: G06F 15/40 370A; 15/403 310Z

Claims


1. A method of searching for an electronic document, comprising: a process of recognizing a plurality of words from keyword information and assigning a search weight to each recognized word based on a recognition likelihood indicating the certainty of the recognition and an elapsed time from the recognition time; a process of merging the search weights of a word when the same word is recognized a plurality of times; a process of specifying a word whose search weight is relatively large as a search keyword; and a process of determining a search-weighted word importance from the search weight of the specified search keyword and a word importance indicating how important the search keyword is in an electronic document, and identifying an electronic document whose search-weighted word importance is relatively high.

2. The search method according to claim 1, wherein the search weight increases in proportion to the recognition likelihood.

3. The search method according to claim 1, wherein the search weight decreases as the elapsed time increases.

4. The search method according to claim 1, wherein the keyword information is sentence speech continuously uttered by a user.

5. A document search device comprising: electronic document storage means for storing electronic documents to be searched; word recognition means for recognizing a plurality of words from keyword information input at the time of a search and outputting each word together with a recognition likelihood indicating the certainty of its recognition and a recognition time; keyword specifying means for specifying at least one of the recognized words as a search keyword; and search means for retrieving a corresponding electronic document from the electronic document storage means using the specified search keyword, wherein the keyword specifying means creates a search weight according to the search purpose based on at least one of the recognition likelihood and the elapsed time from the recognition time, and preferentially uses a word whose created search weight is relatively large as the search keyword.

6. The document search device according to claim 5, wherein the keyword specifying means comprises: a parameter acquisition unit that acquires the recognition likelihood and the recognition time of each word from the word recognition means; a weight information assigning unit that assigns a search weight to each word based on at least one of the acquired recognition likelihood and the elapsed time from the recognition time; a word identifying unit that identifies a predetermined number of words having relatively large search weights; a word storage unit that stores the words to which search weights have been assigned; and a word merging unit that, when the same word as a word recognized from newly input keyword information is present in the word storage unit, sets as the new search weight of that word a value obtained by multiplying the sum of the search weights assigned to the respective occurrences of the word by a predetermined coefficient.

7. The document search device according to claim 6, wherein the weight information assigning unit, when assigning a search weight to a newly recognized word, recalculates the search weight of each previously recognized word.

8. A document search device comprising: electronic document storage means for storing electronic documents to be searched; word recognition means for recognizing a plurality of words from keyword information input at the time of a search and outputting each word together with a recognition likelihood indicating the certainty of its recognition; keyword specifying means for specifying at least one of the recognized words as a search keyword; and search means for integrating, for each electronic document, the search weight of the specified search keyword with a word importance representing how important the search keyword is in that electronic document to calculate a search-weighted word importance, and for retrieving from the electronic document storage means an electronic document whose calculated search-weighted word importance is relatively high.

9. The document search device according to claim 5, wherein the keyword information is continuously uttered sentence speech, and the word recognition means is configured to perform speech recognition repeatedly, using the silent sections of the sentence speech as delimiters.

10. The document search device according to claim 9, wherein the word recognition means lists in advance the words contained in the electronic documents and performs speech recognition using the listed words.
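For illustration only, the weighting, merging, and keyword selection recited in claims 1 to 3 and 6 could be sketched in Python as follows. The claims state only that the search weight grows in proportion to the recognition likelihood, falls as the elapsed time grows, and is merged for repeated words as a coefficient times the sum. The exponential half-life decay, the default parameter values, and all names below are assumptions made for this sketch and do not reproduce the formulas of the embodiment.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Candidate:
    word: str
    likelihood: float     # recognition likelihood, assumed to lie in [0, 1]
    recognized_at: float  # recognition time in seconds


def search_weight(c: Candidate, now: float, half_life: float = 30.0) -> float:
    """Weight proportional to likelihood, decreasing with elapsed time.

    The half-life decay is an assumed form; the claims only require that
    the weight decreases as the elapsed time increases.
    """
    elapsed = max(0.0, now - c.recognized_at)
    return c.likelihood * 0.5 ** (elapsed / half_life)


def merge_weights(
    candidates: List[Candidate],
    now: float,
    coefficient: float = 1.0,
    half_life: float = 30.0,
) -> Dict[str, float]:
    """Recompute every weight at `now` and merge repeated words.

    A word recognized more than once receives coefficient * (sum of its
    weights); a word recognized once keeps its own weight.
    """
    sums: Dict[str, float] = {}
    counts: Dict[str, int] = {}
    for c in candidates:
        sums[c.word] = sums.get(c.word, 0.0) + search_weight(c, now, half_life)
        counts[c.word] = counts.get(c.word, 0) + 1
    return {
        word: (coefficient * total if counts[word] > 1 else total)
        for word, total in sums.items()
    }


def pick_keywords(weights: Dict[str, float], n: int = 1) -> List[str]:
    """Return the n words with the largest search weights."""
    return sorted(weights, key=weights.get, reverse=True)[:n]


# Hypothetical input: "kyoto" is recognized in both utterances.
candidates = [
    Candidate("kyoto", 0.8, 0.0),
    Candidate("tokyo", 0.6, 30.0),
    Candidate("kyoto", 0.4, 30.0),
]
weights = merge_weights(candidates, now=30.0)
keywords = pick_keywords(weights, n=1)
```

In this sample, the older recognition of "kyoto" has decayed by the time of the second utterance, yet the merged weight of the repeated word still exceeds that of "tokyo", which was recognized only once with a higher likelihood than the second "kyoto".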