JP2015099290A

JP2015099290A - In-utterance important word extraction device, in-utterance important word extraction system using the device, and method and program thereof

Info

Publication number: JP2015099290A
Application number: JP2013239472A
Authority: JP
Inventors: Akihiro Yoshida; Yuji Aono; Shinya Ishihara; Yutaka Kunida; Yoshio Kanda; Eiji Nomura; Yuji Oishi
Original assignee: Nippon Telegraph and Telephone Corp; Nippon Telegraph and Telephone East Corp
Current assignee: Nippon Telegraph and Telephone Corp; Nippon Telegraph and Telephone East Corp
Priority date: 2013-11-20
Filing date: 2013-11-20
Publication date: 2015-05-28
Anticipated expiration: 2033-11-20
Also published as: JP6347939B2

Abstract

PROBLEM TO BE SOLVED: To provide an in-utterance important word extraction device capable of extracting an important word from utterance information without creating a keyword list for each user. SOLUTION: A morpheme analysis part outputs a morpheme string comprising information on a word and a part of speech divided into morphemes through morpheme analysis of text information input to a text input part. A word registration part receives the morpheme string as an input and registers a word meeting a predetermined condition as a keyword in a keyword list. A speech recognition part creates and outputs a text sentence comprising a word string divided into morphemes through speech recognition processing on speech information input to a speech input part. An important word extraction part receives the text sentence as an input and extracts, as an important word, a word matching any of keywords registered in the keyword list.

Description

The present invention relates to an in-utterance important word extraction device that extracts important words from an utterance, an in-utterance important word extraction system using the device, and methods and programs therefor.

We communicate with others in daily life by speaking. Such speech contains a variety of utterance information, and efficiently extracting the words that appear important from it should make it possible to create personal activity histories, such as life logs, and memoranda.

From the standpoint of user convenience, it is preferable not to have the user speak deliberately each time a record is to be kept, but to record audio continuously and obtain information from the utterances made unconsciously in the course of everyday life. Personal activity histories and memoranda built from the seemingly important words obtained from such utterance information are expected to be useful records.

A conventionally known method of extracting important words from utterance content uses a keyword list in which important words are registered in advance and extracts, as important words, recognition results that match a keyword (see, for example, Patent Document 1).

Patent Document 1: JP 2008-286921 A

However, the words that should be treated as important differ from user to user, so the conventional method requires a keyword list to be created for each user. Moreover, because each user's interests generally change over the course of daily life, the keywords tied to those interests would need daily maintenance, which is not practical. In addition, everyday conversational speech is fast and often unclearly articulated, so a misrecognized word may be erroneously extracted as an important word.

The present invention has been made in view of these problems. Its object is to provide an in-utterance important word extraction device that automatically keeps the keyword list up to date and reduces the risk of extracting misrecognized words as important words, an in-utterance important word extraction system using the device, and methods and programs therefor.

The in-utterance important word extraction device of the present invention comprises a keyword list, a text input unit, a morphological analysis unit, a word registration unit, a speech input unit, a speech recognition unit, and an important word extraction unit. Keywords are registered in the keyword list. Text information is input to the text input unit. The morphological analysis unit performs morphological analysis on the text information input to the text input unit and outputs a morpheme string consisting of the words obtained by segmentation into morphemes together with their part-of-speech information. The word registration unit takes the morpheme string output by the morphological analysis unit as input and registers words that satisfy a predetermined condition in the keyword list as keywords. Speech information is input to the speech input unit. The speech recognition unit performs speech recognition on the speech information input to the speech input unit and generates and outputs a text sentence consisting of a word string segmented into morphemes. The important word extraction unit takes the text sentence output by the speech recognition unit as input and extracts, as important words, words that match any of the keywords registered in the keyword list.

The in-utterance important word extraction system of the present invention comprises a speech input terminal, an in-utterance important word extraction device, a network, and a recognized word extraction server. The speech input terminal outputs a speech signal picked up by a microphone to the in-utterance important word extraction device. The in-utterance important word extraction device registers, as keywords, words that are obtained from input text information and satisfy a predetermined condition. It also records the speech signal output by the speech input terminal as a speech file, cuts out from that file only the portions in which a person is speaking, and transmits the speech signal of each such speech segment, together with its utterance start time information, to the recognized word extraction server via the network. It then extracts and outputs, as important words, the words in the recognized word information received from the recognized word extraction server that match a keyword. The recognized word extraction server receives the speech signal of the speech segment and the utterance start time information from the in-utterance important word extraction device, performs speech recognition on the speech signal to generate a text sentence consisting of a word string segmented into morphemes, and outputs the words making up the text sentence to the in-utterance important word extraction device as recognized word information.

According to the in-utterance important word extraction device of the present invention, the keyword list used to identify important words is created automatically from text information entered by the user, which saves the labor of creating a keyword list. In addition, text information entered by the user often expresses the very objects of the user's interest and is thought to contain few input errors, so the method of the present invention, which obtains keywords from that text information, can extract important words with high accuracy.

According to the in-utterance important word extraction system of the present invention, the comparatively heavy speech recognition processing is delegated to the recognized word extraction server over the network, so the configuration of the in-utterance important word extraction device can be simplified. As a result, the device can be made both less expensive and smaller.

A diagram showing an example of the functional configuration of the in-utterance important word extraction device 100 of the present invention.
A diagram showing the operation flow of the in-utterance important word extraction device 100, in which (a) shows the processing after text input and (b) shows the processing after speech input.
A diagram showing an example of the functional configuration of the text input unit 110.
A diagram showing an example of the keyword list 140.
A diagram showing the system configuration of the in-utterance important word extraction system 200 of the present invention.
A diagram showing the operation sequence of the in-utterance important word extraction system 200.
A diagram showing the system configuration of the in-utterance important word extraction system 300 of the present invention.
A diagram showing the external appearance of the in-utterance important word extraction device 320 and an example of a scene in which it is used.

Embodiments of the present invention are described below with reference to the drawings. Identical components appearing in multiple drawings are given the same reference numerals, and their description is not repeated.

FIG. 1 shows an example of the functional configuration of the in-utterance important word extraction device 100 of the present invention, and FIG. 2 shows its operation flow. The in-utterance important word extraction device 100 comprises a keyword list 140, a text input unit 110, a morphological analysis unit 120, a word registration unit 130, a speech input unit 150, a speech recognition unit 160, an important word extraction unit 170, and a control unit 180. The device 100 is realized by loading a predetermined program into a computer composed of, for example, a ROM, a RAM, and a CPU, and having the CPU execute the program. The same applies to the other embodiments described below.

Keywords are registered in the keyword list 140. Text information is input to the text input unit 110. FIG. 3 shows an example of the functional configuration of the text input unit 110. The text input unit 110 comprises a keyboard 111, a tablet 112, a keyboard interface 113, a tablet interface 114, the control unit 180, and a cache memory 116. The keyboard 111, tablet 112, keyboard interface 113, and tablet interface 114 are ordinary components. The control unit 180 is likewise an ordinary component, composed of a CPU, RAM, and ROM when the in-utterance important word extraction device 100 is implemented on a computer. The control unit 180 controls the time-series operation of each part of the device 100. As one of its functions, it shares in the role of the text input unit 110: based on information entered from the keyboard 111, it launches a web browser (not shown), controls operations such as browsing news and searching for information, and outputs text information. The cache memory 116 corresponds to the web browser's temporary files.

The text information output by the text input unit 110 includes text temporarily stored in the cache memory 116 and text posted by the user on blogs or microblogs. A concrete example of text information is the title sentence of a web page temporarily stored in the cache memory 116. The title sentence is easy to extract by taking the text enclosed in the HTML title tag. Suppose, for example, that the user enters "日本代表" ("Japan national team") as a search keyword in the browser and that the HTML source of a retrieved article contains "<title>香川、イラク戦不発の「次は別」</title>" (roughly, "Kagawa, scoreless against Iraq: 'Next time will be different'"). Here <title></title> is the title tag, and the sentence enclosed by it is the title sentence.
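
The title extraction described above can be sketched as follows. This is a minimal illustration using Python's standard html.parser module, not the implementation described in the patent; the names TitleExtractor and extract_title are hypothetical, and the sample HTML wraps the title string quoted in the text in an otherwise invented page.

```python
from html.parser import HTMLParser


class TitleExtractor(HTMLParser):
    """Collects the text enclosed by the first <title>...</title> pair."""

    def __init__(self):
        super().__init__()
        self._in_title = False
        self._done = False
        self._parts = []

    def handle_starttag(self, tag, attrs):
        # Tag names arrive lower-cased from HTMLParser.
        if tag == "title" and not self._done:
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == "title" and self._in_title:
            self._in_title = False
            self._done = True

    def handle_data(self, data):
        # Text may arrive in several chunks, so collect and join later.
        if self._in_title:
            self._parts.append(data)

    @property
    def title(self):
        return "".join(self._parts).strip()


def extract_title(html_source: str) -> str:
    """Return the title sentence of an HTML page, or "" if there is none."""
    parser = TitleExtractor()
    parser.feed(html_source)
    parser.close()
    return parser.title


# Hypothetical cached page containing the title quoted in the text.
cached_page = (
    "<html><head><title>香川、イラク戦不発の「次は別」</title></head>"
    "<body><p>...</p></body></html>"
)
title_sentence = extract_title(cached_page)
```

The resulting title_sentence would then be passed on as text information for morphological analysis.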

Besides the title sentence, the search keywords or the full text of an article may also be treated as text information. The text information is stored in a storage unit, such as the RAM, of the computer that constitutes the in-utterance important word extraction device 100.

When text information is input from the keyboard 111 or the tablet 112 of the text input unit 110, the morphological analysis unit 120 performs morphological analysis on the text information output by the text input unit 110 and outputs a morpheme string consisting of the segmented words and their part-of-speech information (step S120). The control unit 180 determines whether text information has been input to the text input unit 110 (Yes in step S180).

The word registration unit 130 takes the morpheme string output by the morphological analysis unit 120 as input and registers words that satisfy a predetermined condition in the keyword list 140 as keywords (step S132). The predetermined condition concerns, for example, part-of-speech information. In accordance with the condition, the word registration unit 130 treats as keywords only words whose part of speech is, for example, a noun (such as a proper noun) or a verb.

The word registration unit 130 does not register as a keyword any word whose part-of-speech information, as output by the morphological analysis unit 120, does not meet the predetermined condition (No in step S131). The processing of the morphological analysis unit 120 and the word registration unit 130 is repeated each time text information is input to the text input unit 110, until the text information ends (No in step S181).
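
The registration step can be sketched as a part-of-speech filter over the morpheme string. In this hedged Python sketch the morpheme string is represented as a list of (word, part of speech) pairs, which stands in for the output of whatever morphological analyzer is used; the function name register_keywords, the constant ALLOWED_POS, and the part-of-speech labels are assumptions made for illustration.

```python
# Hypothetical "predetermined condition": parts of speech accepted as keywords.
ALLOWED_POS = frozenset({"noun", "proper noun", "verb"})


def register_keywords(morphemes, keyword_list, allowed_pos=ALLOWED_POS):
    """Register every word whose part of speech satisfies the condition.

    morphemes: iterable of (word, part_of_speech) pairs, standing in for
        the morpheme string output by a morphological analyzer.
    keyword_list: set of registered keywords, updated in place.
    """
    for word, pos in morphemes:
        if pos in allowed_pos:        # step S131: condition satisfied?
            keyword_list.add(word)    # step S132: register as a keyword
    return keyword_list


# Example morpheme string for "渋谷で会う" (labels are illustrative).
keywords = register_keywords(
    [("渋谷", "proper noun"), ("で", "particle"), ("会う", "verb")],
    set(),
)
# keywords now holds "渋谷" and "会う"; the particle "で" is skipped.
```

Using a set makes repeated registration of the same word harmless, which suits a list that is rebuilt continuously from incoming text.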

FIG. 4 shows an example of the keywords registered in the keyword list 140. As the minimum necessary information, the keyword list 140 need only hold the keyword words themselves, such as "Shibuya" and "Hatsudai". In addition to the words, an identifier (ID) identifying each keyword may be assigned, as shown in the first column of FIG. 4. The "registration trigger", "registration date and time", "last use date and time", and so on of each keyword may also be registered. Here, the "registration trigger" is information indicating the type of keyword. "Initial registration" denotes a keyword that the in-utterance important word extraction device 100 holds in advance and that is never deleted. "Text input" denotes a keyword registered from text information that the user entered at some date and time in a blog, microblog, text memo, or the like; it may be deleted after a predetermined time has elapsed. "Web browsing" denotes a keyword extracted from text information acquired while the user browsed the Internet.

Keywords may be reset after a predetermined time has elapsed, measured from the "registration date and time" or the "last use date and time". When keywords are to be reset, the in-utterance important word extraction device 100 includes the keyword list reset unit 190 shown by the broken line in FIG. 1.

The keyword list reset unit 190 resets keywords individually, for example once a predetermined time has elapsed since the "last use date and time". The predetermined time may be any of several periods, such as 24 hours, two days, three days, or one week, and it may be varied by keyword type, for example in accordance with the "registration trigger" information.

Alternatively, the keyword list 140 may be reset by an externally supplied reset signal that is linked, for example, to the user's operation of entering text information. In this case the predetermined time is the interval between user operations and is therefore irregular. Alternatively, the keyword list reset unit 190 may be configured to output a reset signal at a predetermined period. In this case, all keywords registered in the keyword list 140 are erased at once at the predetermined interval. Resetting the keyword list 140 after a predetermined time in these ways allows the device to keep holding a keyword list 140 composed of the most recent important words.
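
The individual reset based on the last use date and time can be sketched as follows. This is only one possible reading of the description: the retention periods in RETENTION are invented values chosen from the ranges mentioned in the text, and the names KeywordEntry and reset_expired are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

# Hypothetical retention period per registration trigger.
# None means the keyword is never deleted (as for initial registration).
RETENTION = {
    "initial registration": None,
    "text input": timedelta(days=3),
    "web browsing": timedelta(hours=24),
}


@dataclass
class KeywordEntry:
    word: str
    trigger: str            # registration trigger
    registered_at: datetime
    last_used_at: datetime


def reset_expired(entries, now):
    """Return the entries that survive an individual reset at time `now`.

    An entry is dropped once its retention period has elapsed since it
    was last used. Entries with an unknown trigger are kept.
    """
    kept = []
    for entry in entries:
        period = RETENTION.get(entry.trigger)
        if period is None or now - entry.last_used_at < period:
            kept.append(entry)
    return kept
```

The periodic full reset also described above would correspond to simply replacing the whole list with the initially registered entries at a fixed interval.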

When speech information is input to the speech input unit 150 (Yes in step S183), the speech recognition unit 160 performs speech recognition on the speech information and generates and outputs a text sentence consisting of a word string segmented into morphemes (step S160). The speech information input to the speech input unit 150 is converted into a speech signal digitized at a sampling frequency of, for example, 16 kHz. The speech recognition unit 160 treats a predetermined number (for example, 320) of samples of the digitized speech signal as one frame, obtains acoustic features for each frame by, for example, mel-frequency cepstral coefficient (MFCC) analysis, and generates and outputs the text sentence consisting of the morpheme-segmented word string with the highest acoustic likelihood and language likelihood. A confidence score for each word may be output together with the text sentence. The confidence may be obtained by a method based on the posterior probability of the word among the N-best candidates, for example the method of Reference 1 (Frank Wessel, Ralf Schluter, Klaus Macherey and Hermann Ney, "Confidence Measures for Large Vocabulary Continuous Speech Recognition", IEEE Transactions on Speech and Audio Processing, Vol. 9, No. 3, March 2001).
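
As a worked example of the framing described above, 320 samples at a sampling frequency of 16 kHz correspond to 20 ms of speech per frame. The sketch below splits a sample sequence into such frames. The text does not say whether adjacent frames overlap, so consecutive non-overlapping frames are assumed here purely for simplicity; the function name split_into_frames is hypothetical, and the MFCC analysis itself is not shown.

```python
SAMPLE_RATE_HZ = 16_000   # sampling frequency given as an example in the text
FRAME_LENGTH = 320        # samples per frame given as an example in the text

# Duration of one frame in milliseconds: 1000 * 320 / 16000 = 20.0
frame_duration_ms = 1000 * FRAME_LENGTH / SAMPLE_RATE_HZ


def split_into_frames(samples, frame_length=FRAME_LENGTH):
    """Split samples into consecutive, non-overlapping frames.

    A trailing partial frame is dropped. Non-overlapping frames are an
    assumption of this sketch, not something stated in the source.
    """
    n_frames = len(samples) // frame_length
    return [
        samples[i * frame_length:(i + 1) * frame_length]
        for i in range(n_frames)
    ]


# One second of (silent) audio yields 50 frames of 320 samples each.
frames = split_into_frames([0] * SAMPLE_RATE_HZ)
```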

The important word extraction unit 170 takes the text sentence output by the speech recognition unit 160 as input and extracts words that match a keyword as important words (step S172). The confidence output by the speech recognition unit 160 may also be used here, so that only recognized words that match a keyword and have a confidence at or above a threshold are extracted as important words. The processing of the speech recognition unit 160 and the important word extraction unit 170 is repeated each time speech information is input to the speech input unit 150, until the speech information ends (No in step S182).
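
The matching step, with the optional confidence threshold, can be sketched as below. The recognizer output is represented as a list of (word, confidence) pairs, which is an assumed format; the function name extract_important_words and the threshold value used in the example are likewise hypothetical.

```python
def extract_important_words(recognized, keyword_list, min_confidence=None):
    """Extract recognized words that match a registered keyword.

    recognized: list of (word, confidence) pairs from speech recognition.
    keyword_list: set of registered keywords.
    min_confidence: if given, matching words below this confidence are
        discarded, which reduces the risk of extracting misrecognized words.
    """
    important = []
    for word, confidence in recognized:
        if word not in keyword_list:
            continue
        if min_confidence is not None and confidence < min_confidence:
            continue
        important.append(word)
    return important


recognized = [("今日", 0.99), ("渋谷", 0.92), ("初台", 0.41)]
keyword_list = {"渋谷", "初台"}

all_matches = extract_important_words(recognized, keyword_list)
# all_matches == ["渋谷", "初台"]
confident_matches = extract_important_words(recognized, keyword_list, 0.5)
# confident_matches == ["渋谷"]
```

Leaving min_confidence unset reproduces the plain keyword match of step S172, so the threshold remains an optional refinement as in the text.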

As described above, the in-utterance important word extraction device 100 can register in the keyword list 140, as keywords, words contained in the text information input to the text input unit and words contained in text information obtained by browsing the Internet on the basis of that text information. It then extracts, as important words, those words in the speech information input to the speech input unit 150 that match any of the keywords registered in the keyword list.

Because the in-utterance important word extraction device 100 extracts important words from speech information, it makes it easier to create personal activity histories, such as life logs, and memoranda. In addition, because the keyword list is created automatically from the text information input to the text input unit, the labor of creating a keyword list is saved. Furthermore, text information entered by the user often expresses the very objects of the user's interest and is thought to contain few input errors, so important words can be extracted with high accuracy.

In the embodiment above, the parts of speech of the words registered in the keyword list 140 were described as "noun" and "verb", but the invention is not limited to this example. For instance, a word string in which two or more parts of speech are joined may be registered as a keyword. A word string in which a "noun" is followed by a "suffix", or a concatenation of three or more words, may be registered as a keyword and used to extract important words. Increasing the combinations of parts of speech that make up a keyword allows important words to be identified more narrowly.
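
Matching multi-word keywords against a recognized word string can be sketched as a search for contiguous word sequences. In this hedged example each multi-word keyword is stored as a tuple of words; the function name find_phrase_keywords and the sample phrases (a noun followed by a suffix-like word) are illustrative assumptions.

```python
def find_phrase_keywords(words, phrase_keywords):
    """Find occurrences of multi-word keywords in a recognized word string.

    words: list of recognized words in utterance order.
    phrase_keywords: set of tuples, each tuple being one multi-word keyword.
    Returns a list of (start_index, phrase) sorted by position.
    """
    hits = []
    for phrase in phrase_keywords:
        n = len(phrase)
        for i in range(len(words) - n + 1):
            # Compare the window of n consecutive words with the phrase.
            if tuple(words[i:i + n]) == phrase:
                hits.append((i, phrase))
    return sorted(hits)


words = ["田中", "さん", "と", "渋谷", "駅", "で", "会う"]
phrase_keywords = {("田中", "さん"), ("渋谷", "駅")}
hits = find_phrase_keywords(words, phrase_keywords)
# hits == [(0, ("田中", "さん")), (3, ("渋谷", "駅"))]
```

Because a phrase only matches when all of its words appear consecutively, longer keywords narrow the set of extracted important words, as the text notes.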

[In-utterance important word extraction system]
FIG. 5 shows the system configuration of an in-utterance important word extraction system 200 that includes the in-utterance important word extraction device 220 of the present invention. The system 200 comprises a speech input terminal 210, the in-utterance important word extraction device 220, a network 240, and a recognized word extraction server 250. The operation of the system 200 is described with reference also to FIG. 6.

The speech input terminal 210 outputs a speech signal picked up by a microphone (not shown) (step S210) to the in-utterance important word extraction device 220 (step S211). Because the microphone is to record the user's speech at all times, it is preferably small enough to be worn, for example, on the user's chest. The microphone and the speech input terminal may be built as a single unit, which may take the form of, for example, a tie pin.

As described above, the speech signal is a signal digitized at a sampling frequency of, for example, 16 kHz, and it is output to the in-utterance important word extraction device 220 by wire or wirelessly. In the wireless case, a short-range wireless technology covering distances of up to about 10 m, known as a wireless PAN (Personal Area Network), such as Bluetooth (registered trademark), or a wireless LAN can be used.

[In-utterance important word extraction device]
The in-utterance important word extraction device 220 registers, as keywords, words that are obtained from input text information and satisfy a predetermined condition. It also records the speech signal output by the speech input terminal 210 as a speech file, cuts out from that file only the portions in which a person is speaking, and transmits the speech signal of each such speech segment, together with its utterance start time information, to the recognized word extraction server 250 via the network 240. It then extracts and outputs, as important words, the words in the recognized word information received from the recognized word extraction server 250 that match a keyword. The speech segments can be extracted using, for example, the method described in Reference 2 (JP 2012-48119 A).

The operation of the in-utterance important word extraction device 220 is described in more detail with reference to FIG. 5. The device 220 comprises a text input unit 221, a morphological analysis unit 222, a word registration unit 223, a keyword list 224, a speech recording unit 225, a speech segment extraction unit 226, a speech transmission unit 227, a recognized word information reception unit 228, an important word extraction unit 229, and an important word display unit 230.

Text information is input to the text input unit 221 (step S221). The morphological analysis unit 222 performs morphological analysis on the text information input to the text input unit 221 and outputs a morpheme string consisting of the segmented words and their part-of-speech information (step S222). The word registration unit 223 takes the morpheme string output by the morphological analysis unit 222 as input and registers words that satisfy a predetermined condition in the keyword list as keywords (step S223).

The speech recording unit 225 records the speech signal sent from the speech input terminal 210 as a speech file (step S225). At this time it also records the recording start time, that is, the time at which recording of the speech signal began. The speech signal and the recording start time are stored in a storage unit, such as the RAM, of the computer that constitutes the in-utterance important word extraction device 220.

The speech segment extraction unit 226 creates a speech file containing only the portions of the file recorded by the speech recording unit 225 in which a person is speaking, and outputs that file to the speech transmission unit 227 (step S226). At the same time, it outputs the utterance start time information of the cut-out speech file to the speech transmission unit 227. The utterance start time information is obtained from the recording start time of the speech signal and the time elapsed from that recording start time to the cut-out speech; that is, it is obtained by adding the elapsed time to the recording start time. As noted above, methods for cutting out only the speech portion are well known.
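
The utterance start time computation is a simple addition, sketched below with Python's datetime module. The function names and the idea of deriving the elapsed time from a sample offset at 16 kHz are assumptions added for illustration; the source states only that the elapsed time is added to the recording start time.

```python
from datetime import datetime, timedelta

SAMPLE_RATE_HZ = 16_000  # example sampling frequency from the text


def utterance_start_time(recording_start, elapsed_seconds):
    """Utterance start time = recording start time + elapsed time."""
    return recording_start + timedelta(seconds=elapsed_seconds)


def utterance_start_from_offset(recording_start, sample_offset,
                                sample_rate_hz=SAMPLE_RATE_HZ):
    """Same computation when the segment position is known in samples."""
    return utterance_start_time(recording_start, sample_offset / sample_rate_hz)


recording_start = datetime(2013, 11, 20, 9, 0, 0)
# A segment that begins 75.5 s into the recording starts at 09:01:15.5.
start = utterance_start_time(recording_start, 75.5)
```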

The speech transmission unit 227 transmits the speech signal of the speech segment extracted by the speech segment extraction unit 226, together with the utterance start time information, to the recognized word extraction server 250 via the network 240 (step S227). The recognized word information reception unit 228 receives the recognized word information sent from the recognized word extraction server 250 via the network 240 (step S228). This recognized word information is the same as the text sentence output by the speech recognition unit 160 (FIG. 1).

The important word extraction unit 229 takes the recognized word information received by the recognized word information receiving unit 228 as input and extracts, as important words, the words that match keywords registered in the keyword list 224 (step S229). The important word display unit 230 displays the important words extracted by the important word extraction unit 229 (step S230). The important words are shown on display means, such as a liquid crystal panel (not shown), provided in the in-utterance important word extraction device 220.
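The matching in step S229 can be sketched as below. This is a hedged illustration only: the function name `extract_important_words`, exact string equality as the matching rule, and the sample words are assumptions; the text states only that words matching registered keywords are extracted.

```python
from typing import Iterable, List


def extract_important_words(recognized_words: Iterable[str],
                            keyword_list: Iterable[str]) -> List[str]:
    """Return the recognized words that match a registered keyword.

    Matching is assumed to be exact string equality; the order of the
    recognized words is preserved.
    """
    keywords = set(keyword_list)  # set lookup keeps matching O(1) per word
    return [word for word in recognized_words if word in keywords]


# Hypothetical data: a morpheme-divided recognition result and a keyword list.
recognized = ["tomorrow", "meeting", "with", "Tanaka", "about", "budget"]
keywords = ["meeting", "budget", "deadline"]
important = extract_important_words(recognized, keywords)
```

Because the recognition result is already divided into morphemes, a per-word lookup is sufficient and no substring search is needed.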

The text input unit 221, morpheme analysis unit 222, word registration unit 223, keyword list 224, voice recording unit 225, voice segment extraction unit 226, and important word extraction unit 229 of the in-utterance important word extraction device 220 carry different reference numerals because they belong to a different device, but they are the same as the identically named functional units described for the in-utterance important word extraction device 100 (FIG. 1).

[Recognized word extraction server]
As shown in FIG. 5, the recognized word extraction server 250 includes a voice receiving unit 251, a speech recognition unit 252, and a recognized word information transmitting unit 253. The voice receiving unit 251 receives the voice signal and the utterance start time information transmitted from the in-utterance important word extraction device 220 via the network 240 (step S251).

The speech recognition unit 252 performs speech recognition processing on the voice signal received by the voice receiving unit 251, and generates and outputs a text sentence consisting of a word string divided into morphemes (step S252). The recognized word information transmitting unit 253 sends the text sentence output by the speech recognition unit 252, together with the utterance start time information received by the voice receiving unit 251, to the in-utterance important word extraction device 220 via the network 240 (step S253).
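The server-side flow of steps S251 to S253 can be sketched as follows. Everything named here is hypothetical: `SegmentRequest`, `RecognizedWordInfo`, `handle_segment`, and the stand-in `fake_recognizer` are illustrative assumptions, and no real speech recognizer or network transport is shown. The sketch shows only that the utterance start time passes through the server unchanged alongside the recognized words.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Callable, List


@dataclass
class SegmentRequest:
    """What the device sends: one voice segment and its start time."""
    audio: bytes
    utterance_start: datetime


@dataclass
class RecognizedWordInfo:
    """What the server returns: the word string and the same start time."""
    words: List[str]
    utterance_start: datetime


def handle_segment(request: SegmentRequest,
                   recognize: Callable[[bytes], List[str]]) -> RecognizedWordInfo:
    """Recognize one segment (S252) and pair the result with its start time (S253)."""
    words = recognize(request.audio)
    return RecognizedWordInfo(words=words, utterance_start=request.utterance_start)


def fake_recognizer(audio: bytes) -> List[str]:
    """Stand-in for a real recognizer: treats the bytes as space-separated text."""
    return audio.decode("utf-8").split()


request = SegmentRequest(audio=b"budget meeting tomorrow",
                         utterance_start=datetime(2013, 11, 20, 9, 0, 12))
response = handle_segment(request, fake_recognizer)
```

Passing the recognizer in as a parameter keeps the sketch independent of any particular speech recognition engine.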

According to the in-utterance important word extraction system 200 described above, the comparatively heavy speech recognition processing is delegated to the recognized word extraction server 250, so the configuration of the in-utterance important word extraction device 220, which provides the same effect as the in-utterance important word extraction device 100, can be simplified. In addition, by increasing the CPU power of the computer that implements the functions of the recognized word extraction server 250, speech recognition can be performed faster and more accurately than in the in-utterance important word extraction device 100.

FIG. 7 shows the system configuration of an in-utterance important word extraction system 300 according to the present invention. The in-utterance important word extraction system 300 differs from the in-utterance important word extraction system 200 only in the configuration of the in-utterance important word extraction device 320.

The in-utterance important word extraction device 320 differs from the in-utterance important word extraction device 220 (FIG. 5) in that it includes an action history display unit 321. The action history display unit 321 displays an action history in which each important word extracted by the important word extraction unit 229 is paired with the time slot in which it was extracted. The action history is shown on display means, such as a liquid crystal panel (not shown), provided in the in-utterance important word extraction device 320.

FIG. 8 shows an example of the appearance of the voice input terminal 210 and the in-utterance important word extraction device 320, and of a scene in which they are used. The voice input terminal 210 is the tie-pin type mentioned above. The display means of the in-utterance important word extraction device 320 shows the important words, grouped by the time slot in which each was extracted. Displaying the important words and time slots in this way lets the user review the day's action history.
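The grouping of important words by time-of-day slot (the "time zone" of the action history) can be sketched as below. This is an illustration under stated assumptions: the function name `build_action_history`, one-hour slots, the label format, and the removal of repeated words within a slot are all choices made for the example and are not specified in the text.

```python
from collections import defaultdict
from datetime import datetime
from typing import Dict, List, Tuple


def build_action_history(extracted: List[Tuple[datetime, str]],
                         slot_hours: int = 1) -> Dict[str, List[str]]:
    """Group (utterance start time, important word) pairs by time-of-day slot."""
    history: Dict[str, List[str]] = defaultdict(list)
    for start, word in extracted:
        slot_start = (start.hour // slot_hours) * slot_hours
        label = f"{slot_start:02d}:00-{slot_start + slot_hours:02d}:00"
        if word not in history[label]:  # assumed: show each word once per slot
            history[label].append(word)
    return dict(history)


# Hypothetical extraction results for one day.
extracted = [
    (datetime(2013, 11, 20, 9, 5), "meeting"),
    (datetime(2013, 11, 20, 9, 40), "budget"),
    (datetime(2013, 11, 20, 9, 50), "meeting"),
    (datetime(2013, 11, 20, 14, 10), "dentist"),
]
history = build_action_history(extracted)
```

The utterance start time returned with the recognized words is what allows each important word to be placed in a slot.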

The user may also tap an important word or a time slot on the display means with a fingertip so that the voice recorded by the voice recording unit 225 in that time slot is played back through a speaker (not shown). To play back the voice corresponding to the action history, the in-utterance important word extraction device 320 further includes an action history selection input unit 322 and a recorded data selection playback unit 323. When the user selects an action history entry by tapping the display means with a fingertip, the action history selection input unit 322 outputs a signal corresponding to the selected important word or time slot to the recorded data selection playback unit 323.

The recorded data selection playback unit 323 reads the voice file recorded by the voice recording unit 225, based on the signal corresponding to the important word or time slot output by the action history selection input unit 322, and plays it back through the speaker. Making the voice file that corresponds to an important word or time slot audible lets the user look back on the action history in detail.
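Selecting which recorded segments to play back for a tapped important word or time-of-day slot can be sketched as follows. The record layout `RecordedSegment`, the function `select_segments`, the file paths, and the use of the hour as the slot key are all hypothetical; actual audio playback is omitted.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List, Optional


@dataclass
class RecordedSegment:
    """One cut-out voice segment kept for playback (hypothetical layout)."""
    path: str
    start: datetime
    important_words: List[str]


def select_segments(segments: List[RecordedSegment],
                    word: Optional[str] = None,
                    hour: Optional[int] = None) -> List[RecordedSegment]:
    """Return segments matching the tapped important word and/or hour slot."""
    selected = []
    for segment in segments:
        if word is not None and word not in segment.important_words:
            continue
        if hour is not None and segment.start.hour != hour:
            continue
        selected.append(segment)
    return selected


segments = [
    RecordedSegment("seg_001.wav", datetime(2013, 11, 20, 9, 5), ["meeting"]),
    RecordedSegment("seg_002.wav", datetime(2013, 11, 20, 9, 40), ["budget"]),
    RecordedSegment("seg_003.wav", datetime(2013, 11, 20, 14, 10), ["dentist"]),
]
by_word = select_segments(segments, word="budget")
by_hour = select_segments(segments, hour=9)
```

The selected file paths would then be handed to whatever audio output the device provides.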

In the in-utterance important word extraction device of the present invention described above, and in the in-utterance important word extraction system using that device, the text information that the user inputs to the device is itself the object of the user's interest. Words related to that text information are registered as keywords, and words that match those keywords are extracted as important words. With this method the keyword list is brought up to date automatically, so its maintenance cost can be eliminated. Moreover, because the text information input by the user is considered to contain almost no errors, important words can be extracted with high accuracy.

The in-utterance important word extraction device of the present invention, which provides these excellent effects, makes it possible to create a personal activity history, such as a life log, or a memorandum, and can offer the user a high degree of convenience.

When the processing means of the above devices are implemented by a computer, the processing content of the functions that each device should have is described by a program. Executing this program on the computer then realizes the processing means of each device on the computer.

The program is distributed, for example, by selling, transferring, or lending a portable recording medium, such as a DVD or CD-ROM, on which the program is recorded. The program may also be distributed by storing it in a storage device of a server computer and transferring it from the server computer to other computers via a network.

Each means may be configured by executing a predetermined program on a computer, or at least part of the processing content may be implemented in hardware.

Claims

A keyword list with registered keywords,
A text input section for inputting text information;
A morpheme analysis unit that outputs a morpheme string composed of word and part-of-speech information obtained by morphologically analyzing the text information and dividing it into morphemes;
A word registration unit for registering a word satisfying a predetermined condition as a keyword in the keyword list using the morpheme string as an input;
A voice input unit for inputting voice information;
A speech recognition unit that generates and outputs a text sentence composed of a word string obtained by performing speech recognition processing on the voice information and dividing it into morphemes;
An important word extraction unit that extracts the word that matches any of the keywords registered in the keyword list as an important word, using the text sentence as an input;
An apparatus for extracting important words in an utterance.

An in-utterance important word extraction system comprising:
A voice input terminal;
An in-utterance important word extraction device connected to the voice input terminal;
A network; and
A recognized word extraction server that communicates with the in-utterance important word extraction device via the network, wherein
The voice input terminal outputs a voice signal picked up by a microphone to the in-utterance important word extraction device,
The in-utterance important word extraction device registers, as keywords, words satisfying a predetermined condition obtained from input text information, records the voice signal as a voice file, transmits the voice signal of a voice section, obtained by cutting out only the voice portion uttered by a person from the voice file, together with utterance start time information to the recognized word extraction server via the network, and extracts and outputs, as important words, the words of the recognized word information received from the recognized word extraction server that match the keywords, and
The recognized word extraction server receives the voice signal of the voice section and the utterance start time information from the in-utterance important word extraction device, generates a text sentence composed of a word string obtained by performing speech recognition processing on the voice signal of the voice section and dividing it into morphemes, and outputs the words constituting the text sentence as recognized word information to the in-utterance important word extraction device.

An apparatus for extracting important words in an utterance used in the important word extracting system in an utterance according to claim 2,
A keyword list with registered keywords,
An audio recording unit that records audio signals sent from the audio input terminal;
A voice segment extraction unit that extracts the voice signal of a voice segment obtained by cutting out only the voice part uttered by a person from the voice file recorded by the voice recording unit, and attaches to the voice segment utterance start time information indicating its start time;
A voice sending unit that sends the voice signal of the voice segment extracted by the voice segment extraction unit and the utterance start time information to the recognition word extraction server via the network;
A text input section for inputting text information;
A morpheme analysis unit that outputs a morpheme string composed of word and part-of-speech information obtained by morphologically analyzing the text information and dividing it into morphemes;
A word registration unit for registering a word satisfying a predetermined condition as a keyword in the keyword list using the morpheme string as an input;
A recognition word information receiving unit for receiving recognition word information sent from the recognition word extraction server via the network;
An important word extraction unit that takes as input the recognized word information received by the recognized word information receiving unit and extracts, as an important word, a word that matches any of the keywords registered in the keyword list;
An apparatus for extracting important words in an utterance.

In the utterance important word extraction device according to claim 1 or 3,
Furthermore,
An apparatus for extracting important words in an utterance, comprising: an action history display unit that displays an action history in which the important word is paired with the time slot in which the important word was extracted.

A text input process in which text information is input;
A morpheme analysis process for outputting a morpheme sequence composed of word and part-of-speech information obtained by morphologically analyzing the text information and dividing it into morphemes;
Using the morpheme string as an input, a word registration process for registering a word satisfying a predetermined condition as a keyword in a keyword list;
A voice input process in which voice information is input;
A speech recognition process for generating and outputting a text sentence composed of a word string obtained by performing speech recognition processing on the voice information and dividing it into morphemes;
An important word extraction process for extracting, as an important word, a word that matches any of the keywords registered in the keyword list, using the text sentence as an input,
A method for extracting important words in an utterance.

A voice signal output process in which a voice input terminal outputs a voice signal picked up by a microphone to a key word extraction device in an utterance;
A voice transmission process in which the in-utterance important word extraction device records the voice signal as a voice file and transmits, to the recognized word extraction server via a network, the voice signal of a voice section obtained by cutting out only the voice part uttered by a person from the recorded voice file, together with utterance start time information;
A recognized word information transmission process in which the recognized word extraction server receives the voice signal of the voice section and the utterance start time information from the in-utterance important word extraction device, generates a text sentence composed of a word string obtained by performing speech recognition processing on the voice signal of the voice section and dividing it into morphemes, and outputs the words constituting the text sentence as recognized word information to the in-utterance important word extraction device;
An important word extraction process in which the in-utterance important word extraction device registers keywords obtained from text information input to a text input unit and extracts, as important words, the words of the recognized word information received from the recognized word extraction server that match the keywords;
A method for extracting important words in utterances.

A program for causing a computer to function as the important word extraction device in an utterance according to claim 1.