JP2023147490A

JP2023147490A - Processing device, processing method and processing program

Info

Publication number: JP2023147490A
Application number: JP2022055008A
Authority: JP
Inventors: 光治西本; Mitsuharu Nishimoto
Original assignee: Buzzgraph Inc
Current assignee: Buzzgraph Inc
Priority date: 2022-03-30
Filing date: 2022-03-30
Publication date: 2023-10-13

Abstract

To divide text information obtained by voice recognition into one or a plurality of pieces of element information and to generate output information by associating attribute information with the element information. SOLUTION: At least one processor included in a processing device is configured to perform processing for receiving input of voice information including contents of utterances of one or a plurality of speakers based on input operation by a user, generating text information indicating the contents of the utterances of the one or the plurality of speakers based on the received voice information, dividing the generated text information into one or a plurality of pieces of element information, associating at least one attribute information with each divided element information, and generating output information in which at least a part of each element information is associated with any item in one or a plurality of items in form information having the one or the plurality of items based on the attribute information associated with each element information. SELECTED DRAWING: Figure 1

Description

The present disclosure relates to a processing device, a processing method, and a processing program capable of converting voice information into text information and editing it.

Speech recognition, which converts voice information into text information, is in common use, as are devices that identify the speaker who produced an utterance. For example, Patent Document 1 discloses a system that generates minutes from the voice information of a meeting or the like, and Cited Document 2 discloses a device that identifies a speaker. However, Patent Document 1 does not disclose dividing the text information obtained by speech recognition, associating the element information obtained by the division with attribute information indicating meaning, speaker, and the like, and associating the element information with items included in an output form.

Japanese Unexamined Patent Application Publication No. 2022-028776 (JP2022-028776A)
Japanese Unexamined Patent Application Publication No. H10-313357

The present embodiment was made against the background described above. Its object is to divide text information obtained by speech recognition into one or more pieces of element information and to generate output information by associating attribute information with each piece of element information. A further object of the present embodiment is to generate output information by additionally associating speaker information with the element information.

A processing device according to the present disclosure is a processing device including at least one processor. The at least one processor is configured to execute processing for: receiving, based on an input operation by a user, input of voice information including the content of utterances of one or more speakers; generating, based on the received voice information, text information indicating the content of the utterances of the one or more speakers; dividing the generated text information into one or more pieces of element information; associating at least one piece of attribute information with each piece of divided element information; and generating, based on the attribute information associated with each piece of element information, output information in which at least part of each piece of element information is associated with one of the items of form information having one or more items.

A processing method according to the present disclosure is a method performed in a computer including at least one processor by the at least one processor executing predetermined instructions. The method includes: a step of receiving, based on an input operation by a user, input of voice information including the content of utterances of one or more speakers; a step of generating, based on the received voice information, text information indicating the content of the utterances of the one or more speakers; a step of dividing the generated text information into one or more pieces of element information; and a step of associating at least one piece of attribute information with each piece of divided element information and generating, based on the attribute information associated with each piece of element information, output information in which at least part of each piece of element information is associated with one of the items of form information having one or more items.

A processing program according to the present disclosure causes a computer including at least one processor to function as a processor configured to perform processing of: receiving, based on an input operation by a user, input of voice information including the content of utterances of one or more speakers; generating, based on the received voice information, text information indicating the content of the utterances of the one or more speakers; dividing the generated text information into one or more pieces of element information; associating at least one piece of attribute information with each piece of divided element information; and generating, based on the attribute information associated with each piece of element information, output information in which at least part of each piece of element information is associated with one of the items of form information having one or more items.

According to the present disclosure, text information obtained by speech recognition can be divided into one or more pieces of element information, and output information can be generated by associating attribute information with each piece of element information. Further, according to the present disclosure, output information can be generated by additionally associating speaker information with the element information.

The effects described above are merely examples given for convenience of explanation and are not limiting. In addition to or in place of those effects, any effect described in the present disclosure, or any effect apparent to a person skilled in the art, may also be obtained.

FIG. 1 is a diagram illustrating a form 10 that indicates the display format, within output information 14, of element information 12, attribute information 16, and speaker information 18 generated by editing text information obtained by speech recognition of voice information.
FIG. 2 is a diagram illustrating the configuration of a terminal device 100 that processes voice information of a meeting or the like to obtain text information indicating the content of the remarks made in the meeting, identifies the speakers who spoke in the meeting, and generates minutes of the meeting in accordance with the form 10 shown in FIG. 1.
FIG. 3A is a diagram showing a form information table.
FIG. 3B is a diagram showing a voice information table.
FIG. 3C is a diagram showing an element information table.
FIG. 4A is a diagram illustrating text information H included in the voice information table shown in FIG. 3B.
FIG. 4B is a diagram illustrating conversion information that is generated from the text information shown in FIG. 4A and included in the voice information table (FIG. 3B).
FIG. 4C is a diagram illustrating element information 12 (element information J) that is generated from the conversion information shown in FIG. 4B and included in the voice information table (FIG. 3B).
FIG. 4D is a diagram illustrating a UI image in which check boxes, the element information 12 shown in FIG. 4C, flags F1 to F4 indicating attributes, and speaker information 18 indicating speakers A to C are associated with one another.
FIG. 5A is a flowchart showing processing in which a user registers, in the terminal device 100, information indicating form information C of the output information 14 among the form information included in the form information table (FIG. 3A).
FIG. 5B is a diagram illustrating the form 10 indicated by the form information C of the output information 14.
FIG. 6A is a flowchart showing voice information processing by the terminal device 100.
FIG. 6B is a diagram illustrating a list of element information 12 associated with flags indicating attributes.
FIG. 6C is a diagram illustrating a UI image used to move element information 12 and the like into boxes that are included in the form 10 of the output information 14 and associated with attribute information 16.

Hereinafter, as an embodiment of the present disclosure, processing that handles text information indicating the content of remarks made in a meeting, obtained by speech recognition processing or the like performed on voice information, and that generates minutes in accordance with a predetermined form will be described in detail with reference to the drawings. In the drawings, substantially identical components, processes, and information are given the same reference numerals and names. "Information" and "data" are not strictly distinguished.

In the drawings, the numbers and types of components and data are shown by way of example and may be increased, decreased, or changed as appropriate. The order of communication between devices is likewise shown by way of example and may be changed as appropriate. Components not related to the essential description of the invention may be omitted from the drawings. For convenience of illustration, parts of the names of components and information, such as "information" and "module," may also be omitted from the drawings. When multiple instances of a term, such as "element information 12a to 12d," need not be distinguished, the suffix of the reference numeral may be omitted and the term written as "element information 12" or the like.

1. Overview of Processing by Terminal Device 100
First, with reference to FIGS. 1 and 2, an overview of the processing according to the present embodiment for generating minutes from the audio of a meeting will be described. FIG. 1 is a diagram illustrating a form 10 that indicates the display format, within output information 14, of element information 12, attribute information 16, and speaker information 18 generated by editing text information obtained by speech recognition of voice information. FIG. 2 is a diagram illustrating the configuration of a terminal device 100 that processes voice information of a meeting or the like to obtain text information indicating the content of the remarks made in the meeting, identifies the speakers who spoke in the meeting, and generates minutes of the meeting in accordance with the form 10 shown in FIG. 1.

The form 10 is defined so that one or more items, such as element information 12, attribute information 16, and speaker information 18, are output within the output information 14 in the format the user desires. The user can define an arbitrary form 10 and have the terminal device 100 display the element information 12 and the like within the output information 14 in accordance with the defined form 10.

The terminal device 100 shown in FIG. 2 can process various kinds of voice information, such as voice information from an online meeting and voice information collected offline via a microphone 119, and can display the result on a display (not shown) or the like in accordance with the form 10 shown in FIG. 1. For clarity and concreteness, however, the following description covers the case in which the terminal device 100 receives input of voice information from a meeting held offline. It also covers the case in which the terminal device 100 generates, by speech recognition, text information indicating the content of the remarks made in the meeting and divides the generated text information to generate element information 12. It further covers the case in which the terminal device 100 identifies the speaker who uttered the conversion information containing the element information 12, and the case in which the terminal device 100 associates the element information 12 with attribute information 16 indicating an attribute of the element information 12 and with speaker information 18 indicating the speaker corresponding to the element information 12. Speaker recognition here means identifying the speaker who made the remark containing the element information 12. Finally, the specific example used is the case in which the terminal device 100 associates the element information 12, the attribute information 16, and the speaker information 18 with one another and displays them in accordance with the form 10.

In a meeting or the like, the terminal device 100 records, via the microphone 119 or the like, voice information representing the voices of one or more speakers such as meeting attendees. Speech recognition processing converts the recorded voice information into one or more pieces of text information. Text information obtained from meeting audio may contain multiple sentences and can therefore be divided into multiple sentences. Such text information contains character strings such as "Um, first, Mr. A, please create the seminar materials by month ○, day ×" (「え～まずＡさんが○月×日までにセミナー資料を生成してください」). The sentences obtained by dividing the text information may therefore also contain portions that need not appear in the minutes, such as the polite ending "please" (「ください」). Portions that need not appear in the minutes may be deleted.

The text information obtained by division does not contain delimiter information marking the boundaries between pieces of text information, such as the Japanese punctuation marks 「、」 and 「。」, the comma and period 「，」 and 「．」, or spaces. To delimit multiple pieces of text information, delimiter information must therefore be inserted between them and appended after the first and last pieces. In Japanese text information, the delimiter information can be punctuation information, such as 「、」 and 「。」, that marks sentence boundaries. Deleting unnecessary sentence-final polite expressions and the like from the text information and inserting punctuation information between the pieces of text information yields conversion information, that is, text converted so that the boundaries between pieces of text information are clear.
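The conversion step described above can be sketched as follows. This is only an illustrative sketch: the source does not say how polite endings are detected, so the table `POLITE_ENDINGS` and the function name `to_conversion_info` are hypothetical, and the sketch handles only a sentence-final ending and the 「。」 delimiter, not the insertion of 「、」 inside a sentence.

```python
# Hypothetical table of sentence-final polite endings and their plain forms.
# The source does not enumerate such endings; this single entry is illustrative.
POLITE_ENDINGS = [("してください", "する")]


def to_conversion_info(sentences):
    """Turn undelimited recognized sentences into delimited conversion information.

    Each sentence has its polite ending (if listed) replaced by a plain form,
    and a sentence delimiter is appended after every sentence, including the
    first and the last one.
    """
    converted = []
    for sentence in sentences:
        text = sentence.strip()
        for polite, plain in POLITE_ENDINGS:
            if text.endswith(polite):
                text = text[: -len(polite)] + plain
                break
        converted.append(text + "。")
    return converted


if __name__ == "__main__":
    for line in to_conversion_info(["セミナー資料を生成してください", "了解"]):
        print(line)
```

A real implementation would more likely rely on morphological analysis than on a fixed suffix table; the table is used here only to keep the sketch self-contained.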

Conversion information consists of words and word groups and may contain one or more pieces of element information 12, each having some meaning. For example, the conversion information "Um, first, Mr. A creates the seminar materials by month ○, day ×." (「え～、まずＡさんが○月×日までにセミナー資料を生成する。」) can be divided, by morphological analysis processing, functional-element analysis processing, and the like, into multiple pieces of element information 12, each having a specific meaning. In this example, the conversion information can be divided into element information 12a to 12d, consisting of the words and word groups "Um" (「え～」), "first" (「まず」), "Mr. A, by month ○, day ×" (「Ａさんが○月×日までに」), and "create the seminar materials" (「セミナー資料を生成する」), respectively.

Element information 12 is contained in conversion information, and a single piece of conversion information can be presumed to have been uttered by a single speaker. The speaker of element information 12 can therefore be identified by performing speaker identification processing on the voice information corresponding to the conversion information that contains that element information 12. In other words, speaker identification can be applied to the text information by way of the conversion information. This processing identifies the speaker of the element information 12, and speaker information 18 indicating that speaker can be associated with the element information 12.
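The following minimal sketch illustrates the idea that speaker identification is performed once per piece of conversion information and the result is inherited by every element it contains. The names `Element`, `assign_speakers`, and the `identify_speaker` callback are assumptions made for illustration; the source does not specify the speaker identification algorithm, so the sketch takes it as a caller-supplied function.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass
class Element:
    """One piece of element information with its associated speaker information."""
    text: str
    speaker: Optional[str] = None


def assign_speakers(
    conversion_units: List[Tuple[str, List[str]]],
    identify_speaker: Callable[[str], Optional[str]],
) -> List[Element]:
    """Associate a speaker with every element of every conversion unit.

    conversion_units: (audio_segment_id, element texts) pairs, one pair per
        piece of conversion information.
    identify_speaker: hypothetical callback that maps an audio segment id to
        a speaker label; it stands in for the speaker identification processing.
    """
    elements: List[Element] = []
    for segment_id, texts in conversion_units:
        speaker = identify_speaker(segment_id)  # one identification per unit
        elements.extend(Element(text, speaker) for text in texts)
    return elements


if __name__ == "__main__":
    units = [("seg1", ["え～", "まず", "Ａさんが○月×日までに", "セミナー資料を生成する"])]
    known = {"seg1": "A"}
    for element in assign_speakers(units, known.get):
        print(element.speaker, element.text)
```

Because the speaker is resolved per unit rather than per element, a later manual correction only needs to change the label of the affected elements.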

The element information 12a "Um" (「え～」; not shown in FIG. 1) contained in this conversion information is an interjection and need not be included in the minutes. The element information 12b "first" (「まず」) represents the order, as opposed to the deadline, of the action "create the seminar materials," and therefore needs to be included in the minutes. The element information 12c "Mr. A, by month ○, day ×" represents the deadline, as opposed to the order, of the action "create the seminar materials" in element information 12d, together with the subject of that action, and therefore needs to be included in the minutes. The element information 12d "create the seminar materials" represents an action and therefore needs to be included in the minutes.

As described above, element information 12 may contain words that Japanese grammar classifies as interjections, such as "good morning" (「おはよう」; not shown) spoken as a greeting before a speaker's remark in a meeting, and the "Um" (「え～」) of element information 12a that precedes a remark. Element information 12 may also contain word groups such as "ohayō gozaimasu" (「おはようございます」), in which the polite expression "gozaimasu" (「ございます」; not shown) is attached to the interjection "ohayō." Words that are interjections, and word groups consisting of an interjection with a polite expression attached, generally need not be included in the minutes of a meeting. There is accordingly no need to associate attribute information 16 with them, and they are treated as having no attribute when the minutes are generated. Hereinafter, "words that are interjections, and word groups consisting of an interjection with a polite expression attached" and the like are referred to as "words and word groups corresponding to interjections and the like."

The element information 12b to 12d, which is not attributeless and should be included in the minutes, has some meaning, such as the subject of an action, a deadline, an action, or a conclusion. Specifically, the element information 12b "first" indicates the order of the action (the "create the seminar materials" of element information 12d), and the element information 12c "Mr. A, by month ○, day ×" indicates the subject and the deadline of the action. These pieces of element information 12 therefore need to be included in the minutes of the meeting. Element information 12 of this kind, which indicates the order or the deadline of an action, is associated with, for example, the attribute information 16 "deadline."

The element information 12d "create the seminar materials" indicates an action that the meeting decided would be carried out, and therefore needs to be included in the minutes of the meeting. An action that has been decided on in this way is associated with the attribute information 16 "To-Do," taken from the term "To-Do list," meaning a list of things to do. An action that has been decided on is also commonly called a "task" or the like. Further, element information 12d indicating a conclusion of the meeting is associated with the attribute information 16 "conclusion."

Further, as described above, which attendee's remark contained each piece of element information 12 can be identified by performing speaker identification processing on the voice information corresponding to the conversion information that contains that element information 12. For example, when the meeting has m attendees (m ≥ 2), setting the number of attendees m in the speech recognition device in advance makes it easier to perform the speaker recognition processing that recognizes the speaker of the text information containing each piece of element information 12. In this way, as shown in FIG. 1, speaker recognition makes it possible to associate each piece of element information 12 with the speaker who uttered the sentence containing it (speakers A to C in FIG. 1; m = 3).

In the present embodiment, text information is generated from voice information, and element information 12 is then generated from the text information by way of conversion information. Each piece of generated element information 12 is associated with one or more pieces of attribute information 16 from among "no attribute," "deadline," "To-Do," and "conclusion." Which speaker's remark, among the meeting attendees and other speakers, contained the generated element information 12 can be identified as described above. Speaker information 18 indicating the identified speaker is associated with the element information 12.
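One possible way to model the associations described above is sketched below. The class and function names (`Attribute`, `Element`, `minutes_candidates`) are hypothetical and do not come from the source; only the four attribute values and the rule that attributeless elements need not appear in the minutes are taken from the description.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List, Optional, Set


class Attribute(Enum):
    """The four attribute values named in the description."""
    NONE = "no attribute"
    DEADLINE = "deadline"
    TODO = "To-Do"
    CONCLUSION = "conclusion"


@dataclass
class Element:
    """Element information with one or more attributes and an optional speaker."""
    text: str
    attributes: Set[Attribute] = field(default_factory=lambda: {Attribute.NONE})
    speaker: Optional[str] = None


def minutes_candidates(elements: List[Element]) -> List[Element]:
    """Keep only elements that carry at least one attribute other than NONE."""
    return [e for e in elements if e.attributes - {Attribute.NONE}]


if __name__ == "__main__":
    elements = [
        Element("え～", speaker="A"),                                  # 12a: interjection
        Element("まず", {Attribute.DEADLINE}, "A"),                    # 12b: order
        Element("Ａさんが○月×日までに", {Attribute.DEADLINE}, "A"),    # 12c: subject and deadline
        Element("セミナー資料を生成する", {Attribute.TODO}, "A"),       # 12d: action
    ]
    for e in minutes_candidates(elements):
        print(sorted(a.value for a in e.attributes), e.text)
```

Using a set for `attributes` reflects the statement that an element may carry more than one attribute; how the attributes are chosen automatically is outside the scope of this sketch.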

The user can correct and edit the element information 12 itself as appropriate. The user can also correct attribute information 16 and speaker information 18 that were wrongly associated with element information 12, and can edit the attribute information 16 and speaker information 18 themselves. Through these corrections and edits, the element information 12 is correctly associated with the attribute information 16 and the speaker information 18. Further, as indicated by the dotted arrows, the element information 12 can be rearranged on the basis of the attribute information 16 either manually by the user or automatically. This makes it easier for the user to produce the minutes of a meeting, or allows the minutes to be generated automatically. The information representing the minutes of the meeting described above becomes the output information 14 shown in FIG. 1 and is presented to the user on the display in accordance with the form 10.
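The automatic rearrangement by attribute can be sketched as a simple grouping step. This is an assumption-laden illustration: the function `build_output`, the dictionary keys, and the representation of a form as an ordered list of item names are all hypothetical, and each element is given a single attribute here for brevity.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def build_output(
    form_items: List[str],
    elements: List[Dict[str, str]],
) -> Dict[str, List[Tuple[str, str]]]:
    """Place each element under the form item that matches its attribute.

    form_items: item names of the form, in display order (hypothetical layout).
    elements: dicts with the keys "text", "attribute", and "speaker".
    Elements whose attribute matches no form item are left out of the output.
    """
    grouped: Dict[str, List[Tuple[str, str]]] = defaultdict(list)
    for element in elements:
        if element["attribute"] in form_items:
            grouped[element["attribute"]].append((element["speaker"], element["text"]))
    # Return every form item, in form order, even if it received no elements.
    return {item: grouped[item] for item in form_items}


if __name__ == "__main__":
    form = ["deadline", "To-Do", "conclusion"]
    elements = [
        {"text": "え～", "attribute": "no attribute", "speaker": "A"},
        {"text": "セミナー資料を生成する", "attribute": "To-Do", "speaker": "A"},
        {"text": "Ａさんが○月×日までに", "attribute": "deadline", "speaker": "A"},
    ]
    for item, rows in build_output(form, elements).items():
        print(item, rows)
```

Keeping the form as plain data means a user-defined form only changes the `form_items` argument, not the grouping logic.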

Further, the speaker information 18 indicating the attendee (speaker) who made a remark containing element information 12 associated with the To-Do attribute can, for example, be presumed to indicate the person who instructed the action indicated by that element information 12, and can be included in the output information as such. The attendee who made the remark is not always the person who instructed the action indicated by the To-Do element information 12 contained in that remark. In such a case, the user can correct and edit the speaker information 18 as appropriate so that the instructor of the action indicated by the To-Do element information 12 is changed to the correct person, for example a speaker other than the one who made the remark. "User" means the person who produces the minutes of the meeting, who may or may not be one of the speakers such as the meeting participants.

2. Configuration of Terminal Device 100
Hereinafter, with reference to FIG. 2, the configuration of the terminal device 100, which executes the processing according to the present embodiment described with reference to FIG. 1 for generating minutes from the audio of a meeting, will be described. The terminal device 100 may be a mobile terminal device such as a smartphone, a terminal device such as a tablet computer, or a general-purpose information processing device such as a notebook personal computer (PC), a desktop PC, a server device, or a mainframe computer.

The terminal device 100 need not include all of the components shown in FIG. 2; some of them may be omitted. Components other than those shown in FIG. 2 may also be added to the terminal device 100. By means of the components shown in FIG. 2, the terminal device 100 functions as a processing device that performs the processing according to the present embodiment described with reference to FIG. 1 for generating minutes from the audio of a meeting.

However, the processing for generating minutes from the audio of a meeting need not be executed in the terminal device 100. It may instead be performed with a server device (not shown), connected to the terminal device 100 via a communication network (not shown) such as the Internet, acting as the processing device. In that case, for example, the terminal device 100 transmits the voice information to the server device via the communication network, and the server device executes the voice information processing and speaker recognition processing described with reference to FIG. 1 and generates the output information 14. The server device then transmits the output information 14 to the terminal device 100 via the communication network.

As shown in FIG. 2, the terminal device 100 includes an output interface (output IF) 111, a processor 112, a memory 113, a communication interface (communication IF) 114, an input interface (input IF) 116, and a microphone 119, which are interconnected via a bus.

The memory 113 includes RAM, ROM, nonvolatile memory (NVM), an HDD (not shown), an SSD (not shown), and the like. The communication interface 114 includes a communication processing circuit 115 and an antenna. The input interface 116 includes a mouse 117 and hard keys 118. These components of the terminal device 100 are electrically connected via control lines (not shown) and the bus, and exchange data and information with one another.

The output interface 111 connects output devices such as a speaker and a display (not shown) to the terminal device 100. These output devices may be located outside the terminal device 100 and connected via the output interface 111, or may be built into the terminal device 100 and connected to the output interface 111.

The display connected to the output interface 111 functions as a display unit that, in accordance with instructions from the processor 112, reads image information stored in the memory 113 and presents various displays. The display shows, among other things, information used to carry out the processing of text information obtained from audio information according to the embodiment. The display is, for example, a liquid crystal display or an organic EL display. The speaker connected to the output interface 111 functions as an audio output unit that outputs audio signals obtained from audio data received by the terminal device 100.

The processor 112 consists of one or more CPUs (microprocessors), or a combination of one or more CPUs and one or more GPUs or the like specialized for image processing, together with their peripheral circuits. Based on the various programs stored in the memory 113, the processor 112 functions as a control unit that controls the other connected components.

Specifically, the processor 112 reads from the memory 113, and executes, an application program containing predetermined instructions for executing the processing according to the embodiment and a program containing predetermined instructions for OS processing. The OS provides the functions that allow the processor 112 to execute the application program.

In particular, the processor 112 reads from the memory 113, and executes, an application program containing predetermined instructions for accepting, based on an input operation by the user, input of audio information that includes the utterances of one or more speakers. The processor 112 likewise reads and executes an application program containing predetermined instructions for generating, from the accepted audio information, text information representing the utterances of the one or more speakers and for dividing the generated text information into one or more pieces of element information 12.

The processor 112 also reads from the memory 113, and executes, an application program containing predetermined instructions for associating at least one piece of attribute information 16 with each piece of element information 12 obtained by the division and for generating, based on the attribute information 16 associated with each piece of element information 12, output information 14 in which at least part of each piece of element information 12 is associated with one of the items of form information having one or more items.

The memory 113 functions as a storage unit. A storage medium removable from the terminal device 100, a database (not shown), and the like may also be connected to the memory 113. Within the memory 113, the ROM stores programs containing predetermined instructions for the processing of the OS and the like.

The RAM is memory to and from which the data needed for processing is written and read while the processor 112 processes the application program and the OS program stored in the ROM. The nonvolatile memory retains written data without a supply of power. The processor 112 writes data obtained by executing these programs to the nonvolatile memory and reads the written data back from it.

In particular, the memory 113 stores an application program containing predetermined instructions for accepting, based on an input operation by the user, input of audio information that includes the utterances of one or more speakers. The memory 113 also stores an application program containing predetermined instructions for generating, from the accepted audio information, text information representing the utterances of the one or more speakers and for dividing the generated text information into one or more pieces of element information 12. The memory 113 further stores an application program containing predetermined instructions for associating at least one piece of attribute information 16 with each piece of element information 12 obtained by the division and for generating, based on the attribute information 16 associated with each piece of element information 12, output information 14 in which at least part of each piece of element information 12 is associated with one of the items of form information having one or more items.

The communication interface 114 connects the terminal device 100 to a communication network (not shown) via the communication processing circuit 115 and the antenna, and functions as a communication unit that exchanges information and data with other devices (not shown) connected to the communication network. The communication processing circuit 115 performs the communication processing needed to exchange information between the communication network and the terminal device 100 via the antenna, using a wideband or narrowband wireless communication scheme. The wideband wireless communication scheme is, for example, LTE, and the narrowband wireless communication scheme is, for example, IEEE 802.11 or Bluetooth (registered trademark). The communication processing circuit 115 may also perform processing for wired communication instead of, or in addition to, wireless communication.

The input interface 116 is connected by wired or wireless communication to input devices such as the mouse 117 and the hard keys 118, and functions as an input unit that accepts user operations and receives input of various information. Examples of the input interface 116 include a serial port, a parallel port, and USB. When the mouse 117 is connected by wireless communication (for example, Bluetooth (registered trademark)), a single component with a wireless communication function may serve as both the input interface 116 and the communication interface 114.

The mouse 117 includes a sensor that detects its own movement, a left button, a right button, and the like. The mouse 117 detects user operations that move the mouse pointer shown on the display. The mouse 117 also detects click operations that the user performs with the left and right buttons (not shown) on icons and the like shown on the display.

For example, using the mouse 117 and the functions provided by the OS, the user can select a piece of element information 12 shown on the display by operating (clicking) the left button. The user can then move the selected element information 12 across the display screen by moving the mouse 117 while holding down the left button, and can place it at the desired position by releasing the left button. This kind of mouse operation is generally called drag and drop. The mouse 117 accepts such user operations and outputs them to the processor 112 via the input interface 116. When the terminal device 100 is a notebook PC or the like, a touchpad or similar device may be used in place of the mouse 117.

The hard keys 118 include mechanical switches, accept the user's operations on the terminal device 100, and output them to the processor 112 via the input interface 116. The terminal device 100 and the hard keys 118 may be configured integrally or separately. When they are configured separately, the terminal device 100 is connected to the mouse 117 and the hard keys 118 by wireless or wired communication.

The microphone 119 picks up sound such as the utterances of one or more speakers in a meeting or the like, converts it into an analog audio signal, converts that signal into digital audio information, and outputs the audio information to the processor 112. However, the terminal device 100 need not obtain the audio information through the microphone 119. For example, when the terminal device 100 is used by a user taking part in an online meeting, it can process the audio information exchanged with the other terminal devices in that online meeting.

3. Information Used for Processing in Terminal Device 100
With reference to FIGS. 3A to 3C, the following describes the information that the terminal device 100 shown in FIG. 2 uses to process audio information. FIG. 3A shows the form information table. The terminal device 100 uses the form information table shown in FIG. 3A to generate the output information 14 in the output format desired by the user.

The form information table contains form identification information (form ID) A, user identification information (user ID) B, form information C, and number-of-speakers information D in association with one another. The user identification information B uniquely identifies the user who generates the meeting minutes with the terminal device 100. As described above, this user is the person who generates the minutes from the meeting's audio information, and may or may not be one of the speakers in that meeting.

The form information C is created by the user and indicates the form 10 in which the element information 12, the attribute information 16, and the speaker information 18 are output and displayed within the output information 14, as shown in FIG. 1. The number-of-speakers information D indicates the number of attendees of the meeting for which minutes are to be generated, that is, the number of people who could have made an utterance containing element information 12, and is set by the user before the meeting starts. The form identification information A uniquely identifies the associated user identification information B, form information C, and number-of-speakers information D.

FIG. 3B shows the audio information table. The terminal device 100 uses the audio information table to generate the output information 14 from audio information. The audio information table contains audio identification information (audio ID) E, user identification information B, audio information G, text information H, conversion information I, element information J (element information 12), and output information K (output information 14) in association with one another. The user identification information B is the same as the user identification information B in the form information table described with reference to FIG. 3A, so each row of the form information table can be associated with each row of the audio information table through the user identification information B.

The audio identification information E uniquely identifies the audio information G, text information H, conversion information I, element information J, and output information K associated with it. The audio information G is the audio information that is input from the microphone 119 or the like and processed by the terminal device 100. The text information H is generated by performing speech recognition on the associated audio information G. The conversion information I is generated by converting the associated text information H. The element information J is the one or more pieces of element information 12 generated by dividing the associated conversion information I. The output information K is the output information 14 generated from the corresponding element information J in accordance with the form information C that corresponds to the user identification information B shown in FIG. 3A.
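The two tables described above can be pictured as simple records joined on the user ID. The following is only an illustrative sketch: the class names, field names, field types, and the `forms_for` helper are assumptions made for this example, not structures given in the specification.

```python
from dataclasses import dataclass, field


@dataclass
class FormRecord:
    """One row of the form information table (FIG. 3A)."""
    form_id: str        # form identification information A
    user_id: str        # user identification information B
    form_info: dict     # form information C (internal structure assumed)
    speaker_count: int  # number-of-speakers information D


@dataclass
class AudioRecord:
    """One row of the audio information table (FIG. 3B)."""
    audio_id: str                                       # audio identification information E
    user_id: str                                        # user identification information B
    audio: bytes = b""                                  # audio information G
    text: str = ""                                      # text information H
    converted: list[str] = field(default_factory=list)  # conversion information I
    elements: list[str] = field(default_factory=list)   # element information J
    output: dict = field(default_factory=dict)          # output information K


def forms_for(audio: AudioRecord, forms: list[FormRecord]) -> list[FormRecord]:
    """Return the form rows that share the audio row's user ID (B)."""
    return [f for f in forms if f.user_id == audio.user_id]
```

Joining on the user ID mirrors the statement that rows of the two tables can be associated through the user identification information B; how a single form is chosen when a user has several is not covered by this sketch.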

FIG. 3C shows the element information table. As shown in FIG. 3C, the element information table contains element information J (element information 12), flag information M representing the attribute information 16, and speaker information N representing the speaker information 18 in association with one another. The element information J is the same as the element information J in the audio information table described with reference to FIG. 3B, so each row of the audio information table is associated with each row of the element information table through the element information J.

The flag information M indicates the attribute information 16 of each of the one or more pieces of element information J (element information 12) generated from the one or more pieces of conversion information contained in the text information H. As described with reference to FIG. 1, the flag information M indicates, for each piece of element information J (element information 12), one or more of the attributes "no attribute", "deadline", "To-Do", and "conclusion". The speaker information N indicates the meeting participant who uttered the conversion information containing the element information J, as exemplified by speakers A to C in FIG. 1.
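A row of the element information table could be modeled as below. This is a minimal sketch under assumptions: the `Flag` and `ElementRow` names are hypothetical, and the F1 to F4 codes follow the flag labels used in the FIG. 4D example.

```python
from dataclasses import dataclass
from enum import Enum


class Flag(Enum):
    """Attribute flags (flag information M)."""
    NO_ATTRIBUTE = "F1"
    DEADLINE = "F2"
    TODO = "F3"
    CONCLUSION = "F4"


@dataclass
class ElementRow:
    """One row of the element information table (FIG. 3C)."""
    element: str      # element information J
    flags: set[Flag]  # flag information M: one or more attributes
    speaker: str      # speaker information N

    def __post_init__(self) -> None:
        # The description requires at least one attribute per element.
        if not self.flags:
            raise ValueError("each element needs at least one attribute flag")
```

Using a set for the flags reflects that an element may carry one or more attributes, even though the worked example assigns exactly one flag per element.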

4. Information Obtained by Audio Information Processing in Terminal Device 100
With reference to FIGS. 4A to 4D, the following describes, using a concrete example, the information obtained when the terminal device 100 processes audio information. FIG. 4A illustrates the text information H contained in the audio information table shown in FIG. 3B.

The terminal device 100 performs speech recognition on the audio information G shown in FIG. 3B and, as shown in FIG. 4A, generates text information H reading "Um first Mr. A please create the seminar materials by month ○ day × Mr. B please check them by month △ day □ well then Mr. A and Mr. B will prepare the materials and with that we will finish for today". Because this text information H is produced by speech recognition alone, it contains only text and no delimiter information such as punctuation.

FIG. 4B illustrates the conversion information contained in the audio information table (FIG. 3B) and generated from the text information shown in FIG. 4A. The terminal device 100 divides text information containing multiple sentences into individual sentences and removes from the end of each sentence any portion that should not appear in the minutes, such as polite (honorific) endings. The terminal device 100 then adds delimiter information such as punctuation to the end of each sentence, producing the conversion information shown in FIG. 4B. The resulting conversion information reads, for example, "Um, first, Mr. A creates the seminar materials by month ○ day ×. Mr. B checks them by month △ day □. Well then, Mr. A and Mr. B prepare the materials, and with that, today's meeting ends."
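The conversion from text information H to conversion information I can be illustrated on the Japanese sample utterance of the original specification. The sketch below is a toy under stated assumptions: it recognizes only the two polite endings that occur in that sample, replaces them with the plain form, and uses the replaced ending as the sentence boundary. The regular expression and the function name are hypothetical, the specification does not say how sentence boundaries are actually detected, and the commas shown in FIG. 4B are not reproduced.

```python
import re

# Text information H from FIG. 4A (original Japanese, no punctuation).
SAMPLE_TEXT_H = (
    "え～まずＡさんが○月×日までにセミナー資料を生成してください"
    "Ｂさんは△月□日までにチェックをしてください"
    "それではＡさんとＢさんが資料を作るということで本日は終了します"
)

# Polite sentence endings found in the sample (assumed, not exhaustive).
POLITE_ENDINGS = re.compile(r"してください|します")


def to_conversion_info(text: str) -> list[str]:
    """Split text into sentences, drop polite endings, and add a full stop."""
    # Replace each polite ending with the plain form plus a sentence delimiter.
    marked = POLITE_ENDINGS.sub("する。", text)
    # Split on the delimiter and restore it at the end of each sentence.
    return [sentence + "。" for sentence in marked.split("。") if sentence]


for sentence in to_conversion_info(SAMPLE_TEXT_H):
    print(sentence)
```

A pattern-based split like this would misfire on polite forms that occur mid-sentence, which is one reason a production system would more plausibly rely on linguistic analysis.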

FIG. 4C illustrates the element information 12 (element information J) contained in the audio information table (FIG. 3B) and generated from the conversion information shown in FIG. 4B. The terminal device 100 generates the element information 12 shown in FIG. 4C by applying morphological analysis, functional-element analysis, feature-word analysis, and structuring to the conversion information divided as shown in FIG. 4B. As described above with reference to FIG. 1, each piece of element information 12 is a word or group of words with which at least one of the attributes "no attribute", "deadline", "To-Do", and "conclusion" is associated. The example in this description assumes that exactly one attribute flag is associated with each piece of element information 12.

FIG. 4D illustrates a UI image that associates a checkbox, the element information 12 shown in FIG. 4C, the attribute flags F1 to F4, and the speaker information 18 identifying speakers A to C. As shown in FIG. 4D, the UI image contains, from left to right, the checkbox, the element information 12, the flags F1 to F4 (attribute information 16), and the speaker information 18 (speakers A to C). The number of speakers in the speaker information 18 is not limited to the three speakers A to C and varies with the number of meeting attendees. For each piece of element information 12 shown in FIG. 4C, the terminal device 100 associates whichever of the flags F1 to F4 represents the attribute estimated to be most suitable, and shows the result on the display.

FIG. 4D illustrates the following associations. The terminal device 100 judges the element information 12 "Um" to be a word corresponding to an interjection or the like, and associates it with the flag F1 indicating no attribute and with the speaker information 18 indicating speaker B. It associates the element information 12 "first" with the flag F2 indicating the deadline attribute and with the speaker information 18 indicating speaker B. It associates the element information 12 "Mr. A, by month ○ day ×" with the flag F2 indicating the deadline attribute and with the speaker information 18 indicating speaker B. It associates the element information 12 "create the seminar materials" with the flag F3 indicating the To-Do attribute and with the speaker information 18 indicating speaker B.

FIG. 4D also illustrates the terminal device 100 associating the element information 12 "Mr. B, by month △ day □" with the flag F2 indicating the deadline attribute and with the speaker information 18 indicating speaker A, and associating the element information 12 "check" with the flag F3 indicating the To-Do attribute and with the speaker information 18 indicating speaker A.

FIG. 4D further illustrates the terminal device 100 judging the element information 12 "Well then" to be a word corresponding to an interjection or the like and associating it with the flag F1 indicating no attribute and with the speaker information 18 indicating speaker C. It associates the element information 12 "Mr. A and Mr. B prepare the materials" with the flag F4 indicating a conclusion and with the speaker information 18 indicating speaker C. It judges "and with that, today's meeting ends" to be a word or group of words corresponding to an interjection or the like, and associates it with the flag F1 indicating no attribute and with the speaker information 18 indicating speaker C.

As described above, the terminal device 100 displays, in the UI image shown in FIG. 4D, a checkbox, the element information 12, and the flag (attribute information 16) and speaker information 18 that it recommends associating with that element information 12. By checking the leftmost checkbox, the user can accept the recommended combination of element information 12, flag (attribute information 16), and speaker information 18. Alternatively, the user can edit the element information 12, the flag, and the speaker information 18 as needed by performing editing operations on the UI image shown in FIG. 4D with the mouse 117, the hard keys 118, and the like.
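The recommend-then-confirm interaction can be sketched as rows carrying an approval state. The rows below restate the associations illustrated in FIG. 4D with English renderings of the elements; the `UiRow` class and the `approve` and `edit` helpers are hypothetical names introduced only for this illustration.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class UiRow:
    element: str            # element information 12
    flag: str               # recommended flag F1 to F4 (attribute information 16)
    speaker: str            # recommended speaker (speaker information 18)
    approved: bool = False  # state of the leftmost checkbox


# Recommendations as illustrated in FIG. 4D.
ROWS = [
    UiRow("Um", "F1", "B"),
    UiRow("first", "F2", "B"),
    UiRow("Mr. A, by month ○ day ×", "F2", "B"),
    UiRow("create the seminar materials", "F3", "B"),
    UiRow("Mr. B, by month △ day □", "F2", "A"),
    UiRow("check", "F3", "A"),
    UiRow("Well then", "F1", "C"),
    UiRow("Mr. A and Mr. B prepare the materials", "F4", "C"),
    UiRow("and with that, today's meeting ends", "F1", "C"),
]


def approve(row: UiRow) -> UiRow:
    """Accept the recommended combination (the user checks the checkbox)."""
    return replace(row, approved=True)


def edit(row: UiRow, **changes: str) -> UiRow:
    """Return a copy with the element, flag, or speaker corrected by the user."""
    allowed = {"element", "flag", "speaker"}
    if not set(changes) <= allowed:
        raise ValueError(f"editable fields are {sorted(allowed)}")
    return replace(row, **changes)
```

Rows are immutable here so that each user action yields a new row, which keeps the original recommendation available for comparison; that design choice is an assumption of the sketch.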

In other words, by operating on the UI image, the user can correct any error in the element information 12. In the same way, the user can correct the flag (attribute information 16) associated with a piece of element information 12 if it is wrong, and can correct the speaker information 18 associated with a piece of element information 12 if it is wrong.

When one piece of element information 12 is split into two, the terminal device 100 splits the single row of FIG. 4D that contained it into two rows, each containing one of the resulting pieces of element information 12. The terminal device 100 then associates a flag and speaker information 18 with each of the two resulting pieces of element information 12, either in response to the user's input operations or automatically.

Conversely, two adjacent pieces of element information 12 in FIG. 4D may be merged into one. In that case, the terminal device 100 replaces the two rows of FIG. 4D that contained them with a single row containing the merged element information 12. The terminal device 100 then associates a flag and speaker information 18 with the merged element information 12, either in response to the user's input operations or automatically.
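The row split and row merge described above can be sketched as list operations. In this sketch the new rows simply inherit the flag and speaker of the row they came from (for a merge, of the first row); that default is an assumption, since the description only says the association is made by user input or automatically. `Row`, `split_row`, and `merge_rows` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class Row:
    element: str  # element information 12
    flag: str     # attribute flag
    speaker: str  # speaker information 18


def split_row(rows: list[Row], index: int, at: int) -> list[Row]:
    """Split rows[index] at character offset `at` into two adjacent rows."""
    row = rows[index]
    if not 0 < at < len(row.element):
        raise ValueError("split offset must fall inside the element text")
    left = Row(row.element[:at].strip(), row.flag, row.speaker)
    right = Row(row.element[at:].strip(), row.flag, row.speaker)
    return rows[:index] + [left, right] + rows[index + 1:]


def merge_rows(rows: list[Row], index: int, sep: str = " ") -> list[Row]:
    """Merge rows[index] and rows[index + 1] into one row."""
    if not 0 <= index < len(rows) - 1:
        raise IndexError("merge needs two adjacent rows")
    first, second = rows[index], rows[index + 1]
    merged = Row(first.element + sep + second.element, first.flag, first.speaker)
    return rows[:index] + [merged] + rows[index + 2:]
```

Both functions return a new list rather than mutating the input, so the caller can offer the user an undo; the user would then correct the inherited flag or speaker where needed.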

5. Processing of Terminal Device 100
The processing of the terminal device 100 is described below. First, with reference to FIGS. 5A and 5B, the process by which the user registers in the terminal device 100 the form information C indicating the output format of the output information 14 is described. FIG. 5A is a flowchart of the process by which the user registers in the terminal device 100 the form information C for the output information 14, which is part of the form information contained in the form information table (FIG. 3A).

In S100 shown in FIG. 5A, the processor 112 of the terminal device 100 (FIG. 2) accepts the user's input operation via the input interface 116 and determines whether the user identification information of that user has been received. If the user identification information has been received (Y), the processor 112 proceeds to S102; if not (N), it remains at S100.

In S102, the processor 112 of the terminal device 100 performs processing to authenticate the user who performed the input operation in S100.

In S104, the processor 112 determines whether the processing in S102 authenticated the user who performed the input operation in S100. If the user was authenticated (Y), the processor 112 proceeds to S106; if not (N), it ends the process.

In S106, the processor 112 accepts, via the input interface 116, either an input operation in which the user enters form information C or a selection operation in which the user selects one of a plurality of pieces of form information C, and thereby receives the entered or selected form information C. Note that the output format of the output information 14 may instead be received, via the communication interface 114, from another device connected to the communication network.

In S108, the processor 112 of the terminal device 100 uses the form information C of the output information 14 received in S106 to update and register the form information C included in the form information table shown in FIG. 3A. The processor 112 then stores the updated form information table in the memory 113. As shown in FIG. 5B, the form information C defines, among other things, the items of the attribute information 16 and the layout of the output information screen that is used when the output information 14 shown in FIG. 1 is displayed on the display via the output interface 111.

FIG. 5B is a diagram illustrating the form 10 indicated by the form information C of the output information 14. As shown in FIG. 5B, the form 10 of the output information 14 includes portions that display a conclusion item, a deadline item, and a To-Do item. According to the form 10 indicated by the form information C, the element information 12 associated with the flag F4, which indicates the conclusion attribute, is displayed in the upper half of the output information 14 screen. Likewise, the element information 12 associated with the flag F2, which indicates the deadline attribute, is displayed on the left side of the lower half.

Also according to this form 10, the right side of the lower half displays the element information 12 associated with the flag F3, which indicates the To-Do attribute, together with the speaker information 18 associated with that element information 12 (speakers A and B in FIG. 5B). As described above, the speaker indicated by the speaker information 18 associated with a piece of element information 12 is presumed to be the person who gave the instruction in the element information 12 carrying the To-Do attribute. If this presumption is wrong, the user can correct it using the UI image described above with reference to FIG. 4D.
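The layout that the form information C defines for the form 10 could be represented, for illustration, by a small data structure such as the one below. This is only a sketch under stated assumptions: the names `FormItem`, `FORM_10`, and `item_for_flag`, and the region labels, are hypothetical and do not appear in the specification. Only the flag-to-item mapping (F4 conclusion, F2 deadline, F3 To-Do) and the screen positions come from the description of FIG. 5B.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class FormItem:
    """One item (box) of the form 10 (hypothetical structure)."""
    name: str                   # item label shown on the screen
    flag: str                   # flag whose elements are shown in this item
    region: str                 # where the item is laid out on the screen
    show_speaker: bool = False  # whether speaker information 18 is shown


# Layout described for FIG. 5B: conclusion in the upper half,
# deadline at lower left, To-Do (with speakers) at lower right.
FORM_10: Tuple[FormItem, ...] = (
    FormItem("Conclusion", "F4", "upper half"),
    FormItem("Deadline", "F2", "lower half, left"),
    FormItem("To-Do", "F3", "lower half, right", show_speaker=True),
)


def item_for_flag(flag: str) -> Optional[FormItem]:
    """Return the form item that displays elements with the given flag."""
    for item in FORM_10:
        if item.flag == flag:
            return item
    return None  # e.g. F1 (no attribute) has no item in this form
```

Keeping the layout as data rather than code would make it straightforward to register a different form per user, as in S106 to S108.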

Next, with reference to FIGS. 6A to 6C and other figures, the processing by the terminal device 100 from the input of user identification information to the generation of the output information 14 is described. FIG. 6A is a flowchart showing the audio information processing performed by the terminal device 100. As shown in FIG. 6A, in S120, the processor 112 of the terminal device 100 (FIG. 2) accepts, via the input interface 116, user identification information entered by any of one or more users. The processor 112 updates the form information table and the audio information table (FIGS. 3A and 3B) with the accepted user identification information B, thereby registering the user identification information B, and stores the tables in the memory 113.

In S122, the processor 112 accepts audio information from the microphone 119. Alternatively, the processor 112 receives audio information, via the communication interface 114, from another device connected to the communication network. The processor 112 updates the audio information table with this audio information, thereby registering it, and stores the table in the memory 113.

In S124, the processor 112 reads the audio information stored in the memory 113, performs speech recognition processing on it, and generates the text information illustrated in FIG. 4A. The processor 112 updates the audio information table with the generated text information, thereby registering it, and stores the table in the memory 113.

In S126, the processor 112 reads the text information from the memory 113 and performs morphological analysis processing on it. Through this morphological analysis, the processor 112 divides the input text information into morphemes and into morpheme groups, each of which includes a plurality of morphemes.

A morpheme is a linguistics term for the smallest meaningful unit of expression: a group of phonemes obtained by dividing text in a given language down to the point where any further division would leave no meaning. In languages such as English, in which words are written separately, one word is, with some exceptions, roughly one morpheme. In other words, text information that includes a plurality of morphemes is entirely ordinary text information, such as text that a user produces with a word processor or text provided by a Web server.

The specific example here is a case in which the terminal device 100 generates minutes from Japanese text information. For example, if the text information includes the character string 「テキスト情報に」 ("to the text information"), the morphemes included in this text information are 「テキスト」 ("text"), 「情報」 ("information"), and the particle 「に」 ("to"). In this embodiment, a group of phonemes that includes a plurality of morphemes, such as 「テキスト情報」, which includes the morphemes 「テキスト」 and 「情報」, or 「情報に」, which includes the morphemes 「情報」 and 「に」, is referred to as a "morpheme group."
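The relation between morphemes and morpheme groups in this example can be illustrated with a short sketch. It assumes that the morphological analysis itself has already been done (the morpheme list is written by hand here) and that a morpheme group is any contiguous run of two or more morphemes; the function name `morpheme_groups` and the `max_len` parameter are hypothetical.

```python
from typing import List


def morpheme_groups(morphemes: List[str], max_len: int = 2) -> List[str]:
    """Enumerate contiguous runs of 2..max_len morphemes as morpheme groups."""
    groups = []
    for size in range(2, max_len + 1):
        for start in range(len(morphemes) - size + 1):
            # Japanese is written without spaces, so morphemes are joined directly.
            groups.append("".join(morphemes[start:start + size]))
    return groups


# Morphemes of the string 「テキスト情報に」 from the example above.
morphemes = ["テキスト", "情報", "に"]
print(morpheme_groups(morphemes))  # ['テキスト情報', '情報に']
```

With `max_len=3`, the whole string 「テキスト情報に」 would also be listed as a group.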

In S128, the processor 112 performs functional element analysis processing, which serves the relationship analysis processing, on each of the morphemes and morpheme groups obtained in S126, and identifies the function of each morpheme and morpheme group.

In S130, the processor 112 uses a feature word dictionary to perform feature word analysis processing, which also serves the relationship analysis processing, on each of the morphemes and morpheme groups obtained in S126.

In S132, the processor 112 performs relationship analysis processing based on the results of the functional element analysis in S128 and the feature word analysis in S130, and identifies the relationships between morphemes, between morphemes and morpheme groups, and between morpheme groups. Note that "between morphemes," "between a morpheme and a morpheme group," and "between morpheme groups" are collectively referred to as "between morphemes and morpheme groups."

In S134, the processor 112 generates the conversion information shown in FIG. 4B based on the result of the relationship analysis processing. Specifically, based on that result, the processor 112 generates the conversion information by deleting honorific expressions and the like from the end of the conversion information and by inserting delimiter information, such as punctuation mark information, between portions of the conversion information. The processor 112 updates the audio information table with the generated conversion information, thereby registering it, and stores the table in the memory 113.

In S136, the processor 112 processes the conversion information generated in S134 and, as shown in FIG. 4C, divides it into a plurality of pieces of element information 12, each having a specific meaning. The processor 112 updates the audio information table with the element information 12 generated by this division, thereby registering it, and stores the table in the memory 113.
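The division in S136 can be sketched as splitting the conversion information at the delimiter information inserted in S134. The following is a minimal sketch, assuming that the delimiters are punctuation marks and that every punctuation mark is an element boundary. The specification does not state the actual division rule, and `DELIMITERS` and `split_into_elements` are hypothetical names.

```python
import re
from typing import List

# Assumed delimiter set: Japanese and ASCII full stops and commas.
DELIMITERS = "。、.,"


def split_into_elements(conversion_info: str) -> List[str]:
    """Split conversion information into element strings at each delimiter."""
    pattern = "[" + re.escape(DELIMITERS) + "]"
    parts = re.split(pattern, conversion_info)
    # Drop empty pieces produced by trailing or consecutive delimiters.
    return [part.strip() for part in parts if part.strip()]


print(split_into_elements("資料を金曜までに提出。次回は来週月曜、担当は田中。"))
```

A production rule would need to be more careful, for example to avoid splitting decimal numbers such as "3.5" in text written with ASCII punctuation.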

In S138, the processor 112 performs speaker identification processing on the audio information corresponding to the conversion information that includes the element information 12 generated in S136. As a result, the processor 112 identifies the speaker who uttered the conversion information that includes the element information 12, and generates speaker information 18 indicating the identified speaker. The processor 112 then uses the generated speaker information 18 to update the speaker information N included in the element information table shown in FIG. 3C. Furthermore, the processor 112 stores the element information J and the flag information M12 corresponding to that speaker information N in the memory 113 in association with each other.

In S140, the processor 112 performs processing based on the result of the relationship analysis processing, through which it associates one or more of the flags F1 to F4 (attribute information 16) with each piece of element information 12. The processor 112 further associates a check box and the speaker information 18 generated in S138 with the element information 12 and its flag or flags. The processor 112 then displays the check box, the element information 12, the flags F1 to F4, and the speaker information 18 in the UI image shown in FIG. 4D.

By displaying this UI image, the processor 112 recommends to the user the attribute information 16 and the speaker information 18 to be associated with each piece of element information 12. As described above, the flag F1 indicates no attribute, the flag F2 indicates the deadline attribute, the flag F3 indicates the To-Do attribute, and the flag F4 indicates the conclusion attribute. As described above with reference to FIG. 4D, the user edits and corrects, as appropriate, the element information 12, the flags, and the speaker information 18 associated in S138. The processor 112 accepts the user's editing and correction operations and reflects them in the contents of the element information 12, the attribute information 16, and the speaker information 18, and in the associations among them. When the user operates the button labeled "Confirm" in the UI image, the editing and correction of these associations ends. The user may also operate the UI image shown in FIG. 4D to add further information needed for the minutes, such as the agenda of the meeting and its attendees. The processor 112 accepts such an operation via the input interface 116, adds the information to the output information 14, and displays it on the display according to the form 10 via the output interface 111.
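The recommendation of flags F1 to F4 can be illustrated with a deliberately simple sketch. The specification derives the flags from the relationship analysis, whose details are not given here, so the keyword lookup below is a stand-in technique and not the described method. The `FEATURE_WORDS` table, its English keywords, and the function `recommend_flags` are all hypothetical; only the meanings of the four flags come from the text.

```python
from enum import Enum
from typing import Dict, List, Set


class Flag(Enum):
    F1 = "no attribute"
    F2 = "deadline"
    F3 = "To-Do"
    F4 = "conclusion"


# Hypothetical feature words per attribute (illustrative only).
FEATURE_WORDS: Dict[Flag, List[str]] = {
    Flag.F2: ["by ", "until", "deadline"],
    Flag.F3: ["please", "prepare", "send"],
    Flag.F4: ["decided", "agreed", "conclusion"],
}


def recommend_flags(element: str) -> Set[Flag]:
    """Recommend one or more flags for an element; F1 if nothing matches."""
    lowered = element.lower()
    hits = {
        flag
        for flag, words in FEATURE_WORDS.items()
        if any(word in lowered for word in words)
    }
    return hits or {Flag.F1}


for text in ["Please send the report by Friday", "We agreed to adopt plan B"]:
    print(text, "->", sorted(flag.name for flag in recommend_flags(text)))
```

Because the result is only a recommendation, the user remains free to change any flag in the UI image before pressing "Confirm."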

FIG. 6B is a diagram illustrating a list of element information 12 associated with flags indicating attributes. In S142, the processor 112 associates the element information 12 edited and corrected in S140 with one or more pieces of attribute information 16 other than no attribute and with the speaker information 18. The processor 112 then displays the information associated in this way on the display, via the output interface 111, in the list format shown in FIG. 6B.

In S142, as shown in FIG. 6B, the processor 112 generates a list that contains, in association with one another, the element information 12, the attribute information 16, and the speaker information 18 edited and corrected in S140, and displays the generated list on the display via the output interface 111. The user can operate on the displayed list. Specifically, by checking a check box 20 with the mouse 117 or the like, the user approves the attribute information 16 that the processor 112 associated with the corresponding piece of element information 12. Alternatively, the user operates on the attribute information 16 with the mouse 117 or the like to change it to the desired attribute information 16, or operates on the speaker information 18 to change it to the desired speaker information 18.

These user operations in S142 are editing operations that change or correct the association among the element information 12, one or more pieces of attribute information 16 among the deadline, To-Do, and conclusion attributes, and the speaker information 18. The processor 112 accepts the user's editing operation via the input interface 116 and changes the association between the element information 12 and one or more of the deadline, To-Do, and conclusion attributes accordingly. However, because the element information 12 and the attribute information 16 and speaker information 18 associated with it have already been edited in S140, the editing operation in S142 may be omitted.

In S144, when the user checks the list shown in FIG. 6B and operates the button labeled "Confirm" included in the list, the processor 112 accepts this operation via the input interface 116. In response, the processor 112 finalizes the association of the element information 12 edited in S140 with its attributes and speakers.

FIG. 6C is a diagram illustrating a UI image used to move the element information 12 and related information into the boxes that are included in the form 10 of the output information 14 and associated with the attribute information 16. In S146, the processor 112 of the terminal device 100 displays the attribute information 16, the speaker information 18, and the element information 12, but not the check boxes 20, as shown in FIG. 6B. The processor 112 also reads the form information C corresponding to the user's user identification information B from the form information table shown in FIG. 3A, and displays the output information 14 according to the form 10 indicated by that form information C. The processor 112 combines the information shown in FIG. 6B other than the check boxes with the output information 14. As a result, as shown in FIG. 6C, the processor 112 displays a UI image that includes the attribute information 16 indicating attributes other than no attribute, the element information 12 associated with those attributes and its speaker information 18, and the output information 14.

In S148, the user refers to the boxes that are included in the output information 14 of the UI image shown in FIG. 6C and associated with the conclusion, deadline, or To-Do attribute information 16, and to the attribute information 16 associated with each piece of element information 12. The user then operates the UI image to move each piece of element information 12 into the box in the output information 14 that is associated with the same attribute information 16. This operation may be a drag-and-drop of the element information 12 and related information using the mouse 117.

If speaker information 18 is associated with a piece of element information 12, the user moves both the element information 12 and its associated speaker information 18 into the box in the output information 14. The processor 112 of the terminal device 100 accepts the user's operation via the input interface 116 and, in accordance with it, moves the element information 12, or the element information 12 together with the speaker information 18, into the corresponding box of the output information 14 and displays it there.

In S150, once all the element information 12 has been moved into one of the boxes included in the output information 14, the processor 112 generates the minutes of the meeting. The processor 112 then performs processing such as displaying the generated minutes on the display via the output interface 111 as the output information 14 shown in FIG. 1, or transmitting them to another device (not shown) via a communication network (not shown).

Although the case in which the terminal device 100 performs all of S120 to S150 shown in FIG. 6A has been described here, the terminal device 100 need not perform all of these steps. For example, the terminal device 100 may transmit the text information generated in S124 to a server device (not shown), and the server device may perform S126 to S132. In this case, before S134, the server device transmits the results of S126 to S132 to the terminal device 100, and the terminal device 100 receives these results and performs S134 to S150.

The above describes the case in which the user manually, using the mouse 117, moves the element information 12 into the boxes that are included in the form 10 of the output information 14 and labeled with the respective attribute information 16. Alternatively, by assigning the conclusion, deadline, and To-Do attribute information 16 to the respective boxes included in the output information 14, the processor 112 can perform this movement automatically.

That is, the processor 112 can perform this movement automatically by comparing the attribute assigned to each piece of element information 12 with the attribute information 16 assigned to the form 10 of the output information 14. Specifically, the processor 112 can automatically move element information 12 carrying the conclusion attribute information 16 into the box in the form 10 of the output information 14 that carries the conclusion attribute information 16. Likewise, it can automatically move element information 12 carrying the deadline attribute information 16 into the box that carries the deadline attribute information 16, and element information 12 carrying the To-Do attribute information 16 and speaker information 18 into the box that carries the To-Do attribute information 16.
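The automatic movement described above amounts to matching the attribute of each element against the attribute of each box. A minimal sketch, assuming one attribute per element and one box per attribute, might look like this. The names `Element`, `FORM_BOXES`, and `allocate`, and the "text (speaker)" display format for To-Do entries, are hypothetical.

```python
from typing import Dict, List, NamedTuple, Optional


class Element(NamedTuple):
    text: str                      # element information 12
    attribute: str                 # attribute information 16
    speaker: Optional[str] = None  # speaker information 18


# Attributes assigned to the boxes of the form 10.
FORM_BOXES = ("conclusion", "deadline", "To-Do")


def allocate(elements: List[Element]) -> Dict[str, List[str]]:
    """Place each element in the box whose attribute matches its own."""
    boxes: Dict[str, List[str]] = {name: [] for name in FORM_BOXES}
    for element in elements:
        if element.attribute not in boxes:
            continue  # e.g. "no attribute" elements are not placed in any box
        entry = element.text
        # To-Do entries carry their speaker information along with them.
        if element.attribute == "To-Do" and element.speaker:
            entry = f"{element.text} ({element.speaker})"
        boxes[element.attribute].append(entry)
    return boxes


minutes = allocate([
    Element("Adopt plan B", "conclusion"),
    Element("Friday", "deadline"),
    Element("Send the report", "To-Do", speaker="A"),
    Element("Good morning", "no attribute"),
])
print(minutes)
```

Elements with several attributes, which the flags F1 to F4 allow, would need a list of attributes per element; that extension is omitted here for brevity.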

With the terminal device 100 described above, element information 12 can be generated from audio information of a meeting or the like, and each generated piece of element information 12 can be appropriately associated with its attribute. Appropriate minutes and various other records and materials can therefore be generated from audio information. In addition, because the attributes to be associated with the element information 12 are selected automatically and recommended to the user, the effort the user needs to generate minutes and the like from audio information is greatly reduced. The terminal device 100 is also useful for automatically generating minutes and the like from the audio of offline and online meetings attended by many people.

6. Modifications, etc.
The case in which attributes such as "deadline" are associated with the element information 12 via flags and edited has been described above with reference to FIG. 6A, but attributes need not be associated with the element information 12 via flags. The case in which the element information 12 is distributed, in accordance with the user's operations, to the items included in the form 10 of the output information 14 has also been described with reference to FIG. 6A. However, S144 and S146 shown in FIG. 6A are not essential. That is, once the user has finalized the association between the element information 12 and the attributes, the processor 112 may automatically distribute each piece of element information 12 to the corresponding item included in the form 10 of the output information 14.

The audio information processing method described above can be applied not only to generating meeting minutes but also to generating various other records and materials. Depending on the use of the method, the attribute information 16 associated with each piece of element information 12 may include other attribute information 16, such as "subject," in addition to "no attribute," "deadline," "To-Do," and "conclusion." Conversely, the attribute information 16 need not include all of "no attribute," "deadline," "To-Do," and "conclusion." The display modes of the various kinds of information shown in FIG. 1 and elsewhere are examples and may be changed as appropriate according to the user's preferences, the use of the terminal device 100, and so on.

Although the above illustrates the case in which four types of attribute information 16 (no attribute, conclusion, deadline, and To-Do) are associated with the element information 12, the types of attribute information 16 are not limited to these four and may be increased or decreased as appropriate. The above also illustrates the case in which attribute information 16 is associated with every piece of element information 12. Alternatively, for example, only two types of attribute information 16, no attribute and other than no attribute, may be associated with the element information 12, and the user may then associate any type of attribute information 16 with the element information 12 that is associated with attribute information 16 other than no attribute. Or the user may freely associate one of the conclusion, deadline, and To-Do types of attribute information 16 with such element information 12.

The processes and procedures described in the embodiment can be realized not only by the devices explicitly described in the embodiment but also by software, hardware, or a combination thereof. Specifically, they can be realized by implementing logic corresponding to the processing in a medium such as an integrated circuit, volatile memory, nonvolatile memory, magnetic disk, or optical storage. The processes and procedures described in the embodiment can also be implemented as computer programs and executed by various computers, including terminal devices and server devices.

Processes and procedures described in the embodiment as being performed by a single device, piece of software, component, and/or module may be performed by multiple devices, pieces of software, components, and/or modules. Likewise, the various kinds of information described in the embodiment as being stored in a single memory or storage device may be stored in a distributed manner across multiple memories included in a single device or across multiple memories distributed among multiple devices. Furthermore, the software and hardware elements described in the embodiment may be realized by integrating them into fewer components or by dividing them into more components.

Although one embodiment has been described above, it is presented as an example and is not intended to limit the scope of the invention. This novel embodiment can be implemented in various other forms, and various omissions, substitutions, and changes can be made without departing from the gist of the invention. The embodiment and its modifications are included in the scope and gist of the invention, and in the invention described in the claims and its equivalents.

10 form, 12 element information, 14 output information, 16 attribute information, 18 speaker information, 20 check box, 100 terminal device, 111 output interface, 112 processor, 114 communication interface, 116 input interface, 117 mouse, 118 hard key, 119 microphone

Claims

1. A processing device comprising at least one processor,
wherein the at least one processor is configured to perform processing for:
receiving input of voice information including the content of utterances of one or more speakers, based on an input operation by a user;
generating text information indicating the content of the utterances of the one or more speakers, based on the received voice information;
dividing the generated text information into one or more pieces of element information;
associating at least one piece of attribute information with each piece of divided element information; and
generating, based on the attribute information associated with each piece of element information, output information in which at least a part of each piece of element information is associated with any of one or more items in form information having the one or more items.
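For illustration only, the attribute association and output generation steps recited above can be sketched as follows. This is a minimal sketch under stated assumptions, not the implementation disclosed in this publication: the names `Element`, `KEYWORD_RULES`, `associate_attributes`, and `generate_output` are hypothetical, the keyword matching is a stand-in for whatever technique actually assigns attributes, and speech recognition and division are assumed to have already produced the element texts.

```python
from dataclasses import dataclass, field


@dataclass
class Element:
    """One piece of element information with its associated attributes."""
    text: str
    attributes: set = field(default_factory=set)


# Hypothetical keyword rules standing in for the attribute association step.
KEYWORD_RULES = {
    "conclusion": ("decided", "agreed"),
    "task": ("will", "please"),
    "deadline": ("by friday", "by monday", "tomorrow"),
}


def associate_attributes(element):
    """Attach every attribute whose keywords appear in the element text."""
    lowered = element.text.lower()
    for attribute, keywords in KEYWORD_RULES.items():
        if any(keyword in lowered for keyword in keywords):
            element.attributes.add(attribute)
    return element


def generate_output(elements, form_items):
    """Place each element under the form items matching its attributes."""
    output = {item: [] for item in form_items}
    for element in elements:
        for attribute in element.attributes:
            if attribute in output:
                output[attribute].append(element.text)
    return output


# Assumed result of speech recognition and division (not performed here).
elements = [
    associate_attributes(Element(text))
    for text in (
        "We decided to adopt plan B",
        "Sato will draft the report by Friday",
        "The weather was nice",
    )
]
form = generate_output(elements, ["conclusion", "task", "deadline"])
```

In this sketch an element with no matching attribute is simply left out of the output, which corresponds to associating only "at least a part" of the element information with the form items.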

2. The processing device according to claim 1, wherein the at least one processor is configured to further associate, with the element information, speaker information indicating which of the one or more speakers made the utterance corresponding to the text information.

3. The processing device according to claim 2, wherein the at least one processor is configured to generate the speaker information based on the voice information and a preset number of speakers.

4. The processing device according to any one of claims 1 to 3, wherein the user is included in the one or more speakers.

5. The processing device according to any one of claims 1 to 4, wherein the at least one processor is configured to perform processing for generating, based on the text information, conversion information in which delimiter information indicating a semantic delimiter of the text information is inserted.

6. The processing device according to claim 5, wherein the at least one processor is configured to perform processing for dividing the text information, based on the generated conversion information, into the element information, each piece of which is a semantically coherent unit consisting of a word or a word group including a plurality of words.

7. The processing device according to claim 5 or 6, wherein the delimiter information is punctuation mark information indicating a punctuation mark.
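As a rough illustration of dividing text by inserted punctuation, the following sketch first produces conversion information by inserting periods into unpunctuated text and then splits on those marks. It is a sketch under stated assumptions only: `insert_delimiters` and `divide_into_elements` are hypothetical names, and the fixed set of sentence-final words is a toy substitute for whatever technique actually determines semantic boundaries.

```python
import re


def insert_delimiters(text, sentence_final_words):
    """Build conversion information by appending a period to boundary words.

    The boundary word set is a hypothetical stand-in for a real
    punctuation-restoration step.
    """
    words = []
    for word in text.split():
        if word.lower() in sentence_final_words:
            words.append(word + ".")
        else:
            words.append(word)
    return " ".join(words)


def divide_into_elements(conversion_info):
    """Split the conversion information at punctuation marks."""
    parts = re.split(r"[.!?。]", conversion_info)
    return [part.strip() for part in parts if part.strip()]


raw_text = "we adopt plan b today sato drafts the report tomorrow"
conversion_info = insert_delimiters(raw_text, {"today", "tomorrow"})
elements = divide_into_elements(conversion_info)
```

Splitting on every period is deliberately naive; text containing decimals or abbreviations would need a more careful delimiter rule.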

8. The processing device according to any one of claims 1 to 7, wherein the element information is editable based on an operation by the user.

9. The processing device according to any one of claims 1 to 8, wherein the attribute information indicates at least one of: a conclusion indicated by the text information; a task, indicated by the text information, for the speaker or for another speaker different from the speaker; and a deadline for the task.

10. The processing device according to any one of claims 1 to 9, wherein the association of the attribute information is determined based on an operation by the user on the attribute information associated with the element information.

11. A processing method performed in a computer including at least one processor, by the at least one processor executing predetermined instructions, the method comprising:
receiving input of voice information including the content of utterances of one or more speakers, based on an input operation by a user;
generating text information indicating the content of the utterances of the one or more speakers, based on the received voice information;
dividing the generated text information into one or more pieces of element information;
associating at least one piece of attribute information with each piece of divided element information; and
generating, based on the attribute information associated with each piece of element information, output information in which at least a part of each piece of element information is associated with any of one or more items in form information having the one or more items.

12. A processing program that causes a computer including at least one processor to function as a processor configured to perform processing for:
receiving input of voice information including the content of utterances of one or more speakers, based on an input operation by a user;
generating text information indicating the content of the utterances of the one or more speakers, based on the received voice information;
dividing the generated text information into one or more pieces of element information;
associating at least one piece of attribute information with each piece of divided element information; and
generating, based on the attribute information associated with each piece of element information, output information in which at least a part of each piece of element information is associated with any of one or more items in form information having the one or more items.