JP2019138988A

JP2019138988A - Information processing system, method for processing information, and program

Info

Publication number: JP2019138988A
Application number: JP2018020599A
Authority: JP
Inventors: 啓水奥間; Hiromi Okuma
Original assignee: Canon Inc
Current assignee: Canon Inc
Priority date: 2018-02-08
Filing date: 2018-02-08
Publication date: 2019-08-22

Abstract

To solve the problem that, if all text corresponding to a voice command in utterance text obtained by speech recognition is deleted as unnecessary, content of the command that is important to the meeting is deleted as well. SOLUTION: When utterances in a meeting are speech-recognized to prepare minutes, the range to be deleted from a spoken voice command, chosen from among its header part, command part, and data part, is determined according to the command part, so that commands that should be recorded in the minutes are retained. SELECTED DRAWING: Figure 11

Description

The present invention relates to an information processing system that converts human speech into text by speech recognition, and in particular to a minutes generation system that generates minutes from what users say in a conference.

Dictation techniques that convert human speech into text by speech recognition are known. Patent Document 1 discloses a system that generates minutes by speech-recognizing human utterances, converting them into text, and summarizing the text. With the technique of Patent Document 1, minutes can be created by summarizing text obtained from human utterances (hereinafter referred to as “utterance text”).

There are also voice operation techniques in which a person speaks, in command form, the processing that the system should perform, and the system executes the processing corresponding to the command. One example is a voice operation that changes a system setting (such as the output volume). However, when a command is spoken for voice operation, speech recognition causes text corresponding to the command to be included in the utterance text. Patent Document 2 therefore discloses, for a voice-input word processor system, a technique that deletes the text corresponding to a command from the speech recognition result as unnecessary text. This keeps commands unrelated to the document from remaining in the document.

Patent Document 1: Japanese Patent No. 5104762
Patent Document 2: JP 2000-76241 A

However, a command may itself contain text that should not be deleted. For example, suppose that a system such as that of Patent Document 1 provides a function for registering, by voice command, a request for work made to a participant during the conference (hereinafter referred to as an “action item”) or a matter decided in the conference (hereinafter referred to as a “decision item”). If commands contained in the utterance text are deleted as unnecessary text, as in the prior art, the action item or decision item text contained in the command is deleted from the utterance text as well. As a result, the key content of the conference, such as action items and decision items registered by voice command, does not remain in the minutes generated from the utterance text.

The present invention is an information processing apparatus comprising: speech recognition means for converting spoken voice data into text by speech recognition; first detection means for detecting, in the text, a header part, which is the part indicating the start of a command to the information processing apparatus; second detection means for detecting, in the text, a command part, which is the part indicating the type of the command to the information processing apparatus; third detection means for detecting, in the text, a data part, which is the part indicating the content of the command to the information processing apparatus; and determination means for determining, according to the command part, a deletion range, within the voice data, to be deleted from the text.

When minutes are created by speech recognition of what is said in a conference, spoken voice commands that need to remain in the minutes can be kept rather than deleted.

A diagram showing a configuration example of the conference system.
A block diagram showing configuration examples of the conference apparatus and the conference server.
An example of screens displayed on the display device.
A data structure example of the conference information.
A flowchart showing the procedure for recording conference information.
A data structure example of the minutes source information.
A data structure example of the minutes source information.
A flowchart showing the procedure for generating minutes.
An overview of the generated minutes.
A flowchart showing the procedure for summarizing the conference text.
A data structure example for managing the commands of voice commands.
A flowchart showing the procedure of voice command processing.
A flowchart showing the procedure of voice command deletion processing.
An operation example when deleting the header part of a voice command.
An operation example when deleting the header part and command part of a voice command.
An operation example when deleting the header part, command part, and data part of a voice command.

Embodiments of the present invention are described below with reference to the drawings.
FIG. 1 is a diagram showing the configuration of a conference system as an information processing system.
The conference system 100 includes a conference apparatus 101 and a conference server 102, which are connected via a network 103.

The conference apparatus 101 is an example of an information processing apparatus serving as a speech recognition apparatus and is, for example, a general PC (Personal Computer).
The conference apparatus 101 is placed where the conference is held, such as in a conference room, for example on the conference table. The conference apparatus 101 records multiple types of data generated during the conference, such as audio and images (hereinafter referred to as “conference information”), and transmits the conference information to the conference server 102.
In FIG. 1, the conference system 100 has one conference apparatus 101 and one conference server 102, but it may be configured with a plurality of each.
Also, although FIG. 1 shows the conference apparatus 101 and the conference server 102 as separate apparatuses, they may be configured as a single apparatus having the functions of both.

The conference apparatus 101 is assumed to record a conference held in, for example, an office or a given venue. However, the conferences to which the present invention applies are not limited to conferences in this narrow sense; any gathering that involves viewing and speaking by multiple people qualifies. For example, interviews and interrogations are also conferences to which the present invention applies.

The conference server 102 is a general PC or a cloud server. The conference server 102 receives the conference information from the conference apparatus 101 and generates text by speech recognition of the voice data included in the conference information. It also generates text by character recognition of the image data included in the conference information. It then analyzes and processes the information including these texts to generate minutes, and distributes the generated minutes.

FIG. 2 is a block diagram showing hardware configuration examples of the conference apparatus 101 and the conference server 102 of FIG. 1.
In FIG. 2A, the conference apparatus 101 includes a CPU 201, a ROM 202, a RAM 204, a storage 205, an input device 206, a display device 207, an external interface 208, a camera device 209, a microphone device 210, and a speaker device 211. These devices can exchange data with one another via a data bus 203. CPU stands for Central Processing Unit, RAM for Random Access Memory, and ROM for Read Only Memory.

The CPU 201 is a controller that controls the entire conference apparatus. The CPU 201 starts an OS (Operating System) using a boot program stored in the ROM 202, which is a nonvolatile memory. On the OS, the CPU 201 executes a controller program recorded in the storage 205. The controller program is a program that controls the entire conference apparatus. The CPU 201 controls each device via a bus such as the data bus 203.
The RAM 204 operates as a temporary storage area, such as the main memory and work area of the CPU 201. The storage 205 is a readable and writable nonvolatile memory serving as recording means, and stores the controller program described above. The conference apparatus 101 also records the conference information in the storage 205 until it transmits the conference information to the conference server 102.

The input device 206 is an input device made up of a touch panel, hard keys, a mouse, and the like. The display device 207 is a display device such as an LCD. When the input device 206 receives an operation instruction from the user, it passes the instruction to the CPU 201.
The display device 207 displays on its screen the display image data generated by the CPU 201. The CPU 201 determines the operation based on the instruction information received from the input device 206 and the display image data being displayed on the display device 207. According to the result, the CPU 201 controls the conference apparatus 101, generates new display image data corresponding to the operation, and causes the display device 207 to display it.

The external interface 208 transmits and receives various data to and from separate external devices via a network such as a LAN, a telephone line, or short-range wireless communication such as infrared.
The camera device 209 is means capable of capturing moving images and still images. A so-called digital camera is one example.
The microphone device 210 is means for converting input sound into a digital signal. For example, it acquires the voice uttered by the user as voice data in a format such as WAVE.
The speaker device 211 is a device that can output sound externally.

In FIG. 2B, the conference server 102 includes a CPU 251, a ROM 252, a RAM 254, a storage 255, an input device 256, a display device 257, and an external interface 258. These devices can exchange data with one another via a data bus 253.

The CPU 251 is a controller that controls the entire conference server. The CPU 251 starts the OS using a boot program stored in the ROM 252, which is a nonvolatile memory. On the OS, the CPU 251 implements each process of the conference server 102 by executing a conference server program stored in the storage 255. The CPU 251 controls each unit via a bus such as the data bus 253.
The RAM 254 operates as a temporary storage area, such as the main memory and work area of the CPU 251. The storage 255 is a readable and writable nonvolatile memory, and stores the conference server program described above.

The input device 256 and the display device 257 are the same as the input device 206 and the display device 207 described with reference to FIG. 2A.
The external interface 258 is the same as the external interface 208 described with reference to FIG. 2A.

Next, the user interface that the conference system 100 presents and that the user views and operates is described.
FIG. 3 shows display examples of the display device 207 of the conference apparatus 101.

A screen 300 shown in FIG. 3A is displayed before the conference starts.
The “start” button 301 is for the user to instruct the conference apparatus 101 to start the conference. When the CPU 201 receives an instruction on the “start” button 301 via the input device 206, it causes the display device 207 to display a screen 310 shown in FIG. 3B. The CPU 201 then starts recording conference information.

The screen 310 shown in FIG. 3B is displayed during the conference.
The “shoot” button 311 is for the user to request shooting from the conference apparatus 101. When the CPU 201 receives an instruction on the “shoot” button 311 via the input device 206, it causes the display device 207 to display a screen 320 shown in FIG. 3C.

The “agenda” button 312 is for the user to instruct the conference apparatus 101 to change (start or end) the agenda. When the CPU 201 receives an instruction on the “agenda” button 312 via the input device 206, it causes the display device 207 to display a screen 330 shown in FIG. 3D.

The “volume” button 313 is for the user to request a volume change from the conference apparatus 101. When the CPU 201 receives an instruction on the “volume” button 313 via the input device 206, it causes the display device 207 to display a screen 340 shown in FIG. 3E.

The “end” button 314 is for the user to instruct the conference apparatus 101 to end the conference. When the CPU 201 receives an instruction on the “end” button 314 via the input device 206, it causes the display device 207 to display a screen 350 shown in FIG. 3F.

The screen 320 shown in FIG. 3C is displayed when shooting. As illustrated, it displays the image of the subject obtained by the camera device 209. While viewing the image, the user can adjust the position so that a whiteboard or sheet of paper with writing on it fits within the shooting angle of view of the conference apparatus 101.

When the CPU 201 receives an instruction on any part of the screen 320 via the input device 206, it shoots the subject with the camera device 209 and acquires image data.
The “OK” button 321 is for the user to instruct the conference apparatus 101 to finish shooting. When the CPU 201 receives an instruction on the “OK” button 321 via the input device 206, it causes the display device 207 to display the screen 310 shown in FIG. 3B.

The screen 330 shown in FIG. 3D is displayed when instructing an agenda change.
The text field 331 is for the user to register an agenda name in the conference apparatus 101. The user can enter a desired agenda name in the text field 331 via the input device 206.
The “start” button 332 is for the user to instruct the conference apparatus 101 to start a new agenda.
The “end” button 333 is for the user to instruct the conference apparatus 101 to end the current agenda. The names of ended agendas are listed in the text area 334.
The “OK” button 335 is for the user to instruct the conference apparatus 101 to finish changing the agenda. When the CPU 201 receives an instruction on the “OK” button 335 via the input device 206, it causes the display device 207 to display the screen 310 shown in FIG. 3B.

The screen 340 shown in FIG. 3E is displayed when instructing a volume change.
As illustrated, the user can adjust the output volume of the conference apparatus 101 by operating the slide bar 341. The “OK” button 342 is for the user to instruct the conference apparatus 101 to finish changing the volume. When the CPU 201 receives an instruction on the “OK” button 342 via the input device 206, it causes the display device 207 to display the screen 310 shown in FIG. 3B.

The screen 350 shown in FIG. 3F is displayed when ending the conference.
The text field 351 is for the user to tell the conference apparatus 101 where to send the minutes created by the conference system 100. The user can enter a desired destination in the text field 351 via the input device 206. For example, an e-mail address can be entered as the destination.

The “OK” button 352 is for the user to confirm the end of the conference to the conference apparatus 101. When the CPU 201 receives an instruction on the “OK” button 352 via the input device 206, it causes the display device 207 to display the screen 300 shown in FIG. 3A.
The conference apparatus 101 then ends recording of the conference information and transmits the conference information to the conference server 102. After that, the conference server 102 analyzes and processes the received conference information to generate minutes, and sends the minutes to the destination.

Next, the conference information recorded by the conference apparatus 101 is described.
FIG. 4 shows a configuration example of the conference information that the conference apparatus 101 records in the storage 205.

The audio information table 400 shown in FIG. 4A is a data table that records information about the audio that the conference apparatus 101 records and acquires (hereinafter referred to as “audio information”). The conference apparatus 101 records the conversation during the conference and stores it as voice data.
The recording start time column 401 holds the recording start time (conference start time). The recording end time column 402 holds the recording end time (conference end time). The audio data column 403 holds the file name of the voice data recorded in the storage 205.

The image information table 410 shown in FIG. 4B is a data table that records information about the images that the conference apparatus 101 shoots and acquires (hereinafter referred to as “image information”). The conference apparatus 101 shoots a subject to acquire image data and records it as a file in the storage 205.
The shooting time column 411 holds the shooting time. The image data column 412 holds the file name of the image data recorded in the storage 205.

The agenda information table 420 shown in FIG. 4C is a data table that records information about the agendas recorded by the conference apparatus 101 (hereinafter referred to as “agenda information”).
The agenda start time column 421 holds the start time of the agenda. The agenda end time column 422 holds the end time of the agenda. The agenda name column 423 holds the agenda name.
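The three tables of FIG. 4 can be pictured as simple record types. The following is a minimal sketch only: the class and field names are hypothetical, chosen to mirror the column descriptions above, and the publication does not specify any concrete storage format.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import List, Optional


@dataclass
class AudioInfo:
    """One row of the audio information table 400 (names are illustrative)."""
    recording_start: datetime          # recording start time column 401
    recording_end: Optional[datetime]  # recording end time column 402 (unset while recording)
    audio_file: str                    # audio data column 403 (file name in storage)


@dataclass
class ImageInfo:
    """One row of the image information table 410."""
    shot_at: datetime                  # shooting time column 411
    image_file: str                    # image data column 412 (file name in storage)


@dataclass
class AgendaInfo:
    """One row of the agenda information table 420."""
    start: datetime                    # agenda start time column 421
    end: Optional[datetime]            # agenda end time column 422 (unset while open)
    name: str                          # agenda name column 423


@dataclass
class ConferenceInfo:
    """Container for everything sent to the conference server (assumed shape)."""
    audio: List[AudioInfo] = field(default_factory=list)
    images: List[ImageInfo] = field(default_factory=list)
    agendas: List[AgendaInfo] = field(default_factory=list)
```

Leaving the end times optional reflects that the start time is recorded when recording or an agenda begins, while the end time can only be filled in later.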

Next, the processing by which the conference apparatus 101 records conference information is described.
FIG. 5 is a flowchart showing the processing for recording conference information.
When the power key (not shown) of the conference apparatus 101 is operated to turn the power on, the CPU 201 reads the controller program recorded in the storage 205, loads it into the RAM 204, and executes it. This enables the conference apparatus 101 to execute the conference information recording process. The CPU 201 also generates display image data for the screen 300 and causes the display device 207 to display it.

First, in S501, the CPU 201 determines whether an instruction to start the conference has been given.
If an instruction has been given on the “start” button 301, the determination is YES and the process proceeds to S502. At this time, the CPU 201 generates display image data for the screen 310 and causes the display device 207 to display it. If no instruction has been given on the “start” button 301, the determination is NO and the process proceeds to S522.

In S502, the CPU 201 starts recording the conversation with the microphone device 210 and thereby acquires voice data. The CPU 201 then records the current time as the recording start time in the recording start time column 401 of the audio information table 400.
The CPU 201 also starts recording the voice data as a file in the storage 205, and records the file name of the voice data in the audio data column 403 of the audio information table 400. The recording start time corresponds to the conference start time.

In S503, the CPU 201 executes voice command processing in order to detect voice commands.
A voice command in this embodiment consists of three parts: a header part, a command part, and a data part.
The first, the header part, indicates to the conference apparatus 101 that the utterance of a voice command is beginning.
Specifically, the header part is an utterance such as “Hey” that signals the start of a voice command. The wording used as the header part is registered in advance in the storage 205 of the conference apparatus 101.

The second, the command part, indicates the type of command that the conference apparatus 101 is to execute.
Utterances such as those shown in the command-part string column 1101 of FIG. 11 are used as the command part.
Specific command types include “AI” for registering an action item (AI) and “decision item” for registering a matter decided in the conference. There are also “delete decision item” for deleting a decision item, “start agenda” for starting an agenda, “shoot” for instructing shooting, and “volume” for changing the volume.
The wording used as the command part is registered in advance in a command part data table 1100 such as that of FIG. 11. The command part data table 1100 is held in the storage 205 of the conference apparatus 101 and in the storage 255 of the conference server 102.
The command-part string column 1101 holds the command-part string that the user speaks to the conference apparatus 101. The command content column 1102 holds the content of the instruction that the conference apparatus 101 executes. The data-part presence column 1103 indicates whether the command has a data part: “yes” if it has one and “no” if it does not. The deletion range column 1104 indicates which of the parts making up the voice command are to be deleted from the utterance text.
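To make the role of the four columns concrete, here is a minimal sketch of a command part data table and of how its deletion range could be applied to a recognized command. The per-command deletion ranges below are assumptions for illustration only, since this section does not reproduce the values of FIG. 11; the data-part flag for “AI” is likewise assumed. All identifiers are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, Optional

HEADER, COMMAND, DATA = "header", "command", "data"


@dataclass(frozen=True)
class CommandEntry:
    content: str                   # command content column 1102
    has_data: bool                 # data-part presence column 1103
    delete_range: FrozenSet[str]   # deletion range column 1104


# Keys play the role of the command-part string column 1101.
# NOTE: every delete_range value here is an ASSUMED example, not taken from FIG. 11.
COMMAND_TABLE: Dict[str, CommandEntry] = {
    "AI": CommandEntry("register action item", True, frozenset({HEADER})),
    "decision item": CommandEntry("register decision item", True, frozenset({HEADER})),
    "delete decision item": CommandEntry("delete decision item", False,
                                         frozenset({HEADER, COMMAND})),
    "start agenda": CommandEntry("start agenda", True, frozenset({HEADER, COMMAND})),
    "shoot": CommandEntry("shoot", False, frozenset({HEADER, COMMAND})),
    "volume": CommandEntry("change volume", True, frozenset({HEADER, COMMAND, DATA})),
}


def text_to_keep(header: str, command: str, data: Optional[str]) -> str:
    """Return the part of a voice command that stays in the utterance text."""
    entry = COMMAND_TABLE[command]
    parts = ((HEADER, header), (COMMAND, command), (DATA, data))
    # Keep only the non-empty parts that the table does not mark for deletion.
    return " ".join(text for name, text in parts
                    if text and name not in entry.delete_range)
```

Under these assumed ranges, a decision item command keeps its command and data parts in the text, while a volume command is removed entirely, which matches the stated aim of keeping only content that matters to the minutes.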

The third, the data part, indicates the content of the command that the conference apparatus 101 is to execute.
Examples of the data part are utterances such as “proceed with plan 2” when the user registers a decision item, and “raise by 10” when changing the volume.
Some voice commands have no data part, for example when the command part is “delete decision item” for deleting a decision item.
Because of the nature of its content, the data part is not registered in advance.

For example, to register a decision item, the user speaks the header part (Hey), the command part (decision item), and the data part (proceed with plan 2) in that order, as in “Hey, decision item, proceed with plan 2.”
To instruct the conference apparatus 101 to start an agenda, the user speaks the header part (Hey), the command part (start agenda), and the data part (about next year's budget) in that order, as in “Hey, start agenda, about next year's budget.”
Similarly, to instruct shooting, the user says “Hey (header part), shoot (command part).”, and to instruct a volume change, the user says “Hey (header part), volume (command part), raise by 10 (data part).”
The wording and structure of the spoken voice commands shown here are only examples; any wording or structure may be used as long as the processing described in this embodiment can be executed.
The specific flow of the voice command processing is shown in FIG. 12.

図１２は、会議装置１０１における音声コマンドの処理を示すフローチャートである。
まず、Ｓ１２０１において、ＣＰＵ２０１は、音声データを取得して音声認識を実行し、認識結果テキストに追記する。
ここで、音声データは、ユーザの発話区間単位で取得する。発話区間とは、ある無音区間と次の無音区間の間の区間である。無音区間の検出は、例えば、音声データの音圧が閾値以下の状態が一定時間継続されたことに基づいて検出する。Ｓ１２０１で音声認識した結果のテキストは、順次認識結果テキストに追記する。 FIG. 12 is a flowchart showing voice command processing in the conference apparatus 101.
First, in step S1201, the CPU 201 acquires voice data, executes voice recognition, and appends to the recognition result text.
Here, the voice data is acquired in units of user utterance sections. The utterance section is a section between a certain silent section and the next silent section. The silent section is detected based on, for example, that a state in which the sound pressure of the audio data is equal to or lower than a threshold value is continued for a certain period of time. The text resulting from the speech recognition in S1201 is sequentially added to the recognition result text.
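The silence-based segmentation described for S1201 can be sketched as follows. This is a minimal illustration, not the apparatus's actual implementation: the function name `split_utterances`, the per-frame level list, and the parameters `threshold` and `min_silence` are all assumptions introduced here. A run of at least `min_silence` consecutive frames at or below `threshold` is treated as a silent section, and whatever lies between two silent sections is an utterance section.

```python
from typing import List, Tuple


def split_utterances(levels: List[float], threshold: float,
                     min_silence: int) -> List[Tuple[int, int]]:
    """Return (start, end) frame indices of utterance sections, end exclusive.

    All names and units here are hypothetical; `levels` stands in for the
    sound pressure of successive audio frames.
    """
    sections = []
    start = None     # first frame of the utterance being tracked, if any
    quiet_run = 0    # number of consecutive frames at or below the threshold
    for i, level in enumerate(levels):
        if level <= threshold:
            quiet_run += 1
            if start is not None and quiet_run >= min_silence:
                # The quiet run is long enough to count as a silent section;
                # the utterance ended where the quiet run began.
                sections.append((start, i - quiet_run + 1))
                start = None
        else:
            if start is None:
                start = i
            quiet_run = 0
    if start is not None:
        # Audio ended before a full silent section; close the last utterance.
        sections.append((start, len(levels) - quiet_run))
    return sections
```

Short pauses below `min_silence` frames stay inside the same utterance section, which mirrors the requirement that the quiet state must continue for a certain period before it counts as silence.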

Ｓ１２０２において、ＣＰＵ２０１は、認識結果テキストに新たに追記されたテキストの内、音声コマンドのヘッダ部に相当するテキストが含まれているか否かを判定する。
具体的には、会議装置１０１におけるストレージ２０５に予め保持されたヘッダ部の文言と一致するテキストが含まれているか否かに基づいて判定する。
含まれている場合、ＹＥＳと判定し、Ｓ１２０３に遷移する。含まれていない場合、ＮＯと判定し、処理を終了する。 In step S1202, the CPU 201 determines whether the text newly appended to the recognition result text includes text corresponding to the header part of a voice command.
Specifically, the determination is based on whether the appended text includes text that matches the header wording held in advance in the storage 205 of the conference apparatus 101.
If it does, the determination is YES and the process proceeds to S1203. If it does not, the determination is NO and the process ends.

Ｓ１２０３において、ＣＰＵ２０１は、ヘッダ部に続くテキストに音声コマンドの命令部に相当するテキストが含まれているか否かを判定する。
具体的には、会議装置１０１のストレージ２０５に保持された命令部データテーブル１１００の命令部の文字列１１０１と一致するテキストが、ヘッダ部に続くテキストに含まれているか否かに基づいて判定する。
含まれている場合、ＹＥＳと判定し、Ｓ１２０４に遷移する。含まれていない場合、ＮＯと判定し、処理を終了する。 In step S1203, the CPU 201 determines whether or not the text following the header portion includes text corresponding to the command portion of the voice command.
Specifically, the determination is based on whether the text following the header part includes text that matches a command-part character string 1101 in the command part data table 1100 held in the storage 205 of the conference apparatus 101.
If it does, the determination is YES and the process proceeds to S1204. If it does not, the determination is NO and the process ends.

Ｓ１２０４において、ＣＰＵ２０１は、Ｓ１２０３で検出した音声コマンドの命令がデータ部を有するか否かを判定する。判定では、Ｓ１２０３で検出したテキストと命令部の文字列１１０１とが一致するレコードを特定する。そして、特定されたレコードのデータ部有無列１１０３を参照し「有」の場合、ＹＥＳと判定し、Ｓ１２０５に遷移する。「無」の場合は、ＮＯと判定し、Ｓ１２０６に遷移する。 In step S1204, the CPU 201 determines whether the command of the voice command detected in S1203 takes a data part. To do so, it identifies the record whose command-part character string 1101 matches the text detected in S1203 and refers to the data part presence/absence column 1103 of that record. If the value is “Yes”, the determination is YES and the process proceeds to S1205. If the value is “No”, the determination is NO and the process proceeds to S1206.

Ｓ１２０５において、ＣＰＵ２０１は、命令部に続くテキストに音声コマンドのデータ部に相当するテキストが含まれているか否かを判定する。具体的には、命令部に相当するテキストから発話区間の終了（句読点）までに、テキストが含まれているか否かに基づいて判定する。
含まれている場合、ＹＥＳと判定し、Ｓ１２０６に遷移する。含まれていない場合、ＮＯと判定し、処理を終了する。 In step S 1205, the CPU 201 determines whether the text following the command part includes text corresponding to the data part of the voice command. Specifically, the determination is made based on whether or not the text is included between the text corresponding to the command section and the end of the utterance section (punctuation).
If such text is present, the determination is YES and the process proceeds to S1206. If it is not, the determination is NO and the process ends.

Ｓ１２０６において、ＣＰＵ２０１は、命令部データテーブル１１００を参照し、検出したコマンドの命令内容を特定する。命令部データテーブル１１００の命令部の文字列１１０１のうち、Ｓ１２０３で判定した際の、ヘッダ部に続くテキストと一致するレコードを特定し、特定されたレコードの命令内容列１１０１を参照して命令内容を特定する。 In step S1206, the CPU 201 refers to the command part data table 1100 and identifies the command content of the detected command. It identifies the record whose command-part character string 1101 matches the text following the header part as determined in S1203, and reads the command content from the command content column 1102 of that record.
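The decision sequence of S1202 to S1206 can be sketched in Python as below. This is only an illustrative sketch under stated assumptions: the English phrases in `COMMAND_TABLE`, the command content identifiers, and the names `CommandRecord`, `ParsedCommand`, and `parse_voice_command` are hypothetical stand-ins for the command part data table 1100, and the sketch requires the command part to come immediately after the header, which is stricter than the "included in the following text" test in the description.

```python
from dataclasses import dataclass
from typing import Optional

HEADER = "Hey"  # assumed header wording held in storage


@dataclass(frozen=True)
class CommandRecord:
    phrase: str     # command-part character string (column 1101)
    content: str    # command content (column 1102), hypothetical identifiers
    has_data: bool  # data part presence/absence (column 1103)


# Hypothetical contents standing in for the command part data table 1100.
COMMAND_TABLE = [
    CommandRecord("decision", "register_decision", True),
    CommandRecord("agenda start", "start_agenda", True),
    CommandRecord("shoot", "capture_request", False),
    CommandRecord("volume", "change_volume", True),
]


@dataclass(frozen=True)
class ParsedCommand:
    content: str
    data: Optional[str]


def parse_voice_command(utterance: str) -> Optional[ParsedCommand]:
    text = utterance.strip().rstrip(".")
    # S1202: does the utterance contain the header wording?
    pos = text.find(HEADER)
    if pos < 0:
        return None
    rest = text[pos + len(HEADER):].lstrip(" ,")
    # S1203: does a registered command part follow the header?
    record = next((r for r in COMMAND_TABLE if rest.startswith(r.phrase)), None)
    if record is None:
        return None
    # S1204: commands without a data part go straight to S1206.
    if not record.has_data:
        return ParsedCommand(record.content, None)
    # S1205: anything between the command part and the end of the utterance
    # is treated as the data part; an empty data part ends the process.
    data = rest[len(record.phrase):].strip(" ,")
    if not data:
        return None
    # S1206: report the command content from the matched record.
    return ParsedCommand(record.content, data)
```

For example, `parse_voice_command("Hey, decision, proceed with plan 2.")` yields the content `register_decision` with the data part `proceed with plan 2`, while an utterance without the header returns `None` and is left as ordinary speech.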

図１２のフローチャートに示す音声コマンド処理が終了すると、図５の会議情報の記録処理に戻る。
そして、Ｓ５０４において、ＣＰＵ２０１は、撮影を要求する指示がなされたか否かを判定する。
画面３１０で「撮影」ボタン３１１への指示がなされていた場合、もしくは、Ｓ５０３において特定した音声コマンドの命令内容が撮影要求指示であった場合、ＹＥＳと判定し、Ｓ５０５に遷移する。このとき、ＣＰＵ２０１は、画面３２０の表示画像データを生成して、表示デバイス２０７に表示させる。
「撮影」ボタン３１１への指示がなされていなければ、ＮＯと判定し、Ｓ５０８に遷移する。 When the voice command process shown in the flowchart of FIG. 12 ends, the process returns to the meeting information recording process of FIG.
In step S504, the CPU 201 determines whether an instruction for requesting shooting has been issued.
If the “shoot” button 311 is instructed on the screen 310, or if the instruction content of the voice command specified in S503 is a shoot request instruction, the determination is YES, and the process proceeds to S505. At this time, the CPU 201 generates display image data of the screen 320 and displays it on the display device 207.
If no instruction is given to the “shoot” button 311, the determination is NO, and the flow proceeds to S 508.

Ｓ５０５において、ＣＰＵ２０１は、撮影の指示がなされたか否かを判定する。
画面３２０で任意の箇所への指示がなされていた場合、もしくは、Ｓ５０３において特定した音声コマンドの命令内容が撮影要求指示であった場合、ＹＥＳと判定し、Ｓ５０６に遷移する。
画面３２０で任意の箇所への指示がなされていなければ、ＮＯと判定し、Ｓ５０７に遷移する。 In step S 505, the CPU 201 determines whether a shooting instruction has been issued.
If an instruction is given to an arbitrary location on the screen 320, or if the instruction content of the voice command specified in S503 is a shooting request instruction, the determination is YES, and the process proceeds to S506.
If no instruction is given to an arbitrary location on the screen 320, it is determined as NO and the process proceeds to S507.

Ｓ５０６において、ＣＰＵ２０１は、カメラデバイス２０９により被写体を撮影して画像データを取得する。また、ＣＰＵ２０１は、画像情報テーブル４１０にレコードを追加して、撮影時刻列４１１に現在の時刻を記録する。また、ＣＰＵ２０１は、画像データをファイルとしてストレージ２０５に記録する。ＣＰＵ２０１は、当該画像データのファイル名を、画像情報テーブル４１０の画像データ列４１２に記録する。 In step S 506, the CPU 201 captures a subject with the camera device 209 and acquires image data. Further, the CPU 201 adds a record to the image information table 410 and records the current time in the shooting time column 411. Further, the CPU 201 records the image data as a file in the storage 205. The CPU 201 records the file name of the image data in the image data string 412 of the image information table 410.

Ｓ５０７において、ＣＰＵ２０１は、撮影を終了する指示がなされたか否かを判定する。
画面３２０で「ＯＫ」ボタン３２１への指示がなされていたら、ＹＥＳと判定し、Ｓ５０４に遷移する。このとき、ＣＰＵ２０１は、画面３１０の表示画像データを生成して、表示デバイス２０７に表示させる。
「ＯＫ」ボタン３２１への指示がなされていなければ、ＮＯと判定し、Ｓ５０５に遷移する。 In step S 507, the CPU 201 determines whether an instruction to end shooting has been issued.
If an instruction to the “OK” button 321 is given on the screen 320, it is determined YES and the process proceeds to S504. At this time, the CPU 201 generates display image data of the screen 310 and causes the display device 207 to display the display image data.
If no instruction is given to the “OK” button 321, the determination is NO, and the flow proceeds to S 505.

Ｓ５０８において、ＣＰＵ２０１は、アジェンダを変更する指示がなされたか否かを判定する。
画面３１０で「アジェンダ」ボタン３１２への指示がなされていた場合、もしくは、Ｓ５０３において特定した音声コマンドの命令内容がアジェンダの開始指示又は終了指示であった場合、ＹＥＳと判定し、Ｓ５０９に遷移する。このとき、ＣＰＵ２０１は、画面３３０の表示画像データを生成して、表示デバイス２０７に表示させる。
「アジェンダ」ボタン３１２への指示がなされていなければ、ＮＯと判定し、Ｓ５１４に遷移する。 In step S508, the CPU 201 determines whether an instruction to change the agenda has been issued.
If an instruction is given to the “Agenda” button 312 on the screen 310, or if the instruction content of the voice command specified in S503 is an agenda start instruction or end instruction, the determination is YES, and the process proceeds to S509. At this time, the CPU 201 generates display image data of the screen 330 and causes the display device 207 to display it.
If no instruction is given to the “Agenda” button 312, the determination is NO and the process proceeds to S 514.

Ｓ５０９において、ＣＰＵ２０１は、アジェンダを開始する指示がなされたか否かを判定する。
画面３３０で「開始」ボタン３３２への指示がなされていた場合、もしくは、Ｓ５０３において特定した音声コマンドの命令内容がアジェンダ開始指示であった場合、ＹＥＳと判定し、Ｓ５１０に遷移する。
「開始」ボタン３３２への指示がなされていなければ、ＮＯと判定し、Ｓ５１１に遷移する。 In step S509, the CPU 201 determines whether an instruction to start an agenda has been issued.
If an instruction is given to the “start” button 332 on the screen 330, or if the instruction content of the voice command specified in S503 is an agenda start instruction, the determination is YES, and the process proceeds to S510.
If no instruction is given to the “start” button 332, the determination is NO and the process proceeds to S 511.

Ｓ５１０において、ＣＰＵ２０１は、新しいアジェンダを開始する。ＣＰＵ２０１は、アジェンダ情報テーブル４２０にレコードを追加して、アジェンダ開始時刻列４２１に現在の時刻を記録する。また、Ｓ５０３において特定した音声コマンドの命令内容がアジェンダ開始指示であった場合、音声コマンドの認識結果として受信したアジェンダ名をアジェンダ名列４２３に記録する。 In S510, the CPU 201 starts a new agenda. The CPU 201 adds a record to the agenda information table 420 and records the current time in the agenda start time column 421. If the command content of the voice command identified in S503 is an agenda start instruction, the received agenda name is recorded in the agenda name column 423 as a voice command recognition result.

Ｓ５１１において、ＣＰＵ２０１は、アジェンダを終了する指示がなされたか否かを判定する。
画面３３０で「終了」ボタン３３３への指示がなされていた場合、もしくは、Ｓ５０３において特定した音声コマンドの命令内容がアジェンダ終了指示であった場合、ＹＥＳと判定し、Ｓ５１２に遷移する。
「終了」ボタン３３３への指示がなされていなければ、ＮＯと判定し、Ｓ５１３に遷移する。 In step S511, the CPU 201 determines whether an instruction to end the agenda has been issued.
If an instruction to the “end” button 333 is given on the screen 330, or if the instruction content of the voice command specified in S503 is an agenda end instruction, the determination is YES, and the process proceeds to S512.
If no instruction is given to the “end” button 333, it is determined as NO and the process proceeds to S513.

Ｓ５１２において、ＣＰＵ２０１は、現在のアジェンダを終了する。ＣＰＵ２０１は、アジェンダ情報テーブル４２０のアジェンダ終了時刻列４２２に現在の時刻を記録する。また、テキストフィールド３３１に入力されたアジェンダ名をアジェンダ名列４２３に記録する。 In S512, the CPU 201 ends the current agenda. The CPU 201 records the current time in the agenda end time column 422 of the agenda information table 420. Further, the agenda name input in the text field 331 is recorded in the agenda name column 423.

Ｓ５１３において、ＣＰＵ２０１は、アジェンダ変更を終了する指示がなされたか否を判定する。
画面３３０で「ＯＫ」ボタン３３５への指示がなされていれば、ＹＥＳと判定し、Ｓ５０４に遷移する。このとき、ＣＰＵ２０１は、画面３１０の表示画像データを生成して、表示デバイス２０７に表示させる。
「ＯＫ」ボタン３３５への指示がなされていなければ、ＮＯと判定し、Ｓ５０９に遷移する。 In step S513, the CPU 201 determines whether an instruction to end the agenda change has been issued.
If an instruction to the “OK” button 335 is given on the screen 330, it is determined as YES and the process proceeds to S504. At this time, the CPU 201 generates display image data of the screen 310 and causes the display device 207 to display the display image data.
If no instruction is given to the “OK” button 335, it is determined as NO, and the process proceeds to S509.

Ｓ５１４において、ＣＰＵ２０１は、音量を変更する指示がなされたか否かを判定する。
画面３１０で「音量」ボタン３１１への指示がなされていた場合、もしくは、Ｓ５０３において特定した音声コマンドの命令内容が音量変更指示であった場合、ＹＥＳと判定し、Ｓ５１５に遷移する。
指示がなされていなければＮＯと判定し、Ｓ５１８に遷移する。 In step S514, the CPU 201 determines whether an instruction to change the volume has been issued.
If an instruction is given to the “volume” button 311 on the screen 310, or if the instruction content of the voice command specified in S503 is a volume change instruction, the determination is YES, and the process proceeds to S515.
If no instruction is given, NO is determined, and the flow proceeds to S518.

Ｓ５１５において、ＣＰＵ２０１は、現在の音量を表示する。具体的には、会議装置１０１のストレージ２０５に保持された現在の音量を画面３３０でスライドバー３４１として表示する。 In step S515, the CPU 201 displays the current volume. Specifically, the current volume stored in the storage 205 of the conference apparatus 101 is displayed as a slide bar 341 on the screen 330.

Ｓ５１６において、ＣＰＵ２０１は、音量変更を終了する指示がなされたか否を判定する。
画面３４０で「ＯＫ」ボタン３４２への指示、もしくはＳ５０３において受信した音声コマンドの命令内容が音量変更指示であった場合、ＹＥＳと判定し、Ｓ５１７に遷移する。
指示がなされていなければ、ＮＯと判定し、再度Ｓ５１６の処理を行う。 In step S516, the CPU 201 determines whether an instruction to end the volume change has been given.
If the instruction to the “OK” button 342 on the screen 340 or the instruction content of the voice command received in S503 is a volume change instruction, the determination is YES, and the process proceeds to S517.
If no instruction is given, NO is determined and the process of S516 is performed again.

Ｓ５１７において、ＣＰＵ２０１は、設定された音量を保存する。具体的には、スライドバー３４１で設定された音量、もしくは音声コマンドで指示された音量をストレージ２０５に保存する。 In step S517, the CPU 201 stores the set volume. Specifically, the volume set by the slide bar 341 or the volume specified by the voice command is stored in the storage 205.

Ｓ５１８において、ＣＰＵ２０１は、会議を終了する指示がなされたか否かを判定する。
画面３１０で「終了」ボタン３１３への指示がなされていれば、ＹＥＳと判定し、Ｓ５１９に遷移する。このとき、ＣＰＵ２０１は、画面３４０の表示画像データを生成して、表示デバイス２０７に表示させる。
「終了」ボタン３１３への指示がなされていなければ、ＮＯと判定し、Ｓ５０４に遷移する。 In step S518, the CPU 201 determines whether an instruction to end the conference has been given.
If an instruction to the “Finish” button 313 is given on the screen 310, it is determined YES and the process proceeds to S519. At this time, the CPU 201 generates display image data of the screen 340 and displays it on the display device 207.
If no instruction is given to the “end” button 313, it is determined as NO, and the process proceeds to S504.

Ｓ５１９において、ＣＰＵ２０１は、マイクデバイス２１０による会議の録音を終了する。ＣＰＵ２０１は、音声情報テーブル４００の録音終了時刻列４０２に現在の時刻を記録する。なお、このとき、アジェンダ情報テーブル４２０に、アジェンダ終了時刻列４２２に終了時刻が記録されていないレコードがあれば、アジェンダ終了時刻として現在の時刻をアジェンダ終了時刻列４２２に記録する。 In step S519, the CPU 201 ends the recording of the conference by the microphone device 210. The CPU 201 records the current time in the recording end time column 402 of the audio information table 400. At this time, if there is a record whose end time is not recorded in the agenda end time column 422 in the agenda information table 420, the current time is recorded in the agenda end time column 422 as the agenda end time.

Ｓ５２０において、ＣＰＵ２０１は、会議の終了を確定する指示がなされたか否かを判定する。
画面３４０でテキストフィールド３４１に送信先が入力され、かつ「ＯＫ」ボタン３４２への指示がなされていれば、ＹＥＳと判定し、Ｓ５２１に遷移する。
テキストフィールド３４１に送信先が入力されていない、あるいは、「ＯＫ」ボタン３４２への指示がなされていなければ、ＮＯと判定し、再度Ｓ５２０の処理を行う。なお、テキストフィールド３４１に入力された送信先は、会議情報の一部として記録する。 In step S520, the CPU 201 determines whether an instruction to finalize the conference has been issued.
If the transmission destination is input to the text field 341 on the screen 340 and the “OK” button 342 is instructed, the determination is YES, and the process proceeds to S521.
If the transmission destination is not input in the text field 341 or if the instruction to the “OK” button 342 is not made, it is determined as NO, and the process of S520 is performed again. The transmission destination input in the text field 341 is recorded as part of the conference information.

Ｓ５２１において、ＣＰＵ２０１は、以上の処理によりストレージ２０５に記録した会議情報を、外部インターフェース２０８を介して、会議サーバ１０２に送信する。なお、送信後は、会議情報をストレージ２０５から削除してもよい。また、ＣＰＵ２０１は、画面３００の表示画像データを生成して、表示デバイス２０７に表示させる。 In step S 521, the CPU 201 transmits the conference information recorded in the storage 205 by the above processing to the conference server 102 via the external interface 208. Note that the conference information may be deleted from the storage 205 after the transmission. Further, the CPU 201 generates display image data of the screen 300 and causes the display device 207 to display the display image data.

Ｓ５２２において、ＣＰＵ２０１は、電源をオフする指示がなされたか否かを判定する。
会議装置１０１の電源キー（不図示）への指示がなされていれば、ＹＥＳと判定し、処理を終了する。会議装置１０１の電源キーへの指示がなされていなければ、ＮＯと判定し、Ｓ５０１に遷移する。 In step S522, the CPU 201 determines whether an instruction to turn off the power has been given.
If an instruction is given to the power key (not shown) of the conference apparatus 101, the determination is YES, and the process ends. If no instruction is given to the power key of the conference apparatus 101, the determination is NO, and the process proceeds to S501.

次に、図６と図７を用いて、会議サーバ１０２が会議装置１０１から受信した会議情報を解析・加工して生成する議事録元情報について説明する。図６と図７は、会議サーバ１０２がストレージ２５５に記録する議事録元情報の構成例を示すものである。 Next, the minutes source information that the conference server 102 generates by analyzing and processing the conference information received from the conference apparatus 101 will be described with reference to FIGS. 6 and 7. FIGS. 6 and 7 show configuration examples of the minutes source information that the conference server 102 records in the storage 255.

図６（ａ）に示す発話情報テーブル６００は、会議情報に含まれる音声データを音声認識した結果に関する情報（以下、「発話情報」と言う）を記録するデータテーブルである。発話情報テーブル６００は、音声データが解析されてユーザの発話が特定されると、発話毎に生成される。
発話時刻列６０１は、発話が発生した時刻（以下、「発話時刻」と言う）を記録するものである。発話テキスト列６０２は、発話を音声認識して取得した発話テキストを記録するものである。発話や発話時刻の特定については後述する。 The utterance information table 600 shown in FIG. 6A is a data table that records information (hereinafter referred to as “utterance information”) related to the result of voice recognition of voice data included in the conference information. The utterance information table 600 is generated for each utterance when voice data is analyzed and a user's utterance is specified.
The utterance time column 601 records the time when the utterance occurred (hereinafter referred to as “utterance time”). The utterance text string 602 records the utterance text acquired by voice recognition of the utterance. The specification of the utterance and the utterance time will be described later.

図６（ｂ）に示す記入情報テーブル６１０は、会議情報に含まれる画像データを文字認識した結果に関する情報（以下、「記入情報」と言う）を記録するデータテーブルである。記入情報テーブル６１０は、画像データが解析されてユーザによる記入が特定されると、記入毎に生成される。
記入時刻列６１１は、記入が発生した時刻（以下、「記入時刻」と言う）を記録するものである。記入テキスト列６１２は、画像データを文字認識して取得した記入テキストを記録するものである。記入や記入時刻の特定については後述する。 The entry information table 610 shown in FIG. 6B is a data table for recording information (hereinafter referred to as “entry information”) related to the result of character recognition of the image data included in the conference information. The entry information table 610 is generated for each entry when the image data is analyzed and the entry by the user is specified.
The entry time column 611 records the time at which entry occurred (hereinafter referred to as “entry time”). The entry text column 612 records entry text obtained by character recognition of image data. The specification of entry and entry time will be described later.

図６（ｃ）に示す会議テキスト情報テーブル６２０は、会議において発生したテキスト（以下、「会議テキスト」と言う）に関する情報（以下、「会議テキスト情報」と言う）を記録するデータテーブルである。
会議テキスト情報は、図６（ａ）に示す発話情報と図６（ｂ）に示す記入情報を統合して生成するものである。
発生時刻列６２１は、会議テキスト情報が発生した時刻を記録するものであり、発話時刻６０１または記入時刻６１１の時刻を記録する。
会議テキスト列６２２は、会議テキストを記録するものであり、発話テキスト列６０２または記入テキスト列６１２のテキストを記録する。
区分列６２３は、そのレコードが、統合前に発話情報であったのか記入情報であったのかを記録するものである。発話情報であった場合には「０」を記録し、記入情報であった場合には「１」を記録する。
要点列６２４は、そのレコードの会議テキスト列６２２の会議テキストが要点であるか否かを記録するものである。ここで「要点」とは、アクションアイテムや決定事項など、その会議の主要な内容を示すものである。要点である場合には「１」を記録し、そうでない場合には「０」を記録する。
なお、会議テキスト情報テーブル６２０のレコードは、発生時刻列６２１の値で昇順に（発生した順に）ソートする。 The conference text information table 620 shown in FIG. 6C is a data table that records information (hereinafter referred to as “conference text information”) about the text generated in the conference (hereinafter referred to as “conference text”).
The conference text information is generated by integrating the utterance information shown in FIG. 6A and the entry information shown in FIG. 6B.
The occurrence time column 621 records the time at which the conference text information occurred, taken from the utterance time column 601 or the entry time column 611.
The conference text column 622 records the conference text, taken from the utterance text column 602 or the entry text column 612.
The classification column 623 records whether the record was utterance information or entry information before integration: “0” is recorded for utterance information and “1” for entry information.
The main point column 624 records whether the conference text in the conference text column 622 of the record is a main point. Here, a “main point” is a principal item of the meeting, such as an action item or a decision. “1” is recorded if the text is a main point and “0” otherwise.
The records in the conference text information table 620 are sorted in ascending order of the value in the occurrence time column 621 (that is, in order of occurrence).

図７に示す要約情報テーブル７００は、図６（ｃ）に示す会議テキストを要約した情報（以下、「要約情報」と言う）を記録したデータテーブルである。要約情報は、会議テキスト情報テーブル６２０の会議テキスト列６２２の会議テキストから、アジェンダ毎に生成されて、要約情報テーブル７００に記録される。
アジェンダ名列７０１は、要約情報のアジェンダ名を記録するものである。要約テキスト列７０２は、生成した要約テキストを記録するものである。 The summary information table 700 illustrated in FIG. 7 is a data table in which information (hereinafter referred to as “summary information”) that summarizes the conference text illustrated in FIG. 6C is recorded. The summary information is generated for each agenda from the conference text in the conference text column 622 of the conference text information table 620 and recorded in the summary information table 700.
The agenda name column 701 records the agenda name of the summary information. The summary text column 702 records the generated summary text.

次に、会議サーバ１０２が議事録を生成する処理について説明する。
図８は、議事録を生成する処理を示すフローチャートである。会議サーバ１０２が起動すると、ＣＰＵ２５１は、ストレージ２５５に記録されている会議サーバプログラムを読み込む。そして、会議サーバプログラムをＲＡＭ２５４に展開して実行する。これにより、会議サーバ１０２は議事録生成処理を実行することが可能となる。 Next, a process in which the conference server 102 generates the minutes will be described.
FIG. 8 is a flowchart showing a process for generating the minutes. When the conference server 102 is activated, the CPU 251 reads the conference server program recorded in the storage 255. Then, the conference server program is expanded in the RAM 254 and executed. Thereby, the conference server 102 can execute the minutes generation process.

まず、Ｓ８０１において、ＣＰＵ２５１は、会議情報を受信したか否かを判定する。
ＣＰＵ２５１が、外部インターフェース２５８を介して、会議装置１０１から会議情報を受信しているならば、ＹＥＳと判定し、Ｓ８０２に遷移する。外部インターフェース２５８を介して、会議装置１０１から会議情報を受信していなければ、ＮＯと判定し、Ｓ８１０に遷移する。 First, in S801, the CPU 251 determines whether meeting information has been received.
If the CPU 251 receives conference information from the conference apparatus 101 via the external interface 258, the CPU 251 determines YES and the process proceeds to S802. If the conference information is not received from the conference apparatus 101 via the external interface 258, it is determined as NO and the process proceeds to S810.

Ｓ８０２において、ＣＰＵ２５１は、受信した会議情報に含まれる音声データに対して音声認識を行い、発話テキストを取得する。ここで、音声認識を行うため、ＣＰＵ２５１は、音声データを先頭から走査して、次の処理を行う。
まず、ＣＰＵ２５１は、音声データ中の無音区間を検出する。無音区間の検出は、例えば、音声データの音圧が閾値以下の状態が一定時間継続されたことに基づいて検出する。ある無音区間と次の無音区間の間の区間を発話区間とする。ＣＰＵ２５１は、個々の発話区間について、音声認識を行って発話テキストを取得する。
次に、ＣＰＵ２５１は、会議情報の音声情報テーブル４００の録音開始時刻列４０１の録音開始時刻と、各発話区間の音声データの先頭からの経過位置とから、各発話区間の発話時刻を計算する。
このようにして取得した発話区間毎に、発話情報テーブル６００のレコードを生成する。そして、該当する発話時刻と発話テキストを、それぞれ、発話時刻列６０１と発話テキスト列６０２に記録する。 In step S 802, the CPU 251 performs voice recognition on the voice data included in the received conference information, and acquires the utterance text. Here, in order to perform voice recognition, the CPU 251 scans voice data from the head and performs the following processing.
First, the CPU 251 detects a silent section in audio data. The silent section is detected based on, for example, that a state in which the sound pressure of the audio data is equal to or lower than a threshold value is continued for a certain period of time. A section between a certain silent section and the next silent section is defined as a speech section. The CPU 251 performs speech recognition for each utterance section and acquires the utterance text.
Next, the CPU 251 calculates the utterance time of each utterance section from the recording start time in the recording start time column 401 of the conference information voice information table 400 and the elapsed position from the beginning of the voice data of each utterance section.
A record of the utterance information table 600 is generated for each utterance section acquired in this way. Then, the corresponding utterance time and utterance text are recorded in the utterance time column 601 and the utterance text column 602, respectively.
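The utterance time calculation in S802 amounts to adding the elapsed position of the utterance section to the recording start time. A minimal sketch follows, assuming the elapsed position is available as a sample offset and that the sample rate is known; the function name and both parameters are assumptions made for illustration, since the description does not state the unit of the elapsed position.

```python
from datetime import datetime, timedelta


def utterance_time(recording_start: datetime, offset_samples: int,
                   sample_rate: int) -> datetime:
    """Recording start time (column 401) plus the elapsed position of the
    utterance section, converted from samples to seconds."""
    return recording_start + timedelta(seconds=offset_samples / sample_rate)
```

With a recording that starts at 10:00:00 and a 16 kHz sample rate, an utterance section beginning 1,440,000 samples into the audio is stamped 10:01:30.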

Ｓ８０３において、ＣＰＵ２５１は、取得した発話テキストから音声コマンドを削除する処理を行う。削除処理のフローの詳細な説明は図１３で後述する。 In step S803, the CPU 251 performs processing to delete voice commands from the acquired utterance text. The flow of the deletion processing is described in detail later with reference to FIG. 13.

Ｓ８０４において、ＣＰＵ２５１は、受信した会議情報に含まれる画像データに対して文字認識を行い、記入テキストを取得する。ここで、文字認識を行うため、ＣＰＵ２５１は、会議情報に含まれる画像情報テーブル４１０のレコードを順に走査して、次の処理を行う。
画像情報テーブル４１０のレコードは、撮影時刻列４１１の値で昇順に（撮影した順に）ソートしておく。ＣＰＵ２５１は、現在参照しているレコードの画像データ列４１２が示す画像データと、ひとつ前のレコードの画像データ列４１２が示す画像データとの画像の差分を求める。画像の差分は、ひとつ前のレコードに該当する撮影から、現在参照しているレコードに該当する撮影までの間に、ユーザが記入した文字を含む、部分画像とみなすことができる。この部分画像に対して文字認識を行い、記入テキストを取得する。
また、ＣＰＵ２５１は、現在参照しているレコードの撮影時刻列４１１の撮影時刻を、画像の差分が発生した時刻、すなわちユーザによる記入が行われた記入時刻とする。
このようにして取得された画像の差分（ユーザによる記入）毎に、ＣＰＵ２５１は、記入情報テーブル６１０にレコードを生成する。そして、該当する記入時刻と記入テキストを、それぞれ記入時刻列６１１と記入テキスト列６１２に記録する。 In step S 804, the CPU 251 performs character recognition on the image data included in the received conference information, and acquires entry text. Here, in order to perform character recognition, the CPU 251 sequentially scans the records of the image information table 410 included in the conference information, and performs the following processing.
Records in the image information table 410 are sorted in ascending order (in order of shooting) by the value of the shooting time column 411. The CPU 251 obtains an image difference between the image data indicated by the image data string 412 of the currently referenced record and the image data indicated by the image data string 412 of the previous record. The image difference can be regarded as a partial image including characters entered by the user from the shooting corresponding to the previous record to the shooting corresponding to the currently referenced record. Character recognition is performed on this partial image to obtain entry text.
In addition, the CPU 251 sets the shooting time in the shooting time column 411 of the currently referenced record as the time when the image difference occurs, that is, the entry time when the entry was made by the user.
The CPU 251 generates a record in the entry information table 610 for each image difference (entry by the user) acquired in this way. The corresponding entry time and entry text are recorded in entry time column 611 and entry text column 612, respectively.

Ｓ８０５において、ＣＰＵ２５１は、発話テキストと記入テキストを統合して、会議テキストを取得する。すなわち、ＣＰＵ２５１は、Ｓ８０２で生成した発話情報テーブル６００（図６（ａ））とＳ８０４で生成した記入情報テーブル６１０（図６（ｂ））とを統合して、会議テキスト情報テーブル６２０（図６（ｃ））を生成する。
ここで、ＣＰＵ２５１は、発話情報テーブル６００に含まれるレコードを会議テキスト情報テーブル６２０に追加する。このとき、発話時刻列６０１の発話時刻を会議テキストが発生した時刻として発生時刻列６２１に、発話テキスト列６０２の発話テキストを会議テキストとして会議テキスト列６２２に、それぞれ記録する。区分列６２３には、元のデータが発話情報であったことを示す「０」を記録する。
また、ＣＰＵ２５１は、記入情報テーブル６１０に含まれるレコードを会議テキスト情報テーブル６２０に追加する。このとき、記入時刻列６１１の記入時刻を会議テキストが発生した時刻として発生時刻列６２１に、記入テキスト列６１２の記入テキストを会議テキストとして会議テキスト列６２２に、それぞれ記録する。区分列６２３には、元のデータが記入情報であったことを示す「１」を記録する。ＣＰＵ２５１は、以上追加したレコードを発生時刻列６２１の値で昇順に（発生した順に）ソートする。 In S805, the CPU 251 integrates the utterance text and the entry text to obtain the conference text. That is, the CPU 251 integrates the utterance information table 600 (FIG. 6A) generated in S802 and the entry information table 610 (FIG. 6B) generated in S804 to generate the conference text information table 620 (FIG. 6C).
Here, the CPU 251 adds the record included in the utterance information table 600 to the conference text information table 620. At this time, the utterance time in the utterance time column 601 is recorded in the occurrence time column 621 as the time when the conference text is generated, and the utterance text in the utterance text column 602 is recorded in the conference text column 622 as the conference text. In the classification column 623, “0” indicating that the original data was speech information is recorded.
In addition, the CPU 251 adds a record included in the entry information table 610 to the conference text information table 620. At this time, the entry time in the entry time column 611 is recorded in the occurrence time column 621 as the occurrence time of the meeting text, and the entry text in the entry text column 612 is recorded in the meeting text column 622 as the meeting text. In the classification column 623, “1” indicating that the original data was entry information is recorded. The CPU 251 sorts the added records by the value of the occurrence time column 621 in ascending order (in the order of occurrence).
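The integration performed in S805 can be sketched as a tagged merge followed by a sort on the occurrence time. The sketch below is illustrative only: the `ConferenceText` tuple, the function name, and the representation of times as zero-padded `HH:MM:SS` strings (which sort correctly as plain text) are assumptions, not details taken from the description.

```python
from typing import List, NamedTuple, Tuple


class ConferenceText(NamedTuple):
    time: str        # occurrence time (column 621), zero-padded HH:MM:SS
    text: str        # conference text (column 622)
    kind: int        # classification (column 623): 0 = utterance, 1 = entry
    main_point: int  # main point flag (column 624), filled in later by S806


def merge_conference_text(utterances: List[Tuple[str, str]],
                          entries: List[Tuple[str, str]]) -> List[ConferenceText]:
    """Combine (time, text) rows from both tables and sort by occurrence time."""
    records = [ConferenceText(t, s, 0, 0) for t, s in utterances]
    records += [ConferenceText(t, s, 1, 0) for t, s in entries]
    # list.sort is stable, so rows with equal times keep utterances first.
    records.sort(key=lambda r: r.time)
    return records
```

Keeping the classification value on each row lets later steps tell spoken text from written text even after the two sources are interleaved.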

Ｓ８０６において、ＣＰＵ２５１は、Ｓ８０５で生成した会議テキストから要点を抽出する。
ＣＰＵ２５１は、Ｓ８０５において会議テキスト情報テーブル６２０に追加した各レコードについて、会議テキスト列６２２の会議テキストが要点であるか否かを判定する。例えば、会議テキストが予め決定した特定のキーワードを含むか否かに基づいて判定する。会議テキストに特定のキーワードが含まれていれば、要点であると判定する。また、音声コマンドを用いて、ＡＩや決定事項の登録指示を行った会議テキストである場合、要点であると判定する。
会議テキストが要点である場合には、要点列６２４に「１」を記録し、そうでない場合は「０」を記録する。 In step S806, the CPU 251 extracts the main points from the conference text generated in S805.
For each record added to the conference text information table 620 in S805, the CPU 251 determines whether the conference text in the conference text column 622 is a main point. For example, the determination is based on whether the conference text includes a specific keyword determined in advance; if it does, the text is determined to be a main point. Conference text for which registration of an action item (AI) or a decision was instructed by a voice command is also determined to be a main point.
If the conference text is a main point, “1” is recorded in the main point column 624; otherwise “0” is recorded.
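The two criteria of S806 (a predetermined keyword, or registration through a voice command) can be sketched as a single predicate. The keyword list and the `registered_by_command` flag below are hypothetical; the description does not say which keywords are used or how the voice command origin of a text is carried to this step.

```python
KEYWORDS = ("decision", "action item")  # assumed keyword list, for illustration


def main_point_flag(text: str, registered_by_command: bool = False) -> int:
    """Return 1 if the conference text is a main point, else 0 (column 624)."""
    if registered_by_command:
        # Text registered as an action item or decision by voice command.
        return 1
    lowered = text.lower()
    return int(any(keyword in lowered for keyword in KEYWORDS))
```

Under these assumptions, "The decision is to proceed with plan 2" is flagged because of the keyword, while "proceed with plan 2" is flagged only when it arrived through a registration command.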

Ｓ８０７において、ＣＰＵ２５１は、Ｓ８０５で生成した会議テキストを要約する。ＣＰＵ２５１は、会議情報のアジェンダ情報テーブル４２０のレコードを順に走査して、次の処理を行う。
アジェンダ情報テーブル４２０のレコードは、アジェンダ開始時刻列４２１の値で昇順に（アジェンダの開始順に）ソートしておく。ＣＰＵ２５１は、現在参照しているレコードのアジェンダ開始時刻列４２１のアジェンダ開始時刻から、アジェンダ終了時刻列４２２のアジェンダ終了時刻までの期間を取得する。
そして、会議テキスト情報テーブル６２０から発生時刻列６２１の値が当該期間に該当するレコード群を抽出する。ＣＰＵ２５１は、それらレコード群の会議テキスト列６２２のテキストを要約して要約テキストを生成する。そして、要約情報テーブル７００にレコードを追加して、要約テキスト列７０２に生成した要約テキストを記録する。
また、現在参照しているアジェンダ情報テーブル４２０のレコードのアジェンダ名列４２３のアジェンダ名を、要約情報テーブル７００に追加したレコードのアジェンダ名７０１に記録する。 In S807, the CPU 251 summarizes the conference text generated in S805. The CPU 251 sequentially scans the records in the conference information agenda information table 420 and performs the following processing.
The records in the agenda information table 420 are sorted in ascending order (in order of agenda start) according to the values in the agenda start time column 421. The CPU 251 acquires the period from the agenda start time in the agenda start time column 421 of the currently referenced record to the agenda end time in the agenda end time column 422.
Then, a record group in which the value of the occurrence time column 621 corresponds to the period is extracted from the conference text information table 620. The CPU 251 summarizes the texts in the meeting text string 622 of these record groups and generates a summary text. Then, a record is added to the summary information table 700, and the generated summary text is recorded in the summary text column 702.
Further, the agenda name in the agenda name column 423 of the record of the agenda information table 420 currently referred to is recorded in the agenda name 701 of the record added to the summary information table 700.
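The per-agenda extraction in S807 can be sketched as a time-window filter followed by a call to a summarizer. The description does not specify the summarization method, so the sketch takes it as a caller-supplied function; the function names, the tuple layouts, the inclusive window boundaries, and the zero-padded `HH:MM:SS` time strings are all assumptions.

```python
from typing import Callable, List, Tuple


def texts_for_agenda(records: List[Tuple[str, str]],
                     start: str, end: str) -> List[str]:
    """Conference texts whose occurrence time (column 621) lies in [start, end]."""
    return [text for time, text in records if start <= time <= end]


def summarize_by_agenda(agendas: List[Tuple[str, str, str]],
                        records: List[Tuple[str, str]],
                        summarize: Callable[[List[str]], str]) -> List[Tuple[str, str]]:
    """Return (agenda name, summary text) rows for the summary table 700.

    `agendas` holds (start time, end time, name) rows sorted by start time.
    """
    return [(name, summarize(texts_for_agenda(records, start, end)))
            for start, end, name in agendas]
```

Passing a trivial summarizer such as `" / ".join` is enough to check the windowing; a real summarizer would replace it without changing the grouping logic.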

Ｓ８０８において、ＣＰＵ２５１は、以上のようにして取得した議事録元情報および会議情報に基づいて議事録を生成する。
ここで、図９を用いて、Ｓ８０８で作成する議事録について説明する。図９は、議事録の概要を示す図である。
議事録９００は、議事を示す議事テキストデータ９１０と、添付画像データ群９２０からなる。
議事テキストデータ９１０には、会議開催時間９１１、要点一覧９１２、アジェンダ名９１３、９１５、要約テキスト９１４、９１６が含まれる。
会議開催時間９１１は、音声情報テーブル４００の録音開始時刻列４０１の録音開始時刻（会議開始時刻）と、録音終了時刻列４０２の録音終了時刻（会議終了時刻）から生成される。要点一覧９１２は、会議テキスト情報テーブル６２０のレコードであり、要点列６２４が「１」（要点）であるレコードの会議テキスト６２２の一覧である。
アジェンダ名９１３、９１５は、要約情報テーブル７００のアジェンダ名列７０１のアジェンダ名である。要約テキスト９１４、９１６は、要約情報テーブル７００の要約テキスト７０２である。
また、添付画像データ群９２０は、会議情報に含まれる画像データを含む。 In S808, the CPU 251 generates the minutes based on the minutes source information and the meeting information acquired as described above.
Here, the minutes created in S808 will be described with reference to FIG. 9. FIG. 9 is a diagram showing an outline of the minutes.
The minutes 900 consist of proceedings text data 910, which represents the proceedings, and an attached image data group 920.
The proceedings text data 910 includes a meeting time 911, a key-point list 912, agenda names 913 and 915, and summary texts 914 and 916.
The meeting time 911 is generated from the recording start time (meeting start time) in the recording start time column 401 of the audio information table 400 and the recording end time (meeting end time) in the recording end time column 402. The key-point list 912 lists the conference text 622 of those records in the conference text information table 620 whose key-point column 624 is “1” (key point).
Agenda names 913 and 915 are agenda names in the agenda name column 701 of the summary information table 700. The summary texts 914 and 916 are the summary text 702 of the summary information table 700.
The attached image data group 920 includes image data included in the conference information.
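
As a rough illustration of how the parts of the proceedings text could be assembled in S808, the sketch below builds a plain-text body from the three inputs named above. The function name, the argument shapes, and the output layout are assumptions for the example; the source does not specify a text layout.

```python
def build_proceedings_text(start, end, text_records, summaries):
    """Compose a plain-text body in the spirit of the proceedings text data.

    text_records: (conference_text, key_point_flag) pairs, standing in for
                  columns 622 and 624 of the conference text information table.
    summaries:    (agenda_name, summary_text) pairs, standing in for
                  columns 701 and 702 of the summary information table.
    """
    lines = [f"Meeting time: {start} to {end}", "Key points:"]
    # Only records flagged as key points (flag == 1) enter the key-point list.
    lines += [f"- {text}" for text, flag in text_records if flag == 1]
    # One block per agenda: the agenda name followed by its summary text.
    for agenda_name, summary_text in summaries:
        lines += ["", agenda_name, summary_text]
    return "\n".join(lines)
```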

Ｓ８０９において、ＣＰＵ２５１は、議事録を、会議情報に含まれる送信先に送信する。
送信方法としては、例えば、電子メールで送信することができる。ＣＰＵ２５１は、電子メール本文に議事テキストデータ９１０を入力し、添付ファイルに添付画像データ群９２０を入力して、電子メールを送信する。 In step S809, the CPU 251 transmits the minutes to the transmission destination included in the conference information.
The minutes can be transmitted, for example, by e-mail. The CPU 251 places the proceedings text data 910 in the body of the e-mail, attaches the attached image data group 920 as attachments, and sends the e-mail.
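
The e-mail construction in S809 might look like the following sketch, which uses Python's standard `email.message.EmailMessage`. The addresses, the subject line, and the assumption that the attached images are PNG files are illustrative only; actual transmission (for example via `smtplib`) is left out.

```python
from email.message import EmailMessage


def build_minutes_mail(sender, recipient, proceedings_text, images):
    """Build an e-mail with the proceedings text as body and images attached.

    images: (filename, raw_bytes) pairs standing in for the attached image
            data group; PNG is assumed here purely for illustration.
    """
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = recipient
    msg["Subject"] = "Minutes"  # hypothetical subject line
    msg.set_content(proceedings_text)  # proceedings text goes in the body
    for filename, data in images:
        msg.add_attachment(
            data, maintype="image", subtype="png", filename=filename
        )
    return msg
```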

Ｓ８１０において、ＣＰＵ２５１は、終了の指示がなされたか否かを判定する。
ユーザは、例えば、外部インターフェース２５８を介して、別体のＰＣから会議サーバ１０２に終了を指示することができる。
終了指示がなされていたら、ＹＥＳと判定し、処理を終了する。終了指示がなされていなければ、ＮＯと判定し、Ｓ８０１に遷移する。 In step S810, the CPU 251 determines whether an end instruction has been issued.
For example, the user can instruct the conference server 102 to end from a separate PC via the external interface 258.
If an end instruction has been given, the determination is YES and the process ends. If no termination instruction is given, it is determined as NO and the process proceeds to S801.

ここで、図１０を用いて、Ｓ８０７で生成する要約テキストについて説明する。
図１０（ａ）は、要約テキストを生成する処理の一例を示すフローチャートである。本フローチャートの処理のために入力されるテキストは、会議テキスト情報テーブル６２０の複数のレコードである。 Here, the summary text generated in S807 will be described with reference to FIG. 10.
FIG. 10A is a flowchart illustrating an example of a process for generating a summary text. The text input for the processing of this flowchart is a plurality of records in the conference text information table 620.

Ｓ１００１において、ＣＰＵ２５１は、入力されたレコード全てを参照し、会議テキスト列６２２の会議テキストに出現する各単語について、その重要度を算出する。これは、例えば、各単語の出現頻度に基づいて算出することができる。 In S1001, the CPU 251 refers to all input records and calculates the importance of each word that appears in the conference text in the conference text column 622. This can be calculated based on, for example, the frequency with which each word appears.

Ｓ１００２において、ＣＰＵ２５１は、入力された各レコードの会議テキスト列６２２の会議テキストについて、それぞれその重要度を算出する。具体的には、Ｓ１００１で算出した各単語の重要度を参照し、各会議テキストに含まれる単語の重要度の合計値を算出することなどにより、会議テキストの重要度を算出する。 In S1002, the CPU 251 calculates the importance of the conference text in the conference text column 622 of each input record. Specifically, the importance of the conference text is calculated by referring to the importance of each word calculated in S1001 and calculating the total value of the importance of the words included in each conference text.

Ｓ１００３において、ＣＰＵ２５１は、重要度が閾値以上の会議テキストを抽出する。そして、これらの会議テキストを結合して要約テキストを生成する。そして、本処理の結果として、処理呼び出し元に要約テキストを返す。 In S1003, the CPU 251 extracts the conference texts whose importance is greater than or equal to a threshold value and concatenates them to generate a summary text. The summary text is then returned to the caller as the result of this process.
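
Steps S1001 to S1003 amount to a frequency-based extractive summary, which can be sketched as below. This is a minimal sketch under stated assumptions: words are split on whitespace (Japanese text would need a morphological analyzer instead), word importance is the raw occurrence count, and the function names are hypothetical.

```python
from collections import Counter


def word_importance(texts):
    """S1001: importance of each word, here its frequency over all texts."""
    return Counter(word for text in texts for word in text.split())


def text_importance(text, importance):
    """S1002: importance of a text, here the sum of its word importances."""
    return sum(importance[word] for word in text.split())


def summarize(texts, threshold):
    """S1003: keep texts at or above the threshold and join them."""
    importance = word_importance(texts)
    kept = [t for t in texts if text_importance(t, importance) >= threshold]
    return " ".join(kept)
```

With the inputs `["budget review budget", "hello", "budget plan"]` the text scores are 7, 1, and 4, so a threshold of 4 keeps the first and third texts.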

ところで、会議においては、ホワイトボードなどの記入媒体には、論点や重要な意見を記載する場合がある。これらの内容は議事として重要なので、記入テキストを要約テキストに優先的に反映するようにしてもよい。
例えば、Ｓ１００３において、ＣＰＵ２５１は、入力されたレコードの区分列６２３の値を確認する。値が「１」（元のデータが記入情報である場合）ならば、重要度が閾値以上か否かに関わらず、当該レコードの会議テキスト列６２２の会議テキストを要約テキストの一部として抽出するようにしてもよい。 Incidentally, in a meeting, points at issue and important opinions are sometimes written on a writing medium such as a whiteboard. Since this content is important to the proceedings, the entry text may be reflected preferentially in the summary text.
For example, in S1003, the CPU 251 may check the value of the classification column 623 of each input record and, if the value is “1” (that is, the original data is entry information), extract the conference text in the conference text column 622 of that record as part of the summary text regardless of whether its importance is equal to or higher than the threshold value.

あるいは、記入テキストを要約テキストに優先的に反映する処理の例として、次のように要約テキスト生成処理を実行してもよい。
図１０（ｂ）は、要約テキストを生成する処理の別の例を示すフローチャートである。図１０（ａ）のフローチャートと同様に、本フローチャートの処理のために入力されるテキストは、会議テキスト情報テーブル６２０の複数のレコードである。 Alternatively, as an example of processing that preferentially reflects the entry text in the summary text, the summary text generation processing may be executed as follows.
FIG. 10B is a flowchart showing another example of processing for generating summary text. Similar to the flowchart of FIG. 10A, the text input for the process of this flowchart is a plurality of records in the conference text information table 620.

Ｓ１０１１において、ＣＰＵ２５１は、入力されたレコードのうち、区分列６２３が「１」（元のデータが記入情報である場合）のレコードを参照して、会議テキスト列６２２の会議テキストに含まれる単語のリスト（以下、「記入単語リスト」と言う）を作成する。
Ｓ１０１２において、ＣＰＵ２５１は、Ｓ１００１と同様の処理を行う。
Ｓ１０１３において、ＣＰＵ２５１は、Ｓ１００２と同様の処理を行う。
Ｓ１０１４において、ＣＰＵ２５１は、Ｓ１０１１で作成した記入単語リストを参照して、Ｓ１０１３で算出した会議テキストの重要度を更新する。すなわち、入力された各レコードの会議テキスト列６２２の会議テキストについて、記入単語リストの単語を含む場合には、当該会議テキストの重要度にバイアスを加える。
Ｓ１０１５において、ＣＰＵ２５１は、Ｓ１００３と同様の処理を行う。 In S1011, the CPU 251 refers to those input records whose classification column 623 is “1” (that is, whose original data is entry information) and creates a list of the words contained in the conference text in the conference text column 622 (hereinafter referred to as the “entry word list”).
In S1012, the CPU 251 performs the same process as in S1001.
In S1013, the CPU 251 performs the same process as in S1002.
In S1014, the CPU 251 updates the importance of the conference text calculated in S1013 with reference to the entry word list created in S1011. That is, when the conference text in the conference text column 622 of each input record includes a word in the entry word list, a bias is added to the importance of the conference text.
In S1015, the CPU 251 performs the same process as in S1003.
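
The variant of FIG. 10B (S1011 to S1015) can be sketched in the same style. Again this is only an illustration: whitespace tokenization, frequency-based importance, an additive `bias` parameter, and the `(text, is_entry)` record shape are all assumptions made for the example.

```python
from collections import Counter


def summarize_with_entry_bias(records, threshold, bias):
    """records: (text, is_entry) pairs; is_entry mirrors classification == 1."""
    # S1011: collect the words that appear in entry (written) texts.
    entry_words = {
        word for text, is_entry in records if is_entry for word in text.split()
    }
    # S1012: word importance as frequency over all records.
    importance = Counter(word for text, _ in records for word in text.split())
    kept = []
    for text, _ in records:
        words = text.split()
        score = sum(importance[w] for w in words)  # S1013
        if entry_words.intersection(words):        # S1014: add the bias
            score += bias
        if score >= threshold:                     # S1015
            kept.append(text)
    return " ".join(kept)
```

Setting `bias` to a very large value approximates giving such texts the maximum importance, so that they are always extracted.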

あるいは、Ｓ１０１４において、対象の会議テキストが記入単語リストの単語を含む場合には、重要度の最大値を付与するようにし、Ｓ１０１５で当該会議テキストが要約テキストの一部として抽出されるようにしてもよい。 Alternatively, in S1014, if the target conference text includes a word in the entry word list, it may be given the maximum importance value so that it is extracted as part of the summary text in S1015.

次に、発話テキストから音声コマンドを削除する処理について説明する。
図１３は、会議サーバ１０２における音声コマンドを削除する処理を示すフローチャートである。 Next, processing for deleting a voice command from the utterance text will be described.
FIG. 13 is a flowchart showing processing for deleting a voice command in the conference server 102.

まず、Ｓ１３０１において、ＣＰＵ２５１は、発話テキストから発話区間分のテキストを取得する。 First, in S1301, the CPU 251 acquires text for an utterance section from the utterance text.

Ｓ１３０２において、ＣＰＵ２５１は、取得したテキストの内、音声コマンドのヘッダ部に相当するテキストが含まれているか否かを判定する。具体的には、会議サーバ１０２のストレージ２５５に予め保持されたヘッダ部の文言と一致するテキストが含まれているか否かを判定する。
含まれている場合、ＹＥＳと判定し、Ｓ１３０３に遷移する。含まれていない場合、ＮＯと判定し、Ｓ１３１２に遷移する。 In S1302, the CPU 251 determines whether the acquired text includes text corresponding to the header part of a voice command. Specifically, it determines whether the acquired text includes text that matches the header wording held in advance in the storage 255 of the conference server 102.
If it is included, the determination is YES and the process proceeds to S1303. If it is not included, the determination is NO and the process proceeds to S1312.

Ｓ１３０３において、ＣＰＵ２５１は、ヘッダ部に続いて、音声コマンドの命令部に相当するテキストが含まれているか否かを判定する。具体的には、会議サーバ１０２のストレージ２５５に保持された命令部データテーブル１１００の命令部の文字列１１０１と一致するテキストが、ヘッダ部に続くテキストに含まれているか否かを判定する。
含まれている場合、ＹＥＳと判定し、Ｓ１３０４に遷移する。含まれていない場合、ＮＯと判定し、Ｓ１３１２に遷移する。 In step S1303, the CPU 251 determines whether the text corresponding to the command part of the voice command is included after the header part. Specifically, it is determined whether or not the text that matches the character string 1101 of the command part of the command part data table 1100 held in the storage 255 of the conference server 102 is included in the text that follows the header part.
If it is included, the determination is YES and the process proceeds to S1304. If it is not included, the determination is NO and the process proceeds to S1312.

Ｓ１３０４において、ＣＰＵ２５１は、Ｓ１３０３で検出した音声コマンドの命令がデータ部を有するか否かを判定する。具体的には、Ｓ１３０３で検出したテキストと命令部の文字列１１０１とが一致するレコードを特定する。そして、特定されたレコードのデータ部有無列１１０３を参照し「有」の場合、ＹＥＳと判定し、Ｓ１３０５に遷移する。「無」の場合は、ＮＯと判定し、Ｓ１３０６に遷移する。 In S1304, the CPU 251 determines whether the command of the voice command detected in S1303 has a data part. Specifically, it identifies the record whose command-part character string 1101 matches the text detected in S1303. It then refers to the data part presence/absence column 1103 of the identified record; if the value is “present”, the determination is YES and the process proceeds to S1305. If the value is “absent”, the determination is NO and the process proceeds to S1306.

Ｓ１３０５において、ＣＰＵ２５１は、命令部に続くテキストに音声コマンドのデータ部に相当するテキストが含まれているか否かを判定する。具体的には、命令部に相当するテキストから、発話区間の終了（句読点）までに、テキストが含まれているか否かを判定する。
含まれている場合、ＹＥＳと判定し、Ｓ１３０６に遷移する。含まれていない場合、ＮＯと判定し、Ｓ１３１２に遷移する。 In step S 1305, the CPU 251 determines whether the text following the command part includes text corresponding to the data part of the voice command. Specifically, it is determined whether the text is included from the text corresponding to the command section to the end of the utterance section (punctuation).
If it is included, the determination is YES and the process proceeds to S1306. If it is not included, the determination is NO and the process proceeds to S1312.

Ｓ１３０６において、ＣＰＵ２５１は、命令部データテーブル１１００の削除範囲１１０４を参照し、検出したコマンドの削除範囲を特定する。
Ｓ１３０７において、ＣＰＵ２５１は、Ｓ１３０２で検出したヘッダ部に相当するテキストを発話テキストから削除する。例えば、会議サーバ１０２のストレージ２５５に予め保持された、ヘッダ部に相当する「Ｈｅｙ」という文字列と一致する箇所を削除する。 In step S 1306, the CPU 251 refers to the deletion range 1104 of the command part data table 1100 and identifies the deletion range of the detected command.
In S1307, the CPU 251 deletes the text corresponding to the header portion detected in S1302 from the utterance text. For example, a portion that matches the character string “Hey” corresponding to the header portion that is stored in the storage 255 of the conference server 102 in advance is deleted.

Ｓ１３０８において、ＣＰＵ２５１は、発話テキストから命令部を削除するか否かを判定する。Ｓ１３０６において特定した削除範囲に命令部が含まれる場合、ＹＥＳと判定し、Ｓ１３０９に遷移する。含まれない場合、ＮＯと判定し、Ｓ１３１０に遷移する。
Ｓ１３０９において、ＣＰＵ２５１は、Ｓ１３０３で検出した命令部に相当するテキストを発話テキストから削除する。 In S1308, the CPU 251 determines whether to delete the command part from the utterance text. If the command part is included in the deletion range identified in S1306, the determination is YES and the process proceeds to S1309. If it is not included, the determination is NO and the process proceeds to S1310.
In S1309, the CPU 251 deletes the text corresponding to the command part detected in S1303 from the utterance text.

Ｓ１３１０において、ＣＰＵ２５１は、発話テキストからデータ部を削除するか否かを判定する。Ｓ１３０６において特定した削除範囲にデータ部が含まれる場合、ＹＥＳと判定し、Ｓ１３１１に遷移する。含まれない場合、ＮＯと判定し、Ｓ１３１２に遷移する。
Ｓ１３１１において、ＣＰＵ２５１は、Ｓ１３０５で検出したデータ部に相当するテキストを発話テキストから削除する。 In S1310, the CPU 251 determines whether to delete the data part from the utterance text. If the data part is included in the deletion range identified in S1306, the determination is YES and the process proceeds to S1311. If it is not included, the determination is NO and the process proceeds to S1312.
In S1311, the CPU 251 deletes the text corresponding to the data part detected in S1305 from the utterance text.

Ｓ１３１２において、ＣＰＵ２５１は、発話テキストに含まれるテキストを全て走査したか否かを判定する。
全て走査した場合、ＹＥＳと判定し、処理を終了する。まだ走査を全て終了していない場合は、Ｓ１３０１に遷移する。 In step S1312, the CPU 251 determines whether all the text included in the utterance text has been scanned.
If all scanning has been performed, the determination is YES and the processing is terminated. If all the scanning has not been completed yet, the process proceeds to S1301.
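
The flow of FIG. 13 for a single utterance section can be sketched as follows. This is a simplified illustration, not the patented implementation: it assumes the header, command part, and data part are comma-separated with the header first, uses English stand-ins for the command strings, and models a hypothetical three-row excerpt of the command part data table 1100.

```python
from dataclasses import dataclass

HEADER = "Hey"  # header wording, as in the source's example


@dataclass(frozen=True)
class Command:
    has_data: bool      # stands in for the data part presence column 1103
    delete: frozenset   # stands in for the deletion range column 1104


# Hypothetical excerpt of the command part data table (English stand-ins).
TABLE = {
    "decision item": Command(True, frozenset({"header"})),
    "delete decision item": Command(False, frozenset({"header", "command"})),
    "volume": Command(True, frozenset({"header", "command", "data"})),
}


def strip_command(section):
    """Apply S1302 to S1311 to one utterance section; return the kept text."""
    parts = [p.strip() for p in section.rstrip(".").split(",")]
    if len(parts) < 2 or parts[0] != HEADER:   # S1302: no header found
        return section
    command = TABLE.get(parts[1])              # S1303: look up command part
    if command is None:
        return section
    data = ", ".join(parts[2:])
    if command.has_data and not data:          # S1304/S1305: data missing
        return section
    kept = []                                  # S1306: range from the table
    # S1307: the header is always dropped once a command is recognized.
    if "command" not in command.delete:        # S1308/S1309
        kept.append(parts[1])
    if data and "data" not in command.delete:  # S1310/S1311
        kept.append(data)
    return ", ".join(kept) + "." if kept else ""
```

For example, `strip_command("Hey, decision item, Mr. D prepares a review plan.")` keeps the command and data parts, while `strip_command("Hey, volume, raise by 10.")` returns an empty string because all three parts fall in the deletion range.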

図１４は、発話テキストから音声コマンドのヘッダ部を削除する動作例を示す図である。
本動作は、図１４（ａ）に示すように、例えば、ユーザが決定事項を登録するための発話を行った場合に実行される。
この例では、「Ｈｅｙ、決定事項、Ｄさんは見直し案を作成する。」という発話（図１４（ａ））に対して、ヘッダ部（Ｈｅｙ）を削除した、「決定事項、Ｄさんは見直し案を作成する。」というテキスト（図１４（ｂ））が生成される。
本動作は、上述の例以外にも、図１１に示すように、ＡＩへの登録指示、アジェンダの開始指示、アジェンダの終了指示、などを行う場合にも実行される。 FIG. 14 is a diagram illustrating an operation example in which the header portion of the voice command is deleted from the utterance text.
As shown in FIG. 14A, this operation is executed, for example, when the user makes an utterance for registering a decision item.
In this example, for the utterance “Hey, decision item, Mr. D prepares a review plan.” (FIG. 14A), the text “decision item, Mr. D prepares a review plan.” (FIG. 14B), from which the header part (Hey) has been deleted, is generated.
In addition to the above example, this operation is also executed when issuing a registration instruction to the AI, an agenda start instruction, an agenda end instruction, and the like, as shown in FIG. 11.

会議サーバ１０２は、Ｓ８０１において会議装置１０１から会議情報に含まれる音声データを受信し、Ｓ８０２において音声データを音声認識して、図１４（ａ）のような発話テキスト１４００が得られるものとする。発話テキスト１４００に示されるテキストは、音声認識された結果の一部であり、内容はヘッダ部を削除する動作を説明するための一例である。
続いて、Ｓ８０３において音声コマンドの削除処理が実行される。音声コマンドの削除処理の詳細は図１３のフローチャートに示すとおりである。 Assume that the conference server 102 receives the voice data included in the conference information from the conference apparatus 101 in S801 and performs speech recognition on it in S802, yielding the utterance text 1400 shown in FIG. 14A. The text shown in the utterance text 1400 is part of the speech recognition result, and its content is an example for explaining the operation of deleting the header part.
Subsequently, the voice command deletion process is executed in S803. The details of this process are as shown in the flowchart of FIG. 13.

Ｓ１３０１において、ＣＰＵ２５１は、発話テキストから発話区間のテキストを取得する。ここでは、発話テキスト１４００から発話区間のテキスト１４０１が取得された場合を説明する。
Ｓ１３０２において、ＣＰＵ２５１は、ヘッダ部に相当するテキストが含まれているか否かを判定する。ここでは、発話区間のテキスト１４０１には、会議サーバ１０２のストレージ２５５に予め保持された「Ｈｅｙ」という音声コマンドのヘッダ部に相当するテキストが含まれているため、判定がＹＥＳとなり、Ｓ１３０３に遷移する。
Ｓ１３０３において、ＣＰＵ２５１は、ヘッダ部に続いて命令部に相当するテキストが含まれているか否かを判定する。そこで、ヘッダ部に相当する「Ｈｅｙ」に続くテキストと、命令部データテーブル１１００の命令部の文字列１１０１とを比較する。そうすると、発話区間のテキスト１４０１には、「決定事項」という命令部に相当するテキストが含まれているため、判定がＹＥＳとなり、Ｓ１３０４に遷移する。
Ｓ１３０４において、ＣＰＵ２５１は、検出した命令部がデータ部を有するか否かを判定する。図１１で命令部の文字列１１０１が「決定事項」であるレコードを参照すると、ここでは、データ部有無列１１０３が「有」であるため、判定がＹＥＳとなり、Ｓ１３０５に遷移する。
Ｓ１３０５において、ＣＰＵ２５１は、命令部に続いてデータ部に相当するテキストが含まれているか否かを判定する。発話区間のテキスト１４０１には、命令部に相当する「決定事項」に続くテキストとして、「Ｄさんは見直し案を作成する。」というテキストが含まれているため、判定がＹＥＳとなり、Ｓ１３０６に遷移する。 In step S1301, the CPU 251 acquires the text of the utterance section from the utterance text. Here, a case where the text 1401 of the utterance section is acquired from the utterance text 1400 will be described.
In S1302, the CPU 251 determines whether text corresponding to the header part is included. Here, the text 1401 of the utterance section includes “Hey”, the text corresponding to the header part of a voice command held in advance in the storage 255 of the conference server 102, so the determination is YES and the process proceeds to S1303.
In S1303, the CPU 251 determines whether text corresponding to the command part follows the header part. To do so, it compares the text following “Hey” with the command-part character strings 1101 of the command part data table 1100. Since the text 1401 of the utterance section includes “decision item”, which corresponds to a command part, the determination is YES and the process proceeds to S1304.
In S1304, the CPU 251 determines whether the detected command part has a data part. Referring to the record in FIG. 11 whose command-part character string 1101 is “decision item”, the data part presence/absence column 1103 is “present”, so the determination is YES and the process proceeds to S1305.
In S1305, the CPU 251 determines whether text corresponding to the data part follows the command part. Since the text 1401 of the utterance section includes “Mr. D prepares a review plan.” as the text following “decision item”, the determination is YES and the process proceeds to S1306.

Ｓ１３０６において、命令部データテーブル１１００を参照し、音声コマンドの削除範囲を特定する。命令部データテーブル１１００の命令部の文字列１１０１が「決定事項」であるレコードの削除範囲列１１０４を参照すると、削除範囲はヘッダ部であると特定される。
Ｓ１３０７において、ＣＰＵ２５１は、発話区間のテキスト１４０１からヘッダ部に相当する「Ｈｅｙ」というテキストを削除する。
Ｓ１３０８において、ＣＰＵ２５１は、命令部を削除するか否かを判定する。削除範囲はヘッダ部であるため、判定はＮＯとなり、Ｓ１３１０に遷移する。
Ｓ１３１０において、ＣＰＵ２５１は、データ部を削除するか否かを判定する。削除範囲はヘッダ部であるため、判定はＮＯとなり、Ｓ１３１２に遷移する。
以上の処理が実行されると、図１４（ｂ）のように、発話テキストに含まれる発話区間のテキストから、ヘッダ部が削除されたテキスト１４０２が得られる。 In S1306, the command part data table 1100 is referenced to identify the deletion range of the voice command. Referring to the deletion range column 1104 of the record whose command-part character string 1101 is “decision item”, the deletion range is identified as the header part.
In S1307, the CPU 251 deletes the text “Hey”, which corresponds to the header part, from the text 1401 of the utterance section.
In S1308, the CPU 251 determines whether to delete the command part. Since the deletion range is the header part only, the determination is NO and the process proceeds to S1310.
In S1310, the CPU 251 determines whether to delete the data part. Since the deletion range is the header part only, the determination is NO and the process proceeds to S1312.
When the above processing has been executed, the text 1402, in which the header part has been deleted from the text of the utterance section included in the utterance text, is obtained as shown in FIG. 14B.

図１５は、発話テキストから音声コマンドのヘッダ部と命令部を削除する動作例を示す図である。
本動作は、図１５（ａ）に示すように、例えば、ユーザが決定事項を削除するための発話を行った場合に実行される。
この例では、「Ｈｅｙ、決定事項削除。」という発話（図１５（ａ））に対して、ヘッダ部（Ｈｅｙ）と命令部（決定事項削除）がいずれも削除される（図１５（ｂ））。
本動作は、上述の例以外にも、図１１に示すように、ＡＩへの削除指示、撮影の指示、などを行う場合にも実行される。 FIG. 15 is a diagram illustrating an operation example in which the header part and the command part of the voice command are deleted from the utterance text.
As shown in FIG. 15A, this operation is executed when, for example, the user makes an utterance for deleting a decision item.
In this example, for the utterance “Hey, deletion of decision item.” (FIG. 15A), both the header part (Hey) and the command part (deletion of decision item) are deleted (FIG. 15B).
In addition to the above example, this operation is also executed when issuing a deletion instruction to the AI, a photographing instruction, and the like, as shown in FIG. 11.

会議サーバ１０２は、Ｓ８０１において会議装置１０１からの会議情報に含まれる音声データを受信し、Ｓ８０２において音声データを音声認識して、図１５（ａ）のような発話テキスト１５００が得られるものとする。発話テキスト１５００に示されるテキストは、音声認識された結果の一部であり、内容は動作例を説明するための一例である。
続いて、Ｓ８０３において音声コマンドの削除処理が実行される。音声コマンドの削除処理の詳細は図１３のフローチャートに示すとおりである。 Assume that the conference server 102 receives the voice data included in the conference information from the conference apparatus 101 in S801 and performs speech recognition on it in S802, yielding the utterance text 1500 shown in FIG. 15A. The text shown in the utterance text 1500 is part of the speech recognition result, and its content is an example for explaining the operation.
Subsequently, the voice command deletion process is executed in S803. The details of this process are as shown in the flowchart of FIG. 13.

Ｓ１３０１において、ＣＰＵ２５１は、発話テキストから発話区間のテキストを取得する。ここでは、発話テキスト１５００から発話区間のテキスト１５０１が取得された場合を説明する。 In step S1301, the CPU 251 acquires the text of the utterance section from the utterance text. Here, the case where the text 1501 of the speech section is acquired from the speech text 1500 will be described.

Ｓ１３０２において、ＣＰＵ２５１は、ヘッダ部に相当するテキストが含まれているか否かを判定する。発話区間のテキスト１５０１には、会議サーバ１０２のストレージ２５５に予め保持された「Ｈｅｙ」という音声コマンドのヘッダ部に相当するテキストが含まれているため、判定がＹＥＳとなり、Ｓ１３０３に遷移する。
Ｓ１３０３において、ＣＰＵ２５１は、ヘッダ部に続いて命令部に相当するテキストが含まれているか否かを判定する。そこで、ヘッダ部に相当する「Ｈｅｙ」に続くテキストと、命令部データテーブル１１００の命令部の文字列１１０１とを比較する。そうすると、発話区間のテキスト１５０１には「決定事項削除」という命令部に相当するテキストが含まれているため、判定がＹＥＳとなり、Ｓ１３０４に遷移する。
Ｓ１３０４において、ＣＰＵ２５１は、検出した命令部がデータ部を有するか否かを判定する。図１１で命令部の文字列１１０１が「決定事項削除」であるレコードを参照すると、データ部有無列１１０３が「無」であるため、判定がＮＯとなり、Ｓ１３０６に遷移する。 In S1302, the CPU 251 determines whether or not text corresponding to the header portion is included. Since the text 1501 of the utterance section includes text corresponding to the header portion of the voice command “Hey” stored in the storage 255 of the conference server 102 in advance, the determination is YES, and the flow shifts to S1303.
In S1303, the CPU 251 determines whether text corresponding to the command part follows the header part. To do so, it compares the text following “Hey” with the command-part character strings 1101 of the command part data table 1100. Since the text 1501 of the utterance section includes “deletion of decision item”, which corresponds to a command part, the determination is YES and the process proceeds to S1304.
In S1304, the CPU 251 determines whether the detected command part has a data part. Referring to the record in FIG. 11 whose command-part character string 1101 is “deletion of decision item”, the data part presence/absence column 1103 is “absent”, so the determination is NO and the process proceeds to S1306.

Ｓ１３０６において、命令部データテーブル１１００を参照し、コマンドの削除範囲を特定する。命令部データテーブル１１００の命令部の文字列１１０１が「決定事項削除」であるレコードの削除範囲列１１０４を参照すると、削除範囲はヘッダ部と命令部であると特定される。
Ｓ１３０７において、発話区間のテキスト１５０１からヘッダ部に相当する「Ｈｅｙ」というテキストを削除する。
Ｓ１３０８において、命令部を削除するか否かを判定する。削除範囲はヘッダ部と命令部であるため、判定はＹＥＳとなり、Ｓ１３０９に遷移する。
Ｓ１３０９において、発話区間のテキスト１５０１から命令部に相当する「決定事項削除」というテキストを削除する。
Ｓ１３１０において、データ部を削除するか否かを判定する。削除範囲はヘッダ部と命令部であるため、判定はＮＯとなり、Ｓ１３１２に遷移する。
以上の処理が実行されると、図１５（ｂ）のように、発話テキストに含まれる発話区間のテキストから、ヘッダ部と命令部が削除されたテキスト１５０２が得られる。 In step S1306, the command part data table 1100 is referenced to specify the command deletion range. Referring to the deletion range column 1104 of the record in which the character string 1101 of the command portion in the command portion data table 1100 is “deletion of decision item”, the deletion range is specified as the header portion and the command portion.
In S1307, the text “Hey”, which corresponds to the header part, is deleted from the text 1501 of the utterance section.
In S1308, it is determined whether to delete the command part. Since the deletion range is the header part and the command part, the determination is YES and the process proceeds to S1309.
In S1309, the text “deletion of decision item”, which corresponds to the command part, is deleted from the text 1501 of the utterance section.
In S1310, it is determined whether to delete the data part. Since the deletion range is the header part and the command part, the determination is NO and the process proceeds to S1312.
When the above processing has been executed, the text 1502, in which the header part and the command part have been deleted from the text of the utterance section included in the utterance text, is obtained as shown in FIG. 15B.

図１６は、発話テキストから音声コマンドのヘッダ部と命令部とデータ部を削除する動作例を示す図である。
本動作は、図１６（ａ）に示すように、例えば、音量の変更を指示するための発話を行った場合に実行される。
この例では、「Ｈｅｙ、音量、１０上げる。」という発話（図１６（ａ））に対して、ヘッダ部（Ｈｅｙ）、命令部（音量）、データ部（１０上げる）がいずれも削除される（図１６（ｂ））。 FIG. 16 is a diagram illustrating an operation example in which the header portion, the command portion, and the data portion of the voice command are deleted from the utterance text.
As shown in FIG. 16A, this operation is executed when, for example, an utterance instructing a change in volume is made.
In this example, for the utterance “Hey, volume, raise by 10.” (FIG. 16A), the header part (Hey), the command part (volume), and the data part (raise by 10) are all deleted (FIG. 16B).

会議サーバ１０２は、Ｓ８０１において会議装置１０１からの会議情報に含まれる音声データを受信し、Ｓ８０２において音声データを音声認識して、図１６（ａ）のような発話テキスト１６００が得られるものとする。発話テキスト１６００に示されるテキストは、音声認識された結果の一部であり、内容は動作例を説明するための一例である。
続いて、Ｓ８０３において音声コマンドの削除処理が実行される。音声コマンドの削除処理の詳細は図１３のフローチャートに示すとおりである。 Assume that the conference server 102 receives the voice data included in the conference information from the conference apparatus 101 in S801 and performs speech recognition on it in S802, yielding the utterance text 1600 shown in FIG. 16A. The text shown in the utterance text 1600 is part of the speech recognition result, and its content is an example for explaining the operation.
Subsequently, the voice command deletion process is executed in S803. The details of this process are as shown in the flowchart of FIG. 13.

Ｓ１３０１において、ＣＰＵ２５１は、発話テキストから発話区間のテキストを取得する。ここでは、発話テキスト１６００から発話区間のテキスト１６０１が取得された場合を説明する。 In step S1301, the CPU 251 acquires the text of the utterance section from the utterance text. Here, the case where the text 1601 of the speech section is acquired from the speech text 1600 will be described.

Ｓ１３０２において、ＣＰＵ２５１は、ヘッダ部に相当するテキストが含まれているか否かを判定する。発話区間のテキスト１６０１には、会議サーバ１０２のストレージ２５５に予め保持された「Ｈｅｙ」という音声コマンドのヘッダ部に相当するテキストが含まれているため、判定がＹＥＳとなり、Ｓ１３０３に遷移する。
Ｓ１３０３において、ＣＰＵ２５１は、ヘッダ部に続いて命令部に相当するテキストが含まれているか否かを判定する。そこで、ヘッダ部に相当する「Ｈｅｙ」に続くテキストと、命令部データテーブル１１００の命令部の文字列１１０１とを比較する。そうすると、発話区間のテキスト１６０１には「音量」という命令部に相当するテキストが含まれているため、判定がＹＥＳとなり、Ｓ１３０４に遷移する。
Ｓ１３０４において、ＣＰＵ２５１は、検出した命令部がデータ部を有するか否かを判定する。図１１で命令部の文字列１１０１が「音量」であるレコードを参照すると、データ部有無列１１０３が「有」であるため、判定がＹＥＳとなり、Ｓ１３０５に遷移する。
Ｓ１３０５において、ＣＰＵ２５１は、命令部に続いてデータ部に相当するテキストが含まれているか否かを判定する。発話区間のテキスト１６０１には、命令部に相当する「音量」に続くテキストとして、「１０上げる。」というテキストが含まれているため、判定がＹＥＳとなり、Ｓ１３０６に遷移する。 In S1302, the CPU 251 determines whether or not text corresponding to the header portion is included. Since the text 1601 of the utterance section includes text corresponding to the header portion of the voice command “Hey” stored in the storage 255 of the conference server 102 in advance, the determination is YES, and the processing proceeds to S1303.
In S1303, the CPU 251 determines whether text corresponding to the command part follows the header part. To do so, it compares the text following “Hey” with the command-part character strings 1101 of the command part data table 1100. Since the text 1601 of the utterance section includes “volume”, which corresponds to a command part, the determination is YES and the process proceeds to S1304.
In S1304, the CPU 251 determines whether the detected command part has a data part. Referring to the record in FIG. 11 whose command-part character string 1101 is “volume”, the data part presence/absence column 1103 is “present”, so the determination is YES and the process proceeds to S1305.
In S1305, the CPU 251 determines whether text corresponding to the data part follows the command part. Since the text 1601 of the utterance section includes “raise by 10.” as the text following “volume”, the determination is YES and the process proceeds to S1306.

Ｓ１３０６において、ＣＰＵ２５１は、命令部データテーブル１１００を参照し、音声コマンドの削除範囲を特定する。命令部データテーブル１１００の命令部の文字列１１０１が「音量」であるレコードの削除範囲列１１０４を参照すると、削除範囲はヘッダ部と命令部とデータ部であると特定される。
Ｓ１３０７において、ＣＰＵ２５１は、発話区間のテキスト１６０１からヘッダ部に相当する「Ｈｅｙ」というテキストを削除する。
Ｓ１３０８において、ＣＰＵ２５１は、命令部を削除するか否かを判定する。削除範囲はヘッダ部と命令部とデータ部であるため、判定はＹＥＳとなり、Ｓ１３０９に遷移する。
Ｓ１３０９において、ＣＰＵ２５１は、発話区間のテキスト１６０１から命令部に相当する「音量」というテキストを削除する。
Ｓ１３１０において、ＣＰＵ２５１は、データ部を削除するか否かを判定する。削除範囲はヘッダ部と命令部とデータ部であるため、判定はＹＥＳとなり、Ｓ１３１１に遷移する。
Ｓ１３１１において、ＣＰＵ２５１は、発話区間のテキスト１６０１から命令部に続くテキストである「１０上げる。」というテキストを削除し、Ｓ１３１２に遷移する。
以上の処理が実行されると、図１６（ｂ）のように、発話テキストに含まれる発話区間のテキストから、ヘッダ部と命令部とデータ部が削除されたテキスト１６０２が得られる。 In S1306, the CPU 251 refers to the command part data table 1100 and identifies the deletion range of the voice command. Referring to the deletion range column 1104 of the record whose command-part character string 1101 is “volume”, the deletion range is identified as the header part, the command part, and the data part.
In S1307, the CPU 251 deletes the text “Hey”, which corresponds to the header part, from the text 1601 of the utterance section.
In S1308, the CPU 251 determines whether to delete the command part. Since the deletion range is the header part, the command part, and the data part, the determination is YES and the process proceeds to S1309.
In S1309, the CPU 251 deletes the text “volume”, which corresponds to the command part, from the text 1601 of the utterance section.
In S1310, the CPU 251 determines whether to delete the data part. Since the deletion range is the header part, the command part, and the data part, the determination is YES and the process proceeds to S1311.
In S1311, the CPU 251 deletes the text “raise by 10.”, which follows the command part, from the text 1601 of the utterance section, and proceeds to S1312.
When the above processing is executed, as shown in FIG. 16B, a text 1602 in which the header portion, the command portion, and the data portion are deleted from the text in the speech section included in the speech text is obtained.

以上、本実施例に示したとおり、音声コマンドとして発話したテキストのうち、議事録に不要なテキストを削除し、議事録に必要なテキストを発話テキストに残すことができる。 As described above, as shown in the present embodiment, it is possible to delete unnecessary text in the minutes from the text uttered as a voice command and leave the necessary text in the minutes in the utterance text.

（その他の実施例）
本発明は、上述の実施例の１以上の機能を実現するプログラムを、ネットワーク又は記憶媒体を介してシステム又は装置に供給し、そのシステム又は装置のコンピュータにおける１つ以上のプロセッサーがプログラムを読出し実行する処理でも実現可能である。また、１以上の機能を実現する回路（例えば、ＡＳＩＣ）によっても実現可能である。
また、本発明は、複数の機器から構成されるシステムに適用しても、１つの機器からなる装置に適用してもよい。
本発明は上述の実施例に限定されるものではなく、本発明の趣旨に基づき種々の変形が可能であり、それらを本発明の範囲から除外するものではない。即ち、上述の実施例及びその変形例を組み合わせた構成も全て本発明に含まれるものである。 (Other examples)
The present invention can also be realized by a process in which a program that implements one or more functions of the above-described embodiments is supplied to a system or apparatus via a network or a storage medium, and one or more processors in a computer of the system or apparatus read and execute the program. It can also be realized by a circuit (for example, an ASIC) that implements one or more functions.
Further, the present invention may be applied to a system composed of a plurality of devices or an apparatus composed of a single device.
The present invention is not limited to the above-described embodiments, and various modifications can be made based on the spirit of the present invention, and they are not excluded from the scope of the present invention. That is, all the configurations in which the above-described embodiments and modifications thereof are combined are also included in the present invention.

１００会議システム
１０１会議装置
１０２会議サーバ
６００発話情報テーブル
７００要約情報テーブル
９００議事録
１１００命令部データテーブル
１４００発話テキスト
100 Conference system
101 Conference apparatus
102 Conference server
600 Utterance information table
700 Summary information table
900 Minutes
1100 Command part data table
1400 Utterance text

Claims

1. An information processing apparatus comprising:
speech recognition means for recognizing spoken voice data and converting it into text;
first detection means for detecting, in the text, a header part, which is a part indicating the start of an instruction to the information processing apparatus;
second detection means for detecting, in the text, an instruction part, which is a part indicating the type of the instruction to the information processing apparatus;
third detection means for detecting, in the text, a data part, which is a part indicating the content of the instruction to the information processing apparatus; and
a determining unit that determines, according to the instruction part, a deletion range to be deleted from the text of the voice data.

2. The information processing apparatus according to claim 1, wherein the deletion range determined by the determining unit is registered in advance in a recording unit in association with the instruction part.

3. The information processing apparatus according to claim 1 or 2, wherein the deletion range determined by the determining unit is, according to the instruction part, any one of: the header part; the header part and the instruction part; and the header part, the instruction part, and the data part.

4. The information processing apparatus according to claim 1, wherein a word used as the header part is registered in advance in a recording unit.

5. The information processing apparatus according to claim 1, wherein a word used as the instruction part is registered in advance in a recording unit.

6. The information processing apparatus according to claim 1, wherein conference information arising in a conference is recorded based on content obtained by deleting the deletion range from the text.

7. The information processing apparatus according to any one of claims 1 to 6, wherein conference information arising in a conference is recorded based on an image photographed by the information processing apparatus.

8. The information processing apparatus according to claim 6, wherein minutes of the conference are generated based on the conference information.

9. The information processing apparatus according to claim 8, wherein the minutes are generated for each agenda instructed by a user.

10. An information processing method comprising:
a speech recognition step of recognizing spoken voice data and converting it into text;
a first detection step of detecting, in the text, a header part, which is a part indicating the start of an instruction to an information processing apparatus;
a second detection step of detecting, in the text, an instruction part, which is a part indicating the type of the instruction to the information processing apparatus;
a third detection step of detecting, in the text, a data part, which is a part indicating the content of the instruction to the information processing apparatus; and
a determination step of determining, according to the instruction part, a deletion range to be deleted from the text of the voice data.

11. A program for causing a computer to execute the information processing method according to claim 10.