JP2022092178A

JP2022092178A - Information processing device

Info

Publication number: JP2022092178A
Application number: JP2020204816A
Authority: JP
Inventors: 和寛金丸; Kazuhiro Kanemaru; 譲内田; Yuzuru Uchida
Original assignee: Kyocera Document Solutions Inc
Current assignee: Kyocera Document Solutions Inc
Priority date: 2020-12-10
Filing date: 2020-12-10
Publication date: 2022-06-22

Abstract

To create minutes based on voice signals. SOLUTION: An information processing device 1 includes a voice recognition unit 13, a conversion unit 14, and a comparison processing unit 15. The voice recognition unit 13 recognizes a voice. The conversion unit 14 converts the voice recognized by the voice recognition unit 13 into character data. The comparison processing unit 15 compares the character data converted by the conversion unit 14 with manuscript data and, if the character data and the manuscript data match, changes the property of the corresponding character in the manuscript data. The voice recognition unit 13 recognizes a specific voice and registers it as a speaker. The conversion unit 14 distinguishes between the voice of the speaker and the voice of a person other than the speaker, and converts each voice into character data. In the manuscript data, the comparison processing unit 15 changes the character corresponding to character data converted from the speaker's voice and the character corresponding to character data converted from the voice of a person other than the speaker to mutually different properties. SELECTED DRAWING: Figure 2

Description

The present invention relates to an information processing device.

Patent Document 1 discloses a system for confirming whether a person in charge has explained all the keywords that should be explained during a negotiation between a customer and the person in charge.

Specifically, the system of Patent Document 1 includes a customer terminal, a person-in-charge terminal, and a management server. The person-in-charge terminal picks up the voice of the person in charge with a microphone and sends it to the management server as a person-in-charge voice signal. The management server extracts keyword data from the person-in-charge voice signal and compares the extracted keyword data with preset confirmation keyword data.

Patent Document 1: Japanese Unexamined Patent Publication No. 2015-99474

In the system of Patent Document 1, keywords must be set in advance. Consequently, when, for example, a customer asks a question during an explanation by the person in charge, the system cannot record the content of the question or determine whether the answer to the question is appropriate. The system of Patent Document 1 is therefore not suitable for preparing minutes.

The present invention has been made in view of the above problem, and its object is to provide an information processing device capable of creating minutes based on a voice signal.

The information processing device according to the present invention includes a voice recognition unit, a conversion unit, and a comparison processing unit. The voice recognition unit recognizes a voice. The conversion unit converts the voice recognized by the voice recognition unit into character data. The comparison processing unit compares the character data converted by the conversion unit with manuscript data and, when the character data and the manuscript data match, changes the property of the corresponding character in the manuscript data. The voice recognition unit recognizes a specific voice and registers it as a speaker. The conversion unit distinguishes between the voice of the speaker registered by the voice recognition unit and the voice of a person other than the speaker, and converts each voice into character data. In the manuscript data, the comparison processing unit changes the character corresponding to character data converted from the speaker's voice and the character corresponding to character data converted from the voice of a person other than the speaker to mutually different properties.

According to the present invention, minutes can be created based on a voice signal.

FIG. 1 is a diagram showing a conference system 10 including an information processing device 1 according to the present embodiment.

FIG. 2 is a diagram showing the information processing device 1 according to the present embodiment.

FIG. 3 is a diagram showing an example of comparison processing between voice and manuscript data by the information processing device 1 according to the present embodiment.

FIG. 4 is a diagram showing an example of a property change of characters included in the manuscript data by the information processing device 1 according to the present embodiment.

FIG. 5 is a diagram showing an example of re-changing a character property by the information processing device 1 according to the present embodiment.

FIG. 6 is a flowchart showing a comparison process according to the present embodiment.

Hereinafter, embodiments of the present invention will be described with reference to the drawings. In the drawings, the same or corresponding parts are given the same reference numerals, and their description is not repeated.

First, the configuration of a conference system 10 including an information processing device 1 according to the present embodiment will be described with reference to FIG. 1. FIG. 1 is a diagram showing the conference system 10 including the information processing device 1 according to the present embodiment. As shown in FIG. 1, the conference system 10 includes the information processing device 1 and a display device 3.

The conference system 10 is used, for example, in a presentation in which one presenter S1 presents to a plurality of listeners L1 to L3. FIG. 1 shows, as an example, a case with three listeners.

The information processing device 1 is, for example, a terminal used by the presenter S1, such as a desktop personal computer (PC), a notebook PC, a tablet terminal, or a smartphone. The information processing device 1 stores, for example, the manuscript data of the presentation. The information processing device 1 also has, for example, a display (not shown) and presents the contents of the manuscript data to the presenter S1.

The display device 3 is, for example, a liquid crystal display, a screen, or the like, and displays the contents of the manuscript data. For example, the display device 3 can communicate with the information processing device 1. Specifically, the information processing device 1 transmits a manuscript image M1 showing the contents of the manuscript data to the display device 3. The display device 3 receives the manuscript image M1 from the information processing device 1 and displays it.

In a presentation, for example, it is desirable to record the content presented by the presenter and the questions raised by the listeners. One conceivable way to do this is to record the presentation as voice and collate it with the manuscript data.

Next, the configuration of the information processing device 1 according to the present embodiment will be described with reference to FIGS. 1 and 2. FIG. 2 is a diagram showing the information processing device 1 according to the present embodiment.

The information processing device 1 includes a control unit 11 and a storage unit 12. The control unit 11 includes a voice recognition unit 13, a conversion unit 14, a comparison processing unit 15, and a display processing unit 16. The control unit 11 controls the operation of each unit of the information processing device 1. Specifically, the control unit 11 includes a processor such as a CPU (Central Processing Unit) and controls the operation of each unit of the information processing device 1 by executing, for example, a computer program stored in the storage unit 12.

The storage unit 12 stores the manuscript data. The display processing unit 16 performs, for example, processing for displaying the manuscript image M1 on the display device 3. Specifically, the display processing unit 16 refers to the storage unit 12, generates the manuscript image M1 showing the contents of the manuscript data, and transmits the manuscript image M1 to the display device 3.

The voice recognition unit 13 recognizes voice input via a microphone 2. For example, voices uttered by the presenter S1 and the listeners L1 to L3 are converted into an electric signal by the microphone 2 and input to the voice recognition unit 13. The voice recognition unit 13 performs signal processing such as A/D conversion and Fourier transform on the electric signal. Based on the frequency, waveform, and the like obtained from the signal processing, the voice recognition unit 13 determines the content of the voice.
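As a rough illustration of the Fourier-transform step mentioned above, the following sketch computes a magnitude spectrum from already digitized samples and reports the strongest frequency. The function names, the naive DFT, and the test tone are assumptions made for this example; the publication does not specify how the voice recognition unit 13 derives frequency information.

```python
import cmath
import math


def magnitude_spectrum(samples):
    """Naive discrete Fourier transform; returns magnitudes for bins 0..N/2."""
    n = len(samples)
    spectrum = []
    for k in range(n // 2 + 1):
        acc = sum(
            samples[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n)
        )
        spectrum.append(abs(acc) / n)
    return spectrum


def dominant_frequency(samples, sample_rate):
    """Frequency (Hz) of the strongest non-DC bin in the spectrum."""
    spectrum = magnitude_spectrum(samples)
    peak_bin = max(range(1, len(spectrum)), key=lambda k: spectrum[k])
    return peak_bin * sample_rate / len(samples)


# Hypothetical input: 64 samples of a 1000 Hz tone digitized at 8000 Hz.
sample_rate = 8000
tone = [math.sin(2 * math.pi * 1000 * t / sample_rate) for t in range(64)]
peak = dominant_frequency(tone, sample_rate)
print(peak)
```

A practical system would use a fast Fourier transform over short overlapping frames; the naive transform is used here only to keep the sketch free of dependencies.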

The conversion unit 14 converts the content of the voice determined by the voice recognition unit 13 into character data.

In the present embodiment, the device distinguishes whether a voice input via the microphone 2 was uttered by the presenter S1 or by one of the listeners L1 to L3.

Next, comparison processing between voice and manuscript data by the information processing device 1 according to the present embodiment will be described with reference to FIGS. 2 and 3. FIG. 3 is a diagram showing an example of comparison processing between voice and manuscript data by the information processing device 1 according to the present embodiment.

In the present embodiment, when the information processing device 1 performs the comparison processing, the presenter S1 is registered before the presentation starts.

For example, the voice recognition unit 13 recognizes a specific voice and registers it as a speaker. Specifically, to register the presenter S1 as the speaker, the presenter S1 utters, for example, a predetermined word or sentence into the microphone 2. The uttered voice is converted into an electric signal by the microphone 2 and input to the voice recognition unit 13. The voice recognition unit 13 performs signal processing such as A/D conversion and Fourier transform on the electric signal. Based on the frequency, waveform, and the like obtained from the signal processing, the voice recognition unit 13 extracts the features of the voice (V1) uttered by the presenter S1. The voice recognition unit 13 then stores, for example, the features of the voice V1 in the storage unit 12 in association with the presenter S1.

In this way, during the presentation the information processing device 1 can distinguish whether a voice input via the microphone 2 was uttered by the presenter S1 or by someone other than the presenter S1 (for example, the listeners L1 to L3).

Specifically, the voice recognition unit 13 performs signal processing such as A/D conversion and Fourier transform on the electric signal obtained from a voice V2 uttered by the presenter S1 during the presentation. Based on the frequency, waveform, and the like obtained from the signal processing, the voice recognition unit 13 extracts the features of the voice V2. The voice recognition unit 13 compares the extracted features of the voice V2 with the features of the voice V1 in the storage unit 12. When the extracted features of the voice V2 match the features of the voice V1 in the storage unit 12 to at least a certain degree, or match completely, the voice recognition unit 13 determines that the voice V2 was uttered by the presenter S1.

On the other hand, when a voice V3 other than the voice V2 is input via the microphone 2, the voice recognition unit 13 performs signal processing such as A/D conversion and Fourier transform on the electric signal obtained from the voice V3. Based on the frequency, waveform, and the like obtained from the signal processing, the voice recognition unit 13 extracts the features of the voice V3. When the extracted features of the voice V3 match the features of the voice V1 in the storage unit 12 to less than that degree, or do not match at all, the voice recognition unit 13 determines that the voice V3 was uttered by someone other than the presenter S1.
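The "match to at least a certain degree" test described above can be sketched as a similarity score compared against a threshold. Cosine similarity, the threshold value of 0.9, and the three-element feature vectors are all assumptions chosen for illustration; the publication does not state which features or which similarity measure are used.

```python
import math

# Hypothetical threshold for "match to at least a certain degree".
MATCH_THRESHOLD = 0.9


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def is_registered_speaker(features, registered, threshold=MATCH_THRESHOLD):
    """True when the features match the registered voice V1 closely enough."""
    return cosine_similarity(features, registered) >= threshold


# Hypothetical feature vectors for the registered voice V1 and inputs V2, V3.
v1 = [0.9, 0.1, 0.3]
v2 = [0.85, 0.15, 0.28]  # close to V1, treated as the presenter
v3 = [0.1, 0.9, 0.2]     # far from V1, treated as someone else

print(is_registered_speaker(v2, v1))
print(is_registered_speaker(v3, v1))
```

With a single registered voice, every input that falls below the threshold is simply classified as "other than the presenter", which mirrors the two-way distinction in the text.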

The conversion unit 14 distinguishes between the voice V2 of the presenter S1 registered by the voice recognition unit 13 and the voice V3 of someone other than the presenter S1, and converts each voice into character data. For example, the conversion unit 14 converts the content of the voice determined by the voice recognition unit 13 into character data and, when the voice recognition unit 13 has determined that the voice is the voice V2 uttered by the presenter S1, adds information indicating that it is the voice of the presenter S1 to the character data. When the voice recognition unit 13 has determined that the voice is the voice V3 uttered by someone other than the presenter S1, the conversion unit 14 adds information indicating that it is the voice of someone other than the presenter S1 to the character data.
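One simple way to picture character data that carries speaker information is a small record with a text field and a speaker label. The `CharacterData` type and the labels "presenter" and "other" are hypothetical names for this sketch, not structures defined in the publication.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CharacterData:
    """Recognized text plus a label for who spoke it (hypothetical layout)."""
    text: str
    speaker: str  # "presenter" or "other"


def to_character_data(recognized_text, is_presenter):
    """Attach the speaker label decided by the voice recognition step."""
    speaker = "presenter" if is_presenter else "other"
    return CharacterData(text=recognized_text, speaker=speaker)


from_presenter = to_character_data("ABC", is_presenter=True)
from_listener = to_character_data("not ABC", is_presenter=False)
print(from_presenter)
print(from_listener)
```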

The comparison processing unit 15 compares the character data converted by the conversion unit 14 with the manuscript data and, when the character data and the manuscript data match, changes the property of the corresponding character in the manuscript data. In the example of FIG. 3, when the manuscript data contains the same characters as the character data "ABC" converted by the conversion unit 14, the comparison processing unit 15 changes at least one of the color, size, font, and the like of the characters "ABC" in the manuscript data. For example, the comparison processing unit 15 changes the color, size, font, and the like of the characters "ABC" to a color, size, font, and the like different from those of the other characters in the manuscript data.

In the present embodiment, the comparison processing unit 15 changes, in the manuscript data, the characters corresponding to character data converted from the voice V2 of the presenter S1 and the characters corresponding to character data converted from the voice V3 of someone other than the presenter S1 to mutually different properties.
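The comparison and property change can be sketched as a search for the recognized text inside the manuscript, followed by a per-character property update. Representing the manuscript as a plain string with a parallel list of property labels is an assumption for this example, as are the function name and the sample manuscript.

```python
def mark_matches(manuscript, text, prop, properties=None):
    """Return per-character properties with each occurrence of text set to prop.

    properties is an optional existing per-character list; unmatched
    characters keep their previous value (None when unset).
    """
    props = list(properties) if properties is not None else [None] * len(manuscript)
    if not text:
        return props
    start = manuscript.find(text)
    while start != -1:
        for i in range(start, start + len(text)):
            props[i] = prop
        start = manuscript.find(text, start + 1)
    return props


# Hypothetical manuscript; "P1" marks text spoken by the presenter.
manuscript = "Plan ABC and plan XYZ"
props = mark_matches(manuscript, "ABC", "P1")
print(props[5:8])
```

Calling the same function with "P2" for text spoken by someone else gives the different property described above, and a renderer could map "P1" and "P2" to distinct colors, sizes, or fonts.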

Next, the property change of characters included in the manuscript data by the information processing device 1 according to the present embodiment will be described with reference to FIGS. 3 and 4. FIG. 4 is a diagram showing an example of a property change of characters included in the manuscript data by the information processing device 1 according to the present embodiment.

FIG. 3 shows a case where the presenter S1 utters the voice "ABC" during the presentation. FIG. 4 shows a case where the listener L2 utters the voice "not ABC" during the presentation.

The voice recognition unit 13 performs signal processing such as A/D conversion and Fourier transform on the electric signal obtained from the voice "ABC" uttered by the presenter S1 during the presentation. Based on the frequency, waveform, and the like obtained from the signal processing, the voice recognition unit 13 extracts the features of the voice "ABC". The voice recognition unit 13 compares the extracted features of the voice "ABC" with the features of the voice V1 in the storage unit 12. When the extracted features of the voice "ABC" match the features of the voice V1 in the storage unit 12 to at least a certain degree, or match completely, the voice recognition unit 13 determines that the voice "ABC" was uttered by the presenter S1.

The conversion unit 14 converts the voice "ABC" determined by the voice recognition unit 13 into the character data "ABC" and adds information indicating that it is the voice of the presenter S1 to the character data "ABC".

The comparison processing unit 15 compares the character data "ABC" converted by the conversion unit 14 with the manuscript data and changes the property of the characters "ABC" in the manuscript data to a property P1. For example, the property P1 is that the color of the characters "ABC" differs from that of the other characters in the manuscript data. Alternatively, changing the color of the characters "ABC" to the same color as the background makes the characters "ABC" invisible.

The display processing unit 16 generates a manuscript image M2 showing the contents of the manuscript data after the property change by the comparison processing unit 15 and transmits the manuscript image M2 to the display device 3. The display device 3 receives the manuscript image M2 from the information processing device 1 and displays it.

In the example of FIG. 4, on the other hand, when the voice "not ABC" uttered by the listener L2 is input via the microphone 2, the voice recognition unit 13 performs signal processing such as A/D conversion and Fourier transform on the electric signal obtained from the voice "not ABC". Based on the frequency, waveform, and the like obtained from the signal processing, the voice recognition unit 13 extracts the features of the voice "not ABC". When the extracted features of the voice "not ABC" match the features of the voice V1 in the storage unit 12 to less than a certain degree, or do not match at all, the voice recognition unit 13 determines that the voice "not ABC" was uttered by someone other than the presenter S1.

The conversion unit 14 converts the voice "not ABC" determined by the voice recognition unit 13 into the character data "not ABC" and adds information indicating that it is the voice of someone other than the presenter S1 to the character data.

The comparison processing unit 15 searches the manuscript data to check whether it contains part or all of the character data "not ABC" converted by the conversion unit 14. The comparison processing unit 15 determines that "ABC" is contained in the manuscript data and changes the property of the characters "ABC" in the manuscript data to a property P2 different from the property P1. For example, the property P2 is that the color of the characters "ABC" differs both from that of the other characters in the manuscript data and from the color of the property P1.
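The partial-match search for a listener's remark can be sketched as follows. Splitting on whitespace is an assumption that suits the English rendering "not ABC"; Japanese text has no word spaces, so a real implementation would need a different segmentation, which the publication does not describe. The function names and the word-to-property dictionary are likewise hypothetical.

```python
def find_shared_words(utterance, manuscript):
    """Words of the utterance that also appear as words in the manuscript."""
    manuscript_words = set(manuscript.split())
    return [word for word in utterance.split() if word in manuscript_words]


def apply_property(word_properties, words, prop):
    """Return a copy of word_properties with each given word set to prop."""
    updated = dict(word_properties)
    for word in words:
        updated[word] = prop
    return updated


manuscript = "Plan ABC and plan XYZ"
# "ABC" was already marked P1 when the presenter said it.
word_properties = {"ABC": "P1"}

shared = find_shared_words("not ABC", manuscript)
word_properties = apply_property(word_properties, shared, "P2")
print(shared, word_properties)
```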

The display processing unit 16 generates a manuscript image M3 showing the contents of the manuscript data after the property change by the comparison processing unit 15 and transmits the manuscript image M3 to the display device 3. The display device 3 receives the manuscript image M3 from the information processing device 1 and displays it.

Next, re-changing of a character property by the information processing device 1 according to the present embodiment will be described with reference to FIG. 5. FIG. 5 is a diagram showing an example of re-changing a character property by the information processing device 1 according to the present embodiment.

For example, when a voice of the presenter S1 satisfies a predetermined condition, the comparison processing unit 15 changes the property P2 of the characters corresponding to character data converted from the voice of someone other than the presenter S1 back to the property P1.

For example, after the comparison processing unit 15 has changed the property of characters in the manuscript data to the property P2, the property is changed again when the presenter S1 utters a voice V4 that contains a word such as "correction" or "examination" together with the word "ABC".

Specifically, after the comparison processing unit 15 has changed the property of characters in the manuscript data to the property P2, when the voice V4 uttered by the presenter S1 is input via the microphone 2, the voice recognition unit 13 performs signal processing such as A/D conversion and Fourier transform on the electric signal obtained from the voice V4.

Based on the frequency, waveform, and the like obtained from the signal processing, the voice recognition unit 13 extracts the features of the voice V4. The voice recognition unit 13 compares the extracted features of the voice V4 with the features of the voice V1 in the storage unit 12. When the extracted features of the voice V4 match the features of the voice V1 in the storage unit 12 to at least a certain degree, or match completely, the voice recognition unit 13 determines that the voice V4 was uttered by the presenter S1.

The conversion unit 14 converts the voice V4 determined by the voice recognition unit 13 into character data and adds information indicating that it is the voice of the presenter S1 to the character data.

The comparison processing unit 15 compares the words "correction" and "ABC" contained in the character data converted by the conversion unit 14 with the manuscript data and changes the property of the characters "ABC" in the manuscript data to the property P1.
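The re-change condition can be sketched as a keyword check on the presenter's utterance. The keywords "correction" and "examination" come from the text; the tokenization, the punctuation stripping, the dictionary layout, and the function name are assumptions for this example.

```python
# Keywords named in the text; the tuple itself is a hypothetical structure.
RESOLUTION_KEYWORDS = ("correction", "examination")


def recheck_properties(word_properties, presenter_utterance):
    """Change P2 words back to P1 when the presenter names them with a keyword."""
    tokens = [word.strip(".,!?") for word in presenter_utterance.split()]
    has_keyword = any(token.lower() in RESOLUTION_KEYWORDS for token in tokens)
    updated = dict(word_properties)
    if has_keyword:
        for token in tokens:
            if updated.get(token) == "P2":
                updated[token] = "P1"
    return updated


flagged = {"ABC": "P2"}
resolved = recheck_properties(flagged, "We will make a correction to ABC.")
unchanged = recheck_properties(flagged, "ABC is fine")
print(resolved, unchanged)
```

Words that a listener questioned but the presenter never addressed stay at P2, so they remain easy to spot after the presentation.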

The display processing unit 16 generates a manuscript image M4 showing the contents of the manuscript data after the property change by the comparison processing unit 15 and transmits the manuscript image M4 to the display device 3. The display device 3 receives the manuscript image M4 from the information processing device 1 and displays it.

In this way, distinguishing utterances by the presenter S1 from utterances by others, and changing the corresponding characters in the manuscript data to different properties, makes it easier to identify items that need to be checked after the presentation and items that were resolved during the presentation.

In the present embodiment, when a voice input via the microphone 2 satisfies a predetermined condition, the voice need not be converted into character data.

For example, when signal processing by the voice recognition unit 13 on the electric signal obtained from a voice input via the microphone 2 shows that the voice is at or below a predetermined volume, or is a voice of a predetermined tone, the conversion unit 14 does not convert the content of the voice determined by the voice recognition unit 13 into character data.
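The volume condition can be sketched as a gate on the root-mean-square level of the digitized samples. Using RMS as the volume measure, the threshold value of 0.05, and the function names are assumptions for this example; the publication only says "a predetermined volume". The tone condition is not sketched here.

```python
import math


def rms(samples):
    """Root-mean-square level of a block of samples (0.0 for an empty block)."""
    if not samples:
        return 0.0
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def should_transcribe(samples, min_rms=0.05):
    """False when the block is at or below the (hypothetical) volume threshold."""
    return rms(samples) > min_rms


loud = [0.5, -0.5, 0.5, -0.5]
quiet = [0.01, -0.01, 0.01, -0.01]
print(should_transcribe(loud), should_transcribe(quiet))
```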

Next, the comparison process according to the present embodiment will be described with reference to FIG. 6. FIG. 6 is a flowchart showing the comparison process according to the present embodiment.

First, the voice recognition unit 13 recognizes the voice of the presenter S1 and registers the presenter S1 as the speaker (step S11).

The voice recognition unit 13 performs signal processing such as A/D conversion and Fourier transform on the electric signal obtained from a voice input via the microphone 2 (step S12).

Based on the result of the signal processing, the voice recognition unit 13 determines whether the voice input via the microphone 2 has a sufficient volume (step S13). When the voice recognition unit 13 determines that the voice input via the microphone 2 does not have a sufficient volume (No in step S13), the conversion unit 14 does not convert the content of the voice into character data and waits until a new voice is input (step S12).

On the other hand, when the voice recognition unit 13 determines that the voice input via the microphone 2 has a sufficient volume (Yes in step S13), the voice recognition unit 13 determines whether the voice is a voice of the presenter S1 or a voice of someone other than the presenter S1 (step S14).

When the voice recognition unit 13 determines that the voice input via the microphone 2 is a voice of the presenter S1 (Yes in step S14), the conversion unit 14 converts the content of the voice determined by the voice recognition unit 13 into character data (step S15).

The comparison processing unit 15 compares the character data converted by the conversion unit 14 with the manuscript data and changes the property of the corresponding characters in the manuscript data to the property P1 (step S16). The conversion unit 14 and the comparison processing unit 15 then wait until a new voice is input (step S12).

On the other hand, if the voice recognition unit 13 determines that the voice input via the microphone 2 is that of someone other than the presenter S1 (No in step S14), the conversion unit 14 converts the content of that voice into character data (step S17).

The comparison processing unit 15 searches the manuscript data to determine whether it contains part or all of the character data converted by the conversion unit 14 (step S18).

The comparison processing unit 15 changes the property of the corresponding characters in the manuscript data to property P2 (step S19).
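Steps S18 and S19 search for "part or all" of the converted text in the manuscript. One possible reading, sketched below, is to find the longest stretch shared by the utterance and the manuscript and mark it with P2. The use of Python's `difflib.SequenceMatcher`, the `MIN_MATCH` cutoff, and the per-character property list are assumptions, not details from the source.

```python
from difflib import SequenceMatcher
from typing import List, Optional

# Hypothetical minimum length, so that very short overlaps are ignored.
MIN_MATCH = 4


def mark_partial(manuscript: str, props: List[Optional[str]], spoken: str,
                 prop: str = "P2", min_len: int = MIN_MATCH) -> Optional[str]:
    """Mark the longest substring shared by `manuscript` and `spoken`
    with `prop`; return that substring, or None if it is too short."""
    matcher = SequenceMatcher(None, manuscript, spoken, autojunk=False)
    m = matcher.find_longest_match(0, len(manuscript), 0, len(spoken))
    if m.size < min_len:
        return None
    for i in range(m.a, m.a + m.size):
        props[i] = prop
    return manuscript[m.a:m.a + m.size]


manuscript = "the budget is fixed"
props: List[Optional[str]] = [None] * len(manuscript)
hit = mark_partial(manuscript, props, "what about the budget")
```

Here a participant's question overlaps the manuscript only in the phrase "the budget", so only those characters receive P2. The comparison is case-sensitive as written.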

The conversion unit 14 and the comparison processing unit 15 wait until a new voice is input via the microphone 2. When a new voice is input, the voice recognition unit 13 determines whether it is the voice of the presenter S1 or of someone other than the presenter S1 (step S20).

If the voice recognition unit 13 determines that the voice input via the microphone 2 is that of someone other than the presenter S1 (No in step S20), the conversion unit 14 and the comparison processing unit 15 perform the processing of steps S17 to S19.

On the other hand, if the voice recognition unit 13 determines that the voice input via the microphone 2 is that of the presenter S1 (Yes in step S20), the conversion unit 14 and the comparison processing unit 15 determine whether the content of the presenter S1's voice input via the microphone 2 includes a predetermined word (step S21).

If the content of the presenter S1's voice includes the predetermined word (Yes in step S21), the comparison processing unit 15 changes the property P2 of the characters corresponding to the character data converted from the voice of someone other than the presenter S1 to property P1 (step S16). The conversion unit 14 and the comparison processing unit 15 then wait until a new voice is input (step S12).

On the other hand, if the content of the presenter S1's voice does not include the predetermined word (No in step S21), the comparison processing unit 15 keeps the property P2 of the corresponding characters. The conversion unit 14 and the comparison processing unit 15 then wait until a new voice is input (step S12).
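The branch at step S21 can be sketched as a small function that promotes characters from P2 to P1 when the presenter's reply contains a predetermined word. The trigger words, the lowercase substring check, and the `pending` index list are hypothetical; the source says only that "a predetermined word" is detected.

```python
from typing import Iterable, List, Optional, Sequence

# Hypothetical trigger phrases; the source does not list the actual words.
TRIGGER_WORDS = ("that's right", "correct")


def resolve_pending(props: List[Optional[str]], pending: Iterable[int],
                    presenter_text: str,
                    triggers: Sequence[str] = TRIGGER_WORDS) -> bool:
    """If the presenter's utterance contains a trigger word (Yes in S21),
    change the pending P2 characters to P1 and return True.
    Otherwise (No in S21) leave them as P2 and return False."""
    text = presenter_text.lower()
    if any(t in text for t in triggers):
        for i in pending:
            props[i] = "P1"
        return True
    return False


props: List[Optional[str]] = ["P2", "P2", None]
promoted = resolve_pending(props, [0, 1], "Yes, that's right.")
```

Under these assumptions, characters first marked P2 from a participant's remark take the presenter's property P1 once the presenter confirms the remark, and remain P2 otherwise.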

In the present embodiment, there is one display device 3, but the number is not limited to this. For example, there may be as many display devices 3 as there are participants in the presentation (four in the example of FIG. 1), or the display device 3 may be omitted.

In the present embodiment, the information processing device 1 is a terminal used by the presenter S1, but it is not limited to this and may be, for example, a server provided inside or outside the presentation venue. In this case, the electric signal into which the voice input via the microphone 2 has been converted is transmitted to the information processing device 1. The information processing device 1 receives the transmitted electric signal and performs the various kinds of processing.

In the present embodiment, the electric signal into which the voice input via the microphone 2 has been converted is input to the information processing device 1, but the configuration is not limited to this; an electric signal corresponding to prerecorded voice data may be input to the information processing device 1 instead.

In the present embodiment, the manuscript data is stored in the storage unit 12, but the configuration is not limited to this. For example, the information processing device 1 may lack the storage unit 12, and the control unit 11 may acquire the manuscript data from an external source.

The embodiment of the present invention has been described above with reference to the drawings (FIGS. 1 to 6). However, the present invention is not limited to the above embodiment and can be implemented in various forms without departing from its gist. To aid understanding, the drawings schematically show mainly the individual components, and the thickness, length, number, and so on of each illustrated component differ from the actual ones for convenience in preparing the drawings. Further, the materials, shapes, dimensions, and so on of the components shown in the above embodiment are merely examples and are not particularly limiting; various changes can be made without substantially departing from the effects of the present invention.

The present invention is applicable to the field of preparing minutes of meetings.

1: Information processing device
3: Display device
11: Control unit
12: Storage unit
13: Voice recognition unit
14: Conversion unit
15: Comparison processing unit
16: Display processing unit
M1 to M4: Manuscript images
P1, P2: Properties
S1: Presenter
V1 to V4: Voices

Claims

An information processing device comprising:
a voice recognition unit that recognizes voice;
a conversion unit that converts the voice recognized by the voice recognition unit into character data; and
a comparison processing unit that compares the character data converted by the conversion unit with manuscript data and, when the character data and the manuscript data match, changes a property of the corresponding character in the manuscript data, wherein
the voice recognition unit recognizes a specific voice and registers it as a speaker,
the conversion unit distinguishes between the voice of the speaker registered by the voice recognition unit and the voice of a person other than the speaker, and converts each of the voices into the character data, and
the comparison processing unit changes, in the manuscript data, the character corresponding to the character data converted from the voice of the speaker and the character corresponding to the character data converted from the voice of the person other than the speaker to mutually different properties.

The information processing device according to claim 1, wherein, when the voice of the speaker satisfies a predetermined condition, the comparison processing unit changes the property of the character corresponding to the character data converted from the voice of the person other than the speaker to the same property as that of the character in the manuscript data corresponding to the character data converted from the voice of the speaker.

The information processing device according to claim 1 or 2, wherein the conversion unit does not convert the voice into character data when the voice satisfies a predetermined condition.

The information processing device according to any one of claims 1 to 3, further comprising a display processing unit that performs processing for displaying an image corresponding to the manuscript data on a display device.