JPH10198393A - Conversation recording device

Info

Publication number: JPH10198393A
Application number: JP9001052A
Authority: JP
Inventors: Ryoichi Yushimo; 良一湯下
Original assignee: Matsushita Electric Industrial Co Ltd
Current assignee: Panasonic Holdings Corp
Priority date: 1997-01-08
Filing date: 1997-01-08
Publication date: 1998-07-31

Abstract

PROBLEM TO BE SOLVED: To provide a conversation recording device whose recorded results are easy to understand even when a conversation among several persons is recorded. SOLUTION: The conversation recording device comprises voice recognition means 6, recognition result storage means 7, speaker feature clustering means 8, speaker determination means 9, and display means 10. When the content of a conversation is recorded as character data through voice recognition, a time stamp and the voice feature extracted from the voice are also recorded for every character. When recording is complete, all the voice features are clustered to obtain the number of persons in the conversation and a representative voice feature for each speaker. These representative features are then compared with the recorded data to determine the speaker of each character, and the utterances of each speaker are displayed separately by color or display position, so that the individual speakers can be identified on the display.

Description

DETAILED DESCRIPTION OF THE INVENTION

[0001]

[Technical field of the invention] The present invention relates to a conversation recording device that records the content of conversations in conferences and the like.

[0002]

[Prior art] In recent years, demand has grown for using voice recognition devices as input devices for computers and the like, and their application to conversation recording devices that record the content of conversations in conferences and the like is also anticipated.

[0003] A conventional conversation recording device is described below. FIG. 8 is a functional block diagram of a conventional conversation recording device.

[0004] In FIG. 8, 1 denotes voice input means for converting voice into an electric signal and inputting it; 2 denotes voice recognition means for recognizing the electric signal output from the voice input means 1 as character codes; 3 denotes recognition result storage means for storing the character codes output from the voice recognition means 2; and 4 denotes display means for displaying the character codes stored in the recognition result storage means 3.

[0005] The operation of the conversation recording device configured as above is described below. First, the voice input means 1 inputs the voice of the conversation as an electric signal and sends it to the voice recognition means 2. The voice recognition means 2 compares the waveform of the input electric signal with a dictionary that stores human utterances together with the character codes that represent them as characters, and recognizes which characters were uttered. The results are stored in chronological order in the recognition result storage means 3, and the display means 4 displays the content of the conversation as characters.

[0006]

[Problem to be solved by the invention] However, when such a conversation recording device records a conversation among several people, it records their voices mixed together, so the content of the conversation is difficult to understand.

[0007] That is, as shown in FIG. 9, when three speakers A, B, and C speak in turn, with speaker A saying "This is Z, isn't it?", speaker B saying "Um...", and speaker C saying "No, it's probably Y", the display means 4 shows the conversation result as in the screen display of FIG. 10, and the utterances of the individual speakers cannot be told apart.

[0008] An object of the present invention is to provide a conversation recording device whose recorded results are easy to understand even when a conversation among several people is recorded.

[0009]

[Means for solving the problem] To achieve this object, the present invention comprises: voice input means for converting voice into an electric signal and inputting it; voice recognition means for recognizing the electric signal output from the voice input means as character codes; recognition result storage means for storing, in association with one another, each character code output from the voice recognition means, the voice feature used for its recognition, and the time of recognition; speaker feature clustering means for grouping together similar voice features among all the voice features stored in the recognition result storage means, thereby obtaining the features of the speakers present in the conversation; speaker determination means for determining the speaker who uttered the character code paired with each voice feature, by comparing the voice features stored in the recognition result storage means against the speaker features obtained by the speaker feature clustering means; and display means for displaying as characters, on the basis of the determination result obtained by the speaker determination means, the character codes of the recognition results of voice uttered by the same speaker so that they can be visually identified.

[0010] This yields a conversation recording device whose recorded results are easy to understand even when a conversation among several people is recorded.

[0011]

[Embodiments of the invention] The invention according to claim 1 comprises: voice input means for converting voice into an electric signal and inputting it; voice recognition means for recognizing the electric signal output from the voice input means as character codes; recognition result storage means for storing, in association with one another, each character code output from the voice recognition means, the voice feature used for its recognition, and the time of recognition; speaker feature clustering means for grouping together similar voice features among all the voice features stored in the recognition result storage means, thereby obtaining the features of the speakers present in the conversation; speaker determination means for determining the speaker who uttered the character code paired with each voice feature, by comparing the voice features stored in the recognition result storage means against the speaker features obtained by the speaker feature clustering means; and display means for displaying as characters, on the basis of the determination result obtained by the speaker determination means, the character codes of the recognition results of voice uttered by the same speaker so that they can be visually identified. With this configuration, the content of a conversation among several speakers can be recorded and the utterances of each speaker can be displayed speaker by speaker.

[0012] The invention according to claim 2 comprises: voice input means for converting voice into an electric signal and inputting it; voice recognition means for recognizing the electric signal output from the voice input means as character codes; recognition result storage means for storing, in association with one another, each character code output from the voice recognition means, the voice feature used for its recognition, and the time of recognition; speaker feature clustering means for grouping together similar voice features among all the voice features stored in the recognition result storage means, thereby obtaining the features of the speakers present in the conversation; speaker determination means for determining the speaker who uttered the character code paired with each voice feature, by comparing the voice features stored in the recognition result storage means against the speaker features obtained by the speaker feature clustering means; speaker selection means for selecting and outputting, on the basis of the determination result obtained by the speaker determination means, only the character codes of the recognition results of voice uttered by the same speaker; and display means for displaying the character codes output from the speaker selection means as characters. With this configuration, the content of a conversation among several speakers can be recorded and only the utterances of the speaker selected by the user can be displayed on the display means.

[0013] Embodiments of the present invention are described below with reference to the drawings.

(Embodiment 1) FIG. 1 is a functional block diagram of a conversation recording device according to a first embodiment of the present invention, and FIG. 2 is a circuit block diagram of the conversation recording device according to the first embodiment of the present invention.

[0014] In FIG. 1, 5 denotes voice input means for converting voice into an electric signal and inputting it; 6 denotes voice recognition means for recognizing the electric signal output from the voice input means 5 as character codes; 7 denotes recognition result storage means for storing, in association with one another, each character code output from the voice recognition means 6, the voice feature used for its recognition, and the time of recognition; 8 denotes speaker feature clustering means for grouping together similar voice features among all the voice features stored in the recognition result storage means 7, thereby obtaining the features of the speakers present in the conversation; 9 denotes speaker determination means for determining the speaker who uttered the character code paired with each voice feature, by comparing the voice features stored in the recognition result storage means 7 against the speaker features obtained by the speaker feature clustering means 8; and 10 denotes display means for displaying as characters, on the basis of the determination result obtained by the speaker determination means 9, the character codes of the recognition results of voice uttered by the same speaker so that they can be visually identified.

[0015] In FIG. 2, 12 denotes a microphone that converts the collected voice into an electric signal, and 13 denotes an analog-digital converter that converts the analog electric signal obtained from the microphone 12 into a digital signal.

[0016] 14 denotes a central processing unit (hereinafter "CPU") that executes a control program stored in the ROM 15 described below. 15 denotes a read-only memory (hereinafter "ROM") for permanently storing data, which stores the control program 16 executed by the CPU 14 and voice recognition dictionary data 17. 18 denotes a random access memory (hereinafter "RAM") for temporarily storing data, which has areas for storing recognition result storage data 19 and speaker determination result storage data 20.

[0017] 21 denotes a keyboard for giving the CPU 14 external commands such as start and end. 22 denotes a display device for displaying the conversation recording result, which has a display screen such as a CRT display or an LCD (liquid crystal display). 23 denotes a bus line for signal transmission inside the device.

[0018] The operation of the conversation recording device configured as above is described below with reference to the flowchart of FIG. 3.

[0019] FIG. 3 is a flowchart of the conversation recording device according to the first embodiment of the present invention. This embodiment is described using the example shown in FIG. 9, in which three speakers A, B, and C speak in turn: speaker A says "This is Z, isn't it?", speaker B says "Um...", and speaker C says "No, it's probably Y".

[0020] As shown in FIG. 3, the voice of a conversation among several speakers is captured by the microphone 12, converted into a digital voice signal by the analog-digital converter 13, and stored in the RAM 18 (S1). In the case of FIG. 9, the digital voice signal for the run-together utterance "This is Z, isn't it? Um... It's probably Y" is recorded in the RAM 18. The number of speakers may also be specified when the voice data is captured.

[0021] Next, a predetermined calculation formula is applied to the digital voice signal stored in the RAM 18 to extract voice breaks, voice features, and the like (S2).
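The source does not disclose the formula used in S2. Purely as an illustration of how voice breaks might be found, the following minimal sketch assumes a short-time energy threshold; the function name `split_on_silence`, the frame length, and the threshold value are all hypothetical and are not taken from the patent.

```python
from typing import List, Sequence, Tuple


def split_on_silence(samples: Sequence[float],
                     frame_len: int,
                     energy_threshold: float) -> List[Tuple[int, int]]:
    """Return (start, end) sample indices of segments whose frame energy
    is at or above the threshold. Assumed stand-in for the undisclosed
    formula of S2."""
    segments: List[Tuple[int, int]] = []
    start = None  # start index of the segment currently being tracked
    for pos in range(0, len(samples), frame_len):
        frame = samples[pos:pos + frame_len]
        energy = sum(x * x for x in frame) / len(frame)  # mean squared amplitude
        if energy >= energy_threshold:
            if start is None:
                start = pos            # a voiced segment begins here
        elif start is not None:
            segments.append((start, pos))  # silence closes the segment
            start = None
    if start is not None:
        segments.append((start, len(samples)))  # signal ended while voiced
    return segments


# Two bursts of sound separated by silence (toy data, not real audio).
samples = [0.0] * 4 + [0.5, -0.5, 0.5, -0.5] + [0.0] * 4 + [0.8, -0.8, 0.8, -0.8]
segments = split_on_silence(samples, frame_len=4, energy_threshold=0.01)
```

A practical device would presumably also derive a feature vector per segment at this step; that part is left out because the source gives no detail on it.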

[0022] The voice features extracted from the waveform of the input electric signal are compared with a dictionary that stores human utterances together with the character codes that represent them as characters, and the characters that were uttered are recognized (S3).

[0023] During voice recognition, each character code obtained by the voice recognition, the voice feature used for its recognition, and the time of recognition are stored in association with one another in the RAM 18 as recognition result storage data 19 (S4).
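The record stored in S4 can be pictured as one entry per recognized character holding the three associated items. The sketch below is an assumption about layout only: the class name `RecognitionRecord`, the field names, and the two-dimensional feature vectors are hypothetical, since the source does not define the data format.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass(frozen=True)
class RecognitionRecord:
    char_code: str              # character recognized in S3
    feature: Tuple[float, ...]  # voice feature used for the recognition (assumed vector)
    timestamp: float            # time of recognition, in seconds (assumed unit)


def store(records: List[RecognitionRecord], char_code: str,
          feature: Sequence[float], timestamp: float) -> None:
    """Append one character, its feature, and its time as a single record."""
    records.append(RecognitionRecord(char_code, tuple(feature), timestamp))


# Stand-in for the recognition result storage data 19 held in RAM.
recognition_results: List[RecognitionRecord] = []
store(recognition_results, "Z", (0.9, 0.1), 0.4)
store(recognition_results, "Y", (0.1, 0.8), 2.1)
```

Keeping the feature next to each character is what lets the later steps assign a speaker to every character after recording has finished.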

[0024] Further, similar voice features among all the voice features stored in the RAM 18 are grouped together to obtain the features of the speakers present in the conversation (S5). If the number of speakers was entered when the voice data was captured in S1, that information can be used to improve the accuracy of the voice feature clustering.
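The source does not name a clustering algorithm for S5. As one possible reading, the sketch below uses a simple greedy "leader" clustering with a Euclidean distance threshold, which is a swapped-in technique chosen for brevity; `cluster_features`, the threshold, and the toy feature vectors are hypothetical. The number of clusters it finds plays the role of the number of speakers, and each centroid plays the role of a speaker's representative feature.

```python
import math
from typing import List, Sequence, Tuple


def cluster_features(features: Sequence[Sequence[float]],
                     threshold: float) -> Tuple[List[List[float]], List[int]]:
    """Greedy leader clustering (assumed technique, not from the source).

    Each feature joins the nearest existing cluster if it lies within
    `threshold` of that cluster's centroid; otherwise it starts a new one.
    Returns (centroids, labels)."""
    centroids: List[List[float]] = []
    counts: List[int] = []
    labels: List[int] = []
    for f in features:
        best, best_d = -1, math.inf
        for i, c in enumerate(centroids):
            d = math.dist(f, c)
            if d < best_d:
                best, best_d = i, d
        if best >= 0 and best_d <= threshold:
            n = counts[best]
            # Update the running mean of the matched cluster.
            centroids[best] = [(ci * n + fi) / (n + 1)
                               for ci, fi in zip(centroids[best], f)]
            counts[best] = n + 1
            labels.append(best)
        else:
            centroids.append(list(f))
            counts.append(1)
            labels.append(len(centroids) - 1)
    return centroids, labels


features = [(1.0, 1.0), (5.0, 5.0), (1.2, 0.8), (9.0, 1.0), (5.1, 4.9)]
centroids, labels = cluster_features(features, threshold=1.0)
speaker_count = len(centroids)
```

When the number of speakers is entered in advance, a fixed-count method such as k-means could be substituted; the source says only that the count improves accuracy, not how it is used.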

[0025] The voice features stored in the RAM 18 are compared against the speaker features obtained by the speaker feature clustering, and the speaker who uttered the character code paired with each voice feature is thereby determined (S6).
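The comparison in S6 can be sketched as a nearest-neighbor lookup: each stored voice feature is assigned to the speaker whose representative feature is closest. The Euclidean distance, the function name `determine_speaker`, and the sample values are assumptions for illustration; the source does not state the similarity measure. For brevity each record here carries a whole utterance, whereas the device stores one record per character.

```python
import math
from typing import Sequence


def determine_speaker(feature: Sequence[float],
                      speaker_features: Sequence[Sequence[float]]) -> int:
    """Return the index of the representative speaker feature nearest to
    `feature` (Euclidean distance is an assumed choice)."""
    return min(range(len(speaker_features)),
               key=lambda i: math.dist(feature, speaker_features[i]))


# Representative features for speakers A, B, C (hypothetical values).
speaker_features = [(1.0, 1.0), (5.0, 5.0), (9.0, 1.0)]

# (recognized text, voice feature, time of recognition)
records = [
    ("This is Z, isn't it?", (1.1, 0.9), 0.0),
    ("Um...", (5.2, 4.8), 1.5),
    ("No, it's probably Y", (8.7, 1.2), 2.3),
]
speakers = [determine_speaker(feature, speaker_features)
            for _, feature, _ in records]
```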

[0026] On the basis of the determination result, the character codes of the recognition results of voice uttered by the same speaker are displayed as characters so that they can be visually identified (S7), and all processing ends. FIG. 4 shows this display state, in which the results are displayed speaker by speaker.
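One way to picture the S7 display is to merge consecutive characters from the same speaker into one line and mark each line with its speaker. The source distinguishes speakers by color or display position; the text prefix used below is a stand-in for that, and `format_by_speaker` and the sample data are hypothetical.

```python
from itertools import groupby
from typing import List, Sequence, Tuple


def format_by_speaker(labeled: Sequence[Tuple[float, int, str]]) -> List[str]:
    """labeled: (timestamp, speaker index, recognized text) tuples.

    Sorts by time, then joins each run of consecutive entries from the
    same speaker into one display line."""
    lines: List[str] = []
    ordered = sorted(labeled, key=lambda r: r[0])
    for speaker, run in groupby(ordered, key=lambda r: r[1]):
        text = "".join(r[2] for r in run)
        # Letter labels stand in for the color or position coding.
        lines.append(f"Speaker {chr(ord('A') + speaker)}: {text}")
    return lines


labeled = [
    (0.0, 0, "This "), (0.3, 0, "is Z"),
    (1.5, 1, "Um"),
    (2.3, 2, "No, "), (2.6, 2, "it's Y"),
]
lines = format_by_speaker(labeled)
```

Using the stored time stamps for ordering keeps the conversation in sequence even if the records were processed out of order.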

[0027] (Embodiment 2) FIG. 5 is a functional block diagram of a conversation recording device according to a second embodiment of the present invention.

[0028] In FIG. 5, 11 denotes speaker selection means for selecting, from the determination results obtained by the speaker determination means 9, the character codes of the recognition results of voice uttered by the speaker designated by the user. The other means were described in the first embodiment (see FIG. 1), so their description is omitted here.

[0029] The circuit configuration is the same as in the first embodiment (see FIG. 2), and the speaker selection means 11 is realized by the user operating the keyboard 21.

[0030] The operation of the conversation recording device configured as above is described below with reference to the flowchart of FIG. 6.

[0031] FIG. 6 is a flowchart of the conversation recording device according to the second embodiment of the present invention.

[0032] As shown in FIG. 6, the voice of a conversation among several speakers is captured by the microphone 12, converted into a digital voice signal by the analog-digital converter 13, and stored in the RAM 18 (S8). The number of speakers may also be specified when the voice data is captured.

[0033] Next, a predetermined calculation formula is applied to the digital voice signal stored in the RAM 18 to extract voice breaks, voice features, and the like (S9).

[0034] The voice features extracted from the waveform of the input electric signal are compared with a dictionary that stores human utterances together with the character codes that represent them as characters, and the characters that were uttered are recognized (S10).

[0035] During voice recognition, each character code obtained by the voice recognition, the voice feature used for its recognition, and the time of recognition are stored in association with one another in the RAM 18 as recognition result storage data 19 (S11).

[0036] Further, similar voice features among all the voice features stored in the RAM 18 are grouped together to obtain the features of the speakers present in the conversation (S12). If the number of speakers was entered when the voice data was captured in S8, that information can be used to improve the accuracy of the voice feature clustering.

[0037] The voice features stored in the RAM 18 are compared against the speaker features obtained by the speaker feature clustering, and the speaker who uttered the character code paired with each voice feature is thereby determined (S13).

[0038] A message prompting the user to designate the speaker to be displayed is then shown on the display device 22, and the device waits for key input from the user (S14).

[0039] When a key input is received, processing moves to S15. In S15, it is determined which speaker the user has designated. If no speaker has been designated, processing moves to S17; if a speaker has been designated, processing proceeds to S16.

[0040] In S16, only the character codes of the recognition results of voice uttered by the designated speaker are displayed as characters so that they can be visually identified. FIG. 7 shows an example in which speaker B is designated.

[0041] If no speaker was designated in S15, the character codes of the recognition results of voice uttered by the same speaker are displayed on the display device 22 as characters, speaker by speaker, so that they can be visually identified (S17).
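The branch in S15 through S17 amounts to an optional filter on the speaker label. In the minimal sketch below, `select_for_display`, the letter labels, and the use of `None` to mean "no speaker designated" are assumptions made for illustration.

```python
from typing import List, Optional, Sequence, Tuple


def select_for_display(labeled: Sequence[Tuple[str, str]],
                       selected: Optional[str] = None) -> List[str]:
    """labeled: (speaker label, recognized text) pairs in time order.

    With no speaker designated, every speaker's text is returned (S17);
    otherwise only the designated speaker's text is returned (S16)."""
    if selected is None:
        return [f"{speaker}: {text}" for speaker, text in labeled]
    return [f"{speaker}: {text}" for speaker, text in labeled
            if speaker == selected]


labeled = [
    ("A", "This is Z, isn't it?"),
    ("B", "Um..."),
    ("C", "No, it's probably Y"),
]
only_b = select_for_display(labeled, "B")   # user designated speaker B
everyone = select_for_display(labeled)      # no speaker designated
```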

[0042]

[Effects of the invention] As described above, according to the present invention, the voice data acquired by the voice input means is converted into character codes by the voice recognition means, and means are provided for classifying the character codes obtained by the voice recognition means by speaker on the basis of the voice features of the speakers. This yields the advantageous effect that the utterances of each speaker can be displayed separately by color or display position.

[Brief description of the drawings]

FIG. 1 is a functional block diagram of a conversation recording device according to a first embodiment of the present invention.

FIG. 2 is a circuit block diagram of the conversation recording device according to the first embodiment of the present invention.

FIG. 3 is a flowchart of the conversation recording device according to the first embodiment of the present invention.

FIG. 4 is an explanatory diagram of a screen display of the conversation recording device according to the first embodiment of the present invention.

FIG. 5 is a functional block diagram of a conversation recording device according to a second embodiment of the present invention.

FIG. 6 is a flowchart of the conversation recording device according to the second embodiment of the present invention.

FIG. 7 is an explanatory diagram of a screen display of the conversation recording device according to the second embodiment of the present invention.

FIG. 8 is a functional block diagram of a conversation recording device in a conventional example.

FIG. 9 is a diagram showing a conversation in the conventional example.

FIG. 10 is an explanatory diagram of a screen display of the conversation recording device in the conventional example.

[Explanation of symbols]

5 Voice input means
6 Voice recognition means
7 Recognition result storage means
8 Speaker feature clustering means
9 Speaker determination means
10 Display means
11 Speaker selection means
12 Microphone
13 Analog-digital converter
14 Central processing unit (CPU)
15 Read-only memory (ROM)
16 Control program
17 Voice recognition data
18 Random access memory (RAM)
19 Recognition result storage data
20 Speaker determination result storage data
21 Keyboard
22 Display device
23 Bus line

Claims

[Claims]

1. A conversation recording device comprising:
voice input means for converting voice into an electric signal and inputting it;
voice recognition means for recognizing the electric signal output from the voice input means as character codes;
recognition result storage means for storing, in association with one another, each character code output from the voice recognition means, the voice feature used for its recognition, and the time of recognition;
speaker feature clustering means for grouping together similar voice features among all the voice features stored in the recognition result storage means, thereby obtaining the features of the speakers present in the conversation;
speaker determination means for determining the speaker who uttered the character code paired with each voice feature, by comparing the voice features stored in the recognition result storage means against the speaker features obtained by the speaker feature clustering means; and
display means for displaying as characters, on the basis of the determination result obtained by the speaker determination means, the character codes of the recognition results of voice uttered by the same speaker so that they can be visually identified.

2. A conversation recording device comprising:
voice input means for converting voice into an electric signal and inputting it;
voice recognition means for recognizing the electric signal output from the voice input means as character codes;
recognition result storage means for storing, in association with one another, each character code output from the voice recognition means, the voice feature used for its recognition, and the time of recognition;
speaker feature clustering means for grouping together similar voice features among all the voice features stored in the recognition result storage means, thereby obtaining the features of the speakers present in the conversation;
speaker determination means for determining the speaker who uttered the character code paired with each voice feature, by comparing the voice features stored in the recognition result storage means against the speaker features obtained by the speaker feature clustering means;
speaker selection means for selecting and outputting, on the basis of the determination result obtained by the speaker determination means, only the character codes of the recognition results of voice uttered by the same speaker; and
display means for displaying the character codes output from the speaker selection means as characters.