JP2001272990A

JP2001272990A - Interaction recording and editing device

Info

Publication number: JP2001272990A
Application number: JP2000089033A
Authority: JP
Inventors: Naoki Hayashi; 直樹林; Yutaka Manjo; 裕萬上
Original assignee: Fuji Xerox Co Ltd
Current assignee: Fujifilm Business Innovation Corp
Priority date: 2000-03-28
Filing date: 2000-03-28
Publication date: 2001-10-05
Anticipated expiration: 2020-03-28
Also published as: JP3896760B2

Abstract

PROBLEM TO BE SOLVED: To provide an interaction recording and editing device that offers an editing operation system suited to editing a record of interaction. SOLUTION: An interactive voice storage means 1 stores voice information on recorded interactions; edit unit extracting means 2, 4-6 extract, from the voice information stored in the interactive voice storage means 1, the parts that serve as units of edit operations as edit units; an edit unit storing means 7 stores the extracted edit units; edit unit selecting means 7-9, 15 select prescribed edit units from the stored edit units; an edited voice information generating means 16 generates edited voice information consisting of the selected edit units; and an edited voice storage means 17 stores the generated edited voice information. Moreover, the device performs extraction and selection by using, for example, a keyword.

Description

DETAILED DESCRIPTION OF THE INVENTION

[0001]

[Technical Field of the Invention] The present invention relates to a dialogue record editing apparatus that edits a dialogue record in which a dialogue among a plurality of participants is recorded, and to a storage medium that implements such editing.

[0002]

[Description of the Related Art] New knowledge can often be obtained by recording a dialogue among several people (for example, a conversation or a discussion in a meeting) and analyzing the record. For example, when the content of a dialogue with a customer is examined afterwards (especially when the record is shown to a supervisor or a top salesperson and their opinion is sought), problems that went unnoticed at the time, or better solutions, can come to light. Likewise, it is widely known that protocol analysis, a technique of analyzing people's utterances, is effective for investigating matters such as the usability of a system.

[0003] As prior art related to finding and editing a desired portion of an audio recording, there is, for example, the technique disclosed in Japanese Patent Application Laid-Open No. 9-91928 (hereinafter, Document 1). In this technique, the audio recorded in a video recording with sound is first converted into text. Next, the audio recording, the video recording, and the text are stored in association with one another via markers that indicate temporal position (for example, time stamps or physical addresses). These markers are then used to reflect the editing work that the user performs on the text in the editing of the audio and video recordings. With this prior art, therefore, audio and video can be edited by deleting or rearranging words in the text.

[0004] In addition, the electronic conference apparatus described in, for example, Japanese Patent Application Laid-Open No. 8-317365 (hereinafter, Document 2) displays audio data as a time-series graph in the order of conversation, plays back selected audio data, edits audio data, and displays a graph according to the storage size of each piece of audio data. With this prior art, therefore, audio data can be edited while its state is recognized visually.

[0005]

[Problems to be Solved by the Invention] However, applying the prior art described above to the editing of dialogue records causes the problems described below. Work that edits a dialogue record includes, for example, preparing the minutes or a summary of a meeting, or preparing a transcript of an interview. In preparing minutes or an interview transcript, the main task is to remove unnecessary remarks such as hesitations, or remarks that cannot be made public such as off-the-record information, while keeping the dialogue as a whole. In preparing a summary, the task is to pick out the remarks related to a few important topics.

[0006] However, in the prior art of Document 1, for example, the editing method is that the user explicitly specifies an edit start point and an edit end point, and the portions not enclosed between those points are erased. This way of specifying is suited to source material with many parts to "discard", as when a program of a few minutes is produced from many hours of coverage. When a few remarks are to be deleted while the whole is kept, however, specifying the edit start and end points becomes cumbersome. Moreover, this method provides no support at all for picking out related remarks.

[0007] Similarly, in the prior art of Document 2, the editing method is to delete or edit audio data whose range has been specified, so such range specification becomes cumbersome, and picking out related remarks is not supported. The present invention has been made in view of these problems of the prior art, and its object is to provide a dialogue record editing apparatus and a storage medium that offer an editing operation system suited to editing dialogue records.

[0008]

[Means for Solving the Problems] To achieve the above object, in the dialogue record editing apparatus according to the present invention, dialogue voice storage means stores voice information in which a dialogue is recorded; edit unit extracting means extracts, from the voice information stored in the dialogue voice storage means, the portions that serve as units of editing operations as edit units; edit unit storage means stores the extracted edit units; edit unit selecting means selects predetermined edit units from among the stored edit units; edited voice information generating means generates edited voice information consisting of the selected edit units; and edited voice storage means stores the generated edited voice information. The user can therefore edit the voice information by selecting edit units extracted from it, so that, for example, specifying a range to edit as in the prior art becomes unnecessary, which makes it easier for the user to edit a dialogue record.
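The flow from selected edit units to edited voice information can be illustrated with a minimal Python sketch. The dictionary keys (`order`, `audio`, `selected`) and the function name are hypothetical, and raw byte strings stand in for real audio fragments; the specification does not prescribe any particular representation.

```python
def generate_edited_audio(units):
    """Join the audio of the selected edit units in dialogue order."""
    chosen = sorted((u for u in units if u["selected"]), key=lambda u: u["order"])
    return b"".join(u["audio"] for u in chosen)


# Three edit units; the user has selected the first and the third.
units = [
    {"order": 1, "audio": b"AA", "selected": True},
    {"order": 2, "audio": b"BB", "selected": False},
    {"order": 3, "audio": b"CC", "selected": True},
]
edited_audio = generate_edited_audio(units)
```

In this sketch the user never specifies a start or end point: the edit is expressed entirely by which units carry the selection flag.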

[0009] Further, in the dialogue record editing apparatus according to the present invention, the edit unit extracting means extracts silent portions of the voice information as edit units, and the edit unit selecting means has a function of excluding the edit units that are silent portions from the selection targets all at once. Thus, for example, the silent portions can be deleted in a single operation without the user having to specify and delete them one by one.

[0010] Further, in the dialogue record editing apparatus according to the present invention, the edit unit extracting means extracts edit units based on the speaker of the voice information. Ways of extracting edit units based on the speaker include, for example, delimiting edit units where the speaker in the dialogue changes and, when several people speak at the same time, extracting each speaker's voice information as a separate edit unit. Since edit units are thus extracted, for example, speaker by speaker, editing becomes easier.

[0011] Further, in the dialogue record editing apparatus according to the present invention, the edit unit extracting means has a function of converting voice information into text information corresponding to that voice information; the edit unit storage means stores, as an attribute of each edit unit, the text information converted by that function from the edit unit's voice information; and the edit unit selecting means selects edit units based on the text information that is an attribute of the edit units. Edit units can therefore be selected based on the text information corresponding to each unit's voice information, which enables, for example, selection by keyword or selection by the user's visual inspection.

[0012] Further, in the dialogue record editing apparatus according to the present invention, edited text information generating means generates edited text information consisting of the text information that is an attribute of the edit units selected by the edit unit selecting means, and edited text information storage means stores the generated edited text information. Since edited text information is thus stored as well as edited voice information, such text information becomes available for use.

[0013] Further, in the dialogue record editing apparatus according to the present invention, the edit unit extracting means extracts edit units based on text information converted from the voice information stored in the dialogue voice storage means. Edit units can therefore be extracted based on the text information corresponding to the voice information, which enables, for example, extraction by keyword.

[0014] Further, in the dialogue record editing apparatus according to the present invention, first keyword storage means stores a predetermined first keyword; when the text information converted from the voice information stored in the dialogue voice storage means contains a portion matching the first keyword stored in the first keyword storage means, the edit unit extracting means extracts the voice information portion corresponding to that portion as, for example, one edit unit; and the edit unit selecting means has a function of selecting, or excluding from the selection targets, all the edit units extracted as portions matching the first keyword at once. Thus, for example, the voice information portions that match preset keywords or the like can be selected in a single operation, or deselected in a single operation.

[0015] Further, in the dialogue record editing apparatus according to the present invention, the first keyword is an interjectory word. As shown in the embodiment described later, an interjectory word here means, for example, a word that expresses something used to keep the dialogue going and has no direct relation to the content of the dialogue (a filler word, described later). Such words can therefore be deleted, for example, all at once, which makes it possible to keep only the voice information portions related to the content of the dialogue and to reduce the amount of voice information.
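A minimal Python sketch of first-keyword (filler word) handling follows. It assumes, hypothetically, that the recognizer yields word tokens with start and end times; the token format, the English stand-in filler list, and the function names are illustrative only and are not taken from the specification.

```python
# Hypothetical recognizer output: (word, start_sec, end_sec) tuples.
FILLERS = {"um", "uh", "well"}  # illustrative English stand-ins for stored filler words


def split_on_fillers(tokens):
    """Cut the token stream so that every filler word becomes its own edit unit."""
    units, current = [], []

    def flush():
        # Close the run of non-filler words collected so far, if any.
        if current:
            units.append({
                "text": " ".join(word for word, _, _ in current),
                "start": current[0][1],
                "end": current[-1][2],
                "is_filler": False,
                "selected": True,
            })
            current.clear()

    for word, start, end in tokens:
        if word.lower() in FILLERS:
            flush()
            units.append({"text": word, "start": start, "end": end,
                          "is_filler": True, "selected": True})
        else:
            current.append((word, start, end))
    flush()
    return units


def deselect_fillers(units):
    """Exclude every filler-word unit from the selection in one operation."""
    for unit in units:
        if unit["is_filler"]:
            unit["selected"] = False


tokens = [("um", 0.0, 0.3), ("the", 0.4, 0.5), ("price", 0.5, 0.9),
          ("uh", 1.0, 1.2), ("changed", 1.3, 1.8)]
units = split_on_fillers(tokens)
deselect_fillers(units)
kept = [u["text"] for u in units if u["selected"]]
```

Because each filler word is isolated as its own unit at extraction time, the later bulk exclusion needs no range specification by the user.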

[0016] Further, in the dialogue record editing apparatus according to the present invention, second keyword storage means stores a predetermined second keyword, and the edit unit selecting means has a function of selecting, or excluding from the selection targets, all the edit units that contain a word matching the second keyword at once. Thus, for example, the edit units containing words that match preset keywords or the like can be selected in a single operation, or deselected in a single operation.

[0017] Further, in the dialogue record editing apparatus according to the present invention, when there are a plurality of edit units containing a word that matches the second keyword, the edit unit selecting means has a function of also selecting, or excluding from the selection targets, the other edit units lying between the dialogue times of those edit units, all at once. Accordingly, when several edit units contain a word matching the second keyword, the other edit units containing dialogue that took place between the times of the dialogue in those edit units can be selected (or excluded) together with them, which is effective, for example, for selecting the whole part of the dialogue that concerns the second keyword.
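The span selection described here can be sketched in Python as follows. The sketch assumes the units are already in dialogue order and interprets "between the dialogue times" as everything from the first matching unit to the last; that interpretation, the substring match, and the names used are assumptions made for illustration.

```python
def select_keyword_span(units, keyword, value=True):
    """Set the selection flag on every unit from the first to the last keyword match.

    `units` must be in dialogue order. With a single match only that unit is
    affected; with no match nothing changes.
    """
    hits = [i for i, unit in enumerate(units) if keyword in unit["text"]]
    if not hits:
        return
    for i in range(hits[0], hits[-1] + 1):
        units[i]["selected"] = value


units = [
    {"text": "good morning", "selected": False},
    {"text": "the budget is tight", "selected": False},
    {"text": "I see", "selected": False},
    {"text": "so the budget must grow", "selected": False},
    {"text": "see you tomorrow", "selected": False},
]
select_keyword_span(units, "budget")
flags = [u["selected"] for u in units]
```

Passing `value=False` gives the corresponding bulk exclusion, for example for a span of remarks that must not be disclosed.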

[0018] Further, in the dialogue record editing apparatus according to the present invention, the second keyword is a word indicating that disclosure of the dialogue is prohibited. The edit units containing dialogue whose disclosure is prohibited can therefore be deleted, for example, all at once.

[0019] Further, in the dialogue record editing apparatus according to the present invention, important keyword extracting means extracts predetermined important keywords from the edit units stored in the edit unit storage means, and the extracted important keywords are used as the second keyword. The edit units containing words that match important keywords can therefore be selected in a single operation, or deselected in a single operation.

[0020] The various kinds of processing according to the present invention described above can also be realized, for example, by a computer reading and executing a program stored on a storage medium. As one example, a storage medium according to the present invention stores a program to be executed by a computer so that it is readable by input means of the computer, and the program causes the computer to execute: a process of extracting, as edit units, the portions that serve as units of editing operations from voice information that records a dialogue and is stored in a dialogue voice memory; a process of storing the extracted edit units in an edit unit memory; a process of selecting predetermined edit units from among the stored edit units; a process of generating edited voice information consisting of the selected edit units; and a process of storing the generated edited voice information in an edited voice memory.

[0021]

[Embodiments of the Invention] An embodiment of the present invention will be described with reference to the drawings. FIG. 1 shows an example configuration of a dialogue record editing apparatus according to the present invention. The correspondence between the functional units 1 to 19 of the apparatus of this embodiment shown in the figure and the means referred to in the present invention is as follows. In this example, the dialogue voice data storage unit 1 constitutes the dialogue voice storage means; the speech recognition unit 2, the division point determination unit 4, the audio data division unit 5, and the text data division unit 6 constitute the edit unit extracting means; the edit unit storage unit 7 constitutes the edit unit storage means; the edit unit storage unit 7 (its function of storing the selection state of edit units), the editing operation input unit 8, the screen display unit 9, and the selection flag setting unit 15 constitute the edit unit selecting means; the editing operation input unit 8 and the audio data combining unit 16 constitute the edited voice information generating means; and the edited audio data storage unit 17 constitutes the edited voice storage means.

[0022] Also in this example, the editing operation input unit 8 and the text data combining unit 18 constitute the edited text information generating means; the edited text storage unit 19 constitutes the edited text information storage means; the filler word storage unit 3 constitutes the first keyword storage means; the non-disclosure keyword storage unit 11, the user-specified keyword storage unit 12, and the important keyword storage unit 14 constitute the second keyword storage means; and the important keyword extraction unit 13 constitutes the important keyword extracting means.

[0023] The dialogue record editing apparatus of this example is described below by showing example configurations and operations of the functional units 1 to 19 that make up the apparatus shown in FIG. 1. The dialogue voice data storage unit 1 has a function of storing, in memory, audio data in which dialogue speech is recorded. In this embodiment, the dialogue audio data is recorded on a magnetic disk as multitrack digital audio data, and the remarks of one dialogue participant are recorded on each track. Playing back all the tracks therefore lets the listener hear everyone's remarks, and playing back one particular track lets the listener hear the remarks of one participant. This data format is used to make speech recognition of the participants' remarks and detection of speaker changes easier.

[0024] In this embodiment digital multitrack data is used as the audio data format and one person's voice is assigned to each track, but different media, a different number of tracks, or different audio mixing may be used as long as the functional units 2 to 19 described later can perform their functions.

[0025] The speech recognition unit 2 has a function of performing speech recognition on the audio data stored in the dialogue voice data storage unit 1 and generating text data from that audio data. The generated text data represents the speech in the audio data as text (that is, as characters and symbols). In this embodiment, a speech recognition process runs for each track of the audio data described above, and these processes perform recognition in parallel. This prevents the drop in recognition accuracy that occurs, for example, when several people speak at the same time. Recognition accuracy can also be raised by tuning to the vocal characteristics of each dialogue participant.
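Per-track parallel recognition can be sketched in Python as below. The `recognize` function is only a stand-in (it decodes the bytes as UTF-8 so the example runs without a speech engine), and a thread pool is used here for brevity where the embodiment speaks of separate recognition processes; both are assumptions made for illustration.

```python
from concurrent.futures import ThreadPoolExecutor


def recognize(track_audio: bytes) -> str:
    """Stand-in for a real speech recognizer: decodes the bytes as UTF-8 text."""
    return track_audio.decode("utf-8")


def recognize_tracks(tracks):
    """Run one recognition task per track and return text keyed by speaker.

    `tracks` maps a speaker name to that speaker's track data and must not be empty.
    """
    with ThreadPoolExecutor(max_workers=len(tracks)) as pool:
        futures = {speaker: pool.submit(recognize, audio)
                   for speaker, audio in tracks.items()}
        return {speaker: future.result() for speaker, future in futures.items()}


tracks = {"alice": b"shall we begin", "bob": b"yes"}
texts = recognize_tracks(tracks)
```

Keeping one task per track also leaves room for a recognizer tuned to each participant, as the paragraph above suggests.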

[0026] Although parallel processing by multiple processes is performed in this embodiment, if a speech recognition process is available that can recognize speech stably even when many people speak at once, text data for several people can be generated with stable accuracy by a single process from audio data mixed down to one track.

[0027] The filler word storage unit 3 has a function of storing, in memory, words that express something used to keep a dialogue going and have no direct relation to the content of the dialogue (hereinafter, filler words). In this embodiment, the stored filler words include back-channel words that encourage the other party to go on speaking, such as "un" (uh-huh), "hai" (yes), "ee" (yes), "naruhodo" (I see), and "sou desu ne" (that's right); interrupting words used to get a chance to speak, such as "ano" (um, excuse me) and "chotto" (just a moment); and hesitation words used to gain time, such as "aa" (ah), "eeto" (er), and "uun" (hmm).

[0028] The division point determination unit 4 has a function of determining the points at which the audio data stored in the dialogue voice data storage unit 1 is to be divided, in order to extract the edit units for dialogue editing from that audio data. The processing in the division point determination unit 4 is started by a user instruction given through the editing operation input unit 8 described later. Specifically, the division point determination unit 4 holds conditions for division and determines the division points by comparing those conditions with the audio data. In this embodiment, the conditions fall broadly into two kinds: one concerns vocalization in the dialogue, and the other concerns the content of what is said in the dialogue.

[0029] The conditions concerning vocalization include, for example: "when the speaker changes, divide the audio at the point of change"; "when nobody has spoken for a specified time (one second in this example) or longer, divide the audio at the point where the immediately preceding remark ended"; "when a new remark begins after silence, divide the audio at the start of that remark"; and "when several speakers speak at the same time, divide the audio speaker by speaker". The conditions concerning content include, for example: "when there is a filler word, divide the audio at the beginning and end of that word".

[0030] The division point determination unit 4 measures the change in volume of each track in the audio data stored in the dialogue voice data storage unit 1 and determines division points by comparing it with the vocalization conditions described above. The division point determination unit 4 also determines division points by comparing the recognition result from the speech recognition unit 2 with the content conditions described above. Once it has determined the division points, the division point determination unit 4 sends instructions to the audio data division unit 5 and the text data division unit 6, described later, so that the audio data lying between division points and the text data corresponding to that audio are stored as one edit unit in the edit unit storage unit 7, described later.
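The volume-based part of this determination can be sketched in Python under several assumptions: each track is given as a list of per-frame volume values, a frame is taken to be 100 ms (so ten frames correspond to the one-second threshold), and a fixed threshold separates voiced from silent frames. None of these figures except the one-second interval comes from the specification, and silent edit units are omitted for brevity.

```python
def voiced_runs(volume, threshold):
    """Return (start, end) frame pairs, end exclusive, where volume exceeds threshold."""
    runs, start = [], None
    for i, level in enumerate(volume):
        if level > threshold and start is None:
            start = i
        elif level <= threshold and start is not None:
            runs.append((start, i))
            start = None
    if start is not None:
        runs.append((start, len(volume)))
    return runs


def segment(tracks, threshold=0.1, min_silence_frames=10):
    """Split multitrack volume data into (start, end, speaker) units.

    Two voiced runs of one speaker are merged only when the pause between them
    is shorter than min_silence_frames and nobody else spoke during it;
    otherwise the pause (long silence or a speaker change) is a division point.
    """
    n_frames = len(next(iter(tracks.values())))
    anyone = [any(track[i] > threshold for track in tracks.values())
              for i in range(n_frames)]
    units = []
    for speaker, volume in tracks.items():
        merged = []
        for start, end in voiced_runs(volume, threshold):
            if merged:
                prev_start, prev_end = merged[-1]
                short_pause = start - prev_end < min_silence_frames
                silent_pause = not any(anyone[prev_end:start])
                if short_pause and silent_pause:
                    merged[-1] = (prev_start, end)
                    continue
            merged.append((start, end))
        units += [(s, e, speaker) for s, e in merged]
    return sorted(units)  # ordered by start frame, as in the "order" attribute


tracks = {
    "A": [1] * 5 + [0] * 3 + [1] * 4 + [0] * 3 + [1] * 3 + [0] * 12 + [1] * 2,
    "B": [0] * 12 + [1] * 2 + [0] * 18,
}
units = segment(tracks)
```

In the example, speaker A's first short pause is bridged, the pause in which B speaks produces a division, and the twelve-frame silence produces another. Because each track is segmented separately, overlapping remarks naturally end up in separate units per speaker.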

[0031] In this embodiment a change of speaker is detected from the change in volume of the multitrack data, but it may instead be detected using features that identify the speaker in the audio data (for example, the change of the frequency spectrum over time). In that case it becomes possible to use, for example, monaural or stereo audio data (that is, data in which separate audio data is not assigned to each speaker).

[0032] The audio data division unit 5 has a function of copying, in accordance with the division points determined by the division point determination unit 4, the portion lying between division points from the audio data stored in the dialogue voice data storage unit 1, and storing it in the edit unit storage unit 7 described later. The text data division unit 6 has a function of copying, in accordance with the division points determined by the division point determination unit 4, the portion lying between division points from the text data that is the recognition result of the speech recognition unit 2, and storing it in the edit unit storage unit 7 described later.

[0033] The edit unit storage unit 7 has a function of storing edit units in memory. FIG. 2 shows an example of the data structure of the data stored in the edit unit storage unit 7 in this embodiment. As shown in the figure, in this embodiment the edit units are stored in a tabular data structure, and each "row" of the table is one edit unit.

[0034] As also shown in the figure, each edit unit consists of data for four attributes. The first attribute, "order", indicates the temporal order in the dialogue and takes a positive number as its value. The value of "order" is determined by the start time of the utterance. Accordingly, where several speakers spoke at the same time, the remark that came in later is placed later in the order. The second attribute, "audio", is assigned the audio data copied by the audio data division unit 5.

[0035] The third attribute, "text", is assigned the text data copied by the text data division unit 6 (that is, the audio data assigned to "audio" converted into text data). The fourth attribute, "selection flag", is a flag indicating whether or not the edit unit is selected, and takes as its value, for example, the reserved word TRUE or FALSE (FALSE being the default).

[0036] For portions of the audio data stored in the dialogue voice data storage unit 1 in which all the dialogue participants are silent, the reserved word NULL is entered as the value of the "audio" attribute described above, and the text data "(silent)" is entered as the value of the "text" attribute described above.
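The four-attribute table can be sketched with a Python dataclass. `None` plays the role of NULL, the input tuple layout `(start_sec, audio_or_None, text)` is a hypothetical convenience, and the helper names are illustrative; FIG. 2 itself is not reproduced here.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class EditUnit:
    order: int               # temporal order in the dialogue (positive number)
    audio: Optional[bytes]   # copied audio fragment; None stands in for NULL
    text: str                # recognized text, or "(silent)" for silent units
    selected: bool = False   # selection flag; FALSE is the default


def build_table(segments) -> List[EditUnit]:
    """Build the table from (start_sec, audio_or_None, text) tuples.

    Order numbers follow the utterance start times, so a remark that began
    later is placed later even if it overlaps an earlier one.
    """
    ordered = sorted(segments, key=lambda seg: seg[0])
    return [EditUnit(order=i, audio=audio,
                     text="(silent)" if audio is None else text)
            for i, (_, audio, text) in enumerate(ordered, start=1)]


def set_all(units, value):
    """Select or deselect every edit unit at once."""
    for unit in units:
        unit.selected = value


def deselect_silent(units):
    """Exclude all silent units (audio is None) from the selection at once."""
    for unit in units:
        if unit.audio is None:
            unit.selected = False


table = build_table([
    (3.0, b"\x01", "I agree"),
    (0.0, b"\x02", "Shall we begin"),
    (1.5, None, ""),
    (2.8, b"\x03", "Yes"),
])
set_all(table, True)
deselect_silent(table)
```

Each row of the resulting list corresponds to one row of the table; the selection flag is the only attribute the editing operations need to change.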

[0037] The editing operation input unit 8 consists of, for example, a keyboard and a mouse, and has a function of letting the user perform operation inputs, such as selecting menu items and entering values, for editing a dialogue. In the present embodiment, the user can perform the following editing operations 1) to 8) via the editing operation input unit 8.

[0038] 1) An operation instructing playback of audio data stored in a functional unit such as the dialogue audio data storage unit 1, the editing unit storage unit 7, or the edited audio data storage unit 17.
2) An operation instructing on-screen display of text data stored in a functional unit such as the editing unit storage unit 7 or the edited text storage unit 19.

[0039] 3) An operation instructing that editing units be newly generated from the audio data stored in the dialogue audio data storage unit 1 (that is, starting the division location determination processing of the division location determination unit 4).
4) An operation directly instructing selection or non-selection of a single editing unit. In the present embodiment, “selection” means selecting the editing unit as an editing target or the like, and “non-selection” means not selecting the editing unit as an editing target or the like (that is, excluding it from the selection targets).
5) An operation instructing selection or non-selection of all editing units at once.

[0040] 6) An operation instructing, by means of a keyword, selection or non-selection of a plurality of editing units at once. Specifically, a plurality of editing units can be collectively put into the selected or unselected state by using a private keyword, a user-specified keyword, or an important keyword, each described later.

[0041] The editing units to be selected or unselected can be chosen to be either the editing units that contain the keyword, or the editing units that contain the keyword together with all editing units that lie temporally between them. With the former choice, for example, utterances containing the keyword at several separate locations can be picked up at once. With the latter choice, a temporally continuous portion of the dialogue in which the keyword appears several times can be picked up at once. The user can select any keyword from the keyword groups stored in, for example, the keyword storage units 11, 12, and 14 described later. It is also possible to give an instruction using a logical expression that includes keywords, so that the editing units satisfying the expression are selected or unselected.

[0042] 7) An operation instructing that edited audio data be generated from the editing units selected as described above.
8) An operation instructing that edited text data be generated from the editing units selected as described above.
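
Operation 6) allows an instruction to be given as a logical expression that includes keywords. The specification does not define the syntax of such expressions, so the sketch below simply assumes one: an expression is either a keyword string or a nested tuple headed by `"and"`, `"or"`, or `"not"`. The function name `matches` and this tuple representation are hypothetical, and keyword matching is taken to be plain substring search.

```python
def matches(expr, text: str) -> bool:
    """Evaluate a keyword expression against the text of one editing unit.

    expr is either a keyword string or a tuple of the form
    ("and", e1, e2, ...), ("or", e1, e2, ...) or ("not", e).
    This representation is an assumption of the sketch.
    """
    if isinstance(expr, str):
        return expr in text          # a bare keyword: substring test
    op, *args = expr
    if op == "and":
        return all(matches(a, text) for a in args)
    if op == "or":
        return any(matches(a, text) for a in args)
    if op == "not":
        (arg,) = args                # "not" takes exactly one operand
        return not matches(arg, text)
    raise ValueError(f"unknown operator: {op!r}")
```

For example, `matches(("and", "price", ("not", "off the record")), text)` would hold for units that mention a price without also containing the phrase "off the record".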

[0043] The screen display unit 9 has a function of displaying on a screen the operation instructions received from the user at the editing operation input unit 8, the text data stored in the editing unit storage unit 7, the various keyword groups described later, the text data stored in the edited text storage unit 19 described later, and the like. The audio output unit 10 has a function of outputting, as sound from a speaker or the like, the audio data stored in the dialogue audio data storage unit 1, the audio data stored in the editing unit storage unit 7, the audio data stored in the edited audio data storage unit 17 described later, and the like.

[0044] The private keyword storage unit 11 has a function of storing in a memory words that express a prohibition on disclosing the dialogue. In the present embodiment, words such as “off the record”, “cut”, “delete”, and “not for disclosure” are stored. The user-specified keyword storage unit 12 has a function of storing in a memory the keywords entered by the user with the editing operation input unit 8.

[0045] The important keyword extraction unit 13 has a function of extracting, from the text data stored in the editing unit storage unit 7, keywords regarded as important in the dialogue (important keywords). In the present embodiment, proper nouns and frequently occurring common nouns are extracted as important keywords. For this purpose, the important keyword extraction unit 13 of this example holds a term dictionary and extracts important keywords by comparing the contents of the dictionary with the text data stored in the editing unit storage unit 7. The user can register terms in this term dictionary via the editing operation input unit 8, and registered terms are, for example, extracted preferentially as important keywords. The important keyword storage unit 14 has a function of storing in a memory the important keywords extracted by the important keyword extraction unit 13.
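
The dictionary-based extraction described above could look roughly like the following. This is a sketch under stated assumptions rather than the method of the embodiment: the dictionary format (term mapped to `"proper"` or `"common"`), the frequency threshold `min_count`, substring counting as the matching method, and listing user-registered terms first as the form of "preferential" extraction are all choices made here for illustration.

```python
from collections import Counter


def extract_important_keywords(texts, term_dict, user_terms=(), min_count=3):
    """Return important keywords found in the editing-unit texts.

    term_dict maps a term to "proper" or "common" (hypothetical format).
    Proper nouns are kept whenever they occur; common nouns only when they
    occur at least min_count times. User-registered terms that occur are
    listed first. Matching is plain substring counting.
    """
    counts = Counter()
    for term in list(term_dict) + list(user_terms):
        counts[term] = sum(text.count(term) for text in texts)

    # User-registered terms are treated preferentially: listed first and
    # kept regardless of how often they occur.
    preferred = [t for t in user_terms if counts[t] > 0]
    others = [
        t for t, kind in term_dict.items()
        if t not in preferred
        and counts[t] > 0
        and (kind == "proper" or counts[t] >= min_count)
    ]
    return preferred + others
```

The threshold that makes a common noun "frequent" is not given in the specification; `min_count=3` is an arbitrary default.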

[0046] The selection flag setting unit 15 has a function of setting the value of the “selection flag” of editing units stored in the editing unit storage unit 7 in accordance with the user's instructions given via the editing operation input unit 8. In the present embodiment, it searches for editing units whose “text” contains the keyword specified by the user (or, when a logical expression is specified, whose “text” satisfies that expression), and sets the “selection flag” of each editing unit found to TRUE when the units are to be selected and to FALSE when they are not to be selected (non-selection).

[0047] If the user's instruction is to “also set the temporally intervening editing units collectively”, the “selection flag” of every editing unit whose “order” value lies between those of the editing units found as described above (that is, every other editing unit between the dialogue times of the editing units found) is also set to the same value as the flag set for the editing units found.
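
The two behaviours of the selection flag setting unit 15, with and without the temporally intervening units, can be illustrated as below. This is a minimal sketch: `EditUnit`, `set_selection`, and the parameter names are hypothetical, keyword matching is plain substring search, and "intervening" is interpreted here as every unit whose order lies between the first and the last matching unit.

```python
from dataclasses import dataclass


@dataclass
class EditUnit:
    order: int              # temporal order within the dialogue
    text: str               # text of the utterance
    selected: bool = False  # "selection flag"


def set_selection(units, keyword, value=True, include_between=False):
    """Set the selection flag on units whose text contains keyword.

    With include_between=True, every unit whose order lies between the
    smallest and largest order of the matching units gets the same value.
    Returns the orders of the units whose flag was set.
    """
    hits = [u.order for u in units if keyword in u.text]
    if not hits:
        return []
    if include_between:
        lo, hi = min(hits), max(hits)
        targets = [u for u in units if lo <= u.order <= hi]
    else:
        targets = [u for u in units if u.order in hits]
    for u in targets:
        u.selected = value
    return [u.order for u in targets]
```

Calling the function with `value=False` gives the collective non-selection case, for instance excluding everything between two occurrences of a private keyword.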

[0048] The audio data combining unit 16 has a function of combining a plurality of pieces of audio data stored in the editing unit storage unit 7 to generate a single piece of audio data (edited audio data). The audio data to be combined is the audio data of the editing units whose “selection flag” value is TRUE. The processing of the audio data combining unit 16 is started, for example, by an instruction from the user via the editing operation input unit 8. The edited audio data storage unit 17 has a function of storing in a memory the audio data generated by the audio data combining unit 16.

[0049] The text data combining unit 18 has a function of combining a plurality of pieces of text data stored in the editing unit storage unit 7 to generate a single piece of text data (edited text data). The text data to be combined is the text data of the editing units whose “selection flag” value is TRUE. The processing of the text data combining unit 18 is started, for example, by an instruction from the user via the editing operation input unit 8. The edited text storage unit 19 has a function of storing in a memory the text data generated by the text data combining unit 18.
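
The combining steps performed by units 16 and 18 amount to filtering on the selection flag and joining what remains. The sketch below assumes details the specification leaves open: the names `EditUnit`, `combine_audio`, and `combine_text` are hypothetical, selected units are joined in ascending “order”, audio is treated as raw bytes that can simply be concatenated (for example, headerless samples in one common format), silent units with no audio are skipped, and text is joined with a newline.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class EditUnit:
    order: int
    audio: Optional[bytes]  # None stands in for NULL (silent portion)
    text: str
    selected: bool = False


def _selected_in_order(units):
    """Return the selected units sorted by their temporal order."""
    return sorted((u for u in units if u.selected), key=lambda u: u.order)


def combine_audio(units) -> bytes:
    """Concatenate the audio of all selected units into one byte string."""
    return b"".join(u.audio for u in _selected_in_order(units)
                    if u.audio is not None)


def combine_text(units, sep: str = "\n") -> str:
    """Join the text of all selected units into one string."""
    return sep.join(u.text for u in _selected_in_order(units))
```

Real audio containers usually carry headers, so byte concatenation would need to be replaced by format-aware joining in practice.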

[0050] As described above, the dialogue record editing apparatus of this example provides an editing operation system suited to editing records of dialogues, which makes it easier for the user to edit a dialogue record. Specifically, because the division location determination unit 4 extracts a plurality of editing units from the audio data stored in the dialogue audio data storage unit 1, even an edit that keeps the dialogue as a whole while deleting a few utterances can be carried out easily. In addition, because the apparatus of this example supports keyword searches and the like, it also gives substantial support for tasks such as picking up related utterances.

[0051] Although a preferred embodiment of the present invention has been described in this example, the configuration of the dialogue record editing apparatus according to the present invention is not necessarily limited to the one shown here, and various configurations may be used. For example, in the present embodiment, as a preferred mode, the various processes performed by the dialogue record editing apparatus are controlled by a processor executing a control program stored in a ROM, within hardware resources that include the processor, a memory, and the like. However, each functional means for executing those processes may instead be configured as an independent hardware circuit.

[0052] The present invention can also be understood as a computer-readable storage medium, such as a floppy (registered trademark) disk or a CD-ROM, that stores the control program described above. The processing according to the present invention can be carried out by loading the control program from the storage medium into a computer and having the processor execute it.

[0053]

[Effects of the Invention] As described above, with the dialogue record editing apparatus and storage medium according to the present invention, editing units suited to editing a dialogue are extracted from the dialogue audio, so that the user can edit in terms of these editing units, which makes the user's editing operations easier. In particular, in one aspect of the present invention, words uttered merely to keep the dialogue going, such as back-channel responses, interruptions, and filler words, can be removed all at once.

[0054] In one aspect of the present invention, the portions of a dialogue in which a certain keyword appears can be extracted all at once or removed all at once. Furthermore, in one aspect of the present invention, different kinds of keywords can easily be used for this purpose as appropriate: keywords explicitly specified by the user, keywords related to prohibition of disclosure such as “off the record”, and important keywords that appear in the dialogue.

[0055] In one aspect of the present invention, silent portions of a dialogue can be removed all at once. In one aspect of the present invention, the editing result can be kept as audio or as text. As described above, the dialogue record editing apparatus and storage medium according to the present invention make it easier for the user to edit a dialogue than when the conventional art is used.

[Brief description of the drawings]

FIG. 1 is a diagram showing a configuration example of a dialogue record editing apparatus according to an embodiment of the present invention.

FIG. 2 is a diagram showing an example of the data structure of the editing unit storage unit.

[Explanation of symbols]

1: dialogue audio data storage unit
2: speech recognition unit
3: filler word storage unit
4: division location determination unit
5: audio data dividing unit
6: text data dividing unit
7: editing unit storage unit
8: editing operation input unit
9: screen display unit
10: audio output unit
11: private keyword storage unit
12: user-specified keyword storage unit
13: important keyword extraction unit
14: important keyword storage unit
15: selection flag setting unit
16: audio data combining unit
17: edited audio data storage unit
18: text data combining unit
19: edited text storage unit

Continued from the front page
(51) Int. Cl.7 / FI (identification symbol; theme code for reference): G10L 19/00 / G10L 9/18 J; G11B 27/031 / G11B 27/02 H
F-terms (reference): 5B075 ND14 NS01 PP02 PP03 PP23 PQ04; 5D015 AA04 BB01 HH13 KK03 KK04; 5D045 AA07 AA11 DB01; 9A001 BB03 HH15

Claims

[Claims]

1. A dialogue record editing apparatus comprising: dialogue voice storage means for storing voice information in which a dialogue is recorded; editing unit extracting means for extracting, as an editing unit, a portion serving as a unit on which an editing operation is performed, from the voice information stored in the dialogue voice storage means; editing unit storing means for storing the extracted editing units; editing unit selecting means for selecting predetermined editing units from the stored editing units; edited voice information generating means for generating edited voice information composed of the selected editing units; and edited voice storage means for storing the generated edited voice information.

2. The dialogue record editing apparatus according to claim 1, wherein the editing unit extracting means extracts silent portions of the voice information as editing units, and the editing unit selecting means has a function of collectively excluding the editing units that are silent portions from the selection targets.

3. The dialogue record editing apparatus according to claim 1, wherein the editing unit extracting means extracts editing units based on the speaker of the voice information.

4. The dialogue record editing apparatus according to claim 1, wherein the editing unit extracting means has a function of converting the voice information into corresponding text information, the editing unit storing means stores, as an attribute of each editing unit, the text information converted from the voice information of that editing unit, and the editing unit selecting means selects editing units based on the text information that is an attribute of the editing units.

5. The dialogue record editing apparatus according to claim 4, further comprising: edited text information generating means for generating edited text information composed of the text information that is an attribute of the editing units selected by the editing unit selecting means; and edited text information storage means for storing the generated edited text information.

6. The dialogue record editing apparatus according to claim 4, wherein the editing unit extracting means extracts editing units based on the text information converted from the voice information stored in the dialogue voice storage means.

7. The dialogue record editing apparatus according to claim 6, further comprising first keyword storage means for storing a predetermined first keyword, wherein, when the text information converted from the voice information stored in the dialogue voice storage means contains a portion matching the first keyword stored in the first keyword storage means, the editing unit extracting means extracts the voice information portion corresponding to that portion as an editing unit, and the editing unit selecting means has a function of collectively selecting, or collectively excluding from the selection targets, the editing units extracted as portions matching the first keyword.

8. The dialogue recording and editing apparatus according to claim 7, wherein the first keyword is a pitched-down word.

9. The dialogue record editing apparatus according to claim 4, further comprising second keyword storage means for storing a predetermined second keyword, wherein the editing unit selecting means has a function of collectively selecting, or collectively excluding from the selection targets, the editing units containing a word that matches the second keyword.

10. The dialogue record editing apparatus according to claim 9, wherein, when there are a plurality of editing units containing a word that matches the second keyword, the editing unit selecting means has a function of simultaneously selecting, or simultaneously excluding from the selection targets, the other editing units lying between the dialogue times of those editing units.

11. The dialogue record editing apparatus according to claim 9, wherein the second keyword is a word that prohibits disclosure of the dialogue.

12. The dialogue record editing apparatus according to claim 9, further comprising important keyword extracting means for extracting predetermined important keywords from the editing units stored in the editing unit storing means, wherein the extracted important keywords are used as the second keyword.

13. A storage medium storing a program to be executed by a computer, in a form readable by input means of the computer, the program causing the computer to execute: a process of extracting, as an editing unit, a portion serving as a unit on which an editing operation is performed, from voice information that records a dialogue and is stored in a dialogue voice memory; a process of storing the extracted editing units in an editing unit memory; a process of selecting predetermined editing units from the stored editing units; a process of generating edited voice information composed of the selected editing units; and a process of storing the generated edited voice information in an edited voice memory.