JP2007221574A - Voice processing apparatus, voice processing method, and program

Info

Publication number: JP2007221574A
Application number: JP2006041150A
Authority: JP
Inventors: Tamihei Hiramatsu; 民平平松
Original assignee: Sony Corp
Current assignee: Sony Corp
Priority date: 2006-02-17
Filing date: 2006-02-17
Publication date: 2007-08-30

Abstract

PROBLEM TO BE SOLVED: To produce voice data in which only the speaker's voice is recorded clearly when a plurality of microphones are deployed. SOLUTION: The apparatus includes a first memory 21 that stores original voice data collected by at least two microphones, with an identifier added to the data; an identifier extracting unit 42c that extracts the identifier added to original voice data whose level, as stored in the first memory 21, exceeds a predetermined threshold; and a voice data addition control unit 42d that, among the original voice data read from the first memory 21, attenuates the data corresponding to identifiers other than the extracted one and then adds the data together. The voice data resulting from this addition process is output. COPYRIGHT: (C)2007,JPO&INPIT

Description

The present invention relates to a voice processing apparatus, a voice processing method, and a program that are suitable for, for example, recording the voices of speakers in a conference.

Conventionally, various techniques have been used to record, with a plurality of microphones, the utterances of speakers who are spatially distributed in a conference or the like. In one technique, only the signal of the microphone placed near the speaker is selected (switched) for recording by the speaker or an operator. In another, the amplitude of the audio signal is detected and the speaker is selected automatically. In a third, the audio signals collected by all microphones are added (mixed) and recorded. In recent years, as storage media such as hard disk drives and flash memories have grown in capacity and fallen in price, it has become possible to secure the storage capacity needed even for long recordings.

Patent Document 1 describes a data transmission system that collects speakers' voices using a plurality of microphones.
Patent Document 1: Japanese Patent Laying-Open No. 2005-117134 (FIG. 14)

The conventional technique of selecting the speaker's microphone for recording requires the microphone selection to be switched on and off, so the operation is cumbersome, and an operating mistake means the speech is not recorded. For example, if the user forgets to switch the selection on, the voice is not recorded. If the user forgets to switch it off, unwanted recording continues.

In the technique that records automatically by detecting the amplitude of the audio signal, the microphone is turned on only after the signal has become reasonably loud, so the beginning of the utterance is clipped. The microphone may also be turned off when the voice drops near the end of an utterance, so the end of the recorded utterance is cut off abruptly. Unwanted noise is also recorded whenever it exceeds the threshold. If the microphone is made easier to turn on so that the beginning is not clipped, it also turns on for noise. Conversely, if it is set so that noise does not turn it on, the start of an utterance is treated as noise and the microphone stays off, so the beginning is lost. Automatic recording therefore could not be relied on to capture speech.

In the technique that records the sum of the audio signals collected by all microphones, the unwanted background noise picked up by the non-speakers' microphones is added in, so the S/N (signal-to-noise) ratio deteriorates badly. For example, when 20 microphones are used, the noise of 19 microphones is added to the voice of a single speaker, and the quality of the recording cannot be called good.
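As a rough illustration of the size of this penalty, the following worked estimate uses a simple model that is not part of the source: each of the N microphones is assumed to contribute uncorrelated background noise of equal power σ², and only one microphone is assumed to carry the speaker's signal, with power P_s. Under those assumptions the noise powers add while the signal power does not:

```latex
\mathrm{SNR}_{\text{single}} = \frac{P_s}{\sigma^2},
\qquad
\mathrm{SNR}_{\text{sum}} = \frac{P_s}{N\,\sigma^2},
\qquad
10\log_{10}\frac{\mathrm{SNR}_{\text{single}}}{\mathrm{SNR}_{\text{sum}}}
  = 10\log_{10} N \approx 13\ \text{dB} \quad (N = 20).
```

Real rooms deviate from this model (noise is partly correlated between microphones, and neighboring microphones also pick up some of the speaker), so the figure is only an order-of-magnitude guide.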

Thus, there has been no technique for producing a good recording from audio signals collected by a large number of microphones. Moreover, because collecting sound from many microphones requires a mixer, no apparatus that creates audio minutes efficiently has yet been put into practical use.

The present invention has been made in view of these circumstances, and its object is to record the speaker's voice well when sound is collected with a large number of microphones.

The present invention assigns identifiers to original audio data collected by two or more microphones and stores the data, extracts the identifier assigned to original audio data whose stored level exceeds a preset threshold, and, among the original audio data that is read out, attenuates the data corresponding to identifiers other than the extracted one before adding the data together.

This makes it possible to identify the speaker and record audio in which sound other than the speaker's voice is attenuated.

According to the present invention, the speaker can be identified and audio minutes can be created in which sound other than the speaker's voice is attenuated, so the speaker's voice is emphasized and unwanted ambient noise is reduced.

An embodiment of the present invention is described below with reference to the accompanying drawings. The embodiment is an example applied to an audio minutes creation apparatus that collects speakers' voices with a plurality of microphones and creates audio minutes. The apparatus identifies the speaker's microphone from the collected sound and attenuates the sound collected by the other microphones, so that it can create audio minutes in which the speaker's voice is emphasized.

First, an example of the external configuration of the audio minutes creation apparatus is described with reference to FIG. 1. FIG. 1 shows a connection example of an audio minutes creation system in which the apparatus is connected to various devices. The audio minutes creation apparatus 1 collects original audio signals from 26 microphones M1 to M26 that pick up the speakers' voices. Each of the microphones M1 to M26 is assigned a microphone number as an identifier, and the microphones are placed on a conference table 5. The analog audio signals collected by the microphones M1 to M26 are converted into digital audio data (hereinafter called original audio data) by terminals a1 to a26, which perform analog/digital conversion, and are supplied to the audio minutes creation apparatus 1. The terminals a1 to a26 are daisy-chained in order of microphone number up to the audio input unit 41 of the apparatus 1. The collected audio data is supplied to the apparatus 1 through the audio input unit 41, which is the audio input interface. The number of microphones is not limited to 26 and may be increased or decreased as needed.

The audio minutes creation apparatus 1 has, on its front face, an input unit and an output unit that serve as interfaces to external devices. An input device 4, such as a keyboard and mouse, is connected to an input unit 47, which feeds external signals into the apparatus 1. An operation unit 43 with buttons, switches, and the like is mounted on the front face of the apparatus 1 and can be operated directly. Through the input device 4 or the operation unit 43, the user can set the threshold for the level data, the number of peaks, the start and end times of audio minutes creation, and so on, as described later. The audio minutes created by the apparatus 1 are stored in a fourth memory 24, described later, inside the apparatus 1. A monitor 2 with a liquid crystal display panel displays the level data waveform for all microphone numbers at each time point, so that the user can manually have a speaker recorded in the audio minutes even when that speaker's level is at or below the threshold th and hard to hear. When the audio minutes are played back, the audio minutes file is read from the fourth memory 24 and output from a speaker 3.

Next, an example of the internal configuration of a terminal that connects to a microphone and transmits the audio signal is described with reference to FIG. 2. In this example, terminals a1 to a3 are daisy-chained, and terminal a2 exchanges data with terminal a1 as the previous terminal and terminal a3 as the next terminal. Terminal a2 consists of a transmission/reception block 10a, which sends and receives data and performs automatic loopback control at the end of the chain, and a data processing block 10b, which writes control data into slots. Terminal a2 also has an A connector 10d for connection to the previous terminal a1 and a B connector 10e for connection to the next terminal a3. Each connector carries signal lines 11a and 11b for data transmission and a power line 11c. Signal line 11a is for the processing path, and signal line 11b is for the relay path. Terminal a2 also has a power connector 10f for power supply.

Terminal a2 has a connector 10i for outputting the data of the received slots. In this example, a data master terminal is the terminal placed furthest downstream on the transmission path and is used to output the collected data to the outside. A non-data-master terminal is, in general, any terminal placed upstream of that furthest-downstream position. Connector 10g is the connector needed by a non-data-master terminal, and connector 10i is the connector needed by a data master terminal. In this example, data input and output at connectors 10g and 10i are detected, and the terminal is switched according to the detected state.

Terminal a2 has a connector 10g for receiving an analog audio signal from a microphone. The analog audio signal picked up by microphone M2 and input through connector 10g is converted into a digital audio signal by an analog/digital conversion unit 10c and supplied to the data processing block 10b. The audio signal produced by the previous terminal is passed on to the next terminal through the transmission/reception block 10a. At the last terminal in the chain (for example, terminal a26), connector 10i is connected to the audio input unit 41 of the apparatus 1 to supply the audio signals to the apparatus 1. The supplied audio signals can be stored by the audio minutes creation apparatus 1 as original audio data.

Next, an example of the structure of the transmitted data is described with reference to FIG. 3. In this example, UART (Universal Asynchronous Receiver Transmitter) is used for transmission. UART is a well-known asynchronous transmission technique, so a detailed description is omitted. Briefly, after detecting a start bit "0", the receiver uses an internal counter to sample a predetermined number of bits at fixed intervals, deciding 1 or 0 at the center of each bit. After the predetermined bits have been read, the receiver begins looking for the next start bit. FIG. 3(a) shows an example of the frame structure. In this example, the frame frequency fs is 22.05 kHz (a frame period of about 45 μs). FIG. 3(b) shows an example of the data structure. One frame consists of 31 slots and a fixed-length gap (data "1"). Of the 31 slots, 26 carry audio data and the remaining 5 carry control data. Each slot is 17 bits long: a 1-bit start bit "0" followed by 16 bits of data ds.
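The frame layout described above can be sketched in a few lines of Python. The slot counts, the 17-bit slot, and the 22.05 kHz frame rate come from the text; the LSB-first bit order (common UART practice), the gap length of 8 bits, and all function names are assumptions made for illustration only.

```python
SLOTS_PER_FRAME = 31        # from the text: 26 audio slots + 5 control slots
DATA_BITS = 16              # data ds per slot
SLOT_BITS = 1 + DATA_BITS   # start bit "0" + 16 data bits = 17 bits
FRAME_RATE_HZ = 22_050      # frame frequency fs


def pack_slot(value):
    """Return the 17 bits of one slot: start bit 0, then 16 data bits.

    LSB-first order is an assumption; the source does not state the bit order.
    """
    if not 0 <= value < (1 << DATA_BITS):
        raise ValueError("slot data must fit in 16 bits")
    return [0] + [(value >> i) & 1 for i in range(DATA_BITS)]


def pack_frame(words, gap_bits=8):
    """Serialize 31 slot words followed by a gap of '1' bits.

    gap_bits=8 is a hypothetical value; the source only says the gap is fixed-length.
    """
    if len(words) != SLOTS_PER_FRAME:
        raise ValueError("a frame carries exactly 31 slots")
    bits = []
    for word in words:
        bits.extend(pack_slot(word))
    bits.extend([1] * gap_bits)
    return bits


def min_bit_rate():
    """Lower bound on the line rate in bit/s, ignoring the gap."""
    return SLOTS_PER_FRAME * SLOT_BITS * FRAME_RATE_HZ
```

With these numbers each frame carries 31 × 17 = 527 slot bits, so the line must run at more than about 11.6 Mbit/s before the gap is counted.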

Next, an example of the internal configuration of the audio minutes creation apparatus 1 is described with reference to FIG. 4. The speakers' utterances and the ambient noise picked up by microphones M1 to M26 are analog/digital converted by terminals a1 to a26 and input to the apparatus 1 as audio signals, through the cable connecting terminals a1 to a26 and then through the audio input unit 41. The apparatus 1 has a first memory 21 through a fourth memory 24, each able to store a large amount of data, to hold the input audio signals and the data produced by the various conversion processes applied to them. In this example, flash memory is used for memories 21 to 24.

The control unit 42, which controls each part, reads programs, fixed parameters, and the like from a read-only memory (ROM) 44 and executes processing. It reserves a work area in a writable random access memory (RAM) 45 to store variables, temporary data, and the like, and reads data back from the RAM 45 as needed. The control unit 42 also reads the time from a clock unit 46 and controls the timing of reads from and writes to memories 21 to 24. The control unit 42 of this example includes: an original audio data creation unit 42a, which writes the original audio data input from the audio input unit 41 into the first memory 21; a level data creation unit 42b, which creates level data representing only the magnitude of the original audio data and writes it into the second memory 22; an identifier extraction unit 42c, which writes up to a fixed number of microphone numbers whose level data exceeds a predetermined threshold into a peak table created in the third memory 23; and an audio data addition control unit 42d, which attenuates the original audio data read from the first memory 21 for microphone numbers other than those read from the peak table to produce attenuated audio data, adds the attenuated audio data to the original audio data for the microphone numbers read from the peak table at each time point, and thereby creates the audio minutes.

Audio data input to the audio minutes creation apparatus 1 in time order is written as original audio data into the region of the first memory 21 corresponding to each microphone number, with the original audio data creation unit 42a managing the write addresses. The first memory 21 has 26 storage regions, m1 to m26, corresponding to the microphone numbers. The original audio data is written into the region for its microphone number: microphone M1 to region m1, microphone M2 to region m2, ..., microphone M26 to region m26. Because the audio signals arrive at the audio input unit 41 as a digital signal multiplexed onto a single signal line, the data to be written to the first memory 21 can be separated according to the multiplexing timing signal.
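A minimal sketch of this demultiplexing step follows. It assumes, purely for illustration, that the 26 audio slots are the first 26 slots of each frame and appear in microphone-number order; the source does not state the slot ordering, and the function names are hypothetical.

```python
NUM_MICS = 26


def new_first_memory():
    """One growing list per microphone number, standing in for regions m1..m26."""
    return {mic: [] for mic in range(1, NUM_MICS + 1)}


def store_frame(first_memory, frame_slots):
    """Append the sample in each audio slot to the region for its microphone.

    Assumption: frame_slots[k - 1] holds the sample from microphone Mk.
    Any slots after the first 26 (control data) are ignored here.
    """
    if len(frame_slots) < NUM_MICS:
        raise ValueError("frame must contain at least 26 audio slots")
    for mic in range(1, NUM_MICS + 1):
        first_memory[mic].append(frame_slots[mic - 1])
```

Calling `store_frame` once per received frame keeps the samples in every region aligned by index, so index n in each region corresponds to the same time point.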

The level data creation unit 42b manages the read addresses of the first memory 21 and reads the original audio data for each microphone number. In the original audio data the signal swings positive and negative and also contains high-frequency noise other than voice. The original audio data is passed through a detection unit 25, which consists of a low-pass filter (LPF) with a cutoff of roughly 100 Hz to 1 kHz and a rectifier circuit, and is detected into a positive waveform to produce level data indicating its magnitude (audio level). By managing the write addresses, the level data creation unit 42b writes the level data into the region of the second memory 22 corresponding to each microphone number. The second memory 22 has 26 storage regions, mL1 to mL26, reserved in advance for the microphone numbers. The level data is written into the corresponding region: microphone M1 to region mL1, microphone M2 to region mL2, ..., microphone M26 to region mL26.
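The detection step can be approximated in software as full-wave rectification followed by a one-pole low-pass filter. This is only a sketch of the general envelope-detection idea: the source describes a hardware detection unit, and the filter type, the order of the two stages, the 100 Hz default cutoff, and the function name are assumptions.

```python
import math


def level_data(samples, sample_rate_hz=22_050.0, cutoff_hz=100.0):
    """Return a non-negative envelope of the input samples.

    Each sample is rectified with abs() and then smoothed by a one-pole
    low-pass filter whose coefficient is derived from cutoff_hz.
    """
    alpha = 1.0 - math.exp(-2.0 * math.pi * cutoff_hz / sample_rate_hz)
    level = 0.0
    envelope = []
    for sample in samples:
        rectified = abs(sample)               # rectifier: fold negative swings up
        level += alpha * (rectified - level)  # low-pass: follow the rectified signal slowly
        envelope.append(level)
    return envelope
```

For a signal of constant amplitude the envelope rises smoothly toward that amplitude, and for silence it stays at zero.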

The identifier extraction unit 42c manages the read addresses of the second memory 22 and reads the level data for each time point and each microphone number. It extracts up to a predetermined number of microphone numbers (the peak count) whose level data exceeds the threshold, and writes the extracted microphone numbers in time order into the peak table configured in the third memory 23. In this example the peak table holds three peaks, p1 to p3, so the three microphone numbers with the largest level data are extracted. A microphone number is not extracted if its level data does not reach the threshold. In this way the microphone numbers that peak at each of the times t1, t2, ..., T are written. The apparatus 1 of this example can output a graph of the levels of all microphones at each time point from the video output unit 31 and display it on the monitor 2. A microphone number can also be selected arbitrarily through the operation unit 43 or the input device 4 and written to the peak table.
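The selection rule for one time point (keep at most three microphone numbers, largest level first, and only those above the threshold) can be sketched as follows. The function name and the numeric levels in the usage note are invented for illustration, and the tie-breaking behavior is an artifact of this sketch, not something the source specifies.

```python
import heapq


def extract_peaks(levels, threshold, max_peaks=3):
    """Return up to max_peaks microphone numbers, largest level first.

    levels maps microphone number -> level data at one time point.
    Microphones whose level does not exceed the threshold are never returned.
    """
    over_threshold = [(level, mic) for mic, level in levels.items() if level > threshold]
    top = heapq.nlargest(max_peaks, over_threshold)
    return [mic for _, mic in top]
```

For example, `extract_peaks({1: 200, 2: 10, 3: 20, 26: 150}, threshold=100)` selects microphones 1 and 26 in that order, and a time point where every level is below the threshold yields an empty list.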

The audio data addition control unit 42d manages the read addresses of the third memory 23 and reads the microphone numbers written in the peak table in time order. It also manages the read addresses of the first memory 21 and reads the original audio data of all microphones for each time point. The unit then supplies the original audio data for microphone numbers not stored in the peak table to an attenuator 27, which produces attenuated audio data at a reduced level. Finally, at each time point, an accumulator 28 adds the original audio data for the microphone numbers stored in the peak table to the attenuated audio data to create the audio minutes data.
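The attenuate-and-add step for one time point can be sketched as below. The attenuation factor of 0.25 and the clamping of the sum to the signed 16-bit range are assumptions; the source gives neither the amount of attenuation nor how overflow is handled. The function name is hypothetical.

```python
def mix_sample(originals, peak_mics, attenuation=0.25):
    """Mix one time point of all microphones into a single output sample.

    originals maps microphone number -> original audio sample.
    Microphones listed in peak_mics pass at full level; all others are
    multiplied by the (assumed) attenuation factor before being summed.
    """
    total = 0.0
    for mic, sample in originals.items():
        gain = 1.0 if mic in peak_mics else attenuation
        total += gain * sample
    # Assumed overflow handling: clamp to the signed 16-bit range.
    return max(-32768, min(32767, round(total)))
```

With `originals = {1: 1000, 2: 100, 3: -60}` and `peak_mics = {1}`, the speaker's 1000 passes unchanged while the other two microphones contribute 25 and -15, giving 1010.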

The audio minutes data is stored in the fourth memory 24 as an audio minutes file. For playback, the stored file is read out as needed and converted into an analog audio signal by a digital/analog conversion unit 29. The analog audio signal is then supplied to the speaker 3 through an audio output unit 30, which is the interface to the speaker 3, and output as sound. The audio minutes data can also be output directly from the speaker 3 through the digital/analog conversion unit 29 and the audio output unit 30 without being stored in the fourth memory 24.

Next, an example of the peak table creation process is described with reference to FIGS. 5 and 6. FIG. 5 is a flowchart of the process. FIGS. 6(a) to 6(c) show example waveforms of each kind of data, and FIG. 6(d) shows an example of the peak table. First, the original audio data creation unit 42a stores the audio signals input from the audio input unit 41 in the first memory 21 as original audio data (step ST1). The sound picked up by microphones M1 to M3 and M26 then produces the waveforms shown in FIG. 6(a), where the vertical axis is level and the horizontal axis is time. The terminals a1 to a26 convert the collected analog audio signals into digital audio signals with, for example, a sampling frequency of 22.05 kHz and 16-bit quantization, and supply them to the apparatus 1. The original audio data creation unit 42a writes the digitized original audio data into regions m1 to m26 of the first memory 21 by microphone number.

Next, the level data creation unit 42b reads regions m1 to m26 of the first memory 21 for each microphone number (step ST2), rectifies the data through the detection unit 26, and writes the level data into regions mL1 to mL26 of the second memory 22 for each microphone number (step ST3).

Step ST3 converts the original audio data into level data at 45 ms intervals with a size of 8 bits. At a sampling frequency of 22.05 kHz the sample interval is about 45 μs, so using a 45 ms interval reduces the amount of data to about 1/1000. Reducing the quantization from 16 bits to 8 bits halves the data again. The level data is therefore about 1/2000 the size of the original audio data. The level data forms a positive envelope, as in the waveform example shown in FIG. 6(b).
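The reduction figures quoted above can be checked with a short calculation. The constants are taken from the text; the function name is hypothetical, and the calculation simply shows that the stated 1/1000 and 1/2000 are rounded values.

```python
SAMPLE_RATE_HZ = 22_050     # sampling frequency of the original audio data
LEVEL_INTERVAL_S = 0.045    # one level value every 45 ms
ORIGINAL_BITS = 16          # quantization of the original audio data
LEVEL_BITS = 8              # quantization of the level data


def reduction_factors():
    """Return (sample period in microseconds, rate reduction, bit reduction, overall reduction)."""
    sample_period_us = 1e6 / SAMPLE_RATE_HZ                # about 45.35
    rate_reduction = SAMPLE_RATE_HZ * LEVEL_INTERVAL_S     # about 992 samples per level value
    bit_reduction = ORIGINAL_BITS / LEVEL_BITS             # exactly 2
    return sample_period_us, rate_reduction, bit_reduction, rate_reduction * bit_reduction
```

The exact overall factor comes out at roughly 1984, which the text rounds to 2000.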

Next, a time t is chosen and the level data of all microphones at that time is read from the second memory 22 (step ST4). The identifier extraction unit 42c treats the time t as a variable and, while 0 < t < T, counts t up at a fixed interval. For each time t, it reads the level data for all microphone numbers (26 in this example) from the second memory 22. At a given time t, a variable i is first defined as the index used to read the second memory 22, giving region mLi. The initial value 1 is set in i (step ST5), and the level data of region mL1 at time t is read.

It is then determined whether mLi > mL26, that is, whether the region index has passed the last region (step ST6). If mLi ≤ mL26, the identifier extraction unit 42c detects the microphone numbers with large level values at time t and stores them temporarily in the RAM 45 (step ST7). FIG. 6(c) shows example level data waveforms at each of the times t1 to t4, with level on the vertical axis and microphone number on the horizontal axis. In FIG. 6(c) a threshold th has been set in advance; level data that does not exceed the threshold th is not judged to be a peak, and its microphone number is not written to the peak table.

The level data waveforms for each time shown in FIG. 6(c) can be output as a video signal from the video output unit 31 and displayed on the monitor 2. In that case, the user can also manually select any microphone number and specify that it be written to the peak table.

The index i is then incremented by one and the process returns to the determination in step ST6, repeating until mLi > mL26. In step ST7, when a level larger than that of a microphone number temporarily stored in the RAM 45 is detected, the stored microphone number is overwritten with the new one. The number of microphone numbers held in the RAM 45 can be set arbitrarily; in this example up to three are held, in descending order of level.

When reading up to region mL26 and extraction of the peak microphone numbers are complete, the microphone numbers are written to the peak table created in the third memory 23 (step ST8). The three microphone numbers temporarily stored in the RAM 45 for time t are read out and written to the peak table as p1, p2, and p3 in descending order of level. If the level data does not exceed the threshold, no microphone number is written to the peak table.
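The loop over steps ST5 to ST8 for a single time t can be sketched as a single pass that keeps a short candidate list, mirroring the temporary storage in the RAM 45. This is an illustrative reading of the flowchart, not the patent's implementation: the function name is hypothetical, and applying the threshold inside the loop rather than at step ST8 is a simplification that gives the same peak table row.

```python
def scan_peaks(levels_at_t, threshold, max_peaks=3):
    """Return the peak-table row (p1, p2, p3, ...) for one time point.

    levels_at_t[i - 1] is the level data read from region mLi.
    """
    candidates = []                      # (level, mic) pairs, largest level first
    i = 1                                # ST5: start at region mL1
    while i <= len(levels_at_t):         # ST6: stop once past the last region
        level = levels_at_t[i - 1]
        if level > threshold:            # ST7: keep only levels above th
            candidates.append((level, i))
            candidates.sort(key=lambda c: c[0], reverse=True)
            del candidates[max_peaks:]   # a larger level pushes out the smallest
        i += 1
    return [mic for _, mic in candidates]   # ST8: row written to the peak table
```

Running this for each time t in turn and appending the returned rows builds the whole peak table; a row is empty when no level exceeds the threshold.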

FIG. 6(d) shows an example of the peak table. The peak table of this example is a table whose columns are the peaks p1 to p3 and whose rows are the times t1 to T. At time t1, microphone M1 is written as the largest peak p1 and microphone M26 as the second largest peak p2. Similarly, at time t2, microphone M2 is written as peak p1, microphone M3 as peak p2, and microphone M1 as the third largest peak p3. At time t3, microphone M3 is written as peak p1. At time t4, no level data exceeds the threshold th, so nothing is written to the peak table.

In this way, while counting up the time t, the identifier extraction unit 42c reads the level data and repeats the writing process to the peak table until time t = T. When time t = T is reached, the peak table creation process ends.
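As an illustration only, the peak table creation of steps ST6 to ST8 could be sketched in Python as follows. The function name `build_peak_table`, the dictionary layout of the level data, and the numeric level values are assumptions made for this sketch and do not appear in the specification; only the threshold test, the limit of three peaks, and the descending order of level are taken from the description above.

```python
from typing import Dict, List


def build_peak_table(
    levels: List[Dict[int, float]],  # one dict per time t: microphone number -> level
    threshold: float,                # threshold th, set in advance
    max_peaks: int = 3,              # number of peaks kept per time (p1..p3)
) -> List[List[int]]:
    """Return, for each time t, the microphone numbers of the largest levels.

    Levels that do not exceed the threshold are not treated as peaks,
    so a row may hold fewer than max_peaks entries or be empty.
    """
    table: List[List[int]] = []
    for frame in levels:
        # Keep only the microphones whose level exceeds the threshold.
        above = [(level, mic) for mic, level in frame.items() if level > threshold]
        # Descending order of level; ties are broken by microphone number (an assumption).
        above.sort(key=lambda pair: (-pair[0], pair[1]))
        table.append([mic for _, mic in above[:max_peaks]])
    return table


# Hypothetical level data arranged to reproduce the rows described for FIG. 6(d).
example_levels = [
    {1: 0.9, 2: 0.1, 3: 0.2, 26: 0.6},  # t1
    {1: 0.5, 2: 0.9, 3: 0.7, 26: 0.1},  # t2
    {1: 0.1, 2: 0.2, 3: 0.8, 26: 0.1},  # t3
    {1: 0.1, 2: 0.2, 3: 0.1, 26: 0.2},  # t4
]
print(build_peak_table(example_levels, threshold=0.4))
# -> [[1, 26], [2, 3, 1], [3], []]
```

Sorting all levels at once replaces the incremental overwrite of the RAM 45 described in step ST7; the resulting table is the same, but the memory access pattern of the actual apparatus is not modeled.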

Next, an example of the audio minutes creation process of this example will be described with reference to the flowchart of FIG. 7. A time is fixed, and the microphone numbers in the peak table corresponding to that time are read from the third memory 23 (step ST11). Taking the time t as a variable, the voice data addition control unit 42d counts up the time t at fixed intervals while 0 < t < T and reads from the peak table the microphone numbers of the peaks p1 to p3 at that time t. At a given time t, a variable j is defined as the subscript used to read the first memory 21, designating an area mj. The initial value 1 is set in j (step ST13), and the original voice data in the area m1 at time t is read.

It is then determined whether mj > m26 (step ST14). If mj ≤ m26, the voice data addition control unit 42d reads from the first memory 21 the original voice data corresponding to the microphone numbers at time t read from the peak table (step ST15). Original voice data of microphone numbers that are not listed in the peak table is attenuated by the attenuator 27 (step ST16) to become attenuated voice data. The attenuation value of the attenuator is determined by how much of the multiple peaks and the ambient noise is required to be reproduced. Original voice data of microphone numbers listed in the peak table is left unprocessed.

Further, the attenuated voice data and the original voice data are added for each time t to create added voice data (step ST17). The subscript j is then incremented by one, the process returns to the determination of step ST14, and the processing is repeated until mj > m26. When reading up to the area m26 at time t is complete, the output destination is determined: either the sound is output from the speaker 3 or the data is written to the audio minutes file created in the fourth memory 24 (step ST18).

For sound output, the added voice data is converted from digital to analog, and the resulting analog audio signal is supplied to the speaker 3, which emits the sound (step ST19). For writing to a file, the added voice data is written to the audio minutes file created in the fourth memory 24 (step ST20).

Until time t = T, the voice data addition control unit 42d sequentially reads the original voice data in the first memory 21 over all microphone areas, and the accumulator 28 performs the selective addition at each time t to create the added voice data. When time t = T is reached, the audio minutes creation process ends.
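The selective attenuation and addition of steps ST14 to ST17 can be illustrated with a short Python sketch. The names `mix_frame` and `create_minutes`, the representation of one sample per microphone per time, and the linear gain value are assumptions for illustration; the specification only states that data of microphones not listed in the peak table is attenuated and that listed data is added without processing.

```python
from typing import Dict, Iterable, List


def mix_frame(
    samples: Dict[int, float],   # microphone number -> original voice sample at time t
    peak_mics: Iterable[int],    # microphone numbers listed in the peak table at time t
    attenuation_gain: float,     # linear gain (< 1) applied to unlisted microphones
) -> float:
    """Add all microphones at one time t, attenuating those not in the peak table."""
    listed = set(peak_mics)
    total = 0.0
    for mic, sample in samples.items():
        # Listed microphones pass unprocessed; all others go through the attenuator.
        gain = 1.0 if mic in listed else attenuation_gain
        total += gain * sample
    return total


def create_minutes(
    audio: List[Dict[int, float]],
    peak_table: List[List[int]],
    attenuation_gain: float = 0.1,
) -> List[float]:
    """Produce the added voice data for every time t from 1 to T."""
    return [
        mix_frame(samples, peaks, attenuation_gain)
        for samples, peaks in zip(audio, peak_table)
    ]


# Two time steps, two microphones; microphone 1 is a peak only at the first step.
audio = [{1: 1.0, 2: 0.5}, {1: 0.25, 2: 0.75}]
print(create_minutes(audio, [[1], []], attenuation_gain=0.5))
# -> [1.25, 0.5]
```

At a time for which the peak table row is empty, every microphone is attenuated, which corresponds to the case at time t4 where no level exceeds the threshold.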

In this way, when the original voice data collected for each speaker is added, the audio minutes can be created with the sound of everyone other than the speaker attenuated.

According to the present embodiment, audio minutes can be created by adding a plurality of appropriately processed voice data. As a result, unnecessary ambient noise is suppressed, the beginnings of utterances are not cut off, and audio minutes in which only the necessary utterances are recorded can be obtained. In addition, when sound is collected by a plurality of microphones at a conference or the like, audio minutes with good recording quality can be created.

Furthermore, of the original voice data read from the first memory 21, the voice data in which the sound of everyone other than the speaker has been attenuated is added to the original voice data of the microphone numbers written in the peak table, so the utterance content is reflected more prominently in the audio minutes. For example, if only the largest peak is to be emphasized, the attenuation of the other signals can be set to infinity; if the surrounding background sound is also to be mixed in, a finite attenuation can be used. When a plurality of peaks cross over in time, the attenuation can also be varied continuously over time so that the result does not sound unnatural.
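As a rough sketch of these attenuation settings, the Python fragment below converts an attenuation amount to a linear gain and generates a gradually changing gain for a crossover between peaks. Expressing the attenuation in decibels, the linear ramp shape, and the function names are all assumptions of this sketch; the specification does not state units or how the continuous change is realized.

```python
import math
from typing import List


def attenuation_db_to_gain(attenuation_db: float) -> float:
    """Convert an attenuation in dB (assumed unit) to a linear amplitude gain.

    Infinite attenuation removes the signal entirely (gain 0.0);
    0 dB leaves it unchanged (gain 1.0).
    """
    if math.isinf(attenuation_db):
        return 0.0
    return 10.0 ** (-attenuation_db / 20.0)


def ramp_gains(start_gain: float, end_gain: float, steps: int) -> List[float]:
    """Linearly interpolate the gain over `steps` time steps.

    Applying such a ramp to a microphone that is entering or leaving the
    peak table avoids an abrupt level jump at the crossover.
    """
    if steps < 2:
        return [end_gain]
    return [
        start_gain + (end_gain - start_gain) * k / (steps - 1)
        for k in range(steps)
    ]


print(attenuation_db_to_gain(math.inf))  # only the peak microphones remain audible
print(ramp_gains(0.0, 1.0, 5))           # -> [0.0, 0.25, 0.5, 0.75, 1.0]
```

A finite attenuation such as 20 dB keeps the background at one tenth of its amplitude, which is one way to retain some ambient sound in the mix.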

In the embodiment described above, the terminals a1 to a26 are connected to the audio minutes creation apparatus 1 by wire, but the connection may be wireless. This eliminates the work of laying cables and makes the microphones and terminals easier to install.

In the embodiment described above, flash memory is used as the memory for storing the voice data, but the voice data may instead be recorded on a large-capacity recording device such as a hard disk drive or a tape drive.

When audio minutes data is written to the audio minutes file created in the fourth memory 24, a plurality of tracks may be provided and an audio minutes file stored on each track, so that a plurality of audio minutes files with different meeting dates, times, and so on are created. Microphone numbers may also be associated with track numbers so that the audio minutes of a different speaker are stored on each track. Alternatively, when the peak table is stored, the peak microphone numbers may be stored individually on each track provided in the fourth memory 24 and used as the peak table.

In the embodiment described above, the number of peaks extracted into the peak table is three, but any number of peaks can be set. For example, the identifier extraction unit 42c may set the number of peaks to one and write only the microphone number with the maximum level to the peak table. When the number of peaks is three, the microphone number with the maximum level and the microphone numbers of the microphones arranged adjacent to that microphone may be written to the peak table. Alternatively, the number of peaks may be set to two, and the microphone numbers with the maximum and the second largest levels written to the peak table. Changing the number of peaks and the conditions for writing to the peak table in this way makes it possible to create audio minutes suited to the situation of use. Only a specific microphone number may also be written to the peak table, which makes it possible to create audio minutes containing only the utterances of a specific speaker.
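Two of these alternative writing conditions might be sketched in Python as below. The adjacency map `neighbors`, the function names, and the decision to keep applying the threshold in both variants are assumptions of this sketch; the specification does not say how adjacency is represented or whether the threshold applies when a specific microphone is designated.

```python
from typing import Dict, Iterable, List


def select_max_with_neighbors(
    frame: Dict[int, float],          # microphone number -> level at time t
    threshold: float,
    neighbors: Dict[int, List[int]],  # hypothetical map: microphone -> adjacent microphones
) -> List[int]:
    """Return the loudest microphone followed by the microphones adjacent to it."""
    if not frame:
        return []
    # Highest level wins; ties go to the lower microphone number (an assumption).
    top = max(frame, key=lambda mic: (frame[mic], -mic))
    if frame[top] <= threshold:
        return []
    return [top] + list(neighbors.get(top, []))


def select_specific(
    frame: Dict[int, float],
    threshold: float,
    wanted: Iterable[int],            # microphone numbers designated in advance
) -> List[int]:
    """Return only the designated microphones whose level exceeds the threshold."""
    return [mic for mic in sorted(wanted) if frame.get(mic, 0.0) > threshold]


frame = {1: 0.2, 2: 0.9, 3: 0.3}
print(select_max_with_neighbors(frame, 0.4, {2: [1, 3]}))  # -> [2, 1, 3]
print(select_specific(frame, 0.1, {3}))                     # -> [3]
```

Either function returns one row of the peak table, so it could stand in for the per-time selection step without changing the later attenuation and addition.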

In the embodiment described above, the audio minutes file is stored in the fourth memory 24, and the fourth memory 24 may be a card-type semiconductor storage device, for example a stick-shaped one, that can be attached to and detached from the apparatus 1. The fourth memory 24 can then be removed at will and loaded into another apparatus to play back the audio minutes file. Alternatively, when the fourth memory 24 is attached to the apparatus 1, the original voice data may be read automatically from the first memory 21 and an audio minutes file created in the fourth memory 24.

In the embodiment described above, the audio minutes file is stored in the fourth memory 24, but the utterance content may also be read from the audio minutes data output by the voice data addition control unit 42d and a text minutes file created automatically. This removes the need to type in the utterance content while playing back the audio minutes file.

The embodiment described above takes as its example an apparatus that creates audio minutes at a conference or the like, but the invention is equally applicable to similar voice processing apparatuses used for other purposes, provided that they process voice data collected by a plurality of microphones.

FIG. 1 is a configuration diagram showing a connection example of the audio minutes creation system according to an embodiment of the present invention.
FIG. 2 is a block diagram showing an example of the internal configuration of a terminal according to an embodiment of the present invention.
FIG. 3 is an explanatory diagram showing a configuration example of transmission data according to an embodiment of the present invention.
FIG. 4 is a block diagram showing an example of the internal configuration of the audio minutes creation apparatus according to an embodiment of the present invention.
FIG. 5 is a flowchart showing an example of the peak table creation process according to an embodiment of the present invention.
FIG. 6 is an explanatory diagram showing an example of peak table creation according to an embodiment of the present invention.
FIG. 7 is a flowchart showing an example of the audio minutes creation process according to an embodiment of the present invention.

Explanation of symbols

1 ... audio minutes creation apparatus, 2 ... monitor, 3 ... speaker, 4 ... input device, 5 ... conference table, 21 to 24 ... memory, 25 ... detection unit, 27 ... attenuator, 28 ... accumulator, 29 ... digital/analog conversion unit, 30 ... audio output unit, 31 ... video output unit, 29 ... external input unit, 41 ... audio input unit, 42 ... control unit, 42a ... original voice data creation unit, 42b ... level data creation unit, 42c ... identifier extraction unit, 42d ... voice data addition control unit, 43 ... operation unit, 44 ... ROM, 45 ... RAM, 46 ... clock unit, 47 ... input unit, 100 ... audio minutes creation system, M1 to M26 ... microphones, a1 to a26 ... terminals

Claims

A voice processing apparatus comprising: a first storage unit that stores original voice data collected by at least two microphones, with an identifier assigned to the original voice data;
an identifier extraction unit that extracts the identifier assigned to original voice data whose level, among the original voice data stored in the first storage unit, exceeds a preset threshold; and
a voice data addition control unit that attenuates, and then adds, the original voice data corresponding to identifiers other than the extracted identifier, among the original voice data read from the first storage unit.

The voice processing apparatus according to claim 1, further comprising:
a display unit that displays, for each identifier, the level data at each same time; and
an operation unit for extracting an arbitrary identifier from the level data displayed on the display unit.

The voice processing apparatus according to claim 1, wherein
the identifier extraction unit comprises:
a second storage unit that stores, for each identifier, data on the level of the original voice data stored in the first storage unit; and
a third storage unit that extracts, from the level data for each identifier stored in the second storage unit, the identifier of data whose level exceeds the threshold at each time, and stores the extracted identifier.

The voice processing apparatus according to claim 1, wherein
the identifier extraction unit extracts the identifier for which the level data is the maximum.

The voice processing apparatus according to claim 1, wherein
the identifier extraction unit extracts the identifier for which the level data is the maximum and the identifier for which the level data is the second largest.

The voice processing apparatus according to claim 1, wherein
the identifier extraction unit extracts the identifier for which the level data is the maximum and the identifier assigned to a second microphone arranged in the vicinity of a first microphone to which that identifier is assigned.

The voice processing apparatus according to claim 1, wherein
the identifier extraction unit extracts a specific identifier.

A voice processing method comprising: assigning an identifier to original voice data collected by at least two microphones and storing the original voice data;
extracting the identifier assigned to original voice data whose level, among the stored original voice data, exceeds a preset threshold; and
attenuating, and then adding, the original voice data corresponding to identifiers other than the extracted identifier, among the original voice data that is read.

A program for executing: a storage process that stores original voice data collected by at least two microphones, with an identifier assigned to the original voice data;
an identifier extraction process that extracts the identifier assigned to original voice data whose level, among the stored original voice data, exceeds a preset threshold; and
a voice data addition control process that attenuates, and then adds, the original voice data corresponding to identifiers other than the extracted identifier, among the original voice data that is read.