JP4779954B2

JP4779954B2 - Audio data processing apparatus, method and program

Info

Publication number: JP4779954B2
Application number: JP2006333308A
Authority: JP
Inventors: 英裕大橋
Original assignee: Kenwood KK
Current assignee: Kenwood KK
Priority date: 2006-12-11
Filing date: 2006-12-11
Publication date: 2011-09-28
Anticipated expiration: 2026-12-11
Also published as: JP2008145757A

Description

The present invention relates to an audio data processing apparatus, method, and program.

In recent years, memory and HDDs (hard disk drives) have become larger in capacity and cheaper, and devices such as portable players and voice recorders that digitally compress audio and record it for long periods have become widespread. For example, when a portable player is used to record a meeting, the long meeting is recorded as a single audio data file, and when the audio is listened to later it is difficult to cue to the position one wants to hear or to a natural break. Fast-forward playback can shorten the time needed for cueing, but there is a limit to the playback speed at which a person can follow speech, so excessive fast-forwarding to save time makes accurate cueing difficult.

To solve this problem, files have conventionally been divided by having the user operate the device as appropriate during recording, but this is laborious.
Furthermore, Patent Document 1 proposes dividing a file by detecting silence during recording, and dividing a file upon detection of a predetermined keyword.
Patent Document 1: JP 2005-221565 A

However, in the above prior art, with the method of detecting silent sections, if the recorded conversation tends to break off, the file may be divided frequently because of the many silent sections.
With the method of detecting a predetermined keyword, the keyword must be entered in advance from the operation unit of the portable player or the like, which makes operation cumbersome. In addition, because the phrases frequently used at break points differ with each speaker's habits, it is preferable to be able to judge the speaker's habits during recording and set the keyword accordingly.

An object of the present invention is to segment audio data easily and appropriately.

The invention according to claim 1 is an audio data processing apparatus having input means and audio data recording means for recording audio data input by the input means, the apparatus comprising:
voice recognition means for performing voice recognition on the audio data input by the input means;
operation means for accepting an operation during input by the input means;
voice recognition data registration means for registering, based on the operation during input accepted by the operation means, voice recognition data recognized by the voice recognition means; and
recording control means for performing control so that, based on the voice recognition result for the input audio data from the voice recognition means and on the voice recognition data registered by the voice recognition data registration means, the input audio data is segmented and recorded by the recording means.

The invention according to claim 2 is an audio data processing apparatus having reproduction means for reproducing audio data input from outside or audio data recorded in advance in recording means, the apparatus comprising:
voice recognition means for performing voice recognition on the audio data reproduced by the reproduction means;
operation means for accepting an operation during audio reproduction by the reproduction means;
voice recognition data registration means for registering, based on the operation during audio reproduction accepted by the operation means, voice recognition data recognized by the voice recognition means; and
recording control means for performing control so that, based on the voice recognition result for the reproduced audio data from the voice recognition means and on the voice recognition data registered by the voice recognition data registration means, the audio data is segmented and recorded by the recording means.

The invention according to claim 3 is the audio data processing apparatus according to claim 1 or claim 2, further comprising a buffer for recording audio data, wherein:
the buffer records audio data for a predetermined length of time in accordance with an instruction from the operation means;
the voice recognition means performs voice recognition on the data recorded in the buffer; and
the voice recognition data registration means registers the voice recognition data recognized by the voice recognition means.

The invention according to claim 4 is the audio data processing apparatus according to any one of claims 1 to 3, wherein the segmentation of audio data performed by the recording control means is division of the audio data into files.

The invention according to claim 5 is the audio data processing apparatus according to any one of claims 1 to 3, wherein the segmentation of audio data performed by the recording control means is rewriting of the tracks of the audio data.

The invention according to claim 6 is the audio data processing apparatus according to any one of claims 1 to 3, wherein the segmentation of audio data performed by the recording control means is the setting of flags in the audio data.

The invention according to claim 7 is an audio data processing method having an input step and an audio data recording step of recording audio data input in the input step, the method comprising:
a voice recognition step of performing voice recognition on the audio data input in the input step;
an operation step of accepting an operation during input in the input step;
a voice recognition data registration step of registering, based on the operation during input accepted in the operation step, voice recognition data recognized by the voice recognition means; and
a recording control step of performing control so that, based on the voice recognition result for the input audio data from the voice recognition step and on the voice recognition data registered in the voice recognition data registration step, the input audio data is segmented and recorded in the recording step.

The invention according to claim 8 is an audio data processing method having a reproduction step of reproducing audio data input from outside or audio data recorded in advance in a recording step, the method comprising:
a voice recognition step of performing voice recognition on the audio data reproduced in the reproduction step;
an operation step of accepting an operation during audio reproduction in the reproduction step;
a voice recognition data registration step of registering, based on the operation during audio reproduction accepted in the operation step, voice recognition data recognized in the voice recognition step; and
a recording control step of performing control so that, based on the voice recognition result for the reproduced audio data from the voice recognition step and on the voice recognition data registered in the voice recognition data registration step, the audio data is segmented and recorded in the recording step.

The invention according to claim 9 is a program that causes a computer to function as input means and as audio data recording means for recording audio data input by the input means, the program causing the computer to function as:
voice recognition means for performing voice recognition on the audio data input by the input means;
operation means for accepting an operation during input by the input means;
voice recognition data registration means for registering, based on the operation during input accepted by the operation means, voice recognition data recognized by the voice recognition means; and
recording control means for performing control so that, based on the voice recognition result for the input audio data from the voice recognition means and on the voice recognition data registered by the voice recognition data registration means, the input audio data is segmented and recorded by the recording means.

The invention according to claim 10 is a program that causes a computer to function as reproduction means for reproducing audio data input from outside or audio data recorded in advance in recording means, the program causing the computer to function as:
voice recognition means for performing voice recognition on the audio data reproduced by the reproduction means;
operation means for accepting an operation during audio reproduction by the reproduction means;
voice recognition data registration means for registering, based on the operation during audio reproduction accepted by the operation means, voice recognition data recognized by the voice recognition means; and
recording control means for performing control so that, based on the voice recognition result for the reproduced audio data from the voice recognition means and on the voice recognition data registered by the voice recognition data registration means, the audio data is segmented and recorded by the recording means.

According to the present invention, audio data can be segmented easily and appropriately.

An embodiment of the present invention is described in detail below with reference to the drawings. The scope of the invention is not, however, limited to this embodiment.

The internal configuration of the audio data processing apparatus is described with reference to FIG. 1. The audio data processing apparatus is a device capable of recording sound and of inputting and reproducing audio data, such as a portable player, a voice recorder, or a stereo system.
As shown in FIG. 1, the audio data processing apparatus 1 comprises an analog input unit 2, an ADC (analog-to-digital converter) 3, an encoding unit 4, a recording unit 5, a digital input unit 6, a word DB (database) 7, a voice recognition unit 8, a reproduction unit 9, a DAC (digital-to-analog converter) 10, an amplification unit 11, an output unit 12, a key operation unit 13, a buffer 14, a control unit 15, and the like.

The analog input unit 2 is a device such as a microphone; during recording it collects external sound and converts it into analog audio data.
The ADC 3 converts the analog audio data received from the analog input unit 2 into digital audio data.
The encoding unit 4 encodes the digital audio data received from the buffer 14 into a file format such as MP3 (MPEG Audio Layer-3), WMA (Windows Media (registered trademark) Audio), or WAVE, and records it as a file.
The recording unit 5 is a device such as an HDD (hard disk drive) or memory that records the audio data files generated by the encoding unit 4, audio data files input from the digital input unit 6, and the like, or a drive that holds media such as an MD (MiniDisc) for storing audio data.
The digital input unit 6 is an interface such as USB (Universal Serial Bus); it connects to an external device and receives digital audio data in the form of files or the like.

The reproduction unit 9 reproduces, under the control of the control unit 15, audio data in formats such as MP3, WMA, and WAVE stored in the recording unit 5.
The DAC 10 converts the digital audio data input from the reproduction unit 9 into analog audio data.
The amplification unit 11 is a device such as an amplifier that amplifies the analog audio data input from the DAC 10.
The output unit 12 is a device such as a speaker that outputs the analog audio data input from the amplification unit 11 to the outside as sound.

The word DB 7 is a database in which a plurality of words and their readings are registered, and a word can be looked up from its reading. The keywords described later are also registered in it and can be looked up from their readings.
The voice recognition unit 8 applies voice recognition processing, with reference to the word DB 7, to the audio data input from the ADC 3 or the buffer 14, and thereby determines the likelihood (score) of each word that the audio data may represent. It identifies the word by eliminating from the candidates any word whose score falls below a predetermined value. Any voice recognition technique may be used, as long as the word is identified.
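The score-based candidate elimination described above can be illustrated with a minimal sketch. The function name `pick_word`, the romanized candidate words, the score values, and the threshold are all hypothetical, and choosing the highest-scoring survivor is an assumption; the description only says that candidates below a predetermined value are removed.

```python
from typing import Dict, Optional


def pick_word(scores: Dict[str, float], threshold: float) -> Optional[str]:
    """Return one word from the scored candidates, or None if none survive."""
    # Remove every candidate whose score is below the predetermined value.
    kept = {word: score for word, score in scores.items() if score >= threshold}
    if not kept:
        return None
    # Assumption: among the survivors, take the highest-scoring candidate.
    return max(kept, key=kept.get)


# Hypothetical scores for one stretch of audio.
candidates = {"soredewa": 0.82, "sorekara": 0.41, "sore": 0.15}
print(pick_word(candidates, threshold=0.5))  # prints "soredewa"
```

Returning `None` when every candidate is eliminated lets the caller treat the audio as containing no identifiable word.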

The key operation unit 13 is a set of buttons, switches, and the like with which the user operates the audio data processing apparatus 1.

The buffer 14 is a storage medium such as RAM (random access memory) that temporarily holds the digital audio data output from the ADC 3 or the reproduction unit 9.
The control unit 15 comprises a CPU (central processing unit), a ROM (read-only memory), the buffer 14, and the like. In accordance with a control program stored in the ROM, it controls the process of segmenting, in the buffer 14, the audio data to be stored in the recording unit 5, as well as the overall processing performed by the audio data processing apparatus 1.

With reference to FIG. 2, the following describes the flow of processing in which, while the audio data processing apparatus 1 is recording, a keyword is detected in the audio being recorded, the audio is divided at the position where the keyword was detected, and the result is saved as files. The processing is executed by the control unit 15.

This process starts when a recording start instruction is received from the key operation unit 13.
As shown in FIG. 2, a file in a format such as MP3, WMA, or WAVE is first created in the recording unit 5, and the file is opened (S1).

Next, analog audio data begins to flow continuously from the analog input unit 2, is converted into digital audio data by the ADC 3, and is held temporarily (for example, for 2 seconds) in the buffer 14. The digital audio data is then encoded by the encoding unit 4 and stored sequentially in the file (S2).
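A buffer that always holds the most recent stretch of audio can be sketched as a fixed-length queue. The class name `RollingBuffer` and its methods are hypothetical, and the sample rate in the demonstration is deliberately tiny; this is only one possible way to realize the temporary storage the description attributes to the buffer 14.

```python
from collections import deque
from typing import Iterable, List


class RollingBuffer:
    """Keeps only the most recent `seconds` of samples (hypothetical helper)."""

    def __init__(self, seconds: float, sample_rate: int) -> None:
        # A deque with maxlen discards the oldest samples automatically.
        self._samples: deque = deque(maxlen=int(seconds * sample_rate))

    def push(self, chunk: Iterable[int]) -> None:
        self._samples.extend(chunk)

    def snapshot(self) -> List[int]:
        """Copy of the buffered samples, oldest first."""
        return list(self._samples)


# Demo with 4 samples per second, so 2 seconds corresponds to 8 samples.
buf = RollingBuffer(seconds=2.0, sample_rate=4)
for sample in range(20):
    buf.push([sample])
print(buf.snapshot())  # prints [12, 13, 14, 15, 16, 17, 18, 19]
```

Because the snapshot is a copy, recognition can run on it while new samples keep arriving.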

Next, if no keyword registration instruction is received from the key operation unit 13 (S3; No), the process proceeds to S9.
If a keyword registration instruction is received (S3; Yes), the voice recognition unit 8 performs voice recognition processing on the audio data temporarily held in the buffer 14, and the word that the audio data represents is identified (S4).
The word identified in S4 is then newly registered in the word DB 7 as a keyword (S5).

Next, storage of the audio data into the file is stopped at the position immediately before the word identified in S4, and the file is closed there (S6).
A new file is then opened (S7), and the audio data temporarily held in the buffer 14 is encoded by the encoding unit 4 from the position where storage was stopped in S6 and stored sequentially in this file (S8).

Next, the audio data converted into digital data by the ADC 3 is input continuously to the voice recognition unit 8, in parallel with and synchronized with its input to the buffer 14, and voice recognition processing is performed. If the keyword registered in S5 is not detected (S9; No), the process proceeds to S13.
If the keyword is detected (S9; Yes), storage into the file of the audio data held in the buffer 14 is stopped at the position immediately before the detected keyword, and the file is closed there (S10).

A new file is then opened (S11), and the audio data temporarily held in the buffer 14 is encoded by the encoding unit 4 from the position where storage was stopped in S10 and stored sequentially in this file (S12).

Next, if there is no instruction from the key operation unit 13 to end recording (S13; No), the process returns to S3.
If an instruction to end recording is received (S13; Yes), the file is closed (S14) and the process ends.

To summarize this process: when a keyword registration instruction is received, the keyword registration processing (S4, S5) is performed, and when a registered keyword is detected by voice recognition, the processing that divides the audio data and records it as files (S1, S2, S6, S7, S8, S10, S11, S12, S14) is performed.
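The recording flow of steps S1 to S14 can be sketched in a simplified form. The sketch assumes that the audio has already been reduced to a stream of recognized words, each paired with a flag saying whether the registration key was pressed while that word was in the buffer; the function name `split_recording` and the romanized tokens are hypothetical, and real audio encoding and file I/O are replaced by lists of words.

```python
from typing import Iterable, List, Set, Tuple


def split_recording(tokens: Iterable[Tuple[str, bool]]) -> List[List[str]]:
    """Split a word stream into 'files' at registered keywords.

    Each token is (recognized_word, register_key_pressed).
    """
    keywords: Set[str] = set()
    files: List[List[str]] = [[]]        # S1: open the first file
    for word, register_pressed in tokens:
        if register_pressed:             # S3: Yes
            keywords.add(word)           # S4, S5: identify and register
            files.append([])             # S6, S7: close just before the word, open a new file
        elif word in keywords:           # S9: Yes
            files.append([])             # S10, S11: close just before the keyword, open a new file
        files[-1].append(word)           # S2, S8, S12: keep storing audio
    return files                         # S13: Yes, S14: close the last file


stream = [
    ("omoimasu", False),
    ("soredewa", True),    # operator presses the key on this word
    ("tsugini", False),
    ("soredewa", False),   # detected automatically from now on
    ("sakkon", False),
]
print(split_recording(stream))
# prints [['omoimasu'], ['soredewa', 'tsugini'], ['soredewa', 'sakkon']]
```

Each keyword starts a new list because the description closes the current file immediately before the keyword, so the keyword itself belongs to the following file.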

With reference to FIG. 3, the following outlines how a voice recorder serving as the audio data processing apparatus 1 is used, and what processing it performs, when recording the content of a meeting, for example.

First, FIG. 3(a) shows how a keyword is registered from the speaker's utterance and how the keyword is then detected.
The speaker says "... I think. Well then, next ... will be the case. Well then, these days ..." (a1).
When the person operating the voice recorder operates the key operation unit 13 to register a keyword at the moment the speaker says "well then" (それでは) in "well then, next" (a2), the voice recognition unit 8 performs voice recognition processing on the audio data (a3) held temporarily (for example, for 2 seconds) in the buffer 14, identifies the word "well then", and registers that word in the word DB 7 as a keyword, in the form of the audio in which it was detected. The voice recognition unit 8 then detects "well then" in the speaker's subsequent utterances (a4).
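How a single word is chosen from the buffered audio at the moment of the key press is not spelled out in the description. The sketch below is therefore only one possible reading: it assumes the recognizer returns each word with start and end times, and it picks the word that started most recently within the buffered window before the key press. The names `Word` and `word_at_press`, the timings, and the selection rule are all hypothetical.

```python
from typing import List, Optional, Tuple

# (text, start time in seconds, end time in seconds); hypothetical recognizer output.
Word = Tuple[str, float, float]


def word_at_press(words: List[Word], press_time: float,
                  window_s: float = 2.0) -> Optional[str]:
    """Pick the keyword candidate for a key press at `press_time`."""
    # Only words that started inside the buffered window are available.
    in_window = [w for w in words
                 if press_time - window_s <= w[1] <= press_time]
    if not in_window:
        return None
    # Assumption: the most recently started word is the one the operator meant.
    return max(in_window, key=lambda w: w[1])[0]


recognized = [
    ("omoimasu", 9.0, 9.8),
    ("soredewa", 10.2, 10.9),
    ("tsugini", 11.0, 11.5),
]
print(word_at_press(recognized, press_time=10.6))  # prints "soredewa"
```

A longer window tolerates a slower operator, at the cost of more candidate words to choose between.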

FIG. 3(b) is a conceptual diagram showing that a file is recorded each time the keyword is detected in the speaker's utterance.
Each time the keyword "well then" is detected in the speaker's utterance, the audio data encoded by the encoding unit 4 is divided at that position by the control unit 15 and recorded in the recording unit 5 as a file in MP3 format (for example, with the file name "File N.mp3").

In this embodiment, the audio data is segmented by recording the divided audio data in separate files, but the method is not limited to this. Flag information may instead be set in the audio data, or the audio data may be recorded on an MD divided into tracks.

When the audio data is segmented by flag setting, the control unit 15 sets flag information in the audio data instead of dividing it. Flag information is set in place of steps S6 and S7 in the flowchart of FIG. 2 and, likewise, in place of steps S10 and S11.

FIG. 3(c) is a conceptual diagram showing audio data being segmented by setting flag information. Each time the keyword "well then" is detected in the speaker's utterance, the control unit 15 sets flag information (c1, c2) at that position in the audio data encoded by the encoding unit 4, and the data is recorded in the recording unit 5.
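The flag-based variant can be sketched as collecting marker positions instead of opening new files. The description does not say how flag information is stored, so representing each flag as a time offset in seconds is an assumption, and the function name `flag_positions` and the sample timings are hypothetical.

```python
from typing import List, Tuple


def flag_positions(words: List[Tuple[str, float]], keyword: str) -> List[float]:
    """Return the start offsets (seconds) at which flags would be set.

    `words` holds (recognized_word, start_offset) pairs for one recording.
    """
    # One flag per keyword occurrence, placed at the start of the keyword.
    return [offset for text, offset in words if text == keyword]


recording = [
    ("omoimasu", 9.0),
    ("soredewa", 10.2),   # corresponds to flag c1 in FIG. 3(c)
    ("tsugini", 11.0),
    ("soredewa", 95.4),   # corresponds to flag c2 in FIG. 3(c)
]
print(flag_positions(recording, "soredewa"))  # prints [10.2, 95.4]
```

Keeping a single file with a list of flag positions leaves the audio untouched, so a player can jump between flags without any re-encoding.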

When the audio data is recorded on an MD divided into tracks, the encoding unit 4 performs encoding for recording the audio data on the MD, the recording unit 5 is a device that holds an MD, such as an MD drive, and the control unit 15 rewrites the address information and track information of the audio data held on the MD. The address information and track information of the audio data are rewritten in place of steps S6 and S7 in the flowchart of FIG. 2 and, likewise, in place of steps S10 and S11.
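Rewriting track information amounts to editing a table of address ranges rather than moving audio data. The sketch below models a track table as a list of (start address, end address) pairs and splits the track that contains a given address. This is an illustrative data structure only, not the actual MD table-of-contents format, and the name `split_track` is hypothetical.

```python
from typing import List, Tuple

Track = Tuple[int, int]  # (start address, end address), end exclusive


def split_track(tracks: List[Track], address: int) -> List[Track]:
    """Return a new track table with the track containing `address` split in two."""
    result: List[Track] = []
    for start, end in tracks:
        if start < address < end:
            # Replace one entry with two; the audio itself is not moved.
            result.append((start, address))
            result.append((address, end))
        else:
            result.append((start, end))
    return result


table = [(0, 1000)]
table = split_track(table, 400)   # keyword detected at address 400
table = split_track(table, 750)   # keyword detected again at address 750
print(table)  # prints [(0, 400), (400, 750), (750, 1000)]
```

An address that falls exactly on an existing track boundary leaves the table unchanged, which avoids creating empty tracks.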

In this embodiment, the audio data recorded in the recording unit 5 is audio data input from the analog input unit 2, but the embodiment is not limited to this.

For example, an audio file such as an MP3 file that was input from the digital input unit 6 and recorded in the recording unit 5 may be reproduced by the reproduction unit 9, divided by the encoding unit 4 while its audio is recognized by the voice recognition unit 8, and recorded in the recording unit 5. In this case, the user of the audio data processing apparatus 1 operates the key operation unit 13 to register a keyword while listening to the audio that is reproduced by the reproduction unit 9, converted into analog data by the DAC 10, amplified by the amplification unit 11, and output from the output unit 12.

With reference to FIG. 4, the following describes the flow of processing in which, while the audio data processing apparatus 1 is reproducing audio data, a keyword is detected in the audio being reproduced, the audio is divided at the position where the keyword was detected, and the result is saved as files. The processing is executed by the control unit 15.

This process starts when a recording start instruction is received from the key operation unit 13 while audio data recorded in the recording unit 5 is being reproduced.
As shown in FIG. 4, a file in a format such as MP3, WMA, or WAVE is first created in the recording unit 5, and the file is opened (S21).

Next, the audio data reproduced by the reproduction unit 9 is held temporarily (for example, for 2 seconds) in the buffer 14, then encoded by the encoding unit 4 and stored sequentially in the opened file (S22).

Next, if no keyword registration instruction is received from the key operation unit 13 (S23; No), the process proceeds to S29.
If a keyword registration instruction is received (S23; Yes), the voice recognition unit 8 performs voice recognition processing on the audio data temporarily held in the buffer 14, and the word that the audio data represents is identified (S24).

The word identified in S24 is then newly registered in the word DB 7 as a keyword (S25).

Next, storage of the audio data into the file is stopped at the position immediately before the word identified in S24, and the file is closed there (S26).

A new file is then opened (S27), and the audio data temporarily held in the buffer 14 is encoded by the encoding unit 4 from the position where storage was stopped in S26 and stored sequentially in this file (S28).

Next, the audio data reproduced by the reproduction unit 9 is input continuously to the voice recognition unit 8, in parallel with and synchronized with its input to the buffer 14, and voice recognition processing is performed. If the keyword registered in S25 is not detected (S29; No), the process proceeds to S33.
If the keyword is detected (S29; Yes), storage into the file of the audio data held in the buffer 14 is stopped at the position immediately before the detected keyword, and the file is closed there (S30).

A new file is then opened (S31), and the audio data temporarily held in the buffer 14 is encoded by the encoding unit 4 from the position where storage was stopped in S30 and stored sequentially in this file (S32).

Next, if there is no instruction from the key operation unit 13 to end reproduction (S33; No), the process returns to S23.
If an instruction to end reproduction is received (S33; Yes), the file is closed (S34) and the process ends.

The audio data reproduced by the reproduction unit 9 may be audio data that was input from the analog input unit 2 and recorded in the recording unit 5. The audio data input from the digital input unit 6 may also be streaming data rather than a file.

Instead of a single keyword, a plurality of different keywords may be registered, and the audio data may be recorded with the division position varied for each keyword.
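One way to vary the division position per keyword is to attach a rule to each registered keyword. The description does not define what the per-keyword positions are, so the "before" and "after" rules below are an assumption, as are the function name `split_by_rules` and the romanized example keywords.

```python
from typing import Dict, List


def split_by_rules(words: List[str], rules: Dict[str, str]) -> List[List[str]]:
    """Split a recognized word sequence using a per-keyword rule.

    `rules` maps a keyword to "before" (cut just before it)
    or "after" (cut just after it). Both rules are hypothetical.
    """
    segments: List[List[str]] = [[]]
    for word in words:
        rule = rules.get(word)
        if rule == "before":
            segments.append([])      # keyword opens the next segment
        segments[-1].append(word)
        if rule == "after":
            segments.append([])      # keyword closes the current segment
    # Drop empty segments created by cuts at the very start or end.
    return [segment for segment in segments if segment]


words = ["a", "soredewa", "b", "ijou", "c"]
rules = {"soredewa": "before", "ijou": "after"}
print(split_by_rules(words, rules))
# prints [['a'], ['soredewa', 'b', 'ijou'], ['c']]
```

A phrase that opens a topic would be cut before, and a phrase that closes one would be cut after, so each segment stays a complete unit.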

As described above, according to this embodiment, the audio data input from the analog input unit 2 is encoded by the encoding unit 4 and recorded in the recording unit 5 while being recognized by the voice recognition unit 8, and a keyword registration operation is performed with the key operation unit 13. A keyword can therefore be registered while the audio is being recorded, and there is no need to register keywords for segmenting the audio data before recording begins.

In addition, by registering keywords while listening to the audio being recorded or reproduced, the user can take the speaker's habits into account, such as which words the speaker uses frequently or which words the speaker uses when changing topics.

Further, the audio data converted into digital audio data by the ADC 3, or the audio data reproduced by the reproduction unit 9, is temporarily stored in the buffer 14. This allows the voice recognition unit 8 to easily perform voice recognition for keyword registration on audio that was recorded or reproduced a certain time earlier.
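The role of the buffer 14 can be illustrated with a fixed-length buffer that always holds the most recent audio, so that a registration request can look back over what was just heard. This is a sketch under assumptions: the class name RecentAudioBuffer, the frame-based interface, and the capacity are all hypothetical and are not specified in the description.

```python
from collections import deque


class RecentAudioBuffer:
    """Keeps only the most recent max_frames audio frames (hypothetical sketch)."""

    def __init__(self, max_frames: int) -> None:
        # A deque with maxlen discards the oldest frame automatically
        # once the buffer is full.
        self._frames = deque(maxlen=max_frames)

    def push(self, frame: bytes) -> None:
        """Store one newly converted or reproduced audio frame."""
        self._frames.append(frame)

    def snapshot(self) -> bytes:
        """Return the buffered audio, oldest frame first, for recognition."""
        return b"".join(self._frames)
```

When the user presses the registration key, the recognizer would be run on snapshot(), which covers the audio from a fixed time before the key press up to the present.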

Furthermore, by registering a plurality of different keywords and recording the audio data with a different division position for each registered keyword, the audio can easily be divided at various positions according to the intended use of the audio data.

A block diagram of the audio data processing apparatus.
A flowchart of the process of recording audio data during recording.
(a) is a diagram showing how a keyword is registered from a speaker's utterance and how the keyword is detected. (b) is an image diagram showing that a file is recorded each time a keyword is detected in the speaker's utterance. (c) is an image diagram showing that the audio data is divided by flag setting.
A flowchart of the process of recording audio data during reproduction.

Explanation of symbols

1 Audio data processing apparatus
2 Analog input unit
3 ADC
4 Encoding unit
5 Recording unit
6 Digital input unit
7 Word DB
8 Voice recognition unit
9 Reproduction unit
10 DAC
11 Amplification unit
12 Output unit
13 Key operation unit
14 Buffer
15 Control unit

Claims

An audio data processing apparatus having input means and audio data recording means for recording audio data input by the input means, the apparatus comprising:
voice recognition means for performing voice recognition of the audio data input by the input means;
operation means for accepting an operation at the time of input by the input means;
voice recognition data registration means for registering voice recognition data recognized by the voice recognition means, based on an operation accepted by the operation means at the time of input; and
recording control means for controlling the recording means so that the input audio data is divided and recorded, based on the voice recognition result of the input audio data in the voice recognition means and the voice recognition data registered in the voice recognition data registration means.

An audio data processing apparatus having reproduction means for reproducing audio data input from the outside or audio data recorded in advance in recording means, the apparatus comprising:
voice recognition means for performing voice recognition of the audio data reproduced by the reproduction means;
operation means for accepting an operation during audio reproduction by the reproduction means;
voice recognition data registration means for registering voice recognition data recognized by the voice recognition means, based on an operation accepted by the operation means during audio reproduction; and
recording control means for controlling the recording means so that the audio data is divided and recorded, based on the voice recognition result of the reproduced audio data in the voice recognition means and the voice recognition data registered by the voice recognition data registration means.

The audio data processing apparatus according to claim 1, further comprising a buffer for recording audio data, wherein
the buffer records audio data for a predetermined time according to an instruction from the operation means,
the voice recognition means performs voice recognition on the data recorded in the buffer, and
the voice recognition data registration means registers the voice recognition data recognized by the voice recognition means.

The audio data processing apparatus according to claim 1, wherein the division of the audio data performed by the recording control means is file division of the audio data.

The audio data processing apparatus according to claim 1, wherein the division of the audio data performed by the recording control means is track switching of the audio data.

The audio data processing apparatus according to claim 1, wherein the division of the audio data performed by the recording control means is flag setting for the audio data.

An audio data processing method including an input step and an audio data recording step of recording audio data input in the input step, the method comprising:
a voice recognition step of performing voice recognition of the audio data input in the input step;
an operation step of accepting an operation at the time of input in the input step;
a voice recognition data registration step of registering voice recognition data recognized in the voice recognition step, based on an operation accepted in the operation step at the time of input; and
a recording control step of controlling so that the input audio data is divided and recorded in the recording step, based on the voice recognition result of the input audio data in the voice recognition step and the voice recognition data registered in the voice recognition data registration step.

An audio data processing method having a reproduction step of reproducing audio data input from the outside or audio data recorded in advance in a recording step, the method comprising:
a voice recognition step of performing voice recognition of the audio data reproduced in the reproduction step;
an operation step of accepting an operation during audio reproduction in the reproduction step;
a voice recognition data registration step of registering voice recognition data recognized in the voice recognition step, based on an operation accepted in the operation step during audio reproduction; and
a recording control step of controlling so that the audio data is divided and recorded in the recording step, based on the voice recognition result of the reproduced audio data in the voice recognition step and the voice recognition data registered in the voice recognition data registration step.

A program for causing a computer to function as input means and audio data recording means for recording audio data input by the input means, the program further causing the computer to function as:
voice recognition means for performing voice recognition of the audio data input by the input means;
operation means for accepting an operation at the time of input by the input means;
voice recognition data registration means for registering voice recognition data recognized by the voice recognition means, based on an operation accepted by the operation means at the time of input; and
recording control means for controlling the recording means so that the input audio data is divided and recorded, based on the voice recognition result of the input audio data in the voice recognition means and the voice recognition data registered in the voice recognition data registration means.

A program for causing a computer to function as reproduction means for reproducing audio data input from the outside or audio data recorded in advance in recording means, the program further causing the computer to function as:
voice recognition means for performing voice recognition of the audio data reproduced by the reproduction means;
operation means for accepting an operation during audio reproduction by the reproduction means;
voice recognition data registration means for registering voice recognition data recognized by the voice recognition means, based on an operation accepted by the operation means during audio reproduction; and
recording control means for controlling the recording means so that the audio data is divided and recorded, based on the voice recognition result of the reproduced audio data in the voice recognition means and the voice recognition data registered by the voice recognition data registration means.