JP2015206928A

JP2015206928A - Voice processor, voice processing program, and voice processing method

Info

Publication number: JP2015206928A
Application number: JP2014087996A
Authority: JP
Inventors: Nobutoshi Fujisawa; Katsuaki Akama
Original assignee: Fujitsu Ltd
Current assignee: Fujitsu Ltd
Priority date: 2014-04-22
Filing date: 2014-04-22
Publication date: 2015-11-19
Anticipated expiration: 2034-04-22
Also published as: JP6409163B2

Abstract

PROBLEM TO BE SOLVED: To suppress fluctuations in the level of recorded voice when it is played back, and to improve the operability of a voice processor by eliminating the need for volume adjustment during playback of recorded data.
SOLUTION: A voice processor 2 having a recording function includes voice input means 4, voice processing means 8, and storage means 10. The voice input means captures voice at varying levels and generates recorded data. The voice processing means divides the recorded data generated by the voice input means into a plurality of frames, one per group of voice, and determines the level of the voice in each frame. When the level of the recorded data differs from a reference level (output volume level), it generates new recorded data whose level is adjusted on the basis of the reference level. The storage means stores the recorded data generated by the voice processing means.

Description

The technology of the present disclosure relates to voice processing technology for recorded data.

A voice processing device with recording and playback functions is used for communication, such as voice mail that sends recorded voice messages, and for voice memos that record voice messages, surrounding conversations, meetings, and the like. Some recording and playback functions of voice processing devices adjust the voice data before playback in order to eliminate variations in the volume of the recorded voice data. Variations in volume and similar properties of voice data arise, for example, when recording and playback are performed on different devices, or from the environment in which the recording is made.

For such adjustment of recorded data, it is known for an exchange that detects the level information of a stored voice message and plays it back to convert the level of the voice message according to that level information (for example, Patent Document 1). It is also known, when creating a voice mail, to attach information about the recording environment, to determine noise removal processing on the basis of that environment information, and to edit the content of the voice mail accordingly (for example, Patent Document 2).

Patent Document 1: JP-A-7-297928
Patent Document 2: JP 2004-236245 A

When voice is recorded as voice messages or voice memos, for example when the utterances of several people are recorded, the level of the recorded voice may vary widely because of differences in each speaker's voice level and in each speaker's distance from the recording device. Even when the voice of a single person is recorded, the level may vary because the speaker's movements change the distance to the recording device, or because the recording environment changes. Furthermore, when several pieces of recorded data are created, the voice level differs from one piece to the next owing to differences in the sound collection environment and in the speaker's condition.

Some voice processing devices with a recording function vary the sound collection sensitivity (dynamic range) of the microphone according to the recorded sound level, but this cannot suppress fluctuations in the level of the recorded voice. Therefore, if the level of the uttered voice fluctuates, the voice processing device records the voice with those fluctuations intact.

When recorded data with varying voice levels is played back, the user operates the volume control in response to the changes in loudness caused by the level changes. For example, when a low-level portion is played back, the output becomes quiet and hard to hear, so the user turns the volume up. Conversely, when a high-level portion is played back while the volume is set high, the speaker outputs it loudly, so the user turns the volume down. A voice processing device of this kind therefore requires constant volume adjustment during playback of recorded data, which the user finds troublesome.

Moreover, even if a voice processing device has a function that adjusts the level of recorded data for playback according to, for example, the recording environment or recording level, adjusting the level uniformly cannot remove volume fluctuations when the voice level varies within the recorded data.

Accordingly, an object of the technology of the present disclosure is to suppress fluctuations in the level of recorded voice during playback.

Another object of the technology of the present disclosure is to improve the operability of a voice processing device by making volume adjustment unnecessary during playback of recorded data.

To achieve the above objects, one aspect of the technology of the present disclosure is a voice processing device having a recording function, which includes voice input means, voice processing means, and storage means. The voice input means captures voice at varying levels and generates recorded data. The voice processing means divides the recorded data generated by the voice input means into a plurality of frames, one per group of voice, and determines the level of the voice contained in the recorded data for each frame. When the level of the recorded data differs from a reference level, it generates new recorded data whose level is adjusted on the basis of the reference level. The storage means stores the recorded data generated by the voice processing means.

The technology of the present disclosure provides at least one of the following effects.

(1) During playback of recorded data, fluctuations in the playback volume of the sound output from a speaker or similar device are suppressed, so recorded voice that is easy to hear can be provided.

(2) Because fluctuations in playback volume are suppressed, volume adjustment on the voice processing device that plays back the recorded data becomes unnecessary, improving convenience during playback.

FIG. 1 is a diagram showing an example of a voice processing device according to a first embodiment.
FIG. 2 is a flowchart showing an example of voice processing.
FIG. 3 is a diagram showing an example of level adjustment of recorded data.
FIG. 4 is a diagram showing an example of a voice processing device according to a second embodiment.
FIG. 5 is a diagram showing an example of how recorded data is divided.
FIG. 6 is a diagram showing an example of a volume level setting table.
FIG. 7 is a diagram showing an example of a recording data table.
FIG. 8 is a diagram showing an example of new recorded data generated by volume level adjustment.
FIG. 9 is a diagram showing adjustment processing of recorded data.
FIG. 10 is a flowchart showing an example of voice processing.
FIG. 11 is a flowchart showing an example of file creation processing.
FIG. 12 is a flowchart showing an example of file level conversion processing.
FIG. 13 is a flowchart showing an example of recorded data conversion processing.
FIG. 14 is a diagram showing an example of the state of voice processing according to a third embodiment.
FIG. 15 is a diagram showing adjustment processing of recorded data.
FIG. 16 is a diagram showing another example of voice processing.
FIG. 17 is a diagram showing another example of adjustment processing of recorded data.
FIG. 18 is a diagram showing an example of a voice processing state according to another embodiment.
FIG. 19 is a flowchart showing an example of voice processing.

[First Embodiment]

FIG. 1 shows an example of a voice processing device according to the first embodiment. The voice processing device 2 is an example of the voice processing device of the present disclosure.

The voice processing device 2 records voice, including words spoken by people (such as conversation) and surrounding sounds, and edits the recorded voice. The voice processing device 2 includes voice input means 4, a microphone 6, voice processing means 8, and storage means 10.

The voice input means 4 is an example of a recording function that captures external sound collected by the microphone 6 and generates recorded data. The voice input means 4 may also have a function for controlling the dynamic range (sound collection capability) of the microphone 6.

The microphone 6 collects, for example, voice uttered by one person at varying levels, or voice containing several people's voices at different levels. The voice level is, for example, the magnitude of sound pressure or volume, and indicates the magnitude of the voice signal captured by the microphone 6. The microphone 6 may be built into the voice processing device 2 or may be portable so that it can be pointed toward a speaker, and either a single microphone or several switchable microphones may be used.

The voice processing means 8 divides the captured recorded data into frames, one per predetermined group of voice, and adjusts the volume level of each of the resulting recording data files to generate new recording data files. When dividing the recorded data, a group of voice may be defined, for example, by the person who is speaking, or by each continuous utterance. In the level adjustment, volume levels are adjusted so as to eliminate variation in volume level among the frame-divided recording data files, and new recorded data is generated. The voice processing means 8 may also perform other functions, such as overall operation control of the voice processing device 2.

The storage means 10 is an example of means for storing the new recorded data generated for each frame by the voice processing means 8. Examples include memory built into the voice processing device 2, removable card-type IC memory, semiconductor memory, and magnetic disks. Magnetic tape such as cassette tape or DAT (Digital Audio Tape) may also be used as the storage means 10. The storage means 10 stores the newly generated recorded data, and recording data files are read from it in response to a request to the voice processing device 2 to play back recorded data.

In the voice processing device 2, the voice processing means 8 reduces the frame-to-frame variation in the voice level of the recorded data captured by the voice input means 4, adjusting the frames to a uniform level.

<About voice control processing>

FIG. 2 shows an example of voice processing. The processing procedure and content shown in FIG. 2 are examples, and the present invention is not limited to this configuration.

The voice processing is an example of the voice processing method or voice processing program of the present disclosure, and includes voice recording processing, analysis and division of recorded data, level conversion of recorded data, and generation of new recorded data.

In the voice recording processing, when voice processing starts, the recorded data is stored in a voice buffer formed in the storage means 10, and a recording data file is generated (S1). Next, either in parallel with the recording processing or after it has finished, the recorded data is divided into frames, one per group of voice (S2). In this frame division, for example, silent sections of the voice data are treated as the points where a continuous utterance ends or the speaker changes, and the recording data file is divided there.
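The silence-based division in S2 can be sketched as follows. This is a minimal illustration, not the patent's implementation: the sample representation (floats in [-1, 1]), the 10 ms analysis window, the RMS threshold, and the minimum silence length are all assumptions, and `split_at_silence` is a hypothetical name.

```python
import math

def split_at_silence(samples, sample_rate, threshold=0.02,
                     min_silence_s=0.5, window_s=0.01):
    """Return (start, end) sample indices of voiced frames (hypothetical helper).

    A frame is closed once the short-term RMS has stayed below `threshold`
    for at least `min_silence_s` seconds; shorter dips stay inside the frame.
    """
    win = max(1, int(sample_rate * window_s))            # samples per analysis window
    min_quiet = max(1, round(min_silence_s / window_s))  # quiet windows needed to split
    frames = []
    start = None      # start index of the frame being built, None while silent
    voiced_end = 0    # end index of the last window above the threshold
    quiet = 0         # consecutive quiet windows inside the current frame
    for i in range(0, len(samples), win):
        chunk = samples[i:i + win]
        rms = math.sqrt(sum(x * x for x in chunk) / len(chunk))
        if rms >= threshold:
            if start is None:
                start = i
            voiced_end = i + len(chunk)
            quiet = 0
        elif start is not None:
            quiet += 1
            if quiet >= min_quiet:
                # Silence lasted long enough: cut where the voice stopped.
                frames.append((start, voiced_end))
                start, quiet = None, 0
    if start is not None:  # recording ended while a frame was still open
        frames.append((start, voiced_end))
    return frames
```

For example, `split_at_silence([0.5]*10 + [0.0]*10 + [0.3]*5 + [0.0]*3, sample_rate=100, min_silence_s=0.05)` returns `[(0, 10), (20, 25)]`: the ten-sample gap is long enough to split, while the trailing three-sample gap simply ends the last frame.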

The voice level of the recording data file is determined for each divided frame (S3). The determined data is stored, for example, in the storage means 10 in table form. As shown in FIG. 3, this table is a recording data table 12A divided into frames 12-1, 12-2, 12-3, and so on. Each frame 12-1, 12-2, 12-3, ... stores, for example, the voice data together with its detected voice level.
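Step S3 and the resulting table 12A might be modeled as below. This is a sketch under assumptions: the patent does not fix how "level" is measured, so RMS in dBFS is used here, and `FrameEntry`, `rms_dbfs`, and `build_level_table` are hypothetical names. Frame boundaries are given as (start, end) sample indices.

```python
import math
from dataclasses import dataclass
from typing import List, Tuple

@dataclass
class FrameEntry:
    """One row of a table like 12A: detected level plus the frame's audio."""
    level_db: float       # RMS level in dBFS (assumed metric)
    samples: List[float]  # voice data of the frame

def rms_dbfs(samples: List[float]) -> float:
    """RMS level relative to full scale (1.0); -inf for digital silence."""
    rms = math.sqrt(sum(x * x for x in samples) / len(samples))
    return 20 * math.log10(rms) if rms > 0 else float("-inf")

def build_level_table(samples: List[float],
                      frames: List[Tuple[int, int]]) -> List[FrameEntry]:
    """Measure each (start, end) frame and store level and audio together (S3)."""
    return [FrameEntry(rms_dbfs(samples[s:e]), samples[s:e]) for s, e in frames]
```

Keeping the level next to the audio, as the table in FIG. 3 does, lets the later comparison step work on the table alone without re-measuring the audio.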

Once the level of the recorded data has been determined, it is judged whether that voice level matches a reference level, which is set in advance or at a predetermined timing (S4). This judgment calculates the difference between the level of the recorded voice and the reference level. In the adjustment of the recorded data, if there is no difference (YES in S4), the voice level is kept as is (S5).

If there is a difference (NO in S4), the level is updated so that the voice level becomes equal to the reference level (S6). To update the level, the voice processing means 8 changes the levels in the recording data table 12B of the storage means 10, as shown in FIG. 3. The voice processing means 8 rewrites table 12B so that the levels recorded in table 12A become equal to, or within a fixed range of, the reference level, and generates new recorded data with the adjusted levels (S7).
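Steps S4 to S7 could look roughly like this sketch, assuming levels are RMS values in dB and that "within a fixed range" means a tolerance in dB. The function name `adjust_to_reference` and the 1 dB default tolerance are hypothetical, and clipping of amplified samples is not handled.

```python
def adjust_to_reference(table, reference_db, tolerance_db=1.0):
    """Return a new list of (level_db, samples) pairs adjusted toward reference_db.

    table: list of (level_db, samples) pairs, as measured in S3.
    Frames within +/- tolerance_db of the reference, and silent frames
    (level -inf), are copied unchanged (S5); all others get a linear gain
    that moves their RMS level onto the reference (S6, S7).
    """
    adjusted = []
    for level_db, samples in table:
        if level_db == float("-inf") or abs(reference_db - level_db) <= tolerance_db:
            adjusted.append((level_db, list(samples)))
        else:
            # A dB difference d corresponds to an amplitude ratio of 10**(d/20).
            gain = 10 ** ((reference_db - level_db) / 20)
            adjusted.append((reference_db, [x * gain for x in samples]))
    return adjusted
```

Scaling every sample by the same gain shifts the frame's RMS level by exactly the dB difference, so the new table (the counterpart of 12B) can record the reference level directly. The input table is left untouched, mirroring how table 12B is written separately from 12A.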

With this configuration, fluctuations in the playback volume of sound output from a speaker or similar device are suppressed during playback of recorded data, so recorded voice that is easy to hear can be provided. Because playback volume fluctuations are suppressed, volume adjustment on the voice processing device that plays back the recorded data becomes unnecessary, improving convenience during playback.

[Second Embodiment]

FIG. 4 shows an example of a voice processing device according to the second embodiment. The configuration shown in FIG. 4 is an example, and the technology of the present disclosure is not limited to this configuration.

The voice processing device 20 has a function for recording voice collected by the microphone 6 and a function for adjusting the recorded data. Examples of the voice processing device 20 include a recording device, as well as a PC (Personal Computer) or portable information processing device capable of executing a recording function and either a recording program or a voice processing program for recorded data.

The voice processing device 20 includes, for example, a processor 22, a storage unit 24, a voice input/output unit 30, a volume level measurement unit 34, a voice amplification circuit 36, and a timer 37. The voice processing device 20 also includes an operation unit 38, a display unit 40, a communication unit 42, and the like.

The storage unit 24 consists of, for example, a ROM (Read Only Memory) 26 that stores programs for operating the voice processing device 20, recorded data, and the like, and a RAM (Random Access Memory) 28 that serves as the program execution area.

The ROM 26 is non-volatile memory that stores programs such as the OS (Operating System) of the voice processing device 20 and application programs for voice processing, as well as recording data files, the recording data table 70 (FIG. 7), and the like. The ROM 26 may be, for example, a magnetic disk such as an HDD (Hard Disk Drive), or semiconductor memory such as flash memory or an SSD (Solid State Drive).

The RAM 28 provides a voice buffer as a work area for voice processing, and the voice processing program is loaded into it. The processor 22 is arithmetic processing means that executes programs; it performs voice processing using the program loaded into the RAM 28.

The voice input/output unit 30 is an example of the voice input/output means of the present disclosure. It records voice input from the microphone 6, analyzes and divides recorded data, and plays back recording data files through a speaker 32.

The volume level measurement unit 34 is part of the voice processing means and analyzes the volume level of the recorded data captured by the voice input/output unit 30.

The voice amplification circuit 36 is an example of a circuit forming part of the voice processing means of the present disclosure. It adjusts the recorded data to the volume level set for each frame and is implemented, for example, with an amplifier. Based on instruction information in the generated recording data table 70, the voice amplification circuit 36 raises or lowers the volume level of the corresponding frame.

The communication unit 42 is an example of means for exchanging data with external communication equipment via a communication antenna 44. The voice processing device 20 may transmit voice-processed recorded data and the recording data table 70 to external communication equipment through the communication unit 42, for example as voice mail with a voice data file attached. The voice processing device 20 may also receive recording data files from external communication equipment.

<About captured recorded data>

FIG. 5 shows an example of how recorded data is divided. As shown in FIG. 5, recorded data captured by the microphone 6 shows a high volume level when someone nearby is speaking, and a low volume level during silence or for voice uttered far away or quietly. Over time, the recorded data also shows stretches of high volume level followed by stretches of low volume level, forming groups of voice. Such a group represents, for example, a person uttering a continuous series of words (conversation), or a change of the person speaking.

If recorded data with such varying volume levels is played back as is, high-level portions are reproduced loudly and low-level portions quietly. The voice processing device 20 divides the recorded data into frames, one per group of voice, and analyzes the voice level across the divided recording data files. Frame division is based, for example, on fluctuations in volume level; besides groups in which voice was uttered, groups of silence or of voice below a certain level are also split off as frames.

In the voice processing device 20, the volume level measurement unit 34 analyzes the volume level of each recording data file. The volume level of voice fluctuates even within a single recording data file, so the voice processing device 20 sets the volume level on the basis of the maximum or average of the volume levels measured by the volume level measurement unit 34 within that recording data file.

As shown in FIG. 6, the volume level is set using a volume level setting table 50 stored in the storage unit 24. In the volume level setting table 50, predetermined thresholds are defined against, for example, the detected volume level value. The thresholds may divide the range of sound pressure in the recorded data evenly, or may be set more finely within the range of sound pressure at which people tend to adjust the volume during playback.
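The two steps just described (collapsing a frame's level to its maximum or average, then looking it up in threshold table 50) can be sketched as follows. The threshold values, the 0 to 1 amplitude scale, the silence floor, and the names `frame_value` and `volume_level` are all hypothetical; FIG. 6's actual values are not reproduced here.

```python
import bisect

# Hypothetical table in the spirit of table 50: upper bounds (on a 0..1
# amplitude scale) of volume levels 1-4; anything at or above 0.50 is level 5.
THRESHOLDS = [0.05, 0.15, 0.30, 0.50]
SILENCE_FLOOR = 0.01  # below this the frame is treated as silence (level 0)

def frame_value(envelope, mode="max"):
    """Collapse a frame's level envelope to one number: its maximum or mean."""
    if mode == "max":
        return max(envelope)
    if mode == "mean":
        return sum(envelope) / len(envelope)
    raise ValueError(f"unknown mode: {mode}")

def volume_level(value, thresholds=THRESHOLDS, floor=SILENCE_FLOOR):
    """Map a measured value to a discrete volume level using the thresholds."""
    if value < floor:
        return 0
    # bisect_right counts how many thresholds are <= value.
    return bisect.bisect_right(thresholds, value) + 1
```

Because the thresholds are just a sorted list, spacing them unevenly (finer where listeners tend to reach for the volume control, as the text suggests) needs no code change.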

As shown in FIG. 7, the voice processing device 20 creates, for the captured recorded data, a recording data table 70 that associates volume level information with voice data for each frame, and stores it in the ROM 26. The recording data table 70 holds the results of analyzing the recorded data and also serves as instruction information when the recorded data is played back.

In the recording data table 70, for example, the set volume level information is stored for each frame as recording track A, and voice data including the uttered voice is stored as recording track B. The recording data table 70 arranges the frames in the chronological order in which they were recorded.
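As a minimal data-structure sketch of the recording data table 70 described above, each frame can carry a track A level and track B audio, kept in recording order. The class and field names are hypothetical, and using 0 for a silent frame follows the level "0" files described later for the second embodiment.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class Frame:
    index: int                  # 1-based position in recording order
    track_a_level: int          # recording track A: set volume level (0 = silence)
    track_b_audio: List[float]  # recording track B: voice data of the frame

@dataclass
class RecordingDataTable:
    frames: List[Frame] = field(default_factory=list)

    def append(self, level: int, audio: List[float]) -> None:
        """Add the next frame; frames stay in chronological order."""
        self.frames.append(Frame(len(self.frames) + 1, level, list(audio)))

    def levels(self) -> List[int]:
        """Track A read in time order, e.g. as playback instructions."""
        return [f.track_a_level for f in self.frames]
```

Separating the level (track A) from the audio (track B) means a later adjustment step can rewrite the levels without touching the stored voice data until the amplification stage.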

<About voice processing>

FIG. 8 shows how new recorded data is generated by adjusting the volume level.

As shown in FIG. 8A, when volume level 3, for example, is set as the output volume level, either in advance or by the user who initiates the voice processing, the voice processing device 20 stores this setting as control information. As shown in FIG. 8B, the recorded data before conversion varies in volume level, for example from one group of words to the next.

In the recorded data, relative to the set volume level 3, utterance portions X1a and X3a are recorded at a high volume and utterance portion X2a at a low volume. As shown in FIG. 9A, for example, this recorded data is registered in the recording data table 70 with volume level information 72, 74, and 76 of volume level 4, volume level 1, and volume level 4 for frames 1, 3, and 5, respectively, which are divided at groups of words.

As shown in FIG. 8C, the voice amplification circuit 36 lowers the volume of utterance portions X1a and X3a, which were recorded louder than the set volume level 3, converting them into utterance portions X1b and X3b. The voice amplification circuit 36 also amplifies utterance portion X2a, whose volume is below volume level 3, on the basis of volume level 3, converting it into utterance portion X2b. The voice amplification circuit 36 then generates new recorded data with the changed volume levels. Frames identified as silent sections are not subjected to any volume increase or decrease.

In this voice processing, as shown in FIG. 9B, volume level 3 is set as volume level information 72, 74, and 76 in the recording data table 70, and new recorded data whose volume is raised or lowered on the basis of this setting information is generated.
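The conversion from FIG. 9A to FIG. 9B can be sketched as below. This assumes that each discrete level corresponds to a representative peak amplitude (the `LEVEL_AMPLITUDE` values are invented for illustration) and that frames 2 and 4 are silent; `convert_to_output_level` is a hypothetical name. The gain for a frame is the ratio of the target level's amplitude to the frame's current one.

```python
# Hypothetical representative peak amplitude for each volume level 1-5.
LEVEL_AMPLITUDE = {1: 0.1, 2: 0.2, 3: 0.4, 4: 0.6, 5: 0.8}

def convert_to_output_level(frames, output_level=3):
    """Rewrite (level, audio) frames so every voiced frame sits at output_level.

    Silent frames (level 0) are copied without any gain, mirroring the rule
    that silent sections are neither raised nor lowered.
    """
    converted = []
    for level, audio in frames:
        if level == 0:
            converted.append((0, list(audio)))
            continue
        gain = LEVEL_AMPLITUDE[output_level] / LEVEL_AMPLITUDE[level]
        converted.append((output_level, [x * gain for x in audio]))
    return converted
```

With frames `[(4, ...), (0, ...), (1, ...), (0, ...), (4, ...)]`, the result carries levels `[3, 0, 3, 0, 3]`: frames 1 and 5 are attenuated (like X1a and X3a), frame 3 is amplified (like X2a), and the silent frames pass through.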

<Specific examples of voice processing>

FIGS. 10 to 13 are flowcharts showing an example of voice processing. The processing procedures and content shown in FIGS. 10 to 13 are examples.

This voice processing is an example of the voice processing method or voice processing program of the present disclosure. To decide whether voice input has started, the voice processing device 20 determines whether the recording function has been started, for example by a press of the operation unit 38 or by operation of a touch panel provided on the display unit 40 (S11). The captured recorded data is stored, for example, in a voice buffer formed in the voice input/output unit 30 or the storage unit 24 (S12). In the voice buffer, the volume level waveform of the recorded data may, for example, be detected in association with elapsed-time information.

As frame division processing, the voice processing device 20 measures the volume level, for example with the volume level measurement unit 34 (S13), and distinguishes uttered portions from silent portions. To extract the breaks between groups of utterances, the voice processing determines whether silence, or a volume level below a threshold, has continued for a predetermined time, for example 3 seconds or more (S14). If silence has not continued for the predetermined time (NO in S14), the voice processing device 20 judges that the change in volume level is intonation within a series of words, that is, that the same person is still speaking and the conversation continues, and keeps monitoring the volume level while recording.

When the volume level has remained silent or below the threshold for the predetermined time (YES in S14), the audio processing device 20 creates a file containing the recorded audio divided into a frame (S15) and adds a file of volume level “0” for the silent portion (S16). The recording data may be divided, for example, at the moment the volume level fell below the threshold, that is, when timing of the predetermined time began. From the point of frame division until the next audio input is detected, the device creates recording data of volume level “0”.
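Given a list of per-block levels, the frame division of S14 to S16 can be sketched as follows. This is a simplified offline version that works on a finished list of levels, whereas the device does the same thing while recording; `split_frames` and its span format are hypothetical.

```python
def split_frames(levels, threshold, hold):
    """Split per-block levels into ("voice", start, end) and ("silence", start, end)
    spans, with end exclusive.

    A run of `hold` or more blocks below `threshold` closes the current voiced
    frame at the point where the run began (S15) and becomes a level-0 silence
    span (S16); shorter dips stay inside the voiced frame (S14 NO).
    """
    spans = []
    voice_start = None
    i, n = 0, len(levels)
    while i < n:
        if levels[i] < threshold:
            j = i
            while j < n and levels[j] < threshold:
                j += 1
            if j - i >= hold:
                if voice_start is not None:
                    spans.append(("voice", voice_start, i))
                    voice_start = None
                spans.append(("silence", i, j))
            elif voice_start is None:
                voice_start = i  # short dip before speech: keep it with the speech
            i = j
        else:
            if voice_start is None:
                voice_start = i
            i += 1
    if voice_start is not None:
        spans.append(("voice", voice_start, n))
    return spans
```

For example, with a hold of 3 blocks, the levels `[0.5, 0.5, 0, 0, 0, 0.5, 0, 0.5]` give a voiced frame, a silent frame, and a second voiced frame; the single quiet block near the end stays inside the second voiced frame.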

The audio processing device 20 determines whether audio input has ended (S17). If, for example, the user has performed an operation to stop the recording function, audio input is judged to have ended (YES in S17) and the process moves to level conversion of the stored audio files (S18). If audio input has not ended (NO in S17), recording and the detection of silence or sub-threshold volume levels resume when audio is detected again.

The level conversion of the audio files need not wait until recording has finished. The audio processing device 20 may instead convert the level of each frame of recording data accumulated in the audio buffer while recording is still in progress.

<File creation process>

In the file creation process of S15, as shown for example in FIG. 11, a recording data table 70 is created for the frame-divided recording data files, indicating the volume level associated with the audio data. The recording data table 70 holds the volume settings used when the recording data is played back, and it is created after frame division so that it does not include silence or audio below the threshold.

The audio processing device 20 reads, for example, the volume level setting table 50, compares the waveform information of the stored recording data with the volume level thresholds, and determines the volume level setting value (S21). The determined volume level is stored in the recording data table 70 in association with the audio data (S22, S23). The completed recording data table 70 (recording data file) is stored in the storage unit 24 (S24).
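The contents of the volume level setting table 50 are not given in this section, so the sketch below invents a small table purely for illustration: ascending lower bounds in dBFS, each mapped to an integer set value. It shows the threshold comparison of S21 and the frame-to-level table of S22 and S23, with a Python dict standing in for the recording data table 70. `LEVEL_BOUNDS_DB`, `SET_VALUES`, and both function names are hypothetical.

```python
from bisect import bisect_right

# Hypothetical contents for volume level setting table 50: ascending lower bounds
# in dBFS, and the set value assigned to levels at or above each bound.
LEVEL_BOUNDS_DB = [-60.0, -40.0, -30.0, -20.0, -10.0]
SET_VALUES = [1, 2, 3, 4, 5]


def set_value_for(level_db):
    """S21: map a measured level to a set value; below the lowest bound -> 0."""
    idx = bisect_right(LEVEL_BOUNDS_DB, level_db)
    return 0 if idx == 0 else SET_VALUES[idx - 1]


def build_recording_table(frame_levels_db):
    """S22-S23: frame number (1-based) -> set value, standing in for table 70."""
    return {
        frame_no: set_value_for(level_db)
        for frame_no, level_db in enumerate(frame_levels_db, start=1)
    }
```

`bisect_right` keeps the lookup logarithmic in the table size and places a level that exactly equals a bound in the band that starts at that bound.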

<File level conversion process>

In the file level conversion process, the audio level of each created recording data file is set to a predetermined value. As shown for example in FIG. 12, the audio processing device 20 reads the volume level of each created recording data file in the time-frame order of the entire recording (S31). The device then reads the audio data in time-frame order (S32), converts the recording data (S33), and creates new recording data in which the converted volume level is associated with the audio data. The conversion both amplifies or attenuates the audio data and updates the volume level setting stored in the recording data table 70.

The level conversion is performed frame by frame and repeated until all frames have been converted (S34). When all frames have been converted (YES in S34), the audio processing device 20 stores the converted recording data files in the storage unit 24 in time-frame order (S35).
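The loop of S31 to S35 can be sketched as a driver that visits frames in time order, applies a per-frame conversion, and builds the new level table alongside the converted audio. The frame representation and the `convert` callback are assumptions made for this example; `convert_recording` is a hypothetical name, and the actual per-frame conversion is whatever S33 performs.

```python
def convert_recording(frames, convert):
    """Sketch of S31-S35.

    frames:  dict frame_no -> (start_sec, samples, level_db)
    convert: callable (samples, level_db) -> (new_samples, new_level_db)
    Returns the converted frames as (frame_no, start_sec, samples) in time order,
    plus a new frame_no -> level table; storing both (S35) is left to the caller.
    """
    out_frames, new_table = [], {}
    # S31/S32: walk the frames in the time order of the whole recording
    for frame_no, (start, samples, level_db) in sorted(frames.items(), key=lambda kv: kv[1][0]):
        new_samples, new_level = convert(samples, level_db)  # S33
        out_frames.append((frame_no, start, new_samples))
        new_table[frame_no] = new_level
    return out_frames, new_table
```

Passing the conversion in as a function keeps the ordering logic separate from the gain rule, so the same driver works whether frames are amplified, attenuated, or both.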

<Recording data conversion process>

In the recording data conversion process, as shown for example in FIG. 13, the audio processing device 20 acquires the set output volume level (S41) and then determines, frame by frame, whether the volume level of the recording data file is “0” (S42). Besides complete silence (volume level “0”), this check may also test whether the level is below the threshold used to judge silence.

If the volume level of the recording data is not “0” (or is at or above the threshold) (NO in S42), the device determines whether the volume level of the recording data file is lower than the output volume level (S43). If the volume level is “0” or below the threshold (YES in S42), the volume level (“0”) of the recording data file and the silent audio data are copied to the output file unchanged (S44). The output file is, for example, a new recording data table 70 formed in the storage unit 24.

If the volume level of the recording data file is lower than the output volume level (YES in S43), it is amplified up to the output volume level (S45). If it is higher than the output volume level (NO in S43), it is attenuated down to the output volume level (S46). In this conversion, the difference between the volume level of the recording data and the set output volume level may be calculated, and the level amplified or attenuated on the basis of that difference. The amplification or attenuation is performed, for example, by the audio amplification circuit 36.
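A minimal sketch of S42 to S46 for a single frame follows, assuming levels in dBFS and float samples in [-1.0, 1.0]. The difference between the output level and the frame level is turned into a linear gain of 10^(diff/20), which amplifies when positive and attenuates when negative. The silence threshold of -60 dBFS and the clamping to full scale are assumptions not taken from the source, and `convert_frame` is a hypothetical name.

```python
def convert_frame(samples, level_db, output_db, silence_db=-60.0):
    """Sketch of S42-S46 for one frame of float samples in [-1.0, 1.0].

    Returns (new_samples, new_level_db). silence_db is an assumed stand-in for
    the threshold below which a frame counts as level "0".
    """
    if level_db < silence_db:
        # S42 YES -> S44: a silent frame is copied to the output unchanged
        return list(samples), level_db
    diff_db = output_db - level_db  # difference from the set output volume level
    gain = 10 ** (diff_db / 20)     # > 1 amplifies (S45), < 1 attenuates (S46)
    # Clamp to full scale so amplification cannot overflow; the returned level is
    # the nominal output level and ignores any clipping.
    return [max(-1.0, min(1.0, s * gain)) for s in samples], output_db
```

For a frame at -20 dBFS and an output level of -6 dBFS, the difference is 14 dB and each sample is multiplied by about 5.01.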

The audio processing device 20 then saves the frame volume level of the output file for the amplified or attenuated recording data file (S47) and sets that output-file volume level in the recording data table 70 (S48).

With this configuration, fluctuation in the playback volume of audio output from a speaker or the like is suppressed when the recording data is played back, providing recorded audio that is easy to hear. Because the playback volume no longer fluctuates, the user does not need to adjust the volume of the device that plays back the recording data, which improves convenience during playback. In addition, because silent sections are monitored and only the portions that contain speech are level-converted, noise is not amplified during playback and the resulting audio data is easy to listen to.

[Third Embodiment]

FIG. 14 shows an example of audio processing according to the third embodiment. As shown in FIG. 14A, an output volume level is set in the audio processing device 20 as the volume level setting. In the recording data before conversion, as shown in FIG. 14B, the volume level varies up and down, for example from one cluster of words to the next.

In this recording data, the utterance portion X2a was recorded at a lower volume than the set output volume level. This embodiment shows the case where only the portions of the recording data whose volume level is lower than the output volume level are amplified.

Accordingly, as shown in FIG. 14C, the audio amplification circuit 36 amplifies the utterance portion X2a, whose volume is lower than the output volume level, on the basis of the set output volume level, converting it into the utterance portion X2b. As shown in FIG. 15A, in the recording data table 70 of the recording data file before conversion, volume level information 80A is set for frame 3, which corresponds to the utterance portion X2a. When the volume level of the utterance portion X2a has been converted, the audio processing device 20 sets the output volume level in the volume level information 80B, as shown in FIG. 15B.
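The table update of the third embodiment (volume level information 80A becoming 80B) can be sketched on a frame-to-level table in dBFS. Only voiced frames quieter than the output level are raised; louder frames and silent frames keep their recorded level. The -60 dBFS silence threshold and the name `amplify_quiet_frames` are assumptions for this example.

```python
def amplify_quiet_frames(table, output_db, silence_db=-60.0):
    """Amplify-only sketch: return a copy of a frame_no -> level_db table in which
    every voiced frame quieter than output_db (like frame 3 / X2a) is raised to
    output_db. Louder frames and frames below silence_db keep their level."""
    return {
        frame_no: (output_db if silence_db <= level < output_db else level)
        for frame_no, level in table.items()
    }
```

With an output level of -20 dBFS, a frame recorded at -32 dBFS is raised to -20 dBFS, while a frame at -10 dBFS is left as recorded because this embodiment does not attenuate.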

<Other examples of audio processing>

In the recording data shown for example in FIG. 16B, the utterance portions X1a and X3a were recorded at a higher volume than the set output volume level. For such recording data, the audio processing device 20 uses the set output volume level as the reference and attenuates the volume level of each frame of the recording data file whose level is higher than the output volume level.

Accordingly, as shown in FIG. 16C, the audio amplification circuit 36 attenuates the utterance portions X1a and X3a, whose volume is higher than the output volume level, on the basis of the output volume level, converting them into the utterance portions X1b and X3b. As shown in FIG. 17A, in the recording data table 70 of the recording data file before conversion, volume level information 82A and 84A is set for frame 1, corresponding to the utterance portion X1a, and for frame 5, corresponding to the utterance portion X3a. When the volume levels of the utterance portions X1a and X3a have been converted, the audio processing device 20 sets the output volume level in the volume level information 82B and 84B, as shown in FIG. 17B.

Whether the recording data is amplified or attenuated may be selected freely by the user of the audio processing device 20 or set in advance. Alternatively, the audio processing device 20 may choose between amplification and attenuation for the variations in volume among the utterance portions X1a, X2a, X3a, and so on, using the frame closest to the output volume level as the reference.
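The user- or preset-selected choice between amplification and attenuation can be sketched as a mode parameter that decides which per-frame gains (in dB) are applied. This models only the explicit selection, not the automatic rule based on the frame closest to the output level; the mode strings, the -60 dBFS silence threshold, and `frame_gains_db` are all hypothetical.

```python
def frame_gains_db(frame_levels_db, output_db, mode="both", silence_db=-60.0):
    """Per-frame gain in dB toward output_db.

    mode: "amplify" raises only quiet frames (like X2a), "attenuate" lowers only
    loud frames (like X1a and X3a), "both" does both. Frames below silence_db
    are never touched.
    """
    if mode not in ("amplify", "attenuate", "both"):
        raise ValueError(f"unknown mode: {mode}")
    gains = []
    for level in frame_levels_db:
        diff = output_db - level
        if level < silence_db or diff == 0:
            gains.append(0.0)
        elif diff > 0 and mode in ("amplify", "both"):
            gains.append(diff)
        elif diff < 0 and mode in ("attenuate", "both"):
            gains.append(diff)
        else:
            gains.append(0.0)
    return gains
```

Checking the silence threshold first also avoids an infinite gain for a frame whose level is minus infinity.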

With this configuration, variation in volume between frames is reduced when recording data is played back, sparing the user the trouble of adjusting the volume. Because the audio processing keeps the volume from varying during playback, it is also easier for the user to follow the content of the speech from one frame to the next.

[Other Embodiments]

Modifications of the embodiments described above are listed below.

(1) In the above embodiments, the audio processing device 2 processes recording data captured by the microphone 6, but this is not limiting. The audio processing device 2 may receive recording data from an external communication device, divide the audio contained in that recording data into frames, and adjust the volume level for each frame.

(2) In the above embodiments, the audio contained in a single piece of recording data, either recorded by the audio processing device 2 or received from outside, is divided into frames and its volume level adjusted, but this is not limiting. The audio processing device 2 may analyze the variation in volume level among multiple pieces of recording data and adjust their volume levels. This prevents the playback volume from varying from one piece of recording data to the next when they are played back consecutively, for example recording data divided in advance by the recording device, or a voice mail with multiple recordings attached.
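One way to picture modification (2) is to compute a single gain per recording so that recordings played back to back come out at a common level. The source does not say which common level is used; the sketch below assumes the median of the recordings' levels when no explicit reference is given. `gains_across_recordings` is a hypothetical name.

```python
from statistics import median


def gains_across_recordings(recording_levels_db, reference_db=None):
    """Return one gain in dB per recording so that all recordings reach a common
    level. Without an explicit reference, the median of the recordings' levels is
    used (an assumption, not taken from the source)."""
    ref = median(recording_levels_db) if reference_db is None else reference_db
    return [ref - level for level in recording_levels_db]
```

Using the median rather than the mean keeps one unusually loud or quiet attachment from pulling every other recording away from its original level.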

(3) In the above embodiments, the audio processing device 20 monitors for silence, that is, volume below the threshold, while recording, and divides the recording into frames when the silence continues for a certain time, but this is not limiting. The audio processing device 20 may, for example, divide the recording into frames at fixed time intervals to create recording data files, monitor the volume level of each frame, and amplify or attenuate it.

As shown in FIG. 18A, the audio processing device 20 stores recording data whose volume level varies over time. It cuts the recording data into frames at a set time t of, for example, every 3 seconds to create recording data files, and, as shown in FIG. 18B, creates a recording data table 90 that manages the information on those files. In the recording data files, for example, audio data 1 contained in the first utterance is recorded as audio data 1a and 1b in frame 1 and frame 2. That is, because audio data 1 spans at least two frames, it is longer than a single 3-second frame. Frame 3, for example, then records a silent section.

In this way, the audio processing device 20 may create recording data files based on the time measured by the timer 37 from the start of recording, regardless of silent sections. It may then determine the volume level of each created recording data file and amplify or attenuate it.

In this audio processing, as shown for example in FIG. 19, when audio input starts (YES in S51), the audio processing device 20 starts timing with the timer 37 (S52). It records the recording data input from the microphone 6 in the audio buffer (S53) and monitors whether the frame division time t has elapsed (S54). When t has elapsed (YES in S54), the recording data is cut into a frame (S55) and the process moves on to creating the recording data file (S56). After the recording data files are created, each one is converted to the output volume level by the file level conversion process (S57). The file creation process (S56) and the file level conversion process (S57) can be the same as S15 and S18 described above (FIG. 10), so their description is omitted.
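The fixed-interval cutting of S54 and S55 reduces to slicing the sample stream every t seconds. The sketch below works on a finished list of samples rather than a live timer, and `cut_fixed_frames` is a hypothetical name; t defaults to the 3-second example above.

```python
def cut_fixed_frames(samples, sample_rate, t_sec=3.0):
    """Cut the recording every t_sec seconds regardless of silence (S54-S55).

    The last frame may be shorter than t_sec.
    """
    step = int(round(t_sec * sample_rate))
    if step <= 0:
        raise ValueError("t_sec * sample_rate must be at least one sample")
    return [samples[i:i + step] for i in range(0, len(samples), step)]
```

Because every full frame has the same length, the buffer only ever needs to hold one frame's worth of samples, which matches the memory argument made for this modification.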

The audio processing device 20 may perform the file creation and file level conversion on the created recording data files while recording is still in progress. Consecutive recording data files that contain audio data may also be joined into a single piece of recording data.
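Joining consecutive voiced frames, such as 1a and 1b of audio data 1, can be sketched as a single pass over frames in time order. Representing each frame as an `(is_voiced, samples)` pair is an assumption for this example, and `merge_adjacent_voiced` is a hypothetical name.

```python
def merge_adjacent_voiced(frames):
    """frames: list of (is_voiced, samples) in time order.

    Consecutive voiced frames are joined into one; silent frames are kept as is.
    """
    merged = []
    for voiced, samples in frames:
        if voiced and merged and merged[-1][0]:
            merged[-1] = (True, merged[-1][1] + list(samples))
        else:
            merged.append((voiced, list(samples)))
    return merged
```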

(4) In the above embodiments, the volume level conversion of the recording data files amplifies or attenuates the volume level with reference to a single output volume level, but this is not limiting. Several output volume levels may be set, and each recording data file may be converted on the basis of any one of them.

The output volume level may be set, for example, by the user for each file, or on the basis of time information from the recording process. Alternatively, output volume levels may be registered separately, or selected to suit the voice quality of the speaker, for example by analyzing the audio.
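Selecting one of several output volume levels per file could look like the sketch below: an explicit per-file choice by the user wins, and otherwise the preset nearest to the file's own level is used. The nearest-preset fallback is an assumed rule chosen only to keep level changes small; the source also allows time-based or speaker-based selection, which is not modeled here. `pick_output_level` is a hypothetical name.

```python
def pick_output_level(file_level_db, presets_db, user_choice_db=None):
    """Choose the output level for one recording data file.

    A user's per-file choice wins; otherwise the preset nearest to the file's
    own level is used (an assumed rule, not taken from the source).
    """
    if user_choice_db is not None:
        return user_choice_db
    return min(presets_db, key=lambda preset: abs(preset - file_level_db))
```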

With this configuration, because the audio processing device 20 creates recording data files without waiting for a silent section, it can perform the audio processing even when the RAM that holds the audio buffer has little capacity. Keeping every divided frame of the recording data the same length also reduces the load on the audio buffer during recording. Furthermore, dividing the recording data at fixed intervals allows long stretches of speech to be analyzed in finer detail and amplified or attenuated piece by piece, which reduces volume variation and produces recording data at a volume that is easy to follow.

Next, the following supplementary notes are disclosed with respect to the embodiments described above. The technical idea of the present disclosure can be understood at various levels and in various variations, from broader to narrower concepts, and the technology of the present disclosure is not limited to the following supplementary notes.

(Supplementary note 1) An audio processing device having a recording function, comprising:
audio input means for capturing audio of differing levels and generating recording data;
audio processing means for dividing the recording data generated by the audio input means into a plurality of frames, one for each chunk of speech, determining for each frame the level of the audio contained in the recording data, and, when the level of the recording data differs from a reference level, generating new recording data in which the level of the recording data is adjusted on the basis of the reference level; and
storage means for storing the recording data generated by the audio processing means.

(Supplementary note 2) The audio processing device according to supplementary note 1, wherein the audio processing means determines the difference between the level of the captured recording data and the reference level and increases or decreases the level of each frame on the basis of that difference.

(Supplementary note 3) The audio processing device according to supplementary note 1 or 2, wherein the audio processing means monitors the volume level of the recording data and adjusts the volume level of the recording data on the basis of the reference level.

(Supplementary note 4) The audio processing device according to any one of supplementary notes 1 to 3, wherein the audio processing means determines whether the level of the recording data is at or below a threshold and, when it is, divides the recording data into frames.

(Supplementary note 5) The audio processing device according to supplementary note 4, further comprising a timer that measures recording time,
wherein, when the level of the recording data has remained at or below the threshold for a predetermined time or longer, the audio processing means divides the recording data to create a low-level frame and adjusts the level of frames other than the low-level frame.

(Supplementary note 6) An audio processing method for an audio processing device having a recording function, the method comprising:
capturing audio of differing levels and generating recording data;
dividing the generated recording data into a plurality of frames, one for each chunk of speech; and
determining the level of the recording data for each frame and, when the level of the recording data differs from a reference level, generating new recording data in which the level of the recording data is adjusted on the basis of the reference level.

(Supplementary note 7) An audio processing program that causes a computer of an audio processing device having a recording function to execute a process comprising:
capturing audio of differing levels and generating recording data;
dividing the generated recording data into a plurality of frames, one for each chunk of speech; and
determining the level of the recording data for each frame and, when the level of the recording data differs from a reference level, generating new recording data in which the level of the recording data is adjusted on the basis of the reference level.

Preferred embodiments of the configuration of the present disclosure have been described above. However, the technology of the present disclosure is not limited to the description of the above embodiments. Those skilled in the art can of course make various modifications and changes based on the gist of the technology described in the claims or disclosed in the specification, and such modifications and changes are naturally included in the technology of the present disclosure.

2, 20 Audio processing device
4 Audio input means
6 Microphone
8 Audio processing means
10 Storage means
12A, 12B Recording data table
12-1, 12-2, ... Frame
22 Processor
24 Storage unit
26 ROM
28 RAM
30 Audio input/output unit
32 Speaker
34 Volume level measurement unit
36 Audio amplification circuit
37 Timer
50 Volume level setting table
70, 90 Recording data table
72, 74, 76, 80A, 80B, 82A, 82B, 84A, 84B Volume level information

Claims

1. An audio processing device having a recording function, comprising:
audio input means for capturing audio of differing levels and generating recording data;
audio processing means for dividing the recording data generated by the audio input means into a plurality of frames, one for each chunk of speech, determining for each frame the level of the audio contained in the recording data, and, when the level of the recording data differs from a reference level, generating new recording data in which the level of the recording data is adjusted on the basis of the reference level; and
storage means for storing the recording data generated by the audio processing means.

2. The audio processing device according to claim 1, wherein the audio processing means determines the difference between the level of the captured recording data and the reference level and increases or decreases the level of each frame on the basis of that difference.

3. The audio processing device according to claim 1, wherein the audio processing means determines whether the level of the recording data is at or below a threshold and, when it is, divides the recording data into frames.

4. An audio processing method for an audio processing device having a recording function, the method comprising:
capturing audio of differing levels and generating recording data;
dividing the generated recording data into a plurality of frames, one for each chunk of speech; and
determining the level of the recording data for each frame and, when the level of the recording data differs from a reference level, generating new recording data in which the level of the recording data is adjusted on the basis of the reference level.

5. An audio processing program that causes a computer of an audio processing device having a recording function to execute a process comprising:
capturing audio of differing levels and generating recording data;
dividing the generated recording data into a plurality of frames, one for each chunk of speech; and
determining the level of the recording data for each frame and, when the level of the recording data differs from a reference level, generating new recording data in which the level of the recording data is adjusted on the basis of the reference level.