JP7461090B1

JP7461090B1 - Audio processing device, audio processing method, and program

Info

Publication number: JP7461090B1
Application number: JP2023202151A
Authority: JP
Inventors: 健太郎中島
Original assignee: Azstoke
Current assignee: Azstoke
Priority date: 2023-11-29
Filing date: 2023-11-29
Publication date: 2024-04-03
Anticipated expiration: 2043-11-29

Abstract

[Problem] To provide an advantageous technique for automatic volume adjustment of multiple audio files. [Solution] An audio processing device includes a storage unit that stores middleware, which is software for processing audio, and a digital audio workstation (DAW), which is software for processing audio that is different from the middleware, and a processor that executes the middleware and the DAW. On the DAW, the processor acquires the amount of change in the volume value of the audio on the middleware, based on routing information that is set in the middleware and indicates the path along which audio recorded in an audio file travels until it reaches the output. [Selected Figure] FIG. 1

Description

The present invention relates to an audio processing device, an audio processing method, and a program.

In applications that handle multiple audio files, it is often desirable for each file to be adjusted to its designated volume. For example, in a game, if the volume of the same character's action sound (for example, footsteps) varies greatly from scene to scene, the user may find it unnatural. Developers therefore spend considerable effort adjusting the volume of the many audio files incorporated into a game.

Conventionally, volume adjustment for a plurality of audio files has been performed, for example, by the following procedure.
(a) The delivered audio files are stored in a storage device.
(b) A reference audio file is listened to and compared with one audio file selected from the plurality of audio files.
(c) The signal level of the selected audio file is adjusted so that the two sound equally loud.
(d) Steps (b) and (c) are repeated for the remaining unprocessed audio files.

Note that the signal level adjustment performed in step (c) is not limited to modifying the audio data itself. For example, Patent Document 1 describes storing an automatic volume adjustment element in association with audio data and using that element to adjust the volume when the audio data is reproduced. Patent Document 2 describes adding a playback control identifier relating to playback volume to the file name of a music file and using that identifier to adjust the volume when the music file is played.

Patent Document 1: JP 2003-243952 A
Patent Document 2: JP 2011-197664 A

However, the number of audio files used in a game, for example, can reach tens of thousands or more. Adjusting the volume of that many audio files one by one requires an enormous amount of work, so there is a demand to reduce this labor by automating volume adjustment across multiple audio files. In addition, game development uses two software tools to create and adjust audio files: audio middleware and a digital audio workstation (DAW). However, on the DAW it has not been possible to know how the volume of each audio file has been adjusted in the middleware, so volume adjustment could not take the middleware's adjustment results into account.
The present invention provides an advantageous technique for automatic volume adjustment of multiple audio files.

According to one aspect of the present invention, there is provided an audio processing device for processing audio, comprising: a storage unit that stores middleware, which is software for processing audio, and a digital audio workstation (DAW), which is software for processing audio that is different from the middleware; and a processor that executes the middleware and the DAW, wherein the processor acquires, on the DAW, the amount of change in the volume value of the audio on the middleware based on routing information, set in the middleware, that indicates the path along which audio recorded in an audio file reaches the output.

According to the present invention, an advantageous technique for automatic volume adjustment of multiple audio files can be provided.

FIG. 1 is a block diagram showing the configuration of an audio processing device according to an embodiment.
FIG. 2 is a diagram showing an example of the structure of a loudness table.
FIG. 3 is a diagram illustrating a loudness value setting screen.
FIG. 4 is a conceptual diagram illustrating the hierarchical structure of audio on middleware.
FIG. 5 is a diagram illustrating a setting screen for acquiring the amount of change in volume value on middleware.
FIG. 6 is a flowchart of the audio processing method.
FIG. 7 is a diagram illustrating a display example of audio waveforms.

Embodiments will now be described in detail with reference to the accompanying drawings. The following embodiments do not limit the claimed invention, and not all combinations of the features described in the embodiments are necessarily essential to the invention. Two or more of the features described in the embodiments may be combined as desired. The same or similar components are given the same reference numerals, and duplicate description is omitted.

FIG. 1 is a block diagram showing the configuration of an audio processing device C according to an embodiment. The audio processing device C displays audio signals recorded in files and performs various kinds of processing on them, such as signal level adjustment. As used herein, the term "audio" should be understood in a broad sense: it may include not only voices produced by humans or animals but also musical sounds, computer-generated sound effects, and the like. That is, the term "audio" as used herein is intended to encompass "speech," "sound," and "audio (acoustics)."

The audio processing device C may be a computer such as a personal computer or a workstation. It includes a CPU (central processing unit) 101 that controls the entire device, a RAM 102 that serves as main memory and provides a work area for the CPU 101, and a ROM 103 that stores fixed data and programs. The audio processing device C also includes an audio interface (I/F) 104, to which a microphone M and a speaker S can be connected. A storage device (secondary storage device) 110 (storage unit) is connected to the audio processing device C via an interface (I/F) 105. The storage device 110 may be, for example, a hard disk drive (HDD), a solid state drive (SSD), or a combination of these, and may be located inside or outside the audio processing device C. A network interface 106 connects to a network N for communication; for example, the audio processing device C can be communicatively connected to a server A via the network N.

An input device K such as a keyboard or mouse can be connected to the audio processing device C via an interface 107, and an external media device F such as a CD-ROM drive or DVD drive can be connected via an interface 108. The audio processing device C further includes a video controller 109, which controls image display on a display device (display) D. The input device K and the display D may be integrated into a touch panel screen.

A boot program for starting the audio processing device C is stored in the ROM 103. As shown in FIG. 1, an operating system (OS) 111 and one or more audio files 113 may be installed in the storage device 110. The audio files 113 may be supplied from an external device such as the server A via the network N, or from a medium loaded in the external media device F. Alternatively, an audio file 113 may be created from sound picked up by the microphone M. The storage device 110 also stores a loudness table 114, which will be described later.

The audio file 113 is a file in which audio content is recorded. In one example, the audio file 113 may be in the WAVE file format commonly used on personal computers. A WAVE file may include a header and audio signal data. The header may include information such as the channel type (mono/stereo), the sampling frequency, and the quantization bit depth. The file format of the audio file 113 is not limited to WAVE; it may be another format such as AIFF, MP3, or AAC.
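The header fields named above can be read with Python's standard `wave` module. The sketch below is illustrative only: the function name `describe_wav` is hypothetical, and it handles uncompressed PCM WAVE files only (AIFF, MP3, and AAC would need other readers).

```python
import io
import wave

def describe_wav(source):
    """Return the header fields of a PCM WAVE file.

    `source` may be a file path or a binary file-like object.
    """
    with wave.open(source, "rb") as wf:
        channels = wf.getnchannels()
        return {
            "channels": {1: "mono", 2: "stereo"}.get(channels, f"{channels}ch"),
            "sample_rate_hz": wf.getframerate(),       # sampling frequency
            "bits_per_sample": wf.getsampwidth() * 8,  # quantization bit depth
            "frames": wf.getnframes(),
        }

# Usage: write a short silent 16-bit stereo file to memory, then inspect it.
buf = io.BytesIO()
with wave.open(buf, "wb") as out:
    out.setnchannels(2)
    out.setsampwidth(2)
    out.setframerate(48000)
    out.writeframes(b"\x00\x00" * 2 * 480)  # 480 stereo frames of silence
buf.seek(0)
print(describe_wav(buf))
```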

As an example, consider the case where the audio processing device C is used for game development. Broadly speaking, audio is implemented in a game as follows: a sound creator produces audio files, and a programmer uses a game engine to program the produced audio so that it plays in response to user operations. As game development has grown in scale and complexity, two major tools have come into use for producing audio files: a DAW for creating a wide variety of audio files, and audio middleware that reduces the effort of incorporating audio into a game engine. DAW stands for Digital Audio Workstation, software that enables recording and editing of audio for audio production. Middleware is software that plays, processes, and manages the audio passed to a game engine, and can create audio data to be played back in the DAW. One example of such middleware is Wwise from Audiokinetic. Accordingly, the middleware 115 and the DAW 112 are also installed in the storage device 110 of the audio processing device C, and the CPU 101 can function as a processor that executes the middleware and the DAW.
The DAW and the middleware are configured to cooperate, for example by passing audio files between them. For example, audio can be created in the DAW and written out as an audio file, the audio file can be moved from the DAW to the middleware, and the middleware can then implement it in the game engine. If audio moved from the DAW to the middleware later needs adjustment, the audio file can be moved back from the middleware to the DAW and adjusted there.

The number of audio files implemented in a game can reach tens of thousands or more. Because the volume of the audio files as initially delivered varies, the volume of each audio file must be adjusted (signal level adjustment). However, adjusting the volume of that many audio files one by one would require an enormous amount of work.

The audio used in games can include a wide variety of sounds, such as character dialogue, audio describing a situation (success, failure, etc.), sound effects, footsteps, explosions, ambient sounds, and background music. The inventor noted that there is a relationship between the content of such audio and an appropriate volume value. In this embodiment, the volume value is determined according to the content of the audio in the audio file.

In the field of game development, each audio file is generally named so that the attributes of its audio can be inferred to some extent. An "attribute" is anything that can identify the content of the audio, such as a character name, scene name, action name, or dialogue content. A file name may contain multiple pieces of attribute information, for example "character name + action name." In game development, naming rules for audio files are usually established and are not changed significantly during development. It is therefore possible to identify the content of the audio from the file name of the audio file and to determine the volume value according to the identified content.

In this embodiment, when the volume of the audio recorded in each audio file is adjusted on the DAW side, a volume table describing target volume values is referenced. The volume value is explained here. In this embodiment, a loudness value, which takes human auditory characteristics into account, is used as the measure (index) of volume. The loudness value is expressed, for example, in LUFS (Loudness Units relative to Full Scale) or LKFS (Loudness, K-weighted, relative to Full Scale). Accordingly, in this embodiment, when adjusting the loudness of the audio recorded in each audio file on the DAW side, the loudness table 114, which describes target loudness values, is referenced as the volume table. The loudness table 114 is a lookup table describing the correspondence between character strings that can form part of an audio file's file name and loudness values (target loudness values), i.e., volume values. The loudness table may also be called a "loudness list." FIG. 2 shows an example of the structure of the loudness table 114. Each record of the loudness table 114 contains a pair consisting of a character string (registered character string) and a loudness value (target loudness value). The registered character string in each record is a character string that can form part of the file name of an audio file. Note that the present invention is not limited to using loudness values as the measure of volume; another measure (e.g., RMS) may be used.
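A minimal sketch of the file-name lookup, assuming the loudness table is held as an ordered list of (registered string, target LUFS) records. The record values, the case-insensitive substring match, and the rule that the first matching record wins are all assumptions for illustration; the source does not say how multiple matches are resolved.

```python
from typing import Optional

# Hypothetical records in the style of FIG. 2: (registered string, target LUFS).
LOUDNESS_TABLE = [
    ("voice", -20.0),
    ("footstep", -30.0),
    ("explosion", -14.0),
]

def target_loudness(file_name: str, table=LOUDNESS_TABLE) -> Optional[float]:
    """Return the target loudness of the first record whose registered
    string occurs in the file name (case-insensitive), or None if none does."""
    name = file_name.lower()
    for registered, lufs in table:
        if registered.lower() in name:
            return lufs
    return None

print(target_loudness("Hero_Footstep_Grass_01.wav"))  # -30.0
print(target_loudness("bgm_title.wav"))               # None
```

Returning None leaves it to the caller to decide what happens to a file that no registered string matches.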

FIG. 3 shows an example of a loudness value setting screen 30 displayed on the display D when the CPU 101 executes the DAW 112. The CPU 101, acting as a display control unit, displays each record of the loudness table on the setting screen 30 in an editable form. Via this setting screen 30, the user can add and register records in the loudness table 114. The number of records registered in the loudness table 114 is shown in a record count display window 31. A record can be added by pressing the add button 32 (clicking with a mouse or tapping on a touch panel). The contents of each registered record are displayed in the list 35. Each record in the list 35 has a "Search" column and a "Value" column: the "Search" column shows the registered character string to be searched for, and the "Value" column shows the corresponding loudness value. If not all records fit in the display area of the list 35, the list can be scrolled using the scroll bar 36.

The user can specify a loudness measurement method in the loudness setting field 33. Available methods include, for example, MaxMomentary, MaxShort-Term, and Integrated, and one of them can be selected in the field. In MaxMomentary, loudness is calculated for each of multiple measurement windows (400 ms long) obtained by sliding the window along the time axis of the audio waveform in steps of a predetermined time, and the maximum of these values is adopted as the loudness value. In MaxShort-Term, loudness is calculated in the same way for each of multiple measurement windows 3 s long, and the maximum is adopted as the loudness value. Integrated measures the loudness of the entire sound source (the entire audio of one audio file). In the example of FIG. 3, MaxMomentary is selected. Furthermore, it may be possible to specify an arbitrary measurement window length instead of the specific lengths described above.
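The three measurement methods can be sketched as follows. This is a simplified illustration, not a standards-compliant meter: it applies the ITU-R BS.1770 level formula (-0.691 + 10 log10 of the mean square) to unweighted mono samples, omitting the K-weighting filter and the gating that a real Integrated measurement uses. The 100 ms hop stands in for the unspecified "predetermined time", and all function names are hypothetical.

```python
import math
from typing import Sequence

def block_loudness(samples: Sequence[float]) -> float:
    """Level of one block as -0.691 + 10*log10(mean square).

    Simplified: no K-weighting pre-filter and no gating, so this only
    approximates an ITU-R BS.1770 loudness value.
    """
    if len(samples) == 0:
        return float("-inf")
    ms = sum(x * x for x in samples) / len(samples)
    return -0.691 + 10 * math.log10(ms) if ms > 0 else float("-inf")

def max_window_loudness(samples, sample_rate, window_s, hop_s=0.1):
    """Slide a window_s-second window in hop_s steps; return the maximum level.

    Windows that would run past the end of the signal are not evaluated.
    """
    win = round(window_s * sample_rate)
    hop = max(1, round(hop_s * sample_rate))
    if len(samples) <= win:
        return block_loudness(samples)
    return max(block_loudness(samples[i:i + win])
               for i in range(0, len(samples) - win + 1, hop))

def measure(samples, sample_rate, method="MaxMomentary"):
    if method == "MaxMomentary":   # 400 ms windows
        return max_window_loudness(samples, sample_rate, 0.4)
    if method == "MaxShort-Term":  # 3 s windows
        return max_window_loudness(samples, sample_rate, 3.0)
    if method == "Integrated":     # the whole file as one block
        return block_loudness(samples)
    raise ValueError(f"unknown method: {method}")
```

For a file with a quiet passage followed by a short loud burst, MaxMomentary reports the burst while Integrated averages it with the quiet part, which is why the choice of method changes the value compared against the loudness table.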

Before loudness adjustment is performed on the audio of an audio file, dynamic range compression may optionally be performed. Playback volume can vary widely between audio files. If the volume of the sound sources is left unadjusted, some audio may play back too quietly or too loudly and be hard to hear, so the signal levels of the sound sources need to be evened out. Dynamic range compression is performed to bring the signal levels of such audio to a consistent level. It generally includes processing that suppresses portions containing signal-level peaks and boosts portions with low signal levels. However, simply making the signal level constant is not sufficient: with human speech, if too little intonation remains, the speech sounds heavily compressed. Therefore, in dynamic range compression, the signal-level threshold that determines what is compressed must be set appropriately.
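One common way to realize the threshold-based suppression described above is a static gain curve: levels below the threshold pass unchanged, and the portion above it is reduced by a ratio. The `ratio` and `makeup_db` parameters and the default values are assumptions not taken from the source; a uniform make-up gain is one way to raise the low-level portions after the peaks have been pulled down.

```python
def compressor_gain_db(level_db, threshold_db=-18.0, ratio=4.0, makeup_db=0.0):
    """Gain (dB) a static downward compressor applies at a given input level.

    Levels at or below the threshold are untouched; the excess above the
    threshold is divided by `ratio`. `makeup_db` is then added to every level.
    """
    if level_db <= threshold_db:
        gain_db = 0.0
    else:
        compressed_db = threshold_db + (level_db - threshold_db) / ratio
        gain_db = compressed_db - level_db
    return gain_db + makeup_db

# A -6 dB peak is 12 dB over the threshold; at 4:1 only 3 dB remains above it.
print(compressor_gain_db(-6.0))                  # -9.0
print(compressor_gain_db(-30.0, makeup_db=6.0))  # 6.0
```

Setting the threshold too low flattens the intonation of speech, which is the trade-off the text describes.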

Dynamic range compression can also be performed manually by the user, by moving any of a plurality of adjustment points arranged on the envelope (manual compression). However, manually compressing all audio requires a great deal of effort. It is therefore also possible to perform dynamic range compression automatically over the entire audio file, which is referred to herein as "auto compression."
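Manual compression can be pictured as a gain envelope defined by adjustment points. The sketch below assumes each point is a (time in seconds, gain in dB) pair and that the gain is linearly interpolated in dB between points; both the representation and the function names are hypothetical.

```python
from bisect import bisect_right

def envelope_gain_db(points, t):
    """Gain (dB) at time t, linearly interpolated between adjustment points.

    `points` is a list of (time_s, gain_db) pairs sorted by time.
    """
    times = [p[0] for p in points]
    if t <= times[0]:
        return points[0][1]
    if t >= times[-1]:
        return points[-1][1]
    i = bisect_right(times, t)  # index of the first point strictly after t
    (t0, g0), (t1, g1) = points[i - 1], points[i]
    return g0 + (g1 - g0) * (t - t0) / (t1 - t0)

def apply_envelope(samples, sample_rate, points):
    """Scale each sample by the envelope gain converted from dB to linear."""
    return [x * 10 ** (envelope_gain_db(points, n / sample_rate) / 20)
            for n, x in enumerate(samples)]

# Moving the second point to -6 dB lowers the level halfway between the points by 3 dB.
print(envelope_gain_db([(0.0, 0.0), (1.0, -6.0)], 0.5))  # -3.0
```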

Auto compression may include, for example, the following processing. The audio signal of the target audio file consists of multiple frames. First, the envelope of the audio signal is obtained. Next, the peak value of the envelope is detected for each frame, and the average of these per-frame peak values (first average) is calculated. Next, the peak values higher than the first average are identified, and their average (second average) is calculated. The envelope is then adjusted so that at least some of the peak values higher than the second average are suppressed. For example, the peak values higher than the second average are identified and their average (third average) is calculated; the peak values higher than the third average are then identified and adjusted so that they approach the third average. This auto compression method is merely one example, and auto compression may be implemented by other methods.
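The example procedure above can be sketched as follows. Assumptions not stated in the source: the envelope is approximated by the absolute sample value, each frame's peak is the largest absolute sample in it, a single gain is applied per frame (a practical implementation would smooth gain changes between frames), and `strength` (1.0 pulls peaks fully to the third average) is a hypothetical parameter controlling how far peaks move toward the third average.

```python
from statistics import mean

def auto_comp(samples, frame_len=1024, strength=1.0):
    """Pull the highest frame peaks toward the 'third average' described above."""
    if not samples:
        return []
    frames = [samples[i:i + frame_len] for i in range(0, len(samples), frame_len)]
    peaks = [max(abs(x) for x in f) for f in frames]  # per-frame envelope peak

    avg1 = mean(peaks)                                 # first average
    above1 = [p for p in peaks if p > avg1]
    if not above1:
        return list(samples)
    avg2 = mean(above1)                                # second average
    above2 = [p for p in peaks if p > avg2]
    if not above2:
        return list(samples)
    avg3 = mean(above2)                                # third average

    out = []
    for frame, peak in zip(frames, peaks):
        gain = 1.0
        if peak > avg3:                                # only the highest peaks
            target = peak + strength * (avg3 - peak)   # move toward avg3
            gain = target / peak
        out.extend(x * gain for x in frame)
    return out
```

With one-sample frames and peaks of 0.1 (six times), 0.5, 0.8, and 1.0, the averages come out to about 0.32, 0.77, and 0.9, so only the 1.0 peak is pulled down, to about 0.9.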

In this embodiment, the user can specify whether to apply auto compression to all audio files stored in the working folder of the storage device 110. The setting screen 30 has an auto compression setting field 34 for instructing the execution of auto compression. The field 34 provides, for example, a radio button or a check box, and auto compression is enabled by selecting it (ON). In the example of FIG. 3, the auto compression setting field 34 is turned ON with a radio button. In this case, loudness adjustment is performed after dynamic range compression has been performed on the audio of the audio file.

The setting screen 30 further includes a file name display field 37 that displays the file names of one or more audio files that are stored in the working folder of the storage device 110 and are subject to loudness adjustment.

Next, the management of audio files on the middleware 115 will be described. Each audio file may contain one or more audio materials (portions of recorded sound); one audio material is also called a "track." In the middleware, audio (tracks) is organized in a hierarchical structure. FIG. 4 is a conceptual diagram showing an example of a hierarchical structure H of audio managed on the middleware. Tracks are fed to input terminals, and all tracks are gathered into a master track MS and output from the output terminal. In other words, all sound played in the game ultimately passes through the master track MS before being output. Effects E and/or volume adjustments V are applied individually to the tracks IN1, IN2, and IN3 fed to the input terminals. The hierarchical structure H may include bus tracks. A bus track is a track that combines one or more tracks. In FIG. 4, bus track B1 combines tracks IN1 and IN2 into a single track and outputs it to the master track MS. Using a bus allows effects and/or volume adjustments to be applied to multiple tracks at once. The hierarchical structure H may also include auxiliary tracks. An auxiliary track is a copy of a track that has been sent to a separate path (a send). In FIG. 4, the auxiliary track AUX receives track IN3 and outputs to bus track B2, which receives both track IN3 and the auxiliary track AUX. An auxiliary track is used, for example, when a version of a track with an effect (reverb, delay, etc.) applied and a version without the effect are to coexist.

Via a hierarchical structure setting screen (not shown), the user can design the hierarchical structure and decide at which input terminal of the hierarchy each track of an audio file registered in the middleware is placed. This determines, for each track, the routing, i.e., the path the track takes through the hierarchy from input to output. A volume adjustment unit is provided at each level of the hierarchical structure H, so a track can undergo volume adjustment each time it passes through a volume adjustment unit. The user can determine the effect to be applied at each level of the designed hierarchy and the volume value of each volume adjustment unit. The total change in the volume value of the audio on the middleware is calculated by summing the volume values of the volume adjustment units on the path. For example, as shown in FIG. 4, suppose the loudness value is set to -6 dB at the master track MS, +4 dB at bus track B1, +2 dB at track IN1, and -4 dB at track IN2. In this case, the total change in loudness value of track IN1 on the middleware is
(-6 dB) + (+4 dB) + (+2 dB) = 0 dB
and the total change in loudness value of track IN2 on the middleware is
(-6 dB) + (+4 dB) + (-4 dB) = -6 dB.
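The path-sum calculation can be sketched as follows, using the FIG. 4 values for tracks IN1 and IN2. The model assumes that each node has a single parent on the way to the output, so it does not represent auxiliary sends such as AUX; the node names and dictionaries are illustrative only. Summing dB values along the path is equivalent to multiplying the linear gains.

```python
# Hypothetical model of FIG. 4: the next node toward the output for each node,
# and the volume value (dB) set at that node's volume adjustment unit.
PARENT = {"IN1": "B1", "IN2": "B1", "B1": "MS", "MS": None}
GAIN_DB = {"IN1": +2.0, "IN2": -4.0, "B1": +4.0, "MS": -6.0}

def route(track, parent=PARENT):
    """List the nodes a track passes through, from its input to the master."""
    path = []
    node = track
    while node is not None:
        path.append(node)
        node = parent[node]
    return path

def total_change_db(track, parent=PARENT, gain_db=GAIN_DB):
    """Sum of the volume values along the track's routing (0 dB if unset)."""
    return sum(gain_db.get(node, 0.0) for node in route(track, parent))

print(route("IN1"))            # ['IN1', 'B1', 'MS']
print(total_change_db("IN1"))  # 0.0
print(total_change_db("IN2"))  # -6.0
```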

By designing the hierarchical structure in the middleware as described above, routing information is created that indicates the path along which the audio recorded in an audio file reaches the output. The routing information may include information on that path (hereinafter referred to as the "routing") and information on the volume value at each volume adjustment unit along the path.
Although the above description shows an example with a single hierarchical structure H, the middleware may be configured so that multiple types of hierarchical structures can be built on it.

As described above, loudness adjustment on the DAW is performed using a loudness value that is determined from the file name of the audio file by referring to the loudness table. Conventionally, however, the DAW side could not know the volume adjustment history of each audio on the middleware, so loudness adjustment on the DAW was performed without taking the middleware's volume adjustments into account.

In this embodiment, the CPU 101, on the DAW, acquires the amount of change in the volume value along the routing of the audio on the middleware, based on the routing information created in the middleware. The CPU 101 then adjusts the volume on the DAW based on the acquired amount of change. FIG. 5 shows an example of a setting screen 40 for acquiring the amount of change in the volume value on the middleware; it is displayed on the display D when the CPU 101 executes the DAW 112.
On the setting screen 40, the search check box 41 is used to set the connection to the middleware.
As described above, multiple hierarchical structures may be built on the middleware. In that case, the search path setting field 44 sets which hierarchical structure is to be searched.
The detection path setting field 45 sets the layer to be given priority when multiple audio files with the same file name are found under the search path.
The non-target routing setting field 47 sets routings (non-target routings) to be excluded from the calculation of the total change in the volume value on the middleware; the user enters information identifying each non-target routing in a routing specification field 48.
The additive routing setting field 49 sets routings (additive routings) in a hierarchical structure other than the one set in the search path setting field 44 that should be added to the calculation of the total change in the volume value on the middleware; the user enters information identifying each additive routing in a routing specification field 50. Besides the volume adjustment within a routing, there may be several exceptional volume changes, such as those made with effectors like upmix (e.g., 2.0ch to 4.0ch), downmix (e.g., 4.0ch to 2.0ch), gain, limiter, or compressor. The additive routing setting field 49 is provided for such exceptional cases.
If the routing specification fields 48 and 50 become too numerous to fit in the display area, the user can scroll through them with a scroll bar 51.
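The fields on setting screen 40 can be thought of as a small configuration object read by the DAW-side processing. The dataclass below is a hypothetical sketch; its field names and the string format used to identify a routing (`"IN3>AUX>B2"`) are assumptions, not part of the described screen.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class MiddlewareLinkSettings:
    """Hypothetical in-memory form of setting screen 40."""
    connect_middleware: bool = False          # search check box 41
    search_path: Optional[str] = None         # field 44: hierarchy to search
    priority_layer: Optional[str] = None      # field 45: layer preferred when a file name is found twice
    non_target_routings: List[str] = field(default_factory=list)  # fields 47/48: excluded from the total
    additive_routings: List[str] = field(default_factory=list)    # fields 49/50: added to the total


settings = MiddlewareLinkSettings(
    connect_middleware=True,
    search_path="H",
    priority_layer="B1",
    non_target_routings=["IN3>AUX>B2"],  # e.g., the effect send of track IN3 in FIG. 4
)
```

Using `default_factory` gives each settings object its own routing lists, so editing one object's lists cannot leak into another.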

FIG. 6 is a flowchart of the audio processing method in the audio processing device C. The program corresponding to this flowchart is included in the DAW 112 and is executed by the CPU 101 while it runs the DAW 112.

In step S11, the CPU 101 obtains an audio file registered in the middleware and stores it in a predetermined working folder in the storage device 110.

In step S12, the CPU 101 performs automatic compression on the acquired audio file. Step S12 is optional: it is executed only when the automatic compression setting field 34 on the setting screen 30 shown in FIG. 3 is selected, and is skipped otherwise.
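The automatic compression itself is not specified in detail here. As a stand-in only, the sketch below uses a simplified static hard-knee compressor (no attack, release, or make-up gain); `static_compress`, `run_step_s12`, and the threshold and ratio defaults are hypothetical. Its main purpose is to show that step S12 is gated by the setting in field 34.

```python
import math
from typing import List


def static_compress(samples: List[float], threshold_db: float = -12.0, ratio: float = 4.0) -> List[float]:
    """Simplified hard-knee compressor applied sample by sample (no smoothing).

    For a sample whose level exceeds `threshold_db` (dBFS), the excess above the
    threshold is divided by `ratio`; quieter samples pass through unchanged.
    """
    out: List[float] = []
    for x in samples:
        level = abs(x)
        if level == 0.0:
            out.append(0.0)
            continue
        level_db = 20.0 * math.log10(level)
        if level_db > threshold_db:
            target_db = threshold_db + (level_db - threshold_db) / ratio
            x *= 10.0 ** ((target_db - level_db) / 20.0)
        out.append(x)
    return out


def run_step_s12(samples: List[float], auto_comp_enabled: bool) -> List[float]:
    """Step S12 is optional: compress only when field 34 is selected."""
    return static_compress(samples) if auto_comp_enabled else samples
```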

In step S13, the CPU 101 searches the loudness table 114 for a record whose registered character string partially matches the file name of the acquired audio file, and obtains the loudness value R described in the record found.
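A minimal sketch of the step S13 lookup, assuming that "partial match" means the registered string occurs as a substring of the file name. The values for `"vo_"` (-23) and `"vo_atk"` (-21) follow the FIG. 2 example; the tie-break rule (longest matching string wins) and the function name `find_loudness` are assumptions, since the text does not say which record to use when several match.

```python
from typing import Dict, Optional

# Excerpt of loudness table 114: registered string -> loudness value (FIG. 2 values).
LOUDNESS_TABLE: Dict[str, float] = {
    "vo_": -23.0,
    "vo_atk": -21.0,
}


def find_loudness(file_name: str, table: Dict[str, float]) -> Optional[float]:
    """Return loudness value R of the record whose registered string appears in `file_name`.

    Assumption: when several records match, the longest registered string wins.
    Returns None when no record matches.
    """
    matches = [key for key in table if key in file_name]
    if not matches:
        return None
    return table[max(matches, key=len)]
```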

In step S14, the CPU 101 determines whether an audio file exists below the search path set in the search path setting field 44 (that is, in at least one routing of the hierarchical structure to be searched). If so, the process advances to step S15, in which the CPU 101 determines whether there are multiple audio files below the search path. If there are, the process advances to step S16; if there is only one, the process advances to step S18.

In step S16, the CPU 101 determines whether an audio file exists in the priority layer set in the detection path setting field 45. If so, the process advances to step S18; otherwise, it advances to step S17. In step S17, the CPU 101 obtains the routing information for the path of the audio file found first among the multiple audio files identified in step S15.
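The branching in steps S14 to S17 can be sketched as a selection function. Each candidate is represented, hypothetically, as the list of layer names on the path where the file was found, in search order; `select_routing` and this representation are illustrative assumptions.

```python
from typing import List, Optional, Sequence


def select_routing(candidates: Sequence[List[str]], priority_layer: Optional[str]) -> Optional[List[str]]:
    """Pick which found instance of an audio file to use (steps S14-S17, sketched).

    `candidates` lists, in search order, the layer path of each place the file
    was found under the search path. Returns None when nothing was found.
    """
    if not candidates:                     # S14: no audio file under the search path
        return None
    if len(candidates) == 1:               # S15: only one file, go on to S18 with it
        return list(candidates[0])
    if priority_layer is not None:         # S16: prefer the layer set in field 45
        for path in candidates:
            if priority_layer in path:
                return list(path)
    return list(candidates[0])             # S17: fall back to the first one found
```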

In step S18, the CPU 101 acquires all routing information on the middleware, including the routing information acquired in step S17. As described above, the routing information may include information on the path along which the audio recorded in the audio file set in the middleware reaches the output, and information on the volume value at each volume adjustment unit on that path.

In step S19, the CPU 101 excludes the non-target routings set in the non-target routing setting field 47 from the calculation of the total change in the loudness value in step S21, described below.

Only one routing should be used when calculating the total change in the volume value on the middleware. However, multiple routings can be set for one track, for example when applying effects. FIG. 4 shows an example in which two routings coexist: one in which track IN3 passes through the auxiliary track AUX and is input to bus B2 with an effect applied, and one in which track IN3 is input to bus B2 as the original (dry) sound. Of these two, the routing whose volume value should be obtained is the one in which track IN3 is input to bus B2 as the dry sound. In this case, therefore, the routing in which track IN3 is input to bus B2 via the auxiliary track AUX should be set as a non-target routing. The user may, however, forget to do so. Therefore, in step S20, the CPU 101 determines whether exactly one routing is to be searched. If not, the process advances to step S26, where the CPU 101 outputs an error and ends the process. If exactly one routing is to be searched, the process advances to step S21.
In step S21, the CPU 101 calculates the total amount of change T in the loudness value on the middleware. T is obtained by adding the change in the loudness value along the routing set in the additive routing setting field 49 to the change in the loudness value along the routing, among the search paths set in the search path setting field 44, in which the audio file is located. If step S14 finds no audio file below the search path set in the search path setting field 44, the change in the loudness value of the search path is set to 0 in step S25.
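Steps S19 to S21, together with S25 and the S26 error branch, can be sketched as below. Routings are identified by hypothetical string IDs, each mapped to the already-summed volume change along it; `total_change_t` and `RoutingError` are illustrative names, and treating "no routing left after exclusion" as an S26 error is an assumption consistent with the "exactly one" check.

```python
from typing import Dict, Sequence


class RoutingError(Exception):
    """Stands in for the error output of step S26."""


def total_change_t(
    found: Dict[str, float],
    non_target: Sequence[str],
    additive: Dict[str, float],
) -> float:
    """Return the total change T in dB.

    found:      routing ID -> summed volume change, for routings under the search
                path that carry the audio file (empty when step S14 finds none).
    non_target: routing IDs set in field 47 (excluded in step S19).
    additive:   routing ID -> volume change, for routings set in field 49.
    """
    if not found:
        search_change = 0.0                  # step S25: no file under the search path
    else:
        remaining = [v for rid, v in found.items() if rid not in non_target]  # step S19
        if len(remaining) != 1:              # step S20: exactly one routing must remain
            raise RoutingError(f"expected 1 target routing, found {len(remaining)}")
        search_change = remaining[0]
    return search_change + sum(additive.values(), 0.0)   # step S21
```

For the FIG. 4 case, listing the AUX send of track IN3 in `non_target` leaves only the dry routing; forgetting to do so raises the error instead of silently double-counting.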

In step S22, the CPU 101 calculates the final loudness value FR (final volume value) as the difference between the loudness value R obtained from the loudness table in step S13 and the total amount of change T in the loudness value obtained in step S21.

In step S23, the CPU 101 adjusts the loudness of the audio recorded in the target audio file using the final loudness value FR calculated in step S22. For example, the loudness of the target audio file's audio (or, if step S12 was executed, of that audio after automatic compression) is measured by the loudness measurement method specified in the loudness setting field 33, and the gain of the audio is then adjusted based on the measurement so that its loudness reaches the target loudness value.
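Steps S22 and S23 are sketched below under the assumption FR = R - T, i.e., the DAW pre-compensates the change T that the middleware will apply, so that the sound reaches the output at about the table value R. The measurement is a crude RMS level standing in for whatever method is chosen in field 33; it is not an ITU-R BS.1770 loudness meter (no K-weighting or gating). All function names are hypothetical.

```python
import math
from typing import List


def final_loudness(r: float, t: float) -> float:
    """Step S22, assuming FR = R - T."""
    return r - t


def rms_dbfs(samples: List[float]) -> float:
    """Crude stand-in for the loudness measurement selected in field 33:
    plain RMS level in dBFS (no K-weighting, no gating; not BS.1770)."""
    if not samples:
        raise ValueError("no samples")
    mean_square = sum(x * x for x in samples) / len(samples)
    if mean_square == 0.0:
        raise ValueError("silent input has no defined level")
    return 10.0 * math.log10(mean_square)


def adjust_loudness(samples: List[float], fr: float) -> List[float]:
    """Step S23: measure the level, then apply one static gain so it becomes FR."""
    gain_db = fr - rms_dbfs(samples)
    gain = 10.0 ** (gain_db / 20.0)
    return [x * gain for x in samples]
```

With R = -21 (the "vo_atk" record) and T = -6 dB (track IN2 in FIG. 4), this sketch gives FR = -15.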

In step S24, the CPU 101, acting as a display control unit, displays in the display area of the display D a first waveform (before loudness adjustment), which is the waveform of the audio of the audio file acquired in step S11 or of the audio compressed in step S12, and a second waveform, which is the waveform of the audio after loudness adjustment. Examples of the waveform display are described later.

If other unprocessed audio files are registered in the middleware, the process returns to step S11 and is repeated for the next audio file. Thus, when multiple audio files are registered in the middleware, the steps from the search in step S13 through the loudness adjustment in step S23 are performed sequentially for each of them.
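The per-file repetition can be sketched as a driver loop. `process_all` is hypothetical; the table lookup (S13) and the total-change calculation (S14 to S21) are passed in as callables, FR = R - T is again an assumption, and skipping files with no matching record is a choice the text does not specify.

```python
from typing import Callable, Dict, Iterable, Optional


def process_all(
    file_names: Iterable[str],
    lookup_r: Callable[[str], Optional[float]],
    total_change: Callable[[str], float],
) -> Dict[str, float]:
    """Repeat the per-file steps for every registered audio file; return FR per file."""
    results: Dict[str, float] = {}
    for name in file_names:                      # step S11: next unprocessed file
        r = lookup_r(name)                       # step S13
        if r is None:                            # assumption: no matching record -> skip
            continue
        results[name] = r - total_change(name)   # steps S14-S22, assuming FR = R - T
    return results
```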

FIG. 7 shows an example of the waveform display in step S24 for a case in which three audio files have been processed. The displayed waveforms are time-domain waveforms: the horizontal axis is time and the vertical axis is signal level. In the upper part of the display area in FIG. 7, the waveforms W11, W12, and W13 of the audio of the first, second, and third audio files after automatic compression (before loudness adjustment) are arranged side by side along the time axis. Each of the waveforms W11, W12, and W13 may display a plurality of adjustment points P, arranged discretely on the envelope obtained in the automatic compression, for adjusting the signal level. The user can manually adjust the signal level at a given position, for example by dragging an adjustment point with the mouse.

In the lower part of the display area in FIG. 7, the waveforms W21, W22, and W23 of the audio of the first, second, and third audio files after loudness adjustment are arranged side by side along the time axis. Each loudness-adjusted waveform is obtained by writing the audio whose loudness was adjusted in step S23 to a new file.

This area may also show information on the loudness values of the audio files whose waveforms are displayed, for example the file name of each waveform's audio file, the total amount of change T in the loudness value on the middleware, the loudness value R obtained by searching the loudness table, and the final loudness value FR. This lets the user see how much the volume of each audio file was adjusted in the middleware and in the DAW. In addition, according to the present embodiment, volume adjustment on the DAW can take the middleware's volume adjustment history into account. The display manner of the waveforms and loudness information described above is merely an example, and other display manners may be adopted.

（その他の例）
図２に示されるように、ラウドネステーブル１１４における複数のレコードは、登録文字列の接頭辞の共通性によりグループ分けされている。接頭辞は、命名規則によって定められた、音声の属性を表す文字列でありうる。その場合、接頭辞が共通するということは、音声の属性が共通するということである。例えば、接頭辞「vo」は、キャラクターのボイスを表し、接頭辞「atk」は、攻撃（アタック）時の掛け声を表す、等である。図２の例では、複数のレコードは、「vo_」を接頭辞とするグループ１、「vo_atk」を接頭辞とするグループ２、「vo_dmg」を接頭辞とするグループ３、「vo_move」を接頭辞とするグループ４、「vo_cmm」を接頭辞とするグループ５に分類されている。 (Other examples)
As shown in FIG. 2, the plurality of records in the loudness table 114 are grouped based on the commonality of the prefixes of the registered character strings. The prefix can be a string of characters that represents an attribute of the audio as defined by a naming convention. In that case, the fact that the prefixes are common means that the audio attributes are common. For example, the prefix "vo" represents the character's voice, the prefix "atk" represents the shout during an attack, and so on. In the example in Figure 2, the multiple records include group 1 with the prefix "vo_", group 2 with the prefix "vo_atk", group 3 with the prefix "vo_dmg", and group 3 with the prefix "vo_move". It is classified into Group 4, which has the prefix "vo_cmm", and Group 5, which has the prefix "vo_cmm".

In step S13, the CPU 101 searches the loudness table 114 for a prefix that partially matches the file name of the target audio file to identify a group, and then searches the identified group for a registered character string that partially matches the file name. In the example of FIG. 2, each group includes a representative record describing a pair of a character string consisting only of the prefix and a loudness value. The representative record is located in the first row of each group.

In step S13, if the search finds no record in the identified group, other than the representative record, whose registered character string partially matches the file name, the loudness adjustment in step S23 is performed using the loudness value described in that representative record. A specific example follows. In step S13, the representative records in the first row of each group are first searched for a prefix that partially matches the file name of the target audio file. Suppose, for example, that the prefix "vo_atk" partially matches the file name. The search is then limited to group 2, in which a registered character string that partially matches the file name is searched for. Besides the representative record, group 2 includes records with registered character strings such as "vo_atk_charge" and "vo_atk_s". If no record in group 2 other than the representative record has a registered character string that partially matches the file name, the loudness adjustment in step S23 is performed using the loudness value "-21" of the representative record (registered character string "vo_atk").

Because this processing limits the search range within the loudness table, the search becomes faster.

In the example of FIG. 2, group 1, with the prefix "vo_", is positioned as a layer above the other groups, whose prefixes consist of "vo_" followed by another character string. If the only prefix that partially matches the file name of the target audio file is "vo_", the loudness value is "-23", which corresponds to "vo_" in group 1.
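The grouped search can be sketched as follows: only the representative rows are scanned first, then the search continues inside the single most specific matching group, falling back to the representative value. The -23 and -21 values follow FIG. 2; the member values for "vo_atk_charge" and "vo_atk_s" are hypothetical, groups 3 to 5 are omitted, and picking the group with the longest matching prefix is an assumption.

```python
from typing import Dict, List, Optional, Tuple

# Each group: (representative prefix, representative loudness, member records).
# "vo_" -> -23 and "vo_atk" -> -21 follow FIG. 2; member values are hypothetical.
Group = Tuple[str, float, Dict[str, float]]

GROUPS: List[Group] = [
    ("vo_", -23.0, {}),
    ("vo_atk", -21.0, {"vo_atk_charge": -20.0, "vo_atk_s": -22.0}),
]


def grouped_lookup(file_name: str, groups: List[Group]) -> Optional[float]:
    """Step S13 variant: narrow to one group by prefix, then search only inside it."""
    hits = [g for g in groups if g[0] in file_name]      # scan representative rows only
    if not hits:
        return None
    _prefix, rep_value, members = max(hits, key=lambda g: len(g[0]))  # most specific group
    inside = [k for k in members if k in file_name]
    if inside:
        return members[max(inside, key=len)]
    return rep_value                                      # fall back to the representative record
```

In this sketch, `"vo_atk_99.wav"` falls back to the representative value -21.0, while a name matching only `"vo_"` (such as `"vo_misc_01.wav"`) resolves to -23.0 from group 1.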

According to the embodiment described above, a volume table (loudness table) containing pairs of a character string and a volume value (loudness value), each pair as one record, is searched for a record whose registered character string partially matches the file name of the audio file. The volume of the audio recorded in the audio file is then adjusted using the volume value described in the record found. If the volume table has been created in advance, no separate volume adjustment settings are needed. When multiple audio files are processed, the search and volume adjustment are performed sequentially for each file, so all of them are adjusted automatically. Moreover, the volume table can contain far fewer records than there are audio files. Compared with the conventional approach of adjusting the audio of each file one by one, this embodiment therefore greatly reduces the user's workload. Furthermore, as described above, the user can see how much the volume of each audio file was adjusted in the middleware and in the DAW, and volume adjustment on the DAW can take the middleware's volume adjustment history into account.

The loudness table 114 need not be stored in the storage device 110. For example, it may be stored in an external device connected via the network N (e.g., server A in FIG. 1), and the audio processing device C may refer to it via the network N.

The present invention can also be implemented by causing a computer to execute a program for executing each step of the audio processing method described in the above embodiment.

The invention is not limited to the above-described embodiments, and various modifications and changes can be made within the scope of the invention.

A: server, C: audio processing device, D: display, K: input device, 101: CPU, 112: DAW, 114: loudness table, 115: middleware

Claims

1. An audio processing device that processes audio, comprising:
a storage unit that stores middleware, which is software for processing audio, and a digital audio workstation (DAW), which is software for processing audio different from the middleware; and
a processor that executes the middleware and the DAW,
wherein the processor, on the DAW,
obtains the amount of change in the volume value of the audio on the middleware based on routing information indicating the path along which the audio recorded in an audio file set on the middleware reaches the output.

2. The audio processing device according to claim 1, wherein the processor, on the DAW, further
adjusts the volume of the audio based on the obtained amount of change.

3. The audio processing device according to claim 2, wherein
in the middleware, the audio is classified in a hierarchical structure, and a volume adjustment unit is provided for each layer,
the routing information includes information on the path of the audio and information on the volume value at each volume adjustment unit on the path, and
the processor, on the DAW,
calculates the total amount of change in the volume value of the audio on the middleware by summing the volume values at the volume adjustment units on the path, and
adjusts the volume of the audio based on the calculated total amount of change.

4. The audio processing device according to claim 3, wherein the processor, on the DAW,
searches a volume table, which contains pairs of a character string and a volume value, each pair as one record, for a record whose registered character string partially matches the file name of the audio file, and
adjusts the volume of the audio recorded in the audio file using a final volume value that is the difference between the volume value described in the record found by the search and the total amount of change.

5. The audio processing device according to claim 4, further comprising a setting means for setting a non-target routing, which is a routing to be excluded from calculation targets when calculating the total amount of change.

6. The audio processing device according to claim 4, further comprising a setting means for setting an additive routing, which is a routing in a hierarchical structure different from the hierarchical structure and which is to be added to the calculation targets when calculating the total amount of change.

7. The audio processing device according to claim 1, wherein the scale of the volume value is a loudness value.

8. An audio processing method executed by an audio processing device having a storage unit that stores middleware, which is software for processing audio, and a digital audio workstation (DAW), which is software for processing audio different from the middleware, and a processor that executes the middleware and the DAW, the method comprising, by the processor during execution of the DAW:
obtaining routing information indicating a path along which the audio recorded in an audio file set in the middleware reaches an output; and
acquiring, based on the routing information, an amount of change in a volume value of the audio on the middleware.

9. A program for causing a computer to execute each step of the audio processing method according to claim 8.