JP2016177004A

JP2016177004A - Signal processor

Info

Publication number: JP2016177004A
Application number: JP2015055094A
Authority: JP
Inventors: 広臣四童子; Hiroomi Shidoji
Original assignee: Yamaha Corp
Current assignee: Yamaha Corp
Priority date: 2015-03-18
Filing date: 2015-03-18
Publication date: 2016-10-06
Also published as: WO2016148298A1

Abstract

PROBLEM TO BE SOLVED: To control a plurality of continuously reproduced sound contents so that the auditory impression at each switching timing is natural, without losing the acoustic features of the sound contents.
SOLUTION: A signal processor includes analysis means, change amount calculation means, and changing means. The analysis means analyzes the acoustic feature of each sound content in the sections before and after the switching timing of a plurality of continuously reproduced sound contents. The change amount calculation means calculates the change amount of the acoustic feature amount of the sound content in the sections before and after the switching timing, based on an acoustic feature amount indicating the acoustic feature of the sound content in the section before the switching timing and an acoustic feature amount indicating the acoustic feature of the sound content in the section after the switching timing. The changing means processes the sound content in the sections before and after the switching timing according to the change amount calculated by the change amount calculation means.
SELECTED DRAWING: Figure 3

Description

The present invention relates to signal processing techniques for sound signals, and more particularly to a technique for adjusting acoustic features such as volume, frequency characteristics, and reverberation characteristics.

In recent years, various kinds of music content, such as music clips and concert recordings, have become available over the Internet. When using such content, a user may select several pieces of music content according to preference and rearrange them into new music content that can be played back continuously on a playback device such as a portable music player.

When a plurality of pieces of music content are played back continuously, a sense of incongruity or an auditory gap (hereinafter, "auditory gap or the like") may occur when the content switches. One cause is a difference in acoustic features between the music content being played and the subsequent music content. For example, if the volume of the content being played differs from that of the subsequent content, the volume difference is perceived as an auditory gap or the like. Likewise, when the content being played was recorded in a studio and the subsequent content was recorded live, the difference in reverberation characteristics between the two is perceived as an auditory gap or the like. Hereinafter, a physical quantity representing an acoustic feature is referred to as an acoustic feature amount. Specific examples of acoustic feature amounts include volume and the amount and quality of reverberant sound.

To prevent an auditory gap or the like, the difference in acoustic features between the music content being played and the subsequent music content must be reduced; that is, the difference between the acoustic feature amounts of the two must be eliminated or made smaller. One technique that makes this possible is disclosed in Patent Document 1. In that technique, a plurality of audio data are analyzed in advance to design a target characteristic for an acoustic feature, and each audio data is corrected so that its acoustic feature approaches the target characteristic. For example, when the acoustic feature is volume, the arithmetic mean of the volumes of the audio data is used as the target characteristic, and the volume of each audio data is corrected so as to approach it.
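The volume example attributed to Patent Document 1 can be illustrated with a minimal sketch. It assumes that volume is measured as mean-square energy and that the correction is a single gain per track; both are assumptions made here for illustration, and `mean_square` and `normalize_to_mean` are hypothetical names that do not come from Patent Document 1.

```python
import math


def mean_square(samples):
    """Mean of the squared sample values, used here as a stand-in for volume."""
    return sum(s * s for s in samples) / len(samples)


def normalize_to_mean(tracks):
    """Scale each track so that its volume equals the arithmetic mean of all volumes.

    Assumes no track is completely silent (a zero volume would divide by zero).
    """
    volumes = [mean_square(t) for t in tracks]
    target = sum(volumes) / len(volumes)
    # Energy scales with the square of amplitude, so the amplitude gain is a square root.
    return [[s * math.sqrt(target / v) for s in t] for t, v in zip(tracks, volumes)]


tracks = [[0.5, -0.5, 0.5, -0.5], [0.1, -0.1, 0.1, -0.1]]
corrected = normalize_to_mean(tracks)
```

Every sample of every track is rescaled in this approach, not only the samples near a transition.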

Patent Document 1: JP 2003-273678 A

Tadanobu Kondo, Internet, [online], <URL: http://reverb2014.dereverberation.com/workshop/index.html>
K. Lebart, et al., Acta Acustica united with Acustica, Vol. 87 (2001), pp. 359-366
Jim Y. C. Wen, et al., Acoustics, Speech and Signal Processing, 2008 (ICASSP 2008), March 31 - April 4, 2008, pp. 329-332
Keisuke Kinoshita, et al., IEEE Transactions on Audio, Speech, and Language Processing, Vol. 17, No. 4, May 2009, pp. 1-12

However, the technique disclosed in Patent Document 1 has the following problems. First, large-scale processing must be performed in advance to determine the target characteristic. Second, the correction changes how each piece of music content sounds throughout, so its original character (for example, the feel of a live recording) is lost. Consequently, it has not been possible to control the auditory impression before and after a switch between music contents without impairing the acoustic features of each of the music contents being played. The same problem arises when a plurality of sound contents other than music content, such as environmental sounds, read-aloud speech, or masker sounds, are played back in succession.

The present invention has been made in view of the above problems. Its object is to provide a technique that makes it possible to control a plurality of sound contents played back in succession so that the auditory impression at the switching timing between them is natural, without impairing the acoustic features of the sound contents.

To solve the above problems, the present invention provides a signal processing device comprising: analysis means for analyzing the acoustic feature of each sound content in the sections before and after a switching timing of a plurality of sound contents played back in succession; change amount calculation means for calculating a change amount of the acoustic feature amount of the sound content in the sections before and after the switching timing, based on an acoustic feature amount representing the acoustic feature of the sound content in the section before the switching timing and an acoustic feature amount representing the acoustic feature of the sound content in the section after the switching timing; and changing means for applying, to the sound content in the sections before and after the switching timing, processing according to the change amount calculated by the change amount calculation means.

When sound content processed by the signal processing device of the present invention is played back, the acoustic feature changes over time around the switching timing, from that of the preceding sound content to that of the subsequent sound content, in a manner determined by the processing of the changing means. The auditory impression in the sections before and after the switching timing can thereby be controlled. For example, if the change amount calculation means calculates the change amount so that the acoustic feature changes smoothly over time from that of the preceding sound content to that of the subsequent sound content, an auditory gap or the like can be avoided. In addition, because the analysis means analyzes only the sound content in the sections before and after the switching timing, there is no need for large-scale processing as in the technique disclosed in Patent Document 1. Furthermore, the changing means processes only the sound content in the sections before and after the switching timing. Therefore, even if the sound content in both the section before and the section after the switching timing is processed, the original acoustic features of the sound contents are not impaired over their entire length. In other words, the signal processing device of the present invention makes it possible to control a plurality of sound contents played back in succession so that the auditory impression at the switching timing between them is natural, without impairing the acoustic features of the sound contents.

As described above, specific examples of acoustic features include reverberation characteristics and volume, and a plurality of types of acoustic features may be processed. In a more preferred aspect, the signal processing device of the present invention has specifying means for specifying the mode of processing performed by the changing means, and the change amount calculation means calculates the change amount according to the mode of processing specified by the specifying means. This aspect allows the user of the signal processing device to freely control the auditory impression before and after a switch between sound contents.

In another preferred aspect, the analysis means analyzes the acoustic feature in a predetermined frequency band, and the change amount calculation means calculates the change amount for that frequency band. According to this aspect, when an auditory gap or the like is caused by a difference in the acoustic feature of a specific frequency band, changing only the acoustic feature of that band avoids the auditory gap or the like at the switching timing without impairing the acoustic features of the other frequency bands.

In a further preferred aspect, each of the plurality of sound contents is given a flag indicating whether changing its acoustic feature is permitted, and the signal processing device further has control means for operating the analysis means, the change amount calculation means, and the changing means when the flag has a value that permits the change. According to this aspect, sound content whose acoustic features the distributor or the like does not permit to be changed is protected, while the acoustic features around the switching timing are controlled only for sound content that carries no such restriction.
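The permission flag can be sketched as a simple gate in front of the processing. The text does not say how the flags of the two contents at a transition are combined, so this sketch assumes each side is decided independently; `SoundContent`, `allow_modification`, and `sides_to_process` are hypothetical names introduced for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class SoundContent:
    name: str
    samples: list = field(default_factory=list)
    allow_modification: bool = True  # hypothetical field standing in for the flag


def sides_to_process(preceding, following):
    """Return which side(s) of a transition may have their acoustic feature changed."""
    sides = []
    if preceding.allow_modification:
        sides.append("preceding")
    if following.allow_modification:
        sides.append("following")
    return sides


protected = SoundContent("A", [0.1, 0.2], allow_modification=False)
free = SoundContent("B", [0.3, 0.4])
allowed = sides_to_process(protected, free)
```

A stricter policy, in which a transition is processed only when both contents permit modification, would be an equally plausible reading.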

As another aspect for solving the above problems, a program may be provided that causes a computer such as a CPU (Central Processing Unit) to function as the analysis means, the change amount calculation means, and the changing means. Operating a computer according to such a program makes the computer function as the signal processing device described above. Specific ways of providing such a program include writing it to a computer-readable recording medium such as a CD-ROM (Compact Disc Read-Only Memory), a DVD (registered trademark: Digital Versatile Disc), or a flash ROM and distributing the medium, and distributing it by download over a telecommunication line such as the Internet.

FIG. 1 is a block diagram showing the configuration of a signal processing device 10 according to an embodiment of the present invention.
FIG. 2 is a flowchart showing the flow of processing that the control unit 100 of the signal processing device 10 executes in accordance with the signal processing program 124a.
FIG. 3 is a functional block diagram for explaining the functions realized by operating the control unit 100 in accordance with the signal processing program 124a.
FIG. 4 is a diagram showing an operation example in which the acoustic feature to be controlled is volume.
FIG. 5 is a diagram showing an operation example in which the acoustic feature to be controlled is the length of the reverberation time.
FIG. 6 is a diagram for explaining a modification of the present invention.

Embodiments of the present invention will be described below with reference to the drawings.
(A: Configuration)
FIG. 1 is a diagram showing a configuration example of a signal processing device 10 according to an embodiment of the present invention.
The signal processing device 10 shown in FIG. 1 applies signal processing to each of a plurality of audio data, each representing music content (a sequence of sampling data representing the sound waveform of the music content). The processing changes the acoustic features so that no auditory gap or the like occurs before and after the switching timing of the music content when the audio data are played back in succession. Playing a plurality of music contents in succession includes both continuous playback with no separating section, such as a silent section, between music contents and playback with such a separating section. As shown in FIG. 1, the signal processing device 10 has a control unit 100, an external device interface unit 110, a storage unit 120, and a bus 130 that mediates data exchange among these components.

The control unit 100 is, for example, a CPU. The control unit 100 functions as the control center of the signal processing device 10 by executing the signal processing program 124a stored in the storage unit 120 (more precisely, in the nonvolatile storage unit 124). The details of the processing that the control unit 100 executes in accordance with the signal processing program 124a are described later to avoid duplication.

The external device interface unit 110 is a set of various interfaces (hereinafter, "I/F"), such as a USB (Universal Serial Bus) interface. The external device I/F unit 110 connects to various external devices and exchanges data with them. It passes data acquired from a connected external device to the control unit 100, and outputs data received from the control unit 100 to the connected external device. Examples of external devices connected to the external device I/F unit 110 include storage devices such as a USB memory, and a sound system.

In the present embodiment, the audio data to be processed (that is, the audio data corresponding to each of a plurality of music contents to be played back in succession) is input to the signal processing device 10 via the external device I/F unit 110. For example, when a USB memory storing the audio data corresponding to each of the music contents, together with schedule data indicating the playback order of the audio data, is connected to the external device I/F unit 110, the external device I/F unit 110 reads the schedule data and the audio data from the USB memory and passes them to the control unit 100. The control unit 100 then writes the schedule data and the audio data to the nonvolatile storage unit 124, reads the audio data from the nonvolatile storage unit 124 in the order indicated by the schedule data, applies the signal processing, and overwrites the stored audio data with the processed audio data. When the user gives a playback start instruction via an operation unit (not shown), the processed audio data stored in the nonvolatile storage unit 124 is read out in the order indicated by the schedule data, output to the sound system connected to the external device I/F unit 110, and reproduced as sound. Although the present embodiment describes the case where the plurality of audio data and the schedule data are separate data, they may of course be combined into a single piece of data.

As shown in FIG. 1, the storage unit 120 includes a volatile storage unit 122 and a nonvolatile storage unit 124. The volatile storage unit 122 is a volatile memory such as a RAM (Random Access Memory), and is used by the control unit 100 as a work area when executing the signal processing program 124a. The nonvolatile storage unit 124 is a nonvolatile memory such as a flash ROM. The nonvolatile storage unit 124 stores in advance the signal processing program 124a, which causes the control unit 100 to execute the processing that characterizes the present invention. In the present embodiment, when the power (not shown) of the signal processing device 10 is turned on, the control unit 100 reads the signal processing program 124a from the nonvolatile storage unit 124 into the volatile storage unit 122 and starts executing it. When the user gives a processing start instruction via an operation unit (not shown), the control unit 100 operating in accordance with the signal processing program 124a reads the audio data in the order indicated by the schedule data and starts the signal processing shown in FIG. 2.

FIG. 2 is a flowchart showing the flow of the signal processing that the control unit 100 executes in accordance with the signal processing program 124a. As shown in FIG. 2, the control unit 100 sequentially reads the audio data to be processed until it detects the end of that data (that is, until the determination result of step SA100 becomes "Yes"). When the determination result of step SA100 becomes "Yes", the control unit 100 refers to the schedule data and determines whether there is subsequent audio data (step SA110). If the determination result of step SA110 is "Yes" (that is, there is subsequent music content), the control unit 100 detects the end time of the audio data being processed as the switching timing of the music content and executes the processing from step SA120 onward. If the determination result of step SA110 is "No", the signal processing ends without executing the processing of step SA120.

In step SA120, the control unit 100 analyzes the audio data for a predetermined time at the end of the music content played before the switching timing (hereinafter, the preceding music content) and, for the section corresponding to that predetermined time (hereinafter, the analysis section of the preceding music content), calculates an acoustic feature amount representing the acoustic feature predetermined as the control target. A suitable length for the analysis section may be determined by experiment. The analysis method used to calculate the acoustic feature amount depends on the type of acoustic feature to be controlled, so its details are given in the operation examples.

In step SA130, which follows step SA120, the control unit 100 analyzes the audio data for a predetermined time at the beginning of the music content played after the switching timing (hereinafter, the subsequent music content) and calculates an acoustic feature amount representing the acoustic feature to be controlled in the time section corresponding to that predetermined time (hereinafter, the analysis section of the subsequent music content). In the present embodiment, the acoustic feature amount for the analysis section of the preceding music content is calculated first (step SA120) and that for the analysis section of the subsequent music content is calculated afterward (step SA130), but the order of steps SA120 and SA130 may be reversed, or the two steps may be executed in parallel.
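The two analysis sections of steps SA120 and SA130 can be sketched as plain slices of the sample sequences. This is a minimal illustration that assumes the audio data are Python lists of samples and that both sections have the same duration; `analysis_sections` and its parameters are hypothetical names, and the text leaves the section length to be chosen by experiment.

```python
def analysis_sections(preceding, following, seconds, sample_rate):
    """Return (tail of the preceding content, head of the subsequent content).

    Each section covers at most `seconds` of audio; a shorter track simply
    contributes all of its samples.
    """
    n = int(seconds * sample_rate)
    if n < 1:
        raise ValueError("the analysis section must contain at least one sample")
    return preceding[-n:], following[:n]


# Toy data: 10 samples per track, 0.5 s sections at 8 samples per second.
content_a = list(range(10))
content_b = list(range(100, 110))
tail_a, head_b = analysis_sections(content_a, content_b, 0.5, 8)
```

Because the two slices are independent, the two analyses could run in either order or in parallel, as the text notes.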

In step SA140, which follows step SA130, the control unit 100 calculates, for each of the preceding and subsequent music contents and for each time, the change amount of the acoustic feature amount needed to make the acoustic feature change smoothly over time from the acoustic feature in the analysis section before the switching timing to that in the analysis section after it. The control unit 100 then processes the audio data of each analysis section according to the change amounts calculated in step SA140 and writes the processed audio data to the nonvolatile storage unit 124 (step SA150). The way the change amount is calculated in step SA140 and the way the data is processed in step SA150 also vary with the type of acoustic feature to be controlled, so the details are given in the operation examples. In step SA160, which follows step SA150, the control unit 100 sets the audio data of the next music content indicated by the schedule data as the audio data to be processed, and executes the processing from step SA100 onward again.
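The overall flow of steps SA100 to SA160 can be sketched as a loop over consecutive pairs of tracks. This is a simplified outline under stated assumptions, not the flowchart itself: tracks are held in memory as lists, the end-of-data detection of step SA100 is replaced by pairing each track with its successor, and the three callables are hypothetical placeholders for the feature-specific analysis, calculation, and modification.

```python
def process_schedule(schedule, tracks, section_len, analyze, compute_changes, apply_changes):
    """Process every transition between consecutive tracks, in schedule order.

    schedule -- track names in playback order (stands in for the schedule data)
    tracks   -- dict mapping a name to a list of samples, modified in place
    """
    if section_len < 1:
        raise ValueError("section_len must be at least 1")
    # Pairing each track with its successor plays the role of SA100/SA110:
    # the last track has no successor, so no transition is processed for it.
    for current, following in zip(schedule, schedule[1:]):
        tail = tracks[current][-section_len:]    # analysis section, preceding content
        head = tracks[following][:section_len]   # analysis section, subsequent content
        feat_a = analyze(tail)                   # SA120
        feat_b = analyze(head)                   # SA130
        change_a, change_b = compute_changes(feat_a, feat_b, len(tail), len(head))  # SA140
        tracks[current][-section_len:] = apply_changes(tail, change_a)              # SA150
        tracks[following][:section_len] = apply_changes(head, change_b)
    return tracks


# Toy stand-ins: the "feature" is the mean sample value and both sides are
# shifted to the midpoint. Real processing would use volume, reverberation, etc.
def mean(samples):
    return sum(samples) / len(samples)


def to_midpoint(feat_a, feat_b, n_a, n_b):
    mid = (feat_a + feat_b) / 2
    return [mid - feat_a] * n_a, [mid - feat_b] * n_b


def add(samples, changes):
    return [s + c for s, c in zip(samples, changes)]


tracks = {"A": [1, 1, 1, 1], "B": [3, 3, 3, 3]}
process_schedule(["A", "B"], tracks, 2, mean, to_midpoint, add)
```

Only the last two samples of A and the first two samples of B are touched; the rest of each track is left as it was.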

As described above, the control unit 100 operating in accordance with the signal processing program 124a functions as analysis means 124a1, which executes the processing of steps SA120 and SA130; change amount calculation means 124a2, which executes the processing of step SA140; and changing means 124a3, which executes the processing of step SA150 (see FIG. 3). In the present embodiment, each of the means shown in FIG. 3 is realized by a software module, but each may of course be realized by a hardware module such as an electronic circuit.
The above is the configuration of the signal processing device 10.

(B: Operation)
Next, the operation of the signal processing device 10 is described for two cases: one in which the acoustic feature to be controlled is volume, and one in which it is a reverberation characteristic (more specifically, the length of the reverberation time).
(B-1: Operation when the acoustic feature to be controlled is volume)
First, the operation of the signal processing device 10 is described for the case where, as shown in FIG. 4(a), the preceding music content is content A and the subsequent music content is content B, and, as shown in FIG. 4(b), the volume VA in the analysis section of content A is greater than the volume VB in the analysis section of content B (that is, VA > VB). Although FIG. 4(a) illustrates the case where no separating section such as a silent section is provided between content A and content B, a separating section may of course be provided.

As described above, when the control unit 100 detects the switching timing from content A to content B (determination result of step SA100: Yes, and determination result of step SA110: Yes), it executes the processing from step SA120 onward. In step SA120, the control unit 100 calculates the volume VA in the analysis section of content A, and in step SA130, which follows step SA120, it calculates the volume VB in the analysis section of content B. Any well-known method may be used to calculate the volume in each analysis section, for example calculating the acoustic energy in the section (such as the arithmetic mean of the squared values of the sample data) and taking that acoustic energy as the volume.
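The acoustic-energy measure mentioned above, the arithmetic mean of the squared sample values, is short enough to write out. The sketch assumes samples are plain floats in a list; `section_volume` and the sample values are hypothetical and chosen only so that VA > VB as in the example.

```python
def section_volume(samples):
    """Acoustic energy of a section: arithmetic mean of the squared sample values."""
    if not samples:
        raise ValueError("the analysis section is empty")
    return sum(s * s for s in samples) / len(samples)


tail_of_a = [0.8, -0.8, 0.8, -0.8]   # end of content A (louder)
head_of_b = [0.2, -0.2, 0.2, -0.2]   # start of content B (quieter)
VA = section_volume(tail_of_a)
VB = section_volume(head_of_b)
```

Since this quantity is an energy, the corresponding amplitude is its square root, which is why a square root appears when the volume curve is later converted back to sample values.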

In step SA140, which follows step SA130, the control unit 100 calculates the change amount of the volume at each time in the analysis section of content A and the change amount of the volume at each time in the analysis section of content B so that the volume changes smoothly across the switching timing between content A and content B. In the present embodiment, the control unit 100 calculates the change amount at each time so that, from the start point of the analysis section of content A to the end point of the analysis section of content B, the volume changes along a time change curve that, in two-dimensional coordinates with time on the horizontal axis and volume on the vertical axis, passes through the volume at the start point and the volume at the end point (in the example of FIG. 4(b), the straight line shown by the dash-dot line).
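The straight-line time change curve can be sketched as a linear ramp sampled once per sample time across both analysis sections. The sketch assumes the ramp starts at VA on the first sample of content A's section and reaches VB on the last sample of content B's section; `linear_volume_curve` is a hypothetical name.

```python
def linear_volume_curve(va, vb, n_a, n_b):
    """Target volume at every sample time, split into the A part and the B part.

    va, vb -- volumes of the two analysis sections
    n_a    -- number of samples in the analysis section of content A
    n_b    -- number of samples in the analysis section of content B
    """
    total = n_a + n_b
    if n_a < 1 or n_b < 1:
        raise ValueError("each analysis section needs at least one sample")
    curve = [va + (vb - va) * i / (total - 1) for i in range(total)]
    return curve[:n_a], curve[n_a:]


curve_a, curve_b = linear_volume_curve(0.64, 0.04, 3, 3)
```

With three samples per section, the target falls in equal steps of 0.12 from 0.64 to 0.04, so the transition point itself carries no jump.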

Specifically, for each time (sampling timing) in each analysis section, the control unit 100 calculates, as the change amount, the difference between the square root of the value indicated by the time-change curve at that time and the value of the sample data at that time. This is so that the amplitude of the audio data can be changed in the subsequent step SA150 by adding the change amount to the sample data at each time. In this operation example a straight line is used as the time-change curve, but any smooth curve may be used. One possibility is a curve obtained by Hermite interpolation or spline interpolation of three points: the volume at the start point of the analysis section of the preceding music content, the volume at the end point of the analysis section of the subsequent music content, and a volume midway between the two at the boundary between the two sections.
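
The change-amount calculation described above, together with the addition described for step SA150, can be sketched as follows. The sketch follows the wording of the description literally (square root of the curve value minus the sample value, later added back to the sample); the function names and the example values are hypothetical.

```python
import math


def change_amounts(samples, curve_values):
    """Per-sample change amount: sqrt(curve value) minus the sample value."""
    if len(samples) != len(curve_values):
        raise ValueError("one curve value is required per sample")
    return [math.sqrt(v) - s for s, v in zip(samples, curve_values)]


def apply_change(samples, deltas):
    """Add the change amount for each time to the sample at that time."""
    return [s + d for s, d in zip(samples, deltas)]


# Hypothetical samples and time-change curve values (volume as mean-square energy).
samples = [0.5, -0.5]
curve_values = [0.25, 0.0625]

deltas = change_amounts(samples, curve_values)
processed = apply_change(samples, deltas)
```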

Then, in step SA150, which follows step SA140, the control unit 100 applies to the audio data of content A a process that changes its amplitude so that the volume in the analysis section of content A varies over time according to the change amounts calculated in step SA140 (a process of adding, to the sample data at each time, the change amount corresponding to that time) and writes the result to the nonvolatile storage unit 124. Likewise, it applies to the audio data of content B a process that changes its amplitude so that the volume in the analysis section of content B varies over time according to the change amounts calculated in step SA140, and writes the result to the nonvolatile storage unit 124. When the audio data of content A and the audio data of content B processed in this way are played back in succession, the volume changes smoothly over time from VA to VB across the switching timing, following the straight line indicated by the one-dot chain line in FIG. 4(b), and the difference in volume between the two contents is alleviated. Consequently, no auditory gap or the like caused by an abrupt change in volume arises before and after the switching timing.

(B-2: Operation when the acoustic feature to be controlled is the reverberation characteristic)
Next, the operation when the acoustic feature to be controlled is the reverberation characteristic is described.
When multiple music contents are played back in succession, a difference between the amount of reflected sound in the preceding music content and that in the subsequent music content causes the atmosphere of the sound field to change greatly before and after the switching timing of the music contents, producing an auditory gap or the like. As shown in FIG. 5(a), reflected sound includes early reflections and reverberant sound. Early reflections are sound that reaches the listener after being emitted from the sound source and undergoing a first reflection off a wall or the like, and reverberant sound is sound that reaches the listener after multiple reflections. Reverberant sound is also called late reflections. When the acoustic feature to be controlled is the reverberation characteristic, it suffices to have the signal processing apparatus 10 calculate the change amount of the reverberant sound of the music contents and remove or add reverberant sound so that the amounts of reverberant sound of the music contents played back in succession connect naturally.

In more detail, when the acoustic feature to be controlled is the reverberation characteristic, in steps SA120 and SA130 described above the control unit 100 may be made to analyze the audio data to be processed (in step SA120, the audio data for a predetermined time at the end of the preceding music content; in step SA130, the audio data for a predetermined time from the beginning of the subsequent music content) and calculate the reverberation time of each. The reverberation time is one index used to evaluate reverberation characteristics; a longer reverberation time means a larger amount of reverberant sound. Any well-known technique may be adopted as appropriate to calculate the reverberation time. Specifically, usable methods include a method that estimates the powers of the early reflections and the reverberant sound from the audio data to be processed and calculates the reverberation time from them (see Non-Patent Document 1), a method that analyzes the audio data to be processed to find a decaying portion of the signal energy and calculates the reverberation time from the decay rate in that portion (see Non-Patent Document 2), and a method that calculates the reverberation time by maximum likelihood estimation on the same decaying portion (Non-Patent Document 3). Alternatively, instead of the reverberation time, a method that estimates the acoustic energy of the reverberant sound and thereby estimates the amount of reverberant sound directly (Non-Patent Document 4) may be adopted.
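
As a rough illustration of deriving a reverberation time from a decay rate, the sketch below fits a straight line to the level (in dB) of a decaying portion and converts the slope into the time needed for a 60 dB decay, which is the usual definition of reverberation time. It is not taken from any of the cited non-patent documents; the function name, the frame-energy input, and the frame period are assumptions.

```python
import math


def reverberation_time(energies, frame_period):
    """Estimate the 60 dB decay time from frame energies of a decaying portion.

    energies:     positive energy value per frame (hypothetical input)
    frame_period: time between frames, in seconds
    """
    if len(energies) < 2 or min(energies) <= 0.0 or frame_period <= 0.0:
        raise ValueError("need two or more positive energies and a positive period")
    levels = [10.0 * math.log10(e) for e in energies]       # level in dB
    times = [i * frame_period for i in range(len(levels))]
    mean_t = sum(times) / len(times)
    mean_l = sum(levels) / len(levels)
    # Least-squares slope of level against time, in dB per second.
    num = sum((t - mean_t) * (l - mean_l) for t, l in zip(times, levels))
    den = sum((t - mean_t) ** 2 for t in times)
    slope = num / den
    if slope >= 0.0:
        raise ValueError("the portion is not decaying")
    return -60.0 / slope


# Hypothetical decay of 3 dB per 0.1 s frame, i.e. 30 dB per second.
decay = [10.0 ** (-0.3 * i) for i in range(5)]
rt = reverberation_time(decay, 0.1)
```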

The control unit 100 calculates the change amount of the reverberant sound at each time in the analysis section of content A and the change amount of the reverberant sound at each time in the analysis section of content B so that the reverberation characteristic changes smoothly across the switching timing between content A and content B (step SA140). As shown in FIG. 5(b), suppose that the reverberation time TA calculated in step SA120 (the reverberation time of the preceding music content, content A) is longer than the reverberation time TB of the subsequent music content (content B); that is, content A has more reverberant sound. In this case, the control unit 100 calculates the change amount of the reverberant sound at each time (for content A, the amount of reverberant sound to remove; for content B, the amount of reverberant sound to add) so that, from the start point of the analysis section of content A to the end point of the analysis section of content B, the reverberation time changes along a time-change curve (in the example shown in FIG. 5(b), the straight line indicated by the one-dot chain line) that passes through the reverberation time TA at the start point and the reverberation time TB at the end point in two-dimensional coordinates with time on the horizontal axis and reverberation time on the vertical axis. In this embodiment, for content A, the control unit 100 takes as the change amount of the reverberant sound at each time in the analysis section the ratio of the value indicated by the time-change curve at that time to the reverberation time calculated in step SA120 (the former divided by the latter). For content B, it takes as the change amount at each time in the analysis section the ratio of the value indicated by the time-change curve at that time to the reverberation time calculated in step SA130.
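
The ratio used as the change amount of the reverberant sound can be sketched as below, using the straight-line case. The function names, the time grid, and the numeric values of TA and TB are hypothetical.

```python
def curve_value(t, t_start, t_end, ta, tb):
    """Reverberation time on the straight line through (t_start, ta) and (t_end, tb)."""
    return ta + (tb - ta) * (t - t_start) / (t_end - t_start)


def reverb_change_amounts(times, t_start, t_end, ta, tb, measured):
    """Ratio of the curve value at each time to the measured reverberation time."""
    return [curve_value(t, t_start, t_end, ta, tb) / measured for t in times]


# Hypothetical case: TA = 2.0 s, TB = 1.0 s, switching timing at t = 2.
TA, TB = 2.0, 1.0
ratios_a = reverb_change_amounts([0, 1], 0, 4, TA, TB, TA)     # values <= 1: remove reverb
ratios_b = reverb_change_amounts([2, 3, 4], 0, 4, TA, TB, TB)  # values >= 1: add reverb
```

With TA longer than TB, the ratios for content A fall below 1 toward the switching timing, and the ratios for content B start above 1 and return to 1 at the end of its analysis section.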

In step SA150, the control unit 100 executes processing that removes or adds reverberant sound according to the change amount calculated in step SA140. Spectral subtraction is one method of removing reverberant sound. Spectral subtraction is a subtraction process in the frequency domain and is realized as follows. The control unit 100 first applies an FFT to the audio data to be processed to convert it into frequency-domain data. Next, the control unit 100 tracks the change over time of the amplitude level of each frequency bin of the FFT, judges that reverberant sound has been added to any frequency bin whose attenuation is less than a predetermined threshold, and suppresses the amplitude of that bin by multiplying it by the change amount. This is because, in general, sound to which reflected sound has been added decays in amplitude more gradually than sound to which it has not. The amount of reverberant sound removed may be adjusted more finely by adjusting the threshold or the amount of amplitude suppression (for example, by multiplying by a further constant in addition to the change amount).
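
The per-bin judgment and suppression can be sketched as follows. The sketch assumes the FFT has already been applied and that the data is available as a list of magnitude frames (one list of bin magnitudes per frame); it also assumes that "attenuation" means the level drop in dB between consecutive frames and that only bins that are actually decaying are candidates. These choices, and all names, are illustrative assumptions.

```python
import math


def suppress_reverberant_bins(frames, change_amount, threshold_db):
    """Multiply slowly decaying bins by change_amount; leave other bins unchanged.

    frames: list of frames, each a list of bin magnitudes (hypothetical layout)
    """
    out = [list(frames[0])]  # the first frame has no predecessor to compare with
    for prev, cur in zip(frames, frames[1:]):
        row = []
        for p, c in zip(prev, cur):
            reverberant = False
            if p > 0.0 and c > 0.0:
                drop_db = 20.0 * math.log10(p / c)  # attenuation since previous frame
                reverberant = 0.0 < drop_db < threshold_db
            row.append(c * change_amount if reverberant else c)
        out.append(row)
    return out


# Bin 0 decays by about 0.9 dB (judged reverberant); bin 1 decays by 20 dB (left as is).
frames = [[1.0, 1.0], [0.9, 0.1]]
result = suppress_reverberant_bins(frames, change_amount=0.5, threshold_db=6.0)
```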

In contrast, reverberant sound is added as follows. The control unit 100 first generates reverberant sound data by convolving the audio data to be processed with an impulse response whose amplitude corresponds to the change amount and to the acoustic energy of the audio data. The reverberant sound data is data representing the "reverberant sound" in FIG. 5(a). Next, the control unit 100 adds the reverberant sound data generated in this way to the audio data to be processed. The amount of reverberant sound added may be adjusted more finely by adjusting the mixing ratio used when adding the reverberant sound data to the audio data to be processed or by adjusting the length of the impulse response.
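
A minimal sketch of the convolve-and-mix procedure follows. The impulse response values, the `mix` parameter standing in for the mixing ratio, and the truncation of the reverberant tail to the input length are all assumptions made for illustration; how the impulse response amplitude is derived from the change amount is not modeled here.

```python
def add_reverb(samples, impulse_response, mix):
    """Convolve samples with an impulse response and add the result to the input.

    The reverberant tail is truncated to the length of the input.
    """
    n = len(samples)
    wet = [0.0] * n  # reverberant sound data
    for i, s in enumerate(samples):
        for j, h in enumerate(impulse_response):
            if i + j < n:
                wet[i + j] += s * h
    return [dry + mix * w for dry, w in zip(samples, wet)]


# Hypothetical impulse response with no direct-path tap (index 0 is zero),
# so it contributes only delayed, reflected energy.
impulse_response = [0.0, 0.5, 0.25]
processed = add_reverb([1.0, 0.0, 0.0, 0.0], impulse_response, mix=0.5)
```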

In this operation example, the analysis section of content A undergoes processing that removes reverberant sound while gradually increasing the amount removed, and the analysis section of content B undergoes processing that adds reverberant sound while gradually decreasing the amount added. When the audio data of content A and the audio data of content B are played back in succession, the reverberation time changes smoothly over time from TA to TB across the switching timing, following the straight line indicated by the one-dot chain line in FIG. 5(b), and the difference in reverberation characteristics between the two contents is alleviated. Consequently, no auditory gap or the like caused by an abrupt change in the reverberation characteristic arises before and after the switching timing.

As described above, the signal processing apparatus 10 of this embodiment can make acoustic features such as volume change smoothly over time from the preceding music content to the subsequent music content, with the switching timing of the music contents as the reference. This avoids the auditory gap or the like that would otherwise arise from the difference in acoustic features between the two contents when the music content switches. In addition, because the signal processing apparatus 10 analyzes only the audio data in the analysis sections before and after the switching timing of the music contents, there is no need for large-scale processing over the entire audio data of each preceding and subsequent music content, as in the technique disclosed in Patent Document 1. Furthermore, because the signal processing apparatus 10 also modifies only the audio data in the analysis sections before and after the switching timing, the original acoustic features of the preceding and subsequent music contents are not impaired over the contents as a whole. In other words, the signal processing apparatus 10 of this embodiment can control the auditory impression at the content switching timing so that it is natural (that is, avoid an auditory gap or the like) without impairing the acoustic features that each of the music contents played back in succession has over its entirety.

(C: Modifications)
An embodiment of the present invention has been described above; the following modifications may of course be made to it.
(1) The above embodiment describes the case where the plurality of sound contents played back in succession are music contents. However, the sound contents are not limited to music contents; they may be speech reading out text such as a novel or a guidance message, environmental sounds such as the sound of a forest or of waves, or masker sounds such as scrambled sound (speech rendered meaningless by, for example, dividing it into multiple frames and rearranging the frames). The above embodiment also describes the case where both analysis sections, before and after the switching timing of the music contents, are processed so that the acoustic feature to be controlled changes smoothly. However, only one of the analysis section before the switching timing and the analysis section after it may be processed so that the acoustic feature changes smoothly. For example, when the acoustic feature to be controlled is the volume, the acoustic feature may be changed only in the analysis section of the preceding music content, as indicated by the one-dot chain line in FIG. 6(a), or only in the analysis section of the subsequent music content, as indicated by the one-dot chain line in FIG. 6(b).

(2) The above embodiment describes the case where audio data representing each of the music contents to be played back in succession is input to the signal processing apparatus 10 via the external device I/F unit 110 and the processed audio data is output to a sound system or the like via the same external device I/F unit 110. However, a communication I/F unit such as a NIC (Network Interface Card) may be provided in the signal processing apparatus 10 in place of the external device I/F unit 110, a telecommunication line such as the Internet may be connected to the communication I/F unit, and the audio data representing each of the music contents to be played back in succession may be input to the signal processing apparatus 10 via the telecommunication line. Similarly, the processed audio data may be output via the telecommunication line. Such an aspect makes it possible to provide a signal processing service in ASP (Application Service Provider) form, which receives via a telecommunication line the audio data representing each of the music contents to be played back in succession and returns audio data processed so that no auditory gap or the like arises at the content switching timing, without impairing the acoustic features of each music content.

(3) The above embodiment describes the case where an auditory gap or the like is avoided by processing the audio data of each analysis section so that the acoustic feature changes smoothly over time in the analysis sections before and after the switching timing of the music contents. However, the audio data of each analysis section may instead be processed to change the acoustic feature so that the switching timing of the music contents is emphasized, or so that the difference in acoustic features between the analysis sections before and after the switching timing is emphasized. In this case the auditory gap or the like is emphasized, which can be exploited as a staging effect.

A specific example of changing the acoustic feature of the audio data of each analysis section so that the switching timing of the music contents is emphasized is as follows. When the music contents played back in succession are the BGM (background music) of the scenes of a movie or video game, emphasizing the switching timing can strongly evoke the change of scene for the listener. For example, when BGM for a narrow cave is followed by BGM for an open space such as a field, and, as shown in FIG. 6(c), there is no difference in reverberation time between the preceding BGM and the subsequent BGM, the sense of spaciousness of the sound does not change, and it is difficult to evoke the change of scene (a vast field opening up on leaving the cave) from the BGM alone. In contrast, by abruptly lengthening the reverberation time to emphasize the sense of spaciousness and then gradually bringing it back toward the original value, as indicated by the one-dot chain line in FIG. 6(c), the BGM switching timing is emphasized and the change of scene can be strongly evoked for the listener. A similar effect is obtained by abruptly shortening the reverberation time and then gradually bringing it back toward the original value, as indicated by the two-dot chain line in FIG. 6(c).

A specific example of emphasizing the difference in acoustic features between the analysis sections before and after the switching timing is as follows. For example, when sound content B is played back after sound content A, the acoustic feature to be controlled is the volume, and the volume VA of sound content A is larger than the volume VB of sound content B, one possibility, shown in FIG. 6(d), is to first raise the volume to a larger value (VMAX) in the analysis section of sound content A, then drop it abruptly at the content switching timing to a value (VMIN) smaller than the volume VB, and then gradually increase it to the volume VB in the analysis section of sound content B. Instead of changing the volume discontinuously at the switching timing as in FIG. 6(d), the volume may be changed continuously across the content switching timing, as shown in FIG. 6(e), in the order volume VA, volume VMAX, a value midway between volume VA and volume VB, volume VMIN, volume VB. According to these aspects, the difference in acoustic features between the two contents is emphasized before and after the content switching timing, which can be exploited as a staging effect.
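
The continuous variant of FIG. 6(e) can be sketched as a piecewise-linear envelope through the five volumes named above. The breakpoint times and the numeric volumes are hypothetical; the description specifies only the order of the values, not when each is reached or the shape of the segments between them.

```python
def piecewise_linear(points, t):
    """Volume at time t on a polyline through (time, volume) breakpoints.

    points must be sorted by strictly increasing time.
    """
    if t <= points[0][0]:
        return points[0][1]
    for (t0, v0), (t1, v1) in zip(points, points[1:]):
        if t <= t1:
            return v0 + (v1 - v0) * (t - t0) / (t1 - t0)
    return points[-1][1]


# Hypothetical values with VA > VB, VMAX > VA and VMIN < VB; switching timing at t = 2.
VA, VB, VMAX, VMIN = 0.5, 0.25, 1.0, 0.125
envelope = [
    (0.0, VA),
    (1.5, VMAX),
    (2.0, (VA + VB) / 2),  # midway value at the switching timing
    (2.5, VMIN),
    (4.0, VB),
]
at_switch = piecewise_linear(envelope, 2.0)
```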

Thus, according to the present invention, controlling the acoustic features in the analysis sections before and after the switching timing of the sound contents makes possible staging effects not previously available. Even in aspects that perform such effects, the analysis by the signal processing apparatus 10 and the processing that changes the acoustic features remain limited to the sound content in the analysis sections before and after the switching timing, so the original acoustic features of the preceding and subsequent sound contents are not impaired over the contents as a whole.

(4) Designation means for designating the processing mode of the changing means 124a3 may be provided in the signal processing apparatus 10, the change amount calculation means 124a2 may be made to calculate the change amount according to the processing mode designated by the designation means, and the changing means 124a3 may be made to apply processing of the designated mode to the sound content. The processing modes executed by the changing means 124a3 can be classified (a) by the analysis section to be processed, that is, whether both analysis sections before and after the switching timing are processed or only one of the section before and the section after the switching timing is processed, and (b) by the processing content, that is, whether the processing alleviates the difference in acoustic features or emphasizes the difference in acoustic features (or the switching timing). The designation means may designate the analysis section to be processed, the processing content, or both. A specific example of such designation means is a user interface unit consisting of a display device and an input device such as a touch panel, mouse, or keyboard.

(5) The analysis means 124a1 may be made to analyze the acoustic feature of a predetermined frequency band of the audio data to be analyzed, the change amount calculation means 124a2 may be made to calculate the change amount in that frequency band, and the changing means 124a3 may be made to update the audio data so that the acoustic feature amount of that frequency band changes by an amount corresponding to the change amount. For example, when the sound contents played back in succession are centered on voice, such as vocal songs, and the acoustic feature to be controlled is the reverberation characteristic, the acoustic feature is left unchanged in the voice band (for example, 125 Hz to 2 kHz) and is controlled only in the range above the voice band. Because the sense of reverberation (spaciousness) of sound is more easily perceived in the high range, this aspect makes it possible to alleviate an auditory gap or the like without greatly changing the sound quality (auditory impression) of the sound content as a whole.

Alternatively, the analysis means 124a1 may be made to apply band division to the audio data to be analyzed, dividing it into a plurality of predetermined band components, and to identify the acoustic feature for each band; the change amount calculation means 124a2 may be made to calculate the change amount for each band; and the changing means 124a3 may be made to execute, for each band, processing that changes the acoustic feature amount by an amount corresponding to the change amount. In this case, the same acoustic feature may be changed in every band while the way the change is processed differs from band to band, or a different acoustic feature may be changed in each band.

As a specific example of changing the same acoustic feature in every band while varying the way the change is processed from band to band, when the acoustic feature to be controlled is the reverberation characteristic, the high range may be changed first and the low range afterward; that is, the timing of the change may differ between the high range (for example, 2 kHz to 4 kHz) and the low range (the band below 2 kHz). As described above, reverberation is more easily perceived in the high range, so changing the high range first avoids an abrupt, large change in sound quality and alleviates the auditory gap or the like. As a specific example of changing a different acoustic feature in each band, when the sound contents played back in succession are centered on voice, such as vocal songs, and both the sense of reverberation and the perceived loudness of the vocals are to be controlled, the volume may be the acoustic feature to be controlled in the voice band and the reverberation characteristic may be the acoustic feature to be controlled in the high range.
According to such an aspect, the auditory impression before and after the switching of sound contents can be controlled finely for each band. This modification may of course be combined with modification (1) or (3) described above.

(6) In the above embodiment, processing that changes the acoustic feature is applied unconditionally to each of the plurality of audio data input to the signal processing apparatus 10. However, some music content, such as music clips, has been adjusted as part of a commercial image strategy to have acoustic features peculiar to the singer or performer, and changing the acoustic features of the audio data of such music content may cause problems: if a version with altered acoustic features spreads through video sites and the like, the image strategy may be undermined. Therefore, for music content whose acoustic features should not be changed, the distributor may be made to distribute the audio data with a flag attached that is set to a value indicating that changing the acoustic features is not permitted. Meanwhile, the nonvolatile storage unit 124 may store a signal processing program that causes the control unit 100 to function as the analysis means 124a1, the change amount calculation means 124a2, and the changing means 124a3 described above, and also as control means that activates the analysis means 124a1, the change amount calculation means 124a2, and the changing means 124a3 when the value of the flag attached to the audio data to be processed indicates that changing the acoustic features is permitted.
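
The flag check performed by the control means can be sketched as below. The data layout (`AudioContent` with a `modification_allowed` field) and the function names are hypothetical; the description does not specify how the flag is encoded in the audio data.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class AudioContent:
    samples: List[float]
    modification_allowed: bool  # hypothetical representation of the flag


def process_if_permitted(content: AudioContent,
                         process: Callable[[List[float]], List[float]]) -> List[float]:
    """Run analysis, change-amount calculation, and changing (bundled here as
    `process`) only when the flag permits changing the acoustic features."""
    if not content.modification_allowed:
        return list(content.samples)  # leave the audio data untouched
    return process(content.samples)


def halve(samples: List[float]) -> List[float]:
    """Stand-in for the actual feature-changing processing."""
    return [s * 0.5 for s in samples]


locked = AudioContent([1.0, 2.0], modification_allowed=False)
unlocked = AudioContent([1.0, 2.0], modification_allowed=True)
```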

DESCRIPTION OF SYMBOLS: 10 ... signal processing device, 100 ... control unit, 110 ... external device I/F unit, 120 ... storage unit, 122 ... volatile storage unit, 124 ... non-volatile storage unit, 124a ... signal processing program, 130 ... bus.

Claims

A signal processing apparatus comprising:
analysis means for analyzing an acoustic feature of each sound content in sections before and after a switching timing between a plurality of sound contents that are played back continuously;
change amount calculation means for calculating, based on an acoustic feature amount representing the acoustic feature of the sound content in the section before the switching timing and an acoustic feature amount representing the acoustic feature of the sound content in the section after the switching timing, a change amount of the acoustic feature amount of each sound content in the sections before and after the switching timing; and
change means for applying, to the sound content in the sections before and after the switching timing, processing according to the change amount calculated by the change amount calculation means.

The signal processing apparatus according to claim 1, wherein the acoustic feature is a reverberation characteristic.

The signal processing apparatus according to claim 1, wherein the acoustic feature is volume.

The signal processing apparatus according to claim 1, further comprising designation means for designating a processing mode of the change means,
wherein the change amount calculation means calculates the change amount according to the processing mode designated by the designation means.

The signal processing apparatus according to claim 1, wherein each of the plurality of sound contents is provided with a flag indicating whether changing the acoustic feature is permitted,
the apparatus further comprising control means that activates the analysis means, the change amount calculation means, and the change means when the flag has a value that permits the change.