JP6601109B2

JP6601109B2 - Instrument identification device

Info

Publication number: JP6601109B2
Application number: JP2015195238A
Authority: JP
Inventors: 慶太有元
Original assignee: Yamaha Corp
Current assignee: Yamaha Corp
Priority date: 2015-09-30
Filing date: 2015-09-30
Publication date: 2019-11-06
Anticipated expiration: 2035-09-30
Also published as: WO2017057532A1; JP2017068125A

Description

The present invention relates to a musical instrument identification device.

For example, Patent Document 1 below discloses a sound source identification device that identifies a sound source from the sound it emits, where the source may be an acoustic instrument, a natural sound, a person, a living creature, or the like. Specifically, the device registers feature data of sounds emitted by sound sources in a database in advance, and identifies a sound source based on the correlation between the feature data of the sound emitted by the source to be identified and the database.

Patent Document 1: Japanese Unexamined Patent Application Publication No. 2013-15601 (JP2013-15601A)

However, because the above sound source identification device identifies the sound source independently for each target acoustic signal, its identification accuracy may be insufficient for instruments with similar timbres when, for example, acoustic signals from a plurality of instruments are input.

In view of the above, an object of the present invention is to realize a musical instrument identification device and the like that can identify instruments more accurately even when, for example, acoustic signals from a plurality of instruments are input.

A musical instrument identification device comprising: index value acquisition means for acquiring, based on acoustic signals acquired for each of a plurality of channels, index value data composed of index values that represent, for each feature of the acoustic signal and for each instrument type, the likelihood that the acoustic signal corresponds to that instrument type; inter-channel feature information detection means for detecting, as feature information, features of the acoustic signals across the channels based on the acoustic signals of the plurality of channels; and score information generation means for generating, as score information, a value for each instrument type of each channel according to the degree of certainty that the channel corresponds to that instrument type, based on the index value data and the feature information.

A diagram showing an example of an overview of the acoustic signal processing system.
A diagram showing an example of an overview of the configuration of the mixer.
A diagram showing an example of the functional configuration of the control unit of the mixer in the first embodiment.
A diagram showing an example of the configuration of the index value acquisition unit.
A diagram showing an example of index value data.
A diagram for explaining an example of the processing of the threshold determination/exclusion unit.
A diagram for explaining an example of feature detection by the inter-channel feature information detection unit.
A diagram for explaining another example of feature detection by the inter-channel feature information detection unit.
A diagram showing an example of score information.
A diagram showing an example of the processing flow in the first embodiment, from acquiring the acoustic signals to determining the instrument type corresponding to each channel.
A diagram showing an example of the functional configuration of the control unit of the mixer in the second embodiment.
A diagram showing an example of combination score information.
A diagram showing an example of the combination score information extracted by the combination score information extraction unit.
A diagram showing an example of the processing flow in the second embodiment, from acquiring the acoustic signals to determining the instrument type corresponding to each channel.
A diagram showing an example of the functional configuration of the control unit of the mixer in the third embodiment.
A diagram showing an example of correlation value data between channels.
A diagram for explaining another example of the top determination processing.
A diagram for explaining another example of the top determination processing.

Embodiments of the present invention are described below with reference to the drawings. In the drawings, identical or equivalent elements are given the same reference numerals, and redundant description is omitted.

FIG. 1 is a diagram showing an example of an overview of the acoustic signal processing system in the present embodiment. As shown in FIG. 1, the acoustic signal processing system 100 includes, for example, a keyboard 101, a drum 102, a guitar 103, a microphone 104, a top microphone 105, a mixer 106, an amplifier 107, and a speaker 108.

The keyboard 101 is, for example, a synthesizer or an electronic piano, and outputs an acoustic signal according to the player's performance. The microphone 104 picks up, for example, a singer's voice and outputs the picked-up sound as an acoustic signal. The drum 102 includes, for example, a drum set and microphones that pick up the sounds produced by striking the percussion instruments in the drum set (for example, a bass drum and a snare drum). A microphone is provided for each percussion instrument and outputs the picked-up sound as an acoustic signal. The guitar 103 includes, for example, an acoustic guitar and a microphone; the microphone picks up the sound of the acoustic guitar and outputs it as an acoustic signal. The guitar 103 may instead be an electric-acoustic guitar or an electric guitar, in which case no microphone is needed. The top microphone 105 is a microphone installed above a plurality of instruments, for example above the drum set; it picks up the sound of the drum set as a whole and outputs it as an acoustic signal. The top microphone 105 may consist of a plurality of microphones, for example one on the left and one on the right. The top microphone 105 also unavoidably picks up sound from instruments other than the drum set, although at low volume.

The mixer 106 has a plurality of input terminals, and electrically sums, processes, and outputs the acoustic signals input to those terminals from the keyboard 101, the drum 102, the guitar 103, the microphone 104, and so on. Specifically, the mixer 106 includes, for example, a level control unit that controls levels such as volume and a pan control unit that adjusts sound localization by changing the balance of the sound; a mixing unit mixes the acoustic signals after level control and the like and outputs the result to the amplifier 107. In addition to this general mixer configuration, the mixer 106 of the present embodiment has a musical instrument identification function and the like, the details of which are described later. Because the configuration of a general mixer is well known, its details are omitted.

The amplifier 107 amplifies the acoustic signal output from the output terminal of the mixer 106 and outputs it to the speaker 108. The speaker 108 emits sound according to the amplified acoustic signal.

Next, an example of the configuration of the mixer 106 in the present embodiment is described. FIG. 2 is a diagram for explaining an overview of the configuration of the mixer 106 in the present embodiment. As shown in FIG. 2, the mixer 106 includes, for example, a control unit 201, a storage unit 202, an operation unit 203, a display unit 204, and an input/output unit 205. The control unit 201, the storage unit 202, the operation unit 203, the display unit 204, and the input/output unit 205 are connected to one another by an internal bus 206.

The control unit 201 is, for example, a CPU or an MPU, and operates according to a program stored in the storage unit 202. The storage unit 202 is an information recording medium, such as a ROM, a RAM, or a hard disk, that holds the program executed by the control unit 201.

The storage unit 202 also operates as working memory for the control unit 201. The program may be provided by download over a network (not shown), or on any of various computer-readable information recording media such as a CD-ROM or a DVD-ROM.

The operation unit 203 comprises, for example, slide-type volume controls, buttons, and knobs, and outputs the content of a user's instruction operation to the control unit 201. The display unit 204 is, for example, a liquid crystal display or an organic EL display, and displays information according to instructions from the control unit 201.

The input/output unit 205 has a plurality of input terminals and output terminals. Acoustic signals are input to the input terminals from the instruments, such as the keyboard 101, the drum 102, the guitar 103, and the microphone 104, and from the top microphone 105. The output terminals output the acoustic signal obtained by electrically summing and processing the input acoustic signals. This configuration of the mixer 106 is an example, and the mixer is not limited to it. For example, some functions such as level control may be processed in analog form.

Next, the functional configuration of the control unit 201 of the mixer 106 in the present embodiment is described with reference to FIG. 3.

The acoustic signal acquisition unit 301 acquires an acoustic signal for each channel. Here, each channel corresponds to the input terminal that receives the acoustic signal from one of the instruments, such as the keyboard 101, the drum 102, or the guitar 103.

The onset/offset detection unit 302 extracts onsets and offsets from the input acoustic signal. An onset corresponds to the rise of the acoustic signal, and an offset corresponds to the point at which the output of the acoustic signal falls to or below a predetermined value (for example, approximately 0).
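The description does not specify how onsets and offsets are detected. As a minimal sketch, assuming a per-frame level envelope and a single fixed threshold, detection could look like the following; the function name `detect_sections` and the `threshold` parameter are hypothetical and not taken from the patent.

```python
def detect_sections(envelope, threshold=0.05):
    """Return (onset_index, offset_index) pairs for one channel.

    Assumption: an onset is the first frame whose level rises above
    `threshold`; the offset is the first later frame at or below it.
    """
    sections = []
    onset = None
    for i, level in enumerate(envelope):
        if onset is None and level > threshold:
            onset = i                      # signal rises: onset
        elif onset is not None and level <= threshold:
            sections.append((onset, i))    # level back to about 0: offset
            onset = None
    if onset is not None:                  # still sounding at end of buffer
        sections.append((onset, len(envelope)))
    return sections
```

Each returned pair delimits one sounding section in frame indices.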

The feature extraction unit 303 extracts features of the acoustic signal between an onset and the following offset (hereinafter simply called a "sounding section"). Extraction is performed for each channel and for each sounding section.

The index value acquisition unit 304 acquires, based on the features, index values that indicate which instrument type each sounding section is identified as. Index values are acquired for each channel and for each instrument type.

Specifically, for example, the index value acquisition unit 304 is configured using three SVMs (Support Vector Machines), as shown in FIG. 4. Based on the features (for example, a feature vector), the three SVMs identify which candidate instrument type the input sounding section corresponds to. SVM0 identifies whether the sound is a harmonic sound, such as a guitar or a male or female voice (male Vo, female Vo), or a percussive sound, such as a snare or a cymbal. SVM1 identifies which percussion instrument, such as kick, snare, or cymbal, the input onset belongs to. SVM2 identifies whether the input sounding section is bass (Bass), guitar (Guitar), male vocal (maleVo), female vocal (femaleVo), or the like.

The types and number of instruments identified by each SVM are examples, and the present embodiment is not limited to them. An SVM is a machine learning algorithm that is trained in advance on the features of the acoustic signals of each instrument type and then classifies input features as one of those instrument types; because it is well known, its details are omitted. The present embodiment describes the use of SVMs as one example of classification by a machine learning algorithm, but other algorithms (for example, simple logistic regression) may be used. Furthermore, although the case of three SVMs is described above, the present embodiment is not limited to this and may be configured with, for example, two SVMs.

FIG. 5 shows an example of index value data composed of the index values acquired as described above. In FIG. 5, onset is information that identifies each sounding section of the input acoustic signals. Onset Time is the start time of the sounding section (hereinafter, onset time), and Offset Time is the end time of the sounding section (hereinafter, offset time). amplitude is the amplitude of the sounding section. NonPercussive (corresponding to the harmonic sound above) and Percussive are the outputs of SVM0, Conga through Kick are the outputs of SVM1, and Bass through Wind are the outputs of SVM2. A larger index value indicates a higher likelihood that the section corresponds to that instrument type. In FIG. 5, SVM0 is designed to take values from 0 to 1.

The threshold determination/exclusion unit 305 determines whether each sounding section in a channel is at or below a predetermined volume threshold. It then excludes from the index value data the information on sounding sections determined to be at or below that volume. Specifically, in the case shown in FIG. 6, for example, the peak values of the sounding sections other than those marked Vo. Active (other source) are at or below the volume threshold shown by the dotted line, so the sounding sections marked other source are excluded.
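The exclusion step can be sketched as a simple filter. This is an illustration only: the dictionary layout and the `peak` field are assumptions made for the example, not the data format used in the patent.

```python
def exclude_quiet_sections(sections, volume_threshold):
    """Drop sounding sections whose peak level is at or below the threshold.

    Each section is a dict with a hypothetical "peak" field holding the
    maximum level observed between onset and offset.
    """
    return [s for s in sections if s["peak"] > volume_threshold]
```

In the FIG. 6 example, the sections marked other source would be removed and only the Vo. Active sections would remain in the index value data.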

The inter-channel feature information detection unit 306 compares the sounding sections of the channels and detects predetermined feature information. Here, the predetermined feature information is, for example, information indicating that the start timing of a sounding section differs from that of the other channels. It may also be information indicating that the signal level of the other channels' sounding sections is very small (at or below a predetermined threshold) compared with the signal level of the sounding section of the channel in question. Specifically, when the sounding sections of the channels are as shown in FIG. 7, the sounding section of the channel marked 701 has a start timing (onset time) that differs from the other channels, and the signal level of the other channels is very small compared with the signal level of that sounding section.
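The two features described here can be sketched as predicates over one sounding section and the sections of the other channels. The field names (`onset`, `level`) and the tolerance values `time_tol` and `level_ratio` are assumptions for illustration; the patent only speaks of predetermined thresholds.

```python
def starts_alone(section, others, time_tol=0.02):
    """True if no other channel has an onset within `time_tol` seconds."""
    return all(abs(section["onset"] - o["onset"]) > time_tol for o in others)


def dominates(section, others, level_ratio=0.1):
    """True if every other channel's level is very small by comparison."""
    return all(o["level"] <= level_ratio * section["level"] for o in others)
```

A section such as the one marked 701 in FIG. 7 would satisfy both predicates.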

The predetermined feature information may also indicate, for example, that the onset times and offset times are nearly simultaneous (within a predetermined threshold) across a plurality of channels and that the instrument type with the largest index value for those sounding sections is the same in each of those channels. Specifically, as shown in the part marked 801 in FIG. 8, the feature information indicates that three channels have the same onset time and offset time and that the Snare index value is the highest value for that sounding section in all three channels. The part marked 802 shows the largest index value among the three channels.
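A minimal sketch of this check, assuming one sounding section per channel with hypothetical fields `onset`, `offset`, and `index` (a mapping from instrument name to index value), and an assumed tolerance `time_tol`:

```python
def shared_top_instrument(sections, time_tol=0.02):
    """Return the common top-ranked instrument, or None.

    `sections` holds one sounding section per channel. The instrument is
    returned only if all onsets and all offsets fall within `time_tol`
    of each other and every channel ranks the same instrument highest.
    """
    onsets = [s["onset"] for s in sections]
    offsets = [s["offset"] for s in sections]
    if max(onsets) - min(onsets) > time_tol:
        return None
    if max(offsets) - min(offsets) > time_tol:
        return None
    tops = {max(s["index"], key=s["index"].get) for s in sections}
    return tops.pop() if len(tops) == 1 else None
```

For the situation marked 801 in FIG. 8, this would return `"Snare"` for the three channels involved.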

The score information generation unit 307 generates score information based on the index value data and the detected feature information. Specifically, for example, as shown in FIG. 4, the outputs of SVM1 and the outputs of SVM2 are each multiplied by the corresponding output of SVM0, that is, the output identified as a harmonic sound or the output identified as a percussive sound, and the products are then added to generate the score information. In this case, each such output from SVM1 is in the range 0 to 1, and the outputs of SVM1 and SVM2 are weighted by multiplication with the outputs of SVM0. When the index value data contained onsets at or below the threshold as described above, the index value data here means the data from which those onsets have been excluded.
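The weighting and summation can be sketched as follows. The pairing used here is an interpretation, not something the text states outright: the percussion classifier SVM1 is weighted by the Percussive output of SVM0, the SVM2 outputs are weighted by the NonPercussive output, and the products are summed over all sounding sections of a channel. The function names are hypothetical.

```python
def section_scores(svm0, svm1, svm2):
    """Weight per-instrument outputs by the SVM0 outputs for one section."""
    scores = {}
    for name, value in svm1.items():       # percussion candidates
        scores[name] = svm0["Percussive"] * value
    for name, value in svm2.items():       # harmonic candidates
        scores[name] = svm0["NonPercussive"] * value
    return scores


def channel_scores(sections):
    """Sum weighted scores over all sounding sections of one channel.

    `sections` is a list of (svm0, svm1, svm2) output dicts.
    """
    total = {}
    for svm0, svm1, svm2 in sections:
        for name, value in section_scores(svm0, svm1, svm2).items():
            total[name] = total.get(name, 0.0) + value
    return total
```

Because the SVM0 outputs lie between 0 and 1, they act as soft gates: a section judged mostly percussive contributes little to the harmonic instrument scores, and vice versa.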

Here, the score information generation unit 307 generates the score information while adjusting the contribution of each sounding section to the score information according to the detected feature information. Specifically, when the predetermined feature information indicates that the start timing of the sounding section differs from that of the other channels, or that the signal level of the other channels' sounding sections is very small compared with that of the channel in question, the contribution of that sounding section is raised. When the feature information indicates that the onset times and offset times are nearly simultaneous across a plurality of channels and that the instrument type with the largest index value for the sounding section is the same, the contribution of that sounding section in the other channels is lowered. Alternatively, when the start times of sounding sections are nearly simultaneous, the contributions of all sounding sections other than the one with the earliest onset time may be lowered. In the above, a higher contribution is assumed to result in higher score information.
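One way to picture the contribution adjustment is as a per-section weight applied before summation. The flags `isolated` and `duplicate` and the factors `boost` and `cut` below are hypothetical; the patent does not give concrete values or a concrete formula.

```python
def weighted_channel_score(sections, boost=2.0, cut=0.5):
    """Sum per-section scores with feature-dependent weights.

    Each section is a dict with "scores" (instrument -> value) and two
    optional flags: "isolated" (onset differs from the other channels, or
    the other channels are very quiet) and "duplicate" (the same event was
    seen in another channel that keeps the full contribution).
    """
    total = {}
    for s in sections:
        weight = 1.0
        if s.get("isolated"):
            weight *= boost                # raise the contribution
        if s.get("duplicate"):
            weight *= cut                  # lower the contribution
        for name, value in s["scores"].items():
            total[name] = total.get(name, 0.0) + weight * value
    return total
```

The effect is that sections likely to be bleed from another instrument count for less, while sections that clearly belong to the channel count for more.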

FIG. 9 shows the score information generated in this way. As shown in FIG. 9, the score information gives, for each channel, a numerical score for each instrument type. The larger the value, the more likely it is that the channel corresponds to that instrument. In other words, the score information is a value that reflects the degree of certainty that the channel corresponds to the instrument type.

The reliability acquisition unit 308 acquires the reliability of each channel. The reliability is acquired based on, for example, the consistency (variance) of the index values in the index value data, the number of sounding sections, and the average volume of all sounding sections. Specifically, for example, the reliability is lowered for a channel whose index value data contains few sounding sections, and is raised the larger the average volume of all its sounding sections is compared with the average volume of the other channels. This assumes that the microphone gain is uniform across all channels. This way of acquiring the reliability is an example, and the present embodiment is not limited to it.
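The text names the inputs to the reliability (consistency, section count, relative volume) but gives no formula. The following is therefore only one possible sketch; the product form, the function name, and every term in it are assumptions chosen to reproduce the stated tendencies.

```python
from statistics import pvariance


def channel_reliability(index_values, mean_volume, others_mean_volume):
    """Toy reliability: higher with more sections, lower variance, louder signal.

    `index_values` holds the index value of the top instrument for each
    sounding section of the channel. All three terms are illustrative.
    """
    if not index_values or mean_volume <= 0:
        return 0.0
    n = len(index_values)
    count_term = n / (n + 1.0)                       # few sections: lower
    consistency_term = 1.0 / (1.0 + pvariance(index_values))
    volume_term = mean_volume / (mean_volume + others_mean_volume)
    return count_term * consistency_term * volume_term
```

As in the description, comparing volumes across channels only makes sense if the microphone gains are uniform.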

The instrument determination unit 309 determines the instrument type of each channel based on the reliability and the scores. Specifically, for example, among the channels whose instrument type is undetermined, it first selects the channel with the highest reliability and determines that the instrument type with the highest score in that channel is the instrument type of that channel. Next, it determines that the instrument type with the highest score in the channel with the second-highest reliability is the instrument type of that channel. The instrument types of the remaining channels are determined in the same way.
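The procedure described here is a greedy pass over the channels in descending order of reliability. A minimal sketch, with hypothetical names and plain dictionaries standing in for the score information and the reliabilities:

```python
def decide_instruments(scores, reliability):
    """Assign each channel its highest-scoring instrument.

    `scores` maps channel -> {instrument: score}; `reliability` maps
    channel -> value. Channels are processed from most to least reliable.
    """
    decided = {}
    for ch in sorted(scores, key=lambda c: reliability[c], reverse=True):
        decided[ch] = max(scores[ch], key=scores[ch].get)
    return decided
```

Without constraints the processing order does not change the outcome; it becomes significant once earlier decisions restrict later ones.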

The instrument determination unit 309 may also be configured to determine whether a decision violates a constraint set in advance by the user, and to decide according to the result. Specifically, for example, when a decision is determined to violate a constraint, the score of the instrument that was decided is set to 0 and the selection for that channel is performed again in the same way as above. The constraints are entered by the user and are, for example, that there is only one drum 102, that there are at most two guitars 103, or that there is no female vocal.
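The constraint handling can be sketched by extending the greedy pass: a candidate that would violate a constraint has its score set to 0 and the channel is re-selected. Constraints are modeled here as per-instrument maximum counts (`limits`), which covers the examples in the text; this representation and the function name are assumptions.

```python
def decide_with_limits(scores, reliability, limits):
    """Greedy assignment with per-instrument count limits.

    `limits` maps instrument -> maximum number of channels (0 means the
    instrument is not present). A channel with no admissible candidate
    is left as None.
    """
    decided, counts = {}, {}
    for ch in sorted(scores, key=lambda c: reliability[c], reverse=True):
        remaining = dict(scores[ch])          # copy; caller's data untouched
        choice = None
        while any(v > 0 for v in remaining.values()):
            candidate = max(remaining, key=remaining.get)
            if counts.get(candidate, 0) < limits.get(candidate, float("inf")):
                choice = candidate
                break
            remaining[candidate] = 0          # violates a constraint: retry
        decided[ch] = choice
        if choice is not None:
            counts[choice] = counts.get(choice, 0) + 1
    return decided
```

Processing the most reliable channels first means that, when two channels compete for a limited instrument, the more trustworthy channel keeps it.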

The image information generation unit 310 generates image information representing the instrument type corresponding to each channel and displays it on the display unit 204. When the instrument type corresponding to a channel cannot be determined, a message or the like indicating that the instrument type could not be determined may be displayed for that channel.

Next, an example of the processing flow in the present embodiment, from acquiring the acoustic signals to determining the instrument type corresponding to each channel, is described with reference to FIG. 10. As shown in FIG. 10, the acoustic signal acquisition unit 301 first acquires an acoustic signal for each input channel (S101). The onset/offset detection unit 302 extracts onsets and offsets from the input acoustic signals (S102). The feature extraction unit 303 extracts the features of each sounding section (S103). The index value acquisition unit 304 acquires, based on the features, index values indicating which instrument type each sounding section is estimated to be (S104). The threshold determination/exclusion unit 305 determines whether each sounding section in a channel is at or below the predetermined volume threshold (S105), and excludes from the index value data the information on sounding sections determined to be at or below that volume (S106). The inter-channel feature information detection unit 306 compares the sounding sections of the channels and detects the predetermined feature information (S107). The score information generation unit 307 generates score information based on the index value data and the detected feature information (S108).

The reliability acquisition unit 308 acquires the reliability of each channel (S109). The instrument determination unit 309 first determines whether there is any channel whose corresponding instrument type is undetermined (S110). If there is, it takes the undetermined channel with the highest reliability and selects the instrument type with the highest score in that channel (S111), and the processing returns to S110. If there is no undetermined channel, the processing ends. This processing is an example, and the present embodiment is not limited to this flow.

According to the present embodiment, it is possible to realize a musical instrument identification device and the like that can identify instrument types more accurately even when acoustic signals from a plurality of instruments are input.

The present invention is not limited to the above embodiment. For example, the configuration shown in the above embodiment can be replaced with a configuration that is substantially the same, a configuration that provides the same operation and effect, or a configuration that can achieve the same object.

[Second Embodiment]
Next, a second embodiment of the present invention is described. As shown in FIG. 11, this embodiment differs from the first embodiment mainly in that it has a combination score information acquisition unit 311 and a combination score information extraction unit 312, and in the processing of the reliability acquisition unit 308. Description of points that are the same as in the first embodiment is omitted below.

The combination score information acquisition unit 311 enumerates every combination of instruments across the channels and acquires the total score of each combination based on the score information acquired by the score information acquisition unit. Specifically, for example, as shown in FIG. 12, a total score is acquired for every combination of instruments. In FIG. 12, combi1, combi2, and so on denote the individual combinations of instruments; the combination denoted combi1 assigns Kick to channel 1, Snare to channel 2, and so on, and has a total score of 33.52. The combination score information acquisition unit 311 may also be configured to exclude combinations that do not satisfy constraints given by the user.
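
As a rough illustration of this enumeration, the following Python sketch builds every channel-to-instrument assignment and sums the per-channel scores. The names `combination_scores` and `allowed`, and the dictionary layout, are assumptions made for this example; the form of the user-supplied constraint is not specified in the description, so it is modeled here as an arbitrary predicate.

```python
from itertools import product


def combination_scores(scores, allowed=None):
    """Enumerate all instrument combinations and their total scores.

    scores:  dict mapping channel -> {instrument: score}
    allowed: optional predicate taking an assignment dict (channel -> instrument)
             and returning False for combinations that should be excluded
             (a hypothetical stand-in for the user-given constraints).
    Returns a list of (assignment, total_score) pairs.
    """
    channels = sorted(scores)
    candidates = [sorted(scores[ch]) for ch in channels]
    results = []
    for combo in product(*candidates):
        assignment = dict(zip(channels, combo))
        if allowed is not None and not allowed(assignment):
            continue  # drop combinations that violate the constraint
        total = sum(scores[ch][inst] for ch, inst in assignment.items())
        results.append((assignment, total))
    return results
```

The number of combinations grows as the product of the candidate counts per channel, so a constraint such as "no instrument may appear on two channels" also serves to keep the enumeration small.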

The combination score information extraction unit 312 extracts a predetermined number of pieces of combination score information in descending order of total score. For example, FIG. 13 shows the case where the five combinations with the highest total scores are extracted.
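
The extraction step amounts to a top-N selection by total score. A minimal Python sketch, assuming each piece of combination score information is represented as an `(assignment, total_score)` pair (a representation chosen for this example only), could look like this:

```python
import heapq


def extract_top(combos, n):
    """Return the n pairs with the highest total score, best first.

    combos: iterable of (assignment, total_score) pairs
    n:      the predetermined number of combinations to keep
    """
    return heapq.nlargest(n, combos, key=lambda pair: pair[1])
```

`heapq.nlargest` avoids sorting the full list, which may matter when the number of enumerated combinations is large and only a handful are kept.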

The reliability acquisition unit 308 acquires the reliability of each channel based on the extracted combination score information. Specifically, for example, the reliability acquisition unit 308 obtains the reliability of each channel according to whether the same instrument is selected consistently across the higher-ranked combinations. More specifically, for example, Kick is selected for channel 1 (ch1) and Bass for channel 5 (ch5) in all of the extracted combinations, whereas Hi-Hat is selected for channel 3 (ch3) only in combi1 through combi4. The unit is therefore configured so that channels 1 and 5, being more stable, receive a higher reliability than channel 3.
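
The description does not give a formula for this stability-based reliability. One possible measure, shown here purely as an assumption, is the fraction of extracted combinations, counted from the top rank downward without interruption, that agree with the top-ranked combination for a given channel. The function name `channel_reliability` is hypothetical.

```python
def channel_reliability(top_combos):
    """Stability-based reliability per channel (an assumed measure).

    top_combos: list of assignment dicts (channel -> instrument), best first.
    For each channel, counts how many combinations in a row, starting from
    rank 1, select the same instrument as rank 1, and divides by the number
    of extracted combinations.
    """
    best = top_combos[0]
    reliability = {}
    for channel, instrument in best.items():
        run = 0
        for combo in top_combos:
            if combo[channel] != instrument:
                break  # the run of agreement from the top rank ends here
            run += 1
        reliability[channel] = run / len(top_combos)
    return reliability
```

With five extracted combinations, a channel that keeps the same instrument in all five would score 1.0 and a channel that keeps it only in the first four would score 0.8, which reproduces the ordering described above for channels 1 and 5 versus channel 3.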

Next, the instrument determination unit 309 determines the instrument corresponding to each channel in descending order of the acquired reliability. Specifically, for example, in the case shown in FIG. 13, channels 1 and 5 have the same reliability, so channel 1 is determined to be Kick and channel 5 to be Bass. When reliabilities are equal, the channels may be decided in either order. Next, Hi-Hat is selected for channel 3 in combi1 through combi4, so channel 3 is more stable in the higher ranks, and therefore more reliable, than the other undetermined channels; channel 3 is accordingly determined to be Hi-Hat. The instruments corresponding to the remaining channels are determined in the same manner.

In the present embodiment, the determination of the instrument may be held for channels for which the same instrument is not consistently selected, that is, channels whose reliability is at or below a predetermined threshold. Specifically, for example, in the case shown in FIG. 13, channels 4 and 7 are held because the instruments selected for them are unstable. In this case, after the user has checked and corrected the instruments, combination score information may be acquired again for the held channels only, in the same manner as described above, and the instrument corresponding to each held channel may then be determined.
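
The decision order and the hold rule can be combined in a short Python sketch. It assumes, for illustration only, that the instrument assigned to a decided channel is the one it has in the top-ranked combination; the function name `decide_or_hold`, the threshold value, and the data layout are likewise hypothetical.

```python
def decide_or_hold(reliability, best_combo, threshold):
    """Decide channels in descending reliability and hold unstable ones.

    reliability: dict mapping channel -> reliability value
    best_combo:  dict mapping channel -> instrument in the top-ranked combination
    threshold:   channels with reliability at or below this value are held
    Returns (decided, held): a list of (channel, instrument) pairs in decision
    order, and a list of held channels.
    """
    decided, held = [], []
    # Highest reliability first; equal reliabilities are ordered by channel name,
    # which is permissible because the order among ties does not matter.
    for channel in sorted(reliability, key=lambda ch: (-reliability[ch], ch)):
        if reliability[channel] <= threshold:
            held.append(channel)
        else:
            decided.append((channel, best_combo[channel]))
    return decided, held
```

The held channels could then be passed back through the combination scoring on their own once the user has confirmed or corrected the decided channels.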

Next, an example of the processing flow in the present embodiment, from acquiring the acoustic signals to determining the instrument corresponding to each channel, will be described with reference to FIG. 14.

First, S201 to S208 are the same as S101 to S108 of the first embodiment, so their description is omitted. Next, the combination score information acquisition unit 311 enumerates every combination of instruments across the channels and acquires the total score of each combination based on the score information acquired by the score information acquisition unit (S209). The combination score information extraction unit 312 extracts a predetermined number of pieces of combination score information in descending order of total score (S210). The instrument determination unit 309 first determines whether any channel remains for which the corresponding instrument has not yet been determined (S211). If such a channel exists, the instrument of the undetermined channel with the highest reliability is determined (S212), and the process returns to S211. If no undetermined channel remains, the process ends.

According to the present embodiment, as in the first embodiment, the instrument lineup can be identified more accurately than when instruments are identified independently for each channel, and it also becomes easier, for example, to grasp which device each input acoustic signal comes from. In addition, according to the present embodiment, the instrument lineup can be identified more accurately than in the first embodiment.

The present invention is not limited to the above embodiment. For example, the configuration shown in the above embodiment can be replaced with a configuration that is substantially the same, a configuration that provides the same operational effects, or a configuration that can achieve the same object. For example, the reliability acquisition of this embodiment may be used in combination with the reliability acquisition of the first embodiment.

[Third Embodiment]
Next, a third embodiment of the present invention will be described. As shown in FIG. 15, the present embodiment differs from the first embodiment in that it includes a correlation value acquisition unit 313, a correlation value addition unit 314, and a top microphone determination unit 315. Description of points that are the same as in the first embodiment is omitted below.

The correlation value acquisition unit 313 acquires correlation values based on the correlation of the acoustic signals between channels. Specifically, for example, it acquires correlation value data between each pair of channels as shown in FIG. 16. The correlation value addition unit 314 adds up the correlation values for each channel to obtain a total value.
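
The description does not state which correlation measure is used. As one possibility, the sketch below computes a zero-lag Pearson correlation between every pair of channels in plain Python. The choice of Pearson correlation, the function names `pearson` and `correlation_matrix`, and the treatment of constant signals as having zero correlation are all assumptions of this example.

```python
import math


def pearson(x, y):
    """Zero-lag Pearson correlation of two equal-length sample sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # assumed convention: a constant signal correlates with nothing
    return cov / math.sqrt(var_x * var_y)


def correlation_matrix(signals):
    """Return the matrix of pairwise correlations for a list of channel signals."""
    n = len(signals)
    return [[pearson(signals[i], signals[j]) for j in range(n)] for i in range(n)]
```

In practice the correlation might instead be taken over amplitude envelopes or over detected sound generation sections; the matrix layout would stay the same.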

The top microphone determination unit 315 determines the channels of the top microphones 105 based on the total values. Specifically, for example, since two top microphones 105 are normally placed, one on the left and one on the right, the two channels with the largest total values are determined to be the top microphones 105. In the case shown in FIG. 16, channels 3 and 4 have the largest total values (summary), so channels 3 and 4 are determined to be the top microphones. Alternatively, the total length of time during which the onset-to-offset intervals of the channels overlap may be obtained, and the top microphones 105 may be determined based on that time.
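
The summation and selection steps can be sketched as follows. The function name `top_mic_channels` and the sample matrix in the usage note are invented for illustration and are not the values of FIG. 16; excluding each channel's correlation with itself from the sum is also an assumption, although it does not change the ranking when every diagonal entry is the same.

```python
def top_mic_channels(corr, count=2):
    """Pick the channels whose summed correlation with the others is largest.

    corr:  square matrix (list of lists) of inter-channel correlation values
    count: number of top microphones to pick (two in the typical left/right setup)
    Returns (channels, sums), where channels are 1-based channel numbers in
    ascending order and sums maps each channel number to its total.
    """
    sums = {}
    for i, row in enumerate(corr):
        # Total of the correlations with every other channel (diagonal skipped).
        sums[i + 1] = sum(value for j, value in enumerate(row) if j != i)
    ranked = sorted(sums, key=lambda ch: (-sums[ch], ch))
    return sorted(ranked[:count]), sums
```

The intuition is that a top microphone picks up every instrument of the kit, so its signal correlates with many close-microphone channels at once, whereas a close microphone correlates strongly with only a few.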

When a pair of channels with almost the same onset and offset times is detected, the channels of the top microphones 105 may be determined based on the time difference from the onset times of the other channels or on the volume difference. Specifically, for example, when drums such as Kick and Snare have been determined by the first embodiment, the difference between the volume of each channel and the volume of one of those drum channels is obtained, as illustrated in FIG. 17. The two channels showing the lowest volumes relative to that drum channel (negative values with the largest absolute values) are then determined, in that order, to be the top microphones 105.

Also, for example, as shown in FIG. 18, when drums such as Kick and Snare have been determined by the first embodiment, the difference between the onset time of each channel and the onset time of one of those drum channels is obtained. The two channels showing the most delayed onset times relative to that drum channel (positive values with the largest absolute values) may then be determined to be the top microphones 105. Although FIGS. 17 and 18 illustrate a case in which the channels have not been determined, it is assumed that at least one of the drums has been determined, for example by the first embodiment.
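
Both the volume-based and the onset-based variants reduce to computing a per-channel difference against one already-identified drum channel and ranking the result. The Python sketch below assumes the sign convention "channel minus reference", so that quieter channels give negative volume differences and later channels give positive onset differences, consistent with the signs stated above. All function names, units, and sample values are hypothetical.

```python
def differences(values, reference):
    """Difference of every other channel's value relative to the reference channel."""
    ref = values[reference]
    return {ch: v - ref for ch, v in values.items() if ch != reference}


def top_mics_by_volume(volume, reference, count=2):
    """Channels that are quietest relative to the reference drum channel."""
    diff = differences(volume, reference)
    # Most negative difference first.
    return sorted(diff, key=lambda ch: (diff[ch], ch))[:count]


def top_mics_by_onset(onset, reference, count=2):
    """Channels whose onset lags the reference drum channel the most."""
    diff = differences(onset, reference)
    # Largest positive difference first.
    return sorted(diff, key=lambda ch: (-diff[ch], ch))[:count]
```

Both criteria rest on the same physical reasoning: a top microphone sits farther from a drum than that drum's close microphone, so the drum hit arrives there later and at a lower level.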

The image information generation unit 310 generates image information representing each channel and its corresponding instrument, and displays it on the display unit. Here, the instruments include the top microphones 105.

According to the present embodiment, the channels corresponding to the top microphones can be determined more accurately than in, for example, the first and second embodiments.

The present invention is not limited to the above embodiment, and the configuration shown in the above embodiment can be replaced with a configuration that is substantially the same, a configuration that provides the same operational effects, or a configuration that can achieve the same object.

For example, although the above description mainly uses the case of two top microphones 105 as an example, the number of top microphones 105 is not limited to two and may be one, or three or more. Also, although the above description uses as an example the case where the first embodiment is combined with the configuration for determining the channels corresponding to the top microphones 105, that configuration may instead be combined with the second embodiment, or it may be implemented on its own.

The third embodiment may also be configured in combination with the first or second embodiment. Here, for example, for the snare, every sound generation section occurs at almost the same timing as in the top microphone, its volume (amp) is larger than that of the other channels, and its onset time is earlier. The contribution of the sound generation sections corresponding to the snare may therefore be reduced. In this case, the contribution may be given by, for example, the volume ratio to the sound generation section with the largest volume among the onsets occurring at the same timing in the other channels.

In the first to third embodiments, the musical instrument identification device is mainly described as being implemented as the mixer 106, but it may be formed separately from the mixer 106 or implemented in another audio device.

In the above description, the instruments are determined based on the reliability, but they may instead be determined based on the score information or the combination score information without using the reliability.

Furthermore, in the above description, processing such as acquisition of the index values and acquisition of the feature information is performed based on the features of the sound generation sections of the acoustic signal, but any other configuration may be used as long as such processing is performed based on features of the acoustic signal.

DESCRIPTION OF SYMBOLS: 100 acoustic signal processing system, 101 keyboard, 102 drums, 103 guitar, 104 microphone, 105 top microphone, 106 mixer, 107 amplifier, 108 speaker, 201 control unit, 202 storage unit, 203 operation unit, 204 display unit, 301 acoustic signal acquisition unit, 302 onset/offset detection unit, 303 feature amount extraction unit, 304 index value acquisition unit, 305 threshold determination/exclusion unit, 306 inter-channel feature information detection unit, 307 score information generation unit, 308 reliability acquisition unit, 309 instrument determination unit, 310 image information generation unit, 311 combination score information acquisition unit, 312 combination score information extraction unit, 313 correlation value acquisition unit, 314 correlation value addition unit, 315 top microphone determination unit.

Claims

1. A musical instrument identification device comprising:
index value acquisition means for acquiring, based on acoustic signals acquired for each of a plurality of channels, index value data composed of index values each representing, for each feature of the acoustic signal and for each instrument, the possibility that the acoustic signal corresponds to that instrument;
inter-channel feature information detection means for detecting, as feature information, features of the acoustic signals between the channels based on the acoustic signals of the plurality of channels; and
score information generation means for generating, as score information, a value corresponding to the likelihood of each instrument for each channel based on the index value data and the feature information.

2. The musical instrument identification device according to claim 1, wherein the index value data is configured based on a sound generation section in which a volume of an acoustic signal included in each channel is larger than a predetermined threshold.

3. The musical instrument identification device according to claim 1, further comprising:
reliability acquisition means for acquiring a predetermined reliability of each channel based on the index values; and
instrument determination means for determining the instrument corresponding to each channel based on the reliability and the score information.

4. The musical instrument identification device according to claim 1 or 2, further comprising:
combination score information acquisition means for acquiring, based on the score information and for each combination of instruments across the channels, combination score information representing an index for that combination;
combination score information extraction means for extracting a predetermined number of pieces of combination score information in descending order of the index;
reliability acquisition means for acquiring a predetermined reliability of each channel based on the extracted combination score information; and
instrument determination means for determining the instrument corresponding to each channel based on the combination score information and the reliability.

5. The musical instrument identification device according to claim 1, further comprising:
correlation value acquisition means for acquiring, based on the acoustic signals acquired for each channel, correlation values based on the correlation of the acoustic signals between the channels; and
top microphone identification means for identifying, based on the correlation values, a channel corresponding to a top microphone arranged to pick up sound from a plurality of instruments.