WO2017057532A1 - Instrument type identification device and instrument sound identification method - Google Patents

Instrument type identification device and instrument sound identification method Download PDF

Info

Publication number
WO2017057532A1
WO2017057532A1 · PCT/JP2016/078754 · WO 2017/057532 A1
Authority
WO
WIPO (PCT)
Prior art keywords
instrument
channel
index value
acoustic signal
channels
Prior art date
Application number
PCT/JP2016/078754
Other languages
French (fr)
Japanese (ja)
Inventor
Keita Arimoto (有元 慶太)
Original Assignee
Yamaha Corporation (ヤマハ株式会社)
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Yamaha Corporation
Publication of WO2017057532A1 publication Critical patent/WO2017057532A1/en

Links

Images

Classifications

    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10GREPRESENTATION OF MUSIC; RECORDING MUSIC IN NOTATION FORM; ACCESSORIES FOR MUSIC OR MUSICAL INSTRUMENTS NOT OTHERWISE PROVIDED FOR, e.g. SUPPORTS
    • G10G1/00Means for the representation of music
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10HELECTROPHONIC MUSICAL INSTRUMENTS; INSTRUMENTS IN WHICH THE TONES ARE GENERATED BY ELECTROMECHANICAL MEANS OR ELECTRONIC GENERATORS, OR IN WHICH THE TONES ARE SYNTHESISED FROM A DATA STORE
    • G10H1/00Details of electrophonic musical instruments
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/02Speech enhancement, e.g. noise reduction or echo cancellation
    • G10L21/0272Voice signal separating
    • G10L21/028Voice signal separating using properties of sound source
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/48Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use
    • G10L25/51Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination

Definitions

  • the present invention relates to a musical instrument identification device and a musical instrument sound identification method.
  • Japanese Patent Application Laid-Open No. 2013-15601 discloses a sound source identification device that identifies a sound source from sounds emitted by sources such as acoustic instruments, natural sounds, people, or living things. Specifically, the device registers feature data of the sounds emitted by each source in a database, and identifies a sound source based on the correlation between the sound emitted by the source to be identified and the registered feature data.
  • Because that sound source identification device identifies a sound source independently for each target acoustic signal, its identification accuracy may be insufficient when acoustic signals from a plurality of musical instruments are input, particularly for instruments with similar timbres.
  • An object of the present invention is to realize a musical instrument identification device and a musical instrument sound identification method capable of identifying musical instruments with higher accuracy even when acoustic signals from a plurality of musical instruments are input.
  • The instrument identification device of the present invention includes: index value acquisition means for acquiring, on the basis of the acoustic signal acquired for each of a plurality of channels, index value data composed of index values that represent characteristics of the acoustic signal and, for each instrument, the likelihood that the acoustic signal corresponds to that instrument; inter-channel feature information detection means for detecting, as feature information, characteristics of the acoustic signals between the channels on the basis of the acoustic signals of the plurality of channels; and score information generation means for generating, as score information for each instrument of each channel, a value corresponding to the probability that the channel corresponds to that instrument, on the basis of the index value data and the feature information.
  • The instrument sound identification method of the present invention includes: acquiring, on the basis of the acoustic signal acquired for each of a plurality of channels, index value data composed of index values that represent characteristics of the acoustic signal and, for each instrument, the likelihood that the acoustic signal corresponds to that instrument; detecting, as feature information, characteristics of the acoustic signals between the channels on the basis of the acoustic signals of the plurality of channels; and generating, as score information for each instrument of each channel, a value corresponding to the probability that the channel corresponds to that instrument, on the basis of the index value data and the feature information.
  • FIG. 1 is a diagram illustrating an example of an outline of an acoustic signal processing system according to the present embodiment.
  • the acoustic signal processing system 100 includes, for example, a keyboard 101, a drum 102, a guitar 103, a microphone 104, a top microphone 105, a mixer 106, an amplifier 107, and a speaker 108.
  • the keyboard 101 is, for example, a synthesizer or an electronic piano, and outputs an acoustic signal according to the performance of the performer.
  • the microphone 104 collects a singer's voice and outputs the collected sound as an acoustic signal.
  • the drum 102 includes, for example, a drum set and microphones that collect sounds generated by hitting a percussion instrument (for example, a bass drum or a snare drum) included in the drum set.
  • the microphone is provided for each percussion instrument, and outputs the collected sound as an acoustic signal.
  • The guitar 103 includes, for example, an acoustic guitar and a microphone; the sound of the guitar is collected by the microphone and output as an acoustic signal.
  • The guitar 103 may instead be an electric acoustic guitar or an electric guitar, in which case no microphone is needed.
  • the top microphone 105 is a microphone installed above a plurality of musical instruments, for example, a drum set, and collects a sound from the entire drum set and outputs it as an acoustic signal.
  • The top microphone 105 may be composed of a plurality of microphones installed, for example, on the left and right. The top microphone 105 inevitably picks up sounds from instruments other than the drum set, although at low volume.
  • the mixer 106 has a plurality of input terminals, and electrically adds, processes, and outputs acoustic signals from the keyboard 101, the drum 102, the guitar 103, the microphone 104, and the like input to the input terminals.
  • The mixer 106 includes, for example, a level control unit that controls volume levels, a pan control unit that adjusts stereo balance and sound localization, and a mixing unit. The acoustic signals are mixed by the mixing unit and output to the amplifier 107.
  • The mixer 106 in the present embodiment has a musical instrument identification function in addition to the configuration of a general mixer as described above. Details of the musical instrument identification function will be described later. Since the configuration of a general mixer is well known, a detailed description thereof is omitted.
  • the amplifier 107 amplifies the acoustic signal output from the output terminal of the mixer 106 and outputs it to the speaker 108.
  • the speaker 108 emits sound according to the amplified acoustic signal.
  • FIG. 2 is a diagram for explaining the outline of the configuration of the mixer 106 in the present embodiment.
  • the mixer 106 includes, for example, a control unit 201, a storage unit 202, an operation unit 203, a display unit 204, and an input / output unit 205.
  • the control unit 201, the storage unit 202, the operation unit 203, the display unit 204, and the input / output unit 205 are connected to each other via an internal bus 206.
  • the control unit 201 is, for example, a CPU or MPU, and operates according to a program stored in the storage unit 202.
  • The storage unit 202 includes information recording media such as a ROM, a RAM, and a hard disk, and holds the programs executed by the control unit 201.
  • the storage unit 202 also operates as a work memory for the control unit 201.
  • The program may be provided by being downloaded through a network (not shown), or may be provided on a computer-readable information recording medium such as a CD-ROM or DVD-ROM.
  • The operation unit 203 includes controls such as slide faders, buttons, and knobs, and outputs the content of the user's instruction operations to the control unit 201.
  • the display unit 204 is, for example, a liquid crystal display, an organic EL display, or the like, and displays information according to an instruction from the control unit 201.
  • the input / output unit 205 has a plurality of input terminals and output terminals. Acoustic signals are input to each input terminal from each instrument such as the keyboard 101, drum 102, guitar 103, microphone 104, and top microphone 105. Further, an acoustic signal obtained by electrically adding and processing the input acoustic signal is output from the output terminal.
  • the configuration of the mixer 106 is an example and is not limited to this. For example, some functions such as level control may be processed in an analog manner.
  • The control unit 201 of the mixer 106 in the present embodiment will now be described with reference to the figure.
  • the acoustic signal acquisition unit 301 acquires an acoustic signal for each channel.
  • each channel corresponds to each input terminal of each acoustic signal from each instrument such as the keyboard 101, the drum 102, and the guitar 103.
  • The onset/offset detection unit 302 extracts onsets and offsets from the input acoustic signal. An onset corresponds to the rise of the acoustic signal, and an offset corresponds to the point where the output of the acoustic signal falls to a predetermined value or less (for example, approximately 0).
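The onset/offset extraction above can be sketched in a few lines. This is a minimal illustration rather than the embodiment's actual detector: the envelope is approximated by the absolute sample value, and the function name and threshold are assumptions.

```python
def detect_sections(samples, threshold=0.05):
    """Return (onset, offset) index pairs for each sound generation section.

    An onset is where the signal rises above the threshold (the "rise"
    of the acoustic signal); an offset is where it falls back to
    approximately zero (here: at or below the threshold).
    """
    sections = []
    onset = None
    for i, x in enumerate(samples):
        if abs(x) > threshold and onset is None:
            onset = i                      # rise detected
        elif abs(x) <= threshold and onset is not None:
            sections.append((onset, i))    # output fell to ~0
            onset = None
    if onset is not None:                  # signal ended while sounding
        sections.append((onset, len(samples)))
    return sections
```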
  • The feature amount extraction unit 303 extracts a feature amount of the acoustic signal between an onset and an offset (hereinafter simply referred to as a "sound generation section"). The extraction is performed for each channel and each sound generation section.
  • The index value acquisition unit 304 acquires, for each sound generation section, index values indicating which musical instrument the section is likely to correspond to, based on the feature amount. The index values are acquired for each channel and each instrument.
  • The index value acquisition unit 304 is configured using three SVMs (Support Vector Machines), as shown in FIG. 4.
  • The three SVMs identify, based on the feature amount (for example, a feature vector), which candidate instrument the input sound generation section corresponds to.
  • SVM0 identifies whether the input sound generation section is a harmonic sound, such as a guitar or vocal (male Vo, female Vo), or a percussive sound, such as a snare or cymbal.
  • SVM1 identifies which percussion instrument, such as a kick, snare, or cymbal, the input sound generation section corresponds to.
  • SVM2 identifies whether the input sound generation section is a bass (Bass), guitar (Guitar), male vocal (maleVo), female vocal (femaleVo), or the like.
  • An SVM is a machine learning algorithm that learns the feature values of the acoustic signal of each instrument in advance and, on that basis, classifies which instrument an input feature value corresponds to; since the technique is well known, a detailed description is omitted.
  • The SVM is used here as one example of classification by a machine learning algorithm; other algorithms (for example, simple regression analysis) may be used instead.
  • Likewise, the present embodiment is not limited to three SVMs and may be configured with, for example, two SVMs.
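The three-classifier arrangement can be sketched as follows. The stub lambdas stand in for trained SVMs (which would be fit on per-instrument feature data); the label sets and fixed scores are illustrative assumptions.

```python
def index_values(feature, svm0, svm1, svm2):
    """Gather one row of index value data for a sound generation section.

    svm0 scores harmonic (Non-Percussive) vs. percussive, svm1 scores
    percussion candidates, svm2 scores harmonic candidates; each is a
    callable returning {label: score in [0, 1]}.
    """
    row = dict(svm0(feature))   # e.g. {'Non-Percussive': ..., 'Percussive': ...}
    row.update(svm1(feature))   # percussion candidates (Kick, Snare, ...)
    row.update(svm2(feature))   # harmonic candidates (Bass, Guitar, ...)
    return row

# Stubs standing in for trained SVMs (illustrative fixed outputs).
svm0 = lambda f: {'Non-Percussive': 0.9, 'Percussive': 0.1}
svm1 = lambda f: {'Kick': 0.2, 'Snare': 0.1}
svm2 = lambda f: {'Bass': 0.8, 'Guitar': 0.3}
```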
  • FIG. 5 shows an example of index value data composed of the index values acquired as described above.
  • Onset represents information identifying each sound generation section of each input acoustic signal.
  • Onset Time represents the start time of the sound generation section (hereinafter, onset time), and Offset Time represents its end time (hereinafter, offset time).
  • Amplitude represents the amplitude of the sound generation section.
  • Non-Percussive (corresponding to the harmonic sound described above) and Percussive represent the outputs from SVM0; the columns from Conga to Kick represent the outputs from SVM1; and the columns from Bass to Wind represent the outputs from SVM2.
  • Each of these outputs is designed to take a numerical value from 0 to 1.
  • The threshold determination/exclusion unit 305 determines whether each sound generation section included in a channel is at or below a predetermined volume threshold, and excludes the information on the sections so determined from the index value data. Specifically, in the case shown in FIG. 6, since the maximum value of the sound generation sections other than the one indicated as Vo. Active ("other source") is equal to or less than the volume threshold indicated by the dotted line, the sound generation sections represented by "other source" are excluded.
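The exclusion step can be sketched as a simple filter over the index value data. The 'Amplitude' key mirrors the column of FIG. 5; the exact row layout is an assumption.

```python
def exclude_quiet_sections(index_data, volume_threshold):
    """Drop rows whose section amplitude is at or below the volume
    threshold, as the threshold determination/exclusion unit 305 does.
    Each row is a dict with an 'Amplitude' field."""
    return [row for row in index_data if row['Amplitude'] > volume_threshold]
```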
  • the inter-channel feature information detection unit 306 compares the sound generation sections of the respective channels and detects predetermined feature information.
  • The predetermined feature information is, for example, information indicating that the start timing of a sound generation section differs from that of the other channels.
  • The predetermined feature information may also be information indicating that the signal level of the sound generation section of another channel is very small (below a predetermined threshold) compared with the signal level of the sound generation section of the channel in question. Specifically, when the sound generation sections of the channels are as shown in FIG. 7, the sound generation section of the channel indicated by 701 differs from the other channels in the start timing (onset time) of the section, and the signal levels of the other channels are very small compared with the signal level of that section.
  • The predetermined feature information also includes, for example, a case where the onset time and the offset time are almost simultaneous (within a predetermined threshold range) across a plurality of channels and the instrument having the highest index value for those sound generation sections is the same; such sections may originate from the same instrument.
  • In the example shown in FIG. 8, the onset time and the offset time coincide for three channels, and the Snare index value is the highest for the sound generation section of each of the three channels; this is detected as feature information.
  • The portion represented by 802 indicates the largest of the index values among the three channels.
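The "same instrument bleeding into several channels" feature can be sketched as below. The section tuple layout and the timing tolerance are assumed encodings, not the embodiment's internal format.

```python
def simultaneous_same_instrument(sections, time_tol=0.05):
    """Detect pairs of sections on different channels whose onset and
    offset times coincide within a tolerance and whose highest-scoring
    instrument is the same (e.g. one snare hit heard on several mics).

    Each section is (channel, onset_time, offset_time, {instrument: index_value}).
    Returns (channel_a, channel_b, instrument) tuples.
    """
    pairs = []
    for i, a in enumerate(sections):
        for b in sections[i + 1:]:
            same_time = (abs(a[1] - b[1]) <= time_tol and
                         abs(a[2] - b[2]) <= time_tol)
            top_a = max(a[3], key=a[3].get)
            top_b = max(b[3], key=b[3].get)
            if same_time and top_a == top_b and a[0] != b[0]:
                pairs.append((a[0], b[0], top_a))
    return pairs
```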
  • The score information generation unit 307 generates score information based on the index value data and the detected feature information. Specifically, as shown in FIG. 4, the outputs of SVM1 and SVM2 are multiplied by the SVM0 output identified as a percussive sound and the SVM0 output identified as a harmonic sound, respectively, and the results are then combined to generate the score information.
  • Each output from the SVMs is in the range of 0 to 1, and the outputs of SVM1 and SVM2 are weighted by multiplying them by the corresponding output of SVM0.
  • The index value data used here is the data from which sound generation sections below the volume threshold have been excluded, as described above.
  • The score information generation unit 307 generates the score information while adjusting the degree to which each sound generation section contributes to the score information, according to the detected feature information.
  • When the predetermined feature information indicates that the start timing of a sound generation section differs from that of the other channels, or that the signal level of the sound generation section of another channel is very small compared with that of the channel in question, the contribution of that sound generation section is increased.
  • Conversely, when the onset time and the offset time are almost the same across a plurality of channels and the instrument having the largest index value for those sound generation sections is the same, adjustment is made to reduce the contribution of those sound generation sections in the channels concerned.
  • Alternatively, the contributions of all but the sound generation section with the earliest onset time may be reduced.
  • The higher the contribution of a section, the more it raises the score information.
  • An example of the score information generated in this way is shown in FIG. 9. In the score information, the score for each instrument is indicated by a numerical value for each channel; the larger the value, the more likely it is that the channel corresponds to that instrument. That is, the score information is a value corresponding to the probability of corresponding to the instrument.
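A compact sketch of the score aggregation described above: per channel, each section's index values (already combined from the SVM outputs as in FIG. 4) are summed with a contribution weight taken from the detected feature information. The `(channel, section index) -> weight` mapping is an assumed encoding.

```python
def channel_scores(sections, contribution):
    """Sum per-instrument index values into per-channel score information.

    sections:     {channel: [ {instrument: index_value}, ... ]}
    contribution: {(channel, section_idx): weight}; defaults to 1.0,
                  lower for bleed, higher for distinctive sections.
    """
    scores = {}
    for ch, rows in sections.items():
        total = {}
        for i, row in enumerate(rows):
            w = contribution.get((ch, i), 1.0)
            for inst, v in row.items():
                total[inst] = total.get(inst, 0.0) + w * v
        scores[ch] = total
    return scores
```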
  • the reliability level acquisition unit 308 acquires the reliability level of each channel.
  • The reliability is acquired based on, for example, the consistency (variance) of the index values in the index value data, the number of sound generation sections, the average volume of all sound generation sections, and the like. Specifically, the reliability is lowered when the number of sound generation sections included in the index value data is small, and is raised as the average volume of all sound generation sections becomes larger than that of the other channels. Note that this assumes the microphone gains of all channels are uniform. This method of acquiring the reliability is an example, and the present embodiment is not limited to it.
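The embodiment does not fix a reliability formula, so the following sketch simply combines the two cues mentioned above: the number of sounding sections (as a saturating factor) and the average volume relative to the other channels. The weighting is an assumption, and uniform microphone gains are assumed.

```python
def channel_reliability(num_sections, avg_volume, mean_avg_volume):
    """Illustrative reliability heuristic: fewer sounding sections lower
    the reliability; a higher average volume relative to the mean of
    all channels raises it. The saturation point (10 sections) is an
    arbitrary assumed constant."""
    section_factor = min(num_sections / 10.0, 1.0)
    volume_factor = avg_volume / mean_avg_volume if mean_avg_volume else 0.0
    return section_factor * volume_factor
```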
  • The instrument determination unit 309 determines the instrument for each channel based on the reliability and the scores. Specifically, among the channels for which an instrument has not yet been determined, the channel with the highest reliability is taken first, and the instrument with the highest score in that channel is determined to be its instrument. Next, the instrument with the highest score in the channel with the second highest reliability is determined to be that channel's instrument, and so on for the remaining channels.
  • The instrument determination unit 309 may also be configured to determine whether a determination violates a predetermined constraint and to decide according to the result. Specifically, when a determination is judged to violate a constraint, the score of the instrument so determined is set to 0 for that channel, and the selection is repeated in the same manner as described above.
  • The constraints are constraints input by the user; for example, there is only one drum 102, there are at most two guitars 103, and there are no female vocals.
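The greedy determination with user constraints can be sketched as below. Encoding the constraints as per-instrument maximum counts is an assumption for illustration.

```python
def determine_instruments(scores, reliability, max_count=None):
    """Visit channels in descending reliability; pick each channel's
    highest-scoring instrument, skipping instruments whose user-given
    count limit (e.g. at most one drum kit) is already exhausted.

    scores:      {channel: {instrument: score}}
    reliability: {channel: value}
    max_count:   {instrument: allowed number} (absent = unlimited)
    """
    max_count = max_count or {}
    used, result = {}, {}
    for ch in sorted(scores, key=lambda c: reliability[c], reverse=True):
        for inst in sorted(scores[ch], key=scores[ch].get, reverse=True):
            if used.get(inst, 0) < max_count.get(inst, float('inf')):
                result[ch] = inst
                used[inst] = used.get(inst, 0) + 1
                break
    return result
```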
  • the image information generation unit 310 generates image information representing each musical instrument corresponding to each channel and displays it on the display unit 204. Note that when the musical instrument corresponding to the channel cannot be determined, a message indicating that the musical instrument cannot be determined for the channel may be displayed.
  • the acoustic signal acquisition unit 301 acquires an acoustic signal for each input channel (S101).
  • the onset / offset detection unit 302 extracts the onset and offset from the input acoustic signal (S102).
  • The feature amount extraction unit 303 extracts the feature amount of each sound generation section (S103).
  • the index value acquisition unit 304 acquires an index value indicating which musical instrument is estimated for each sound generation section based on the feature amount (S104).
  • the threshold determination / exclusion unit 305 determines whether or not a sound generation section included in one channel is equal to or less than a predetermined volume threshold (S105).
  • the threshold determination / exclusion unit 305 excludes the information related to the sound generation section determined to be equal to or lower than the predetermined volume from the index value data (S106).
  • the inter-channel feature information detection unit 306 compares the sound generation sections of the respective channels and detects predetermined feature information (S107).
  • the score information generation unit 307 generates score information based on the index value data and the detected feature information (S108).
  • the reliability acquisition unit 308 acquires the reliability of each channel (S109).
  • The instrument determination unit 309 first determines whether there is a channel for which the corresponding instrument has not been determined (S110). If there is, the channel with the highest reliability is selected from the undetermined channels, and the instrument with the highest score in that channel is determined (S111); the process then returns to S110. If there is no undetermined channel, the process ends.
  • the above process is an example, and the present embodiment is not limited to the above flow.
  • a musical instrument identification device or the like that can identify musical instruments more accurately even when acoustic signals from a plurality of musical instruments are input.
  • the present invention is not limited to the above-described embodiment.
  • The present embodiment differs from the first embodiment mainly in that it includes a combination score information acquisition unit 311 and a combination score information extraction unit 312, and in the processing of the reliability acquisition unit 308. The description of the points common to the first embodiment is omitted below.
  • The combination score information acquisition unit 311 exhaustively enumerates combinations of instruments over the channels and acquires a total score for each combination based on the score information. Specifically, as shown in FIG. 12, the total score is obtained for every combination of instruments.
  • The combination of instruments represented by combi1 indicates that channel 1 is Kick, channel 2 is Snare, and so on, and its total score is 33.52.
  • the combination score information acquisition unit 311 may be configured to exclude combinations that do not satisfy the constraints given by the user.
  • combi1 etc. represent each combination of musical instruments.
  • The combination score information extraction unit 312 extracts a predetermined number of combinations in descending order of total score. For example, FIG. 13 shows a case where the five combinations with the highest total scores are extracted.
  • The reliability acquisition unit 308 acquires the reliability of each channel based on the extracted combination score information. Specifically, it acquires the reliability based on whether the same instrument is stably selected among the top combinations. For example, Kick and Bass are selected for channel 1 (ch1) and channel 5 (ch5) in all of the extracted combinations, while Hi-Hat is selected for channel 3 (ch3) only from combi1 to combi4; channels 1 and 5 are therefore more stable and are given a higher reliability than channel 3.
  • The instrument determination unit 309 determines the instrument corresponding to each channel in descending order of the acquired reliability. In the case shown in FIG. 13, since channels 1 and 5 have the same reliability, they are determined to correspond to Kick and Bass, respectively; when reliabilities are equal, the order of determination may be arbitrary. Next, since Hi-Hat is selected for channel 3 from combi1 to combi4, channel 3 has higher stability, and thus higher reliability, than the other undetermined channels, so channel 3 is determined to be Hi-Hat. The instruments corresponding to the remaining channels are determined in the same manner.
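The combination scoring and stability-based reliability of this embodiment can be sketched as follows. Exhaustive enumeration is workable for a handful of channels; the candidate list and the stability measure (agreement with the top combination's choice) are illustrative assumptions.

```python
from itertools import product

def top_combinations(scores, candidates, k=5):
    """Enumerate one instrument per channel, total the per-channel
    scores, and keep the k combinations with the highest totals
    (combi1, combi2, ... in the text).

    scores: {channel: {instrument: score}}; returns (total, {channel: instrument}).
    """
    channels = sorted(scores)
    combos = []
    for assign in product(candidates, repeat=len(channels)):
        total = sum(scores[ch][inst] for ch, inst in zip(channels, assign))
        combos.append((total, dict(zip(channels, assign))))
    combos.sort(key=lambda t: t[0], reverse=True)
    return combos[:k]

def stability(combos, ch):
    """Reliability as stability: the fraction of the extracted
    combinations that agree with the best combination's choice
    for the given channel."""
    best = combos[0][1][ch]
    return sum(c[1][ch] == best for c in combos) / len(combos)
```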
  • The determination may be suspended for channels for which the same instrument is not stably selected, that is, for channels whose reliability is equal to or less than a predetermined threshold.
  • In the example described above, channels 4 and 7 are put on hold because the instruments selected for them are unstable.
  • In that case, combination score information may be acquired again for the suspended channels only, and the instruments corresponding to those channels may then be determined in the same manner as described above.
  • The combination score information acquisition unit 311 exhaustively enumerates combinations of instruments over the channels and acquires a total score for each combination based on the score information (S209).
  • the combination score information extraction unit 312 extracts a predetermined number of combination score information in descending order of the total score (S210).
  • The instrument determination unit 309 first determines whether there is a channel for which the corresponding instrument has not been determined (S211). If there is, the channel with the highest reliability among the undetermined channels is selected and its instrument is determined (S212); the process then returns to S211. If there is no undetermined channel, the process ends.
  • According to the present embodiment, as in the first embodiment, the instrument organization can be identified with higher accuracy than when instruments are identified independently for each channel; for example, it becomes easier to determine from which device each acoustic signal is input. In addition, the present embodiment can identify the instrument organization with higher accuracy than the first embodiment.
  • the present invention is not limited to the above-described embodiment.
  • The acquisition of reliability in this embodiment may also be used in combination with the acquisition of reliability in the first embodiment.
  • the present embodiment is different from the first embodiment in that a correlation value acquisition unit 313, a correlation value addition unit 314, and a top microphone determination unit 315 are included. Note that the description of the same points as in the first embodiment will be omitted below.
  • The correlation value acquisition unit 313 acquires correlation values based on the correlation of the acoustic signals between channels; for example, correlation value data between channels as shown in FIG. 16 is acquired. The correlation value adding unit 314 adds the correlation values for each channel to obtain a total value.
  • The top microphone determination unit 315 determines the channel of the top microphone 105 based on the total values. Since two top microphones 105 are normally arranged on the left and right, the two channels with the largest total values are determined to be the top microphones 105. In the case shown in FIG. 16, since the totals of channels 3 and 4 are the largest, channels 3 and 4 are determined to be the top microphones. Alternatively, the total time during which the onset-offset sections overlap between channels may be obtained, and the top microphones 105 determined based on that time.
  • Alternatively, the channel of the top microphone 105 may be determined based on the time difference from the onset times of the other channels, or on the volume difference. Specifically, when drum channels such as Kick and Snare have been determined according to the first embodiment, the volume of the channel of one of the drums is subtracted from the volume of each channel; the state in that case is shown in FIG. 17. Then, two channels are determined to be the top microphones 105 in order from the channel whose volume is lowest relative to the drum channel (the negative difference with the largest absolute value).
  • Similarly, the onset time of the drum channel may be subtracted from the onset time of each channel, and the two channels whose onsets are most delayed relative to the drum channel (positive differences with the largest absolute values) may be determined to be the top microphones 105.
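The volume-difference variant can be sketched as below (the onset-delay variant is analogous, with onset times in place of volumes and the sort order reversed). Subtracting the drum channel's volume so that quieter channels yield the most negative differences is one reading of the text above; the data layout is an assumption.

```python
def top_mics_by_volume(volumes, drum_channel, n_top=2):
    """Subtract the already-identified drum channel's volume from each
    other channel's volume; the channels quietest relative to the drum
    (most negative difference) are taken as the top microphones."""
    diffs = {ch: v - volumes[drum_channel]
             for ch, v in volumes.items() if ch != drum_channel}
    return sorted(diffs, key=diffs.get)[:n_top]
```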
  • FIGS. 17 and 18 show an example in which the channels have not yet been determined, except that at least one of the drum channels is assumed to have been determined according to the first embodiment.
  • the image information generation unit 310 generates image information representing each instrument corresponding to each channel and displays it on the display unit.
  • Here, the displayed instruments include the top microphone 105.
  • According to the present embodiment, the channel corresponding to the top microphone 105 can be determined with higher accuracy than in the first and second embodiments.
  • The present invention is not limited to the embodiments described above; their configurations can be replaced with substantially the same configurations, configurations that exhibit the same operational effects, or configurations that can achieve the same purpose.
  • For example, in the above description, the case where there are two top microphones 105 has been mainly described, but the number of top microphones 105 is not limited to two and may be one, or three or more.
  • In the above description, the case of combining the first embodiment with the configuration for determining the channel corresponding to the top microphone 105 has been described as an example; however, that configuration may instead be combined with the second embodiment, or only the configuration for determining the channel corresponding to the top microphone 105 may be implemented.
  • the third embodiment may be configured in combination with the first or second embodiment.
  • For example, the degree of contribution of a sound generation section corresponding to the snare may be reduced.
  • The contribution may also be set as the volume ratio with respect to the sound generation section having the maximum volume among the onsets at the same timing in the other channels.
  • In the above description, the instrument identification device is realized mainly as the mixer 106; however, it may be formed separately from the mixer 106 or other acoustic equipment, or may be realized within such equipment.
  • the configuration for determining musical instruments based on the reliability has been described.
  • the configuration is such that the musical instruments are determined based on the score information and the combination score information without using the reliability. Also good.

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Computational Linguistics (AREA)
  • Signal Processing (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Quality & Reliability (AREA)
  • Auxiliary Devices For Music (AREA)
  • Electrophonic Musical Instruments (AREA)

Abstract

The present invention realizes an instrument type identification device capable of accurately identifying instruments even when acoustic signals from multiple instruments are input. The instrument type identification device includes: index value acquisition means for acquiring, on the basis of acoustic signals acquired through a plurality of channels, index value data composed of index values each representing, for a feature of an acoustic signal and for each instrument type, the possibility that the signal corresponds to that instrument type; inter-channel feature information detection means for detecting, as feature information, features of the acoustic signals between the plurality of channels; and score information generation means for generating, as score information for each instrument type on each channel, a value corresponding to the probability of matching that instrument type, on the basis of the index value data and the feature information.

Description

Instrument type identification device and instrument sound identification method
The present invention relates to an instrument type identification device and an instrument sound identification method.
For example, Japanese Patent Application Laid-Open No. 2013-15601 discloses a sound source identification device that identifies the source of a sound emitted by a sound source such as an acoustic instrument, a natural sound, a person, or an animal. Specifically, the device registers feature data of the sounds emitted by sound sources in a database, and identifies a sound source based on the correlation between the feature data of the sound emitted by the target source and the database.
However, because the above sound source identification device identifies the sound source of each target acoustic signal independently, the identification accuracy may be insufficient for instruments with similar timbres when, for example, acoustic signals from a plurality of instruments are input.
In view of the above, an object of the present invention is to realize an instrument type identification device and an instrument sound identification method capable of identifying instruments with higher accuracy even when acoustic signals from a plurality of instruments are input.
The instrument type identification device of the present invention includes: index value acquisition means for acquiring, based on acoustic signals acquired for each of a plurality of channels, index value data composed of index values each representing, for a feature of an acoustic signal and for each instrument type, the possibility that the signal corresponds to that instrument type; inter-channel feature information detection means for detecting, as feature information, features of the acoustic signals between the plurality of channels; and score information generation means for generating, as score information for each instrument type on each channel, a value corresponding to the probability of matching that instrument type, based on the index value data and the feature information.
The instrument sound identification method of the present invention includes: acquiring, based on acoustic signals acquired for each of a plurality of channels, index value data composed of index values each representing, for a feature of an acoustic signal and for each instrument type, the possibility that the signal corresponds to that instrument type; detecting, as feature information, features of the acoustic signals between the plurality of channels; and generating, as score information for each instrument type on each channel, a value corresponding to the probability of matching that instrument type, based on the index value data and the feature information.
FIG. 1 is a diagram showing an example of the outline of an acoustic signal processing system. FIG. 2 is a diagram showing an example of the outline of the configuration of a mixer. FIG. 3 is a diagram showing an example of the functional configuration of the control unit of the mixer in the first embodiment. FIG. 4 is a diagram showing an example of the configuration of the index value acquisition unit. FIG. 5 is a diagram showing an example of index value data. FIG. 6 is a diagram for explaining an example of the processing of the threshold determination/exclusion unit. FIG. 7 is a diagram for explaining an example of feature detection by the inter-channel feature information detection unit. FIG. 8 is a diagram for explaining another example of feature detection by the inter-channel feature information detection unit. FIG. 9 is a diagram showing an example of score information. FIG. 10 is a diagram showing an example of the flow of processing in the first embodiment from acquiring the acoustic signals to determining the instruments corresponding to each channel. FIG. 11 is a diagram showing an example of the functional configuration of the control unit of the mixer in the second embodiment. FIG. 12 is a diagram showing an example of combination score information. FIG. 13 is a diagram showing an example of combination score information extracted by the combination score information extraction unit.
FIG. 14 is a diagram showing an example of the flow of processing in the second embodiment from acquiring the acoustic signals to determining the instruments corresponding to each channel. FIG. 15 is a diagram showing an example of the functional configuration of the control unit of the mixer in the third embodiment. FIG. 16 is a diagram showing an example of correlation value data between channels. FIGS. 17 and 18 are diagrams for explaining other examples of the top determination processing.
Hereinafter, embodiments of the present invention will be described with reference to the drawings. In the drawings, identical or equivalent elements are given the same reference numerals, and duplicate descriptions are omitted.
[First Embodiment]
FIG. 1 is a diagram illustrating an example of the outline of the acoustic signal processing system according to the present embodiment. As shown in FIG. 1, the acoustic signal processing system 100 includes, for example, a keyboard 101, a drum 102, a guitar 103, a microphone 104, a top microphone 105, a mixer 106, an amplifier 107, and a speaker 108.
The keyboard 101 is, for example, a synthesizer or an electronic piano, and outputs an acoustic signal according to the performer's playing. The microphone 104 picks up, for example, a singer's voice and outputs the picked-up sound as an acoustic signal. The drum 102 includes, for example, a drum set and microphones that pick up the sounds generated by striking the percussion instruments included in the drum set (for example, a bass drum or a snare drum); a microphone is provided for each percussion instrument and outputs the picked-up sound as an acoustic signal. The guitar 103 includes, for example, an acoustic guitar and a microphone, and the sound of the acoustic guitar is picked up by the microphone and output as an acoustic signal. The guitar 103 may instead be an electric acoustic guitar or an electric guitar, in which case no microphone is needed. The top microphone 105 is a microphone installed above a plurality of instruments, for example above the drum set; it picks up the sound of the entire drum set and outputs it as an acoustic signal. The top microphone 105 may be composed of a plurality of microphones, for example one installed on the left and one on the right. The top microphone 105 also inevitably picks up sounds from instruments other than the drum set, albeit at low volume.
The mixer 106 has a plurality of input terminals, and electrically adds, processes, and outputs the acoustic signals input to those terminals from the keyboard 101, the drum 102, the guitar 103, the microphone 104, and the like. Specifically, the mixer 106 includes, for example, a level control unit that controls levels such as volume, and a pan control unit that adjusts sound localization by changing the balance of the sound; the level-controlled acoustic signals are mixed by a mixing unit and output to the amplifier 107. In addition to the configuration of a general mixer as described above, the mixer 106 of the present embodiment has an instrument identification function and the like, the details of which are described later. Since the configuration of a general mixer is well known, its detailed description is omitted.
The amplifier 107 amplifies the acoustic signal output from the output terminal of the mixer 106 and outputs it to the speaker 108. The speaker 108 emits sound according to the amplified acoustic signal.
Next, an example of the configuration of the mixer 106 in the present embodiment will be described. FIG. 2 is a diagram for explaining the outline of the configuration of the mixer 106. As shown in FIG. 2, the mixer 106 includes, for example, a control unit 201, a storage unit 202, an operation unit 203, a display unit 204, and an input/output unit 205, which are connected to one another via an internal bus 206.
The control unit 201 is, for example, a CPU or an MPU, and operates according to a program stored in the storage unit 202. The storage unit 202 is an information recording medium, such as a ROM, a RAM, or a hard disk, that holds the program executed by the control unit 201.
The storage unit 202 also operates as a work memory for the control unit 201. The program may be provided, for example, by being downloaded via a network (not shown), or via various computer-readable information recording media such as a CD-ROM or a DVD-ROM.
The operation unit 203 comprises, for example, slide volumes, buttons, and knobs, and outputs the content of the user's instruction operations to the control unit 201. The display unit 204 is, for example, a liquid crystal display or an organic EL display, and displays information according to instructions from the control unit 201.
The input/output unit 205 has a plurality of input terminals and output terminals. Acoustic signals are input to the input terminals from the instruments such as the keyboard 101, the drum 102, the guitar 103, and the microphone 104, as well as from the top microphone 105. An acoustic signal obtained by electrically adding and processing the input acoustic signals is output from the output terminal. This configuration of the mixer 106 is an example and is not limiting; for example, some functions such as level control may be processed in analog form.
Next, the functional configuration of the control unit 201 of the mixer 106 in the present embodiment will be described with reference to FIG. 3.
The acoustic signal acquisition unit 301 acquires an acoustic signal for each channel. Each channel corresponds to an input terminal for the acoustic signal from one of the instruments such as the keyboard 101, the drum 102, and the guitar 103.
The onset/offset detection unit 302 extracts onsets and offsets from the input acoustic signal. An onset corresponds to the rise of the acoustic signal, and an offset corresponds to the output of the acoustic signal falling to or below a predetermined value (for example, approximately 0).
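The detection performed by unit 302 can be sketched as follows. This is a minimal illustrative sketch, not the patent's implementation: it assumes a precomputed amplitude envelope, and the function name and the `silence_floor` threshold are assumptions introduced here. An onset is detected where the envelope rises above the floor, and an offset where it returns to approximately zero.

```python
# Hypothetical sketch of onset/offset detection as described for unit 302.
# `envelope` is an assumed precomputed amplitude envelope of one channel.

def detect_onsets_offsets(envelope, silence_floor=0.05):
    """Return (onset_index, offset_index) pairs for each sounding section."""
    sections = []
    onset = None
    for i, level in enumerate(envelope):
        if onset is None and level > silence_floor:
            onset = i                      # signal rises: onset detected
        elif onset is not None and level <= silence_floor:
            sections.append((onset, i))    # signal returns to ~0: offset
            onset = None
    if onset is not None:                  # section still sounding at the end
        sections.append((onset, len(envelope) - 1))
    return sections

envelope = [0.0, 0.0, 0.4, 0.8, 0.3, 0.0, 0.0, 0.6, 0.2, 0.0]
print(detect_onsets_offsets(envelope))  # [(2, 5), (7, 9)]
```

Each returned pair delimits one sounding section, which the later units then process per channel.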
The feature amount extraction unit 303 extracts a feature amount of the acoustic signal between an onset and the following offset (hereinafter simply referred to as a "sounding section"). The extraction is performed for each channel and for each sounding section.
The index value acquisition unit 304 acquires, based on the feature amount, index values indicating which instrument type each sounding section is identified as. The index values are acquired for each channel and for each instrument type.
Specifically, for example, the index value acquisition unit 304 is configured using three SVMs (Support Vector Machines), as shown in FIG. 4. Based on the feature amount (for example, a feature vector), the three SVMs identify which candidate instrument type the acoustic signal of an input sounding section corresponds to. Specifically, SVM0 identifies whether the sound is a harmonic sound, representing, for example, a guitar or a male or female voice (male Vo, female Vo), or a percussive sound, such as a snare or a cymbal. SVM1 identifies which percussion instrument, such as a kick, a snare, or a cymbal, the input onset corresponds to. SVM2 identifies whether the input sounding section is, for example, a bass, a guitar, a male vocal (maleVo), or a female vocal (femaleVo).
Note that the types and number of instruments identified by each SVM are examples, and the present embodiment is not limited to them. An SVM is a machine learning algorithm that learns the feature amounts of the acoustic signals of each instrument type in advance and, based on this, classifies which instrument type an input feature amount belongs to; since it is well known, its detailed description is omitted. In the present embodiment, the case of using SVMs is described as an example of classification by a machine learning algorithm, but other algorithms (for example, simple logistic regression) may be used. Furthermore, although the case of using three SVMs is described above, the present embodiment is not limited to this and may be configured with, for example, two SVMs.
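The three-classifier arrangement of FIG. 4 can be sketched as follows. The decision functions below are stand-ins (real SVMs would be trained on labelled feature vectors, for example with a library such as scikit-learn); only the routing structure follows the text: SVM0 scores harmonic versus percussive, SVM1 scores percussion types, SVM2 scores harmonic sources. All feature names and values are illustrative assumptions.

```python
# Hypothetical stand-ins for the three classifiers of FIG. 4.

def svm0(feature):   # returns (non_percussive, percussive), each in [0, 1]
    p = min(max(feature["attack_sharpness"], 0.0), 1.0)
    return (1.0 - p, p)

def svm1(feature):   # per-percussion-instrument scores
    return {"Kick": feature["low_energy"], "Snare": feature["mid_energy"],
            "Cymbal": feature["high_energy"]}

def svm2(feature):   # per-harmonic-source scores
    return {"Bass": feature["low_energy"], "Guitar": feature["mid_energy"],
            "femaleVo": feature["high_energy"]}

def index_values(feature):
    """Index values per instrument type for one sounding section."""
    non_perc, perc = svm0(feature)
    scores = {}
    scores.update(svm1(feature))
    scores.update(svm2(feature))
    scores["Non Percussive"], scores["Percussive"] = non_perc, perc
    return scores

kick_like = {"attack_sharpness": 0.9, "low_energy": 0.8,
             "mid_energy": 0.2, "high_energy": 0.1}
iv = index_values(kick_like)
print(iv["Percussive"], iv["Kick"])  # 0.9 0.8
```

One row of the FIG. 5 index value data corresponds to one such `index_values` result together with the section's onset time, offset time, and amplitude.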
FIG. 5 shows an example of the index value data composed of the index values acquired as described above. In FIG. 5, onset represents information identifying a sounding section of each input acoustic signal; Onset Time represents the start time of the sounding section (hereinafter, onset time), and Offset Time represents its end time (hereinafter, offset time). amplitude represents the amplitude of the sounding section. Non Percussive (corresponding to the harmonic sound above) and Percussive represent the outputs of SVM0; the columns from Conga to Kick represent the outputs of SVM1, and the columns from Bass to Wind represent the outputs of SVM2. A larger index value indicates a higher possibility that the signal corresponds to that instrument type. In FIG. 5, SVM0 is designed to output values between 0 and 1.
The threshold determination/exclusion unit 305 determines whether each sounding section included in a channel is at or below a predetermined volume threshold, and excludes from the index value data the information on the sounding sections determined to be at or below that volume. Specifically, in the case shown in FIG. 6, the local maxima of the sounding sections other than the one labelled Vo. Active (other source) are at or below the volume threshold indicated by the dotted line, so the sounding sections represented by other source are excluded.
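The exclusion step of unit 305 reduces to a simple filter. This is a minimal sketch; the record layout (a dict per sounding section with an `amplitude` field) is an assumption introduced here for illustration.

```python
# Hypothetical sketch of the exclusion performed by unit 305: sounding
# sections whose peak amplitude does not exceed the volume threshold are
# dropped from the index value data of the channel.

def exclude_quiet_sections(index_value_data, volume_threshold):
    return [section for section in index_value_data
            if section["amplitude"] > volume_threshold]

channel_data = [
    {"onset": 0, "amplitude": 0.9, "Snare": 0.8},   # Vo. Active-like section
    {"onset": 1, "amplitude": 0.1, "Snare": 0.3},   # quiet "other source"
]
kept = exclude_quiet_sections(channel_data, volume_threshold=0.2)
print([s["onset"] for s in kept])  # [0]
```

Only the surviving sections feed into the score information generation described below in the text.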
The inter-channel feature information detection unit 306 compares the sounding sections of the channels and detects predetermined feature information. Here, the predetermined feature information is, for example, information indicating that the start timing of a sounding section differs from that of the other channels. The predetermined feature information may also be information indicating that the signal levels of the sounding sections of the other channels are very small (at or below a predetermined threshold) compared with the signal level of the sounding section of the channel in question. Specifically, when the sounding sections of the channels are as shown in FIG. 7, the sounding section of the channel indicated by 701 differs from the other channels in the start timing (onset time) of the sounding section, and the signal levels of the other channels are very small compared with the signal level of that sounding section.
The predetermined feature information may also indicate, for example, that the onset and offset times are almost simultaneous (within a predetermined threshold) across a plurality of channels, and that the instrument type with the largest index value for the sounding section is the same on those channels. Specifically, as shown in the part indicated by 801 in FIG. 8, this feature information indicates that three channels have the same onset and offset times and that the Snare index value of those three channels is the highest value for the sounding section; the part indicated by 802 shows the largest index value among the three channels.
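The FIG. 8 case (simultaneous sections whose top-scoring instrument coincides) can be sketched as a pairwise cross-channel check. The data layout and the `tol` tolerance are illustrative assumptions; the patent only states that the times must match within a predetermined threshold.

```python
# Hypothetical sketch of the cross-channel detection of unit 306: find
# groups of channels whose onset/offset times coincide within a tolerance
# and whose highest-scoring instrument type is the same.

def simultaneous_same_instrument(sections_by_channel, tol=0.01):
    """sections_by_channel: {ch: (onset_t, offset_t, {instrument: index})}"""
    groups = []
    channels = list(sections_by_channel)
    for i, a in enumerate(channels):
        on_a, off_a, iv_a = sections_by_channel[a]
        top_a = max(iv_a, key=iv_a.get)
        group = [a]
        for b in channels[i + 1:]:
            on_b, off_b, iv_b = sections_by_channel[b]
            if (abs(on_a - on_b) <= tol and abs(off_a - off_b) <= tol
                    and max(iv_b, key=iv_b.get) == top_a):
                group.append(b)
        if len(group) >= 2:
            groups.append((top_a, group))
    return groups

sections = {
    "ch1": (1.00, 1.20, {"Snare": 0.9, "Kick": 0.1}),
    "ch2": (1.00, 1.20, {"Snare": 0.6, "Kick": 0.2}),
    "ch3": (1.50, 1.70, {"Kick": 0.8, "Snare": 0.1}),
}
print(simultaneous_same_instrument(sections))  # [('Snare', ['ch1', 'ch2'])]
```

Such a detected group then lowers the contribution of the non-originating channels' sections during score generation, as described below.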
The score information generation unit 307 generates score information based on the index value data and the detected feature information. Specifically, for example, as shown in FIG. 4, the output of SVM1 is multiplied by the SVM0 output identified as the percussive sound, the output of SVM2 is multiplied by the SVM0 output identified as the harmonic sound, and the results are added to generate the score information. In this case, each output of SVM1 is in the range of 0 to 1, and the outputs of SVM1 and SVM2 are weighted by multiplying them by the outputs of SVM0. When onsets at or below the threshold are included as described above, the index value data used here is the data from which those onsets have been excluded.
Here, the score information generation unit 307 generates the score information while adjusting, according to the detected feature information, the degree to which each sounding section contributes to the score information. Specifically, when the predetermined feature information indicates that the start timing of a sounding section differs from the other channels, or that the signal levels of the sounding sections of the other channels are very small compared with that of the channel in question, the contribution of that sounding section is raised. When the predetermined feature information indicates, for example, that the onset and offset times are almost simultaneous across a plurality of channels and the instrument type with the largest index value for the sounding section is the same, the contribution of that sounding section on the other channels is lowered. In addition, for example, when the start times of sounding sections are almost simultaneous, the contributions of all but the sounding section with the earliest onset time may be lowered. In the above, a higher contribution yields higher score information.
FIG. 9 shows the score information generated in this way. As shown in FIG. 9, the score information indicates, for each channel, a numerical score for each instrument type; the larger the value, the more likely it is that the channel corresponds to that instrument. That is, the score information is a value corresponding to the probability of matching the instrument type.
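The score generation described above can be sketched per channel as follows. This is a simplified sketch under stated assumptions: the SVM1 (percussion) outputs are weighted by SVM0's percussive output, the SVM2 (harmonic) outputs by SVM0's non-percussive output, each section is scaled by a contribution factor reflecting the detected feature information, and the results are summed per instrument. The data layout and numbers are illustrative, not from the patent.

```python
# Hypothetical sketch of the score generation of unit 307 (FIG. 4 combination
# plus the contribution adjustment described in the text).

def channel_scores(sections):
    """sections: list of dicts with keys svm0=(non_perc, perc),
    svm1, svm2 (per-instrument outputs), and contribution."""
    totals = {}
    for s in sections:
        non_perc, perc = s["svm0"]
        weighted = {k: v * perc for k, v in s["svm1"].items()}
        weighted.update({k: v * non_perc for k, v in s["svm2"].items()})
        for inst, val in weighted.items():
            totals[inst] = totals.get(inst, 0.0) + val * s["contribution"]
    return totals

sections = [
    {"svm0": (0.1, 0.9), "svm1": {"Kick": 0.8, "Snare": 0.1},
     "svm2": {"Bass": 0.5}, "contribution": 1.0},
    {"svm0": (0.2, 0.8), "svm1": {"Kick": 0.7, "Snare": 0.2},
     "svm2": {"Bass": 0.4}, "contribution": 0.5},   # lowered contribution
]
scores = channel_scores(sections)
print(round(scores["Kick"], 3))  # 1.0
```

The resulting per-channel, per-instrument totals play the role of one row of the FIG. 9 score information.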
The reliability acquisition unit 308 acquires a reliability for each channel. The reliability is acquired based on, for example, the consistency (variance) of the index values in the index value data, the number of sounding sections, and the average volume of all sounding sections. Specifically, for example, the reliability is lowered for channels whose index value data contains few sounding sections, and raised as the average volume of all sounding sections exceeds the average volume of the other channels. Note that this assumes that the microphone gains of all channels are uniform. The above acquisition of the reliability is an example, and the present embodiment is not limited to it.
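One possible reliability measure combining the cues named above (more sounding sections and a higher average volume relative to the other channels raise reliability) could look like the following. The specific formula and weighting are assumptions introduced here; the patent deliberately leaves the computation open.

```python
# Hypothetical sketch of a reliability measure for unit 308. The /10.0
# saturation and the volume ratio are illustrative choices, not the
# patent's formula.

def channel_reliability(section_volumes, other_channels_avg_volume):
    if not section_volumes:
        return 0.0
    count_term = min(len(section_volumes) / 10.0, 1.0)   # few sections -> low
    avg = sum(section_volumes) / len(section_volumes)
    volume_term = avg / (avg + other_channels_avg_volume)  # louder -> higher
    return count_term * volume_term

loud_busy = channel_reliability([0.8] * 10, other_channels_avg_volume=0.4)
quiet_sparse = channel_reliability([0.2] * 2, other_channels_avg_volume=0.4)
print(loud_busy > quiet_sparse)  # True
```

Any monotone combination of the same cues would serve; the point is only that the instrument determination below processes channels in descending reliability.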
The instrument determination unit 309 determines the instrument type of each channel based on the reliability and the scores. Specifically, for example, among the channels whose instrument type is undetermined, the channel with the highest reliability is taken first, and the instrument type with the highest score on that channel is selected and determined to be the instrument of that channel. Next, the instrument type with the highest score on the channel with the second-highest reliability is determined as that channel's instrument. The instruments of the remaining channels are determined in the same manner.
The instrument determination unit 309 may also be configured to determine whether a determination violates constraints predefined by the user, and to decide according to the result. Specifically, for example, when a determination is judged to violate a constraint, the score of the determined instrument is set to 0 and the selection for that channel is performed again as described above. The constraints are input by the user, for example: there is only one drum 102, there are at most two guitars 103, or there is no female vocal.
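The greedy assignment with constraints can be sketched as follows. This is a minimal sketch under stated assumptions: constraints are modelled only as per-instrument count limits, and dropping a candidate from consideration stands in for "setting its score to 0 and retrying".

```python
# Hypothetical sketch of the greedy assignment of unit 309: channels are
# processed in descending reliability, each taking its highest-scoring
# instrument; an instrument that would violate a user constraint is
# removed from that channel's candidates and the selection is retried.

def assign_instruments(scores, reliability, max_count=None):
    """scores: {channel: {instrument: score}}, reliability: {channel: value},
    max_count: {instrument: allowed number}; instruments absent from
    max_count are unlimited."""
    max_count = dict(max_count or {})
    used, result = {}, {}
    for ch in sorted(scores, key=lambda c: reliability[c], reverse=True):
        candidates = dict(scores[ch])
        while candidates:
            inst = max(candidates, key=candidates.get)
            limit = max_count.get(inst)
            if limit is not None and used.get(inst, 0) >= limit:
                candidates.pop(inst)       # constraint violated: retry
                continue
            result[ch] = inst
            used[inst] = used.get(inst, 0) + 1
            break
    return result

scores = {"ch1": {"Drums": 0.9, "Guitar": 0.5},
          "ch2": {"Drums": 0.8, "Guitar": 0.7}}
reliability = {"ch1": 0.9, "ch2": 0.6}
print(assign_instruments(scores, reliability, max_count={"Drums": 1}))
# {'ch1': 'Drums', 'ch2': 'Guitar'}
```

Note how the "only one drum" constraint diverts ch2 to its second-best instrument even though Drums scored higher there.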
The image information generation unit 310 generates image information representing the instrument corresponding to each channel and displays it on the display unit 204. When the instrument corresponding to a channel cannot be determined, a message to that effect may be displayed for that channel.
Next, an example of the flow of processing in the present embodiment from acquiring the acoustic signals to determining the instruments corresponding to each channel will be described with reference to FIG. 10. As shown in FIG. 10, first, the acoustic signal acquisition unit 301 acquires an acoustic signal for each input channel (S101). The onset/offset detection unit 302 extracts onsets and offsets from the input acoustic signals (S102). The feature amount extraction unit 303 extracts the feature amount of each sounding section (S103). The index value acquisition unit 304 acquires, based on the feature amounts, index values indicating which instrument type each sounding section is estimated to be (S104). The threshold determination/exclusion unit 305 determines whether each sounding section included in a channel is at or below the predetermined volume threshold (S105), and excludes the information on sounding sections determined to be at or below that volume from the index value data (S106). The inter-channel feature information detection unit 306 compares the sounding sections of the channels and detects the predetermined feature information (S107). The score information generation unit 307 generates the score information based on the index value data and the detected feature information (S108).
 The reliability acquisition unit 308 acquires the reliability of each channel (S109). The instrument determination unit 309 first determines whether any channel remains whose instrument type has not yet been determined (S110). If an undetermined channel remains, it selects, among the undetermined channels, the channel with the highest reliability and assigns it the instrument type with the highest score (S111), then returns to S110. If no undetermined channel remains, the process ends. The above flow is only an example, and the present embodiment is not limited to it.
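The decision loop of S109 through S111 can be sketched as follows; this is an illustrative outline only, and the data shapes, function names, and the handling of ties are assumptions, not the embodiment's implementation:

```python
# Hypothetical sketch of S110-S111: repeatedly pick the undetermined channel
# with the highest reliability and assign it its highest-scoring instrument type.

def determine_instruments(scores, reliability):
    """scores: {channel: {instrument: score}}; reliability: {channel: float}."""
    decided = {}
    undetermined = set(scores)
    while undetermined:                                 # S110: any channel left?
        ch = max(undetermined, key=lambda c: reliability[c])
        decided[ch] = max(scores[ch], key=scores[ch].get)  # S111: best score wins
        undetermined.remove(ch)
    return decided
```

Note that this simple sketch allows the same instrument type to be assigned to two channels; whether that should be forbidden is left open here, as in the description above.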
 According to the present embodiment, an instrument type identification device and the like can be realized that identifies instrument types more accurately even when acoustic signals from a plurality of instruments are input.
 The present invention is not limited to the embodiment described above; for example, the configuration shown in the embodiment may be replaced with a configuration that is substantially the same, that produces the same operational effects, or that achieves the same object.
[Second Embodiment]
 Next, a second embodiment of the present invention will be described. As shown in FIG. 11, the present embodiment differs from the first embodiment mainly in that it includes a combination score information acquisition unit 311 and a combination score information extraction unit 312, and in the processing performed by the reliability acquisition unit 308. Description of the points that are the same as in the first embodiment is omitted below.
 The combination score information acquisition unit 311 enumerates the possible assignments of instrument types to the channels and acquires the total score of each assignment based on the score information generated by the score information generation unit 307. Specifically, for example, as shown in FIG. 12, a total score is obtained for every combination of instrument types. In FIG. 12, the combination denoted combi1 indicates that channel 1 is Kick, channel 2 is Snare, and so on, and that its total score is 33.52. The combination score information acquisition unit 311 may be configured to exclude combinations that do not satisfy constraints given by the user. Note that combi1 and the like each denote one combination of instrument types.
 The combination score information extraction unit 312 extracts a predetermined number of pieces of combination score information in descending order of total score. For example, FIG. 13 shows the case where the five combinations with the highest total scores are extracted.
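The exhaustive enumeration by the combination score information acquisition unit 311 and the top-N extraction by the combination score information extraction unit 312 can be sketched as follows (the data shapes and the use of a single candidate list for every channel are illustrative assumptions):

```python
from itertools import product

def top_combinations(scores, candidates, n):
    """scores: {channel: {instrument: score}};
    candidates: instrument types considered for every channel.
    Returns the n (channel -> instrument) assignments with the highest
    total score, best first, like combi1, combi2, ... in FIG. 12/13."""
    channels = sorted(scores)
    combos = []
    for assignment in product(candidates, repeat=len(channels)):  # enumerate all
        total = sum(scores[ch][inst] for ch, inst in zip(channels, assignment))
        combos.append((dict(zip(channels, assignment)), total))
    combos.sort(key=lambda c: c[1], reverse=True)                 # descending score
    return combos[:n]
```

A user-given constraint, as mentioned above, would simply skip assignments that violate it inside the loop.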
 The reliability acquisition unit 308 acquires the reliability of each channel based on the extracted combination score information. Specifically, for example, the reliability acquisition unit 308 determines the reliability of each channel based on whether the same instrument type is stably selected among the higher-ranked combinations. More specifically, in FIG. 13, Kick and Bass are selected for channel 1 (ch1) and channel 5 (ch5) in every combination, whereas Hi-Hat is selected for channel 3 (ch3) only from combi1 through combi4; the device may therefore be configured so that the more stable channels 1 and 5 are given a higher reliability than channel 3.
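One way to realize such a stability-based reliability is sketched below; the concrete measure (the length of the run of identical selections from the top-ranked combination down) is an assumption for illustration, since the embodiment only requires that channels selecting the same instrument type more stably receive a higher reliability:

```python
def stability_reliability(extracted, channel):
    """extracted: list of {channel: instrument} dicts, highest total score first.
    Returns how many top-ranked combinations in a row select the same
    instrument type for the channel as the best-scoring combination does."""
    first = extracted[0][channel]
    run = 0
    for combo in extracted:
        if combo[channel] != first:
            break
        run += 1
    return run
```

With FIG. 13's example, a channel whose instrument is identical in all five extracted combinations would score 5, while channel 3 (stable only through combi4) would score 4.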
 Next, the instrument determination unit 309 determines the instrument type for each channel in descending order of the acquired reliability. Specifically, in the case shown in FIG. 13, channels 1 and 5 have the same reliability, so channels 1 and 5 are determined to correspond to Kick and Bass, respectively. When reliabilities are equal, the channels may be decided in any order. Channel 3, for which Hi-Hat is selected from combi1 through combi4, is more stable at higher ranks, and thus more reliable, than the other undetermined channels, so channel 3 is determined to be Hi-Hat. The instrument types of the remaining channels are determined in the same manner.
 In the present embodiment, the determination of the instrument type may be deferred for channels in which the same instrument is not stably selected, that is, channels whose reliability is at or below a predetermined threshold. Specifically, in the case shown in FIG. 13, the instrument types selected for channels 4 and 7 are unstable, so those channels are put on hold. In this case, the device may be configured so that, after the user confirms and corrects the determined instrument types, combination score information is acquired again for only the deferred channels in the same manner as above, and the instrument type for each deferred channel is then determined.
 Next, an example of the processing flow of the present embodiment, from acquisition of the acoustic signals to determination of the instrument type for each channel, will be described with reference to FIG. 14.
 First, S201 through S208 are the same as S101 through S108 of the first embodiment, and their description is omitted. Next, the combination score information acquisition unit 311 enumerates the assignments of instrument types to the channels and acquires their total scores based on the score information (S209). The combination score information extraction unit 312 extracts a predetermined number of pieces of combination score information in descending order of total score (S210). The instrument determination unit 309 first determines whether any channel remains whose instrument type has not yet been determined (S211). If an undetermined channel remains, it determines the instrument type of the undetermined channel with the highest reliability (S212), then returns to S211. If no undetermined channel remains, the process ends.
 According to the present embodiment, as in the first embodiment, the instrumentation can be identified more accurately than when, for example, instrument types are identified channel by channel, and it is easier to grasp which device each acoustic signal is being input from. Furthermore, the present embodiment can identify the instrumentation with higher accuracy than the first embodiment.
 The present invention is not limited to the embodiment described above; for example, the configuration shown in the embodiment may be replaced with a configuration that is substantially the same, that produces the same operational effects, or that achieves the same object. For example, the acquisition of the reliability here may be used in combination with the acquisition of the reliability in the first embodiment.
[Third Embodiment]
 Next, a third embodiment of the present invention will be described. As shown in FIG. 15, the present embodiment differs from the first embodiment in that it includes a correlation value acquisition unit 313, a correlation value addition unit 314, and a top microphone determination unit 315. Description of the points that are the same as in the first embodiment is omitted below.
 The correlation value acquisition unit 313 acquires correlation values based on the correlations of the acoustic signals between the channels. Specifically, for example, it acquires inter-channel correlation value data such as that shown in FIG. 16. The correlation value addition unit 314 sums the correlation values for each channel to obtain a total value.
 The top microphone determination unit 315 determines which channels correspond to the top microphones 105 based on the total values. Specifically, since two top microphones 105 are usually placed, one on the left and one on the right, the two channels with the largest total values are determined to be the top microphones 105. In the case shown in FIG. 16, the total values (summary) of channels 3 and 4 are the largest, so channels 3 and 4 are determined to be the top microphones. Alternatively, the device may be configured to compute, for each channel, the total time during which its onset and offset times overlap with those of the other channels, and to determine the top microphones 105 based on that time.
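The processing of the correlation value acquisition unit 313, correlation value addition unit 314, and top microphone determination unit 315 can be sketched as follows (the use of a Pearson-style correlation and the dictionary layout are illustrative assumptions, not the embodiment's implementation):

```python
from itertools import combinations

def pick_top_mics(signals, n_top=2):
    """signals: {channel: list of samples}. Sums each channel's correlation
    with every other channel (as in FIG. 16) and returns the n_top channels
    with the largest totals; top mics pick up every instrument, so they
    correlate with the most channels."""
    def corr(x, y):
        mx, my = sum(x) / len(x), sum(y) / len(y)
        num = sum((a - mx) * (b - my) for a, b in zip(x, y))
        den = (sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y)) ** 0.5
        return num / den if den else 0.0
    totals = {ch: 0.0 for ch in signals}
    for a, b in combinations(signals, 2):      # every channel pair once
        c = corr(signals[a], signals[b])
        totals[a] += c
        totals[b] += c
    return sorted(totals, key=totals.get, reverse=True)[:n_top]
```

In practice the correlation would be taken over envelopes or onset activity rather than raw samples; the sketch only shows the sum-and-rank structure.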
 When a pair of channels whose onset and offset times are nearly simultaneous is detected, the channels of the top microphones 105 may be determined based on the time differences from the onset times of the other channels or on the volume differences. Specifically, for example, when drums such as Kick and Snare have been determined by the first embodiment, the volume of each channel is subtracted from the volume of one of those drum channels; this situation is shown in FIG. 17. The device is then configured to determine as the top microphones 105 the two channels, taken in order, whose volumes are lowest relative to that drum channel's volume (negative values with the largest absolute values).
 Alternatively, for example, as shown in FIG. 18, when drums such as Kick and Snare have been determined by the first embodiment, the onset time of each channel is subtracted from the onset time of one of those drum channels, and the two channels whose onset times lag that drum channel's onset time the most (positive values with the largest absolute values) may be determined to be the top microphones 105. Although FIGS. 17 and 18 show, as an example, a state in which the channels have not yet been determined, it is assumed that at least one of the drum channels has already been determined, for example by the first embodiment.
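The onset-delay criterion of FIG. 18 can be sketched as follows; the sign convention (each channel's onset time minus the drum channel's onset time, so that larger positive values mean greater lag) and the example data are assumptions made for clarity:

```python
def top_mics_by_onset_delay(onsets, drum_channel, n_top=2):
    """onsets: {channel: onset time of a common drum hit}. The overhead
    microphones receive the drum's sound only after it travels the extra
    distance to them, so the channels whose onsets lag the drum channel's
    onset the most are taken to be the top microphones."""
    delays = {ch: t - onsets[drum_channel]
              for ch, t in onsets.items() if ch != drum_channel}
    return sorted(delays, key=delays.get, reverse=True)[:n_top]
```

The same structure applies to the volume-difference criterion of FIG. 17, replacing onset times with volumes and taking the channels with the lowest relative values instead.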
 The image information generation unit 310 generates image information representing the instrument type corresponding to each channel and displays it on the display unit. Here, the instrument types include the top microphones 105.
 According to the present embodiment, the channels corresponding to the top microphones can be determined more accurately than, for example, in the first and second embodiments.
 The present invention is not limited to the embodiment described above and may be replaced with a configuration that is substantially the same as the configuration shown in the embodiment, that produces the same operational effects, or that achieves the same object.
 For example, although the above description mainly assumes two top microphones 105, the number of top microphones 105 is not limited to two and may be one, or three or more. Also, although the above description combines the first embodiment with the configuration for determining the channels corresponding to the top microphones 105, that configuration may instead be combined with the second embodiment, or may be implemented on its own.
 The third embodiment may also be configured in combination with the first or second embodiment. Here, for the snare, for example, every sounding section occurs at almost the same timing as on the top microphones, with a larger volume (amp) than the other channels and an earlier onset time. The device may therefore be configured to lower the contribution of the sounding sections corresponding to the snare. In this case, the contribution may be computed using, for example, the volume ratio to the loudest sounding section among the onsets occurring at the same timing on the other channels.
 In the first to third embodiments above, the instrument type identification device has mainly been described as being realized as the mixer 106, but it may be formed separately from the mixer 106 or realized within another audio device.
 Although the above describes a configuration in which the instrument types are determined based on the reliability, the instrument types may instead be determined based on the score information or the combination score information, without using the reliability.
 Furthermore, although the above describes configurations that perform processing such as acquisition of the index values and acquisition of the feature information based on the features of the sounding sections of the acoustic signals, any other configuration may be used as long as it performs such processing based on the features of the acoustic signals.

Claims (12)

  1.  An instrument type identification device comprising:
     index value acquisition means for acquiring, based on an acoustic signal acquired for each of a plurality of channels, index value data composed of index values each representing, for a feature of the acoustic signal and for each instrument type of the acoustic signal, the likelihood that the acoustic signal corresponds to that instrument type;
     inter-channel feature information detection means for detecting, as feature information, a feature of the acoustic signals between the channels based on the acoustic signals of the plurality of channels; and
     score information generation means for generating, as score information for each instrument type of each channel, a value corresponding to the probability that the channel corresponds to that instrument type, based on the index value data and the feature information.
  2.  The instrument type identification device according to claim 1, wherein the index value data is constructed based on sounding sections in which the volume of the acoustic signal of each channel is greater than a predetermined threshold.
  3.  The instrument type identification device according to claim 1 or 2, wherein the index value acquisition means includes a first SVM that provides an index value indicating whether the acoustic signal is a harmonic sound or a percussive sound, a second SVM that provides an index value indicating which instrument type's harmonic sound the acoustic signal is, and a third SVM that provides an index value indicating which percussion instrument type's sound the acoustic signal is.
  4.  The instrument type identification device according to claim 3, wherein the index value acquisition means acquires the index value data based on the sum of the index value from the second SVM and the index value from the third SVM, each weighted based on the index value from the first SVM.
  5.  The instrument type identification device according to any one of claims 1 to 4, wherein the feature information of the acoustic signals between the channels is feature information based on the start times or the signal levels of the sounding sections of the acoustic signals between the channels.
  6.  The instrument type identification device according to any one of claims 1 to 5, further comprising:
     reliability acquisition means for acquiring a predetermined reliability of each channel based on the index values; and
     instrument type determination means for determining the instrument type corresponding to each channel based on the reliability and the score information.
  7.  The instrument type identification device according to claim 6, wherein the reliability is acquired based on the variance of the index values in the index value data, the number of sounding sections, or the average volume of all sounding sections.
  8.  The instrument type identification device according to any one of claims 1 to 5, further comprising:
     combination score information acquisition means for acquiring, for each combination of instrument types and based on the score information, combination score information representing an index of the correspondence between the channels and that combination of instrument types;
     combination score information extraction means for extracting a predetermined number of pieces of combination score information in descending order of that index;
     reliability acquisition means for acquiring a predetermined reliability of each channel based on the extracted combination score information; and
     instrument type determination means for determining the instrument type corresponding to each channel based on the combination score information and the reliability.
  9.  The instrument type identification device according to claim 8, wherein the reliability is based on whether the same instrument type is selected more stably in the extracted combination score information.
  10.  The instrument type identification device according to any one of claims 1 to 9, further comprising:
     correlation value acquisition means for acquiring, based on the acoustic signal acquired for each channel, correlation values based on the correlations of the acoustic signals between the channels; and
     top microphone identification means for identifying, based on the correlation values, the channel corresponding to a top microphone arranged to pick up sound from a plurality of instruments.
  11.  The instrument type identification device according to claim 10, wherein the top microphone identification means identifies the channel corresponding to the top microphone based on the relationship between the start or end times of the sounding sections of the acoustic signals acquired for the respective channels.
  12.  An instrument type identification method comprising:
     acquiring, based on an acoustic signal acquired for each of a plurality of channels, index value data composed of index values each representing, for a feature of the acoustic signal and for each instrument type of the acoustic signal, the likelihood that the acoustic signal corresponds to that instrument type;
     detecting, as feature information, a feature of the acoustic signals between the channels based on the acoustic signals of the plurality of channels; and
     generating, as score information for each instrument type of each channel, a value corresponding to the probability that the channel corresponds to that instrument type, based on the index value data and the feature information.
PCT/JP2016/078754 2015-09-30 2016-09-29 Instrument type identification device and instrument sound identification method WO2017057532A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
JP2015195238A JP6601109B2 (en) 2015-09-30 2015-09-30 Instrument identification device
JP2015-195238 2015-09-30

Publications (1)

Publication Number: WO2017057532A1 (en)

Family

ID=58423749

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/JP2016/078754 WO2017057532A1 (en) 2015-09-30 2016-09-29 Instrument type identification device and instrument sound identification method

Country Status (2)

JP: JP6601109B2 (en)
WO: WO2017057532A1 (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115116232A (en) * 2022-08-29 2022-09-27 深圳市微纳感知计算技术有限公司 Voiceprint comparison method, device and equipment for automobile whistling and storage medium

Families Citing this family (1)

Publication number Priority date Publication date Assignee Title
JP2023157132A (en) * 2022-04-14 2023-10-26 ヤマハ株式会社 Information processing method, information processing device, and program

Citations (2)

Publication number Priority date Publication date Assignee Title
WO2010092915A1 (en) * 2009-02-13 2010-08-19 日本電気株式会社 Method for processing multichannel acoustic signal, system thereof, and program
JP2013041128A (en) * 2011-08-17 2013-02-28 Dainippon Printing Co Ltd Discriminating device for plurality of sound sources and information processing device interlocking with plurality of sound sources

Also Published As

Publication number Publication date
JP6601109B2 (en) 2019-11-06
JP2017068125A (en) 2017-04-06

Legal Events

121: The EPO has been informed by WIPO that EP was designated in this application (ref document number 16851703; country of ref document: EP; kind code: A1).
NENP: Non-entry into the national phase (ref country code: DE).
122: PCT application non-entry in European phase (ref document number 16851703; country of ref document: EP; kind code: A1).