JP2016057570A

JP2016057570A - Acoustic analysis device

Info

Publication number: JP2016057570A
Application number: JP2014186191A
Authority: JP
Inventors: 英樹阪梨; Hideki Sakanashi; 隆一成山; Ryuichi Nariyama; 舞小池; Mai Koike
Original assignee: Yamaha Corp
Current assignee: Yamaha Corp
Priority date: 2014-09-12
Filing date: 2014-09-12
Publication date: 2016-04-21
Also published as: WO2016039463A1

Abstract

PROBLEM TO BE SOLVED: To appropriately evaluate a subjective impression of sound. SOLUTION: A relational expression setting unit 40 uses a plurality of reference data r, each associating an impression index ym indicating an auditory impression of a reference sound with feature indices indicating acoustic features of that reference sound, together with relationship description data DC defining the correspondence between the auditory impression and a plurality of types of acoustic features, to set a relational expression Fm expressing the relationship between the impression index Ym of the auditory impression and the feature indices of the acoustic features under the correspondence defined by the relationship description data DC. SELECTED DRAWING: Figure 1

Description

The present invention relates to a technique for analyzing sound.

Techniques for evaluating the singing of a song have been proposed. For example, Patent Document 1 discloses a technique that evaluates singing by taking into account singing expressions such as vibrato and inflection in addition to the pitch of the singing voice. Patent Document 2 discloses a technique that evaluates singing according to the pitch (fundamental frequency) and volume of the singing voice.

Patent Document 1: JP 2005-107337 A
Patent Document 2: JP 2013-020265 A

However, the techniques of Patent Documents 1 and 2 merely evaluate the objective skill of singing, focusing only on the difference between reference values representing exemplary singing and the feature values of the singing voice under evaluation. Subjective aspects, such as the impression a listener receives from the singing voice, are not properly evaluated. For example, distinctive or characteristic singing can in fact give listeners an impression of skill, yet because it deviates from exemplary singing it is likely to be rated poorly by the techniques of Patent Documents 1 and 2. Although the above description takes the evaluation of a singing voice as an example, the same inability to properly evaluate the listener's subjective impression arises when evaluating other sounds, such as the performance sound of a musical instrument or the reproduced sound of audio equipment. In view of these circumstances, an object of the present invention is to appropriately evaluate the subjective impression of sound.

To solve the above problems, the acoustic analysis device of the present invention includes relational expression setting means that uses a plurality of reference data, each associating an impression index indicating the auditory impression of a reference sound with feature indices indicating acoustic features of that reference sound, together with relationship description data defining the correspondence between an auditory impression and a plurality of types of acoustic features, to set a relational expression expressing the relationship between the impression index of the auditory impression and the feature index of each acoustic feature under the correspondence defined by the relationship description data. In this configuration, a relational expression expressing the relationship between the impression index of the auditory impression and the feature index of each acoustic feature is set. Using the relational expression set by the relational expression setting means therefore makes it possible to appropriately evaluate the subjective impression of sound.

In a configuration that sets the relational expression solely by statistical analysis of the reference data, a spurious correlation (an apparent relationship in which a particular feature index seems to correlate with a particular auditory impression because of a latent factor, even though it does not actually do so) may lead to a relational expression in which a feature index that does not actually correlate with the auditory impression dominantly influences it. In the present invention, the relational expression is set using not only a plurality of reference data associating impression indices with feature indices but also relationship description data defining the correspondence between an auditory impression and a plurality of types of acoustic features. Compared with a configuration that sets the relational expression from the reference data alone, this has the advantage that a relational expression appropriately reflecting the actual correlation between the impression index and the plurality of feature indices (that is, a relational expression that can appropriately evaluate the auditory impression) can be set.

In a preferred aspect of the present invention, the relationship description data defines the correspondence between an auditory impression and a plurality of types of acoustic features via a plurality of intermediate elements included in that auditory impression. Because the correspondence is defined through these intermediate elements, the above-described effect of setting a relational expression that appropriately reflects the actual correlation between the auditory impression and each acoustic feature is particularly pronounced compared with the case where the auditory impression is correlated directly with each acoustic feature.

In a preferred aspect of the present invention, the relational expression setting means sets a relational expression for each of a plurality of types of auditory impressions. Because a relational expression is set for each type of auditory impression, auditory impressions can be appropriately evaluated from diverse viewpoints. For example, for evaluating the auditory impression of a singing voice, a configuration that sets a relational expression for each of a plurality of types of auditory impressions including maturity (adult-like / childlike), brightness (bright / dark), and clarity (clear and transparent / hoarse and muddy) is particularly suitable.

In a preferred aspect of the present invention, the relational expression setting means acquires reference data and uses that reference data to update an existing relational expression. Because the relational expression is updated using reference data acquired after it was set, the above-described effect of setting a relational expression that appropriately reflects the actual correlation between the auditory impression and each acoustic feature is particularly pronounced.

An acoustic analysis device according to another aspect of the present invention analyzes the auditory impression of an analysis target sound using the relational expression generated in any of the above aspects. It includes feature extraction means that extracts feature indices of the analysis target sound, and impression specifying means that calculates an impression index of the analysis target sound by applying the feature indices extracted by the feature extraction means to a relational expression. The relational expression is calculated using a plurality of reference data, each associating an impression index indicating the auditory impression of a reference sound with feature indices indicating acoustic features of that reference sound, together with relationship description data defining the correspondence between an auditory impression and a plurality of types of acoustic features, and it expresses the relationship between the impression index of the auditory impression and the feature indices of the plurality of types of acoustic features under the correspondence defined by the relationship description data. In this aspect, the auditory impression of the analysis target sound can be appropriately evaluated using a relational expression that, by drawing on both the reference data and the relationship description data, appropriately reflects the actual correlation between the impression index and the plurality of feature indices.

The acoustic analysis device according to each of the above aspects may be realized by a dedicated electronic circuit, or by cooperation between a program and a general-purpose arithmetic processing device such as a CPU (Central Processing Unit). The program of the present invention may be provided stored on a computer-readable recording medium and installed on a computer. The recording medium is, for example, a non-transitory recording medium. An optical recording medium (optical disc) such as a CD-ROM is a typical example, but any known type of recording medium, such as a semiconductor recording medium or a magnetic recording medium, may be used. The program of the present invention may also be provided by distribution over a communication network and installed on a computer. The present invention is further specified as an operation method (acoustic analysis method) of the acoustic analysis device according to each of the above aspects.

FIG. 1 is a configuration diagram of an acoustic analysis device according to a first embodiment of the present invention.
FIG. 2 is a schematic diagram of an analysis result image.
FIG. 3 is a flowchart of the operation of analyzing the auditory impression of a singing voice.
FIG. 4 is an explanatory diagram of the correspondence between auditory impressions and acoustic features defined by the relationship description data.
FIG. 5 is a configuration diagram of an acoustic analysis device according to a second embodiment.
FIG. 6 is a configuration diagram of an acoustic analysis device according to a third embodiment.
FIG. 7 is an explanatory diagram of a modification of the third embodiment.
FIG. 8 is a configuration diagram of an acoustic analysis device according to a fourth embodiment.
FIG. 9 is an explanatory diagram of property estimation in the fourth embodiment.
FIG. 10 is an explanatory diagram of property estimation in the fourth embodiment.
FIG. 11 is a configuration diagram of an acoustic analysis device according to a fifth embodiment.
FIG. 12 is an explanatory diagram of an operation screen in the fifth embodiment.
FIG. 13 is a configuration diagram of an acoustic analysis device according to a sixth embodiment.
FIG. 14 is a configuration diagram of an acoustic analysis device according to a modification.

<First Embodiment>
FIG. 1 is a configuration diagram of an acoustic analysis device 100A according to the first embodiment of the present invention. The acoustic analysis device 100A of the first embodiment is realized by a computer system including an arithmetic processing device 10, a storage device 12, an input device 14, a sound collection device 16, and a display device 18. For example, a portable information processing device such as a mobile phone or a smartphone, or a portable or stationary information processing device such as a personal computer, can be used as the acoustic analysis device 100A.

The sound collection device 16 is a device (microphone) that collects ambient sound. The sound collection device 16 of the first embodiment collects a singing voice V produced by a user singing a song. The acoustic analysis device 100A can also be used as a karaoke device that mixes and reproduces the accompaniment of a song and the singing voice V. The A/D converter that converts the signal of the singing voice V collected by the sound collection device 16 from analog to digital is omitted from the drawing for convenience.

The display device 18 (for example, a liquid crystal display panel) displays images as instructed by the arithmetic processing device 10. The input device 14 is an operating device that the user operates to give various instructions to the acoustic analysis device 100A, and includes, for example, a plurality of controls operated by the user. A touch panel formed integrally with the display device 18 can also be used as the input device 14. The storage device 12 stores the program executed by the arithmetic processing device 10 and various data used by the arithmetic processing device 10. Any known recording medium, such as a semiconductor recording medium or a magnetic recording medium, or a combination of several types of recording media, may be employed as the storage device 12.

The acoustic analysis device 100A of the first embodiment is a signal processing device that analyzes the singing voice V collected by the sound collection device 16. By executing the program stored in the storage device 12, the arithmetic processing device 10 realizes a plurality of functions for analyzing the singing voice V (a feature extraction unit 22, an impression specifying unit 24, a presentation processing unit 26, and a relational expression setting unit 40). A configuration in which the functions of the arithmetic processing device 10 are distributed across a plurality of devices, or in which a dedicated electronic circuit realizes some of those functions, may also be employed.

The feature extraction unit 22 analyzes the singing voice V collected by the sound collection device 16 and extracts a plurality (N) of feature indices X1 to XN representing different types of acoustic features (N is a natural number). An acoustic feature here means an acoustic characteristic of the singing voice V that influences the perceptual impression received by a listener of the singing voice V (hereinafter "auditory impression"). Specifically, feature indices Xn (n = 1 to N), each quantifying one of various acoustic features such as pitch stability, vibrato depth (amplitude of the pitch variation), and frequency characteristics, are extracted from the singing voice V. The N feature indices X1 to XN extracted by the feature extraction unit 22 of the first embodiment share a common numerical range. As understood from the above, an auditory impression is a subjective or sensory characteristic (impression) perceived by a listener of the singing voice V, whereas an acoustic feature is an objective or physical characteristic (property) extracted by analyzing the singing voice V.
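The requirement that the N feature indices share a common numerical range can be pictured as a normalization step. The following is a minimal sketch, assuming min-max scaling into [0.0, 1.0]; the source does not state the range or the scaling method, and the feature names, raw values, and limits below are hypothetical.

```python
def normalize_features(raw, bounds):
    """Scale each raw acoustic measurement into the common range [0.0, 1.0].

    raw    -- dict: feature name -> raw measured value
    bounds -- dict: feature name -> (low, high) raw limits (hypothetical values)
    """
    normalized = {}
    for name, value in raw.items():
        low, high = bounds[name]
        clipped = min(max(value, low), high)  # keep out-of-range values inside the limits
        normalized[name] = (clipped - low) / (high - low)
    return normalized


# Hypothetical raw measurements and limits, for illustration only.
raw = {"pitch_stability": 30.0, "vibrato_depth": 150.0}
bounds = {"pitch_stability": (0.0, 60.0), "vibrato_depth": (0.0, 100.0)}
features = normalize_features(raw, bounds)
```

Clipping before scaling keeps every index inside the shared range even when a measurement exceeds its assumed limits, as the vibrato value does here.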

The impression specifying unit 24 specifies the auditory impression of the singing voice V using the N feature indices X1 to XN extracted by the feature extraction unit 22. The impression specifying unit 24 of the first embodiment calculates a plurality (M) of impression indices Y1 to YM representing different auditory impressions of the singing voice V (M is a natural number). Each impression index Ym (m = 1 to M) in the first embodiment quantifies the degree of two mutually opposing impressions. Specifically, an impression index Ym is specified for each of various auditory impressions such as maturity (adult-like / childlike), brightness (bright / dark), and clarity (clear and transparent / hoarse and muddy). For example, for the impression index Ym relating to maturity, a larger value in the positive range means a more adult-like voice, and a smaller value in the negative range means a more childlike voice.

The impression indices Ym (Y1 to YM) are calculated from the N feature indices X1 to XN using an arithmetic expression Fm (hereinafter "relational expression") set in advance for each impression index Ym. Each relational expression Fm expresses the relationship between the impression index Ym and the N feature indices X1 to XN. The relational expression Fm of the first embodiment expresses each impression index Ym as a linear expression of the N feature indices X1 to XN, as follows.

$$Y_m = a_{1m} X_1 + a_{2m} X_2 + \cdots + a_{Nm} X_N + b_m \qquad (m = 1, \ldots, M)$$

The coefficient anm (a11 to aNM) of the relational expression Fm is a constant reflecting the degree of correlation between the feature index Xn and the impression index Ym (the slope of the impression index Ym with respect to the feature index Xn), and the coefficient bm (b1 to bM) is a predetermined constant (intercept). The coefficient anm can also be described as the contribution (weight) of the feature index Xn to the impression index Ym. The impression specifying unit 24 applies the N feature indices X1 to XN extracted by the feature extraction unit 22 to each of the relational expressions F1 to FM, thereby calculating M impression indices Y1 to YM corresponding to different auditory impressions. The impression specifying unit 24 of the first embodiment generates singing style information S from the M impression indices Y1 to YM calculated from the feature indices Xn. Specifically, an M-dimensional vector whose elements are the M impression indices Y1 to YM is generated as the singing style information S. As understood from the above, the singing style information S comprehensively expresses M types of auditory impressions of the singing voice V (the subjective singing style perceived by a listener). Although the first embodiment uses a linear system as described above, a nonlinear system such as a hidden Markov model or a neural network (multilayer perceptron) can also be used to calculate the impression indices Ym (Y1 to YM).
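The calculation of the singing style information S from the feature indices can be sketched as follows. This is a minimal illustration of the linear relational expressions described above; the function name, the list-based layout of the coefficients, and all numeric values are assumptions made for the example.

```python
def impression_indices(x, a, b):
    """Evaluate Ym = sum_n a[n][m] * x[n] + b[m] for every m.

    x -- list of N feature indices X1..XN
    a -- N x M nested list, a[n][m] is the coefficient anm
    b -- list of M intercepts b1..bM
    Returns the singing style information S as a list [Y1, ..., YM].
    """
    n_features, n_impressions = len(x), len(b)
    return [
        sum(a[n][m] * x[n] for n in range(n_features)) + b[m]
        for m in range(n_impressions)
    ]


# Hypothetical values: N = 3 feature indices, M = 2 impression indices.
x = [0.5, 1.0, 0.0]
a = [[2.0, 0.0],
     [-1.0, 4.0],
     [0.0, 3.0]]
b = [0.5, -1.0]
s = impression_indices(x, a, b)  # M-dimensional vector [Y1, Y2]
```

A zero entry in `a` means the corresponding feature index has no influence on that impression index.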

The presentation processing unit 26 in FIG. 1 causes the display device 18 to display various images. Specifically, the presentation processing unit 26 of the first embodiment causes the display device 18 to display an analysis result image 70 representing the M impression indices Y1 to YM (singing style information S) of the singing voice V specified by the impression specifying unit 24.

FIG. 2 is a display example of the analysis result image 70 representing, among the M impression indices Y1 to YM, one impression index Y1 relating to maturity (adult-like / childlike) and one impression index Y2 relating to clarity (clear and transparent / hoarse and muddy). As understood from FIG. 2, the analysis result image 70 includes a coordinate plane defined by a first axis 71 indicating the value of the impression index Y1 and a second axis 72 indicating the value of the impression index Y2. An image (icon) 74 representing the auditory impression of the singing voice V is placed at the coordinate position corresponding to the value of the impression index Y1 calculated by the impression specifying unit 24 on the first axis 71 and the value of the impression index Y2 calculated by the impression specifying unit 24 on the second axis 72. The analysis result image 70 is thus an image representing the auditory impression of the singing voice V (an image representing a singing style that includes maturity and clarity). By viewing the analysis result image 70 displayed on the display device 18, the user can grasp the auditory impression of the singing voice V visually and intuitively.
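Placing the icon 74 on the coordinate plane amounts to mapping the pair (Y1, Y2) to a screen position. The sketch below assumes that the first axis is horizontal, the second axis is vertical, the origin is at the center of the image, and both indices lie in a symmetric range; none of these layout details is stated in the source.

```python
def icon_position(y1, y2, width, height, index_range=1.0):
    """Map impression indices (Y1, Y2) to pixel coordinates on the plane.

    Y1 runs along the horizontal first axis and Y2 along the vertical
    second axis; the origin sits at the center of the image. The value
    range [-index_range, +index_range] is an assumption.
    """
    px = (y1 + index_range) / (2 * index_range) * width
    # Screen y grows downward, so a larger Y2 is drawn higher up.
    py = (index_range - y2) / (2 * index_range) * height
    return round(px), round(py)
```

For a 400 x 300 image, neutral indices (0.0, 0.0) land at the center of the plane.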

FIG. 3 is a flowchart of the operation of analyzing the auditory impression of the singing voice V. The process of FIG. 3 is started, for example, by a user operation on the input device 14 (an instruction to start analysis). When the process starts, the feature extraction unit 22 acquires the singing voice V collected by the sound collection device 16 (S1) and extracts N feature indices X1 to XN representing the acoustic features of an analysis section of the singing voice V (S2). The analysis section is the section of the singing voice V whose auditory impression is to be analyzed, and is, for example, the entire singing voice V or a part of it (for example, the chorus section). The impression specifying unit 24 calculates M impression indices Y1 to YM by applying the N feature indices X1 to XN extracted by the feature extraction unit 22 to each relational expression Fm (S3). The presentation processing unit 26 causes the display device 18 to display the analysis result image 70 of FIG. 2 representing the analysis result of the impression specifying unit 24 (S4).
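Steps S1 to S4 form a simple pipeline, which might be sketched as below. The function `analyze` and the stand-in callables passed to it are hypothetical; they only illustrate the order of the steps, not the actual signal processing.

```python
def analyze(acquire, extract, relational_expressions, present):
    """Run steps S1 to S4 once and return the M impression indices."""
    voice = acquire()                                        # S1: obtain singing voice V
    features = extract(voice)                                # S2: N feature indices X1..XN
    indices = [f(features) for f in relational_expressions]  # S3: Ym = Fm(X1..XN)
    present(indices)                                         # S4: show analysis result
    return indices


# Stand-in callables (all hypothetical) so the sketch can run end to end.
shown = []
result = analyze(
    acquire=lambda: [0.0, 0.5, -0.5, 0.25],   # fake samples
    extract=lambda v: [max(v), min(v)],       # two toy "features"
    relational_expressions=[
        lambda x: 2.0 * x[0] + x[1],          # F1
        lambda x: x[0] - x[1] + 1.0,          # F2
    ],
    present=shown.append,
)
```

Passing the steps in as callables keeps the sketch independent of any particular feature extractor or display.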

The relational expression setting unit 40 in FIG. 1 sets the relational expressions Fm (F1 to FM) used to calculate the impression index Ym of each auditory impression. As illustrated in FIG. 1, the storage device 12 of the first embodiment stores a reference data group DR and relationship description data DC. The relational expression setting unit 40 sets the M relational expressions F1 to FM using the reference data group DR and the relationship description data DC.

The reference data group DR is a set (database) of a plurality of reference data r. The reference data r included in the reference data group DR are generated in advance from voices uttered by a large number of unspecified speakers (hereinafter "reference sounds"). For example, a voice of an arbitrary speaker singing an arbitrary song is recorded as a reference sound and used to generate reference data r. As illustrated in FIG. 1, each reference datum r associates the impression indices ym (y1 to yM) of a reference sound with the feature indices xn (x1 to xN) of that reference sound. Each impression index ym is set to a value reflecting the auditory impression that listeners of the reference sound actually perceived, and each feature index xn is set to the value of an acoustic feature extracted from the reference sound by the same processing as the feature extraction unit 22. In other words, each reference datum r is material (learning data) recording an actual observation of the relationship between the impression indices ym and the feature indices xn.
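A reference datum r and the reference data group DR might be represented as in the following sketch. The class name `ReferenceDatum`, the helper `columns`, and the numeric values are assumptions made for illustration; the source specifies only that each datum pairs the impression indices y1 to yM with the feature indices x1 to xN.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class ReferenceDatum:
    """One reference datum r for a single reference sound."""
    impressions: List[float]  # y1..yM, rated by listeners
    features: List[float]     # x1..xN, extracted from the sound


def columns(group, m, n):
    """Return the paired observations (xn, ym) across the whole group DR."""
    return [(r.features[n], r.impressions[m]) for r in group]


# Hypothetical reference data group DR with M = 1 and N = 2.
DR = [
    ReferenceDatum(impressions=[0.8], features=[0.9, 0.1]),
    ReferenceDatum(impressions=[-0.4], features=[0.2, 0.7]),
]
pairs = columns(DR, m=0, n=1)
```

Pairs of this kind are the observations from which the degree of correlation between one feature index and one impression index is estimated.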

The relationship description data DC defines the correspondence (correlation) between auditory impressions and a plurality of acoustic features. FIG. 4 is an explanatory diagram illustrating the correspondence defined by the relationship description data DC of the first embodiment. As illustrated in FIG. 4, for each of M types of auditory impressions EY (EY1 to EYM) corresponding to the different impression indices Ym, the relationship description data DC of the first embodiment defines a correspondence λm (λ1 to λM) with the plurality of types of acoustic features EX that influence that auditory impression EYm. FIG. 4 illustrates the correspondences λ1 to λ3 with acoustic features EX for three types of auditory impressions EY1 to EY3: maturity, clarity, and brightness.

The specific acoustic features EX correlated with each auditory impression EYm are as follows. The numerical value of each acoustic feature EX listed below corresponds to the feature index Xn described above.
・Pitch stability: the degree of minute temporal change (fluctuation) in pitch.
・Rise speed: the degree of increase in volume immediately after sound onset.
・Fall: the degree (for example, the number of occurrences) of the singing expression in which the pitch is lowered from a reference value (the pitch of the note).
・Scoop (shakuri): the degree (for example, the number of occurrences) of the singing expression in which the pitch is raised over time from the reference value.
・Vibrato depth: the degree of pitch change in vibrato (for example, its amplitude or number of occurrences).
・Contour: the degree of clarity of the sound. For example, the volume ratio of the high-frequency component to the low-frequency component is suitable.
・Articulation: the degree of temporal change in acoustic characteristics. For example, the degree of temporal change (typically the rate of change over time) of a frequency characteristic (for example, formant frequency or fundamental frequency) is suitable.
・Attack: the volume immediately after sound onset.
・Crescendo: the degree of increase in volume over time.
・Frequency characteristics: the shape of the frequency spectrum.
・High-order harmonics: the intensity of the harmonic components on the high-order (high-frequency) side.
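A few of the listed features can be illustrated with simple computations over frame-wise pitch and volume sequences. The formulas below are assumptions chosen for clarity; the source names the features but does not specify how they are computed. In particular, `pitch_fluctuation` measures the fluctuation itself, and a stability score would have to be derived from it.

```python
def pitch_fluctuation(pitch):
    """Mean absolute frame-to-frame pitch change (in the units of `pitch`)."""
    steps = [abs(b - a) for a, b in zip(pitch, pitch[1:])]
    return sum(steps) / len(steps)


def vibrato_depth(pitch):
    """Half the peak-to-peak pitch excursion within a vibrato segment."""
    return (max(pitch) - min(pitch)) / 2


def crescendo(volume):
    """Average volume increase per frame between the first and last frame."""
    return (volume[-1] - volume[0]) / (len(volume) - 1)
```

Each function expects at least two frames; real feature extraction would also need pitch tracking and segmentation, which are outside this sketch.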

As illustrated in FIG. 4, the correspondence λm that the relationship description data DC of the first embodiment defines for any one type of auditory impression EYm is a hierarchical relationship (hierarchical structure) in which a plurality of types of intermediate elements EZ related to that auditory impression EYm are interposed between the auditory impression EYm and the acoustic features EX. The intermediate elements EZ related to one type of auditory impression EYm correspond to impressions that cause a listener to perceive that auditory impression EYm, or to impressions obtained by subdividing that auditory impression EYm. Each intermediate element EZ is associated with a plurality of types of acoustic features EX that influence it.
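The hierarchical correspondence can be pictured as a nested mapping from an auditory impression, through its intermediate elements, to acoustic features. In the sketch below the impression key, the intermediate element names, and the feature links are all hypothetical; they are not taken from FIG. 4.

```python
# Hypothetical correspondence for one auditory impression: the
# intermediate element names and their feature links are illustrative only.
DC = {
    "adult_vs_childlike": {
        "calmness": ["pitch_stability", "rise_speed"],
        "expressiveness": ["vibrato_depth", "fall", "pitch_stability"],
    },
}


def linked_features(dc, impression):
    """Collect every acoustic feature reachable from the impression via
    its intermediate elements, without duplicates, in first-seen order."""
    seen = []
    for features in dc[impression].values():
        for name in features:
            if name not in seen:
                seen.append(name)
    return seen
```

The structure records only which elements are connected; it carries no weights.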

Each correspondence relationship λm defined by the relationship description data DC is constructed, for example, by surveying (through interviews or questionnaires) experts with extensive specialized knowledge of music and voice (singing), such as music producers, instructors, and singers, and analyzing the correlation between each auditory impression EYm and each acoustic feature EX (that is, which auditory impression EYm a listener tends to perceive from a voice having which acoustic features EX). Any known survey technique, such as the evaluation grid method, may be employed to construct the correspondence relationship λm.

The relationship description data DC described above defines only the mutual relationships (connections) among the elements included in the correspondence relationship λm (acoustic features EX, intermediate elements EZ, and auditory impression EYm); it does not define the degree of correlation between elements. From this viewpoint, each correspondence relationship λm defined by the relationship description data DC is a hypothetical relationship that does not yet reflect the actual correlation between the acoustic features EX and the auditory impression EYm observed in reference sounds actually collected from a large number of unspecified speakers (that is, the actual relationship between each impression index ym and each feature index xn statistically observed from the reference data group DR, which reflects the tendencies of real reference sounds).

The reference data group DR and the relationship description data DC described above are created in advance and stored in the storage device 12. The relational expression setting unit 40 in FIG. 1 sets M relational expressions F1 to FM using the reference data group DR and the relationship description data DC stored in the storage device 12. That is, for each of the M impression indices Y1 to YM, the relational expression setting unit 40 sets a relational expression Fm that expresses the relationship between the impression index Ym of the auditory impression EYm and each feature index Xn of the acoustic features EX under the correspondence relationship λm defined by the relationship description data DC. Specifically, the relational expression setting unit 40 sets the N coefficients a1m to aNm and the one coefficient bm of each relational expression Fm so that the expression represents the correspondence relationship λm of the relationship description data DC with the degree of correlation between the impression index ym and the feature index xn in the plurality of reference data r of the reference data group DR reflected in it. Any known statistical processing, such as structural equation modeling (SEM) or multivariate analysis (for example, multiple regression analysis), may be employed for setting each relational expression Fm. As understood from the example of FIG. 4, the types and total number of acoustic features EX whose correlation with the auditory impression EYm is defined under the correspondence relationship λm actually differ for each auditory impression EYm, but the types and total number of feature indices Xn included in each relational expression Fm given above are common across the M relational expressions F1 to FM. The coefficient anm corresponding to the feature index Xn of an acoustic feature EX whose correlation with the auditory impression EYm is not defined under the correspondence relationship λm is set to zero in the relational expression Fm (that is, that feature index Xn does not affect the impression index Ym).
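The coefficient setting described above can be illustrated with a minimal Python sketch. It assumes the multiple-regression option rather than SEM, and it flattens the hierarchy of intermediate elements EZ into a direct feature-to-impression mask, which is a simplification not made in the source. The function name `fit_relational_expressions`, the boolean `mask` encoding of the relationship description data DC, and the sample values are all hypothetical.

```python
import numpy as np

def fit_relational_expressions(X, Y, mask):
    """Least-squares fit of Y_m = sum_n a_nm * X_n + b_m (assumed linear form).

    X    : (R, N) feature indices x_n of R reference data r
    Y    : (R, M) impression indices y_m of the same reference data
    mask : (N, M) bool, True where DC links feature n to impression m
    Returns A (N, M) with a_nm and b (M,) with b_m.
    """
    R, N = X.shape
    M = Y.shape[1]
    A = np.zeros((N, M))   # unlinked coefficients stay exactly zero
    b = np.zeros(M)
    for m in range(M):
        cols = np.flatnonzero(mask[:, m])            # features allowed by DC
        design = np.hstack([X[:, cols], np.ones((R, 1))])
        coef, *_ = np.linalg.lstsq(design, Y[:, m], rcond=None)
        A[cols, m] = coef[:-1]
        b[m] = coef[-1]
    return A, b

# Hypothetical, noise-free reference data with N = 3 features and M = 2 impressions.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))
mask = np.array([[True, False], [True, True], [False, True]])
A_true = np.array([[2.0, 0.0], [-1.0, 0.5], [0.0, 3.0]])
b_true = np.array([1.0, -2.0])
Y = X @ A_true + b_true
A, b = fit_relational_expressions(X, Y, mask)
```

Because features outside the mask never enter the regression, their coefficients remain zero regardless of what the reference data alone would suggest.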

The M relational expressions F1 to FM (for example, structural equations or multiple regression equations) set by the relational expression setting unit 40 in the above procedure are stored in the storage device 12. Specifically, the N coefficients a1m to aNm and the one coefficient bm are stored in the storage device 12 for each of the M relational expressions F1 to FM. As described above, the impression specifying unit 24 calculates the M types of impression indices Y1 to YM by applying the N feature indices X1 to XN to each of the M relational expressions F1 to FM set by the relational expression setting unit 40.
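As a rough illustration of how the stored coefficients might be applied, the following Python sketch evaluates all M relational expressions at once. It assumes each Fm is linear in the feature indices (consistent with N coefficients plus one constant per expression, but an assumption here); the function name and the numeric values are hypothetical.

```python
import numpy as np

def specify_impressions(x, A, b):
    """Evaluate Y_m = sum_n a_nm * x_n + b_m for m = 1..M.

    x : (N,)   feature indices X1..XN of the singing voice V
    A : (N, M) stored coefficients a_nm
    b : (M,)   stored constants b_m
    """
    return np.asarray(x, dtype=float) @ np.asarray(A, dtype=float) + np.asarray(b, dtype=float)

# Hypothetical coefficients for N = 3 features and M = 2 impressions.
Y = specify_impressions([1.0, 2.0, 3.0],
                        [[2.0, 0.0], [-1.0, 0.5], [0.0, 3.0]],
                        [1.0, -2.0])
```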

As described above, in the first embodiment, the auditory impression of the singing voice V (impression indices Y1 to YM) is specified using the relational expression Fm, which defines the relationship between each feature index Xn extracted from the singing voice V and the impression index Ym indicating the auditory impression of the singing voice V. Therefore, compared with the techniques of Patent Document 1 and Patent Document 2, which evaluate singing skill by focusing only on the difference between a reference value indicating exemplary singing and the feature index Xn of the singing voice V, the subjective impression that a listener of the singing voice V actually perceives can be evaluated appropriately.

Incidentally, a configuration is also conceivable in which the relational expression Fm is set by analyzing only the plurality of reference data r of the reference data group DR and thereby statistically analyzing the tendency of the correlation between the impression index ym and the feature index xn (hereinafter referred to as the "comparative example"). That is, in the comparative example, the relationship description data DC is not used for setting the relational expression Fm. In the comparative example, however, a specific acoustic feature EX that does not actually correlate with the auditory impression EYm may appear to correlate with it because of a latent factor (a spurious correlation). Under the influence of this apparent relationship, a relational expression Fm may be derived in which a feature index Xn that does not actually correlate with the impression index Ym dominantly affects that impression index Ym. In the first embodiment, by contrast, the relationship description data DC, which defines the hypothetical correspondence relationship λm between each auditory impression EYm and each acoustic feature EX, is used together with the reference data group DR for setting the relational expression Fm, so the influence of spurious correlation between the auditory impression EYm and the acoustic features EX is reduced (ideally eliminated). Therefore, there is an advantage that a relational expression Fm that appropriately expresses the actual correlation between the auditory impression EYm and each acoustic feature EX can be set. In the first embodiment, the relationship description data DC defines the correspondence relationship λm between the auditory impression EYm and each acoustic feature EX via a plurality of intermediate elements EZ related to the auditory impression EYm. Compared with a configuration in which the auditory impression EYm and each acoustic feature EX are correlated directly (a configuration in which the correspondence relationship λm includes only the auditory impression EYm and the acoustic features EX), the above effect of appropriately expressing the actual correlation in the relational expression Fm is therefore especially pronounced.

<Modification of First Embodiment>
The above description illustrates a case in which the plurality of reference data r are stored in the storage device 12 in advance. As illustrated below, however, it is also possible to update each relational expression Fm using new reference data r in which the singing voice V collected by the sound collection device 16 serves as the reference sound.

After the music ends, the user (the singer of the singing voice V or a listener) designates the auditory impression of the singing voice V by operating the input device 14 as appropriate. For example, for each of the M types of auditory impressions, a plurality of options (a multi-level rating) for the impression index Ym is displayed on the display device 18, and the user designates one desired option for each auditory impression.

As indicated by the broken-line arrow in FIG. 1, the relational expression setting unit 40 acquires reference data r that includes the impression index ym (y1 to yM) of each auditory impression designated by the user and each feature index xn (x1 to xN) extracted from the singing voice V by the feature extraction unit 22, and stores it in the storage device 12. The relational expression setting unit 40 then sets and stores the relational expressions Fm (F1 to FM) in the same manner as in the first embodiment, using the reference data group DR that includes the new reference data r corresponding to the singing voice V. That is, the existing relational expressions Fm (F1 to FM) are updated to reflect the relationship between the auditory impression (impression index ym) and the acoustic features (feature index xn) of the singing voice V collected by the sound collection device 16. This configuration has the advantage that the relational expressions F1 to FM can be updated to reflect the relationship between the auditory impression and the acoustic features of actual singing voices V. The timing of setting (updating) the relational expression Fm using the reference data group DR is arbitrary. For example, the relational expression Fm may be updated each time reference data r corresponding to a singing voice V is acquired, or it may be updated when a predetermined number of new reference data r have accumulated. The modification illustrated above can likewise be applied to each embodiment illustrated below.
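The two update timings mentioned above (update on every acquisition, or update once a predetermined number of new reference data have accumulated) can be sketched as follows. This is only an illustrative Python sketch; the class name `ReferenceDataStore`, the `refit` callback, and the `batch_size` parameter are hypothetical names, and the actual refitting procedure is left abstract.

```python
class ReferenceDataStore:
    """Accumulates reference data r and refits the relational expressions."""

    def __init__(self, refit, batch_size=1):
        self.records = []             # reference data group DR
        self.pending = 0              # new reference data since the last refit
        self.refit = refit            # callable(records) -> relational expressions
        self.batch_size = batch_size  # 1 means "update on every acquisition"
        self.expressions = None

    def add(self, features, impressions):
        """Store one reference datum (x1..xN, y1..yM); refit when due."""
        self.records.append((tuple(features), tuple(impressions)))
        self.pending += 1
        if self.pending >= self.batch_size:
            self.expressions = self.refit(self.records)
            self.pending = 0
            return True   # the relational expressions were updated
        return False

# Usage with a placeholder refit that only reports how many records were used.
store = ReferenceDataStore(refit=len, batch_size=3)
updates = [store.add([0.1, 0.2], [3]) for _ in range(3)]
```

Setting `batch_size=1` gives the per-acquisition behaviour, while a larger value defers the refit until enough new data has been collected.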

<Second Embodiment>
A second embodiment of the present invention will now be described. For elements in each of the forms illustrated below whose operation and function are the same as in the first embodiment, the reference signs used in the description of the first embodiment are reused, and detailed descriptions of those elements are omitted as appropriate.

FIG. 5 is a configuration diagram of the acoustic analysis device 100B of the second embodiment. As illustrated in FIG. 5, the acoustic analysis device 100B of the second embodiment is configured by adding an information generation unit 32 to the same elements as in the first embodiment (the feature extraction unit 22, the impression specifying unit 24, the presentation processing unit 26, and the relational expression setting unit 40). The information generation unit 32 generates presentation data QA corresponding to the auditory impression (M impression indices Y1 to YM) that the impression specifying unit 24 specifies in the same manner as in the first embodiment. In other words, the information generation unit 32 can be described as an element that converts the M impression indices Y1 to YM into the presentation data QA. The presentation processing unit 26 of the second embodiment presents the presentation data QA generated by the information generation unit 32 to the user. Specifically, the presentation processing unit 26 causes the display device 18 to display the contents of the presentation data QA. The extraction of the N feature indices X1 to XN by the feature extraction unit 22 and the setting of the M relational expressions F1 to FM by the relational expression setting unit 40 are the same as in the first embodiment. Therefore, the second embodiment achieves the same effects as the first embodiment.

The information generation unit 32 of the second embodiment generates, as the presentation data QA, related data dA of music corresponding to the M impression indices Y1 to YM (singing style information S) specified by the impression specifying unit 24. Specifically, the information generation unit 32 searches a plurality of candidates for music corresponding to the M impression indices Y1 to YM and acquires the related data dA of that music. The related data dA is information related to the music. For example, the related data dA includes identification information of the music (for example, a music number) as well as attribute information such as the music title, the singer name, and the genre.

The search data WA stored in the storage device 12 is used for the music search (generation of the related data dA) by the information generation unit 32. The search data WA defines the relationship between the singing style information S (M impression indices Y1 to YM) and music. Specifically, the search data WA of the second embodiment designates related data dA (dA1, dA2, ...) of music for each of a plurality of classes CL (CL1, CL2, ...) corresponding to different singing styles.

Specifically, a large number of pieces of singing style information S generated from singing voices V of arbitrary music are classified into a plurality of classes CL. For each class CL, the search data WA designates the related data dA of one piece of music, for example the piece sung the greatest number of times among the singing voices V whose singing style information S was classified into that class. That is, for the class CL corresponding to any one singing style, the related data dA of a piece of music that many singers tend to sing in that singing style is designated. Any known statistical processing (clustering) may be employed to classify the singing style information S, and the plurality of classes CL are expressed, for example, by a Gaussian mixture distribution that approximates the distribution of the singing style information S belonging to each class CL.
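As a minimal illustration of how such search data might be assembled once the singing style information has been clustered, the Python sketch below counts, for each class, how often each piece of music was sung and keeps the most frequent one. The function name `build_search_data`, the class labels, and the song identifiers are hypothetical, and the clustering step itself is assumed to have been done beforehand.

```python
from collections import Counter, defaultdict

def build_search_data(observations):
    """Map each class CL to the piece of music sung most often in that class.

    observations : iterable of (class_id, song_id) pairs, one per singing voice V
                   whose singing style information S was assigned to class_id.
    Returns {class_id: song_id}.
    """
    counts = defaultdict(Counter)
    for class_id, song_id in observations:
        counts[class_id][song_id] += 1
    return {cid: counter.most_common(1)[0][0] for cid, counter in counts.items()}

# Hypothetical observations: class label and the song that was sung.
observations = [
    ("CL1", "song_A"), ("CL1", "song_B"), ("CL1", "song_A"),
    ("CL2", "song_C"), ("CL2", "song_C"), ("CL2", "song_A"),
]
search_data = build_search_data(observations)
```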

The information generation unit 32 identifies, among the plurality of classes CL registered in the search data WA, the one class CL to which the singing style information S (M impression indices Y1 to YM) generated by the impression specifying unit 24 belongs, and selects the related data dA of the music designated for that class CL in the search data WA as the presentation data QA. The presentation processing unit 26 causes the display device 18 to display the presentation data QA (related data dA) generated by the information generation unit 32. That is, the identification information and attribute information of the music are displayed on the display device 18. As understood from the above description, music that many singers tend to sing in the same singing style as the singing voice V (that is, music that is easy to sing in that singing style) is presented to the user.
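The class identification step can be sketched in Python as follows, assuming the classes are represented by the components of a Gaussian mixture with diagonal covariances (the diagonal restriction is an assumption made to keep the sketch short). The function names, the mixture parameters, and the related data values are hypothetical.

```python
import numpy as np

def identify_class(s, weights, means, variances):
    """Return the index of the mixture component most likely to have produced s.

    s         : (M,)   singing style information S (impression indices Y1..YM)
    weights   : (K,)   mixture weights of the K classes
    means     : (K, M) component means
    variances : (K, M) per-dimension variances (diagonal covariance assumed)
    """
    s = np.asarray(s, dtype=float)
    means = np.asarray(means, dtype=float)
    variances = np.asarray(variances, dtype=float)
    log_lik = -0.5 * np.sum(
        (s - means) ** 2 / variances + np.log(2.0 * np.pi * variances), axis=1
    )
    return int(np.argmax(np.log(np.asarray(weights, dtype=float)) + log_lik))

def select_presentation_data(s, mixture, search_data):
    """Look up the related data dA designated for the class that s belongs to."""
    return search_data[identify_class(s, *mixture)]

# Hypothetical two-class mixture over M = 2 impression indices.
mixture = ([0.5, 0.5], [[0.0, 0.0], [5.0, 5.0]], [[1.0, 1.0], [1.0, 1.0]])
search_data = {0: {"title": "Song A"}, 1: {"title": "Song B"}}
presentation = select_presentation_data([4.5, 5.2], mixture, search_data)
```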

The information generation unit 32 of the second embodiment can also update the search data WA using the singing voice V collected by the sound collection device 16. Specifically, the information generation unit 32 sequentially accumulates in the storage device 12 the singing style information S generated from the singing voice V of any one piece of music together with the related data dA of that music, and updates the search data WA, for example by known machine learning, so that the relationship between the accumulated singing style information S and the related data dA is reflected. As understood from the above description, in the second embodiment music is searched for not on the basis of the characteristics of the music itself (such as its mood) but on the basis of the singing styles in which many singers have sung in the past. For example, if the auditory impression of the singing voice V is "passionate and bright singing", music that many singers have likewise sung in a "passionate and bright" singing style is retrieved.

Incidentally, techniques for searching for music that matches a user's request have been proposed in the past. For example, JP 2011-197345 A discloses a technique of searching for music corresponding to a keyword designated by the user and presenting it to the user. With that technique, however, only music that is formally related to the designated keyword is retrieved. In the second embodiment, music corresponding to the auditory impression (singing style) of the singing voice V specified by the impression specifying unit 24 is presented to the user, so there is an advantage that the user who produced the singing voice V can recognize music that suits his or her own singing style (music that is easy to sing in that style).

<Modification of Second Embodiment>
(1) The above description illustrates a configuration in which the search data WA designates one piece of music for each of the plurality of classes CL of the singing style information S. However, for any one class CL, it is also possible to designate the related data dA of a plurality of pieces of music sung in the singing voices V whose singing styles were classified into that class CL. The information generation unit 32 generates, as the presentation data QA, the related data dA of the plurality of pieces of music designated for the one class CL to which the singing style information S specified by the impression specifying unit 24 belongs. That is, a plurality of pieces of music that tend to be sung in the same singing style as the singing voice V are presented to the user.

(2) It is also possible to selectively present the related data dA of one piece of music among the plurality of pieces designated for the one class CL to which the singing style information S belongs. The condition for selecting one piece from the plurality designated for a class CL is arbitrary. For example, a configuration that selects the piece sung the greatest number of times among the singing voices V classified into that class CL is suitable, as is a configuration that selects one piece according to an instruction from the user (for example, a selection condition such as "1990s" designated by the user, or attribute information such as the user's age).
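The two selection conditions given as examples in (2) can be combined in a short Python sketch: filter the candidates of a class by a user-designated condition, then pick the most frequently sung piece. The dictionary keys (`title`, `decade`, `count`), the function name, and the fallback to the unfiltered list when nothing matches are all assumptions made for illustration.

```python
def select_song(candidates, decade=None):
    """Pick one piece of music from the candidates designated for a class CL.

    candidates : list of dicts with hypothetical keys 'title', 'decade', 'count'
    decade     : optional user-designated selection condition, e.g. "1990s"
    """
    pool = [c for c in candidates if decade is None or c["decade"] == decade]
    if not pool:
        pool = candidates  # assumed fallback when no candidate matches
    return max(pool, key=lambda c: c["count"])

# Hypothetical candidates for one class CL.
candidates = [
    {"title": "Song A", "decade": "1990s", "count": 12},
    {"title": "Song B", "decade": "2000s", "count": 30},
    {"title": "Song C", "decade": "1990s", "count": 20},
]
most_sung = select_song(candidates)
from_nineties = select_song(candidates, decade="1990s")
```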

(3) The search data WA can also be updated at any time so that the relationship between the singing style information S of a singing voice V and the music of that singing voice V is reflected. However, if the singing style information S of a singing voice V whose auditory impression is not appropriate is reflected in the search data WA, an appropriate music search may be hindered. A configuration that screens the singing style information S to be reflected in the search data WA is therefore suitable. For example, the user (singer or listener) designates by operating the input device 14 whether the actual auditory impression of the singing voice V is appropriate. For a singing voice V whose auditory impression is judged appropriate, the relationship between the singing style information S and the music is reflected in the search data WA; the singing style information S of a singing voice V whose auditory impression is judged inappropriate is not reflected. This configuration has the advantage that search data WA reflecting the singing styles of many singers can be generated.

<Third Embodiment>
FIG. 6 is a configuration diagram of the acoustic analysis device 100C of the third embodiment. As illustrated in FIG. 6, the acoustic analysis device 100C of the third embodiment, like the acoustic analysis device 100B of the second embodiment (FIG. 5), is configured by adding to the first embodiment an information generation unit 32 that generates presentation data QA corresponding to the M impression indices Y1 to YM specified by the impression specifying unit 24. The extraction of the N feature indices X1 to XN by the feature extraction unit 22 and the setting of the M relational expressions F1 to FM by the relational expression setting unit 40 are the same as in the first embodiment. Therefore, the third embodiment achieves the same effects as the first embodiment.

The storage device 12 of the third embodiment stores a plurality of image data dB that represent auditory impressions (M impression indices Y1 to YM). Each image data dB expresses an image (including symbols and characters) that represents an auditory impression figuratively or schematically. For example, images of characters (such as animals) or celebrities that match the auditory impression specified by the impression specifying unit 24 are suitable as the image data dB. The information generation unit 32 of the third embodiment selects, as the presentation data QA, the image data dB that represents the auditory impression (M impression indices Y1 to YM) specified by the impression specifying unit 24 from among the plurality of image data dB stored in the storage device 12. The conversion data WB stored in the storage device 12 is used for the selection of the image data dB by the information generation unit 32.

The conversion data WB defines the relationship between the M impression indices Y1 to YM (singing style information S) and the image data dB. Specifically, the conversion data WB expresses structural equations that define the correlation between the M impression indices Y1 to YM and the image data dB. As with the setting of the M relational expressions F1 to FM illustrated in the first embodiment, structural equation modeling (SEM), for example, is suitably used to set each structural equation. That is, for example, structural equations that define the relationship between the M impression indices Y1 to YM and the image data dB are set in advance by structural equation modeling that uses a plurality of learning data (training data), in which M impression indices Y1 to YM and image data dB are associated with each other, together with relationship description data that defines the correspondence between the M types of auditory impressions and the image data dB. The result is stored in the storage device 12 as the conversion data WB. For example, the conversion data WB is set and stored so that image data dB showing a childlike character is generated when the impression index Ym for age indicates childlike singing, and image data dB showing a character with a bright expression is generated when the impression index Ym for brightness indicates bright singing.

The information generation unit 32 identifies image data dB by applying the M impression indices Y1 to YM specified by the impression specifying unit 24 to the structural equations of the conversion data WB, and acquires that image data dB stored in the storage device 12 as the presentation data QA. The presentation processing unit 26 causes the display device 18 to display the presentation data QA generated by the information generation unit 32. As understood from the above description, an image that figuratively or schematically represents the auditory impression (singing style) of the singing voice V is displayed on the display device 18. By viewing the image displayed on the display device 18, the user can grasp the auditory impression of the singing voice V visually and intuitively.
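One possible way to picture this selection is the Python sketch below, which scores each stored image with a linear function of the impression indices and picks the highest-scoring one. The linear scoring, the weight matrix `W`, the offsets `c`, and the image identifiers are assumptions for illustration only; the source states that structural equations relate the impression indices to the image data but does not give their form.

```python
import numpy as np

def select_image(y, W, c, image_ids):
    """Pick the image whose (assumed linear) score is highest.

    y         : (M,)   impression indices Y1..YM
    W         : (J, M) hypothetical weights, one row per stored image
    c         : (J,)   hypothetical offsets
    image_ids : list of J identifiers of the image data dB
    """
    scores = np.asarray(W, dtype=float) @ np.asarray(y, dtype=float) + np.asarray(c, dtype=float)
    return image_ids[int(np.argmax(scores))]

# Hypothetical setup: index 0 rates "childlike", index 1 rates "bright".
image_ids = ["childlike_character", "bright_character"]
W = [[1.0, 0.0], [0.0, 1.0]]
c = [0.0, 0.0]
chosen = select_image([0.9, 0.1], W, c, image_ids)
```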

Incidentally, JP 2002-041063 A, for example, discloses a technique of displaying a character image according to information such as the music title, the number of times sung, and the scoring result. With that technique, however, only an image unrelated to the auditory impression of the singing voice is displayed. In the third embodiment, an image corresponding to the auditory impression of the singing voice V specified by the impression specifying unit 24 (an image representing the singing style) is presented, so there are the advantages that the user can intuitively grasp the auditory impression of the singing voice V and that the user can be entertained.

If the aim were simply to select image data dB according to the characteristics of the singing voice V, a configuration could also be conceived in which a direct relationship between each feature index Xn of the singing voice V and the image data dB is determined in advance, and the image data dB is selected according to each feature index Xn extracted by the feature extraction unit 22. However, because it is difficult to know which auditory impression a specific numerical value of each feature index Xn corresponds to, it is in practice difficult to associate image data dB that represents the auditory impression of the singing voice V appropriately with the feature indices Xn. In the third embodiment, the relationship between the auditory impression (M impression indices Y1 to YM) and the image data dB is defined by the conversion data WB, so image data dB suited to each auditory impression can be associated with each impression index Ym in the conversion data WB. There is also the advantage that the relationship between the auditory impression and the image data dB can be changed independently of the relational expressions F1 to FM.

＜第３実施形態の変形例＞ <Modifications of the Third Embodiment>
（１）歌唱音声Ｖの聴覚印象に応じた複数の画像データｄBを提示データＱAとして生成することも可能である。具体的には、記憶装置１２に記憶された複数の画像データｄBが複数（Ｋ個）のグループ（カテゴリ）に分類され、相異なるグループから選択したＫ個の画像データｄBを含む提示データＱAを情報生成部３２が生成する。図７に例示される通り、各グループは、特定の物品を構成する各要素に対応する。例えば、「トッピング」と「クリーム」と「ベース」とを要素として構成される「ケーキ」を想定すると（Ｋ＝３）、「トッピング」の各画像を示す複数の画像データｄBと、「クリーム」の各画像を示す複数の画像データｄBと、「ベース」の各画像を示す複数の画像データｄBとが記憶装置１２に記憶される。 (1) It is also possible to generate, as the presentation data QA, a plurality of pieces of image data dB corresponding to the auditory impression of the singing voice V. Specifically, the pieces of image data dB stored in the storage device 12 are classified into a plurality of (K) groups (categories), and the information generation unit 32 generates presentation data QA containing K pieces of image data dB, each selected from a different group. As illustrated in FIG. 7, each group corresponds to one element of a specific article. For example, for a “cake” composed of the elements “topping”, “cream”, and “base” (K = 3), the storage device 12 stores a plurality of pieces of image data dB representing “topping” images, a plurality representing “cream” images, and a plurality representing “base” images.

特徴抽出部２２によるＮ個の特徴指標Ｘ1〜ＸNの抽出と印象特定部２４によるＭ個の印象指標Ｙ1〜ＹMの特定とが、歌唱音声Ｖを時間軸上で区分したＫ個の単位区間の各々について順次に実行される。歌唱音声Ｖを複数の単位区間に区分する方法は任意であるが、例えば図７に例示される通り、楽曲の音楽的な意味に応じて歌唱音声Ｖを複数の単位区間（Ａ〜Ｃメロ，サビ１，サビ２）に区分することが可能である。Ｋ個の単位区間の各々は画像データｄBの１個のグループに対応する。情報生成部３２は、歌唱音声ＶのＫ個の単位区間の各々について、当該単位区間に対応するグループの複数の画像データｄBのうち当該単位区間のＭ個の印象指標Ｙ1〜ＹMに応じた１個の画像データｄBを選択する。すなわち、歌唱音声Ｖの単位区間毎に１個の画像データｄBが選択され、最終的には、相異なる単位区間に対応するＫ個の画像データｄBを含む提示データＱAが生成される。具体的には、歌唱音声Ｖのうち「Ａ〜Ｃメロ」の単位区間から特定されたＭ個の印象指標Ｙ1〜ＹMに応じて「トッピング」のグループから１個の画像データｄB（図７の例示では「イチゴ」の画像）が選択され、「サビ１」の単位区間のＭ個の印象指標Ｙ1〜ＹMに応じて「クリーム」のグループから１個の画像データｄB（図７の例示では「ホイップクリーム」の画像）が選択され、「サビ２」の単位区間のＭ個の印象指標Ｙ1〜ＹMに応じて「ベース」のグループから１個の画像データｄB（図７の例示では「円盤状のスポンジ」の画像）が選択される。 The extraction of the N feature indices X1 to XN by the feature extraction unit 22 and the specification of the M impression indices Y1 to YM by the impression specifying unit 24 are executed sequentially for each of K unit sections into which the singing voice V is divided on the time axis. The singing voice V may be divided into unit sections by any method; for example, as illustrated in FIG. 7, it can be divided according to the musical structure of the song (A to C melody sections, chorus 1, chorus 2). Each of the K unit sections corresponds to one group of image data dB. For each of the K unit sections of the singing voice V, the information generation unit 32 selects, from the image data dB of the group corresponding to that unit section, one piece of image data dB according to the M impression indices Y1 to YM of that unit section. That is, one piece of image data dB is selected per unit section, and presentation data QA containing the K pieces of image data dB for the different unit sections is ultimately generated. Specifically, one piece of image data dB (the “strawberry” image in the example of FIG. 7) is selected from the “topping” group according to the M impression indices Y1 to YM specified from the “A to C melody” unit section; one piece (the “whipped cream” image in FIG. 7) is selected from the “cream” group according to the M impression indices Y1 to YM of the “chorus 1” unit section; and one piece (the “disc-shaped sponge” image in FIG. 7) is selected from the “base” group according to the M impression indices Y1 to YM of the “chorus 2” unit section.
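
The per-section selection described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the method of the embodiment: the catalogue `CATALOG`, its two-dimensional reference impression vectors, and the nearest-neighbour rule in `select_image` are all hypothetical stand-ins for the conversion data WB, whose concrete form is not given in this passage.

```python
from typing import Dict, List, Sequence, Tuple

Impression = Tuple[float, ...]

# Hypothetical catalogue: one group per cake element (K = 3). Each candidate
# image is tagged with an assumed reference impression vector (Y1, Y2).
CATALOG: Dict[str, Dict[str, Impression]] = {
    "topping": {"strawberry": (0.9, 0.1), "chestnut": (0.1, 0.9)},
    "cream": {"whipped cream": (0.8, 0.3), "chocolate cream": (0.2, 0.8)},
    "base": {"round sponge": (0.7, 0.7), "tart": (0.2, 0.2)},
}


def select_image(group: str, impressions: Sequence[float]) -> str:
    """Pick the image in `group` whose reference vector is closest to the
    impression indices of one unit section (squared Euclidean distance)."""
    candidates = CATALOG[group]

    def distance(name: str) -> float:
        return sum((y - r) ** 2 for y, r in zip(impressions, candidates[name]))

    return min(candidates, key=distance)


def build_presentation(sections: Sequence[Tuple[str, Sequence[float]]]) -> List[str]:
    """One image per unit section; `sections` pairs each unit section's
    group with the impression indices measured in that section."""
    return [select_image(group, ys) for group, ys in sections]


# A-C melody -> topping, chorus 1 -> cream, chorus 2 -> base.
sections = [
    ("topping", (0.8, 0.2)),
    ("cream", (0.7, 0.4)),
    ("base", (0.6, 0.8)),
]
presentation = build_presentation(sections)
```

The resulting list plays the role of the presentation data QA: K selected images that a display step would then composite into a single picture.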

提示処理部２６は、楽曲の歌唱の終了後に、提示データＱAに包含されるＫ個の画像データｄBを組合せた画像を表示装置１８に表示させる。具体的には、図７に例示される通り、「トッピング」の画像データｄBと「クリーム」の画像データｄBと「ベース」の画像データｄBとを組合せた「ケーキ」の画像が表示装置１８に表示される。各単位区間の画像データｄBは当該単位区間の聴覚印象に応じて選択されるから、提示データＱAに応じて表示される物品の画像の内容（物品を構成する各要素の態様）は各単位区間の聴覚印象に応じて変化する。したがって、利用者に提示される画像が多様化されて興趣性を提供することが可能である。なお、複数の画像データｄBの組合せで表示される画像の内容は以上の例示（ケーキ）に限定されない。例えば、利用者を表象するアバター等のキャラクタを表示する構成では、キャラクタを構成する各要素（例えば衣服または髪型等の各要素や、顔を構成する目や口等の各要素）の画像を示す画像データｄBが歌唱音声Ｖの単位区間毎に選択される。 The presentation processing unit 26 causes the display device 18 to display an image obtained by combining the K pieces of image data dB included in the presentation data QA after the song singing is completed. Specifically, as illustrated in FIG. 7, an image of “cake”, which is a combination of “topping” image data dB, “cream” image data dB, and “base” image data dB, is displayed on the display device 18. Is displayed. Since the image data dB of each unit section is selected according to the auditory impression of the unit section, the content of the image of the article displayed according to the presentation data QA (the mode of each element constituting the article) is the unit section. It changes according to the auditory impression. Therefore, it is possible to provide interest by diversifying the images presented to the user. Note that the content of the image displayed by the combination of the plurality of image data dB is not limited to the above example (cake). For example, in a configuration in which a character such as an avatar representing the user is displayed, an image of each element constituting the character (for example, each element such as clothes or hairstyle, each element such as eyes and mouth constituting the face) is shown. The image data dB is selected for each unit section of the singing voice V.

なお、以上の例示では、歌唱音声Ｖを時間軸上で区分した単位区間毎に画像データｄBを選択したが、単位区間以外の要素毎に画像データｄBを選択することも可能である。例えば、情報生成部３２が、Ｍ種類の聴覚印象の各々について（すなわち聴覚印象毎に）当該聴覚印象の印象指標Ｙmに応じた画像データｄBを選択する構成も採用され得る。 In the above example, the image data dB is selected for each unit section obtained by dividing the singing voice V on the time axis. However, it is also possible to select the image data dB for each element other than the unit section. For example, a configuration in which the information generation unit 32 selects the image data dB corresponding to the impression index Ym of the auditory impression for each of M types of auditory impressions (that is, for each auditory impression) may be employed.

（２）事前に用意された複数の画像データｄBを複数のグループ（カテゴリ）に分類し、複数のグループのうち所定の条件で選択された１個のグループからＭ個の印象指標Ｙ1〜ＹMに応じた画像データｄBを情報生成部３２が提示データＱAとして選択することも可能である。１個のグループを選択する条件は任意であるが、例えば、複数のグループのうち利用者が入力装置１４に対する操作で指定したグループから画像データｄBを選択する構成や、複数のグループのうち利用者の属性情報（例えば年齢や性別等）に応じて選択したグループから画像データｄBを選択する構成が好適である。また、複数の利用者の属性情報に応じて画像データｄBのグループを選択することも可能である。 (2) A plurality of image data dB prepared in advance is classified into a plurality of groups (categories), and M impression indexes Y1 to YM are selected from one group selected under a predetermined condition among the plurality of groups. The corresponding image data dB can be selected by the information generation unit 32 as the presentation data QA. The condition for selecting one group is arbitrary. For example, a configuration in which the image data dB is selected from a group designated by an operation on the input device 14 by a user among a plurality of groups, or a user among a plurality of groups. A configuration in which the image data dB is selected from a group selected according to the attribute information (for example, age, sex, etc.) is suitable. It is also possible to select a group of image data dB according to the attribute information of a plurality of users.

（３）以上の説明では、各印象指標Ｙmと画像データｄBとの相関を規定する構造方程式を表現する変換用データＷBを例示したが、各印象指標Ｙmと画像データｄBとを相互に対応させたデータテーブルを変換用データＷBとして利用することも可能である。 (3) In the above description, the conversion data WB that represents the structural equation that defines the correlation between each impression index Ym and the image data dB has been exemplified. However, the impression index Ym and the image data dB are associated with each other. It is also possible to use the data table as the conversion data WB.

＜第４実施形態＞ <Fourth Embodiment>
図８は、第４実施形態の音響解析装置１００Dの構成図である。図８に例示される通り、第４実施形態の音響解析装置１００Dは、第２実施形態の音響解析装置１００B（図５）と同様に、Ｍ個の印象指標Ｙ1〜ＹMに応じた提示データＱAを生成する情報生成部３２を第１実施形態に追加した構成である。特徴抽出部２２によるＮ個の特徴指標Ｘ1〜ＸNの抽出や関連式設定部４０によるＭ個の関連式Ｆ1〜ＦMの設定は第１実施形態と同様である。したがって、第４実施形態においても第１実施形態と同様の効果が実現される。 FIG. 8 is a configuration diagram of an acoustic analysis device 100D of the fourth embodiment. As illustrated in FIG. 8, the acoustic analysis device 100D of the fourth embodiment, like the acoustic analysis device 100B of the second embodiment (FIG. 5), has a configuration in which an information generation unit 32 that generates presentation data QA according to the M impression indices Y1 to YM is added to the first embodiment. The extraction of the N feature indices X1 to XN by the feature extraction unit 22 and the setting of the M relational expressions F1 to FM by the relational expression setting unit 40 are the same as in the first embodiment. The fourth embodiment therefore achieves the same effects as the first embodiment.

第４実施形態では、歌唱音声Ｖの聴覚印象の履歴を示す履歴データＨが利用者毎に記憶装置１２に記憶される。図８に例示される通り、履歴データＨは、利用者情報ｈAと印象履歴ｈBとを含んで構成される。利用者情報ｈAは、歌唱音声Ｖを発声した利用者の識別情報や属性情報（例えば年齢や性別）を包含する。印象履歴ｈBは、利用者の歌唱音声Ｖから印象特定部２４が過去に特定した各印象指標Ｙmの時系列である。歌唱音声ＶのＭ個の印象指標Ｙ1〜ＹM（歌唱スタイル情報Ｓ）を特定すると、印象特定部２４は、当該歌唱音声Ｖを発声した利用者の履歴データＨの印象履歴ｈBに当該印象指標Ｙ1〜ＹMを追加する。以上の説明から理解される通り、履歴データＨは、各利用者の歌唱スタイルの時間的な遷移を表現する時系列データとも換言され得る。 In the fourth embodiment, history data H indicating the history of the auditory impressions of the singing voice V is stored in the storage device 12 for each user. As illustrated in FIG. 8, the history data H includes user information hA and an impression history hB. The user information hA contains identification information and attribute information (for example, age and sex) of the user who produced the singing voice V. The impression history hB is a time series of the impression indices Ym that the impression specifying unit 24 has specified in the past from the user's singing voice V. Upon specifying the M impression indices Y1 to YM (singing style information S) of a singing voice V, the impression specifying unit 24 adds them to the impression history hB in the history data H of the user who produced that singing voice V. As understood from the above description, the history data H can also be described as time-series data expressing the temporal transition of each user's singing style.

第４実施形態の記憶装置１２は、利用者の性状を表現する複数の性状データｄCを記憶する。具体的には、性状データｄCは、利用者の性状を意味する文字列を表現する。利用者の性状とは、利用者の性質（気質，性格）や状態（例えば精神的または肉体的な状況）である。例えば、公知の性格分類（例えばクレッチマー気質分類，ユング分類，エニアグラム分類）で規定される複数の性格が性状データｄCで表現される。 The storage device 12 of the fourth embodiment stores a plurality of property data dC that represents the properties of the user. Specifically, the property data dC represents a character string that represents the user's property. A user's property is a user's property (temperament, character) and state (for example, mental or physical situation). For example, a plurality of personalities defined by known personality classifications (for example, Kretschmer temperament classification, Jung classification, Enneagram classification) are expressed by the property data dC.

第４実施形態の情報生成部３２は、利用者の歌唱音声Ｖについて印象特定部２４が過去に特定した聴覚印象に応じて当該利用者の性状を推定する。具体的には、情報生成部３２は、記憶装置１２に記憶された複数の性状データｄCのうち利用者の履歴データＨが示す聴覚印象の履歴に応じた性状データｄCを提示データＱAとして選択する。情報生成部３２による性状データｄCの選択（利用者の性状の推定）には、記憶装置１２に記憶された変換用データＷCが利用される。 The information generation unit 32 of the fourth embodiment estimates the properties of a user according to the auditory impressions that the impression specifying unit 24 has specified in the past for that user's singing voice V. Specifically, from the plurality of pieces of property data dC stored in the storage device 12, the information generation unit 32 selects, as the presentation data QA, the property data dC corresponding to the history of auditory impressions indicated by the user's history data H. The conversion data WC stored in the storage device 12 is used for this selection of the property data dC (that is, for estimating the user's properties).

変換用データＷCは、印象履歴ｈB（聴覚印象の時系列）と性状データｄCとの関係を規定する。具体的には、第４実施形態の変換用データＷCは、印象履歴ｈB（ｈB1，ｈB2，……）と性状データｄC（ｄC1，ｄC2，……）とを相互に対応させたデータテーブルである。例えば、図９に例示される通り、明暗に関する印象指標Ｙmの時系列において明暗（明るい／暗い）が交互に現れる印象履歴ｈBには、クレッチマー気質分類における「循環型気質」の性状データｄCが対応する。また、図１０に例示される通り、活動性（強勢な／静穏な）に関する印象指標Ｙmの時系列において強勢な（激しい）音声から静穏な音声に変化する印象履歴ｈBには「今日はお疲れですか」等の状態を示す性状データｄCが対応する。 The conversion data WC defines the relationship between the impression history hB (the time series of auditory impressions) and the property data dC. Specifically, the conversion data WC of the fourth embodiment is a data table that associates impression histories hB (hB1, hB2, ...) with property data dC (dC1, dC2, ...). For example, as illustrated in FIG. 9, an impression history hB in which bright and dark alternate in the time series of the impression index Ym for brightness (bright/dark) corresponds to the property data dC for the “cyclothymic temperament” of the Kretschmer temperament classification. Likewise, as illustrated in FIG. 10, an impression history hB that changes from a forceful (intense) voice to a calm voice in the time series of the impression index Ym for activity (forceful/calm) corresponds to property data dC indicating a state, such as “Are you tired today?”.
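
The table lookup described above can be illustrated with a short sketch. Everything concrete here is an assumption made for illustration: the index names `"brightness"` and `"activity"`, the pattern tests `is_alternating` and `is_fading`, the zero threshold, and the representation of the conversion data WC as a list of (index, pattern test, property) rows are hypothetical and are not taken from the embodiment.

```python
from typing import Callable, Dict, List, Optional, Sequence, Tuple


def is_alternating(history: Sequence[float], threshold: float = 0.0) -> bool:
    """True if successive values keep switching sides of `threshold`
    (bright, dark, bright, ...). Needs at least three observations."""
    sides = [value > threshold for value in history]
    return len(sides) >= 3 and all(a != b for a, b in zip(sides, sides[1:]))


def is_fading(history: Sequence[float]) -> bool:
    """True if the values strictly decrease (forceful voice becoming calm)."""
    return len(history) >= 2 and all(a > b for a, b in zip(history, history[1:]))


# Assumed stand-in for the conversion data WC: each row associates a pattern
# in one impression index's history with one piece of property data dC.
WC: List[Tuple[str, Callable[[Sequence[float]], bool], str]] = [
    ("brightness", is_alternating, "cyclothymic temperament"),
    ("activity", is_fading, "Are you tired today?"),
]


def estimate_property(impression_history: Dict[str, List[float]]) -> Optional[str]:
    """Return the property data of the first matching row, or None."""
    for index_name, matches, property_data in WC:
        if matches(impression_history.get(index_name, [])):
            return property_data
    return None


def append_impressions(impression_history: Dict[str, List[float]],
                       new_values: Dict[str, float]) -> None:
    """Add the newly specified impression indices to the history hB."""
    for index_name, value in new_values.items():
        impression_history.setdefault(index_name, []).append(value)


history_hb: Dict[str, List[float]] = {}
for observed in ({"brightness": 1.0, "activity": 0.0},
                 {"brightness": -1.0, "activity": 0.0},
                 {"brightness": 1.0, "activity": 0.0}):
    append_impressions(history_hb, observed)
result = estimate_property(history_hb)
```

Using ordered rows makes the first matching pattern win; a real table could equally score all rows and return the best match.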

利用者は、入力装置１４に対する操作で自身の識別情報を指定したうえで楽曲を歌唱する。情報生成部３２は、記憶装置１２に記憶された複数の性状データｄCのうち、識別情報で特定される利用者の履歴データＨの印象履歴ｈBに変換用データＷCにて対応づけられた性状データｄCを提示データＱAとして特定する。提示処理部２６は、情報生成部３２が生成した提示データＱAを表示装置１８に表示させる。以上の説明から理解される通り、第４実施形態では、歌唱音声Ｖの聴覚印象を参照して利用者の性状を推定した結果（性状データｄC）が表示装置１８に表示される。利用者は、表示装置１８に表示された画像を視認することで、自身の性状の推定結果を確認することが可能である。第４実施形態では特に、歌唱音声Ｖの各印象指標Ｙmの時系列（印象履歴ｈB）を利用して発声者の性状が推定されるから、歌唱音声Ｖの聴覚印象の時間変化を加味した適切な性状を推定できるという利点がある。 The user sings the music after designating his / her identification information through an operation on the input device 14. The information generating unit 32 sets the property data associated with the impression history hB of the user history data H specified by the identification information among the plurality of property data dC stored in the storage device 12 by the conversion data WC. dC is specified as the presentation data QA. The presentation processing unit 26 causes the display device 18 to display the presentation data QA generated by the information generation unit 32. As understood from the above description, in the fourth embodiment, the result (property data dC) of estimating the user's property with reference to the auditory impression of the singing voice V is displayed on the display device 18. The user can confirm the estimation result of his / her property by visually recognizing the image displayed on the display device 18. In the fourth embodiment, since the character of the speaker is estimated using the time series (impression history hB) of each impression index Ym of the singing voice V, it is appropriate to take into account the temporal change of the auditory impression of the singing voice V. There is an advantage that a proper property can be estimated.

なお、特許文献１や特許文献２の技術では、模範的な歌唱音声と評価対象の歌唱音声との特徴量の相違のみに着目した歌唱の客観的な巧拙が評価されるに過ぎない。第４実施形態によれば、歌唱音声Ｖの聴覚印象に応じて利用者の性状が推定および提示されるから、演出的な効果や興趣性を利用者に付与することが可能である。また、第４実施形態にて歌唱音声Ｖから利用者の性状を推定した結果を、利用者の精神的／肉体的な状態の管理等（例えば心理カウンセリング，健康管理，セラピー，自己啓発）に利用することも可能である。また、表示装置１８に提示される自分の性状が目標に近付くように歌唱スタイルを調整することで、所望の印象を他者に付与できるような歌唱スタイルを習得することも可能である。 With the techniques of Patent Document 1 and Patent Document 2, only the objective skill of the singing is evaluated, based solely on the difference in feature values between an exemplary singing voice and the singing voice under evaluation. According to the fourth embodiment, the user's properties are estimated and presented according to the auditory impression of the singing voice V, which can provide the user with a dramatic effect and entertainment. The result of estimating the user's properties from the singing voice V in the fourth embodiment can also be used for managing the user's mental or physical state (for example, psychological counseling, health management, therapy, or self-development). Furthermore, by adjusting their singing style so that the properties presented on the display device 18 approach a target, users can learn a singing style that gives others a desired impression.

＜第４実施形態の変形例＞ <Modifications of the Fourth Embodiment>
（１）事前に用意された複数の性状データｄCを複数のグループ（カテゴリ）に分類し、複数のグループのうち所定の条件で選択された１個のグループから履歴データＨ（印象履歴ｈB）に応じた性状データｄCを情報生成部３２が特定することも可能である。１個のグループを選択する条件は任意であるが、例えば、複数のグループのうち利用者が入力装置１４に対する操作で指定したグループから性状データｄCを選択する構成や、複数のグループのうち利用者の属性情報（例えば年齢や性別等）に応じて選択したグループから性状データｄCを選択する構成が好適である。また、複数の利用者の属性情報に応じて性状データｄCのグループを選択することも可能である。 (1) A plurality of pieces of property data dC prepared in advance may be classified into a plurality of groups (categories), and the information generation unit 32 may identify the property data dC corresponding to the history data H (impression history hB) from one group selected from among them under a predetermined condition. The condition for selecting the group is arbitrary. Suitable examples include a configuration in which the property data dC is selected from the group that the user designates by operating the input device 14, and a configuration in which it is selected from a group chosen according to the user's attribute information (for example, age or sex). The group of property data dC may also be selected according to the attribute information of a plurality of users.

（２）履歴データＨの印象履歴ｈBの内容は以上の例示（印象指標Ｙmの時系列）に限定されない。例えば、印象指標Ｙmの数値毎の頻度や変動率（単位時間内の変化量）を印象履歴ｈBとして利用することも可能である。また、楽曲のうち特定の区間（例えばサビ）の印象指標Ｙmの時系列を印象履歴ｈBとして履歴データＨを生成する構成や、特定の期間毎（例えば１日毎，１週毎，１月毎）に履歴データＨを生成する構成も採用され得る。また、楽曲の曲調に応じて歌唱の仕方が相違し得ることを考慮すると、楽曲毎（または楽曲のジャンル毎）に履歴データＨを生成する構成も好適である。 (2) The content of the impression history hB of the history data H is not limited to the above example (time series of impression index Ym). For example, it is also possible to use the frequency and variation rate (change amount within unit time) for each numerical value of the impression index Ym as the impression history hB. In addition, the history data H is generated by using the time series of the impression index Ym of a specific section (for example, rust) in the music as the impression history hB, or for each specific period (for example, every day, every week, every month). Alternatively, a configuration for generating the history data H may be employed. Further, considering that the way of singing may differ depending on the tune of the music, a configuration in which the history data H is generated for each music (or for each genre of music) is also suitable.
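
The two alternative contents of the impression history hB named in modification (2), the frequency of each value and the variation rate, can be computed as in the sketch below. It assumes that one impression index is recorded as a list of numeric scores at a fixed interval `dt`; the function names and that sampling assumption are illustrative only.

```python
from collections import Counter
from typing import Dict, List, Sequence


def value_frequencies(series: Sequence[int]) -> Dict[int, int]:
    """How often each value of the impression index appears in the history."""
    return dict(Counter(series))


def variation_rates(series: Sequence[float], dt: float = 1.0) -> List[float]:
    """Change of the impression index per unit time between successive
    observations, assuming a fixed interval `dt` between them."""
    return [(later - earlier) / dt for earlier, later in zip(series, series[1:])]


scores = [3, 4, 4, 6]
frequencies = value_frequencies(scores)
rates = variation_rates(scores)
```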

（３）以上の説明では、利用者の性状を意味する文字列を示す性状データｄCを例示したが、利用者の性状を表象する画像（例えば似顔絵やキャラクタ）の画像データを性状データｄCとして利用することも可能である。また、性状データｄCが共通する利用者や有名人を提示する構成や、性状データｄCが示す性状とは反対の性状を利用者に提案する構成も採用され得る。 (3) In the above description, the property data dC indicating the character string indicating the user's property is illustrated, but image data of an image (for example, a portrait or character) representing the user's property is used as the property data dC. It is also possible to do. In addition, a configuration in which a user or a celebrity who shares the property data dC is presented, or a configuration in which a property opposite to the property indicated by the property data dC is proposed to the user may be employed.

＜第５実施形態＞ <Fifth Embodiment>
図１１は、第５実施形態の音響解析装置１００Eの構成図である。図１１に例示される通り、第５実施形態の音響解析装置１００Eは、第１実施形態と同様の要素（特徴抽出部２２，印象特定部２４，提示処理部２６，関連式設定部４０）に目標設定部４２と解析処理部４４とを追加した構成である。特徴抽出部２２によるＮ個の特徴指標Ｘ1〜ＸNの抽出と、印象特定部２４によるＭ個の印象指標Ｙ1〜ＹMの特定と、関連式設定部４０によるＭ個の関連式Ｆ1〜ＦMの設定とは第１実施形態と同様である。したがって、第５実施形態においても第１実施形態と同様の効果が実現される。 FIG. 11 is a configuration diagram of an acoustic analysis device 100E of the fifth embodiment. As illustrated in FIG. 11, the acoustic analysis device 100E of the fifth embodiment has a configuration in which a target setting unit 42 and an analysis processing unit 44 are added to the same elements as in the first embodiment (the feature extraction unit 22, the impression specifying unit 24, the presentation processing unit 26, and the relational expression setting unit 40). The extraction of the N feature indices X1 to XN by the feature extraction unit 22, the specification of the M impression indices Y1 to YM by the impression specifying unit 24, and the setting of the M relational expressions F1 to FM by the relational expression setting unit 40 are the same as in the first embodiment. The fifth embodiment therefore achieves the same effects as the first embodiment.

図１１の目標設定部４２は、Ｍ個の印象指標Ｙ1〜ＹMの各々について目標値Ａm（Ａ1〜ＡM）を設定する。具体的には、目標設定部４２は、入力装置１４に対する利用者からの指示に応じて各目標値Ａmを可変に設定する。 The target setting unit 42 in FIG. 11 sets a target value Am (A1 to AM) for each of the M impression indices Y1 to YM. Specifically, the target setting unit 42 variably sets each target value Am in accordance with an instruction from the user to the input device 14.

例えば第５実施形態の提示処理部２６は、各印象指標Ｙmの目標値Ａmの指示を受付ける図１２の操作画面８０を表示装置１８に表示させる。操作画面８０は、Ｍ個の印象指標Ｙ1〜ＹM（図１２の例示ではＭ＝３）の各々に対応する操作子画像８２を包含する。各操作子画像８２は、入力装置１４に対する利用者からの指示に応じて移動するスライダ型の操作子の画像であり、利用者による目標値Ａmの指示を受付ける。目標設定部４２は、各操作子画像８２の位置に応じて各印象指標Ｙmの目標値Ａmを設定する。なお、操作画面８０の複数の操作子画像８２は各々が個別に移動され得るが、各操作子画像８２を相互に連動して移動させることも可能である。 For example, the presentation processing unit 26 of the fifth embodiment causes the display device 18 to display the operation screen 80 of FIG. 12 that accepts an instruction of the target value Am of each impression index Ym. The operation screen 80 includes an operator image 82 corresponding to each of the M impression indices Y1 to YM (M = 3 in the illustration of FIG. 12). Each operation element image 82 is an image of a slider-type operation element that moves in response to an instruction from the user to the input device 14 and accepts an instruction of a target value Am by the user. The target setting unit 42 sets a target value Am for each impression index Ym according to the position of each operator image 82. Note that each of the plurality of operation element images 82 on the operation screen 80 can be moved individually, but each operation element image 82 can also be moved in conjunction with each other.

図１１の解析処理部４４は、印象特定部２４が歌唱音声Ｖについて特定した各印象指標Ｙmを目標値Ａmに近付けるために変化させるべき音響特徴（特徴指標Ｘn）を特定する。第５実施形態の解析処理部４４は、各印象指標Ｙmを目標値Ａmに近付けるために変化させるべき音響特徴と当該変化の方向（増加／減少）とを指定する解析データＱBを生成する。提示処理部２６は、解析処理部４４が生成した解析データＱBの内容（変化対象の音響特徴と変化方向）を表示装置１８に表示させる。したがって、利用者は、自身の歌唱を目標の聴覚印象に近付けるための改善点を把握することが可能である。以上の説明から理解される通り、解析データＱBの提示は、目標の聴覚印象を実現するための歌唱指導に相当する。 The analysis processing unit 44 in FIG. 11 specifies the acoustic feature (feature index Xn) to be changed in order to bring each impression index Ym specified for the singing voice V by the impression specifying unit 24 close to the target value Am. The analysis processing unit 44 according to the fifth embodiment generates analysis data QB that designates acoustic features that should be changed to bring each impression index Ym closer to the target value Am and the direction (increase / decrease) of the change. The presentation processing unit 26 causes the display device 18 to display the contents of the analysis data QB generated by the analysis processing unit 44 (acoustic features to be changed and change directions). Therefore, the user can grasp an improvement point for bringing his / her song close to the target auditory impression. As understood from the above description, the presentation of the analysis data QB corresponds to singing instruction for realizing a target auditory impression.

第５実施形態の解析処理部４４は、印象指標Ｙmと目標値Ａmとの差分の絶対値|Ｙm−Ａm|をＭ個の聴覚印象について合計した数値（以下「合計差分」という）δを最小化するために変化させるべき音響特徴をＮ種類の音響特徴から特定する。具体的には、解析処理部４４は、Ｎ種類のうち任意の１種類の音響特徴の特徴指標Ｘnを所定の変化量ｐだけ変化させたと仮定した場合の合計差分δを、変化対象の音響特徴を相違させた複数の場合について算定したうえで相互に比較し、合計差分δが最小となる場合の変化対象の音響特徴と当該変化の方向（増加／減少）とを指定する解析データＱBを生成する。 The analysis processing unit 44 of the fifth embodiment identifies, from among the N types of acoustic features, the acoustic feature that should be changed in order to minimize the value δ (hereinafter the “total difference”) obtained by summing the absolute difference |Ym − Am| between the impression index Ym and the target value Am over the M auditory impressions. Specifically, the analysis processing unit 44 calculates the total difference δ that would result if the feature index Xn of any one of the N types of acoustic features were changed by a predetermined change amount p, does so for a plurality of cases that differ in the acoustic feature being changed, and compares the results. It then generates analysis data QB that designates the acoustic feature to be changed and the direction of the change (increase/decrease) for the case in which the total difference δ is smallest.

任意の１個の特徴指標Ｘnを変化量ｐだけ変化させた場合の合計差分δは、以下の数式(A)で表現される。 The total difference δ when any one feature index Xn is changed by the change amount p is expressed by the following formula (A).

δ = Σ_{m=1}^{M} |Am − Ym − p·anm|   (A)

数式(A)のうち変化量ｐと係数ａnmとの乗算値の減算は、特徴指標Ｘnを変化量ｐだけ変化させる処理に相当する。解析処理部４４は、変化量ｐの正負を反転させた２通りの場合（ｐ＝±１）について、特徴指標Ｘnを変化量ｐだけ変化させた数式(A)の合計差分δを算定し、合計差分δが最小化された場合の変化対象の音響特徴と変化の方向（変化量ｐの正負）とを特定する。 In formula (A), subtracting the product of the change amount p and the coefficient anm corresponds to changing the feature index Xn by the change amount p. For each of the two cases in which the sign of the change amount p is reversed (p = ±1), the analysis processing unit 44 calculates the total difference δ of formula (A) with the feature index Xn changed by p, and identifies the acoustic feature to be changed and the direction of change (the sign of p) for which the total difference δ is minimized.

例えば、長幼の印象指標Ｙ1および清濁の印象指標Ｙ2と、ビブラートの深度を示す特徴指標Ｘ1および音高の正確性を示す特徴指標Ｘ2とに着目し（Ｍ＝Ｎ＝２）、関連式Ｆ1および関連式Ｆ2を以下のように仮定する（ａ11＝０.７，ａ21＝０.３，ａ12＝−０.４，ａ22＝０.７）。

For example, consider the impression index Y1 for age (adult-like/childlike) and the impression index Y2 for clearness (clear/muddy), together with the feature index X1 indicating vibrato depth and the feature index X2 indicating pitch accuracy (M = N = 2), and assume that the relational expressions F1 and F2 have the following coefficients: a11 = 0.7, a21 = 0.3, a12 = −0.4, a22 = 0.7.

いま、印象指標Ｙ1が５であるのに対して目標値Ａ1が４であり、印象指標Ｙ2が４であるのに対して目標値Ａ2が６である場合を想定する（（Ｙ1,Ｙ2）＝（５,４），（Ａ1,Ａ2）＝（４,６））。すなわち、評価済の歌唱音声Ｖと比較して「子供っぽく清らかな歌唱」（Ｙ1：５→４，Ｙ2：４→６）を実現するために変化させるべき特徴指標Ｘnを探索する。 Now suppose that the impression index Y1 is 5 while its target value A1 is 4, and that the impression index Y2 is 4 while its target value A2 is 6 ((Y1, Y2) = (5, 4), (A1, A2) = (4, 6)). That is, the feature index Xn to be changed is searched for in order to achieve singing that is more childlike and clearer than the evaluated singing voice V (Y1: 5 → 4, Y2: 4 → 6).

［１］ｐ＝１（特徴指標Ｘnの増加を仮定）
・条件１ａ：特徴指標Ｘ1の変化を仮定（ビブラートの深度を増加させる場合）
δ＝｜Ａ1−Ｙ1−ｐ・ａ11｜＋｜Ａ2−Ｙ2−ｐ・ａ12｜
＝｜４−５−１・０.７｜＋｜６−４−１・（−０.４）｜
＝１.７＋２.４＝４.１
・条件１ｂ：特徴指標Ｘ2の変化を仮定（音高の正確性を増加させる場合）
δ＝｜Ａ1−Ｙ1−ｐ・ａ21｜＋｜Ａ2−Ｙ2−ｐ・ａ22｜
＝｜４−５−１・０.３｜＋｜６−４−１・０.７｜
＝１.３＋１.３＝２.６
［２］ｐ＝−１（特徴指標Ｘnの減少を仮定）
・条件２ａ：特徴指標Ｘ1の変化を仮定（ビブラートの深度を減少させる場合）
δ＝｜Ａ1−Ｙ1−ｐ・ａ11｜＋｜Ａ2−Ｙ2−ｐ・ａ12｜
＝｜４−５−（−１）・０.７｜＋｜６−４−（−１）・（−０.４）｜
＝０.３＋１.６＝１.９
・条件２ｂ：特徴指標Ｘ2の変化を仮定（音高の正確性を減少させる場合）
δ＝｜Ａ1−Ｙ1−ｐ・ａ21｜＋｜Ａ2−Ｙ2−ｐ・ａ22｜
＝｜４−５−（−１）・０.３｜＋｜６−４−（−１）・０.７｜
＝０.７＋２.７＝３.４
[1] p = 1 (assuming an increase in the feature index Xn)
・Condition 1a: a change in the feature index X1 is assumed (the vibrato depth is increased)
δ = |A1 − Y1 − p·a11| + |A2 − Y2 − p·a12|
= |4 − 5 − 1·0.7| + |6 − 4 − 1·(−0.4)|
= 1.7 + 2.4 = 4.1
・Condition 1b: a change in the feature index X2 is assumed (the pitch accuracy is increased)
δ = |A1 − Y1 − p·a21| + |A2 − Y2 − p·a22|
= |4 − 5 − 1·0.3| + |6 − 4 − 1·0.7|
= 1.3 + 1.3 = 2.6
[2] p = −1 (assuming a decrease in the feature index Xn)
・Condition 2a: a change in the feature index X1 is assumed (the vibrato depth is decreased)
δ = |A1 − Y1 − p·a11| + |A2 − Y2 − p·a12|
= |4 − 5 − (−1)·0.7| + |6 − 4 − (−1)·(−0.4)|
= 0.3 + 1.6 = 1.9
・Condition 2b: a change in the feature index X2 is assumed (the pitch accuracy is decreased)
δ = |A1 − Y1 − p·a21| + |A2 − Y2 − p·a22|
= |4 − 5 − (−1)·0.3| + |6 − 4 − (−1)·0.7|
= 0.7 + 2.7 = 3.4

以上の通り、特徴指標Ｘ1を減少させる条件２ａのもとで合計差分δは最小値（δ＝１.９）となる。したがって、解析処理部４４は、歌唱音声Ｖを目標（Ａ1，Ａ2）に近付けるための条件として「ビブラートの深度の減少」（音響特徴＝ビブラートの深度，変化方向＝減少）を指定する解析データＱBを生成する。以上の説明から理解される通り、目標値Ａmと相違する印象指標Ｙmの関連式Ｆmにおいて係数ａnmが大きい特徴指標Ｘn（すなわち印象指標Ｙmに対する影響が相対的に大きい特徴指標Ｘn）が、当該印象指標Ｙmを目標値Ａmに近付けるために変化させるべき特徴指標Ｘnとして優先的に選択される。解析処理部４４による解析の結果（解析データＱB）を表示装置１８で確認した利用者は、自身が目指す「子供っぽく清らかな歌唱」を実現するには「ビブラートの深度を減少させる」という方策が最善であると把握できる。 As shown above, the total difference δ takes its minimum value (δ = 1.9) under condition 2a, in which the feature index X1 is decreased. The analysis processing unit 44 therefore generates analysis data QB that designates “decrease in vibrato depth” (acoustic feature = vibrato depth, change direction = decrease) as the condition for bringing the singing voice V closer to the target (A1, A2). As understood from the above description, a feature index Xn with a large coefficient anm in the relational expression Fm of an impression index Ym that differs from its target value Am (that is, a feature index Xn with a relatively large influence on that impression index Ym) is preferentially selected as the feature index Xn to be changed in order to bring the impression index Ym closer to the target value Am. A user who checks the analysis result (analysis data QB) on the display device 18 can see that decreasing the vibrato depth is the best way to achieve the more childlike and clearer singing that they are aiming for.
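
The search over conditions 1a to 2b can be reproduced with a few lines of code. The sketch below evaluates formula (A) for every feature index and both signs of p using the values of the worked example; the function names and the layout `coeffs[n][m] = a_nm` are assumptions made for illustration.

```python
from typing import List, Sequence, Tuple


def total_difference(y: Sequence[float], target: Sequence[float],
                     coeffs_n: Sequence[float], p: float) -> float:
    """Formula (A): sum over m of |A_m - Y_m - p * a_nm| for one feature n."""
    return sum(abs(a_m - y_m - p * a_nm)
               for a_m, y_m, a_nm in zip(target, y, coeffs_n))


def best_change(y: Sequence[float], target: Sequence[float],
                coeffs: Sequence[Sequence[float]],
                p_values: Sequence[float] = (1.0, -1.0)) -> Tuple[int, float, float]:
    """Try every feature index n and every change amount p, and return
    (n, p, delta) for the combination with the smallest total difference."""
    candidates: List[Tuple[int, float, float]] = [
        (n, p, total_difference(y, target, row, p))
        for n, row in enumerate(coeffs)
        for p in p_values
    ]
    return min(candidates, key=lambda candidate: candidate[2])


# Values from the worked example. Row n holds (a_n1, a_n2):
# row 0 is X1 (vibrato depth), row 1 is X2 (pitch accuracy).
coeffs = [[0.7, -0.4],
          [0.3, 0.7]]
y = (5.0, 4.0)        # current impression indices (Y1, Y2)
target = (4.0, 6.0)   # target values (A1, A2)

n, p, delta = best_change(y, target, coeffs)
feature_names = ["vibrato depth", "pitch accuracy"]
advice = f"{'increase' if p > 0 else 'decrease'} {feature_names[n]}"
```

With these inputs the search selects feature index 0 with p = −1, matching condition 2a. Note that the search changes only one feature by a fixed step, so it identifies the most effective single adjustment rather than a jointly optimal combination.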

ところで、例えば特開２００８−２０７９８号公報には、模範的な歌唱を示す基準値と評価対象の歌唱音声の特徴量との差異を順次に評価し、「発音は明瞭に」「はっきりと」等の歌唱指導のコメントを評価の結果に応じて表示する技術が開示されている。しかし、以上の技術では、模範的な歌唱に近付くための改善点が利用者に提示されるに過ぎない。すなわち、歌唱指導のコメントに適合するように歌唱しても模範的な歌唱に近付くだけであり、特定の聴覚印象を受聴者に感取させ得る歌唱に近付けることはできない。第５実施形態によれば、前述の例示からも把握される通り、歌唱音声Ｖを目標の聴覚印象に近付けるための最適な改善点（音響特徴）を利用者が把握できるという利点がある。また、目標に近付くように利用者が自身の歌唱を改善することで、自己実現や健康維持（心理療法やフィットネス）の手法としての応用も期待できる。 By the way, for example, in Japanese Patent Application Laid-Open No. 2008-20798, a difference between a reference value indicating an exemplary singing and a feature amount of a singing voice to be evaluated is sequentially evaluated, and “sounding is clear”, “clearly”, The technique of displaying the comment of singing instruction according to the result of evaluation is disclosed. However, the above technique only presents the user with improvements for approaching an exemplary song. That is, even if it sings so as to match the comment of the singing instruction, it only approaches an exemplary singing, and cannot approach a singing that can make the listener feel a specific auditory impression. According to the fifth embodiment, as can be understood from the above-described examples, there is an advantage that the user can grasp the optimum improvement point (acoustic feature) for bringing the singing voice V close to the target auditory impression. In addition, by improving the user's singing so as to approach the goal, application as a method of self-realization and health maintenance (psychotherapy and fitness) can also be expected.

＜第６実施形態＞ <Sixth Embodiment>
図１３は、第６実施形態の音響解析装置１００Fの構成図である。図１３に例示される通り、第６実施形態の音響解析装置１００Fは、第５実施形態と同様の要素（特徴抽出部２２，印象特定部２４，提示処理部２６，関連式設定部４０，目標設定部４２，解析処理部４４）に音響処理部４６を追加した構成である。特徴抽出部２２によるＮ個の特徴指標Ｘ1〜ＸNの抽出と、印象特定部２４によるＭ個の印象指標Ｙ1〜ＹMの特定と、関連式設定部４０によるＭ個の関連式Ｆ1〜ＦMの設定とは第１実施形態と同様である。したがって、第６実施形態においても第１実施形態と同様の効果が実現される。 FIG. 13 is a configuration diagram of an acoustic analysis device 100F of the sixth embodiment. As illustrated in FIG. 13, the acoustic analysis device 100F of the sixth embodiment has a configuration in which an acoustic processing unit 46 is added to the same elements as in the fifth embodiment (the feature extraction unit 22, the impression specifying unit 24, the presentation processing unit 26, the relational expression setting unit 40, the target setting unit 42, and the analysis processing unit 44). The extraction of the N feature indices X1 to XN by the feature extraction unit 22, the specification of the M impression indices Y1 to YM by the impression specifying unit 24, and the setting of the M relational expressions F1 to FM by the relational expression setting unit 40 are the same as in the first embodiment. The sixth embodiment therefore achieves the same effects as the first embodiment.

第６実施形態の目標設定部４２は、第５実施形態と同様に、例えば利用者からの指示に応じて各印象指標Ｙmの目標値Ａmを設定する。解析処理部４４は、印象特定部２４が歌唱音声Ｖについて特定した各印象指標Ｙmを目標値Ａmに近付けるために変化させるべき音響特徴（特徴指標Ｘn）を指定する解析データＱBを第５実施形態と同様の方法で生成する。 As in the fifth embodiment, the target setting unit 42 of the sixth embodiment sets a target value Am for each impression index Ym, for example in accordance with an instruction from the user. The analysis processing unit 44 generates, by the same method as in the fifth embodiment, analysis data QB designating the acoustic feature (feature index Xn) that should be changed in order to bring each impression index Ym, specified by the impression specifying unit 24 for the singing voice V, closer to its target value Am.

図１３の音響処理部４６は、解析処理部４４が特定した音響特徴を変化させる音響処理を歌唱音声Ｖに対して実行する。具体的には、音響処理部４６は、解析処理部４４が生成した解析データＱBで指定される音響特徴が、当該解析データＱBで指定される方向に変化（増加／減少）するように、収音装置１６が収音した歌唱音声Ｖに対して音響処理を実行する。すなわち、歌唱音声ＶのＮ個の特徴指標Ｘ1〜ＸNのうち、目標値Ａmと相違する印象指標Ｙmの関連式Ｆmにおいて係数（印象指標Ｙmに対する寄与度）ａnmが大きい特徴指標Ｘn（すなわち印象指標Ｙmを効率的に目標値Ａmに近付けることが可能な特徴指標Ｘn）が、音響処理部４６による音響処理で優先的に変更される。 The acoustic processing unit 46 in FIG. 13 applies to the singing voice V acoustic processing that changes the acoustic feature specified by the analysis processing unit 44. Specifically, the acoustic processing unit 46 processes the singing voice V picked up by the sound collection device 16 so that the acoustic feature designated by the analysis data QB generated by the analysis processing unit 44 changes (increases or decreases) in the direction designated by that analysis data QB. In other words, among the N feature indices X1 to XN of the singing voice V, a feature index Xn having a large coefficient anm (degree of contribution to the impression index Ym) in the relational expression Fm of an impression index Ym that differs from its target value Am, that is, a feature index Xn that can efficiently bring the impression index Ym closer to the target value Am, is preferentially changed by the acoustic processing of the acoustic processing unit 46.
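
The preferential selection described above can be sketched as follows. This is only an illustrative sketch: it assumes that the relational expression Fm is linear in the feature indices (Ym = Σ anm·Xn + bm), which this section does not state explicitly, and the function name `pick_feature_to_change` is hypothetical.

```python
def pick_feature_to_change(coeffs, y, target):
    """Pick the feature index to modify for one impression index.

    coeffs : assumed coefficients a_nm of a linear relational expression F_m
    y      : current impression index Y_m of the singing voice
    target : target value A_m
    Returns (n, direction), or None when no change is needed or possible.
    """
    gap = target - y
    if gap == 0 or not any(coeffs):
        return None
    # The feature with the largest |a_nm| moves Y_m the most per unit change.
    n = max(range(len(coeffs)), key=lambda i: abs(coeffs[i]))
    # Increase X_n when its coefficient has the same sign as the gap.
    direction = "increase" if coeffs[n] * gap > 0 else "decrease"
    return n, direction


print(pick_feature_to_change([0.2, -0.9, 0.4], y=3.0, target=5.0))
```

In this sketch the second feature is chosen because its coefficient has the largest magnitude, and it is decreased because its coefficient is negative while the impression index must rise.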

歌唱音声Ｖに対して実行される具体的な音響処理には、変更対象の音響特徴の種類に応じた公知の音響処理技術が任意に採用される。例えば、清濁に関する印象指標Ｙmを目標値Ａmに近付けるための特徴指標Ｘnが「ノイズ感」である場合、音響処理部４６は、歌唱音声Ｖに雑音成分を付与する音響処理（雑音付与処理）を実行する。また、例えば、前述の第５実施形態の例示のように「ビブラートの深度の減少」を解析データＱBが指定する場合、音響処理部４６は、歌唱音声Ｖにおける音高の微小な変動を抑制する音響処理を歌唱音声Ｖに対して実行する。音響処理部４６による処理後の歌唱音声Ｖは例えば放音装置１７（スピーカやヘッドホン）から再生される。なお、歌唱音声Ｖの再生に代えて（または再生とともに）、音響処理部４６による処理後の歌唱音声Ｖのファイルを生成することも可能である。 Any known acoustic processing technique suited to the type of acoustic feature to be changed may be adopted as the specific acoustic processing applied to the singing voice V. For example, when the feature index Xn for bringing the impression index Ym relating to clearness closer to the target value Am is “noisiness”, the acoustic processing unit 46 executes acoustic processing that adds a noise component to the singing voice V (noise addition processing). Likewise, when the analysis data QB designates a “decrease in vibrato depth”, as in the example of the fifth embodiment described above, the acoustic processing unit 46 applies to the singing voice V acoustic processing that suppresses minute pitch fluctuations in the singing voice V. The singing voice V processed by the acoustic processing unit 46 is reproduced from, for example, a sound emitting device 17 (a speaker or headphones). Instead of (or in addition to) reproducing the singing voice V, a file of the singing voice V processed by the acoustic processing unit 46 may be generated.
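
As one possible illustration of the "decrease in vibrato depth" case, the sketch below operates on a pitch (F0) contour rather than on the waveform itself. The document leaves the actual technique open ("any known acoustic processing technique"), so the moving-average approach, the function name `reduce_vibrato_depth`, and the parameters `window` and `depth_scale` are all assumptions made for this example.

```python
def reduce_vibrato_depth(f0, window=5, depth_scale=0.5):
    """Shrink small pitch fluctuations around a local moving average.

    f0          : list of pitch values (e.g. in Hz), one per analysis frame
    window      : number of frames used for the local average (assumed)
    depth_scale : 1.0 keeps the fluctuation, 0.0 removes it (assumed)
    """
    half = window // 2
    out = []
    for i, value in enumerate(f0):
        lo = max(0, i - half)
        hi = min(len(f0), i + half + 1)
        centre = sum(f0[lo:hi]) / (hi - lo)  # local pitch without vibrato
        out.append(centre + depth_scale * (value - centre))
    return out


contour = [440, 446, 440, 434, 440, 446, 440, 434, 440]
flattened = reduce_vibrato_depth(contour, window=5, depth_scale=0.0)
print(max(contour) - min(contour), max(flattened) - min(flattened))
```

A real implementation would additionally have to resynthesize the voice with the modified pitch contour, which is outside the scope of this sketch.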

以上の説明から理解される通り、第６実施形態によれば、歌唱音声Ｖの聴覚印象を所望の印象（目標値Ａmに応じた聴覚印象）に調整することが可能である。第６実施形態の例示では特に、利用者からの指示に応じて各目標値Ａmが可変に設定されるから、利用者の所望の聴覚印象の歌唱音声Ｖを生成できるという利点がある。 As understood from the above description, the sixth embodiment makes it possible to adjust the auditory impression of the singing voice V to a desired impression (an auditory impression corresponding to the target values Am). In the example of the sixth embodiment in particular, each target value Am is variably set in accordance with an instruction from the user, which has the advantage that a singing voice V with the auditory impression desired by the user can be generated.

なお、解析データＱBが指定する特徴指標Ｘn（以下では便宜的に「優先指標」という）を歌唱音声Ｖにて充分に（すなわち印象指標Ｙmが目標値Ａmに充分に近似する程度に）変動させることができない場合がある。例えば、解析データＱBが「ビブラートの深度の増加」を指定しても、ビブラートが付加され得る程度の時間長にわたり音高が維持される区間を歌唱音声Ｖが包含しない場合には、優先指標である「ビブラートの深度」の増加により印象指標Ｙmを目標値Ａmに充分に近付けることはできない。以上の場合、音響処理部４６は、歌唱音声ＶのＮ個の特徴指標Ｘ1〜ＸNのうち各印象指標Ｙmを目標値Ａmに近付けるために有効な順番（合計差分δの昇順）で優先指標の次位に位置する特徴指標Ｘnが変化するように歌唱音声Ｖに対する音響処理を実行する。以上の構成によれば、歌唱音声Ｖの特性に関わらず各印象指標Ｙmを有効に目標値Ａmに近付けることが可能である。 Note that in some cases the feature index Xn designated by the analysis data QB (hereinafter called the “priority index” for convenience) cannot be varied sufficiently in the singing voice V (that is, to the extent that the impression index Ym sufficiently approximates the target value Am). For example, even if the analysis data QB designates an “increase in vibrato depth”, if the singing voice V contains no section in which the pitch is sustained long enough for vibrato to be added, increasing the priority index “vibrato depth” cannot bring the impression index Ym sufficiently close to the target value Am. In that case, the acoustic processing unit 46 applies acoustic processing to the singing voice V so as to change the feature index Xn that ranks next after the priority index when the N feature indices X1 to XN of the singing voice V are ordered by effectiveness in bringing each impression index Ym closer to the target value Am (ascending order of the total difference δ). With this configuration, each impression index Ym can be effectively brought closer to the target value Am regardless of the characteristics of the singing voice V.
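
The fallback from the priority index to the next-ranked feature index can be sketched as a walk through the candidates in ascending order of the total difference δ. The sketch assumes that a value of δ is already available for each candidate feature and that some feasibility check exists; the names `choose_feature`, `total_diff`, and `can_modify` are hypothetical.

```python
def choose_feature(total_diff, can_modify):
    """Return the most effective feature that can actually be modified.

    total_diff : dict mapping feature name -> total difference (delta);
                 a smaller value means the impression indices get closer
                 to their targets when that feature is changed
    can_modify : callable(feature name) -> bool, an assumed feasibility
                 check (e.g. "is there a note long enough for vibrato?")
    """
    for name in sorted(total_diff, key=total_diff.get):
        if can_modify(name):
            return name
    return None  # no candidate feature can be changed in this voice


deltas = {"vibrato_depth": 0.8, "noisiness": 1.5, "volume": 2.1}
# Assume the voice has no sustained note, so vibrato cannot be deepened.
print(choose_feature(deltas, lambda name: name != "vibrato_depth"))
```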

ところで、例えば特開２０１１−０９５３９７号公報には、音声合成に適用される複数種の制御変数を利用者からの指示に応じて設定する構成が開示されている。しかし、以上の技術では、複数種の制御変数のうちの何れを如何に調整すれば所望の聴覚印象の音声が実現されるのかを、利用者が明確に把握することが困難であるという問題がある。第６実施形態では、各聴覚印象の目標値Ａmが利用者からの指示に応じて設定されるから、例えば音声合成の制御変数に関する専門的な知識がない利用者でも所望の聴覚印象の歌唱音声Ｖを有効に生成できる（利用者による指示が容易化される）という利点がある。 Incidentally, Japanese Patent Application Laid-Open No. 2011-095397, for example, discloses a configuration in which a plurality of types of control variables applied to speech synthesis are set in accordance with instructions from a user. With that technique, however, it is difficult for the user to clearly grasp which of the control variables should be adjusted, and how, to obtain a voice with the desired auditory impression. In the sixth embodiment, the target value Am of each auditory impression is set in accordance with an instruction from the user, so even a user with no specialized knowledge of, for example, speech synthesis control variables can effectively generate a singing voice V with the desired auditory impression (that is, giving instructions is made easier for the user).

＜変形例＞ <Modifications>
以上の各形態は多様に変形され得る。具体的な変形の態様を以下に例示する。以下の例示から任意に選択された２個以上の態様は適宜に併合され得る。 Each of the above embodiments can be modified in various ways. Specific modifications are exemplified below. Two or more aspects arbitrarily selected from the following examples may be combined as appropriate.

（１）前述の各形態では、楽曲の全区間にわたる歌唱音声Ｖを対象として聴覚印象を特定したが、歌唱音声Ｖを時間軸上で区分した複数の区間の各々について聴覚印象（Ｍ個の印象指標Ｙ1〜ＹM）を順次に特定することも可能である。歌唱音声Ｖの区間毎に聴覚印象を順次に特定する場合、第２実施形態から第４実施形態で例示した提示データＱAや第５実施形態および第６実施形態で例示した解析データＱBを、歌唱音声Ｖの各区間の聴覚印象に応じて区間毎に順次に（実時間的に）更新する構成も採用され得る。 (1) In each of the above embodiments, the auditory impression is specified for the singing voice V over the entire duration of the piece of music, but the auditory impression (M impression indices Y1 to YM) may instead be specified sequentially for each of a plurality of sections obtained by dividing the singing voice V on the time axis. When the auditory impression is specified sequentially for each section, a configuration may also be adopted in which the presentation data QA exemplified in the second to fourth embodiments, or the analysis data QB exemplified in the fifth and sixth embodiments, is updated section by section (in real time) according to the auditory impression of each section of the singing voice V.

（２）前述の各形態では、収音装置１６が収音した歌唱音声Ｖを解析する要素（特徴抽出部２２，印象特定部２４，提示処理部２６，情報生成部３２，目標設定部４２，解析処理部４４，音響処理部４６）と、各関連式Ｆmを設定する関連式設定部４０との双方を具備する音響解析装置１００（１００A，１００B，１００C，１００D，１００E，１００F，１００G）を例示したが、関連式設定部４０を他の要素とは別体の装置に搭載することも可能である。 (2) Each of the above embodiments exemplifies an acoustic analysis device 100 (100A, 100B, 100C, 100D, 100E, 100F, 100G) that includes both the elements for analyzing the singing voice V picked up by the sound collection device 16 (feature extraction unit 22, impression specifying unit 24, presentation processing unit 26, information generation unit 32, target setting unit 42, analysis processing unit 44, acoustic processing unit 46) and the relational expression setting unit 40 that sets each relational expression Fm; however, the relational expression setting unit 40 may be mounted in a device separate from the other elements.

例えば図１４に例示される通り、通信網２００（例えばインターネット）を介して相互に通信する音響解析装置１１０と音響解析装置１２０とに、前述の各形態で例示した機能を分担させることも可能である。音響解析装置（関連式設定装置）１１０は、参照データ群ＤRと関係性記述データＤCとを利用して第１実施形態と同様の方法でＭ個の関連式Ｆ1〜ＦMを設定する関連式設定部４０を具備する。例えば通信網２００に接続されたサーバ装置で音響解析装置１１０は実現される。図１４に例示される通り、音響解析装置１１０（関連式設定部４０）が設定したＭ個の関連式Ｆ1〜ＦMは、通信網２００を介して音響解析装置１２０に転送される。音響解析装置１１０から複数の音響解析装置１２０にＭ個の関連式Ｆ1〜ＦMを共通に転送することも可能である。音響解析装置１２０は、特徴抽出部２２と印象特定部２４とを含んで構成され、音響解析装置１１０から転送されたＭ個の関連式Ｆ1〜ＦMを利用して第１実施形態と同様に歌唱音声Ｖを解析することで歌唱音声Ｖの聴覚印象（Ｍ個の印象指標Ｙ1〜ＹM）を特定する。音響解析装置１２０には、第２実施形態から第４実施形態と同様の情報生成部３２や、第５実施形態および第６実施形態と同様の目標設定部４２および解析処理部４４が設置され得る。図１４の構成では、参照データ群ＤRおよび関係性記述データＤCの保持や各関連式Ｆmの設定を音響解析装置１２０に実行させる必要がないから、音響解析装置１２０の構成および処理が簡素化されるという利点がある。 For example, as illustrated in FIG. 14, the functions exemplified in the above embodiments may be shared between an acoustic analysis device 110 and an acoustic analysis device 120 that communicate with each other via a communication network 200 (for example, the Internet). The acoustic analysis device (relational expression setting device) 110 includes the relational expression setting unit 40, which sets the M relational expressions F1 to FM by the same method as in the first embodiment using the reference data group DR and the relationship description data DC. The acoustic analysis device 110 is realized by, for example, a server device connected to the communication network 200. As illustrated in FIG. 14, the M relational expressions F1 to FM set by the acoustic analysis device 110 (relational expression setting unit 40) are transferred to the acoustic analysis device 120 via the communication network 200. The M relational expressions F1 to FM may also be transferred in common from the acoustic analysis device 110 to a plurality of acoustic analysis devices 120. The acoustic analysis device 120 includes the feature extraction unit 22 and the impression specifying unit 24, and specifies the auditory impression (M impression indices Y1 to YM) of the singing voice V by analyzing the singing voice V in the same manner as in the first embodiment, using the M relational expressions F1 to FM transferred from the acoustic analysis device 110. The acoustic analysis device 120 may also be provided with an information generation unit 32 similar to that of the second to fourth embodiments, or with a target setting unit 42 and an analysis processing unit 44 similar to those of the fifth and sixth embodiments. In the configuration of FIG. 14, the acoustic analysis device 120 does not need to hold the reference data group DR and the relationship description data DC or to set each relational expression Fm, which has the advantage of simplifying the configuration and processing of the acoustic analysis device 120.
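
The division of roles in FIG. 14 can be sketched as follows: the setting side serializes the relational expressions, and the analysis side only evaluates them. The sketch assumes linear relational expressions represented by coefficient rows and intercepts, and uses JSON as the transfer format; neither the representation nor the function names `export_expressions` and `specify_impressions` come from the document.

```python
import json


def export_expressions(coeffs, intercepts):
    """Device 110 side: pack M assumed-linear expressions F_1..F_M."""
    return json.dumps({"coeffs": coeffs, "intercepts": intercepts})


def specify_impressions(payload, features):
    """Device 120 side: compute Y_1..Y_M from feature indices X_1..X_N."""
    data = json.loads(payload)
    return [
        sum(a * x for a, x in zip(row, features)) + b
        for row, b in zip(data["coeffs"], data["intercepts"])
    ]


# Two impression indices (M = 2) over two feature indices (N = 2).
payload = export_expressions([[1.0, 2.0], [0.5, -1.0]], [0.0, 1.0])
print(specify_impressions(payload, [2.0, 3.0]))
```

Because the payload contains only coefficients, the analysis side never needs the reference data group DR or the relationship description data DC, which matches the simplification described above.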

（３）第２実施形態から第４実施形態において、各種の機器を制御するための制御データを提示データＱAとして歌唱音声Ｖの聴覚印象に応じて設定することも可能である。制御データは、例えば楽曲の歌唱中に表示装置１８に表示される画像（背景画像）の制御や、再生機器（カラオケ装置）が再生する伴奏音の再生、照明機器等の演出効果の制御に適用される。カラオケ店等の店舗内で注文可能な飲食物を提示データＱAに応じて変更することも可能である。また、歌唱音声Ｖの聴覚印象（Ｍ個の印象指標Ｙ1〜ＹM）を歌唱評価（採点）に応用することも可能である。例えば、歌唱評価に適用される変数を提示データＱAに応じて調整する構成や、楽曲毎に事前に登録された印象と歌唱音声Ｖの聴覚印象との類似度（異同）を評価結果に反映させる構成（例えば両者が類似するほど加点を増加させる構成）が好適に採用される。 (3) In the second to fourth embodiments, control data for controlling various devices may also be set as the presentation data QA according to the auditory impression of the singing voice V. The control data is applied, for example, to controlling the image (background image) displayed on the display device 18 while a piece of music is being sung, to the reproduction of accompaniment sounds by a playback device (karaoke device), and to controlling staging effects of lighting equipment and the like. The food and drink that can be ordered in an establishment such as a karaoke parlor may also be changed according to the presentation data QA. The auditory impression (M impression indices Y1 to YM) of the singing voice V can also be applied to singing evaluation (scoring). For example, a configuration that adjusts variables applied to singing evaluation according to the presentation data QA, or a configuration that reflects in the evaluation result the degree of similarity between an impression registered in advance for each piece of music and the auditory impression of the singing voice V (for example, awarding more points the more similar the two are), is suitably adopted.
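
The similarity-based scoring mentioned above could, for instance, award a bonus that shrinks as the measured impression moves away from the impression registered for the piece of music. The document does not specify a similarity measure; the Euclidean distance, the function name `impression_bonus`, and the parameter `max_bonus` are assumptions made for this sketch.

```python
import math


def impression_bonus(registered, measured, max_bonus=10.0):
    """Return bonus points that grow as the two impressions get closer.

    registered : impression indices registered in advance for the piece
    measured   : impression indices Y_1..Y_M specified for the singing voice
    max_bonus  : bonus awarded when both impressions coincide (assumed)
    """
    distance = math.dist(registered, measured)  # Euclidean distance
    return max_bonus / (1.0 + distance)


print(impression_bonus([1.0, 2.0], [1.0, 2.0]))
print(impression_bonus([1.0, 2.0], [4.0, 6.0]))
```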

（４）第５実施形態および第６実施形態では、利用者からの指示に応じて各目標値Ａmを設定したが、目標値Ａmの設定の方法は以上の例示に限定されない。例えば、楽曲毎に目標値Ａm（Ａ1〜ＡM）を事前に選定し、利用者が実際に歌唱する楽曲の目標値Ａmを目標設定部４２が選択する構成も採用され得る。また、利用者が歌唱する楽曲の属性（主旋律，ジャンル，歌手等）に応じて目標設定部４２が各目標値Ａmを可変に設定することも可能である。 (4) In the fifth and sixth embodiments, each target value Am is set in accordance with an instruction from the user, but the method of setting the target values Am is not limited to this example. For example, a configuration may be adopted in which target values Am (A1 to AM) are selected in advance for each piece of music and the target setting unit 42 selects the target values Am of the piece that the user actually sings. The target setting unit 42 may also variably set each target value Am according to attributes of the piece of music sung by the user (main melody, genre, singer, and so on).

（５）前述の各形態では、利用者が楽曲を歌唱した歌唱音声Ｖを例示したが、解析対象は歌唱音声Ｖに限定されない。例えば、会話音等の音声や楽器の演奏音（楽音）、音声合成技術で生成された合成音声（歌唱音声や会話音）について各関連式Ｆmを利用した解析で聴覚印象（Ｍ個の印象指標Ｙ1〜ＹM）を特定することも可能である。例えば、楽器の演奏音の解析では、前述の各形態と同様に、例えば明暗や清濁等の印象指標Ｙmが特定され得る。また、遠隔地間で音声を授受する遠隔会議システムのもとで各地点にて再生される音声（例えば会議での会話音）や、スピーカ等の放音装置を含む任意の音響システムから放射される音響についても聴覚印象を特定し得る。以上の説明から理解される通り、本発明において解析対象となる音響（解析対象音）の具体的な内容（種類）や発音の原理等は任意である。 (5) Each of the above embodiments exemplifies a singing voice V produced by a user singing a piece of music, but the analysis target is not limited to the singing voice V. For example, the auditory impression (M impression indices Y1 to YM) may also be specified, by analysis using each relational expression Fm, for voices such as conversational speech, performance sounds of musical instruments (musical tones), and synthesized voices generated by speech synthesis technology (singing voices and conversational speech). In the analysis of instrument performance sounds, for example, impression indices Ym such as brightness and clearness can be specified in the same way as in the above embodiments. The auditory impression can likewise be specified for voices reproduced at each site in a teleconference system that exchanges audio between remote locations (for example, conversation in a meeting), and for sounds emitted from any acoustic system that includes a sound emitting device such as a speaker. As understood from the above description, the specific content (type) of the sound to be analyzed in the present invention (the analysis target sound), the principle by which it is produced, and the like are arbitrary.

１００（１００A，１００B，１００C，１００D，１００E，１００F，１００G），１１０，１２０……音響解析装置、１０……演算処理装置、１２……記憶装置、１４……入力装置、１６……収音装置、１８……表示装置、２２……特徴抽出部、２４……印象特定部、２６……提示処理部、３２……情報生成部、４０……関連式設定部、４２……目標設定部、４４……解析処理部、４６……音響処理部。

100 (100A, 100B, 100C, 100D, 100E, 100F, 100G), 110, 120 ... acoustic analysis device, 10 ... arithmetic processing device, 12 ... storage device, 14 ... input device, 16 ... sound collection device, 18 ... display device, 22 ... feature extraction unit, 24 ... impression specifying unit, 26 ... presentation processing unit, 32 ... information generation unit, 40 ... relational expression setting unit, 42 ... target setting unit, 44 ... analysis processing unit, 46 ... acoustic processing unit.

Claims

An acoustic analysis apparatus comprising relational expression setting means that, using a plurality of reference data in each of which an impression index indicating an auditory impression of a reference sound is associated with a feature index indicating an acoustic feature of the reference sound, and relationship description data that defines a correspondence between the auditory impression and a plurality of types of acoustic features, sets a relational expression expressing the relationship between the impression index of the auditory impression and the feature indices of the plurality of types of acoustic features in the correspondence defined by the relationship description data.

The acoustic analysis apparatus according to claim 1, wherein the relationship description data defines a correspondence relationship between the auditory impression and the plurality of types of acoustic features via a plurality of intermediate elements included in the auditory impression.

The acoustic analysis apparatus according to claim 1, wherein the relational expression setting means sets the relational expression for each of a plurality of types of auditory impressions.

The acoustic analysis apparatus according to claim 1, wherein the relational expression setting means acquires the reference data and updates a predetermined relational expression using the reference data.

An acoustic analysis apparatus comprising:
feature extraction means for extracting a feature index of an analysis target sound; and
impression specifying means for calculating an impression index of the analysis target sound by applying the feature index extracted by the feature extraction means to a relational expression, the relational expression being set using a plurality of reference data in each of which an impression index indicating an auditory impression of a reference sound is associated with a feature index indicating an acoustic feature of the reference sound, and relationship description data that defines a correspondence between the auditory impression and a plurality of types of acoustic features, and expressing the relationship between the impression index of the auditory impression and the feature indices of the plurality of types of acoustic features in the correspondence defined by the relationship description data.