JP2006254187A

JP2006254187A - Acoustic field determining method and device

Info

Publication number: JP2006254187A
Application number: JP2005069288A
Authority: JP
Inventors: Yukiya Sasaki; 幸弥佐々木; Takuya Tamaru; 卓也田丸; Takuro Sone; 卓朗曽根
Original assignee: Yamaha Corp
Current assignee: Yamaha Corp
Priority date: 2005-03-11
Filing date: 2005-03-11
Publication date: 2006-09-21

Abstract

PROBLEM TO BE SOLVED: To provide an acoustic field determining method and device by which optimal acoustic field forming conditions can be determined with a high degree of accuracy.

SOLUTION: The acoustic field determining method includes the steps of: analyzing content to be reproduced to obtain a feature amount of the content; obtaining a listener's setting history information or attribute information set in the content; and determining the acoustic field forming conditions for reproducing the content based on the feature amount and on the setting history information or the attribute information.

COPYRIGHT: (C)2006,JPO&NCIPI

Description

The present invention relates to a sound field determination method and a sound field determination device, and in particular to a technique for determining, based on multiple pieces of information, the sound field forming conditions for reproducing content such as broadcast data and audio-video data that carry sound, images, and the like.

Patent Documents 1 to 8 disclose techniques for determining the sound field forming conditions for reproducing content based on attribute information set in the content. Known examples of such attribute information include the category announced through an EPG (Electronic Program Guide), RDS (Radio Data System), or the like, and the stereo/monaural type. With the techniques disclosed in these documents, reverberation can be added when reproducing a classical music broadcast, for example, and removed when reproducing a news broadcast. However, the attribute information set in content generally has only one value per item for each program. For example, the category item in the attribute information of a classical music broadcast program generally holds the single category "classical". Yet the sound field forming conditions best suited to content often change as reproduction progresses. For example, a classical music broadcast program includes spoken commentary, so if the sound field forming conditions for the whole program are set from the attribute "classical", reverberation is added to the commentary as well, and the commentary becomes hard to hear.

Patent Document 9 discloses a technique for dynamically determining the sound field forming conditions based on the result of analyzing the acoustic channel. However, with the technique disclosed in Patent Document 9, the accuracy of acoustic channel analysis is limited and the preferred sound field differs from listener to listener, so inappropriate sound field forming conditions may be set.

Patent Document 1: JP-A-5-110528
Patent Document 2: JP-A-6-291692
Patent Document 3: JP-A-7-284187
Patent Document 4: JP 2002-9648 A
Patent Document 5: JP 2002-27352 A
Patent Document 6: JP 2002-33976 A
Patent Document 7: JP 2002-159099 A
Patent Document 8: JP 2002-314447 A
Patent Document 9: JP-A-7-66740

The present invention was made in view of the problems described above, and its object is to provide a sound field determination method and a sound field determination device that can determine optimal sound field forming conditions with high accuracy.

(1) A sound field determination method for achieving the above object includes: analyzing content to be reproduced to acquire a feature amount of the content; acquiring setting history information of a listener or attribute information of the content that is set in the content; and determining sound field forming conditions for reproducing the content based on the feature amount and on the setting history information or the attribute information.
According to the present invention, the sound field forming conditions are determined based on the listener's setting history information or the content's attribute information in addition to the feature amount of the content, so optimal sound field forming conditions can be determined with high accuracy.

(2) In the step of determining the sound field forming conditions, the sound field forming conditions may be determined in response to changes in the feature amount during reproduction of the content.
According to the present invention, the sound field forming conditions are determined dynamically as reproduction of the content progresses, so optimal sound field forming conditions are always determined with high accuracy.
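The dynamic determination in (2) can be pictured with a minimal sketch. It assumes, purely for illustration, that the analysis yields one label per segment and that a condition is just a reverberation on/off choice. The function names, the labels, and the rule that detected speech overrides the program category are hypothetical and are not taken from the specification; they only mirror the classical-broadcast-with-commentary example from the background section.

```python
# Hypothetical sketch: re-determine the sound field forming condition per segment.
# Labels and rules are illustrative assumptions, not the patent's actual table.

def determine_condition(program_category, segment_feature):
    """Combine a program-level attribute with a per-segment feature."""
    if segment_feature == "speech":
        # Spoken commentary stays intelligible without added reverberation.
        return "reverb_off"
    if program_category == "classical":
        return "reverb_on"
    return "reverb_off"


def conditions_during_playback(program_category, segment_features):
    """Return the condition chosen for each successive segment."""
    return [determine_condition(program_category, f) for f in segment_features]


if __name__ == "__main__":
    print(conditions_during_playback("classical", ["music", "speech", "music"]))
```

With the attribute alone, every segment of the classical program would receive reverberation; adding the per-segment feature lets the commentary segment be treated differently.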

(3) The method may further include detecting an operation by the listener and estimating the listener based on the detected operation. In the step of acquiring the setting history information, the setting history information corresponding to the estimated listener may be acquired.
According to the present invention, the listener is estimated from the listener's operation and the setting history information corresponding to the estimated listener is acquired, so the listener does not need to perform an identifying operation every time, which improves operability. In the step of detecting the listener's operation, operation details that help estimate the listener are detected, for example when reproduction of content was requested, which content was requested, and where the remote controller is being operated from. The listener can sometimes be identified from the time slot in which reproduction was requested (for example, a request on a weekday daytime can be attributed to the listener who watched other broadcasts on weekday daytimes), from the requested content (for example, a request for a live baseball broadcast can be attributed to the listener who watched another live baseball broadcast), or from the place where reproduction was requested (for example, a listener watching from directly in front of the sound field determination device can be taken to be the listener who most often watches from that position).

(4) In the step of acquiring the feature amount, the acoustic channel of the content may be analyzed to acquire a feature amount of the acoustic channel.

(5) In the step of acquiring the feature amount, two or more of the following feature amounts may be acquired: the correlation coefficient between the L channel and the R channel of the acoustic channel, the loudness of the sound represented by the acoustic channel, the pitch of the sound represented by the acoustic channel, and the temporal change characteristic of the sound represented by the acoustic channel.
According to the present invention, the sound field forming conditions are determined based on two or more audio feature amounts, so sound field forming conditions suited to the content can be determined with high accuracy.

(6) In the step of acquiring the feature amount, the image channel of the content may be analyzed to acquire a feature amount of the image channel.

(7) A sound field determination method for achieving the above object includes: acquiring setting history information of a listener and attribute information of content to be reproduced that is set in the content; and determining sound field forming conditions for reproducing the content based on the setting history information and the attribute information.
According to the present invention, the sound field forming conditions are determined based on two pieces of information of different natures, namely the listener's setting history information and the content's attribute information, so optimal sound field forming conditions can be determined with high accuracy.

(8) The sound field determination method may further include detecting an operation by the listener and estimating the listener based on the detected operation. In the step of acquiring the setting history information, the setting history information corresponding to the estimated listener may be acquired.
According to the present invention, the listener is estimated from the listener's operation and the setting history information corresponding to the estimated listener is acquired, so the listener does not need to perform an identifying operation, which improves operability.

(9) A sound field determination method for achieving the above object includes: analyzing content to be reproduced and acquiring two or more of the following feature amounts: the correlation coefficient between the L channel and the R channel of the acoustic channel of the content, the loudness of the sound represented by the acoustic channel, the pitch of the sound represented by the acoustic channel, and the temporal change characteristic of the sound represented by the acoustic channel; and determining sound field forming conditions for reproducing the content based on the feature amounts.
According to the present invention, the sound field forming conditions are determined based on two or more audio feature amounts of different natures, so optimal sound field forming conditions can be determined with high accuracy.

(10) A sound field determination device for achieving the above object includes: means for analyzing content to be reproduced to acquire a feature amount of the content; means for acquiring setting history information of a listener or attribute information of the content that is set in the content; and means for determining sound field forming conditions for reproducing the content based on the feature amount and on the setting history information or the attribute information.
According to the present invention, the sound field forming conditions are determined based on the listener's setting history information or the content's attribute information in addition to the feature amount of the content, so optimal sound field forming conditions can be determined with high accuracy.

(11) A sound field determination device for achieving the above object includes: means for acquiring setting history information of a listener and attribute information of content to be reproduced that is set in the content; and means for determining sound field forming conditions for reproducing the content based on the setting history information and the attribute information.
According to the present invention, the sound field forming conditions are determined based on two pieces of information of different natures, namely the listener's setting history information and the content's attribute information, so optimal sound field forming conditions can be determined with high accuracy.

(12) A sound field determination device for achieving the above object includes: means for analyzing content to be reproduced and acquiring two or more of the following feature amounts: the correlation coefficient between the L channel and the R channel of the acoustic channel of the content, the loudness of the sound represented by the acoustic channel, the pitch of the sound represented by the acoustic channel, and the temporal change characteristic of the sound represented by the acoustic channel; and means for determining sound field forming conditions for reproducing the content based on the feature amounts.
According to the present invention, the sound field forming conditions are determined based on two or more audio feature amounts of different natures, so optimal sound field forming conditions can be determined with high accuracy.

The order of the operations of the method recited in the claims is not limited to the order of recitation unless there is a technical obstacle; the operations may be performed in any order or simultaneously. The functions of the multiple means provided in the present invention are realized by hardware resources whose functions are specified by their configuration, hardware resources whose functions are specified by a program, or a combination of these. The functions of these means are not limited to those realized by hardware resources that are physically independent of one another. The present invention can be specified not only as a method and a device but also as a program and as a recording medium on which the program is recorded.

Embodiments of the present invention are described below based on an example.
FIG. 1 is a functional block diagram showing a sound field determination device 1 according to one example of the present invention. The sound field determination device 1 is built into an AV amplifier, a DVD player, a personal computer with an AV playback function, or the like. The sound field determination device 1 determines the sound field forming conditions best suited to the content of the AV data to be reproduced, based on the AV data input to the AV amplifier or the like, the listener's setting history, and the attribute information of the content, and sets the sound field forming conditions in the sound field processing unit 20 according to the result. The AV data can include content consisting of image data of the image channel and an audio signal of the acoustic channel, together with attached data serving as attribute information. The image data may be in any format, such as MPEG or NTSC, and can be reproduced on the display controller 22 and the display 26 given a decoder for that format. The audio signal may likewise be digital or analog and can be reproduced through the amplifier 24 and the speakers 28, 30, 32, and 34 given a decoder for its format. The attached data corresponds to the attribute information set in the content and makes it possible to determine the stereo/monaural type, the monolingual/bilingual type, and so on.

The sound field determination device 1 includes an analysis unit 10, an operation unit 14, a listener estimation unit 16, a content category acquisition unit 18, and a sound field forming condition determination unit 12. The analysis unit 10 consists of a CPU, RAM, ROM, a disk storage device, an ASIC for audio processing, an ASIC for image processing, and the like (none of which are shown). The listener estimation unit 16, the content category acquisition unit 18, and the sound field forming condition determination unit 12 are implemented by the CPU, RAM, ROM, and disk storage device that make up the analysis unit 10.

The operation unit 14 consists of an operation panel on the main body of the AV amplifier, DVD player, or the like, or the keyboard of a personal computer or the like, together with a remote controller. When the operation unit 14 includes a remote controller and a receiver and the receiver has a function for locating the transmission position, the listener's viewing position can be identified. Specifically, for example, the remote controller and the receiver communicate by infrared and the receiver has a function for identifying the direction of the infrared source; the unit then identifies whether the listener is at the center, on the right, or on the left as seen from the front of the main body, and outputs center, right, or left as the viewing position. The operation unit 14 has various buttons, a cross key, a jog dial, an LCD, and the like for entering a content category, sound field forming conditions, a viewing start request, a viewing end request, a channel, a listener ID, and so on into the sound field determination device 1.

The content category acquisition unit 18 acquires the attribute information set in the content and outputs a content category based on it. The attribute information that the content category acquisition unit 18 can access includes the program category distributed by EPG or RDS. The attribute information may be data attached to the content data, or data stored on a server computer that publishes attribute information set by the provider of the content data.

FIG. 2 is a functional block diagram showing the analysis unit 10.
The image feature analysis unit 50 calculates feature amounts such as the intensity histogram of each of the RGB channels of the image and the correlation coefficient between frames. The intensity histogram shows which color occupies a relatively large area in each frame. The inter-frame correlation coefficient shows whether the subject moves a lot or a little and when the content switches.
The FFT unit 56 calculates the spectrum of the audio signal by fast Fourier transform.
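The two image feature amounts named above can be sketched as follows. This is a minimal illustration under stated assumptions, not the unit's actual implementation: a frame is assumed to be a flat list of 8-bit `(R, G, B)` tuples, the bin count is arbitrary, and the inter-frame correlation is taken to be a Pearson correlation over per-pixel intensity sums. The function names are hypothetical.

```python
from statistics import mean


def channel_histogram(frame, channel, bins=4):
    """Count the pixels falling into each intensity bin for one RGB channel."""
    counts = [0] * bins
    for pixel in frame:
        counts[min(pixel[channel] * bins // 256, bins - 1)] += 1
    return counts


def frame_correlation(frame_a, frame_b):
    """Pearson correlation between the per-pixel intensity sums of two frames."""
    a = [sum(pixel) for pixel in frame_a]
    b = [sum(pixel) for pixel in frame_b]
    mean_a, mean_b = mean(a), mean(b)
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0  # a flat frame carries no correlation information
    return cov / (var_a * var_b) ** 0.5
```

A correlation near 1 between consecutive frames suggests little movement, while a sudden drop is one plausible cue that the content has switched.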

The band extraction unit 58 detects changes over time in the amplitude of specific frequency components of the audio signal. The frequencies monitored are set to frequencies characteristic of particular sound sources, such as speech, singing, applause, a golf ball dropping into the cup, traditional instruments, electronic instruments, and a bouncing tennis ball. The sound source type can be estimated from whether the amplitude of a specific frequency component is stable, fluctuates finely, varies gently, shows discrete peaks, and so on.
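Tracking the amplitude of one frequency component over time can be sketched as below. This is an assumption-laden illustration: it evaluates a single-bin discrete Fourier sum per fixed-length frame, which is one of several ways to obtain such an envelope, and the specification does not say which method the band extraction unit uses. The function names, the frame length, and the test frequency are hypothetical.

```python
import cmath
import math


def band_amplitude(frame, freq_hz, sample_rate):
    """Amplitude of one frequency component in a frame (single-bin DFT sum)."""
    acc = sum(
        sample * cmath.exp(-2j * math.pi * freq_hz * i / sample_rate)
        for i, sample in enumerate(frame)
    )
    return 2 * abs(acc) / len(frame)


def amplitude_envelope(samples, freq_hz, sample_rate, frame_len):
    """Amplitude of the component in each successive non-overlapping frame."""
    return [
        band_amplitude(samples[i:i + frame_len], freq_hz, sample_rate)
        for i in range(0, len(samples) - frame_len + 1, frame_len)
    ]


if __name__ == "__main__":
    rate, freq = 8000, 1000
    # A 1 kHz tone whose amplitude halves after the first 80 samples.
    tone = [(1.0 if i < 80 else 0.5) * math.sin(2 * math.pi * freq * i / rate)
            for i in range(160)]
    print(amplitude_envelope(tone, freq, rate, frame_len=80))
```

The shape of the resulting envelope (steady, finely fluctuating, slowly varying, or spiky) is the kind of cue the paragraph above describes for guessing the sound source.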

The level detection unit 60 detects the loudness of the sound from the audio signal. Specifically, for example, it detects the effective (RMS) level of the audio signal. Loudness is detected in order to estimate the sound source type.
The pitch detection unit 62 detects the pitch of the sound from the audio signal. Specifically, for example, it detects the frequency component with the largest amplitude. Pitch is detected in order to estimate the sound source type.
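The two detections just described, the effective level and the largest-amplitude frequency component, can be sketched as follows. The sketch assumes a block of floating-point samples and uses a naive discrete Fourier transform in place of the FFT for brevity; the function names and the test signal are hypothetical.

```python
import cmath
import math


def rms_level(samples):
    """Effective (root-mean-square) level of a block of samples."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def dominant_frequency(samples, sample_rate):
    """Frequency in Hz of the spectral bin with the largest magnitude."""
    n = len(samples)
    best_bin, best_mag = 0, 0.0
    for k in range(1, n // 2):  # skip DC, stop below the Nyquist bin
        mag = abs(sum(s * cmath.exp(-2j * math.pi * k * i / n)
                      for i, s in enumerate(samples)))
        if mag > best_mag:
            best_bin, best_mag = k, mag
    return best_bin * sample_rate / n


if __name__ == "__main__":
    rate, n = 8000, 64
    # A strong 1 kHz tone plus a weaker 2 kHz tone.
    block = [math.sin(2 * math.pi * 1000 * i / rate)
             + 0.3 * math.sin(2 * math.pi * 2000 * i / rate) for i in range(n)]
    print(rms_level(block), dominant_frequency(block, rate))
```

Picking the single largest bin is a crude pitch estimate; it matches the example in the text but can lock onto a harmonic rather than the fundamental for some sources.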

The left-right correlation detection unit 64 detects the correlation coefficient between the L channel and the R channel (the left-right correlation coefficient) from the audio signal. Specifically, for example, the adder 66 adds the L channel level and the R channel level, the subtractor 68 obtains the difference between the L channel level and the R channel level, and the subtractor 72 obtains the difference between the sum from the adder 66 and the difference from the subtractor 68. The larger the level output by the subtractor 72, the higher the correlation between the L channel and the R channel; the smaller the level, the lower the correlation. The recording environment can be identified from how high or low the correlation between the L channel and the R channel is. That is, for example, it is possible to identify how the microphones were placed relative to the sound sources.
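The sum-and-difference arrangement described above can be sketched in a few lines. The text does not say whether the adder and subtractor work on signal samples or on already-measured levels, so this sketch assumes they combine the L and R samples and that each result is then measured as an RMS level; the function names are hypothetical.

```python
import math


def rms(samples):
    """Root-mean-square level of a block of samples."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def left_right_score(left, right):
    """Level of (L + R) minus level of (L - R); larger means more correlated."""
    total = [l + r for l, r in zip(left, right)]
    diff = [l - r for l, r in zip(left, right)]
    return rms(total) - rms(diff)


if __name__ == "__main__":
    n = 64
    sine = [math.sin(2 * math.pi * i / n) for i in range(n)]
    cosine = [math.cos(2 * math.pi * i / n) for i in range(n)]
    print(left_right_score(sine, sine))                 # identical channels
    print(left_right_score(sine, cosine))               # unrelated channels
    print(left_right_score(sine, [-s for s in sine]))   # opposite phase
```

Under these assumptions identical channels give a large positive score, uncorrelated channels give a score near zero, and opposite-phase channels give a negative score, which matches the "larger output means higher correlation" reading in the text.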

The audio feature analysis unit 54 analyzes the outputs of the band extraction unit 58, the level detection unit 60, the pitch detection unit 62, and the left-right correlation detection unit 64 as feature amounts of the content, and estimates the sound source type, the recording environment, and so on.
The sound source type can be estimated as follows, for example. Human voices such as speech and singing can be identified by the pitch and by the presence or absence of the component sounds (formants) that characterize vowels. In speech, the amplitude of the frequency components corresponding to the pitch of the human voice tends to fluctuate relatively finely. In singing, the amplitude of those frequency components tends to change relatively gently. In cheering, the amplitude of low-frequency components tends to be relatively large. Instrument sounds can be identified down to the instrument type from the pitch, the amplitude decay characteristics of specific frequency components, the loudness, and so on. For example, percussion instruments tend to have pitches distributed within a specific range, fast amplitude decay of the corresponding frequency components, and high loudness. String instruments tend to have pitches distributed within a specific range and slow amplitude decay of specific frequency components. A piano tends to have pitches distributed within a specific range, with a fast rise and a somewhat slow decay in the amplitude of the corresponding frequency components. Electronic instruments tend to show a uniform pattern in the amplitude waveform of the corresponding frequency components. Applause tends to have pitches distributed within a specific range, with the amplitude of the corresponding frequency components fluctuating finely within a narrow range. The sound of a bouncing tennis ball tends to have pitches distributed within a specific range, with a specific pattern appearing at intervals of about one second in the amplitude of the corresponding frequency components. Comparing chamber music with an orchestra, the loudness of an orchestra tends to be distributed over a wider range.

The recording environment is estimated from how high the correlation between the L channel level and the R channel level is. For example, in the recording environment of a news program the left-right correlation coefficient is extremely high. In the recording environment of music that uses electronic instruments, such as rock and pop, a microphone is set up for each sound source and the sound images for the individual microphones are spread out and localized, so the left-right correlation coefficient is extremely low. In recording environments such as orchestras, chamber music, and sports broadcasts, microphones are often placed at a distance from multiple spread-out sound sources, so the left-right correlation coefficient tends to be somewhat low.

The estimation unit 52 estimates the content category based on the outputs of the image feature analysis unit 50 and the audio feature analysis unit 54 and on the attribute data. The estimation unit 52 refers to the content estimation database 11 to estimate the content category.
FIG. 3 is a table showing an example of the initial state of the content estimation database 11. The associations between image features, attached information, and sound features on the one hand and estimated categories on the other shown in FIG. 3 are only examples. For instance, if black appears frequently in the image, the image changes little over time, the audio signal is in stereo format, and traditional instrument sounds are distributed over an extremely wide range of levels, it can be inferred that many orchestra members in black tuxedos are seated and playing a variety of instruments, so "orchestra" can be registered as the estimated category. Likewise, if the image changes little over time, the audio signal is in bilingual multiplexed format, speech dominates, and the left-right correlation is extremely high, it can be inferred that a seated announcer is reading commentary, so "commentary" can be registered as the estimated category.

The analysis unit 10 learns the features of categories from the listener's setting history. Specifically, when the listener uses the operation unit 14 to explicitly set a particular content category for a particular broadcast program, the image feature analysis unit 50 and the audio feature analysis unit 54 detect the image features, attached information, and sound features characteristic of that program, and the analysis unit 10 associates the detected image features, attached information, and sound features with the set content category and registers them in the content estimation database 11 as shown in FIG. 4. For example, suppose that for a broadcast program for which the listener set "sumo" as the content category, the image feature "large areas of skin tone" and the sound feature "beat sounds, speech, and cheering are mixed" are detected. In this case, the estimation unit 52 associates the estimated category "sumo" with the image feature "large areas of skin tone" and the sound feature "beat sounds, speech, and cheering are mixed" and registers them in the content estimation database 11.
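The lookup against the content estimation database and the learning step that adds a listener-defined category can be sketched together. This is a minimal illustration: it assumes each database row is a set of free-text feature labels and that a row matches when all of its labels are among the observed ones. The class name, the function names, the matching rule, and the exact label strings are hypothetical; only the example categories come from the text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CategoryEntry:
    category: str
    features: frozenset  # image, attached-data, and sound feature labels


def estimate_category(database, observed):
    """Return the category whose registered features best match the observation."""
    best_category, best_size = None, 0
    for entry in database:
        # An entry matches only if every one of its features was observed.
        if entry.features <= observed and len(entry.features) > best_size:
            best_category, best_size = entry.category, len(entry.features)
    return best_category


def learn_category(database, category, observed):
    """Register the features seen while the listener set a category explicitly."""
    database.append(CategoryEntry(category, frozenset(observed)))


# Initial state, loosely following the two examples given for FIG. 3.
database = [
    CategoryEntry("orchestra", frozenset({
        "much black", "little image change", "stereo",
        "traditional instruments at widely varying levels"})),
    CategoryEntry("commentary", frozenset({
        "little image change", "bilingual", "mostly speech",
        "very high left-right correlation"})),
]
```

For example, an observation of `{"much skin tone", "beats, speech, and cheering mixed"}` matches nothing at first; after `learn_category(database, "sumo", ...)` is called with those labels, the same observation is estimated as "sumo".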

The listener estimation unit 16 includes a listener estimation database 17 in which the history of listener operations accepted by the operation unit 14 is registered. The listener estimation database 17 accumulates, as setting history information, records that associate the user's viewing start requests, viewing end requests, channel selection requests, and listener IDs accepted by the operation unit 14 with the viewing position output by the operation unit 14. FIG. 5 shows an example of the listener estimation database 17. The viewing day of the week and the viewing time slot are registered when the operation unit 14 accepts the user's viewing start request and viewing end request, at which point the listener estimation unit 16 obtains the date and time from a real-time clock.

The listener estimation unit 16 estimates the listener who is currently viewing from the contents registered in the listener estimation database 17, and outputs the listener ID corresponding to the estimated listener. FIG. 6 shows an example of the correspondence between listener operations and the listener IDs output by the listener estimation unit 16. When the operation unit 14 newly accepts a viewing start request, channel selection request, or the like from the user, the listener estimation unit 16 checks whether the listener estimation database 17 contains, within a predetermined period, at least a predetermined number (for example, one) of history records showing that the requested channel was viewed on the same day of the week and in the same time slot as the new viewing start request. If so, it presumes that the new request comes from the listener who previously viewed that channel on that day and in that time slot, and outputs the corresponding listener ID. For example, when the setting history information shown in FIG. 5 is registered in the listener estimation database 17 and a viewing start request for channel 3 is accepted at 12:15 on any day from Monday to Friday, the listener estimation unit 16 outputs “100” as the listener ID. If the listener has explicitly entered a listener ID, the listener estimation unit 16 can output that ID for the estimated listener; if not, it may automatically assign a listener ID to the estimated listener. For example, no listener ID was explicitly entered for the viewing history of channel 3 from 12:15 to 12:30 on Monday through Friday, so the listener estimation unit 16 assigns and outputs listener ID “100” for that viewing. Likewise, no listener ID was explicitly entered for the viewing history of channel 1 from 21:00 to 22:00 on Wednesday, so the listener estimation unit 16 outputs listener ID “101” for viewing of channel 1 on Wednesday.
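The history matching described above might look like the following sketch. The record layout, the function name `estimate_listener`, and the `min_count` parameter are hypothetical. The records only mirror the two examples given in the text, and the "predetermined period" check is omitted for brevity.

```python
from datetime import datetime, time

# Hypothetical records modeled on the examples in the text (days: Monday is 0).
HISTORY = [
    {"days": {0, 1, 2, 3, 4}, "start": time(12, 15), "end": time(12, 30),
     "channel": 3, "listener_id": "100"},
    {"days": {2}, "start": time(21, 0), "end": time(22, 0),
     "channel": 1, "listener_id": "101"},
]


def estimate_listener(history, when, channel, min_count=1):
    """Return the listener ID whose past viewing matches this request, or None."""
    counts = {}
    for rec in history:
        same_day = when.weekday() in rec["days"]
        same_slot = rec["start"] <= when.time() <= rec["end"]
        if same_day and same_slot and rec["channel"] == channel:
            counts[rec["listener_id"]] = counts.get(rec["listener_id"], 0) + 1
    # Keep only listeners seen at least `min_count` times in this slot.
    matches = [lid for lid, n in counts.items() if n >= min_count]
    return matches[0] if matches else None


# 2005-03-14 was a Monday and 2005-03-16 a Wednesday.
monday_noon = estimate_listener(HISTORY, datetime(2005, 3, 14, 12, 15), channel=3)
wednesday_night = estimate_listener(HISTORY, datetime(2005, 3, 16, 21, 30), channel=1)
```

Under these assumptions, a request that matches no stored pattern returns `None`, which a caller could treat as the cue to assign a fresh listener ID.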

When viewing requests from multiple listeners are registered for the same channel in the same time slot, the listener estimation unit 16 estimates the listener from the viewing position. For example, when a viewing request for channel 3 is accepted from the “right” viewing position between 21:00 and 22:00 on Sunday, the listener estimation unit 16 outputs listener ID “003”, the ID of the listener who previously made the same viewing request from the “right” viewing position. When the same viewing request is accepted from the “center” viewing position, the listener estimation unit 16 outputs listener ID “001”. This estimation rests on the assumption that each listener has a fixed viewing position. Other information, such as the content category or the sound field forming conditions, may also be used as the setting history information for estimating the listener.
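As a rough sketch of this tie-break, the viewing position can be used as a secondary key once day, time slot, and channel have matched. The record fields and the function name `estimate_by_position` are hypothetical, and the two records simply restate the Sunday example from the text.

```python
# Hypothetical records: two listeners watch channel 3 on Sunday, 21:00 to 22:00.
SUNDAY_EVENING = [
    {"day": "Sun", "slot": "21:00-22:00", "channel": 3,
     "position": "center", "listener_id": "001"},
    {"day": "Sun", "slot": "21:00-22:00", "channel": 3,
     "position": "right", "listener_id": "003"},
]


def estimate_by_position(records, day, slot, channel, position):
    """Pick the listener among same-slot candidates using the viewing position."""
    candidates = [r for r in records
                  if (r["day"], r["slot"], r["channel"]) == (day, slot, channel)]
    if len(candidates) == 1:
        return candidates[0]["listener_id"]  # no ambiguity to resolve
    for r in candidates:
        if r["position"] == position:
            return r["listener_id"]
    return None  # no candidate has viewed from this position


from_right = estimate_by_position(SUNDAY_EVENING, "Sun", "21:00-22:00", 3, "right")
from_center = estimate_by_position(SUNDAY_EVENING, "Sun", "21:00-22:00", 3, "center")
```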

The sound field formation condition determination unit 12 (see FIG. 1) includes a determination database 13. It estimates the optimum sound field forming conditions from the content category output by the operation unit 14 or the content category acquisition unit 18, the estimated category output by the analysis unit 10, and the determination database 13, and sets those conditions in the sound field processing unit 20. The sound field formation condition determination unit 12 sets the sound field forming conditions when a viewing start request is accepted and at fixed intervals during subsequent viewing. As a result, sound field forming conditions suited to the content are set even when the content changes partway through a program (for example, from classical music to commentary) or when a channel change request is accepted during viewing.

In its initial state, the determination database 13 holds listener IDs, categories, and sound field forming conditions registered in association with one another, as shown for example in FIG. 7. The sound field forming conditions may include reverberation settings and sound image localization settings, as well as per-band gain settings, volume settings, and the like. The initial determination database holds the generally optimal sound field forming conditions for each content category; no conditions specific to an individual listener ID exist yet. The determination database 13 also stores a category correspondence table in which content categories and estimated categories are registered in association with each other, as shown in FIG. 8. The category correspondence table holds records that indicate, for a content category output per program by the operation unit 14 and the content category acquisition unit 18, which categories a program of that content category may contain. For example, a sports program generally includes commentary scenes in addition to scenes of the competition itself, such as golf, baseball, or tennis. Even when the content category “sports” is obtained from the EPG or the operation unit 14, it is desirable to set sound field forming conditions suited to listening to “commentary” while commentary is in progress. Conversely, if the content category available from the EPG or the operation unit 14 is ignored entirely and the sound field forming conditions are set from the content analysis result alone, inappropriate conditions may be set, depending on the accuracy of the analysis. The dynamically set sound field forming conditions can therefore be optimized by using the content category obtained from the EPG or the operation unit 14 to narrow the range of candidates and then estimating the content within that narrowed range.
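The narrowing step can be illustrated with a short sketch. The table contents, the numeric confidence scores, and the fallback to the program-level category are assumptions made for illustration; the text only states that the program-level category restricts the range within which the content is estimated.

```python
# Hypothetical category correspondence table in the spirit of FIG. 8.
CATEGORY_TABLE = {
    "sports": {"golf", "baseball", "tennis", "commentary"},
    "classical": {"classical music", "commentary"},
}


def narrow_and_estimate(content_category, scores, table=CATEGORY_TABLE):
    """Choose the best-scoring estimated category allowed for the program.

    `scores` maps estimated categories to an assumed analysis confidence.
    """
    allowed = table.get(content_category)
    if allowed is None:
        candidates = dict(scores)  # unknown program category: no narrowing
    else:
        candidates = {c: s for c, s in scores.items() if c in allowed}
    if not candidates:
        return content_category  # fall back to the program-level category
    return max(candidates, key=candidates.get)


analysis_scores = {"sumo": 0.55, "commentary": 0.50, "golf": 0.20}
scene = narrow_and_estimate("sports", analysis_scores)
```

Here the raw analysis slightly favors "sumo", but because the program is labeled "sports" and the hypothetical table does not list sumo under it, the narrowed estimate is "commentary".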

The sound field formation condition determination unit 12 learns the optimum sound field forming conditions from the listener's setting history. That is, it updates the determination database 13 based on that history. For example, when the listener with listener ID “001” explicitly sets the reverberation setting to “room” for the content category “commentary”, the sound field formation condition determination unit 12, as shown in FIG. 9, changes the listener ID “everyone” associated with the content category “commentary” to “other than 001” and newly registers in the determination database 13 a record consisting of the content category “commentary”, the listener ID “001”, and the reverberation setting “room”. Similarly, when the listener with listener ID “100” frequently enters viewing requests from the “right” viewing position for the content category “drama”, the listener ID “everyone” associated with the content category “drama” is changed to “other than 100”, and a record consisting of the content category “drama”, the listener ID “100”, and the sound image localization setting “right” is newly registered in the determination database 13. Even when the sound image localization setting has not been set explicitly, the optimum sound image localization setting can be identified from the listener ID, viewing position, and content category by providing a content category field in the listener estimation database 17 and registering there the content categories output by the analysis unit 10 and the content category acquisition unit 18.
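The database update described above can be sketched as follows. The record layout is hypothetical: the "other than 001" label is modeled as an `except` set on the shared record, and the initial "reverb off" value for commentary is an assumed placeholder, since the contents of FIG. 7 are not reproduced in the text.

```python
def learn_setting(db, category, listener_id, **settings):
    """Record a listener's explicit setting as a listener-specific entry."""
    for rec in db:
        if rec["category"] == category and rec["listener"] == listener_id:
            rec["conditions"].update(settings)  # personal record already exists
            return
    for rec in db:
        if rec["category"] == category and rec["listener"] == "everyone":
            rec["except"].add(listener_id)  # "everyone" becomes "other than <id>"
    db.append({"category": category, "listener": listener_id,
               "except": set(), "conditions": dict(settings)})


def lookup(db, category, listener_id):
    """Return the conditions that apply to this listener for this category."""
    for rec in db:
        if rec["category"] == category and rec["listener"] == listener_id:
            return rec["conditions"]
    for rec in db:
        if (rec["category"] == category and rec["listener"] == "everyone"
                and listener_id not in rec["except"]):
            return rec["conditions"]
    return None


# Assumed initial state: one shared record per category.
db = [{"category": "commentary", "listener": "everyone", "except": set(),
       "conditions": {"reverb": "off"}}]
learn_setting(db, "commentary", "001", reverb="room")
personal = lookup(db, "commentary", "001")
shared = lookup(db, "commentary", "002")
```

After the update, listener "001" gets the "room" reverberation for commentary while every other listener still receives the shared default.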

As described above, the sound field formation condition determination unit 12 determines the sound field forming conditions comprehensively from two or more of the following: the feature amounts of the content, the listener's settings, and the information attached to the content. It can therefore set optimum sound field forming conditions with high accuracy.

The sound field processing unit 20 processes the audio signal according to the sound field forming conditions set by the sound field formation condition determination unit 12 or the operation unit 14, and outputs an audio signal that forms a specific sound field. Specifically, the sound field processing unit 20 forms reverberation by summing signals delayed by different delay times, adds high-frequency components to bring out vocals, and localizes the sound image at a specific position by setting separate delays for the L channel and the R channel. These sound field forming processes may be performed by digital signal processing or by analog signal processing. Finally, the sound field processing unit 20 performs DA conversion and outputs an analog audio signal.
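Two of the processes named above, reverberation by summing delayed signals and localization by per-channel delay, can be sketched on plain sample lists. The function names, tap delays, and gains are illustrative assumptions; the text does not give concrete delay times or gains.

```python
def delayed(signal, delay):
    """Return `signal` shifted right by `delay` samples, same length, zero-padded."""
    if delay <= 0:
        return list(signal)
    pad = min(delay, len(signal))
    return [0.0] * pad + list(signal[:len(signal) - pad])


def add_reverb(signal, taps):
    """Sum the dry signal with delayed, scaled copies; `taps` holds (delay, gain)."""
    out = list(signal)
    for delay, gain in taps:
        for i, s in enumerate(delayed(signal, delay)):
            out[i] += gain * s
    return out


def localize(signal, left_delay, right_delay):
    """Return (L, R) channels with separate delays applied to each."""
    return delayed(signal, left_delay), delayed(signal, right_delay)


impulse = [1.0, 0.0, 0.0, 0.0, 0.0, 0.0]
wet = add_reverb(impulse, taps=[(2, 0.5), (4, 0.25)])
left, right = localize(impulse, left_delay=1, right_delay=0)
```

Feeding an impulse through `add_reverb` shows the echoes at the tap positions, and delaying only the left channel makes the right channel arrive first, which is the kind of inter-channel offset the text relies on to shift the sound image.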

The audio signal output from the sound field processing unit 20 is amplified by the amplifier 24 and output to the speakers 28, 30, 32, and 34. The number of speakers that emit sound based on the audio signal may be one, two, or three or more.

According to the embodiment of the present invention described above, the sound field forming conditions are set using the estimated category output by the analysis unit 10, attached information such as the content category obtained from the EPG, and the listener's setting history, so optimum sound field forming conditions can be set with high accuracy. In addition, the analysis unit 10 estimates the category by analyzing both the image data and the audio signal, and for the audio signal it comprehensively analyzes loudness, left-right correlation, pitch, and the temporal change characteristics of the sound, so the content category can be estimated with high accuracy. Furthermore, because the listener estimation unit 16 detects specific viewing patterns in the viewing history and assigns a listener ID to each pattern, the sound field formation condition determination unit 12 can set optimum sound field forming conditions for each listener even if the listener does not enter a listener ID.
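Two of the four audio features named above, loudness and left-right correlation, can be computed as in the sketch below. The function names are hypothetical, RMS is used as an assumed loudness measure, and the text does not say how these features are mapped to categories, so no mapping is shown.

```python
import math


def rms(samples):
    """Root-mean-square level, used here as a simple loudness measure."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def lr_correlation(left, right):
    """Pearson correlation coefficient between the L and R channels."""
    n = len(left)
    mean_l, mean_r = sum(left) / n, sum(right) / n
    cov = sum((l - mean_l) * (r - mean_r) for l, r in zip(left, right))
    var_l = sum((l - mean_l) ** 2 for l in left)
    var_r = sum((r - mean_r) ** 2 for r in right)
    if var_l == 0 or var_r == 0:
        return 0.0  # undefined for a constant channel; 0.0 is an arbitrary choice
    return cov / math.sqrt(var_l * var_r)


channel = [0.1, -0.2, 0.3, -0.4]
identical = lr_correlation(channel, channel)
inverted = lr_correlation(channel, [-s for s in channel])
level = rms([1.0, -1.0, 1.0, -1.0])
```

A coefficient near 1 means the two channels carry almost the same signal, while values near 0 or below indicate decorrelated or phase-inverted channels.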

FIG. 1 is a functional block diagram according to one embodiment of the present invention.
FIG. 2 is a functional block diagram according to one embodiment of the present invention.
FIG. 3 is a diagram showing a database according to one embodiment of the present invention.
FIG. 4 is a diagram showing a database according to one embodiment of the present invention.
FIG. 5 is a diagram showing a database according to one embodiment of the present invention.
FIG. 6 is a table for explaining a listener estimation method according to one embodiment of the present invention.
FIG. 7 is a diagram showing a database according to one embodiment of the present invention.
FIG. 8 is a diagram showing a database according to one embodiment of the present invention.
FIG. 9 is a diagram showing a database according to one embodiment of the present invention.

Explanation of symbols

1: sound field determination device, 10: analysis unit, 11: content estimation database, 12: sound field formation condition determination unit, 13: determination database, 14: operation unit, 16: listener estimation unit, 17: listener estimation database, 18: content category acquisition unit, 20: sound field processing unit

Claims

A sound field determination method comprising:
analyzing content to be played back to obtain a feature amount of the content;
obtaining listener setting history information or attribute information set in the content; and
determining a sound field forming condition for reproducing the content based on the feature amount and on the setting history information or the attribute information.

The sound field determination method according to claim 1, wherein in the step of determining the sound field formation condition, the sound field formation condition is determined according to a change in the feature amount during reproduction of the content.

The sound field determination method according to claim 1, further comprising:
detecting an operation by the listener; and
estimating the listener based on the detected operation,
wherein in the step of acquiring the setting history information, the setting history information corresponding to the estimated listener is acquired.

The sound field determination method according to claim 1, wherein in the step of acquiring the feature amount, the acoustic channel of the content is analyzed to acquire the feature amount of the acoustic channel.

The sound field determination method according to claim 4, wherein in the step of acquiring the feature amount, two or more of the following feature amounts are acquired: the correlation coefficient between the L channel and the R channel of the acoustic channel, the loudness of the sound indicated by the acoustic channel, the pitch of the sound indicated by the acoustic channel, and the temporal change characteristics of the sound indicated by the acoustic channel.

The sound field determination method according to claim 1, wherein in the step of acquiring the feature amount, the feature amount of the image channel is acquired by analyzing an image channel of the content.

A sound field determination method comprising:
obtaining listener setting history information and attribute information set in content to be played back; and
determining a sound field forming condition for reproducing the content based on the setting history information and the attribute information.

The sound field determination method according to claim 7, further comprising:
detecting an operation by the listener; and
estimating the listener based on the detected operation,
wherein in the step of acquiring the setting history information, the setting history information corresponding to the estimated listener is acquired.

A sound field determination method comprising:
analyzing content to be played back to acquire any two or more of the following feature amounts: the correlation coefficient between the L channel and the R channel of the audio channel of the content, the loudness of the sound indicated by the audio channel, the pitch of the sound indicated by the audio channel, and the temporal change characteristics of the sound indicated by the audio channel; and
determining a sound field forming condition for reproducing the content based on the feature amounts.

A sound field determination apparatus comprising:
means for analyzing content to be played back to obtain a feature amount of the content;
means for acquiring listener setting history information or attribute information set in the content; and
means for determining a sound field forming condition for reproducing the content based on the feature amount and on the setting history information or the attribute information.

A sound field determination apparatus comprising:
means for acquiring listener setting history information and attribute information set in content to be played back; and
means for determining a sound field forming condition for reproducing the content based on the setting history information and the attribute information.

A sound field determination apparatus comprising:
means for analyzing content to be played back to acquire any two or more of the following feature amounts: the correlation coefficient between the L channel and the R channel of the audio channel of the content, the loudness of the sound indicated by the audio channel, the pitch of the sound indicated by the audio channel, and the temporal change characteristics of the sound indicated by the audio channel; and
means for determining a sound field forming condition for reproducing the content based on the feature amounts.