JP2010175908A

JP2010175908A - Musical piece extraction device

Info

Publication number: JP2010175908A
Application number: JP2009019402A
Authority: JP
Inventors: Hisatoshi Omae; 寿敏大前; Tomoji Yamamoto; 友二山本; Tatsuo Koga; 達雄古賀; Satoru Matsumoto; 悟松本
Original assignee: Sanyo Electric Co Ltd
Current assignee: Sanyo Electric Co Ltd
Priority date: 2009-01-30
Filing date: 2009-01-30
Publication date: 2010-08-12

Abstract

PROBLEM TO BE SOLVED: To provide a musical piece extraction device that avoids, as far as possible, splitting a musical piece section even if erroneous clustering temporarily occurs in the middle of that section.

SOLUTION: The musical piece extraction device, which extracts a musical piece section from voice data, includes: a voice data acquisition part that acquires the voice data; a classification part that classifies portions of the acquired voice data into musical piece portions and non-musical piece portions; and an identification part that identifies the start point and the end point of the musical piece section. The identification part detects as the start point a place where the classification result transitions from a non-musical piece portion to a musical piece portion, and detects as the end point a place where the result transitions from a musical piece portion to a non-musical piece portion. If no musical piece portion is detected within a predetermined period after the end point, that end point is confirmed as the end point of the musical piece section.

COPYRIGHT: (C)2010,JPO&INPIT

Description

The present invention relates to a music extraction device that extracts music sections from an audio signal.

The audio signal of a music program on television or radio generally contains a mixture of music sections and non-music sections. Examples of non-music sections include sections occupied by the speech of an MC [master of ceremony] or a DJ [disk jockey].

In this situation, some viewers and listeners want to record only the music sections of the audio signal. One option is to record the broadcast and cut out the desired music later by editing.

However, such editing is usually tedious, so a device that automatically extracts and records music sections would be convenient. Even a rough automatic extraction of music sections would reduce the editing burden. For example, Patent Document 1 discloses a device that discriminates between music and speech using frequency features such as MFCC.

The device disclosed in Patent Document 1 classifies (clusters) each location of the audio signal (the vicinity of each cut point) as music or speech. It identifies a location where the classification changes from speech to music as the start point of a music section, and a location where it changes from music to speech as the end point. The span from the start point to the end point, that is, a run of consecutive locations classified as music, is then extracted as a music section.

JP 2008-241850 A; JP 2007-219178 A

In a music extraction device of this kind, clustering should ideally always be accurate so that start and end points are identified correctly. In practice, however, clustering may be temporarily wrong for some reason, such as noise in the signal. The device should therefore be designed so that usability suffers as little as possible even when erroneous clustering occurs.

Consider a case where a location that is actually in the middle of a music section is temporarily misclustered, that is, clustered as non-music. A conventional music extraction device splits the music section at the misclustered location. As a result, what should be a single music section is extracted as several separate sections, which is undesirable from a usability standpoint.

In view of the above problem, an object of the present invention is to provide a music extraction device that avoids, as far as possible, splitting a music section even when erroneous clustering temporarily occurs in the middle of that section.

To achieve this object, the music extraction device according to the present invention extracts music sections from audio data and comprises: an audio data acquisition unit that acquires audio data; a classification unit that classifies portions of the acquired audio data as music portions or non-music portions; and an identification unit that identifies the start point and end point of a music section based on the classification result. The identification unit detects, as a start point, a location where the classification result transitions from a non-music portion to a music portion, and detects, as an end point, a location where it transitions from a music portion to a non-music portion. If no music portion is detected within a predetermined period after that end point, the identification unit confirms it as the end point of the music section.

With this configuration, even if erroneous clustering (classification) temporarily occurs in the middle of a music section, the end point of the section is not set incorrectly as long as correct clustering occurs within the predetermined period. Splitting of the music section can therefore be avoided as far as possible.

As described above, with the music extraction device according to the present invention, even if erroneous clustering (classification) temporarily occurs in the middle of a music section, the end point of the section is not set incorrectly as long as correct clustering occurs within the predetermined period. Splitting of the music section can therefore be avoided as far as possible.

FIG. 1 is a block diagram of a broadcast receiving apparatus according to an embodiment of the present invention. FIG. 2 is part of a flowchart of the music section extraction process. FIG. 3 is part of a flowchart of the music section extraction process. FIG. 4 is an explanatory diagram illustrating the music section extraction process.

An embodiment of the present invention is described below, taking as an example a broadcast receiving apparatus that receives FM radio broadcasts. FIG. 1 shows a block diagram of the apparatus. As shown in the figure, the broadcast receiving apparatus 1 includes an FM tuner unit 11, an A/D conversion unit 12, an MP3 Codec unit 13, a D/A conversion unit 14, a speaker 15, a DSP 16, a CPU 17, a memory 18, a bus 19, an HDD 20, an HDD-IF 21, and so on.

The FM tuner unit 11 performs channel selection on the FM radio broadcast (analog audio data) continuously input from the preceding stage (for example, an antenna) and passes the result to the subsequent stage. The A/D conversion unit 12 converts the analog audio data continuously supplied by the FM tuner unit 11 into a digital audio signal (hereinafter "PCM data") by PCM [Pulse Code Modulation] and outputs it to the subsequent stage. The PCM data digitally represents the loudness of the audio at each location of the analog audio data, that is, at each location sampled by PCM.

In this application, the term "location" in audio data refers to a position on the time axis of the audio content, that is, the timing at which a given sound occurs. Sounds at different locations in the same audio data are output at different times when the data is played back.

The MP3 Codec unit 13 has a function for compressing the PCM data output from the A/D conversion unit 12 by MP3 encoding and a function for decompressing (decoding) compressed PCM data. For example, it encodes the PCM data output from the A/D conversion unit 12 and the PCM data of music sections extracted by the DSP 16 described later. It also decodes PCM data that has been compression-encoded and recorded on the HDD 20.

The D/A conversion unit 14 converts the PCM data decoded by the MP3 Codec unit 13 into an analog audio signal. The speaker 15 produces sound from the audio signal supplied by the D/A conversion unit 14.

The DSP [Digital Signal Processor] 16 executes various processes according to a program stored in it in advance (the "audio data processing program") and the like. These include a process that extracts the section corresponding to a piece of music (a music section) from the PCM data and records it on the HDD 20 (the music section extraction process). The music section extraction process is described in detail later.

The CPU [Central Processing Unit] 17 controls the various processes executed in the broadcast receiving apparatus 1. The memory 18 temporarily stores audio signals and the like, and also stores various information used for processing in the broadcast receiving apparatus 1. The information stored in the memory 18 includes the music model and the non-music model used in the music section extraction process.

The "music model" is a model of the distribution of frequency-domain features for some piece of music or for music training data (for example, data created by combining representative pieces from several genres). In this application, "frequency-domain feature" is sometimes abbreviated to "frequency feature". One example of a frequency feature is MFCC [Mel Frequency Cepstral Coefficients]. The "non-music model" is a model of the distribution of frequency features for some non-music audio (for example, the speech of a DJ).

The MP3 Codec unit 13, the D/A conversion unit 14, the DSP 16, the CPU 17, and the memory 18 can access one another through the bus 19.

The HDD [Hard Disk Drive] 20 is a mass storage device that stores various information, such as PCM data encoded in MP3 format. The HDD 20 is connected to the MP3 Codec unit 13 via the HDD-IF 21 (for example, an ATA interface).

With this configuration, the broadcast receiving apparatus 1 can execute the music section extraction process mentioned above in addition to ordinary reception of FM radio broadcasts. The music section extraction process is described below with reference to the flowcharts in FIGS. 2 and 3.

The music section extraction process starts, for example, when the user instructs the apparatus to run it. When the process starts, the DSP 16 begins temporarily storing in the memory 18 the PCM data continuously output from the A/D conversion unit 12 (step S11). This storing continues until the music section extraction process ends.

In parallel, the following processing is executed, first to identify the start point of a music section. The DSP 16 examines the PCM data stored in the memory 18 in order from the earliest location and checks the amount of change in power, which represents how much the loudness changes. It detects a location where the change in power is at least a predetermined value (denoted Δp) as the first cut point (step S12). In other words, a location in the PCM data where the change in loudness is at least the predetermined value is detected as the first cut point.

More specifically, the DSP 16 calculates the difference between the power measured at the location of interest in the PCM data and the power measured at the location preceding it, and detects the location of interest as the first cut point when this difference is Δp or more. The power is calculated, for example, by taking the mean square of one or several samples. Instead of this difference, the change in power may be, for example, the time derivative of, or the correlation between, the power before and after the location. The later cut point detection steps (steps S15, S20, and S24) use the same method as step S12.
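
The cut point detection described above can be sketched as follows. This is a minimal illustration, not the implementation in the DSP 16: the function names, the fixed non-overlapping window, and the use of the absolute value of the power difference are all assumptions made for the example.

```python
def mean_square(samples):
    """Power of a block of PCM samples: the mean of the squared amplitudes."""
    return sum(s * s for s in samples) / len(samples)


def detect_cut_points(pcm, window, delta_p):
    """Return the start indices of windows whose power differs from the
    previous window's power by at least delta_p.

    pcm     -- sequence of PCM sample values
    window  -- samples per power estimate (assumed; the text says "one or several")
    delta_p -- threshold on the change in power (the Δp of the text)
    """
    cut_points = []
    prev_power = None
    for start in range(0, len(pcm) - window + 1, window):
        power = mean_square(pcm[start:start + window])
        # Assumption: a rise and a fall in power both count as a change.
        if prev_power is not None and abs(power - prev_power) >= delta_p:
            cut_points.append(start)
        prev_power = power
    return cut_points


# Silence, then a loud passage, then silence again.
pcm = [0] * 8 + [10] * 16 + [0] * 8
print(detect_cut_points(pcm, window=4, delta_p=50))
```

Here the power jumps from 0 to 100 at sample 8 and back to 0 at sample 24, so both positions are reported as cut points.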

When the first cut point is detected, the DSP 16 clusters (classifies) the PCM data at the first cut point as music or non-music (step S13). This clustering is performed, for example, as follows.

First, a predetermined region of the PCM data centered on the first cut point (for example, a region of one second) is divided into a predetermined number of parts (for example, 100), and the frequency feature of each part is calculated. Then, for each part, the likelihood A between the frequency feature and the music model and the likelihood B between the frequency feature and the non-music model are calculated, and it is determined which of A and B is larger.

The likelihoods are calculated, for example, by a known method using a GMM [Gaussian Mixture Model]. The higher the likelihood between a feature and a model, the more likely it is that the two match, or the more closely they resemble each other. The likelihood can also be obtained by substituting the frequency feature of each part into predetermined evaluation functions for music and non-music registered in the memory 18.
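
As an illustration of a GMM-based likelihood, the sketch below evaluates the log-likelihood of one feature vector under a diagonal-covariance Gaussian mixture. The diagonal covariance, the use of log-likelihoods, and the toy model parameters are assumptions for the example; the text does not specify the form of the music and non-music models.

```python
import math


def gmm_log_likelihood(x, weights, means, variances):
    """Log-likelihood of feature vector x under a diagonal-covariance GMM.

    weights   -- mixture weights, summing to 1
    means     -- one mean vector per component
    variances -- one per-dimension variance vector per component
    """
    log_terms = []
    for w, mu, var in zip(weights, means, variances):
        log_p = math.log(w)
        for xi, mi, vi in zip(x, mu, var):
            log_p += -0.5 * (math.log(2 * math.pi * vi) + (xi - mi) ** 2 / vi)
        log_terms.append(log_p)
    # Log-sum-exp over components, for numerical stability.
    peak = max(log_terms)
    return peak + math.log(sum(math.exp(t - peak) for t in log_terms))


# Hypothetical two-dimensional models standing in for the stored models.
music_model = ([0.5, 0.5], [[0.0, 0.0], [2.0, 2.0]], [[1.0, 1.0], [1.0, 1.0]])
non_music_model = ([1.0], [[10.0, 10.0]], [[1.0, 1.0]])

feature = [1.0, 1.0]
likelihood_a = gmm_log_likelihood(feature, *music_model)
likelihood_b = gmm_log_likelihood(feature, *non_music_model)
print(likelihood_a > likelihood_b)
```

Comparing log-likelihoods gives the same ordering as comparing the likelihoods themselves, so the larger of the two values still indicates the closer model.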

The proportion of parts for which likelihood A was determined to be greater than likelihood B is then calculated as the "degree of approximation to music" at the first cut point. The higher this degree, the more likely it is that the PCM data at the first cut point belongs to a music section.

It is then determined whether the degree of approximation is at least a preset determination threshold Ta. If it is Ta or more, the PCM data at the first cut point is clustered as music; if it is less than Ta, the PCM data at the first cut point is clustered as non-music. The later clustering steps (steps S21 and S25) use the same method.
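
The voting step described above, in which the per-part comparisons are turned into a single music or non-music decision, can be sketched as follows. The function names and the representation of each part as a pair (A, B) of likelihoods are assumptions for the example; how the likelihoods themselves are obtained is left out.

```python
def approximation_degree(likelihood_pairs):
    """Fraction of parts whose music likelihood A exceeds non-music likelihood B."""
    wins = sum(1 for a, b in likelihood_pairs if a > b)
    return wins / len(likelihood_pairs)


def classify_cut_point(likelihood_pairs, ta):
    """Cluster a cut point as music when the degree of approximation is Ta or more."""
    degree = approximation_degree(likelihood_pairs)
    return "music" if degree >= ta else "non-music"


# 100 parts around a cut point: 70 look more like music, 30 more like speech.
pairs = [(2.0, 1.0)] * 70 + [(1.0, 2.0)] * 30
print(approximation_degree(pairs))
print(classify_cut_point(pairs, ta=0.6))
print(classify_cut_point(pairs, ta=0.8))
```

With 70 of 100 parts favoring the music model, the cut point is clustered as music for a threshold of 0.6 and as non-music for a threshold of 0.8.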

If step S13 clusters the PCM data at the first cut point as non-music (N in step S14), the first cut point is presumed not to be the start point of a music section. The DSP 16 therefore discards the information on this first cut point and detects the next location in the PCM data (after the discarded first cut point) where the change in power is Δp or more as the new first cut point (step S15). The DSP 16 then executes step S13 again.

If, on the other hand, the PCM data at the first cut point is clustered as music (Y in step S14), the DSP 16 sets this first cut point as the start point of the music section (step S16). Instead of the first cut point itself, a predetermined location relative to it (for example, a location a predetermined time away from it) may be set as the start point. The following processing is then executed to identify the end point of the music section.
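
The start point search of steps S12 to S16 amounts to scanning the cut points in time order until one is clustered as music. A minimal sketch, assuming the cut point times and a classifier callback are already available (both names are hypothetical):

```python
def find_start_point(cut_points, is_music, start_offset=0.0):
    """Return the start point of the next music section, or None.

    cut_points   -- cut point times in ascending order
    is_music     -- callable returning True when a cut point is clustered as music
    start_offset -- optional shift relative to the cut point (step S16 variant)
    """
    for t in cut_points:
        if is_music(t):              # Y in step S14
            return t + start_offset  # step S16
        # N in step S14: discard this cut point and try the next (step S15).
    return None


# Hypothetical classification results keyed by cut point time in seconds.
labels = {1.0: False, 2.5: False, 4.0: True, 6.0: True}
print(find_start_point(sorted(labels), labels.get))
```

In this example the first two cut points are clustered as non-music and discarded, so the cut point at 4.0 s becomes the start point.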

The DSP 16 detects the next location in the PCM data (after the start point of the music section) where the change in power is Δp or more as the second cut point (step S20). When the second cut point is detected, the DSP 16 clusters the PCM data at the second cut point as music or non-music (step S21).

If the PCM data at the second cut point is clustered as music (N in step S22), the music section is considered to be still continuing at the second cut point. The DSP 16 therefore discards the information on this second cut point and returns to step S20. In this pass through step S20, the next location in the PCM data (after the discarded second cut point) where the change in power is Δp or more is detected as the new second cut point.

If, on the other hand, the second cut point is clustered as non-music (Y in step S22), the music section is, in principle, considered to have ended at the second cut point. However, it cannot be ruled out that the music section is actually still continuing and that the location was temporarily clustered as non-music for some reason, that is, misclustered. The following processing is therefore executed to judge more accurately whether the music section has ended at the second cut point.

First, the DSP 16 sets the second cut point as a temporary end point (step S23). It then detects the next location in the PCM data (after the temporary end point) where the change in power is Δp or more as the third cut point (step S24). When the third cut point is detected, the DSP 16 clusters the PCM data at the third cut point as music or non-music (step S25).

If the PCM data at the third cut point is clustered as music (N in step S26), it is presumed that the PCM data was only temporarily clustered as non-music at the temporary end point and that the music section is in fact still continuing, so the temporary end point should not become the end point of the music section. The DSP 16 therefore discards the information on this temporary end point and returns to step S21; that is, this temporary end point is not treated as the true end point.

If, on the other hand, the PCM data at the third cut point is clustered as non-music (Y in step S26), the DSP 16 determines whether the interval between the temporary end point and the third cut point is at least a predetermined period Ts (step S27). The period Ts is set in advance as the length of time (for example, a few seconds) for which clustering is expected to be unstable when, for example, transient noise enters the audio data.

If the interval is less than Ts (N in step S27), processing returns to step S24. In this pass through step S24, the information on the current third cut point is discarded, and the next location (after that third cut point) where the change in power is Δp or more is detected as the new third cut point.

If step S27 determines that the interval between the temporary end point and the third cut point is Ts or more (Y in step S27), the PCM data was never clustered as music for at least the period Ts after the temporary end point. It is therefore highly likely that the clustering at the temporary end point was not a temporary error (music wrongly clustered as non-music) and that the music section really has ended.

The DSP 16 therefore sets the temporary end point as the end point of the music section (step S28). Instead of the temporary end point itself, a predetermined location relative to it (for example, a location a predetermined time away from it) may be set as the end point.
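
The end point confirmation of steps S20 to S28 can be sketched as a small state machine over the cut points that follow the start point. This is an illustrative sketch under stated assumptions: the function name, the representation of each cut point as a (time, is_music) pair, and the sample data are all hypothetical, and after a temporary end point is discarded the next cut point is treated as a new second cut point.

```python
def find_end_point(cuts, ts, end_offset=0.0):
    """Return the confirmed end point of the music section, or None.

    cuts       -- (time, is_music) pairs for the cut points after the start point
    ts         -- period Ts during which no music may be detected
    end_offset -- optional shift relative to the temporary end point (step S28 variant)
    """
    temp_end = None
    for t, is_music in cuts:
        if temp_end is None:
            if not is_music:      # Y in step S22
                temp_end = t      # step S23: set the temporary end point
        elif is_music:            # N in step S26: the earlier result was a blip
            temp_end = None       # discard the temporary end point
        elif t - temp_end >= ts:  # Y in step S27
            return temp_end + end_offset  # step S28: confirm the end point
        # N in step S27: keep the temporary end point and examine the next cut point.
    return None


# Hypothetical cut points: brief non-music blips at 3 s and 7 s, real end at 11 s.
cuts = [(3, False), (4, True), (5, True), (6, True), (7, False), (8, True),
        (9, True), (10, True), (11, False), (12, False), (16, False)]
print(find_end_point(cuts, ts=3))
```

The blips at 3 s and 7 s are each followed by a cut point clustered as music, so they are discarded; only the temporary end point at 11 s survives for at least Ts without music and is confirmed.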

The processing so far has identified both the start point and the end point of a music section in the PCM data, and hence one music section. The DSP 16 extracts this music section from the PCM data temporarily stored in the memory 18; in other words, it obtains PCM data representing only the music section from the temporarily stored PCM data. The obtained PCM data is compression-encoded by the MP3 Codec unit 13 and then recorded on the HDD 20 as a single file (step S29). The broadcast receiving apparatus 1 can play back the recorded audio afterwards.

Besides obtaining PCM data that represents only the music section, the music section may be extracted by, for example, adding information that identifies the music section to the temporarily stored PCM data (which also contains parts outside the music section), such as by marking the start and end points of the section.

The DSP 16 then returns to step S12 and repeats the above series of steps to record the next music section. The music section extraction process ends when, for example, the user instructs it to end.

To make the music section extraction process easier to understand, a simplified example is described below with reference to FIG. 4.

In this example, the degree of approximation to music of the PCM data temporarily stored in the memory 18 after the process starts is assumed to follow the graph in FIG. 4. In the graph, the horizontal axis shows time (that is, the location in the PCM data) and the vertical axis shows the degree of approximation to music.

Locations A to N are the locations in the PCM data where the change in power is Δp or more, that is, the locations detected as cut points. The broken line 31 represents the determination threshold Ta, and the arrow 32 represents the period Ts.

In the PCM data of FIG. 4, the span from roughly the location marked X to the location marked Y is actually one continuous music section (a single music section). At the locations marked Z1 and Z2, however, the degree of approximation to music temporarily falls below the determination threshold Ta for some reason, even though these locations actually belong to the music section.

When the music section extraction process starts, location A is first detected as the first cut point (step S12). Because the PCM data at A is clustered as non-music (N in step S14), location B is next detected as the new first cut point (step S15).

The PCM data at B is also clustered as non-music (N in step S14), so location C is next detected as the new first cut point (step S15). The PCM data at C is clustered as music (Y in step S14), so C is set as the start point of the music section (step S16).

Location D is then detected as the second cut point (step S20). The PCM data at D is clustered as non-music (Y in step S22), so D is set as the temporary end point (step S23). Next, location E is detected as the third cut point (step S24). Because the PCM data at E is clustered as music (N in step S26), the temporary end point (D) does not become the end point of the music section, and location F is next detected as the new second cut point (step S20).

At location F the PCM data is again clustered as music (N in step S22), so location G, the next location where the change in power is Δp or more, is detected as a new second cut point (step S20). At location G the PCM data is also clustered as music (N in step S22), so location H, the next location where the change in power is Δp or more, is detected as a new second cut point (step S20).

At location H the PCM data is clustered as non-music (Y in step S22), so location H is set as the temporary end point (step S23). Next, location I is detected as the third cut point (step S24). At location I, however, the PCM data is clustered as music (N in step S26), so the temporary end point just set (location H) does not become the end point of the music section. Instead, location J is detected as a new second cut point (step S20).

At location J the PCM data is also clustered as music (N in step S22), so location K is then detected as a new second cut point (step S20). At location K the PCM data is again clustered as music (N in step S22), so location L is then detected as a new second cut point (step S20).

At location L the PCM data is clustered as non-music (Y in step S22), so location L is set as the temporary end point (step S23). Next, location M is detected as the third cut point (step S24). At location M the PCM data is clustered as non-music (Y in step S26), but, as the graph of FIG. 4 shows, the interval between the temporary end point (location L) and the third cut point (location M) is shorter than the period Ts (N in step S27).

Location N is therefore detected next as the third cut point (step S24), and at location N the PCM data is clustered as non-music (Y in step S26). This time the interval between the temporary end point (location L) and the third cut point (location N) is equal to or longer than the period Ts (Y in step S27). Location L, which had been detected as the temporary end point, is therefore set as the end point of the music section (step S28). As a result, as indicated by "this embodiment" at the bottom of FIG. 4, the span from C to L in the PCM data is extracted as a single music section and recorded in the HDD 20 (step S29).
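
The walkthrough above can be condensed into a small state machine. The following Python sketch is illustrative only: the times assigned to locations A to N and the value of Ts are assumed values chosen to match the order of events in FIG. 4 (with location N treated as non-music, as Y in step S26 requires), and all function and variable names are hypothetical rather than taken from the patent.

```python
def extract_sections(cut_points, ts):
    """Return (start, end) music sections from classified cut points.

    cut_points: list of (time, is_music) tuples in time order.
    ts: period that must pass after a temporary end point, without any
        cut point being classified as music, before the end is confirmed.
    """
    sections = []
    start = None      # start point of the section being tracked
    temp_end = None   # temporary end point, if one is pending
    for time, is_music in cut_points:
        if start is None:
            if is_music:                  # Y in S14 -> S16: set start point
                start = time
        elif temp_end is None:
            if not is_music:              # Y in S22 -> S23: temporary end point
                temp_end = time
        elif is_music:                    # N in S26: discard temporary end point
            temp_end = None
        elif time - temp_end >= ts:       # Y in S27 -> S28: confirm end point
            sections.append((start, temp_end))
            start = temp_end = None
    # Assumption: a section whose end is never confirmed is not reported.
    return sections


# Assumed times (seconds) for locations A..N of FIG. 4; True means "music".
FIG4_CUT_POINTS = [
    (0, False), (5, False), (10, True),                # A, B, C
    (22, False), (25, True), (50, True), (70, True),   # D, E, F, G
    (90, False), (93, True), (110, True), (130, True), # H, I, J, K
    (150, False), (155, False), (175, False),          # L, M, N
]

print(extract_sections(FIG4_CUT_POINTS, ts=20))  # [(10, 150)]
```

With these assumed values the brief non-music results at D and H are discarded, and the single section from C (10) to L (150) is reported once N arrives at least Ts after L.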

As described above, in the broadcast receiving apparatus 1, a temporary end point is set as the end point of the music section only if the PCM data is not clustered as music during the period Ts following that temporary end point. Consequently, even if the data is temporarily misclustered in the middle of a music section, the music section is not split at that location (locations D and H in the example above).

Suppose instead that music sections are extracted from the PCM data shown in the graph of FIG. 4 by the conventional method, in which a location where the result transitions from non-music to music is taken as the start point of a music section and a location where it transitions from music to non-music is taken as the end point (referred to as "Case 1"). In that case, as indicated by "Case 1" at the bottom of FIG. 4, the span from C to D, the span from E to H, and the span from I to L are each extracted as separate music sections.
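
For comparison, the conventional method of Case 1 can be sketched as follows. This is a minimal illustration, not the patent's implementation: the times assigned to locations A to N are assumed values that only preserve the order of events in FIG. 4, and the names are hypothetical.

```python
def extract_sections_conventional(cut_points):
    """Case 1: every music -> non-music transition ends a section immediately."""
    sections = []
    start = None
    for time, is_music in cut_points:
        if start is None and is_music:
            start = time                    # non-music -> music: start point
        elif start is not None and not is_music:
            sections.append((start, time))  # music -> non-music: end point
            start = None
    return sections


# Assumed times (seconds) for locations A..N of FIG. 4; True means "music".
FIG4_CUT_POINTS = [
    (0, False), (5, False), (10, True),                # A, B, C
    (22, False), (25, True), (50, True), (70, True),   # D, E, F, G
    (90, False), (93, True), (110, True), (130, True), # H, I, J, K
    (150, False), (155, False), (175, False),          # L, M, N
]

print(extract_sections_conventional(FIG4_CUT_POINTS))
# [(10, 22), (25, 90), (93, 150)]
```

The three tuples correspond to the spans C to D, E to H, and I to L, so the one actual music section is reported as three.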

Thus, in Case 1, what is really one music section is recognized as several separate music sections because of temporary misclustering. The broadcast receiving apparatus 1 according to the present embodiment does not split the section in this way and can therefore be said to offer better usability.

Now consider a method that applies processing in the same spirit as the end-point setting of the present embodiment to the setting of the start point as well (for example, a method in which the first cut point at which the clustering result transitions to music is set as the start point of the music section only if the data is not clustered as non-music before the period Ts has elapsed from that first cut point). Suppose music sections are extracted from the PCM data shown in the graph of FIG. 4 by this method (referred to as "Case 2").

In this case, the PCM data is clustered as non-music at location D, which falls within the period Ts starting at location C (the first cut point at which the clustering result transitions to music), so location C is not set as the start point of the music section. As a result, as indicated by "Case 2" at the bottom of FIG. 4, location E, which is considerably later than the true start of the music section, is set as the start point.
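
The difference between the two start-point rules can be sketched as below. The times for locations A to F and the value of Ts are assumed values chosen so that D falls within Ts of C, as the text describes. The function names are hypothetical, and returning `None` when no start point can be confirmed is likewise an assumption of this sketch.

```python
def find_start_embodiment(cut_points):
    """Embodiment: the first cut point classified as music is the start point."""
    for time, is_music in cut_points:
        if is_music:
            return time
    return None


def find_start_case2(cut_points, ts):
    """Case 2: a candidate start is accepted only if no cut point is
    classified as non-music before ts has elapsed from the candidate."""
    candidate = None
    for time, is_music in cut_points:
        if candidate is not None and time - candidate >= ts:
            return candidate      # ts elapsed with no non-music result
        if is_music:
            if candidate is None:
                candidate = time  # transition to music: new candidate
        else:
            candidate = None      # non-music within ts: reject candidate
    return None


# Assumed times (seconds) for locations A..F of FIG. 4; True means "music".
POINTS = [(0, False), (5, False), (10, True),  # A, B, C
          (22, False), (25, True), (50, True)] # D, E, F

print(find_start_embodiment(POINTS))    # 10 (location C)
print(find_start_case2(POINTS, ts=20))  # 25 (location E)
```

Under these assumptions the Case 2 rule rejects C because of the non-music result at D and settles on E, while the embodiment's rule keeps C.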

In the present embodiment, by contrast, the start point of the music section is set in the same way as in the conventional method: the location where the clustering result transitions to music becomes the start point. That is, once the clustering result transitions to music, that location is set as the start point of the music section regardless of the clustering results at later locations. Location C, which is close to the true start of the music section, is therefore set as the start point, so the broadcast receiving apparatus 1 according to the present embodiment can extract the music section more accurately and offers better usability.

Thus, in the broadcast receiving apparatus 1, the start point and the end point of the music section are set by processes based on different principles. Each is set through the process best suited to it, which improves usability as far as possible.

The music section extraction process may be performed in parallel with reception of the audio data (in real time) or after reception of the audio data has finished (after the fact). For example, a buffer may be provided between the A/D conversion section 12 and the DSP 16 so that the PCM data output from the A/D conversion section 12 is temporarily accumulated in the buffer and then transferred to the DSP 16.

In this way, instead of performing step S11 in the music section extraction process, the start point of the music section can be detected in real time and, as soon as it is detected, the PCM data from the start point onward can be stored sequentially in the memory 18. The end point of the music section can likewise be detected in real time so that PCM data after the end point is not stored in the memory 18. In other words, the parts of the PCM data outside the music section need not be recorded in the memory 18. As a result, the capacity of the memory 18 required for temporary storage of PCM data can be kept as small as possible.
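
One way to picture this real-time variant is the sketch below. It is an interpretation, not the patent's implementation: the stream format, the function name, and the choice to keep chunks while a temporary end point is pending and drop them once the end point is confirmed are all assumptions made for illustration.

```python
def record_section(stream, ts):
    """Store only the chunks of one music section from a live stream.

    stream yields (time, chunk, label), where label is True or False for a
    chunk at a cut point (music / non-music) and None for any other chunk.
    The returned list stands in for the contents of the memory 18.
    """
    stored = []
    recording = False
    temp_end_time = None  # time of the pending temporary end point
    temp_end_len = 0      # number of chunks stored before that point
    for time, chunk, label in stream:
        if not recording:
            if label is True:        # start point found: begin storing
                recording = True
                stored.append(chunk)
            continue                 # data before the start is never stored
        if label is True:
            temp_end_time = None     # music again: discard temporary end
        elif label is False:
            if temp_end_time is None:
                temp_end_time, temp_end_len = time, len(stored)
            elif time - temp_end_time >= ts:
                return stored[:temp_end_len]  # drop data after the end point
        stored.append(chunk)
    return stored  # stream ended before an end point was confirmed


labels = [False, None, True, None, False, True, None, False, False, None, False]
stream = [(t, f"c{t}", lab) for t, lab in enumerate(labels)]
print(record_section(stream, ts=3))  # ['c2', 'c3', 'c4', 'c5', 'c6']
```

In this example the chunks before the start point (c0, c1) are never stored, and the chunks from the confirmed end point onward (c7 and later) are discarded.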

In the music section extraction process, the PCM data is clustered only at the cut points, that is, at the locations where the change in power is Δp or more (steps S13, S21, and S25). This keeps the processing load of the clustering as low as possible. It is also known that at locations where audio data transitions from music to non-music, or from non-music to music, the change in power (the degree of change in loudness) tends to be relatively large. The PCM data may nevertheless also be clustered at locations other than the cut points (for example, uniformly at a fixed interval).
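
A cut-point detector along these lines might look like the sketch below. How power and its change are computed is not stated in this passage, so the mean-square power per frame and the absolute difference between adjacent frames used here are assumptions, as are the function names and the sample values.

```python
def frame_power(samples):
    """Mean-square power of one frame of PCM samples (assumed definition)."""
    return sum(s * s for s in samples) / len(samples)


def find_cut_points(frames, delta_p):
    """Indices of frames whose power differs from the previous frame's
    power by delta_p or more; only these would be passed to clustering."""
    powers = [frame_power(f) for f in frames]
    return [i for i in range(1, len(powers))
            if abs(powers[i] - powers[i - 1]) >= delta_p]


frames = [[1, 1], [1, -1], [3, 3], [3, 3], [0, 0]]  # powers: 1, 1, 9, 9, 0
print(find_cut_points(frames, delta_p=5))           # [2, 4]
```

Only two of the five frames are flagged, so the classifier would run twice instead of five times on this input.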

As described above, the broadcast receiving apparatus 1 according to the embodiment of the present invention includes a functional unit that acquires audio data (audio data acquisition unit), a functional unit that classifies each part of the acquired audio data as a music part (a part that is music) or a non-music part (a part that is not music) (classification unit), and a functional unit that specifies the start point and end point of a music section based on the result of the classification (specifying unit). The specifying unit detects, as a start point, a location where the classification result transitions from a non-music part to a music part, and detects, as an end point (corresponding to the temporary end point described above), a location where it transitions from a music part to a non-music part. If no music part is detected in a predetermined period after that end point, the specifying unit confirms it as the end point of the music section.

Viewed another way, the broadcast receiving apparatus 1 according to the embodiment of the present invention has a function of acquiring audio data; a function of classifying each location in the audio data (each location where the change in power is Δp or more) as music or non-music, in order from the beginning; a function of specifying the start point of a music section in the audio data based on the result of the classification (start point specifying function); and a function of specifying the end point of that music section in the audio data based on the result of the classification (end point specifying function). It extracts the music section from the audio data.

The end point specifying function detects, as a temporary end point, a location in the audio data where the classification result transitions to non-music (that is, a location classified as non-music whose immediately preceding location was classified as music). Only if the audio data is not classified as music in the determination period Ts after the temporary end point does it specify a predetermined location, determined with reference to the temporary end point, as the end point.

Therefore, even if the data is temporarily misclustered in the middle of a music section, the end point of the music section is not set incorrectly as long as the data is clustered correctly within the predetermined period. Splitting of the music section can thus be avoided as far as possible.

The start point specifying function detects a location in the audio data where the classification result transitions to music (that is, a location classified as music whose immediately preceding location was classified as non-music) and specifies a predetermined location, determined with reference to that location, as the start point. The start point of the music section can therefore be specified more appropriately than with the music extraction device of Case 2 described above.

The audio data processing program described earlier can be regarded as a program that causes the DSP 16 (which can be viewed as a kind of computer) to function as: clustering means for classifying each location in the audio data as music or non-music, in order from the beginning; start point specifying means for specifying the start point of a music section in the audio data based on the result of the classification; end point specifying means for specifying the end point of the music section in the audio data based on the result of the classification; and extraction means for extracting the music section from the audio data.

The start point specifying means detects a location in the audio data where the classification result transitions to music and specifies a predetermined location, determined with reference to that location, as the start point. The end point specifying means detects, as a temporary end point, a location in the audio data where the classification result transitions to non-music and, only if the audio data is not classified as music in a predetermined period after the temporary end point, specifies a predetermined location, determined with reference to the temporary end point, as the end point.

An embodiment of the present invention has been described above, but the present invention is not limited to it. The present invention can be implemented with various modifications without departing from its spirit.

The present invention can be used in fields such as receiving apparatuses that receive audio data.

1 Broadcast receiving apparatus (music extraction device)
11 FM tuner section
12 A/D conversion section
13 MP3 Codec section
14 D/A conversion section
15 Speaker
16 DSP
17 CPU
18 Memory
19 Bus
20 HDD
21 HDD-IF

Claims

A music extraction device that extracts a music section from audio data, comprising:
an audio data acquisition unit that acquires the audio data;
a classification unit that classifies each part of the acquired audio data as a music part or a non-music part; and
a specifying unit that specifies the start point and end point of the music section based on the result of the classification,
wherein the specifying unit
detects, as a start point, a location where the result of the classification transitions from a non-music part to a music part, and detects, as an end point, a location where it transitions from a music part to a non-music part, and
determines the end point as the end point of the music section when no music part is detected in a predetermined period after the end point.