JP2005284041A

JP2005284041A - Method for distributing contents, contents distribution server and contents receiver

Info

Publication number: JP2005284041A
Application number: JP2004099137A
Authority: JP
Inventors: Takeya Fujii; 毅也藤井
Original assignee: Victor Company of Japan Ltd
Current assignee: Victor Company of Japan Ltd
Priority date: 2004-03-30
Filing date: 2004-03-30
Publication date: 2005-10-13

Abstract

PROBLEM TO BE SOLVED: To provide a contents distributing method capable of transmitting contents while dynamically switching tracks and of smoothly reproducing the switched position, together with a contents distribution server and a contents receiver that use the method.

SOLUTION: When a transmission control part 32 transmits a contents reproduction request message together with the file name of the contents data, a track number, and a contents reproducing range, and a transmission control part 22 receives the reproduction request message, a contents acquisition part 21 acquires the contents element of the specified track in accordance with the file name, the track number, and the contents reproducing range. When the transmission control part 32 transmits a track switching request message together with a track number, and the transmission control part 22 receives the switching request message, the contents acquisition part 21 acquires, in accordance with the track number, the contents element of the specified track whose time information is continuous with the time information included in the contents element already sent.

COPYRIGHT: (C)2006, JPO&NCIPI

Description

The present invention relates to a content distribution method for a content distribution system that comprises a content distribution server, which transmits content data composed of a plurality of tracks over a network, and a content receiving device, which receives and plays back that content data. The method allows tracks to be switched dynamically during transmission and the switched portion to be played back smoothly. The invention also relates to a content distribution server and a content receiving device that use the method.

Karaoke systems known as communication karaoke systems are now in general use. Most of them adopt a download scheme in which song data containing timbre and score information, in a format similar to MIDI (Musical Instrument Digital Interface), is received before playback. Compared with sampling-based digital audio data such as PCM (Pulse Code Modulation), MIDI data is only about 1/10 to 1/100 the size, which makes it well suited to data distribution over narrowband channels such as public telephone lines; this is why it is widely used in communication karaoke systems. The content receiving device provides the service to the user by playing the received song data (MIDI data) on a MIDI sound source device that is equipped in advance with timbres corresponding to a wide variety of instruments.

Karaoke systems also need a key control function, which adjusts the pitch of a song to suit the user's voice so that it is comfortable to sing. In existing MIDI-based communication karaoke systems, key control is very easy to implement: the score information contained in the MIDI data is simply transposed.
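To illustrate why transposing score information is cheap, the following is a minimal sketch. It assumes the score can be reduced to a list of MIDI note numbers (0 to 127); the function name `transpose` and the clamping at the range limits are illustrative choices, not something the source specifies.

```python
def transpose(notes, semitones):
    """Shift each MIDI note number by `semitones`, clamped to the valid 0..127 range."""
    return [min(127, max(0, note + semitones)) for note in notes]


# Hypothetical four-note phrase, raised by two semitones (one whole tone).
melody = [60, 62, 64, 65]
print(transpose(melody, 2))
```

The whole operation is one integer addition per note, which is the contrast the text draws with pitch shifting of sampled audio.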

However, MIDI-based communication karaoke systems have run into various problems as demand for higher quality has grown in recent years. For example, because score information is the only part that can be manipulated through MIDI, improving the sound quality of the songs requires replacing or modifying the sound source in every content receiving device. For a communication karaoke system whose users (or venues) number in the hundreds of thousands of households, this entails an enormous cost.

In recent years there has also been growing demand in communication karaoke systems for backing choruses and for partner vocals in duets (for example, a partner part sung by a woman when the user is a man). As noted above, however, MIDI consists mainly of score information, so the backing-chorus or partner-vocal data has had to be transmitted by some other means, matched to the song.

Next, a communication karaoke system that applies a streaming system is described.

In recent years, streaming systems, in which the receiving side plays back content while receiving it instead of first storing all of the song data, have come into wide use on the Internet. A typical streaming system usually handles playback content that has been digitized with a high-efficiency coding scheme such as MPEG (Moving Picture Experts Group).

For audio, for example, applying high-efficiency coding such as MPEG Audio Layer 2, Layer 3, or AAC (Advanced Audio Coding) to PCM digital data compresses the data volume to roughly 1/3 to 1/15, which has made real-time transmission possible over recent broadband channels such as ADSL (Asymmetric Digital Subscriber Line).

Because a streaming system does not need to store the entire playback content in the content receiving device, it can quickly deliver high-efficiency-coded song data, which is very close to the original sound but larger than MIDI data, to the content receiving device. This makes it possible to offer users song data that includes backing chorus or partner vocals at a quality close to the original, contributing to a higher-quality service.

However, when a key control function is added to a communication karaoke system based on streaming, the amount of computation becomes a problem. With MIDI, the source is score information, so transposition is as simple as rewriting the score. With song data such as PCM or MPEG Audio, by contrast, an effect equivalent to transposition has to be obtained by exploiting the characteristics of the audio signal.

For example, Japanese Unexamined Patent Application Publication No. H9-185392 (Patent Document 1) discloses a pitch conversion device based on frequency transformation. This amounts to pitch shifting in the frequency domain and requires an enormous amount of computation compared with MIDI transposition. Consequently, in a communication karaoke system based on streaming, the content receiving device has had to be equipped with pitch conversion means capable of performing that computation.

The asynchronous model of streaming systems used for real-time distribution of video, audio, and the like is described next.

In data links built on the exchange of small transmission frames, of which Ethernet (registered trademark) (IEEE 802.3) is representative, there is in most cases no synchronization clock shared between two communication devices connected end to end. Nor is there a synchronization clock in the network protocols layered above: IP (Internet Protocol, RFC 791, etc.), which runs over the data link to enable packet exchange among multiple networks, or TCP (Transmission Control Protocol, RFC 793) and UDP (User Datagram Protocol, RFC 768), which run over IP and use IP packets to manage sessions (communication paths).

Consequently, there is no guarantee as to when a transmission frame sent by one communication device will reach the destination device. The delay between sending and receiving a frame is not constant either; it fluctuates over time (jitter). Frames may also be lost in transit, and even when multiple frames are sent in sequence along a time axis, their order of arrival is not guaranteed.

A streaming system, incidentally, is a general term for those systems that receive and play back content over a network in which the receiving side can play the content while receiving it, without storing all of it. Because video, audio, and the like are real-time content, playback synchronization control, such as playing a fixed number of frames per second (for example, 29.97 frames per second), is essential, and a synchronization clock is therefore desirable. In TV broadcasting, for example, the NTSC TV signal carries a synchronization clock called the frame synchronization signal, and the receiving side can use the sender's clock to play back with very high accuracy.

In a streaming system over an IP network, however, no such synchronization clock exists, and the sending and receiving sides cannot be synchronized. The playback mechanism therefore has to be built on an asynchronous model, that is, a configuration and programming approach in which each device operates with its own system clock as the reference.

FIG. 9 is a functional block diagram of the content distribution server 102a and the content receiving device 103a in the content distribution system 101a (asynchronous model). The content distribution server 102a and the content receiving device 103a are connected to each other through the IP network 104.

The content distribution server 102a has a content acquisition unit 1021, a transmission unit 1022, a system clock oscillation unit 1023, and a content storage unit 1024.

The content acquisition unit 1021 records the content data of the content to be played back in the content storage unit 1024 and acquires content segments using an arbitrary file pointer (a read position within the content). Here, a file pointer is any one of the content file name, the track number, and the frame number, or a combination of several of them. A "content segment" is a piece of the actual content data of arbitrary length.

The content acquisition unit 1021 reads content segments in accordance with the system clock signal. It may hold internal initial values for the content file name, track number, frame number, and so on, and start reading automatically at startup, or it may start reading after obtaining the content file name, track number, and content playback start time from an external input. An external input that accepts commands may also be provided, so that playback starts when the command equals the playback start constant (for example, "PLAY") and stops when it equals the playback stop constant (for example, "STOP"). Alternatively, the unit may determine automatically whether to use the automatic read start or the externally triggered read start.

The system clock oscillation unit 1023 is a real-time clock (a high-precision clock) implemented with a crystal oscillator or the like, and supplies the system clock signal to the content acquisition unit 1021.

The transmission unit 1022 secures a virtual communication path called a session, packetizes each content segment by adding an appropriate header, and transmits the packets to the content receiving device 103a through the IP network 104. The header contains, for example, a network time stamp.

When transmitting a packet, the transmission unit 1022 uses a transmission address to deliver the packet to the content receiving device 103a. The transmission address is a pair consisting of an address and a transmission port number. It may be held in advance as a constant inside the transmission unit 1022, or it may be accepted as input from another block. The session (communication path) for transmitting packets is established dynamically once the transmission address has been determined. A content segment and a packet do not necessarily have to correspond one to one.

The content receiving device 103a has a receiving unit 1031, a buffer unit 1032, a system clock oscillation unit 1033, a content reading unit 1034, and a playback unit 1035.

The receiving unit 1031 receives packets from the IP network 104, interprets their headers, extracts the content segments, and supplies them to the buffer unit 1032. It uses a reception address so that, of the packets sent by the content distribution server 102a, it receives only those it needs. The reception address is a pair consisting of an address and a reception port number. It may be held in advance as a constant inside the receiving unit 1031, or it may be accepted as input from another block.

The buffer unit 1032 temporarily stores content segments together with the network time stamp (described later) obtained from the header.

The content reading unit 1034 monitors the buffer unit 1032. Once it judges that enough content segments have accumulated for playback, it starts reading content segments in accordance with the system clock signal from the system clock oscillation unit 1033, reassembles the set of content segments into content, and outputs it.

The playback unit 1035 decodes the input content segments as appropriate to their type and outputs audio and video signals to a given output device such as a speaker. For example, when the content segments are high-efficiency-coded digital audio data of the MPEG kind, the playback unit 1035 comprises a high-efficiency-code decoder, a D/A converter, an analog amplifier, and so on.

The first feature of the content distribution system 101a (asynchronous model) shown in FIG. 9 is that the content distribution server 102a and the content receiving device 103a each have their own system clock oscillation unit, 1023 and 1033 respectively. Although both system clock frequencies need to be quite accurate, the IP network 104 that serves as the transmission path provides no mechanism for transmitting a synchronization clock.

The second feature is that the content receiving device 103a has the buffer unit 1032, and the content reading unit 1034 is configured so that the buffer unit 1032 never runs empty. As a result, even when the playback unit 1035 plays real-time content such as video and audio, normal playback continues while the packet delay and delay fluctuation (jitter) introduced by the IP network 104 are absorbed.
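The buffering behavior described above can be sketched roughly as follows. This is an illustrative model only: the class name `JitterBuffer`, the `min_depth` threshold standing in for "enough segments accumulated", and the use of a heap to reorder by network time stamp are assumptions, not details given in the source.

```python
import heapq


class JitterBuffer:
    """Holds (timestamp, segment) pairs and releases them in timestamp order."""

    def __init__(self, min_depth):
        self.min_depth = min_depth  # segments required before playback starts (assumed policy)
        self._heap = []
        self._started = False

    def push(self, timestamp, segment):
        # Packets may arrive late or out of order; the heap restores timestamp order.
        heapq.heappush(self._heap, (timestamp, segment))

    def pop(self):
        # Hold back output until the initial fill level is reached.
        if not self._started:
            if len(self._heap) < self.min_depth:
                return None
            self._started = True
        if not self._heap:
            return None  # underrun: nothing left to play
        return heapq.heappop(self._heap)


buf = JitterBuffer(min_depth=2)
buf.push(3003, b"frame-2")  # arrives first despite being later in time
buf.push(0, b"frame-1")
print(buf.pop())  # the earliest timestamp is released first
```

A larger `min_depth` tolerates more jitter at the cost of a longer startup delay, which is the trade-off the reading unit has to manage.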

This concludes the overview of the asynchronous model of streaming systems used for real-time distribution of video, audio, and the like.

Next, playback time management in a streaming system is described.

First, the method of assigning content playback times is described.

In the asynchronous model described above, for content to be played back at the correct times as continuous media, the content playback time must be transmitted in association with each content segment. In FIG. 9, for example, for the content reading unit 1034 to pick out the content segment corresponding to the video at 0 seconds, frame 0, and output it to the playback unit 1035, a content playback time indicating that the segment is the data for 0 seconds, frame 0, is required. Without it, any delay that occurs on the network would translate directly into a shift in playback timing.

Streaming systems often use RTP (Real-time Transport Protocol, RFC 1889) to attach a content playback time to each content segment (Audio-Video Transport Working Group, "RTP: A Transport Protocol for Real-Time Applications", RFC 1889, Internet Engineering Task Force, Jan 1996). In RTP, the content segment is the payload, and it is preceded by an RTP header of at least 96 bits. Of those, 32 bits are assigned to a network time stamp called the RTP timestamp, which is a transformed representation of the content playback time information.
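As a concrete illustration of the 96-bit header, the sketch below packs and parses the fixed RTP header fields (version, flags, payload type, 16-bit sequence number, 32-bit timestamp, 32-bit SSRC) with Python's `struct` module. The helper names, the payload type 96, and the sample values are assumptions chosen for the example; header extensions and CSRC lists are omitted.

```python
import struct

# Fixed RTP header: 2 flag bytes, 16-bit sequence number,
# 32-bit timestamp, 32-bit SSRC, all in network byte order.
RTP_HEADER = struct.Struct("!BBHII")  # 12 bytes = 96 bits


def build_rtp_packet(payload, seq, timestamp, ssrc, payload_type=96, marker=False):
    """Prefix `payload` with a minimal RTP header (no padding, extension, or CSRCs)."""
    byte0 = 2 << 6  # version 2 in the top two bits
    byte1 = (int(marker) << 7) | (payload_type & 0x7F)
    header = RTP_HEADER.pack(byte0, byte1, seq & 0xFFFF,
                             timestamp & 0xFFFFFFFF, ssrc & 0xFFFFFFFF)
    return header + payload


def parse_rtp_timestamp(packet):
    """Return the 32-bit RTP timestamp from the fixed header."""
    _, _, _, timestamp, _ = RTP_HEADER.unpack_from(packet)
    return timestamp


packet = build_rtp_packet(b"segment", seq=1, timestamp=3003, ssrc=0x1234)
print(RTP_HEADER.size * 8, parse_rtp_timestamp(packet))
```

The receiver only needs the timestamp field and the agreed timestamp frequency to recover the playback time of the payload.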

An RTP timestamp is mapped to real time by the RTP timestamp frequency, which is defined in advance for each type of content. The RTP timestamp frequency is also called the network time stamp frequency. For content digitized with the high-efficiency coding schemes defined by MPEG, 90000 (90 kHz) is commonly used as the RTP timestamp frequency. This means that the RTP timestamp increases by 90000 for every second of content read. In other words, the following relation holds among a network time stamp of the RTP timestamp kind, the network time stamp frequency, and the content playback time.

[Equation 1]
Content playback time = network time stamp / network time stamp frequency

For example, for MPEG-2 video at 29.97 fps, if the RTP timestamp of the first frame is "0", then the second frame is "3003" and the 31st frame, 30 frame intervals later, is "90090".
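Equation 1 and the video figures can be checked with a short sketch. It assumes that "29.97 fps" means the exact NTSC rate 30000/1001, which is what yields 3003 ticks per frame at 90 kHz, and that frames are numbered from 1 with the first frame at timestamp 0; the function names are illustrative.

```python
from fractions import Fraction


def playback_time(timestamp, frequency):
    """Equation 1: content playback time in seconds, kept exact as a fraction."""
    return Fraction(timestamp, frequency)


def video_timestamp(frame_number, frequency=90000, fps=Fraction(30000, 1001)):
    """Timestamp of a 1-based frame number when the first frame has timestamp 0."""
    ticks_per_frame = Fraction(frequency) / fps  # 3003 for the default values
    return int((frame_number - 1) * ticks_per_frame)


print(video_timestamp(2))                   # one frame interval
print(video_timestamp(31))                  # thirty frame intervals
print(float(playback_time(90090, 90000)))   # seconds corresponding to timestamp 90090
```

Under this numbering, timestamp 90090 is reached after 30 frame intervals and corresponds to 1.001 seconds, slightly more than one second because the frame rate is just under 30.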

For MPEG-4 Audio AAC, one frame is fixed at 1024 samples, and the PCM sampling frequency of the source is used as the RTP timestamp frequency. For example, with a PCM sampling frequency of 44100 Hz (44.1 kHz PCM = CD audio), the RTP timestamp frequency is 44100; the RTP timestamp of the first frame is "0", that of the second frame is "1024", and that of the 44th frame, after 43 frames or roughly one second of audio, is "44032".
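The audio case follows the same pattern with a fixed increment of 1024 per frame. The sketch below assumes 1-based frame numbers with the first frame at timestamp 0; the function names are illustrative.

```python
SAMPLES_PER_FRAME = 1024  # fixed frame length stated for MPEG-4 Audio AAC


def aac_timestamp(frame_number):
    """RTP timestamp of a 1-based AAC frame number; the first frame is 0."""
    return (frame_number - 1) * SAMPLES_PER_FRAME


def aac_playback_time(frame_number, sampling_rate=44100):
    """Playback time in seconds, using the sampling rate as the timestamp frequency."""
    return aac_timestamp(frame_number) / sampling_rate


print(aac_timestamp(2))       # one frame elapsed
print(aac_timestamp(44))      # forty-three frames elapsed
print(aac_playback_time(44))  # just under one second
```

The value 44032 is reached once 43 frames have elapsed, which at 44100 Hz is about 0.998 seconds.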

RTP has been used as the example here, but other packet formats that let the receiving side compute the content playback time from a network time stamp and its corresponding network time stamp frequency also exist and are entirely commonplace. For example, MPEG-2 Transport Stream, the transmission standard specified by MPEG, defines a timestamp called the Presentation Time Stamp that serves exactly the same purpose. This concludes the overview of how content playback times are assigned.

Next, the data structure of the file in which the content managed by the content distribution server 102a and transmitted to the content receiving device is recorded (hereinafter, the content file) is described.

FIG. 10 shows an example of the data structure of the content data managed by the content acquisition unit 1021. As shown in FIG. 10, the content data consists of a content file name, the total number of tracks T, and the track data of each track. The track data of each track in turn consists of a track number, a track time stamp frequency, a total frame count, and a plurality of data frames, each made up of a frame number, time information, and a content segment.

The content file name is a name that uniquely identifies, within the content storage unit, the file that stores the content data. The total number of tracks T records how many tracks are contained in one item of content data. A track is one of the multiple pieces of content stored in a single file. Any track (piece of content) can be selected by its track number t (1 ≤ t ≤ T), and data can be retrieved from any position by using the time information as a pointer. The track time stamp frequency recorded for each track indicates the increment of the time information per unit time.

Each track also records its total frame count, that is, the total number of data frames; this count need not be the same for every track. Each data frame is assigned a frame number in chronological order and time information expressed relative to the track time stamp frequency.
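The layout described for FIG. 10 can be modeled with a few record types. This is a sketch for orientation only: the class and field names are hypothetical, and the totals are derived from list lengths rather than stored, which is a simplification of the described format.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Frame:
    frame_number: int  # assigned in chronological order, starting at 1
    time_info: int     # in units of the track time stamp frequency
    segment: bytes     # the content segment itself


@dataclass
class Track:
    track_number: int
    timestamp_frequency: int
    frames: List[Frame] = field(default_factory=list)

    @property
    def total_frames(self):
        return len(self.frames)


@dataclass
class ContentData:
    file_name: str
    tracks: List[Track] = field(default_factory=list)

    @property
    def total_tracks(self):
        return len(self.tracks)

    def track(self, t):
        """Select a track by its number t, where 1 <= t <= total_tracks."""
        for candidate in self.tracks:
            if candidate.track_number == t:
                return candidate
        raise KeyError(t)


content = ContentData("hoge.mp4", [
    Track(1, 44100, [Frame(1, 0, b"a"), Frame(2, 1024, b"b")]),
    Track(2, 44100, [Frame(1, 0, b"c")]),  # tracks may differ in frame count
])
print(content.total_tracks, content.track(2).total_frames)
```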

A data structure like this, in which multiple pieces of content are multiplexed into one item of content data, is entirely commonplace among file formats for storing audio, video, and the like. Formats differ in the details: in some, the track time stamp frequency is a system constant and is not stored in the data, and in others, frames that share the same time information are indexed together to save space. What they have in common is that frame numbers and time information are held in a mutually convertible form.

Examples of such file formats include Microsoft AVI, the Apple Computer QuickTime format, and the ISO/IEC 14496-1 MP4 file format. In the Apple Computer QuickTime format and ISO/IEC 14496-1 MP4 files, a Data Reference Atom can be used to keep all or part of the content of multiple tracks in separate, external content data. Even then, the frame numbers and time information are recorded in mutually convertible form in the main content data, so such files may be regarded as having a data structure equivalent to the content data described here. This concludes the overview of the data structure of content data in the content distribution server 102a.

Next, the playback time management processing (sending side) performed by the content acquisition unit 1021 when reading content data is described with reference to the flowchart in FIG. 11.

First, when the content receiving device 103a issues a content playback request, the content acquisition unit 1021 of the content distribution server 102a initializes the following six variables (step S101).

1: Content file name CF = default content file name (for example, "hoge.mp4")
2: Track number TN = default track number (for example, "1")
3: Start frame number FN = default frame number (for example, "1")
4: Playback start time STS = current time NTS obtained from the system clock oscillation unit 1023
5: Content playback time CTS = "0"
6: Playback state flag F = Boolean value "N"

Next, the content acquisition unit 1021 refers to a predetermined automatic mode constant, which indicates whether playback should start automatically, and determines whether to perform automatic playback (step S102). If the automatic mode constant is the Boolean value "Y", processing moves to step S110; if it is "N", processing moves to step S103.
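The six-variable initialization of step S101 can be sketched as a small state record. The class name `PlaybackState`, the field names, and the use of `time.monotonic()` as a stand-in for the system clock oscillation unit are assumptions; the default values are the examples given in the list above.

```python
import time
from dataclasses import dataclass


@dataclass
class PlaybackState:
    cf: str = "hoge.mp4"   # content file name CF (example default)
    tn: int = 1            # track number TN
    fn: int = 1            # start frame number FN
    sts: float = 0.0       # playback start time STS
    cts: float = 0.0       # content playback time CTS
    playing: bool = False  # playback state flag F ("N")


def initialize_state(now=None):
    """Step S101: reset the state; STS takes the current clock reading NTS."""
    nts = time.monotonic() if now is None else now  # stand-in for the system clock
    return PlaybackState(sts=nts)


state = initialize_state(now=100.0)
print(state)
```

Passing `now` explicitly keeps the sketch deterministic; a real server would read its own clock source and express it in whatever unit the system clock frequency defines.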

If the automatic mode constant was "N" in step S102, the content acquisition unit 1021 determines whether there is an external input, which is a set of four variables: command input CMD, content file name CF1, track number TN1, and content playback start time CTS1 (step S103). If there is an external input, the result is the Boolean value "Y" and processing moves to step S104; if there is none, the result is "N" and processing moves to step S106.

If there was an external input in step S103, the content acquisition unit 1021 determines whether the command input CMD of the external input equals the playback stop constant (step S104). If it does, the result is "Y" and the entire operation stops and ends; otherwise the result is "N" and processing moves to step S105.

If the result of step S104 was "N", the content acquisition unit 1021 initializes five variables using the three variables of the external input (content file name CF1, track number TN1, and content playback start time CTS1) (step S105).

1: Content file name CF = content file name CF1
2: Track number TN = track number TN1
3: Playback start time STS = current time NTS obtained from the system clock oscillation unit 1023
4: Content playback time CTS = content playback start time CTS1
5: Playback state flag F = Boolean value "Y"

The content acquisition unit 1021 then initializes the start frame number FN using the content playback time CTS. As a preliminary step, it calculates the transmission start time information StartTS by the following formula.

[Equation 2]
StartTS = CTS × TRS

Here, the track time stamp frequency TRS is that of the track indicated by the track number TN in the content file named CF.

Next, the content acquisition unit 1021 searches the frames of the track indicated by the track number TN in the content file named CF, in order from frame number 1, finds the frame number NFN of the first frame that satisfies time information ≧ StartTS, and sets the start frame number FN to NFN.
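The StartTS calculation and the start frame search can be sketched as follows. This is a minimal illustration, not the implementation of the content acquisition unit 1021: the function names, the representation of a track as a list of per-frame time information values, and the `None` return when no frame qualifies are all assumptions made for the example.

```python
from typing import Optional, Sequence


def start_timestamp(cts_seconds: float, trs_hz: int) -> int:
    """Equation 2: StartTS = CTS x TRS, expressed in track timestamp ticks."""
    return round(cts_seconds * trs_hz)


def find_start_frame(frame_times: Sequence[int], start_ts: int) -> Optional[int]:
    """Return the 1-based number of the first frame with time information >= start_ts.

    frame_times[i] is the time information of frame number i + 1 (assumed layout).
    """
    for frame_number, time_info in enumerate(frame_times, start=1):
        if time_info >= start_ts:
            return frame_number
    return None  # assumption: the text does not define this case for the start search


# Hypothetical track: 48000 Hz timestamps, 1024 ticks per frame.
frame_times = [0, 1024, 2048, 3072]
start_ts = start_timestamp(0.03, 48000)   # 0.03 s into the content
fn = find_start_frame(frame_times, start_ts)
```

With these sample values, StartTS is 1440 ticks, so the search settles on frame 3, the first frame at or after that point.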

If the result of step S103 is “N” (there is no external input), the content acquisition unit 1021 determines whether the reproduction state flag is “Y” (step S106). If it is “Y”, the process proceeds to step S107; if it is “N”, the process returns to step S103.

ステップＳ１０６の処理において再生状態フラグの値が真偽値“Ｙ”であった場合、またはステップＳ１０５から継続する処理の場合には、コンテンツ取得部１０２１は、システムクロック発振部１０２３から得た現在時刻NTSと再生開始時刻STSとシステムクロック周波数SSから経過時間を算出し、経過時間とコンテンツ再生時刻CTSとトラックタイムスタンプ周波数TRSから、送出可能時刻情報LastTSを算出することによって、送信可能範囲を検索する（ステップＳ１０７）。送出可能時刻情報LastTSの算出式を以下に示す。 In the case where the value of the reproduction state flag is the true / false value “Y” in the process of step S106, or in the case of the process continued from step S105, the content acquisition unit 1021 obtains the current time obtained from the system clock oscillation unit 1023. The elapsed time is calculated from NTS, playback start time STS, and system clock frequency SS, and the transmittable range is searched by calculating sendable time information LastTS from the elapsed time, content playback time CTS, and track timestamp frequency TRS. (Step S107). The formula for calculating the sendable time information LastTS is shown below.

[Equation 3]
LastTS = (NTS − STS) / SS × TRS

Here, the system clock frequency SS is a constant for converting (current time NTS − reproduction start time STS) into seconds. For example, if the system clock has a resolution of 1/1000000 second, SS is 1000000.

Next, the content acquisition unit 1021 searches the track indicated by the track number TN in the content file named CF, in order starting from the start frame number FN, and obtains the frame number NFN of the first frame that satisfies time information ≧ LastTS. If no frame satisfies the condition, the content acquisition unit 1021 sets NFN to the termination constant “−1”.
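A minimal sketch of the sendable-range search in step S107, using Equation 3 exactly as written. The function names and the list-based track layout are assumptions for illustration; integer arithmetic is used here to avoid rounding drift, which the text does not specify.

```python
from typing import Sequence

END_OF_TRACK = -1  # the termination constant "-1" described in the text


def last_timestamp(nts: int, sts: int, ss: int, trs_hz: int) -> int:
    """Equation 3: LastTS = (NTS - STS) / SS x TRS, in track timestamp ticks."""
    return (nts - sts) * trs_hz // ss


def find_limit_frame(frame_times: Sequence[int], fn: int, last_ts: int) -> int:
    """Search from 1-based frame number fn for the first frame with time >= last_ts."""
    for frame_number in range(fn, len(frame_times) + 1):
        if frame_times[frame_number - 1] >= last_ts:
            return frame_number
    return END_OF_TRACK


# Hypothetical values: microsecond system clock, 50 ms elapsed, 48000 Hz track.
frame_times = [0, 1024, 2048, 3072]
last_ts = last_timestamp(nts=1_050_000, sts=1_000_000, ss=1_000_000, trs_hz=48000)
nfn = find_limit_frame(frame_times, fn=1, last_ts=last_ts)
```

Here 50 ms of elapsed time corresponds to 2400 ticks, so NFN becomes 4 and frames 1 to 3 are eligible for transmission in the comparison that follows.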

次に、コンテンツ取得部１０２１は、開始フレーム番号FNとフレーム番号NFNを比較することによって送信の可否を判定し（ステップＳ１０８）、FN＜NFNが成立する場合には真偽値“Ｙ”となりステップＳ１０９の処理へ移行し、成立しない場合は真偽値“Ｎ”となりステップＳ１０３の処理へ移行する。 Next, the content acquisition unit 1021 determines whether or not transmission is possible by comparing the start frame number FN and the frame number NFN (step S108). If FN <NFN is satisfied, the truth value “Y” is obtained. The process proceeds to S109. If not established, the truth value is “N”, and the process proceeds to Step S103.

If the result of step S108 is “Y”, the content acquisition unit 1021 outputs to the transmission unit 1022 each content piece whose frame number is at least the start frame number FN and less than NFN in the track indicated by the track number TN in the content file named CF, together with a network time stamp PTS generated for each content piece from its time information according to the following formula (step S109). The transmission unit 1022 generates an appropriate header using the network time stamp PTS and adds it to the content piece.

[Equation 4]
PTS = time information / TRS × network time stamp frequency

After the output, the content acquisition unit 1021 sets start frame number FN = frame number NFN so that FN advances to the position already transmitted.
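Steps S108 and S109 can be sketched as a single helper that checks FN < NFN, converts each frame's time information with Equation 4, and advances FN. The function names and the return shape are assumptions for illustration, and the 90000 Hz network time stamp frequency in the sample call is a hypothetical value.

```python
from typing import List, Sequence, Tuple


def network_timestamp(time_info: int, trs_hz: int, net_hz: int) -> int:
    """Equation 4: PTS = time information / TRS x network time stamp frequency."""
    return time_info * net_hz // trs_hz


def frames_to_send(
    frame_times: Sequence[int], fn: int, nfn: int, trs_hz: int, net_hz: int
) -> Tuple[List[Tuple[int, int]], int]:
    """Return ([(frame_number, pts), ...], new_fn) for frames FN <= n < NFN.

    If FN < NFN does not hold (including NFN == -1), nothing is sent and FN is unchanged.
    """
    if not fn < nfn:
        return [], fn
    batch = [
        (n, network_timestamp(frame_times[n - 1], trs_hz, net_hz))
        for n in range(fn, nfn)
    ]
    return batch, nfn  # FN moves to the first frame not yet transmitted


frame_times = [0, 1024, 2048, 3072]
batch, new_fn = frames_to_send(frame_times, fn=1, nfn=3, trs_hz=48000, net_hz=90000)
```

When the track and network time stamp frequencies are equal, the conversion leaves the time information unchanged.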

また、ステップＳ１０２の処理において、自動モード定数が真偽値“Ｙ”であった場合には、コンテンツ取得部１０２１は、再生状態フラグF＝真偽値“Ｙ”と設定し、再生開始時刻STS＝システムクロック発振部１０２３から得た現在時刻NTSを設定し、自動再生開始のための設定を行う（ステップＳ１１０）。以上がコンテンツ取得部１０２１の再生時刻管理処理（送信側）についての概要である。 If the automatic mode constant is the true / false value “Y” in the process of step S102, the content acquisition unit 1021 sets the reproduction state flag F = the true / false value “Y” and the reproduction start time STS. = The current time NTS obtained from the system clock oscillator 1023 is set, and the setting for starting automatic reproduction is performed (step S110). The above is the outline of the reproduction time management process (transmission side) of the content acquisition unit 1021.

次に、ネットワークタイムスタンプを用いたコンテンツ読取部１０３４の再生時刻管理処理（受信側）について、図１２のフローチャートを用いて説明する。 Next, the reproduction time management processing (reception side) of the content reading unit 1034 using the network time stamp will be described with reference to the flowchart of FIG.

When reproduction of content starts, the content reading unit 1034 monitors the buffer unit 1032 until a certain amount of content pieces, together with their associated received network time stamps PTS, has accumulated (step S121). When the accumulated amount exceeds a predetermined threshold, the result is “Y” and the process proceeds to step S122. The threshold may simply be half of the total buffer capacity (half full) or, when the bit rate of the content is known in advance, may be calculated so that the buffer unit 1032 does not underflow.

Next, the content reading unit 1034 refers to the system clock oscillation unit 1033, sets reproduction start time STS = current time NTS obtained from the system clock oscillation unit 1033, and thereby prepares to start reading (step S122). The content reading unit 1034 also sets the timeout variable TOUT to the timeout constant (for example, “10”).

次に、コンテンツ読取部１０３４は、システムクロック発振部１０３３を参照して再度現在時刻NTSを更新した上で、下記の式に従って読み出すべきネットワークタイムスタンプPTSを算出する（ステップＳ１２３）。 Next, the content reading unit 1034 updates the current time NTS again with reference to the system clock oscillation unit 1033, and calculates a network time stamp PTS to be read according to the following equation (step S123).

[Equation 5]
PTS = (NTS − STS) / SS × network time stamp frequency

Next, the content reading unit 1034 searches the buffer unit 1032 and takes out the content pieces associated with a network time stamp equal to or smaller than the calculated network time stamp PTS (step S124). If no content piece is found, the result is “N” and the content reading unit 1034 proceeds to step S125. If a content piece is found, the result is “Y”, the timeout variable TOUT is reset to the timeout constant (for example, “10”), and the process proceeds to step S126.

If the result of step S124 is “N”, the content reading unit 1034 waits for a fixed time (for example, about 0.5 seconds), decrements the timeout variable TOUT by 1, and determines whether a timeout has occurred (step S125). If the decremented TOUT is less than 0, the result is “Y” and the content reading unit 1034 ends the process; if the result is “N”, the process returns to step S123. As a result, if no packet arrives for 5 seconds (0.5 × 10), the end operation is performed.

また、ステップＳ１２４で真偽値“Ｙ”であった場合には、コンテンツ読取部１０３４は、ステップＳ１２４の処理で取り出されたコンテンツ素片を再生部１０３５へ出力する（ステップＳ１２６）。この際、コンテンツ素片を、再生部１０３５の希望に沿うように、適当な手段で並べ替えたり、コンテンツ素片同士を併合したりしても良い。例えば、RTPヘッダ内のシーケンス番号を用いて並べ替えた後に、同一のネットワークタイムスタンプ（RTPタイムスタンプ）を持つコンテンツ素片同士を結合しても良い。 If the value is true / false “Y” in step S124, the content reading unit 1034 outputs the content segment extracted in step S124 to the reproduction unit 1035 (step S126). At this time, the content pieces may be rearranged by an appropriate means so as to meet the request of the playback unit 1035, or the content pieces may be merged. For example, content pieces having the same network time stamp (RTP time stamp) may be combined after rearrangement using the sequence number in the RTP header.

For convenience of explanation, it has been assumed that the network time stamp given to received packets always starts from 0. However, when the initial network time stamp value OFS at the start of transmission is known in advance through another communication procedure, such as RTSP described later, Equation 5 can be rewritten as Equation 6.

[Equation 6]
PTS = (NTS − STS) / SS × network time stamp frequency + OFS

The above is the outline of the reproduction time management process (receiving side) of the content reading unit 1034 using the network time stamp.
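The receiving-side loop (steps S123 to S126) can be sketched as follows, with Equation 6 used for the target time stamp (OFS defaults to 0, which reduces it to Equation 5). This is only an illustration under stated assumptions: the buffer is modeled as a list of (PTS, piece) pairs, and the clock, wait, and output operations are passed in as callables so the loop can be exercised without real time passing. None of these names come from the text.

```python
from typing import Callable, List, Tuple

Piece = Tuple[int, object]  # (network time stamp PTS, content piece)


def target_pts(nts: int, sts: int, ss: int, net_hz: int, ofs: int = 0) -> int:
    """Equation 6: PTS = (NTS - STS) / SS x network time stamp frequency + OFS."""
    return (nts - sts) * net_hz // ss + ofs


def take_due(buffer: List[Piece], pts: int) -> List[Piece]:
    """Remove and return every buffered piece whose time stamp is <= pts (step S124)."""
    due = [item for item in buffer if item[0] <= pts]
    buffer[:] = [item for item in buffer if item[0] > pts]
    return due


def read_loop(buffer: List[Piece], clock: Callable[[], int], sts: int, ss: int,
              net_hz: int, output: Callable[[object], None],
              wait: Callable[[], None], ofs: int = 0, timeout_const: int = 10) -> None:
    tout = timeout_const
    while True:
        due = take_due(buffer, target_pts(clock(), sts, ss, net_hz, ofs))  # S123, S124
        if due:
            tout = timeout_const                       # pieces found: reset the timeout
            for _, piece in sorted(due, key=lambda item: item[0]):
                output(piece)                          # S126
            continue
        wait()                                         # e.g. time.sleep(0.5)
        tout -= 1                                      # S125
        if tout < 0:
            return                                     # timeout: end processing


buffer = [(0, "a"), (1920, "b"), (900000, "late")]
played, waits = [], []
read_loop(buffer, clock=lambda: 50_000, sts=0, ss=1_000_000, net_hz=90000,
          output=played.append, wait=lambda: waits.append(1))
```

With a frozen clock, the first two pieces are output and the loop then times out. Under the "less than 0" test as literally stated, the loop in this sketch waits 11 times before giving up, slightly more than the 10 waits implied by the 5-second figure in the text.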

次に、ストリーミングシステムにおける伝送制御機構について説明する。 Next, a transmission control mechanism in the streaming system will be described.

In a streaming system, operations such as selecting, reproducing, and stopping arbitrary content must be accepted from the user's input, and sessions such as TCP/UDP sessions must be dynamically connected and disconnected according to those operations. This is called a transmission control mechanism. FIG. 13 shows a functional block diagram of a content distribution server 102b and a content receiving device 103b that constitute a content distribution system 101b having a transmission control mechanism. The content distribution server 102b and the content receiving device 103b are connected to each other by the IP network 104.

The content distribution server 102b includes a content acquisition unit 1021, a transmission unit 1022, a system clock oscillation unit 1023, a content storage unit 1024, and a transmission control unit 1025. The content receiving device 103b includes a reception unit 1031, a buffer unit 1032, a system clock oscillation unit 1033, a content reading unit 1034, a reproduction unit 1035, a transmission control unit 1036, and an input unit 1037. Components that are the same as those in FIG. 9 are given the same reference numbers, and their description is omitted.

When the transmission control unit 1025 of the content distribution server 102b receives, from the content receiving device 103b, transmission control information including instructions such as content selection and content reproduction or stop, it notifies the content acquisition unit 1021 and the transmission unit 1022 of this information. The transmission control unit 1025 also refers to the content storage unit 1024 to obtain the content file name, track number, and content reproduction start time, inputs them to the content acquisition unit 1021, and inputs the transmission address to the transmission unit 1022.

The protocol assumed here is a real-time data transmission control protocol typified by RTSP (Real Time Streaming Protocol) defined in RFC 2326, which provides methods such as SETUP, PLAY, PAUSE, TEARDOWN, and DESCRIBE (H. Schulzrinne et al., "Real Time Streaming Protocol", RFC 2326, Internet Engineering Task Force, Apr 1998).

また、コンテンツ受信装置１０３ｂの伝送制御部１０３６は、入力部１０３７からの伝送制御情報をコンテンツ配信サーバ１０２ｂの伝送制御部１０２５へ送信、その応答を受信し、応答を解析した後、受信アドレスを抽出し、受信部１０３１へ入力する機能を有する。 Further, the transmission control unit 1036 of the content receiving device 103b transmits the transmission control information from the input unit 1037 to the transmission control unit 1025 of the content distribution server 102b, receives the response, analyzes the response, and extracts the received address. And has a function of inputting to the receiving unit 1031.

また、コンテンツ受信装置１０３ｂの入力部１０３７は、利用者から、コンテンツの選択・コンテンツの再生・停止等の伝送制御情報の入力処理を受け付ける機能を有する。例えば、ビットマップディスプレイとキーボードを用いて、再生ボタン、停止ボタン、コンテンツ選択ダイヤログなどのＧＵＩ部品を利用者に提供する。 Further, the input unit 1037 of the content receiving apparatus 103b has a function of accepting transmission control information input processing such as content selection, content playback, and stop from the user. For example, using a bitmap display and a keyboard, GUI parts such as a play button, a stop button, and a content selection dialog are provided to the user.

Next, the operation of the transmission control unit 1025 and the transmission control unit 1036 when transmission control information is exchanged will be described using the sequence diagram of FIG. 14. FIG. 14 shows the processing performed, using RTSP, to select one content, establish a session for transmission, start reproduction, and stop. Here, the host name of the content distribution server 102b is “server.jvc-victor.jp”, the name of the content file storing the content is “hoge.mp4”, and “hoge.mp4” contains one MPEG-4 AAC audio track whose track number is 1 and whose reproduction time is 235 seconds.

RTSPは、コンテンツをネットワーク上で一意に特定するための資源識別子として、コンテンツURI（Uniform Resource Identifier）を適用する（T. Berners-Lee, "Uniform Resource Identifiers （URI）: Generic Syntax", RFC2396, Internet Engineering Taskforce, Aug 1998）。コンテンツ“hoge.mp4”は以下のようなコンテンツURIによって表される。 RTSP applies content URIs (Uniform Resource Identifiers) as resource identifiers for uniquely identifying content on the network (T. Berners-Lee, "Uniform Resource Identifiers (URI): Generic Syntax", RFC2396, Internet Engineering Taskforce, Aug 1998). The content “hoge.mp4” is represented by the following content URI.

“rtsp://server.jvc-victor.jp/hoge.mp4”

First, the transmission control unit 1036 requests a session description using the content URI (step S131). The session description is text data indicating what kind of session (communication path) can be established for the content data associated with the content URI. The session description format is SDP (Session Description Protocol) (M. Handley, "SDP: Session Description Protocol", RFC 2327, Internet Engineering Task Force, April 1998). In RTSP, the DESCRIBE method is used to request a session description. Each method and its response (together with any accompanying information) is carried in the header of the request message and of the response message, respectively.

“DESCRIBE rtsp://server.jvc-victor.jp/hoge.mp4 RTSP/1.0”

The transmission control unit 1025 transmits a response including the session description (step S132). The session description includes a reproduction time and media descriptions. The reproduction time is the maximum reproduction time of the continuous media associated with the specified content URI. There are many reproduction time formats, but the simplest is NPT (Normal Play Time, ISO8601), in which the start time and end time are expressed as floating-point numbers of seconds. For example, the reproduction time of a content URI that points to a content file containing 235 seconds of MPEG-4 AAC audio is expressed in NPT as follows.

“a=range:npt=0.0-235.0”

The media description contains information such as the content type and the network time stamp frequency, as advance information for establishing a session.

“m=audio 0 RTP/AVP/UDP 96”
“a=rtpmap:96 mpeg4-generic/48000/2”
“a=control:rtsp://server.jvc-victor.jp/hoge.mp4/trackID=1”
“a=fmtp:96 streamtype=5; profile-level-id=15; mode=AAC-hbr; config=1190; SizeLength=13; IndexLength=3; IndexDeltaLength=3; Profile=1;”

The above media description indicates that the content contains stereo audio with a 48000 Hz (48 kHz) PCM sampling frequency, encoded with the MPEG-4 AAC high-bitrate encoding method. It also indicates that, when a session is established, the data must be transmitted by RTP over UDP as the lower-layer network protocol.
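As an illustration of how a receiver might obtain the network time stamp frequency from the media description, the `a=rtpmap` attribute can be split into its payload type, encoding name, clock rate, and channel count. The function name and the dictionary keys are assumptions for this sketch; defaulting to one channel when the count is omitted is likewise a simplification.

```python
def parse_rtpmap(line: str) -> dict:
    """Parse an SDP attribute of the form 'a=rtpmap:<pt> <encoding>/<clock>[/<channels>]'."""
    prefix = "a=rtpmap:"
    if not line.startswith(prefix):
        raise ValueError("not an rtpmap attribute")
    payload_type, _, encoding = line[len(prefix):].partition(" ")
    parts = encoding.split("/")
    return {
        "payload_type": int(payload_type),
        "encoding": parts[0],
        "clock_rate": int(parts[1]),  # used as the network time stamp frequency
        "channels": int(parts[2]) if len(parts) > 2 else 1,
    }


info = parse_rtpmap("a=rtpmap:96 mpeg4-generic/48000/2")
```

For the example line, `info["clock_rate"]` is 48000 and `info["channels"]` is 2, matching the 48 kHz stereo audio described above.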

また、このセッションの準備要求には以下のコントロールURIを使うことが示されている。コントロールURIは、1つのコンテンツデータ内の複数トラックを識別するためにトラック番号（trackID=1）が付与されている。 Also, it is shown that the following control URI is used for the preparation request of this session. The control URI is given a track number (trackID = 1) to identify a plurality of tracks in one content data.

“rtsp://server.jvc-victor.jp/hoge.mp4/trackID=1”

Having analyzed the session description, the transmission control unit 1036 decides to establish a session for transmitting MPEG-4 AAC stereo audio, determines the reception address of the session, and transmits a session establishment preparation request to the transmission control unit 1025 (step S133). In RTSP, the SETUP method is used for the session establishment preparation request. In the following example, the reception address consists of the reception address (136.198.190.100) and reception port numbers (6668-6669) of the transmission control unit 1036.

“SETUP rtsp://server.jvc-victor.jp/hoge.mp4/trackID=1 RTSP/1.0”
“Transport: RTP/AVP/UDP;unicast;destination=136.198.190.100;client_port=6668-6669”

The transmission control unit 1025, having correctly received the session establishment preparation request, determines a new transmission address, inputs it to the transmission unit 1022, and transmits session information to the transmission control unit 1036 (step S134). The session information includes the transmission address of the distribution server used for distribution (136.198.190.1 in the following example) and the transmission port numbers (19000 to 19001 in the following example).

“Transport: RTP/AVP/UDP;unicast;source=136.198.190.1;server_port=19000-19001”

When the transmission control unit 1036 receives the session information, it inputs the previously determined reception address to the reception unit 1031. A new session is established when processing in both the transmission unit 1022 and the reception unit 1031 is complete.
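The address and port information exchanged in steps S133 and S134 travels in the value of the Transport header. A minimal sketch of splitting such a value into its parts is shown below; the function names and the result layout are assumptions for illustration, and the parser handles only the simple semicolon-separated form used in the examples.

```python
from typing import Tuple


def parse_transport(value: str) -> dict:
    """Split a Transport header value into its spec, flags, and key=value parameters."""
    fields = value.split(";")
    result = {"spec": fields[0], "flags": []}
    for field in fields[1:]:
        if not field:
            continue
        key, sep, val = field.partition("=")
        if sep:
            result[key] = val
        else:
            result["flags"].append(key)  # parameters without a value, e.g. "unicast"
    return result


def port_pair(text: str) -> Tuple[int, int]:
    """Convert a port range such as '19000-19001' into a pair of integers."""
    low, _, high = text.partition("-")
    return int(low), int(high or low)


reply = parse_transport("RTP/AVP/UDP;unicast;source=136.198.190.1;server_port=19000-19001")
server_ports = port_pair(reply["server_port"])
```

The same helper applies to the request side, where `destination` and `client_port` carry the reception address chosen by the receiver.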

次に、伝送制御部１０３６は、再生開始要求を送信する（ステップＳ１３５）。再生開始要求では、再生範囲を、NPTを用いて指定することができる。下記の例では、コンテンツの最初（0.0秒）から最後（235.0秒）までの指定している。RTSPにおいて再生開始要求にはPLAYメソッドを用いる。 Next, the transmission control unit 1036 transmits a reproduction start request (step S135). In the playback start request, the playback range can be specified using NPT. In the example below, the content is specified from the beginning (0.0 seconds) to the end (235.0 seconds). The PLAY method is used for a playback start request in RTSP.

“PLAY rtsp://server.jvc-victor.jp/hoge.mp4 RTSP/1.0”
“Range: npt=0.0-235.0”

The transmission control unit 1025 transmits a response to the reproduction start request (step S136). On accepting the reproduction start request, the transmission control unit 1025 calculates, for every session that has already been prepared, the content file name, track number, and content reproduction start time from the control URI of the session and the reproduction range included in the request. It inputs them to the content acquisition unit 1021 together with command input = reproduction start constant (for example, “PLAY”), and transmission of RTP packets starts.

また、この応答は、コンテンツ受信装置１０３ｂのコンテンツ読取部１０３４の動作に必要となるRTPパケット情報を含む。例えば、このセッションにおいて、最初に送られてくるRTPパケットのRTPタイムスタンプ（ネットワークタイムスタンプ）などが含まれる（下記の例では0000000）。 This response also includes RTP packet information necessary for the operation of the content reading unit 1034 of the content receiving device 103b. For example, in this session, the RTP time stamp (network time stamp) of the first RTP packet sent is included (0000000 in the following example).

“RTP-Info: url= rtsp://server.jvc-victor.jp/hoge.mp4/trackID=1;rtptime=0000000;”

The transmission control unit 1036 transmits a stop request when the user performs a stop input (step S137). In RTSP, the stop request is the TEARDOWN method.

“TEARDOWN rtsp://server.jvc-victor.jp/hoge.mp4 RTSP/1.0”

On accepting the stop request, the transmission control unit 1025 controls the transmission unit 1022 to stop transmitting RTP packets, disconnects the session, and notifies that transmission has stopped (step S138). The transmission control unit 1025 also inputs command input = reproduction stop constant (for example, “STOP”) to the content acquisition unit 1021 to stop reading.
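The request messages in the sequence above share one textual shape, which can be sketched with a small formatter. The helper name is an assumption for illustration. The quoted examples in the text show only the request line and selected headers; the CSeq header added here is required by RFC 2326 but does not appear in those examples.

```python
from typing import Dict, Optional


def rtsp_request(method: str, uri: str, cseq: int,
                 headers: Optional[Dict[str, str]] = None) -> str:
    """Format an RTSP/1.0 request: request line, CSeq, extra headers, blank line."""
    lines = [f"{method} {uri} RTSP/1.0", f"CSeq: {cseq}"]
    for name, value in (headers or {}).items():
        lines.append(f"{name}: {value}")
    return "\r\n".join(lines) + "\r\n\r\n"


base = "rtsp://server.jvc-victor.jp/hoge.mp4"
play = rtsp_request("PLAY", base, 4, {"Range": "npt=0.0-235.0"})
teardown = rtsp_request("TEARDOWN", base, 5)
```

A real client would also carry the Session header returned by the server in the SETUP response; it is omitted here because the text does not show it.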

なお、説明の簡単化のために、１つのコンテンツデータに1つのコンテンツのみが含まれると仮定していたが、前記の伝送制御機構は容易に複数のコンテンツの同期再生に拡張可能である。例えば、ステップＳ１３２において、複数のコンテンツに関するメディア記述を列挙し、ステップＳ１３３〜ステップＳ１３４の準備要求をメディア記述分繰り返すだけで、複数のコンテンツの同期再生が容易に行うことが可能である。以上がストリーミングシステムにおける伝送制御機構の概要である。 For simplification of explanation, it is assumed that only one content is included in one content data. However, the transmission control mechanism can be easily extended to synchronous reproduction of a plurality of contents. For example, it is possible to easily perform synchronized playback of a plurality of contents simply by listing media descriptions regarding a plurality of contents in step S132 and repeating the preparation requests in steps S133 to S134 for the media descriptions. The above is the outline of the transmission control mechanism in the streaming system.

Consider switching audio or video dynamically and smoothly during content reproduction. A streaming system that adopts the asynchronous model and has a transmission control mechanism involving dynamic session creation cannot smoothly switch bitstreams between two or more contents partway through reproduction.

例えば、235秒分の再生時間を持つ２つのコンテンツ（トラック番号１および２とする）のうち、トラック番号１のコンテンツを再生しており、利用者がちょうど100秒目でトラック番号２のコンテンツへ切り替える指示を入力部１０３７へ入力したとする。 For example, out of two contents (track numbers 1 and 2) having a playback time of 235 seconds, the content of track number 1 is being played back, and the user moves to the content of track number 2 in the 100th second. It is assumed that a switching instruction is input to the input unit 1037.

コンテンツ受信装置１０３ｂは、非同期モデルを採用しているから、利用者から見た100秒目とは、コンテンツ読取部１０３４内部で計算されている時刻差分（現在時刻NTS−再生開始時刻STS）であり、既にバッファ部１０３２には、コンテンツ配信サーバ１０２ｂから送信済みの100秒目以降のコンテンツ素片が、既に蓄積されているか、その一部がIPネットワーク１０４上のルーターやスイッチングハブに滞留していると考えられる。 Since the content receiving apparatus 103b employs an asynchronous model, the 100th second seen from the user is a time difference (current time NTS−reproduction start time STS) calculated inside the content reading unit 1034. In the buffer unit 1032, the content pieces after the 100th second already transmitted from the content distribution server 102 b have already been accumulated, or a part of them has stayed in the router or switching hub on the IP network 104. it is conceivable that.

この状況において、トラック番号１のセッションを切断し（ステップＳ１３７）、トラック番号２のコンテンツに対してステップＳ１３３〜ステップＳ１３５の一連の通信手順を行うことを考えた場合、ステップＳ１３５における再生範囲は以下のような指定せざるを得ない。 In this situation, when the session of track number 1 is disconnected (step S137) and a series of communication procedures from step S133 to step S135 are performed on the content of track number 2, the reproduction range in step S135 is as follows. It must be specified like this.

“PLAY rtsp://server.jvc-victor.jp/hoge.mp4 RTSP/1.0”
“Range: npt=100.0-235.0”

FIG. 15 shows the order of the content pieces stored in the buffer unit 1032 immediately after step S136. Let T1 be the time information of the content at reproduction time 100 seconds, followed in chronological order by T2, T3, and T4. FIG. 15 shows that the content of track number 1 has been stored up to content piece 4, followed by the content of track number 2 stored from content piece 1. If the user views the content pieces in this order, the content goes from T4 back to T1, so it appears to rewind momentarily; the switch is not smooth and is strongly jarring.
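The rewind in this buffer order can be made concrete by scanning the content times in buffer order and reporting every position where the time goes backwards. The function name is an assumption, and the numeric times below are hypothetical stand-ins for T1 to T4 (100.0 s onward in 0.1 s steps); the text gives no frame spacing.

```python
from typing import List, Sequence


def find_rewinds(times: Sequence[float]) -> List[int]:
    """Return the indices at which the content time is earlier than the preceding one."""
    return [i for i in range(1, len(times)) if times[i] < times[i - 1]]


track1 = [100.0, 100.1, 100.2, 100.3]   # T1..T4, already buffered from track number 1
track2 = [100.0, 100.1, 100.2, 100.3]   # track number 2 restarts at T1 (npt=100.0)
rewinds = find_rewinds(track1 + track2)
```

The single reported index is the first piece of track number 2, which is where the viewer would perceive the jump from T4 back to T1.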

In addition, if the time from disconnecting the session of track number 1 in step S137 until the session establishment preparation request for track number 2 is received is longer than the total reproduction time of the packets already stored in the buffer unit 1032, a buffer underflow occurs, and reproduction stops once the timeout variable TOUT of the content reading unit 1034 expires.

Similarly, one conceivable approach is to disconnect the session of track number 1 in step S137, measure the time taken by the series of communication procedures from step S133 to step S135 for the content of track number 2, and correct the NPT for track number 2 in step S135 accordingly, thereby reducing the discontinuity. For example, if the series of communication procedures took 5.5 seconds, the range is corrected to npt=105.5-235.0.
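The correction described here is simple arithmetic: the measured duration of the communication procedures is added to the switch position before the Range header is built. A minimal sketch, with a hypothetical helper name:

```python
def corrected_range(switch_npt: float, handshake_seconds: float, end_npt: float) -> str:
    """Build a Range header whose start is shifted by the measured handshake time."""
    start = switch_npt + handshake_seconds
    return f"Range: npt={start:.1f}-{end_npt:.1f}"


header = corrected_range(100.0, 5.5, 235.0)
```

For the values in the text this yields the corrected range npt=105.5-235.0. The correction covers only the time measured on the receiving side, not the delay before the server actually starts sending.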

しかし、この手法を適用してもトラック番号２の再生開始要求がコンテンツ配信サーバ１０２ｂへ到達して実際にパケット送信が開始されるまでの時間遅延を考慮することができず、バッファ内部の全てのパケットについてコンテンツ再生時刻が完全に連続するように構成することは困難である。 However, even if this method is applied, it is not possible to consider the time delay until the reproduction start request for track number 2 reaches the content distribution server 102b and the packet transmission is actually started. It is difficult to configure the content playback time for packets to be completely continuous.

上記の問題に対処する第１の改善方法として、特開２００２−１１８５９２（特許文献２）が開示されている。 Japanese Patent Laid-Open No. 2002-118592 (Patent Document 2) is disclosed as a first improvement method for coping with the above problem.

これは配信サーバと、クライアント間に、二次配信サーバを設け、二次配信サーバが切り替え後のビットストリームをある程度バッファリングしてから切り替えることで、クライアントのバッファ部が空にならず、コンテンツを途切れなく切り替えることができるというものである。 This is because a secondary delivery server is provided between the delivery server and the client, and the secondary delivery server buffers the bitstream after switching to some extent and then switches, so that the client buffer is not emptied and the content is It can be switched without interruption.

しかし、特許文献２では、複数の配信サーバから配信されるビットストリームの各々は互いに時間的に独立であって、フレーム境界が同期していない。つまり、バッファアンダーフローは防げるが、コンテンツの巻き戻り感に関しては無力であり、コンテンツ再生時刻を完全に連続するように構成することは困難である。 However, in Patent Document 2, the bitstreams distributed from the plurality of distribution servers are temporally independent of one another, and their frame boundaries are not synchronized. In other words, buffer underflow can be prevented, but the method does nothing about the impression that the content has rewound, and it is difficult to make the content playback times completely continuous.

上記の問題に対処する第２の改善方法として、多くのストリーミングシステムでは、複数トラック同時受信構成を採用している。一例として、２トラック同時受信可能なコンテンツ配信システム１０１ｃを構成するコンテンツ配信サーバ１０２ｃとコンテンツ受信装置１０３ｃの機能ブロック図を、図１６に示す。 As a second improvement method for coping with the above problem, many streaming systems adopt a multi-track simultaneous reception configuration. As an example, FIG. 16 shows a functional block diagram of the content distribution server 102c and the content receiving device 103c that constitute the content distribution system 101c capable of receiving two tracks simultaneously.

コンテンツ配信サーバ１０２ｃは、コンテンツ取得部１０２１Ａ、送信部１０２２Ａ、コンテンツ取得部１０２１Ｂ、送信部１０２２Ｂ、システムクロック発振部１０２３、およびコンテンツ記憶部１０２４を有する。また、コンテンツ受信装置１０３ｃは、受信部１０３１Ａ、バッファ部１０３２Ａ、コンテンツ読取部１０３４Ａ、受信部１０３１Ｂ、バッファ部１０３２Ｂ、コンテンツ読取部１０３４Ｂ、システムクロック発振部１０３３、再生部１０３５、伝送制御部１０３６、入力部１０３７、および切替部１０３８を有する。なお、図１３と同じものについては同じ番号を付し、その説明は省略する。 The content distribution server 102c includes a content acquisition unit 1021A, a transmission unit 1022A, a content acquisition unit 1021B, a transmission unit 1022B, a system clock oscillation unit 1023, and a content storage unit 1024. The content receiving device 103c includes a receiving unit 1031A, a buffer unit 1032A, a content reading unit 1034A, a receiving unit 1031B, a buffer unit 1032B, a content reading unit 1034B, a system clock oscillation unit 1033, a playback unit 1035, a transmission control unit 1036, an input unit 1037, and a switching unit 1038. Components that are the same as in FIG. 13 are given the same reference numerals, and their description is omitted.

切替部１０３８は、トラック切替入力に従って、２つのコンテンツ素片のうち、どちらか一方を採用して再生部１０３５へ入力する機能を有する。採用しなかった方のコンテンツ素片は、読み出しはするものの、そのまま破棄される。 The switching unit 1038 has a function of adopting one of the two content segments and inputting it to the playback unit 1035 in accordance with the track switching input. Although the content piece that has not been adopted is read, it is discarded as it is.
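The behavior of the switching unit 1038 can be illustrated with a minimal sketch. The function names and the use of byte strings for content segments are assumptions made for illustration; the sketch only shows that both segments are read at every step and that the one from the unselected track is dropped.

```python
def switching_unit(segment_a: bytes, segment_b: bytes, selected_track: int) -> bytes:
    """Pass on the segment of the selected track; the other one is simply not used."""
    if selected_track == 1:
        return segment_a
    if selected_track == 2:
        return segment_b
    raise ValueError("track switching input must be 1 or 2")


def play_selected(stream_a, stream_b, selections):
    """Read both streams in lockstep and keep one segment per step."""
    return [switching_unit(a, b, s) for a, b, s in zip(stream_a, stream_b, selections)]


# The user switches from track 1 to track 2 after the first segment.
print(play_selected([b"a0", b"a1", b"a2"], [b"b0", b"b1", b"b2"], [1, 2, 2]))
```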

なお、入力部１０３７は、利用者が入力するトラック番号１または２を切り替えるためのトラック切替入力を受け付けるものとする。 Note that the input unit 1037 accepts a track switching input for switching the track number 1 or 2 input by the user.

コンテンツ受信装置１０３ｃでは、複数の同期するトラックがあった場合、ステップＳ１３３〜ステップＳ１３４に相当する一連の準備要求・返答をトラック数分繰り返してから、ステップＳ１３５に相当する再生要求を開始し、全てのトラック分のストリームを受信し、受信側で複数のトラックのうち、１つを再生するような切替部１０３８をコンテンツ受信装置１０３ｃに備えることで対処している。 When there are a plurality of synchronized tracks, the content receiving device 103c repeats the series of preparation requests and responses corresponding to steps S133 to S134 once per track, then issues the playback request corresponding to step S135 and receives the streams for all of the tracks. The switching unit 1038 provided in the content receiving device 103c then selects one of the tracks for playback on the receiving side.

図１７は、複数トラック同時受信構成を有するコンテンツ受信装置１０３ｃの動作を示したフローチャートである。ステップＳ１４１の処理以外、全ての処理（ステップＳ１４２〜ステップＳ１４６）は、図１２のステップＳ１２２〜ステップＳ１２６と同一なので、その説明を省略する。 FIG. 17 is a flowchart showing the operation of the content receiving apparatus 103c having a multiple-track simultaneous reception configuration. Since all processes (steps S142 to S146) other than the process of step S141 are the same as steps S122 to S126 of FIG. 12, the description thereof is omitted.

再生を選択されているトラックのコンテンツを受信する側のコンテンツ読取部１０３４（ＡまたはＢ）は、バッファ部１０３２（ＡまたはＢ）を監視し、一定量蓄積されるまで待機し、予め定められた蓄積量のしきい値を超えると、真偽値“Ｙ”を生成し、次の処理へ移行する（ステップＳ１４１）。これは、複数のトラック間における再生開始時刻STSが全く同一となることを意図している。 The content reading unit 1034 (A or B) that receives the content of the track selected for playback monitors the buffer unit 1032 (A or B) and waits until a certain amount has accumulated. When the accumulated amount exceeds a predetermined threshold, it generates the truth value "Y" and proceeds to the next process (step S141). The intent is that the playback start time STS is exactly the same across the tracks.
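The threshold check of step S141 might look like the sketch below. The class name, the byte-based accounting, and the "N" value returned before the threshold is exceeded are assumptions for illustration; the text above only specifies that "Y" is generated once the accumulated amount exceeds the threshold.

```python
from collections import deque


class BufferUnit:
    """Hypothetical stand-in for buffer unit 1032 (A or B)."""

    def __init__(self, threshold_bytes: int):
        self.threshold_bytes = threshold_bytes
        self.segments = deque()
        self.stored_bytes = 0

    def store(self, segment: bytes) -> None:
        self.segments.append(segment)
        self.stored_bytes += len(segment)

    def ready_to_play(self) -> str:
        # "Y" only once the accumulated amount exceeds the threshold (step S141).
        return "Y" if self.stored_bytes > self.threshold_bytes else "N"


buf = BufferUnit(threshold_bytes=8)
buf.store(b"12345")
print(buf.ready_to_play())
buf.store(b"67890")
print(buf.ready_to_play())
```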

以上が、従来の映像・音声等の実時間配信に用いられているビットストリーム切り替え動作を伴うストリーミングシステムである。 The above is a conventional streaming system with a bitstream switching operation used for real-time distribution of video and audio.

このようなストリーミングシステムを採用した通信カラオケシステムには、以下のような問題点があった。 The communication karaoke system employing such a streaming system has the following problems.

まず、１つ目の問題点としては、ストリーミングシステムを採用した通信カラオケシステムにおいて、MIDIを採用した通信カラオケシステムと同様なキーコントロール機能を実現するためには、コンテンツ受信装置側に高い演算能力を有する音程変換手段を搭載しなければならず、コストアップの原因となっていた。 The first problem is that, to give a streaming-based communication karaoke system the same key control function as a MIDI-based communication karaoke system, the content receiving device must be equipped with pitch conversion means having high computing power, which increases cost.

また、２つ目の問題点としては、コンテンツ受信装置として、携帯電話やPDA（Personal Digital Assistant：携帯情報端末）等、比較的小さな、基板面積に制約のある機器を用いる場合、音程変換手段を搭載できない状況が発生することがあった。 The second problem is that, when a relatively small device with limited board area, such as a mobile phone or a PDA (Personal Digital Assistant), is used as the content receiving device, it is sometimes impossible to mount the pitch conversion means at all.

また、３つ目の問題点としては、音程変換手段をソフトウェアによって実現する場合、高品質にしようとするほど演算量が増大するため、コンテンツ受信装置として非力な中央処理装置を有する携帯電話やPDAでは、高品質な音程変換アルゴリズムを採用できなかった。 The third problem is that, when the pitch conversion means is implemented in software, the amount of computation grows with the quality sought, so a mobile phone or PDA with a low-powered central processing unit cannot use a high-quality pitch conversion algorithm when serving as the content receiving device.
特開平９−１８５３９２号公報 特開２００２−１１８５９２号公報 JP-A-9-185392; JP 2002-118592 A

本発明は、上記事情に鑑みてなされたものであり、ネットワークを介して、複数トラックから構成されるコンテンツデータを送信するコンテンツ配信サーバと当該コンテンツデータを受信再生するコンテンツ受信装置からなるコンテンツ配信システムにおいて、トラックを動的に切り替えて送信し、切り替えられた箇所を滑らかに再生することを可能にするコンテンツ配信方法、およびコンテンツ配信方法を用いたコンテンツ配信サーバ、およびコンテンツ受信装置に関する。 The present invention has been made in view of the above circumstances. It relates to a content distribution method that, in a content distribution system comprising a content distribution server that transmits content data composed of a plurality of tracks via a network and a content receiving device that receives and plays back that content data, makes it possible to switch tracks dynamically during transmission and to play back the switched portion smoothly, and to a content distribution server and a content receiving device that use the content distribution method.

上記目的を達成するために、請求項１に記載のコンテンツ配信方法は、複数のコンテンツ素片データを有する複数のトラックを備えたコンテンツデータを記憶しているコンテンツ配信サーバと、このコンテンツ配信サーバに対して、ネットワークを介してコンテンツの再生要求を行い、その再生要求に応じてコンテンツ配信サーバからネットワークを介して送信される当該コンテンツ素片データを受信し再生するコンテンツ受信装置とを備えたコンテンツ配信システムにおけるコンテンツ配信方法であって、前記コンテンツ配信サーバに記憶されているコンテンツデータは、ファイル名と複数のトラックとを有するコンテンツデータであり、前記各トラックのトラックデータは、トラック番号、トラックタイムスタンプ周波数、総フレーム数、音程の移調の度合いを示すトラックキー情報、パートナー音声の有無を示すトラックデュエット情報およびフレーム番号、再生のタイミングを示す時刻情報、コンテンツ素片データを有するものであり、同一の前記フレーム番号における各トラックの前記データフレームが、同一の前記時刻情報を有するものであり、前記コンテンツ受信装置が、前記コンテンツ受信装置と前記コンテンツ配信サーバとの間でセッションを確立して前記コンテンツ素片データを受信するための事前情報であるセッション記述を要求する工程と、前記コンテンツ配信サーバが、前記トラック毎に少なくとも前記トラック番号、前記トラックキー情報、前記トラックデュエット情報を含む前記セッション記述を送信する工程と、前記コンテンツ受信装置が、前記セッション記述を受信すると、前記トラック番号に前記トラックキー情報および前記トラックデュエット情報を関連付けて表示する工程と、前記コンテンツ受信装置が、前記コンテンツデータのファイル名、前記トラック番号、およびコンテンツの再生範囲と共にコンテンツの再生要求メッセージを送信する工程と、前記コンテンツ配信サーバが、前記再生要求メッセージを受信すると、前記ファイル名、前記トラック番号、および前記コンテンツの再生範囲に従って、指定された前記トラックの前記コンテンツ素片データを取得、送信する工程とを有し、前記トラックを切り替える際には、前記コンテンツ受信装置が、前記トラック番号と共にトラックの切替要求メッセージを送信する工程と、前記コンテンツ配信サーバが、前記切替要求メッセージを受信すると、前記トラック番号に従って、既送の前記コンテンツ素片データのうち再生順序が最後となっている前記コンテンツ素片データに連続して再生されるべき時刻情報を有する指定された前記トラックの前記コンテンツ素片データを取得して送信する工程とを有することを特徴とする。 To achieve the above object, the content distribution method according to claim 1 is a content distribution method in a content distribution system comprising a content distribution server that stores content data having a plurality of tracks each containing a plurality of content segment data, and a content receiving device that issues a content playback request to the content distribution server via a network and receives and plays back the content segment data transmitted from the content distribution server via the network in response to that request, wherein: the content data stored in the content distribution server is content data having a file name and a plurality of tracks; the track data of each track has a track number, a track time stamp frequency, a total number of frames, track key information indicating the degree of pitch transposition, track duet information indicating the presence or absence of a partner voice, a frame number, time information indicating playback timing, and content segment data; and the data frames of the respective tracks at the same frame number have the same time information; the method comprising: a step in which the content receiving device requests a session description, which is advance information for establishing a session between the content receiving device and the content distribution server and receiving the content segment data; a step in which the content distribution server transmits the session description, which includes, for each track, at least the track number, the track key information, and the track duet information; a step in which the content receiving device, on receiving the session description, displays the track key information and the track duet information in association with the track number; a step in which the content receiving device transmits a content playback request message together with the file name of the content data, the track number, and a content playback range; and a step in which the content distribution server, on receiving the playback request message, acquires and transmits the content segment data of the designated track in accordance with the file name, the track number, and the content playback range; and, when the track is switched, further comprising: a step in which the content receiving device transmits a track switching request message together with the track number; and a step in which the content distribution server, on receiving the switching request message, acquires and transmits, in accordance with the track number, the content segment data of the designated track whose time information makes it the segment to be played back immediately after the content segment data that is last in playback order among the content segment data already transmitted.
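The server-side handling of the playback and switching requests in claim 1 can be sketched as follows. The class, method, and field names are hypothetical, and frames are assumed to be stored in a list indexed by frame number. Because frames with the same frame number carry the same time information on every track, keeping the read position unchanged across a switch is enough to continue from the segment that follows the last one sent.

```python
from dataclasses import dataclass


@dataclass
class Frame:
    frame_number: int
    time_info: float  # playback time in seconds, identical across tracks per frame number
    payload: bytes


class ContentAcquirer:
    """Hypothetical sketch of the acquisition logic on the server side."""

    def __init__(self, tracks):
        self.tracks = tracks      # track number -> list of Frame, indexed by frame number
        self.track = None
        self.next_index = 0

    def play(self, track_number, start_frame=0):
        self.track = track_number
        self.next_index = start_frame

    def switch(self, track_number):
        if track_number not in self.tracks:
            raise KeyError(f"unknown track {track_number}")
        # The read position is deliberately left untouched, so time information stays continuous.
        self.track = track_number

    def next_frame(self):
        frames = self.tracks[self.track]
        if self.next_index >= len(frames):
            return None
        frame = frames[self.next_index]
        self.next_index += 1
        return frame


def make_track(label, frame_count, frame_seconds=0.5):
    return [Frame(n, n * frame_seconds, f"{label}-{n}".encode()) for n in range(frame_count)]


server = ContentAcquirer({1: make_track("key0", 6), 2: make_track("key+1", 6)})
server.play(1)
sent = [server.next_frame() for _ in range(3)]
server.switch(2)
print(sent[-1], server.next_frame())
```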

また、請求項２に記載のコンテンツ配信サーバは、コンテンツ受信装置からの再生要求に従って、複数のコンテンツ素片データを有する複数のトラックを備えたコンテンツデータから再生要求されたコンテンツ素片データを選択し、選択したコンテンツ素片データをネットワークを介して前記コンテンツ受信装置へ送信するコンテンツ配信サーバであって、ファイル名と複数のトラックとを有するコンテンツデータであり、前記各トラックのトラックデータは、トラック番号と複数のデータフレームとを有し、前記各データフレームは、トラック番号、トラックタイムスタンプ周波数、総フレーム数、音程の移調の度合いを示すトラックキー情報、パートナー音声の有無を示すトラックデュエット情報およびフレーム番号、再生のタイミングを示す時刻情報、およびコンテンツ素片データを有するものであり、同一の前記フレーム番号における各トラックの前記データフレームが、同一の前記時刻情報を有するコンテンツデータを記憶するコンテンツ記憶手段と、前記コンテンツ受信装置と前記コンテンツ配信サーバとの間でセッションを確立して前記コンテンツ素片データを受信するための事前情報であるセッション記述が要求されると、前記トラック毎に少なくとも前記トラック番号、前記トラックキー情報、前記トラックデュエット情報を含む前記セッション記述を送信し、前記コンテンツデータのファイル名、前記トラック番号、およびコンテンツの再生範囲と共にコンテンツの再生要求メッセージと、前記トラック番号と共にトラックの切替要求メッセージを受信する伝送制御手段と、前記伝送制御手段が前記再生要求メッセージを受信すると、前記ファイル名、前記トラック番号、および前記コンテンツの再生範囲に従って、指定された前記トラックの前記コンテンツ素片データを前記コンテンツ記憶手段から取得し、前記伝送制御手段が前記切替要求メッセージを受信すると、前記トラック番号に従って、既送の前記コンテンツ素片データのうち再生順序が最後となっている前記コンテンツ素片データに連続して再生されるべき時刻情報を有する指定された前記トラックの前記コンテンツ素片データを前記コンテンツ記憶手段から取得するコンテンツ取得手段とを備えることを特徴とする。 The content distribution server according to claim 2 is a content distribution server that, in accordance with a playback request from a content receiving device, selects the requested content segment data from content data having a plurality of tracks each containing a plurality of content segment data, and transmits the selected content segment data to the content receiving device via a network, the server comprising: content storage means for storing content data that has a file name and a plurality of tracks, in which the track data of each track has a track number and a plurality of data frames, each data frame has a track number, a track time stamp frequency, a total number of frames, track key information indicating the degree of pitch transposition, track duet information indicating the presence or absence of a partner voice, a frame number, time information indicating playback timing, and content segment data, and the data frames of the respective tracks at the same frame number have the same time information; transmission control means that, when a session description is requested, the session description being advance information for establishing a session between the content receiving device and the content distribution server and receiving the content segment data, transmits the session description including, for each track, at least the track number, the track key information, and the track duet information, and that receives a content playback request message together with the file name of the content data, the track number, and a content playback range, and a track switching request message together with the track number; and content acquisition means that, when the transmission control means receives the playback request message, acquires from the content storage means the content segment data of the designated track in accordance with the file name, the track number, and the content playback range, and that, when the transmission control means receives the switching request message, acquires from the content storage means, in accordance with the track number, the content segment data of the designated track whose time information makes it the segment to be played back immediately after the content segment data that is last in playback order among the content segment data already transmitted.

また、請求項３に記載のコンテンツ受信装置は、複数のコンテンツ素片データを有する複数トラックを備えたコンテンツデータを記憶しているコンテンツ配信サーバに対して、ネットワークを介してコンテンツの再生要求を行い、その再生要求に応じてコンテンツ配信サーバからネットワークを介して送信されるコンテンツ素片データを受信し再生するコンテンツ受信装置であって、コンテンツ配信サーバに記憶されているコンテンツデータは、ファイル名と複数のトラックとを有するコンテンツデータであり、前記各トラックのトラックデータは、トラック番号と複数のデータフレームとを有し、前記各データフレームは、トラック番号、トラックタイムスタンプ周波数、総フレーム数、音程の移調の度合いを示すトラックキー情報、パートナー音声の有無を示すトラックデュエット情報およびフレーム番号、再生のタイミングを示す時刻情報、およびコンテンツ素片データを有するものであり、同一の前記フレーム番号における各トラックの前記データフレームが、同一の前記時刻情報を有するものであり、前記コンテンツ配信サーバに対して、前記コンテンツ受信装置と前記コンテンツ配信サーバとの間でセッションを確立して前記コンテンツ素片データを受信するための事前情報であるセッション記述を要求し、前記セッション記述を受信すると、前記トラック番号に前記トラックキー情報および前記トラックデュエット情報を関連付けて表示し、コンテンツを再生する際には、前記コンテンツデータのファイル名、前記トラック番号、およびコンテンツの再生範囲と共にコンテンツの再生要求メッセージを送信し、トラックを切り替える際には、前記トラック番号と共にトラックの切替要求メッセージを送信する伝送制御手段を備えることを特徴とする。 The content receiving device according to claim 3 is a content receiving device that issues a content playback request via a network to a content distribution server storing content data having a plurality of tracks each containing a plurality of content segment data, and that receives and plays back the content segment data transmitted from the content distribution server via the network in response to that request, wherein: the content data stored in the content distribution server is content data having a file name and a plurality of tracks; the track data of each track has a track number and a plurality of data frames; each data frame has a track number, a track time stamp frequency, a total number of frames, track key information indicating the degree of pitch transposition, track duet information indicating the presence or absence of a partner voice, a frame number, time information indicating playback timing, and content segment data; and the data frames of the respective tracks at the same frame number have the same time information; the device comprising transmission control means that: requests from the content distribution server a session description, which is advance information for establishing a session between the content receiving device and the content distribution server and receiving the content segment data; on receiving the session description, displays the track key information and the track duet information in association with the track number; when content is to be played back, transmits a content playback request message together with the file name of the content data, the track number, and a content playback range; and, when the track is to be switched, transmits a track switching request message together with the track number.

本発明における通信カラオケシステムでは、バックコーラスやパートナー音声入りのデータを、原音に近い品質で利用者に提供でき、しかも利用者の要求に応じて音程を動的に切り替えることができるので、サービスの高品質化に寄与できる。 The communication karaoke system according to the present invention can provide users with data containing backing chorus and partner vocals at a quality close to the original sound, and can switch the pitch dynamically at the user's request, which contributes to higher service quality.

また、受信クライアント側に音程変換を行う手段を搭載する必要がなく、再生コンテンツのオーサリング時に音程変換を行うため、従来よりも低速かつ高品質な音程変換手段を用いることができ、サービス品質を向上させることができる。 In addition, the receiving client does not need to carry any means for pitch conversion. Because pitch conversion is performed when the playback content is authored, pitch conversion means that is slower but of higher quality than before can be used, which improves service quality.

また、受信クライアント側に音程変換手段を搭載する必要がなく、再生コンテンツのオーサリング時に音程変換を行うため、受信クライアントの出荷後にいつでも新しい音程変換手段を適用することができ、その結果得られる受信クライアントにおける音程変換の効果は、受信クライアントのハードウェアの製作時期および出荷時期に依存せず、全ての受信クライアントで同等の効果が得られる。 In addition, because the receiving client needs no pitch conversion means and pitch conversion is performed when the playback content is authored, new pitch conversion means can be applied at any time after the receiving client has shipped. The resulting pitch conversion quality at the receiving client does not depend on when its hardware was manufactured or shipped, and the same effect is obtained on all receiving clients.

また、再生コンテンツのオーサリング時に、複数の音程範囲のうち、ある音程分は機械的な処理によって音程変換を行い、ある音程分は、移調された楽譜を用いて演奏者・歌唱者が実際に演奏・歌唱した音声データを用いることによって、従来の手法では不可能かつ高品質なサービスを実現することができる。 In addition, when the playback content is authored, some pitches within the pitch range can be produced by mechanical pitch conversion while others use audio data actually performed or sung by performers and singers from a transposed score, which realizes a high-quality service that was impossible with conventional techniques.

また、再生コンテンツの著作者が音程変換等の楽曲加工に承諾しない再生コンテンツに関しては、オーサリング時に音程変換を行ったトラックを追加しないだけで、受信クライアント全てに特別な情報を送信することなく、容易に特定の再生コンテンツに対して音程変換再生を禁止できる。 In addition, for playback content that the author of the playback content does not consent to music processing such as pitch conversion, it is easy without adding special information to all receiving clients without adding a track that has been pitch-converted during authoring. In addition, it is possible to prohibit pitch conversion playback for specific playback content.

本発明の実施形態を、図１〜図８を用いて説明する。 Embodiments of the present invention will be described with reference to FIGS. 1 to 8.

≪第１の実施形態≫ << First Embodiment >>
図１に、コンテンツ配信システム１ａ（通信カラオケシステム）のシステム構成を示す。コンテンツ配信システム１ａは、コンテンツ配信サーバ２ａ、コンテンツ受信装置３ａ（カラオケ装置）、およびオーサリング装置５ａとから構成され、コンテンツ配信サーバ２ａとコンテンツ受信装置３ａは、IPネットワーク４によって相互に接続される。 FIG. 1 shows the system configuration of the content distribution system 1a (a communication karaoke system). The content distribution system 1a comprises a content distribution server 2a, a content receiving device 3a (a karaoke device), and an authoring device 5a. The content distribution server 2a and the content receiving device 3a are connected to each other by an IP network 4.

コンテンツ配信サーバ２ａは、コンテンツ取得部２１、伝送制御部２２ａ、送信部２３、システムクロック発振部２４、およびコンテンツ記憶部２５を有する。 The content distribution server 2a includes a content acquisition unit 21, a transmission control unit 22a, a transmission unit 23, a system clock oscillation unit 24, and a content storage unit 25.

コンテンツ取得部２１は、再生するコンテンツのコンテンツデータをコンテンツ記憶部２５に記録し、システムクロック信号に従って、伝送制御情報に基づいて任意のポインタ（コンテンツ上の読み出し位置）を用いてコンテンツ素片を取得する機能を有する。このポインタとは、コンテンツファイル名、トラック番号、フレーム番号の３つのうちいずれか、もしくはそれらのうちいくつかの組み合わせである。また、“コンテンツ素片”とは、任意のデータ長を有するデータブロックのことを指す。 The content acquisition unit 21 records content data of the content to be reproduced in the content storage unit 25, and acquires a content fragment using an arbitrary pointer (read position on the content) based on the transmission control information in accordance with the system clock signal. Has the function of This pointer is any one of the content file name, the track number, and the frame number, or some combination thereof. The “content segment” refers to a data block having an arbitrary data length.
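One way to model the pointer described above is a record whose three fields are each optional. The sketch below is an assumption about how a partial pointer (for example, a track number alone) could be combined with the current read position; the names and the merge rule are hypothetical and are not taken from the specification.

```python
from typing import NamedTuple, Optional


class Pointer(NamedTuple):
    """Read position: any one of the three fields, or a combination of them."""
    file_name: Optional[str] = None
    track_number: Optional[int] = None
    frame_number: Optional[int] = None


def resolve(current: Pointer, update: Pointer) -> Pointer:
    """Fill the fields that `update` leaves unspecified from the current position."""
    return Pointer(*(u if u is not None else c for c, u in zip(current, update)))


position = Pointer("song.dat", 1, 10)
# A pointer that names only a track keeps the file and frame of the current position.
print(resolve(position, Pointer(track_number=2)))
```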

コンテンツ取得部２１は、内部にコンテンツファイル名、トラック番号、フレーム番号等の初期値を持ち、起動時から自動的に読み出しを開始しても良いし、外部入力からコンテンツファイル名、トラック番号、コンテンツ再生開始時刻を得てから読み出しを開始しても良い。また、コマンド入力を受け付ける外部入力を設け、コマンド入力＝再生開始定数（例えば“PLAY”）である場合は再生するようにしても良いし、コマンド入力＝再生停止定数（例えば“STOP”）である場合は停止するようにしても良い。あるいは自動読み出し開始動作と外部読み出し開始動作を自動的に判別するようにしても良い。 The content acquisition unit 21 may hold initial values such as a content file name, a track number, and a frame number internally and start reading automatically at startup, or it may start reading after obtaining a content file name, a track number, and a content playback start time from an external input. An external input that accepts commands may also be provided, so that playback starts when the command input equals a playback start constant (for example, "PLAY") and stops when it equals a playback stop constant (for example, "STOP"). Alternatively, the unit may determine automatically whether to use the automatic read start operation or the external read start operation.

伝送制御部２２ａは、コンテンツ受信装置３ａからのコンテンツの選択、コンテンツの再生・停止、トラック切替要求等の指示を含む伝送制御情報を受信すると、その応答を返信し、この伝送制御情報を、コンテンツ取得部２１と、送信部２３に通知する機能を有する。また、伝送制御部２２ａは、コンテンツ記憶部２５を参照してコンテンツファイル名、トラック番号、コンテンツ再生開始時刻を取得し、それらをコンテンツ取得部２１に対して入力し、送信部２３に対して送信アドレスを入力する機能を有する。 When the transmission control unit 22a receives transmission control information including instructions such as content selection, content playback / stop, and track switching request from the content receiving device 3a, the transmission control unit 22a returns a response to the transmission control information. It has a function to notify the acquisition unit 21 and the transmission unit 23. Also, the transmission control unit 22 a refers to the content storage unit 25, acquires the content file name, track number, and content playback start time, inputs them to the content acquisition unit 21, and transmits them to the transmission unit 23. It has a function to input an address.

適用するプロトコルとしては、RFC2326に規定されているRTSPに代表される実時間データ伝送制御用プロトコルを想定しており、Setup、Play、 Pause、Teardown、Describe等のメソッドを利用できる。 As a protocol to be applied, a real-time data transmission control protocol typified by RTSP defined in RFC2326 is assumed, and methods such as Setup, Play, Pause, Teardown, and Describe can be used.
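As an illustration of the kind of message such a protocol carries, the sketch below assembles an RTSP/1.0 request in the textual form defined by RFC 2326 (request line, CSeq header, further headers, blank line). The URL, session identifier, and helper name are hypothetical, and the sketch does not show how this system encodes its track switching request, which this passage does not specify.

```python
def rtsp_request(method: str, url: str, cseq: int, headers=None) -> str:
    """Build a textual RTSP/1.0 request: request line, CSeq, extra headers, blank line."""
    lines = [f"{method} {url} RTSP/1.0", f"CSeq: {cseq}"]
    for name, value in (headers or {}).items():
        lines.append(f"{name}: {value}")
    return "\r\n".join(lines) + "\r\n\r\n"


# Hypothetical URL and session identifier, for illustration only.
play = rtsp_request(
    "PLAY",
    "rtsp://server.example/song/track1",
    3,
    {"Session": "12345678", "Range": "npt=0-"},
)
print(play)
```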

送信部２３は、セッションと呼ぶ仮想的な通信路を確保し、コンテンツ素片を適切なヘッダを付加してパケット化し、そのパケットを、IPネットワーク４を介してコンテンツ受信装置３ａへ送信する機能を有する。例えば、ヘッダにはネットワークタイムスタンプが含まれる。 The transmission unit 23 has a function of securing a virtual communication path called a session, packetizing each content segment by adding an appropriate header, and transmitting the packet to the content receiving device 3a via the IP network 4. The header includes, for example, a network time stamp.
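Packetization with a header that carries a network time stamp can be sketched as below. The 8-byte header layout (sequence number, time stamp, payload length) is a made-up example chosen for illustration; the actual header format is not given in this passage.

```python
import struct

# Hypothetical header in network byte order:
# 16-bit sequence number, 32-bit network time stamp, 16-bit payload length.
HEADER = struct.Struct("!HIH")


def packetize(sequence: int, timestamp: int, segment: bytes) -> bytes:
    """Prefix a content segment with the header to form one packet."""
    return HEADER.pack(sequence & 0xFFFF, timestamp & 0xFFFFFFFF, len(segment)) + segment


def depacketize(packet: bytes):
    """Receiver side: interpret the header and extract the content segment."""
    sequence, timestamp, length = HEADER.unpack_from(packet)
    return sequence, timestamp, packet[HEADER.size:HEADER.size + length]


packet = packetize(7, 90000, b"segment")
print(depacketize(packet))
```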

パケットを送信する際には、送信部２３は、コンテンツ受信装置３ａへパケットを届けるために送信アドレスを用いる。送信アドレスは、送信アドレスと送信ポート番号の組で構成される。送信アドレスは送信部２３内部の定数として予め持っていても良いし、他のブロックから入力を受け付けても良い。なお、パケットを送信するためのセッション（通信路）は送信アドレスが確定した時点で動的に確立するものとする。また、コンテンツ素片とパケットが必ずしも一致する必要はない。 When transmitting a packet, the transmission unit 23 uses a transmission address to deliver the packet to the content receiving device 3a. The transmission address is composed of a combination of a transmission address and a transmission port number. The transmission address may be previously stored as a constant in the transmission unit 23, or an input may be received from another block. Note that a session (communication path) for transmitting a packet is dynamically established when a transmission address is determined. Further, the content segment and the packet do not necessarily match.

システムクロック発振部２４は、水晶発振子等で実現されたリアルタイムクロック（高精度の時計）であり、システムクロック信号をコンテンツ取得部２１へ供給する機能を有する。 The system clock oscillator 24 is a real-time clock (high-precision clock) realized by a crystal oscillator or the like, and has a function of supplying a system clock signal to the content acquisition unit 21.

また、コンテンツ記憶部２５は、コンテンツデータを所定のファイル形式で記憶する機能を有する。 The content storage unit 25 has a function of storing content data in a predetermined file format.

また、コンテンツ受信装置３ａは、受信部３１、伝送制御部３２ａ、バッファ部３３、システムクロック発振部３４、コンテンツ読取部３５、再生部３６、入力部３７、およびマイク入力部３８を有する。 Further, the content receiving device 3a includes a receiving unit 31, a transmission control unit 32a, a buffer unit 33, a system clock oscillation unit 34, a content reading unit 35, a reproducing unit 36, an input unit 37, and a microphone input unit 38.

受信部３１は、IPネットワーク４からパケットを受信して、ヘッダを解釈して、コンテンツ素片を抽出し、バッファ部３３に供給する機能を有する。受信部３１は、コンテンツ配信サーバ２ａからのパケットのうち自己に必要なパケットのみを受信するために受信アドレスを用いる。受信アドレスは、受信アドレスと受信ポート番号の組で構成される。受信アドレスは受信部３１内部の定数として予め持っていても良いし、他のブロックから入力を受け付けても良い。 The receiving unit 31 has a function of receiving packets from the IP network 4, interpreting the header, extracting the content segment, and supplying it to the buffer unit 33. The receiving unit 31 uses a reception address so that it receives only the packets it needs from among those sent by the content distribution server 2a. The reception address consists of a pair of a reception address and a reception port number. It may be held in advance as a constant inside the receiving unit 31, or it may be accepted as input from another block.

伝送制御部３２ａは、入力部３７からの伝送制御情報をコンテンツ配信サーバ２ａの伝送制御部２２ａへ送信、その応答を受信し、応答を解析した後、受信アドレスを抽出し、受信部３１へ入力する機能を有する。 The transmission control unit 32a transmits the transmission control information from the input unit 37 to the transmission control unit 22a of the content distribution server 2a, receives the response, analyzes the response, extracts the reception address, and inputs the received address to the reception unit 31. It has the function to do.

バッファ部３３は、コンテンツ素片をヘッダから得られたネットワークタイムスタンプ（後述）と共に一次的に蓄積記憶する機能を有する。 The buffer unit 33 has a function of temporarily accumulating and storing the content pieces together with a network time stamp (described later) obtained from the header.

コンテンツ読取部３５は、バッファ部３３を監視し、再生に十分なコンテンツ素片が蓄積されたと判断した時点から、システムクロック発振部３４からのシステムクロック信号に従ってコンテンツ素片の読み出しを開始し、コンテンツ素片の集合をコンテンツに復元して出力する機能を有する。 The content reading unit 35 monitors the buffer unit 33 and starts reading the content unit in accordance with the system clock signal from the system clock oscillation unit 34 when it is determined that the content unit sufficient for reproduction has been accumulated. It has a function of restoring and outputting a set of segments as content.

再生部３６は、入力されたコンテンツ素片に応じた復号化を行い、マイク入力部からの音声信号と合成した後、スピーカ等の所定の出力装置に対して音声信号、映像信号を出力する機能を有する。例えば、コンテンツ素片がMPEGに類する高能率符号化されたデジタル音声データである場合、再生部３６は、高能率符号復号器（デコーダ）、D/A変換器、アナログアンプ等を有して構成される。 The playback unit 36 has a function of decoding the input content segment as appropriate to its type, combining the result with the audio signal from the microphone input unit, and then outputting audio and video signals to a predetermined output device such as a speaker. For example, when the content segment is digital audio data encoded with a high-efficiency codec such as MPEG, the playback unit 36 comprises a high-efficiency decoder, a D/A converter, an analog amplifier, and the like.

ここでいう合成の最低要件は、コンテンツ読取部３５とマイク入力部３８とからの２つの入力をほぼ同時に再生することである。仮にコンテンツ素片の種別が音声である場合は、マイク入力部３８からの音声と不自然でないように適切なミキシング合成や音量調整を行っても良い。 Here, the minimum requirement for composition is to reproduce two inputs from the content reading unit 35 and the microphone input unit 38 almost simultaneously. If the content segment type is audio, appropriate mixing synthesis and volume adjustment may be performed so as not to be unnatural with the audio from the microphone input unit 38.

入力部３７は、利用者から、コンテンツの選択・コンテンツの再生・停止、トラック切替等の伝送制御情報の入力処理を受け付ける機能を有する。例えば、ビットマップディスプレイとキーボードを用いて、再生ボタン、停止ボタン、コンテンツ選択ダイヤログなどのＧＵＩ部品を利用者に提供し、また、テンキーや数段階のスライドスイッチなどを用いて、トラック番号の入力を受け付ける操作スイッチを提供する。 The input unit 37 has a function of accepting transmission control information input processing such as content selection, content playback / stop, and track switching from the user. For example, using a bitmap display and a keyboard, GUI parts such as a play button, a stop button, and a content selection dialog are provided to the user, and a track number is input using a numeric keypad and several stages of slide switches. Provide an operation switch that accepts.

マイク入力部３８は、マイク等の音声取込装置によって取り込まれた音声信号の入力を受け付け、その音声信号を再生部３６ｂが扱うデータ形式に変換する機能を有する。 The microphone input unit 38 has a function of receiving an input of an audio signal captured by an audio capturing device such as a microphone and converting the audio signal into a data format handled by the playback unit 36b.

また、オーサリング装置５ａは、オーサリング部５１ａ、音程変換部５２、コンテンツ送信部５３、入力部５４、マイク入力部５５、およびコンテンツ一次記憶部５６を有する。このオーサリング装置５ａは、コンテンツ配信サーバ２ａと同一の筐体に収められても良いし、別の筐体に収められても良い。なお、便宜上、オーサリング装置５ａを利用する利用者のことを、編集者と呼ぶ。 The authoring device 5 a includes an authoring unit 51 a, a pitch conversion unit 52, a content transmission unit 53, an input unit 54, a microphone input unit 55, and a content primary storage unit 56. The authoring device 5a may be housed in the same housing as the content distribution server 2a or in a separate housing. For convenience, the user who uses the authoring device 5a is called an editor.

オーサリング部５１ａは、入力部５４からの指示入力とマイク入力部５５からの音声データを用いて、コンテンツデータの新規作成・トラック追加・コンテンツデータの削除等、コンテンツデータの作成編集を行う機能、作成編集したコンテンツデータをコンテンツ一次記憶部５６に記録する機能を有する。 The authoring unit 51a uses the instruction input from the input unit 54 and the audio data from the microphone input unit 55 to create and edit content data, such as creating new content data, adding tracks, deleting content data, etc. It has a function of recording edited content data in the content primary storage unit 56.

音程変換部５２は、コンテンツを構成する原音の音声データ（以降、ソース音声データと称す）に対して、少なくとも半音もしくは全音ごとに音程変換を行う機能（例えば、音声の周波数領域変換を利用した周波数シフトアルゴリズムを用いる）を有する。 The pitch conversion unit 52 performs a pitch conversion for at least a semitone or a full tone on the original sound data (hereinafter referred to as source sound data) constituting the content (for example, frequency using sound frequency domain conversion). Using a shift algorithm).

ここで、基準となるソース音声データの音階を“0”とし、上に半音シフトすると“＋1”、下に半音シフトすると“−1”となるような整数変数をキー変数kと呼ぶ。一般的なカラオケ装置における音程変換を実現する場合、キー変数kは、少なくとも“−6〜0〜＋6”までの範囲を採り得る。 Here, an integer variable in which the scale of the reference source audio data is “0”, “+1” when shifted upward by a semitone, and “−1” when shifted downward by a semitone is referred to as a key variable k. When realizing pitch conversion in a general karaoke device, the key variable k can take a range of at least “−6 to 0 to +6”.

例えば、キー変数k=“＋1”の場合、短２度の移調に相当し、原曲がハ長調だった場合は移調後の調は変ニ長調となる。また、キー変数k=“−6”の場合は、減５度に相当し、原曲がハ長調だった場合は移調後の調は嬰ヘ長調となる。 For example, the key variable k = “+1” corresponds to transposition by a minor second: if the original song is in C major, the transposed key is D-flat major. The key variable k = “−6” corresponds to a diminished fifth: if the original song is in C major, the transposed key is F-sharp major.

なお、キー変数kが“−6”未満（もしくは＋6より大）が用いられない理由は、例えば、半音シフトで−6だけさげるということは、半音シフトで＋6（増４度）してから１オクターブ下げる移調に相当するからである。カラオケの場合、利用者本人が意図している調に合致しさえすれば、演奏のオクターブが異なっても、快適に歌唱することができるため、半音±６以上の移調は必要無い。 Values of the key variable k below “−6” (or above “+6”) are not used because, for example, lowering the pitch by 6 semitones is equivalent to raising it by 6 semitones (an augmented fourth) and then lowering it by one octave. In karaoke, as long as the key matches the one the user intends, the user can sing comfortably even if the accompaniment is in a different octave, so transposition beyond ±6 semitones is unnecessary.
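The octave argument above can be checked with a short sketch. The function names are hypothetical, and the frequency ratio assumes twelve-tone equal temperament (a factor of 2^(1/12) per semitone), which this passage does not state explicitly.

```python
def normalize_key(semitones: int) -> int:
    """Fold any semitone shift into the range -6..+6 by moving whole octaves."""
    if -6 <= semitones <= 6:
        return semitones
    k = semitones % 12            # 0..11
    return k - 12 if k > 6 else k


def pitch_ratio(k: int) -> float:
    """Frequency ratio for a shift of k semitones (equal temperament assumed)."""
    return 2.0 ** (k / 12)


# A shift of -7 semitones reaches the same key as +5, one octave lower.
print(normalize_key(-7), normalize_key(8), round(pitch_ratio(1), 6))
```

Under this assumption, k = −6 and k = +6 give ratios that differ by exactly one octave, which matches the reasoning in the text.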

When a content file transfer instruction and a destination distribution server name are input from the input unit 54, the content transmission unit 53 transmits the content data stored in the content primary storage unit 56, via the network, to the content storage unit 25 in the distribution server indicated by the destination distribution server name.

The content transmission unit 53 also has a function of recording content data on a recording medium such as a CD-R or DVD-R. When content data is not transmitted via the network, it is provided on a recording medium.

The input unit 54 accepts the editor's instruction inputs: new content data creation, track addition, content data transfer, and content data deletion. For example, it uses a bitmap display and a keyboard to provide the editor with GUI components such as a play button, a stop button, and a content selection dialog, and it provides operation switches, such as a numeric keypad or a multi-position slide switch, that accept a track number and the key variable k for pitch conversion.

The microphone input unit 55 accepts an audio signal captured by an audio capture device such as a microphone and converts it into the data format handled by the playback unit 36b to generate source audio data. The microphone input unit 55 may also include an interface for accepting source audio data from outside, such as a CD-ROM drive or a network. It may further include a microphone and a memory, so that it records a fixed duration of the audio signal and converts it into source audio data. When the editor inputs a track addition instruction, source audio data is taken into the authoring device 5a through the microphone input unit 55 at the same time.

The content primary storage unit 56 has a function of temporarily storing the content data created and edited by the authoring unit 51a.

Next, the operation of the authoring device 5a will be described with reference to the flowchart of FIG. 2.

(1) New creation of content data
When a new content data creation instruction and a content file name CF1 are input from the input unit 54, the authoring unit 51a creates, in the content primary storage unit 56, content data containing no tracks under the content file name CF1.

(2) Deletion of content data
When a content data deletion instruction and a content file name CF1 are input from the input unit 54, the authoring unit 51a deletes the content data with the content file name CF1 recorded in the content primary storage unit 56.

(3) Track addition
When a track addition instruction, a content file name CF1, a track number TN1, and a key variable k are input from the input unit 54 and the authoring unit 51a has acquired the source audio data input from the microphone input unit 55 (step S01), it first determines whether the key variable k is “0” (step S02). If the key variable k is not “0”, the pitch conversion unit 52 applies pitch conversion to the input source audio data (step S03) and supplies the converted source audio data to the authoring unit 51a. If the key variable k is “0”, processing proceeds to step S04.

Next, the authoring unit 51a opens the content data indicated by the content file name CF1, increments the total number of tracks by 1, adds track data for the track number TN1, and divides the source audio data into content segments of a suitable number of samples each (step S04). The authoring unit 51a then applies high-efficiency encoding if necessary, attaches a frame number and time information to each segment, and multiplexes them to generate the track data (step S05). The track timestamp frequency is obtained by measuring the sampling frequency of the audio data and is written into the track data.

Basically, as with the conventional content data shown in FIG. 10, the content data in this embodiment consists of a content file name, the total number of tracks T, and the track data of each track. The track data of each track consists of a track number, a track timestamp frequency, the total number of frames, and a plurality of data frames. Each data frame consists of a frame number, time information, and a content segment.
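The layout described above, together with the segmentation of step S04, can be sketched with plain data classes. All class, field, and function names here are hypothetical, and the sketch assumes uncompressed samples and a track timestamp frequency equal to the sampling frequency, so that the time information of a frame is simply the index of its first sample.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class Frame:
    frame_number: int      # 1-based frame number
    time_info: int         # time information, in ticks of the track timestamp frequency
    segment: List[int]     # content segment (raw samples in this sketch)

@dataclass
class Track:
    track_number: int
    timestamp_frequency: int
    frames: List[Frame] = field(default_factory=list)

    @property
    def total_frames(self) -> int:
        return len(self.frames)

@dataclass
class ContentData:
    file_name: str
    tracks: List[Track] = field(default_factory=list)

    @property
    def total_tracks(self) -> int:
        return len(self.tracks)

def build_track(track_number: int, samples: List[int],
                sample_rate: int, samples_per_frame: int) -> Track:
    """Split samples into fixed-size segments and number them (cf. steps S04-S05)."""
    track = Track(track_number, timestamp_frequency=sample_rate)
    for start in range(0, len(samples), samples_per_frame):
        track.frames.append(Frame(
            frame_number=len(track.frames) + 1,
            time_info=start,
            segment=samples[start:start + samples_per_frame],
        ))
    return track

track = build_track(1, list(range(10)), sample_rate=48000, samples_per_frame=4)
content = ContentData("hoge.mp4", [track])
```

Applying the same `samples_per_frame` to every pitch-shifted copy of one source recording gives every track the same frame boundaries.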

For playback to continue smoothly when content segments from more than one track arrive at the content receiving device 3a from the content distribution server 2a, the data frames holding the content segments must have matching temporal boundaries on the content playback time axis, at least across the tracks in the content file that can be switched between. That is, content segments with the same frame number must be played at the same playback time.

In other words, when the same frame number exists in a track A (as FNA) and in a track B (as FNB), the time information attached to FNA divided by the track timestamp frequency of track A must equal the time information attached to FNB divided by the track timestamp frequency of track B.
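The alignment condition can be checked mechanically. The sketch below is illustrative only: it represents each track as a hypothetical pair of a track timestamp frequency and a mapping from frame number to time information, and it uses exact rational arithmetic so that tracks with different timestamp frequencies compare without rounding error.

```python
from fractions import Fraction

def playback_time(time_info: int, trs: int) -> Fraction:
    """Content playback time in seconds: time information / track timestamp frequency."""
    return Fraction(time_info, trs)

def boundaries_match(track_a, track_b) -> bool:
    """True if every frame number present in both tracks starts at the same playback time."""
    trs_a, frames_a = track_a
    trs_b, frames_b = track_b
    shared = frames_a.keys() & frames_b.keys()
    return all(
        playback_time(frames_a[n], trs_a) == playback_time(frames_b[n], trs_b)
        for n in shared
    )

track_1 = (48000, {1: 0, 2: 1024, 3: 2048})
track_2 = (24000, {1: 0, 2: 512})        # half the frequency, same boundaries
track_3 = (48000, {1: 0, 2: 1000})       # frame 2 starts earlier
print(boundaries_match(track_1, track_2))  # True
print(boundaries_match(track_1, track_3))  # False
```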

FIG. 3 shows the data structure of content data having a plurality of tracks, laid out along the time axis. The horizontal axis is the content playback time (each time information value divided by the track timestamp frequency).

In FIG. 3, FN1, FN2, ... FNn, FNm and so on are the frame numbers assigned to the content segments. The total number of frames need not be the same for every track; in FIG. 3, track 1 has n frames in total and track 2 has m. When the content data has been high-efficiency encoded, for example with MPEG-4 AAC hi-bitrate encoding, one content segment may contain a plurality of compressed packets.

However, when content segments with the same frame number are extracted from two or more tracks, the total duration obtained by playing the compressed packets contained in each segment must be the same. Consequently, the time information x of the content segment x of each track at the same frame number FNx always has the same value, provided the tracks have the same track timestamp frequency.

Therefore, as shown by the arrow (A) in FIG. 3, playback continues smoothly even when it is switched from track 1 to track 2 partway through, at frame number FN2. The authoring unit 51a described above is well suited to generating such content data. If the same source audio data is stored in successive tracks with a different key variable specified for each, all of the audio data is divided into content segments by the same rule in step S04, so the resulting content data has the data structure described above.

In this way, the authoring device 5a creates a plurality of tracks with different pitches and supplies the resulting content data to the content distribution server 2a. Because high-quality pitch conversion can be performed at authoring time, service quality can be improved.

Next, the operations of the transmission control unit 22a of the content distribution server 2a and the transmission control unit 32a of the content receiving device 3a when transmission control information is sent and received will be described with reference to the sequence diagram of FIG. 4. FIG. 4 shows the processing performed using RTSP from selecting one content item, through establishing a session for transmission and starting playback, to stopping. The messages sent and received below are included in the transmission control information.

Here, the host name of the content distribution server 2a is assumed to be “server.jvc-victor.jp” and the name of the content file holding the content to be “hoge.mp4”. The file “hoge.mp4” is assumed to contain one MPEG-4 AAC audio track with track number 1 and a playback time of 235 seconds.

RTSP uses a content URI as the resource identifier that uniquely identifies content on the network. The content “hoge.mp4” is represented by the following content URI.

“rtsp://server.jvc-victor.jp/hoge.mp4”

First, the transmission control unit 32a requests a session description by sending the content URI (step S11). The session description is text information indicating what kinds of session (communication path) can be established for the content data associated with the content URI. SDP is used as the session description format. In RTSP, the DESCRIBE method is used to request a session description. Each method and its response message (together with any accompanying information) are carried in the header of the request message and the header of the response message, respectively.

“DESCRIBE rtsp://server.jvc-victor.jp/hoge.mp4 RTSP/1.0”

The transmission control unit 22a sends a response containing the session description to the transmission control unit 32a (step S12). The session description contains a playback time and a media description. The playback time is the maximum playback time of the continuous media associated with the specified content URI. Of the many playback time formats, the simplest is NPT, which expresses the start time and end time in seconds as floating-point numbers. For example, the playback time of a content URI pointing to content data that contains a 235-second MPEG-4 AAC audio track is expressed in NPT as follows.

“rtsp://server.jvc-victor.jp/hoge.mp4/trackID=1”

Having analyzed the session description, the transmission control unit 32a decides to establish a session for carrying MPEG-4 AAC stereo audio, determines the receiving address of the session, and sends a session setup request to the transmission control unit 22a (step S13). In RTSP, the SETUP method is used for the session setup request. In the example below, the receiving address consists of the address (136.198.190.100) and the receiving port numbers (6668-6669) held by the transmission control unit 32a.

“SETUP rtsp://server.jvc-victor.jp/hoge.mp4/trackID=1 RTSP/1.0”
“Transport: RTP/AVP/UDP;unicast;destination=136.198.190.100;client_port=6668-6669”

On correctly receiving the session setup request, the transmission control unit 22a determines a new sending address, inputs it to the transmission unit 23, and sends session information to the transmission control unit 32a (step S14). The session information includes the sending address of the distribution server used for delivery (136.198.190.1 in the example below) and the sending port numbers (19000 to 19001 in the example below).

“Transport: RTP/AVP/UDP;unicast;source=136.198.190.1;server_port=19000-19001”

On receiving the session information, the transmission control unit 32a inputs the previously determined receiving address to the reception unit 31. A new session is established once both the transmission unit 23 and the reception unit 31 have completed their processing.
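How the receiving side might pull the address and port pair out of a Transport header such as the ones above can be sketched as follows. The function name and the returned structure are assumptions made for illustration; only the header text itself comes from the examples in this description.

```python
def parse_transport(header_line: str):
    """Split a 'Transport:' header into its transport spec and its parameters."""
    name, _, value = header_line.partition(":")
    if name.strip().lower() != "transport":
        raise ValueError("not a Transport header")
    parts = [p.strip() for p in value.split(";") if p.strip()]
    spec, params = parts[0], {}
    for part in parts[1:]:
        key, sep, val = part.partition("=")
        params[key] = val if sep else True      # bare flags such as 'unicast'
    for port_key in ("client_port", "server_port"):
        port_value = params.get(port_key)
        if isinstance(port_value, str):
            low, _, high = port_value.partition("-")
            params[port_key] = (int(low), int(high or low))
    return spec, params

spec, params = parse_transport(
    "Transport: RTP/AVP/UDP;unicast;source=136.198.190.1;server_port=19000-19001"
)
print(spec)                   # RTP/AVP/UDP
print(params["source"])       # 136.198.190.1
print(params["server_port"])  # (19000, 19001)
```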

Next, the transmission control unit 32a sends the transmission control unit 22a a track switching request (CHANGE method) that specifies, according to the user's input operation, the track of the content to be played (step S15). The track number designation (TrackID=) is set to the track number to switch to next.

“CHANGE rtsp://server.jvc-victor.jp/hoge.mp4/TrackID=2 RTSP/1.0”

On receiving the track switching request, the transmission control unit 22a refers to the content storage unit 25 and, if the track corresponding to the requested track number exists in the content hoge.mp4, sends the following normal response (step S16).

“RTSP/1.0 200 OK”

If the track corresponding to the requested track number does not exist in the content hoge.mp4, the transmission control unit 22a sends the following abnormal response.

“RTSP/1.0 404 NOT FOUND”

This track switching request and track switching response are only examples, and minor differences in the character string representation do not affect the novelty of the scheme. Although this embodiment extends RTSP, the track switching request may instead be written using the GET method of HTTP (Hyper Text Transfer Protocol) as follows (Network Working Group, "Hypertext Transfer Protocol -- HTTP/1.1", RFC2616, The Internet Society, June 1999).

“GET /hoge.mp4?TrackID=2 HTTP/1.1”

Although the GET method line does not include the host name of the content distribution server 2a, it can carry the content file name and the track number to switch to next in the same way as RTSP.
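The two request forms and the two responses shown above can be generated by a few small helpers. This is a minimal sketch with hypothetical function names; it reproduces only the request and status lines quoted in this description and omits all other headers (CSeq, Session, Host, and so on) that a real exchange would carry.

```python
def change_request(host: str, file_name: str, track_id: int) -> str:
    """Request line of the RTSP-style track switching request (CHANGE method)."""
    return f"CHANGE rtsp://{host}/{file_name}/TrackID={track_id} RTSP/1.0"

def http_get_request(file_name: str, track_id: int) -> str:
    """Equivalent request line using the HTTP GET method (no host name in the line)."""
    return f"GET /{file_name}?TrackID={track_id} HTTP/1.1"

def change_response(existing_tracks, track_id: int) -> str:
    """Status line the server returns depending on whether the track exists."""
    if track_id in existing_tracks:
        return "RTSP/1.0 200 OK"
    return "RTSP/1.0 404 NOT FOUND"

print(change_request("server.jvc-victor.jp", "hoge.mp4", 2))
print(http_get_request("hoge.mp4", 2))
print(change_response({1, 2}, 2))   # RTSP/1.0 200 OK
print(change_response({1, 2}, 3))   # RTSP/1.0 404 NOT FOUND
```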

Next, the transmission control unit 32a sends a playback start request (step S17). The playback start request can specify a playback range in NPT. The example below specifies the range from the beginning of the content (0.0 seconds) to the end (235.0 seconds). In RTSP, the PLAY method is used for the playback start request.

“PLAY rtsp://server.jvc-victor.jp/hoge.mp4 RTSP/1.0”
“Range: npt=0.0-235.0”

The transmission control unit 22a sends a response to the playback start request (step S18). Having accepted the request, the transmission control unit 22a calculates, for every session that has already been set up, the content file name, track number, and content playback start time from each session's control URI and the playback range in the request. It inputs these to the content acquisition unit 21 together with command input = playback start constant (for example, “PLAY”) and starts sending RTP packets.

This response also contains the RTP packet information needed for the operation of the content reading unit 35 of the content receiving device 3a, for example the RTP timestamp (network timestamp) of the first RTP packet sent in this session (0000000 in the example below).

“RTP-Info: url= rtsp://server.jvc-victor.jp/hoge.mp4/trackID=1;rtptime=0000000;”

A track switching request may be sent at any time after the setup response of step S14 and before the stop. Accordingly, when a track switch is input while a track is being played, the transmission control unit 32a sends a track switching request of the same form as the one sent in step S15 (step S19). On receiving it, the transmission control unit 22a inputs the requested track number to the content acquisition unit 21. The content acquisition unit 21 checks whether the requested track number exists and inputs the switching result (normal or abnormal) to the transmission control unit 22a. On receiving the switching result, the transmission control unit 22a sends a track switching response (the normal response if the result is normal, the abnormal response if it is abnormal) to the transmission control unit 32a (step S20).

The content acquisition unit 21 also acquires the content segments of the specified track from the content storage unit 25 and sends them to the content receiving device 3a via the transmission unit 23 (details are given later).

The transmission control unit 32a sends a stop request when the user performs a stop input (step S21). In RTSP, the stop request is the TEARDOWN method.

“TEARDOWN rtsp://server.jvc-victor.jp/hoge.mp4 RTSP/1.0”

Having accepted the stop request, the transmission control unit 22a controls the transmission unit 23 to stop sending RTP packets, closes the session, and reports that transmission has stopped (step S22). The transmission control unit 22a also inputs command input = playback stop constant (for example, “STOP”) to the content acquisition unit 21 to stop reading.

To simplify the description, it has been assumed that one content data item contains only one content item, but the transmission control mechanism described above extends readily to synchronized playback of a plurality of content items. For example, synchronized playback of a plurality of content items can be achieved simply by listing the media descriptions for those items in step S12 and repeating the setup requests of steps S13 to S14 once per media description.

Next, the playback time management processing (sending side) of the content acquisition unit 21 when reading content data will be described with reference to the flowchart of FIG. 5.

First, when the content receiving device 3a requests playback of content, the content acquisition unit 21 of the content distribution server 2a initializes the following six variables (step S31).

1: Content file name CF = default content file name (for example, “hoge.mp4”)
2: Track number TN = default track number (for example, “1”)
3: Start frame number FN = default frame number (for example, “1”)
4: Playback start time STS = current time NTS obtained from the system clock oscillation unit 24
5: Content playback time CTS = “0”
6: Playback state flag F = Boolean value “N”

Next, the content acquisition unit 21 refers to a predetermined automatic mode constant, which indicates whether playback starts automatically, and determines whether to perform automatic playback (step S32). If the automatic mode constant is the Boolean value “Y”, processing proceeds to step S44; if it is the Boolean value “N”, processing proceeds to step S33.

If the automatic mode constant is “N” in step S32, the content acquisition unit 21 determines whether there is an external input, that is, a set of four variables: command input CMD, content file name CF1, track number TN1, and content playback start time CTS1 (step S33). If there is an external input, the result is “Y” and processing proceeds to step S34; if there is none, the result is “N” and processing proceeds to step S40.

If the result in step S33 is “Y”, the content acquisition unit 21 determines whether the command input CMD in the external input equals the track switching constant (for example, “CHANGE”) (step S34). If it does, the result is “Y” and processing proceeds to step S35; if the result is “N”, processing proceeds to step S38.

If the result in step S34 is “Y”, the content acquisition unit 21 uses the content file name CF1 and track number TN1 from the external input to determine whether the track can be switched, that is, whether CF equals CF1 and the content file named CF contains the track indicated by track number TN1 (step S35). If switching is possible, the result is “Y” and processing proceeds to step S36: the unit assigns track number TN = track number TN1 to perform the switch and then notifies the transmission control unit 22a of switching result = normal (step S36). If the track does not exist and switching is impossible, the result is “N”, and the content acquisition unit 21 proceeds to step S37 and notifies the transmission control unit 22a of switching result = abnormal (step S37).

If the result in step S34 is “N”, the content acquisition unit 21 determines whether the command input CMD in the external input equals the playback stop constant (step S38). If it does, the result is “Y” and the entire operation stops and ends; otherwise the result is “N” and processing proceeds to step S39.
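The branching of steps S34 to S38 can be summarized as a small dispatcher. The sketch below is an assumption-laden illustration: the state dictionary, the `catalog` mapping from file name to available track numbers, and the returned strings are all hypothetical, and the reinitialization of step S39 is represented only by a return value.

```python
TRACK_SWITCH = "CHANGE"   # track switching constant
PLAYBACK_STOP = "STOP"    # playback stop constant

def handle_command(state: dict, cmd: str, cf1: str, tn1: int, catalog: dict) -> str:
    """Dispatch one external input; mutates state["TN"] on a successful switch."""
    if cmd == TRACK_SWITCH:                                   # step S34
        tracks = catalog.get(state["CF"], set())
        if cf1 == state["CF"] and tn1 in tracks:              # step S35
            state["TN"] = tn1                                 # step S36
            return "normal"
        return "abnormal"                                     # step S37
    if cmd == PLAYBACK_STOP:                                  # step S38
        return "stop"
    return "reinitialize"                                     # continue to step S39

state = {"CF": "hoge.mp4", "TN": 1}
catalog = {"hoge.mp4": {1, 2}}
print(handle_command(state, "CHANGE", "hoge.mp4", 2, catalog))  # normal
print(handle_command(state, "CHANGE", "hoge.mp4", 3, catalog))  # abnormal
print(state["TN"])                                              # 2
```

Note that a switch changes only the track number; the start frame number and the playback clock are left untouched, which is what lets the next frames be taken from the new track at the same position.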

If the result in step S38 is “N”, the content acquisition unit 21 initializes five variables using three variables from the external input: content file name CF1, track number TN1, and content playback start time CTS1 (step S39).

1: Content file name CF = content file name CF1
2: Track number TN = track number TN1
3: Playback start time STS = current time NTS obtained from the system clock oscillation unit 24
4: Content playback time CTS = content playback start time CTS1
5: Playback state flag F = Boolean value “Y”

The content acquisition unit 21 further initializes the start frame number FN using the content playback time CTS. As a preliminary step, it calculates the sending start time information StartTS by the following formula.

[Equation 7]
StartTS = CTS × TRS

Here, the track timestamp frequency TRS is the track timestamp frequency of the track indicated by track number TN in the content file named CF.

Next, the content acquisition unit 21 searches the frame numbers of the track indicated by track number TN in the content file named CF in order from 1, obtains the frame number NFN of the first frame for which time information ≥ StartTS holds, and sets the start frame number FN to NFN.
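Equation 7 and the search for the start frame can be sketched as follows. The function names are hypothetical, the list `time_infos` is assumed to hold the time information of frames 1, 2, 3, ... in order, and returning `None` when no frame qualifies is an assumption, since that case is not specified above.

```python
def start_ts(cts_seconds: float, trs: int) -> float:
    """Equation 7: StartTS = CTS x TRS."""
    return cts_seconds * trs

def find_start_frame(time_infos, start_ts_value):
    """First 1-based frame number whose time information is >= StartTS."""
    for frame_number, time_info in enumerate(time_infos, start=1):
        if time_info >= start_ts_value:
            return frame_number
    return None   # assumption: no frame at or after the requested time

time_infos = [0, 100, 200, 300, 400]      # frames 1..5 at TRS = 1000 ticks/s
print(start_ts(0.25, 1000))                               # 250.0
print(find_start_frame(time_infos, start_ts(0.25, 1000)))  # 4
```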

If there is no external input in step S33, the content acquisition unit 21 determines whether the playback state flag F is the Boolean value “Y” (step S40). If it is “Y”, processing proceeds to step S41; if it is “N”, processing returns to step S33.

If the playback state flag is “Y” in step S40, or if processing continues from step S36, S37, or S39, the content acquisition unit 21 calculates the elapsed time from the current time NTS obtained from the system clock oscillation unit 24, the playback start time STS, and the system clock frequency SS. From the elapsed time, the content playback time CTS, and the track timestamp frequency TRS, it then calculates the sendable time information LastTS and thereby searches for the sendable range (step S41). The formula for LastTS is shown below.

[Equation 8]
LastTS = (NTS − STS) / SS × TRS

Here, the system clock frequency SS is a constant for converting (current time NTS − playback start time STS) into seconds. For example, if the system clock has a resolution of 1/1000000 second, SS is 1000000.

Next, the content acquisition unit 21 searches the frame numbers in order from the start frame number FN, within the track indicated by track number TN in the content file named CF, and obtains the frame number NFN of the first frame for which time information ≥ LastTS holds. If no frame number NFN satisfies the condition, the content acquisition unit 21 sets NFN to the termination constant “−1”.
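Equation 8 and the search for NFN can be sketched in the same style. The function names are hypothetical. The sketch follows Equation 8 exactly as printed; the surrounding text also lists the content playback time CTS as an input to LastTS, but the printed formula has no CTS term, so none is added here.

```python
TERMINATION = -1   # termination constant used when no frame qualifies

def last_ts(nts: int, sts: int, ss: int, trs: int) -> float:
    """Equation 8 as printed: LastTS = (NTS - STS) / SS x TRS."""
    return (nts - sts) / ss * trs

def find_sendable_end(time_infos, fn: int, last_ts_value: float) -> int:
    """First frame number >= FN whose time information is >= LastTS, else -1."""
    for frame_number in range(fn, len(time_infos) + 1):
        if time_infos[frame_number - 1] >= last_ts_value:
            return frame_number
    return TERMINATION

time_infos = [0, 100, 200, 300, 400, 500, 600]   # frames 1..7, TRS = 1000
elapsed = last_ts(nts=2_500_000, sts=2_000_000, ss=1_000_000, trs=1000)
print(elapsed)                                    # 500.0
print(find_sendable_end(time_infos, 1, elapsed))  # 6
print(find_sendable_end(time_infos, 1, 5000.0))   # -1
```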

Next, the content acquisition unit 21 determines whether transmission is possible by comparing the start frame number FN with the frame number NFN (step S42). If FN < NFN holds, the result is “Y” and processing proceeds to step S43; otherwise the result is “N” and processing returns to step S33.

If the result in step S42 is “Y”, the content acquisition unit 21 outputs to the transmission unit 23 each content segment corresponding to a frame number from the start frame number FN up to but not including the frame number NFN, within the track indicated by the track number TN in the content file identified by the file name CF. Together with each content segment, it outputs a network time stamp PTS generated from the corresponding time information according to the following formula (step S43).

[Equation 9]
PTS = time information / TRS × network time stamp frequency
After the output, the content acquisition unit 21 sets start frame number FN = frame number NFN in order to advance the start frame number FN to the position already transmitted.
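Steps S42 and S43 can be sketched together as follows. The `Frame` class, the function names, and the 0-based list indexing are assumptions made for illustration. The multiplication in Equation 9 is performed before the division, which is algebraically the same formula and merely limits floating-point rounding.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass
class Frame:
    time_info: int  # time information in track time stamp units
    payload: bytes  # content segment


def network_timestamp(time_info: int, trs: int, network_ts_frequency: int) -> float:
    """Equation 9: PTS = time information / TRS * network time stamp frequency."""
    return time_info * network_ts_frequency / trs


def collect_sendable(frames: Sequence[Frame], fn: int, nfn: int, trs: int,
                     network_ts_frequency: int) -> Tuple[List[Tuple[bytes, float]], int]:
    """Return the (segment, PTS) pairs for frames FN..NFN-1 and the updated FN."""
    if not fn < nfn:  # step S42 result "N": nothing can be sent yet
        return [], fn
    out = [
        (frames[i].payload,
         network_timestamp(frames[i].time_info, trs, network_ts_frequency))
        for i in range(fn, nfn)
    ]
    return out, nfn  # FN = NFN: advance to the position already transmitted
```

Note that when NFN holds the termination constant −1, the comparison FN < NFN fails and nothing is returned, which mirrors the branch back to step S33 described above.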

If the automatic mode constant is “Y” in step S32, the content acquisition unit 21 sets the reproduction state flag F to “Y” and sets the reproduction start time STS to the current time NTS obtained from the system clock oscillation unit 24, thereby preparing for automatic reproduction to start (step S44).

In this way, track switching messages are exchanged over the control line established by the transmission control unit 22a and the transmission control unit 32a, and, in accordance with the message, the content segments of the destination track are acquired and transmitted on the basis of the time information. The content receiving device 3a can therefore reproduce smoothly even when the track is switched.

Note that communication between the transmission unit 23 and the reception unit 31 may follow RTP (Real-time Transport Protocol, RFC 1889).

The content distribution system 1a of FIG. 1 has only one each of the content distribution server 2a, the content receiving device 3a, and the authoring device 5a. However, as shown in FIG. 6, a plurality of content distribution servers 2a may be connected to the authoring device 5a, with a content receiving device 3a connected to each content distribution server 2a.

This configuration is well suited as an example of an online karaoke system that can serve a large number of users. The authoring device 5a performs pitch conversion and prepares content data in which versions at different pitches are recorded on multiple tracks. It then suffices to copy the same content data to the multiple content distribution servers 2a: each content receiving device 3a can change pitch dynamically during reproduction simply by changing tracks, without any built-in pitch conversion means. Moreover, even as the number of users grows and the number of pairs of content distribution server 2a and content receiving device 3a increases, the time and computation required for pitch conversion remain constant, which allows a dramatic cost reduction.

≪Second embodiment: Switching the partner voice in a duet song≫
The embodiment above assumes that it is known in advance which track holds content data at which pitch. In other words, track numbers and key variables are associated one-to-one.

For a duet song, however, when the pitch of the accompaniment is changed, the pitch of the partner voice must be changed along with it. The second embodiment therefore describes a content distribution system 1b that realizes partner voice switching and key control at the same time.

FIG. 7 shows the system configuration of the content distribution system 1b (an online karaoke system). The content distribution system 1b comprises a content distribution server 2b, a content receiving device 3b (karaoke device), and an authoring device 5b; the content distribution server 2b and the content receiving device 3b are connected to each other by an IP network 4. Components identical to those of the first embodiment are given the same reference numbers, and their detailed description is omitted.

The content distribution server 2b has a content acquisition unit 21, a transmission control unit 22b, a transmission unit 23, a system clock oscillation unit 24, and a content storage unit 25.

On receiving transmission control information from the content receiving device 3b that contains instructions such as content selection, content reproduction or stop, and track switching requests, the transmission control unit 22b returns a response containing the track number, track key information, and track duet information, and notifies the content acquisition unit 21 and the transmission unit 23 of this transmission control information. The transmission control unit 22b also refers to the content storage unit 25 to obtain the content file name, track number, and content reproduction start time, inputs them to the content acquisition unit 21, and inputs the transmission address to the transmission unit 23.

The content receiving device 3b has a reception unit 31, a transmission control unit 32b, a buffer unit 33, a system clock oscillation unit 34, a content reading unit 35, a reproduction unit 36, an input unit 37, a microphone input unit 38, and an output unit 39.

The transmission control unit 32b transmits the transmission control information from the input unit 37 to the transmission control unit 22b of the content distribution server 2b and receives the response. After analyzing the response, it extracts the reception address and inputs it to the reception unit 31, and it extracts the track number, track key information, and track duet information and notifies the output unit 39 of them.

The output unit 39 is connected to a display device such as a monitor and outputs to it the track key information and track duet information extracted from the response containing the session description.

The authoring device 5b has an authoring unit 51b, a pitch conversion unit 52, a content transmission unit 53, an input unit 54, a microphone input unit 55, and a content primary storage unit 56.

Using instruction input from the input unit 54 and audio data from the microphone input unit 55, the authoring unit 51b creates and edits content data (creating new content data, adding tracks, deleting content data, and so on). It also adds track key information and track duet information to the created or edited content data and records the result in the content primary storage unit 56.

The second embodiment introduces a duet variable d that indicates whether the content data of a track contains a partner voice. The duet variable is d = “0” when the content data of the track has no partner voice, d = “1” when it includes a male partner voice, and d = “2” when it includes a female partner voice. A track containing only the partner voice, without accompaniment, may also be prepared.

FIG. 8 shows the data structure of the content data in the second embodiment. It differs from the first embodiment in that track key information (1 to t) and track duet information (1 to t) are added to the content data as per-track information. The maximum value of the subscript t equals the total number of tracks T.

The track key information indicates how far the content data of the track has been transposed; it records the value of the key variable k that the authoring unit 51b of the authoring device 5b refers to when generating the content data. The track duet information indicates the presence or absence of a partner voice; it records the value of the duet variable d.

For example, the track with track number 1 in FIG. 8 has track key information 1 set to “0” and track duet information 1 set to “1”, so it is untransposed content data that includes a male partner voice.
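The per-track information described above can be modeled roughly as follows. This is a sketch only: the class and function names are hypothetical, and the wording of the description strings is an assumption about how a receiver might present the values.

```python
from dataclasses import dataclass
from enum import IntEnum


class Duet(IntEnum):
    """Duet variable d as defined for the second embodiment."""
    NONE = 0    # no partner voice
    MALE = 1    # male partner voice included
    FEMALE = 2  # female partner voice included


@dataclass(frozen=True)
class TrackInfo:
    track_number: int
    key: int    # track key information (value of key variable k)
    duet: Duet  # track duet information (value of duet variable d)


def describe(track: TrackInfo) -> str:
    """Build a human-readable summary of one track's key and duet settings."""
    if track.key == 0:
        key_text = "not transposed"
    else:
        key_text = f"transposed (k={track.key:+d})"
    duet_text = {
        Duet.NONE: "no partner voice",
        Duet.MALE: "male partner voice",
        Duet.FEMALE: "female partner voice",
    }[track.duet]
    return f"track {track.track_number}: {key_text}, {duet_text}"
```

Applied to the FIG. 8 example, `describe(TrackInfo(1, 0, Duet.MALE))` yields a summary stating that track 1 is not transposed and carries a male partner voice.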

Next, the operation of the transmission control unit 22b of the content distribution server 2b and the transmission control unit 32b of the content receiving device 3b when transmission control information is exchanged is described.

When the transmission control unit 32b requests a session description by transmitting a content URI, the transmission control unit 22b returns to the transmission control unit 32b a response containing a session description that includes the following media descriptions.

For example, suppose the content data prepared in the content storage unit 25 has two tracks: the track with track number 1 has track key information 0 and track duet information 1, and the track with track number 2 has track key information 1 and track duet information 2. The media descriptions are then as follows.

“m=audio 0 RTP/AVP/UDP 96”
“a=rtpmap:96 mpeg4-generic/48000/2”
“a=control:rtsp://server.jvc-victor.jp/hoge.mp4/trackID=1”
“a=fmtp:96 streamtype=5; profile-level-id=15; mode=AAC-hbr; config=1190; SizeLength=13; IndexLength=3; IndexDeltaLength=3; Profile=1;”
“a=karaoke:96 keyinfo=0; duetinfo=1”
“m=audio 0 RTP/AVP/UDP 97”
“a=rtpmap:97 mpeg4-generic/48000/2”
“a=control:rtsp://server.jvc-victor.jp/hoge.mp4/trackID=2”
“a=fmtp:97 streamtype=5; profile-level-id=15; mode=AAC-hbr; config=1190; SizeLength=13; IndexLength=3; IndexDeltaLength=3; Profile=1;”
“a=karaoke:97 keyinfo=1; duetinfo=2”
Of these, keyinfo is the variable in which the track key information is set, and duetinfo is the variable in which the track duet information is set. The numbers “96” and “97” appear in what SDP calls the fmt list and are values that identify each media description.
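A server-side sketch of how the karaoke-specific lines of one media description could be assembled is given below. The function name and parameters are hypothetical, the `a=karaoke` attribute is the one defined in this specification rather than a standard SDP attribute, and the `a=rtpmap` and `a=fmtp` lines are deliberately left out to keep the example short.

```python
from typing import List


def media_description(fmt: int, control_uri: str, track_number: int,
                      keyinfo: int, duetinfo: int) -> List[str]:
    """Return the m=, a=control and a=karaoke lines for one track."""
    return [
        f"m=audio 0 RTP/AVP/UDP {fmt}",
        f"a=control:{control_uri}/trackID={track_number}",
        f"a=karaoke:{fmt} keyinfo={keyinfo}; duetinfo={duetinfo}",
    ]


# Example for track number 1 (key information 0, duet information 1).
lines = media_description(
    96, "rtsp://server.jvc-victor.jp/hoge.mp4", 1, keyinfo=0, duetinfo=1
)
```

The same fmt value is reused on the `m=` line and the `a=karaoke` line so that the attribute can be matched to its media description.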

Having analyzed the session description, the transmission control unit 32b obtains the track number, track key information, and track duet information from the media descriptions and notifies the output unit 39 of them. It also decides to establish a session for transmitting MPEG-4 AAC stereo audio, determines the reception address of the session, and transmits a session establishment preparation request to the transmission control unit 22b.
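On the receiving side, extracting the track number, track key information, and track duet information from the media descriptions might be sketched as follows. The parser is an illustrative assumption, not the procedure of the specification: it expects the lines without their surrounding quotation marks and reads only the `m=`, `a=control`, and `a=karaoke` lines.

```python
import re
from typing import Dict, Iterable, List, Optional


def parse_tracks(sdp_lines: Iterable[str]) -> List[Dict[str, object]]:
    """Collect fmt, track_number, keyinfo and duetinfo per media description."""
    tracks: List[Dict[str, object]] = []
    current: Optional[Dict[str, object]] = None
    for raw in sdp_lines:
        line = raw.strip()
        if line.startswith("m="):
            # A new media description starts; its last field is the fmt value.
            current = {"fmt": line.split()[-1]}
            tracks.append(current)
        elif current is None:
            continue  # session-level line before the first m= line
        elif line.startswith("a=control:"):
            match = re.search(r"trackID=(\d+)", line)
            if match:
                current["track_number"] = int(match.group(1))
        elif line.startswith("a=karaoke:"):
            _, _, params = line.partition(" ")
            for item in params.split(";"):
                name, sep, value = item.strip().partition("=")
                if sep:
                    current[name] = int(value)
    return tracks
```

Each returned dictionary then carries what the output unit 39 needs to show for one track.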

The output unit 39 outputs the track number, the track key information, and the track duet information to the monitor in association with one another. Instead of the numeric values of the key variable k and the duet variable d, the display may show designed images such as musical note icons or male and female icons. When the user selects the track to switch to on the basis of the track number, track key information, and track duet information displayed on the monitor, the transmission control unit 32b obtains from the input unit 37 the track number entered by the user, and the track switching processing described in the first embodiment is performed.
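In the specification the user picks the track from the displayed list. As a hedged illustration of how a receiver could map a desired key and partner voice to the track number sent with the switching request, consider the sketch below; the function name and the dictionary layout are assumptions.

```python
from typing import Dict, List, Optional


def select_track(tracks: List[Dict[str, int]], keyinfo: int,
                 duetinfo: int) -> Optional[int]:
    """Return the track number whose key and duet information match, if any."""
    for track in tracks:
        if track["keyinfo"] == keyinfo and track["duetinfo"] == duetinfo:
            return track["track_number"]
    return None  # no track was authored with this combination


available = [
    {"track_number": 1, "keyinfo": 0, "duetinfo": 1},
    {"track_number": 2, "keyinfo": 1, "duetinfo": 2},
]
chosen = select_track(available, keyinfo=1, duetinfo=2)
```

A result of `None` signals that the requested combination does not exist, in which case no switching request would be sent.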

Embodiments of the present invention have been described above, but the invention is not limited to them. All or part of the components of the content distribution server 2, the content receiving device 3, and the authoring device 5 may be implemented as a computer-executable program and provided recorded in advance on a computer-readable recording medium or the like.

A single physical computer may run multiple programs corresponding to the content distribution server 2. Likewise, a single physical computer may run multiple programs corresponding to the content receiving device 3.

A single physical computer may also run both a program corresponding to the content distribution server 2 and a program corresponding to the content receiving device 3. Similarly, a single physical computer may run both a program corresponding to the content distribution server 2 and a program corresponding to the authoring device 5.

FIG. 1 is a functional block diagram of the content distribution server 2a, the content receiving device 3a, and the authoring device 5a in the content distribution system 1a (asynchronous model).
FIG. 2 is a flowchart showing the processing procedure at the time of authoring.
FIG. 3 is a diagram showing the data structure of content data having a plurality of tracks.
FIG. 4 is a sequence diagram showing the operation of the transmission control unit 22a of the content distribution server 2a and the transmission control unit 32a of the content receiving device 3a when transmission control information is exchanged.
FIG. 5 is a flowchart showing the reproduction time management processing (transmitting side) of the content acquisition unit 21 when content data is read.
FIG. 6 is a diagram showing a configuration example in which a plurality of content distribution servers 2a are connected to the authoring device 5a.
FIG. 7 is a functional block diagram of the content distribution server 2b, the content receiving device 3b, and the authoring device 5b in the content distribution system 1b (asynchronous model).
FIG. 8 is a diagram showing an example of the data structure of content data.
FIG. 9 is a functional block diagram of the content distribution server 102a and the content receiving device 103a in an asynchronous model.
FIG. 10 is a diagram showing an example of the data structure of content data.
FIG. 11 is a flowchart showing the reproduction time management processing (transmitting side) of the content acquisition unit 1021 when a content file is read.
FIG. 12 is a flowchart showing the reproduction time management processing (receiving side) of the content reading unit 1034 using network time stamps.
FIG. 13 is a functional block diagram of the content distribution server 102b and the content receiving device 103b that constitute the content distribution system 101b having a transmission control mechanism.
FIG. 14 is a sequence diagram showing the operation of the transmission control unit 1025 and the transmission control unit 1036 when transmission control information is exchanged.
FIG. 15 is a diagram showing the ordering of the content segments accumulated in the buffer unit 1032.
FIG. 16 is a functional block diagram of the content distribution server 102c and the content receiving device 103c that constitute the content distribution system 101c capable of receiving two tracks simultaneously.
FIG. 17 is a flowchart showing the operation of the content receiving device 103c having a configuration for simultaneous reception of multiple tracks.

Explanation of symbols

1a, 1b ... content distribution system, 2a, 2b ... content distribution server, 3a, 3b ... content receiving device, 4 ... IP network, 5a, 5b ... authoring device, 21 ... content acquisition unit, 22a, 22b ... transmission control unit, 23 ... transmission unit, 24 ... system clock oscillation unit, 25 ... content storage unit, 31 ... reception unit, 32a, 32b ... transmission control unit, 33 ... buffer unit, 34 ... system clock oscillation unit, 35 ... content reading unit, 36 ... reproduction unit, 37 ... input unit, 38 ... microphone input unit, 39 ... output unit, 51a, 51b ... authoring unit, 52 ... pitch conversion unit, 53 ... content transmission unit, 54 ... input unit, 55 ... microphone input unit, 56 ... content primary storage unit, 101a, 101b, 101c ... content distribution system, 102a, 102b, 102c ... content distribution server, 103a, 103b, 103c ... content receiving device, 104 ... network, 1021 ... content acquisition unit (A, B), 1022 ... transmission unit (A, B), 1023 ... system clock oscillation unit, 1024 ... content storage unit, 1025 ... transmission control unit, 1031 ... reception unit (A, B), 1032 ... buffer unit (A, B), 1033 ... system clock oscillation unit, 1034 ... content reading unit (A, B), 1035 ... reproduction unit, 1036 ... transmission control unit, 1037 ... input unit, 1038 ... switching unit

Claims

A content distribution method in a content distribution system comprising a content distribution server that stores content data composed of a plurality of tracks each having a plurality of pieces of content segment data, and a content receiving device that requests reproduction of content from the content distribution server via a network and receives and reproduces the content segment data transmitted from the content distribution server via the network in response to the reproduction request,
The content data stored in the content distribution server is content data having a file name and a plurality of tracks, and the track data of each track includes a track number, a track time stamp frequency, the total number of frames, track key information indicating the degree of pitch transposition, track duet information indicating the presence or absence of a partner voice, a frame number, time information indicating reproduction timing, and content segment data, and the data frames of the respective tracks that have the same frame number have the same time information;
The content receiving device requesting a session description that is prior information for establishing a session between the content receiving device and the content distribution server and receiving the content fragment data;
The content distribution server transmitting the session description including at least the track number, the track key information, and the track duet information for each track;
When the content receiving device receives the session description, displaying the track key information and the track duet information in association with the track number;
The content receiving device transmitting a content reproduction request message together with a file name of the content data, the track number, and a content reproduction range;
When the content distribution server receives the playback request message, the content distribution server acquires and transmits the content segment data of the designated track according to the file name, the track number, and the playback range of the content;
and further comprising, when switching between the tracks,
The content receiving device transmitting a track switching request message together with the track number;
When the content distribution server receives the switching request message, acquiring and transmitting, in accordance with the track number, the content segment data of the designated track whose time information indicates that it should be reproduced immediately after the content segment data that is last in reproduction order among the content segment data already transmitted; and
A content distribution method characterized by comprising:

In accordance with the playback request from the content receiving device, the content segment data requested to be played back is selected from the content data having a plurality of tracks having a plurality of content segment data, and the selected content segment data is transmitted via the network. A content distribution server for transmitting to the content receiving device,
Content storage means for storing content data having a file name and a plurality of tracks, the track data of each track having a track number and a plurality of data frames, each data frame including a track number, a track time stamp frequency, the total number of frames, track key information indicating the degree of pitch transposition, track duet information indicating the presence or absence of a partner voice, a frame number, time information indicating reproduction timing, and content segment data, and the data frames of the respective tracks that have the same frame number having the same time information;
Transmission control means for transmitting, when a session description that is advance information for establishing a session between the content receiving device and the content distribution server and receiving the content segment data is requested, the session description including at least the track number, the track key information, and the track duet information for each track, and for receiving a content reproduction request message together with the file name of the content data, the track number, and a content reproduction range, and a track switching request message together with the track number;
Content acquisition means for acquiring from the content storage means, when the transmission control means receives the reproduction request message, the content segment data of the designated track in accordance with the file name, the track number, and the content reproduction range, and for acquiring from the content storage means, when the transmission control means receives the switching request message, in accordance with the track number, the content segment data of the designated track whose time information indicates that it should be reproduced immediately after the content segment data that is last in reproduction order among the content segment data already transmitted;
A content distribution server comprising:

A content distribution server that stores content data including a plurality of tracks having a plurality of content segment data is requested to reproduce the content via the network, and the network is connected from the content distribution server according to the reproduction request. A content receiving device for receiving and playing back content fragment data transmitted via
The content data stored in the content distribution server is content data having a file name and a plurality of tracks, and the track data of each track has a track number and a plurality of data frames, and each data frame has a track number, a track time stamp frequency, the total number of frames, track key information indicating the degree of pitch transposition, track duet information indicating the presence or absence of a partner voice, a frame number, time information indicating reproduction timing, and content segment data, and the data frames of the respective tracks that have the same frame number have the same time information,
For the content distribution server,
Requesting a session description which is prior information for establishing a session between the content receiving device and the content distribution server and receiving the content fragment data;
When the session description is received, the track key information and the track duet information are displayed in association with the track number,
When playing back content, a content playback request message is transmitted together with the file name of the content data, the track number, and the playback range of the content. When switching tracks, a track switching request message is sent together with the track number. A content receiving apparatus comprising transmission control means for transmitting a message.