JP2009204841A

JP2009204841A - Voice processor, voice processing method, and program

Info

Publication number: JP2009204841A
Application number: JP2008046423A
Authority: JP
Inventors: Norikazu Miura (三浦憲和); Tsutomu Watanabe (渡邉勉)
Original assignee: Konami Digital Entertainment Co Ltd
Current assignee: Konami Digital Entertainment Co Ltd
Priority date: 2008-02-27
Filing date: 2008-02-27
Publication date: 2009-09-10
Anticipated expiration: 2028-02-27
Also published as: JP4714230B2

Abstract

PROBLEM TO BE SOLVED: To provide a voice processor that reproduces waveform data representing a human singing voice in synchronization with command data representing its accompaniment.
SOLUTION: A storage part 302 of the voice processor 301 stores a command sequence for reproducing sound and a plurality of waveform data to be reproduced in synchronization with it. A command reproduction part 303 starts reproducing the command sequence, and a lapse time measuring part 304 measures the time elapsed since reproduction started, with the precision of a prescribed command time length. The waveform data due to be reproduced shortly is divided from its head into segments of a prescribed waveform time length. For this data, a fragment waveform selecting part 305 finds the break point whose elapsed time has the smallest error when expressed in units of the command time length. When the elapsed time, measured with the precision of the command time length, reaches the elapsed time of that break point, a waveform reproduction part 306 starts reproducing the waveform data from the break point onward.
COPYRIGHT: (C)2009,JPO&INPIT

Description

The present invention relates to a sound processing apparatus, a sound processing method, and a program for implementing them on a computer. These are suitable for reproducing waveform data that represents a human singing voice or the like in synchronization with command data that represents its accompaniment.

Conventionally, sound has been output by combining two kinds of data. The first is waveform data represented in formats such as PCM (Pulse Code Modulation), ADPCM (Adaptive Differential PCM), MP3 (MPEG Audio Layer-3), and Vorbis. The second is commands represented as MIDI (Musical Instrument Digital Interface) or MML (Music Macro Language) data, or as values written to the registers of a PSG (Programmable Sound Generator).

For example, the following document discloses a technique for outputting flexible, highly entertaining ringtones while reducing the burden on the user and the portable terminal when sound source data is updated. The portable terminal is provided with a PCM data recording unit that records PCM data and a PSG data recording unit that records PSG data. A ringtone data generation unit reads PCM data and PSG data as sound source data from these recording units as appropriate and combines them to generate ringtone data. It then causes a ringtone output speaker device to output a ringtone based on that data.
JP 2001-245020 A

Such sound synthesis technology can also be applied in various games, where a character's utterances and BGM (background music) are synthesized and output together.

A common approach is to have a voice actor perform the character's voice, so that a real human voice avoids unnaturalness. BGM and the like, by contrast, is played from MIDI or similar performance data to keep the amount of data small.

Therefore, in a scene where a character sings along with an accompaniment, the reproduction of waveform data expressed in PCM or the like must be synchronized with the reproduction of command data expressed in MIDI or the like.

In general, waveform data is reproduced at a fixed interval in one of two ways. Either one interval's worth of waveform data is written to a predetermined register or buffer, or a function of a predetermined playback library is called with one interval's worth of waveform data as its argument. For example, suppose 16-bit (2-byte) monaural waveform data sampled at 44100 Hz is written every 1/60 second, which is the vertical synchronization period. In that case, 2 × 44100 × (1/60) = 1470 bytes are passed at each vertical synchronization interrupt. The waveform data is then managed in units of fragment waveforms, each obtained by dividing the data every 1470 bytes.
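The fragment-size arithmetic above can be reproduced with a minimal Python sketch. It assumes raw 16-bit mono PCM held in a `bytes` object and pads the final fragment with zero bytes (silence). The constant and function names are illustrative and do not come from any particular playback library.

```python
SAMPLE_RATE = 44100      # samples per second (from the example in the text)
BYTES_PER_SAMPLE = 2     # 16-bit monaural
VSYNC_HZ = 60            # vertical synchronization interrupts per second


def fragment_bytes(sample_rate: int = SAMPLE_RATE,
                   bytes_per_sample: int = BYTES_PER_SAMPLE,
                   vsync_hz: int = VSYNC_HZ) -> int:
    """Bytes handed to the playback buffer on each vertical sync interrupt."""
    size, remainder = divmod(bytes_per_sample * sample_rate, vsync_hz)
    if remainder:
        raise ValueError("byte rate is not a multiple of the interrupt rate")
    return size


def split_into_fragments(pcm: bytes, size: int) -> list[bytes]:
    """Cut PCM data into fixed-size fragments, padding the last one with zeros (silence)."""
    fragments = [pcm[i:i + size] for i in range(0, len(pcm), size)]
    if fragments and len(fragments[-1]) < size:
        fragments[-1] += bytes(size - len(fragments[-1]))
    return fragments


print(fragment_bytes())  # 1470
```

Every fragment covers exactly 1/60 s, so fragment k starts k/60 s into the data. This is why elapsed time and data size are proportional for uncompressed data.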

Therefore, when no compression is used, elapsed time is proportional to data size. The same argument extends to compressed formats such as MP3, because such data must ultimately be decoded to PCM.

Command data, on the other hand, whether it is PSG data, MIDI data, or one of the various MML dialects, can be regarded as a sequence of commands of three kinds. Some set the tempo of the music. Some assign an instrument timbre to each of a plurality of channels. The rest specify on which channel a sound is produced, at what pitch, for what duration, and at what volume.
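One way to picture such a command sequence is the Python sketch below. It is a simplified single-voice model in the spirit of MML, not the MIDI file format. The class names `SetTempo`, `SetInstrument`, and `Note` are hypothetical. The sketch assumes that each `Note` advances the playback position by its duration in ticks and that the other commands take no time.

```python
from dataclasses import dataclass
from typing import Union


@dataclass
class SetTempo:
    bpm: float          # beats (quarter notes) per minute


@dataclass
class SetInstrument:
    channel: int
    instrument: int     # timbre assigned to the channel


@dataclass
class Note:
    channel: int
    pitch: int          # e.g. a MIDI note number
    ticks: int          # duration in ticks
    volume: int


Command = Union[SetTempo, SetInstrument, Note]


def total_ticks(commands: list[Command]) -> int:
    """Playback length in ticks, assuming each Note advances time by its duration."""
    return sum(c.ticks for c in commands if isinstance(c, Note))


short = [SetTempo(120), SetInstrument(0, 1), Note(0, 60, 48, 100)]
long_ = [SetTempo(120), SetInstrument(0, 1), Note(0, 60, 960, 100)]
print(len(short) == len(long_), total_ticks(short), total_ticks(long_))  # True 48 960
```

Both sequences contain three commands, yet one lasts 48 ticks and the other 960. Playback time therefore depends on the note durations rather than on the number of commands.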

Consequently, the time needed for playback is not necessarily proportional to the length of the command sequence. To perform processing while a command sequence is being interpreted, a basic resolution for notes is therefore defined. Elapsed time is then measured by generating an interrupt, or incrementing a counter, at intervals of that resolution.

In many cases, the basic resolution is the length of a beat (the duration of a quarter note) divided by a predetermined constant. In that case, when the tempo specified by a command changes, the basic resolution changes as well.
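The relationship between tempo and basic resolution can be written as a one-line function. The number of ticks per beat (48 here) is an assumed constant, and real sequencers use various values.

```python
def tick_seconds(bpm: float, ticks_per_beat: int) -> float:
    """Length of one tick: one beat (a quarter note) divided by a fixed constant."""
    return (60.0 / bpm) / ticks_per_beat


# With 48 ticks per beat (an assumed constant):
print(tick_seconds(120, 48))   # about 0.0104 s
print(tick_seconds(150, 48))   # about 0.0083 s: a faster tempo gives shorter ticks
```

At 120 bpm a tick lasts about 10.4 ms. A 1470-byte fragment at 44100 Hz lasts 1/60 s (about 16.7 ms), so the two time grids generally do not line up.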

Now consider a case in which a character sings along with an accompaniment. The accompaniment should be output without interruption. Singing, however, generally consists of two kinds of section. In "phrase" sections, sound is produced by vocalizing. "Breath" sections should be effectively silent while the singer breathes. In addition, the same phrase may be used several times within one piece of music.

A technique is therefore needed that can remove silent sections, such as breaths, from the data to be reproduced in order to reduce the amount of waveform data.
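A minimal sketch of removing breath sections follows. It assumes that a fragment is silent only when all of its bytes are zero; a real implementation would more likely use an amplitude threshold. The optional `lead_in` parameter is a hypothetical addition that keeps a few silent fragments in front of each phrase. Each phrase is returned with its start time, so it can still be placed on the original timeline.

```python
def is_silent(fragment: bytes) -> bool:
    """A fragment counts as silent only if every byte is zero."""
    return not any(fragment)


def split_phrases(fragments: list[bytes], fragment_seconds: float,
                  lead_in: int = 0) -> list[tuple[float, list[bytes]]]:
    """Drop silent gaps and return (start time, fragments) for each phrase.

    Up to `lead_in` silent fragments are kept in front of each phrase so that
    the stored phrase itself begins with silence.
    """
    phrases = []
    i, n, prev_end = 0, len(fragments), 0
    while i < n:
        if is_silent(fragments[i]):
            i += 1
            continue
        j = i
        while j < n and not is_silent(fragments[j]):
            j += 1
        head = max(prev_end, i - lead_in)   # never reach back into the previous phrase
        phrases.append((head * fragment_seconds, fragments[head:j]))
        prev_end = i = j
    return phrases
```

With fragments `[S, S, V, V, S, S, S, V]` (S silent, V voiced), a fragment length of 0.5 s, and `lead_in=1`, the sketch keeps two phrases. They start at 0.5 s and 3.0 s and hold three and two fragments. The three silent fragments between them are no longer stored.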

On the other hand, once silent sections are removed, a technique is needed to keep the reproduction of the waveform data properly synchronized with the reproduction of the command data.

Even when silent sections are not removed, it is often desirable to manage the waveform data in units such as measures. Parts that are sung the same way several times within a piece can then be shared, which reduces the amount of data.

Furthermore, it must be allowed that the precision with which elapsed time is measured for the command data may not match the time length of the fragment waveforms in the waveform data.

The present invention solves the problems described above. Its object is to provide a sound processing apparatus, a sound processing method, and a program for implementing them on a computer that are suitable for reproducing waveform data representing a human singing voice or the like in synchronization with command data representing its accompaniment.

To achieve the above object, the following inventions are disclosed in accordance with the principles of the present invention.

A sound processing apparatus according to a first aspect of the present invention comprises a storage unit, a command playback unit, an elapsed time measurement unit, a fragment waveform selection unit, and a waveform playback unit, and is configured as follows.

The storage unit stores one command sequence containing commands that specify the pitches and durations to be reproduced. It also stores a fragment waveform sequence made up of a plurality of fragment waveforms to be reproduced in succession, each of which specifies a sound waveform of a predetermined waveform time length.

The fragment waveform sequence is associated with an elapsed time from a predetermined reference time. Each fragment waveform in the sequence is associated with its own elapsed time from the reference time. That time is obtained by multiplying the number of fragment waveforms that precede it in the sequence by the waveform time length and adding the elapsed time associated with the sequence.

Typically, the command sequence plays the accompaniment or the like, and the fragment waveform sequence represents a human singing voice or the like. Each fragment waveform sequence corresponds to one phrase of the singing voice, which may begin with a silent section. Taking the moment playback of the music starts as the reference time, the elapsed time associated with the sequence expresses how long after that moment the phrase should be reproduced.

A fragment waveform sequence is an ordered series of fragment waveforms, and reproducing them in order outputs the sound of the phrase. The elapsed time at which each fragment waveform should be reproduced can therefore be calculated from two values: the elapsed time associated with the sequence, and the position of the fragment within the sequence, counting from 0. The time length of each fragment waveform corresponds to the buffer length used when waveform data is reproduced.
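The per-fragment elapsed time described above amounts to T_k = T_s + k × L. Here T_s is the elapsed time associated with the sequence, k is the zero-based position of the fragment, and L is the waveform time length. The sketch uses `fractions.Fraction` so that times such as 1/60 s stay exact. That choice and the function name are assumptions made for illustration.

```python
from fractions import Fraction


def fragment_times(sequence_start: Fraction, count: int,
                   fragment_len: Fraction) -> list[Fraction]:
    """Elapsed time (from the reference time) at which each fragment should play.

    Fragment k, counted from 0, plays at the sequence's own elapsed time plus
    k fragment lengths.
    """
    return [sequence_start + k * fragment_len for k in range(count)]


times = fragment_times(Fraction(3, 2), 3, Fraction(1, 60))
print([str(t) for t in times])   # ['3/2', '91/60', '23/15']
```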

The command playback unit interprets the command sequence from the beginning. It starts reproducing sound waveforms with the pitches and durations specified in the sequence.

As described above, the commands fall into three kinds: those that specify the tempo, those that specify the instrument timbre assigned to each channel, and those that specify, for a given measure, on which channel a sound is produced, at what pitch, for what duration, and at what volume.

The elapsed time measurement unit measures the time elapsed since the command playback unit started reproducing the sound waveform, with a precision of a predetermined command time length.

As described above, the command time length depends on the capabilities of the command playback unit, but it is often the length of a beat divided by a predetermined constant. In that case, when the tempo changes, the command time length changes as well.
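Because the tick length depends on the tempo, converting a tick count into seconds means walking through the tempo changes. The sketch assumes a hypothetical tempo map of (tick, bpm) pairs, sorted by tick and starting at tick 0, together with a fixed number of ticks per beat.

```python
def ticks_to_seconds(ticks: int, tempo_changes: list[tuple[int, float]],
                     ticks_per_beat: int) -> float:
    """Convert a tick count into seconds when the tempo may change.

    tempo_changes: sorted (tick at which the tempo takes effect, bpm) pairs,
    the first of which starts at tick 0.
    """
    seconds = 0.0
    for idx, (start, bpm) in enumerate(tempo_changes):
        if start >= ticks:
            break
        next_start = tempo_changes[idx + 1][0] if idx + 1 < len(tempo_changes) else ticks
        end = min(next_start, ticks)
        seconds += (end - start) * 60.0 / (bpm * ticks_per_beat)
    return seconds


tempo_map = [(0, 120.0), (96, 60.0)]           # slow down after two beats
print(ticks_to_seconds(96, tempo_map, 48))      # 1.0
print(ticks_to_seconds(144, tempo_map, 48))     # 2.0
```

In this example the first 96 ticks (two beats at 120 bpm) take 1 s, and the next 48 ticks (one beat at 60 bpm) take another second.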

The fragment waveform selection unit then selects the minimum-error fragment waveform from among the plurality of fragment waveforms. This is the fragment whose associated elapsed time has the smallest error when expressed with the precision of the command time length.

That is, the fragment waveforms in the sequence are scanned in order from the beginning to find the starting point that minimizes the timing offset between the waveform data and the command data.

The criterion used here is the error e that results when the elapsed time T, at which a given fragment waveform should be reproduced, is expressed in units of the command time length Δt in effect at T. Typically, if an integer n can be found that satisfies

$$\Delta t \times n \le T < \Delta t \times (n+1)$$

the error e can be expressed as

$$e = \min\bigl(T - \Delta t \times n,\; \Delta t \times (n+1) - T\bigr)$$
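The error e and the choice of the minimum-error fragment can be sketched as follows. The example assumes a fixed tick length Δt = 1/96 s (120 bpm, 48 ticks per beat), fragments of 1/60 s, and a sequence that starts at 1/200 s. Exact `Fraction` arithmetic avoids floating-point rounding near tick boundaries. The function names are hypothetical.

```python
import math
from fractions import Fraction


def quantization_error(t: Fraction, dt: Fraction) -> Fraction:
    """e = min(T - dt*n, dt*(n+1) - T), where dt*n <= T < dt*(n+1)."""
    n = math.floor(t / dt)
    return min(t - dt * n, dt * (n + 1) - t)


def select_min_error(times: list[Fraction], dt: Fraction) -> int:
    """Index of the fragment whose elapsed time lies closest to a tick boundary.

    Ties are resolved in favour of the earliest fragment.
    """
    return min(range(len(times)), key=lambda k: quantization_error(times[k], dt))


dt = Fraction(1, 96)                                   # 120 bpm, 48 ticks per beat
times = [Fraction(1, 200) + k * Fraction(1, 60) for k in range(3)]
print(select_min_error(times, dt))                     # 1
```

Fragment 0 lies 0.48 Δt from the nearest tick, fragment 1 only 0.08 Δt, and fragment 2 0.32 Δt, so playback would start from fragment 1. When two fragments tie, `min` returns the earlier one.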

In the present invention, it is possible to choose in advance which of the plurality of fragment waveforms playback should start from, so that synchronization with the command sequence is least likely to drift.

The waveform playback unit waits until the elapsed time, measured with the precision of the command time length, reaches the elapsed time associated with the selected minimum-error fragment waveform. At that point it starts reproducing the fragment waveforms of the sequence from the selected minimum-error fragment waveform onward.

The precision with which elapsed time is checked may be governed by the tempo specified in the command sequence, while waveform playback is governed by an interval such as the vertical synchronization interrupt. Even then, starting playback from the minimum-error fragment waveform as described above keeps the offset as small as possible.

Therefore, according to the present invention, sound based on waveform data can be reproduced, starting from the minimum-error fragment waveform, in synchronization with the sound reproduced from the command sequence.

A sound processing apparatus according to another aspect of the present invention comprises a storage unit, a command playback unit, an elapsed time measurement unit, a fragment waveform selection unit, and a waveform playback unit, and is configured as follows.

The storage unit stores one command sequence containing commands that specify the pitches and durations to be reproduced. It also stores a plurality of fragment waveform sequences, each made up of fragment waveforms that specify waveforms of a predetermined waveform time length to be reproduced.

Each of the plurality of fragment waveform sequences is associated with an elapsed time from a predetermined reference time. Each fragment waveform in a sequence is associated with its own elapsed time from the reference time. That time is obtained by multiplying the number of fragment waveforms that precede it in the sequence by the waveform time length and adding the elapsed time associated with the sequence.

The fragment waveform selection unit acts when the elapsed time associated with any one of the plurality of fragment waveform sequences is at least the measured elapsed time and less than the measured elapsed time plus the waveform time length. In that case it examines a predetermined number of fragment waveforms at the head of that sequence. From these it selects the minimum-error fragment waveform, whose associated elapsed time has the smallest error when expressed with the precision of the command time length.

If the elapsed time associated with a fragment waveform sequence is at least the measured elapsed time and less than the measured elapsed time plus the waveform time length, then reproduction of that sequence must begin shortly.
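The "due soon" test is a half-open interval check. A minimal sketch, with a hypothetical function name and exact `Fraction` times, follows.

```python
from fractions import Fraction


def due_sequences(sequence_starts: list[Fraction], measured: Fraction,
                  fragment_len: Fraction) -> list[int]:
    """Indices of sequences whose start satisfies measured <= start < measured + fragment_len."""
    return [i for i, start in enumerate(sequence_starts)
            if measured <= start < measured + fragment_len]


starts = [Fraction(0), Fraction(1), Fraction(101, 100), Fraction(2)]
print(due_sequences(starts, Fraction(1), Fraction(1, 60)))   # [1, 2]
```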

When this situation arises, the present invention scans the fragment waveforms in the sequence in order from the beginning. The aim is to find the starting point that minimizes the timing offset between the waveform data and the command data.

The criterion used here is the error e that results when the elapsed time T, at which the fragment waveform should be reproduced, is expressed in units of the command time length Δt in effect at T. Typically, if an integer n can be found that satisfies

$$\Delta t \times n \le T < \Delta t \times (n+1)$$

the error e can be expressed as

$$e = \min\bigl(T - \Delta t \times n,\; \Delta t \times (n+1) - T\bigr)$$

The number of fragment waveforms to scan may be a constant, but it can also be variable. That variant is described later.

The waveform playback unit waits until the elapsed time, measured with the precision of the command time length, reaches the elapsed time associated with the selected minimum-error fragment waveform. At that point it starts reproducing the fragment waveforms of the containing sequence from the selected minimum-error fragment waveform onward.

Therefore, according to the present invention, sound from the command sequence and sound from the waveform data can be reproduced in proper synchronization.

In the sound processing apparatus of the present invention, each of the plurality of fragment waveform sequences may begin with one or more silent fragment waveforms, each representing silence for the waveform time length. The fragment waveform selection unit may then be configured to select the minimum-error fragment waveform using the number of leading silent fragment waveforms of the sequence as the predetermined number.

That is, only the leading silent fragment waveforms are scanned, and the one with the smallest error is selected. A silent fragment waveform is simply waveform data whose displacement is 0 (a constant). The fragment waveform format can therefore include a flag indicating whether a fragment is silent, and the displacement data can be omitted for silent fragments. This reduces the amount of data.
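A storage format with a silence flag might look like the sketch below. The `Fragment` class is hypothetical. Silent fragments carry no sample data and are expanded to zero bytes on playback. `leading_silent_count` returns the number of leading silent fragments, which the variant above uses as the number of candidates to scan.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Fragment:
    silent: bool
    samples: Optional[bytes] = None    # omitted for silent fragments

    def pcm(self, size: int) -> bytes:
        """Expand to raw PCM; a silent fragment becomes `size` zero bytes."""
        if self.silent:
            return bytes(size)
        if self.samples is None:
            raise ValueError("non-silent fragment needs sample data")
        return self.samples


def leading_silent_count(sequence: list[Fragment]) -> int:
    """Number of silent fragments at the head of a sequence (the scan window)."""
    count = 0
    for fragment in sequence:
        if not fragment.silent:
            break
        count += 1
    return count


phrase = [Fragment(True), Fragment(True), Fragment(False, b"\x10\x00\x20\x00")]
print(leading_silent_count(phrase))   # 2
```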

According to this aspect of the invention, reproduction of every fragment waveform sequence always begins with a silent portion. This prevents the "pop" noise that tends to occur when playback of waveform data starts, so the waveform data is reproduced naturally.

The sound processing apparatus of the present invention may further comprise an elapsed time input reception unit and be configured as follows.

The elapsed time input reception unit receives an elapsed time input that specifies an elapsed time from the reference time.

This corresponds to the fast-forward, rewind, and time-warp functions found in media players and similar music playback software. The input may come from the user or from a program that controls the sound processing apparatus.

When such an elapsed time input is received, the command playback unit interprets the command sequence from the beginning. It ignores the commands until the time corresponding to the accumulated durations reaches the elapsed time specified by the input. From that point on, it reproduces sound waveforms with the pitches and durations specified in the command sequence.

As described above, the command sequence must be interpreted from the beginning. The time needed to play the music is therefore accumulated without producing sound, skipping ahead until the specified elapsed time is reached. Playback based on interpreting the commands then starts from that point.

The elapsed time measurement unit adds two quantities: the time corresponding to the accumulated durations, and the time elapsed since the command playback unit started reproducing sound waveforms. It treats this sum as the elapsed time measured with the precision of the command time length.

That is, the accumulated time that was skipped is counted as elapsed time.
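The seek behaviour can be sketched as below, with durations measured in ticks. This is an illustration built on assumptions. It uses a single-voice simplification in which each command's duration adds to the playback position. A note that straddles the target is skipped entirely, so the skipped time returned can slightly exceed the requested time. The skipped ticks then seed the elapsed time counter, as described above. The function names are hypothetical.

```python
def seek(note_ticks: list[int], target_ticks: int) -> tuple[int, int]:
    """Skip commands from the head while the accumulated duration is below the target.

    Returns (index of the first command to play, ticks skipped).
    """
    skipped, index = 0, 0
    while index < len(note_ticks) and skipped < target_ticks:
        skipped += note_ticks[index]
        index += 1
    return index, skipped


def measured_elapsed(skipped_ticks: int, ticks_since_start: int) -> int:
    """Elapsed ticks = skipped time + time since audible playback resumed."""
    return skipped_ticks + ticks_since_start


start_index, skipped = seek([48, 48, 96, 48], target_ticks=96)
print(start_index, skipped, measured_elapsed(skipped, 10))   # 2 96 106
```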

According to this aspect of the invention, synchronized reproduction of sound from a command sequence and from waveform data can begin partway through the music, as with fast-forward, rewind, or time warp.

In the sound processing apparatus of the present invention, the elapsed time measurement unit may measure the time elapsed since the command playback unit started reproducing sound waveforms by counting interrupts that occur once per command time length. Each time such an interrupt occurs, the fragment waveform selection unit may check whether a stored fragment waveform sequence has an associated elapsed time that is at least the measured elapsed time and less than the measured elapsed time plus the waveform time length.

This aspect is a preferred embodiment of the invention described above. It assumes that, while the command sequence is being interpreted, interrupts occur with a fixed period synchronized with the playback tempo of the command sequence. The same interrupts are then also used to control when waveform data playback starts.
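The interrupt-driven embodiment can be sketched as a per-tick handler that combines three steps: the "due soon" check, the minimum-error selection, and the delayed start. This is a hypothetical illustration, not the patent's implementation. It assumes a fixed tempo (constant Δt), a fragment length of at least one tick so that no sequence start falls between two checks, and a scan window of at least one fragment. Because the fragment length exceeds the tick length, a sequence may pass the check on two consecutive ticks, so sequences already handled are remembered.

```python
import math
from fractions import Fraction


def nearest_tick(t: Fraction, dt: Fraction) -> int:
    """Tick n whose time n*dt is closest to t (ties go to the earlier tick)."""
    n = math.floor(t / dt)
    return n if t - n * dt <= (n + 1) * dt - t else n + 1


class WaveformScheduler:
    """Hypothetical tick-interrupt handler, assuming a fixed tempo (constant dt).

    sequences: list of (start time, fragment count, scan window) tuples.
    """

    def __init__(self, sequences, fragment_len: Fraction, dt: Fraction):
        self.sequences = sequences
        self.fragment_len = fragment_len
        self.dt = dt
        self.scheduled = set()   # sequences already handled (a start can match two ticks)
        self.pending = []        # (start tick, sequence index, fragment index)
        self.started = []        # (tick, sequence index, fragment index) handed to playback

    def on_tick(self, tick: int) -> None:
        measured = tick * self.dt
        for s, (start, count, window) in enumerate(self.sequences):
            if s in self.scheduled or not (measured <= start < measured + self.fragment_len):
                continue
            times = [start + k * self.fragment_len for k in range(min(window, count))]
            errors = [abs(t - nearest_tick(t, self.dt) * self.dt) for t in times]
            k = errors.index(min(errors))          # earliest fragment wins a tie
            self.pending.append((nearest_tick(times[k], self.dt), s, k))
            self.scheduled.add(s)
        for item in [p for p in self.pending if p[0] <= tick]:
            self.pending.remove(item)
            self.started.append((tick, item[1], item[2]))


sched = WaveformScheduler([(Fraction(1, 200), 5, 3), (Fraction(1, 10), 4, 3)],
                          fragment_len=Fraction(1, 60), dt=Fraction(1, 96))
for tick in range(14):
    sched.on_tick(tick)
print(sched.started)   # [(2, 0, 1), (11, 1, 1)]
```

For the first sequence (start 1/200 s), fragment 1 is chosen and started at tick 2. For the second (start 1/10 s), fragments 1 and 2 tie at 0.2 Δt, and the earlier one is started at tick 11.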

In the sound processing apparatus of the present invention, the waveform playback unit may be configured to reproduce one fragment waveform each time a vertical synchronization interrupt occurs.

This aspect is a preferred embodiment of the invention described above. It applies the invention to devices such as game machines, where waveform data is processed in synchronization with screen display. Making the length of audio data output in one vertical synchronization interrupt period equal to the waveform time length simplifies the management and playback of the waveform data.
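On the playback side, handing exactly one fragment to the audio buffer per vertical sync interrupt can be modelled with a queue. The `VsyncPlayer` class is hypothetical. It starts a sequence from the selected fragment onward and writes silence when nothing is queued, so the buffer is always fed.

```python
from collections import deque


class VsyncPlayer:
    """Hypothetical waveform playback unit: one fragment per vertical sync interrupt."""

    def __init__(self, fragment_size: int):
        self.fragment_size = fragment_size
        self.queue = deque()

    def start(self, sequence: list[bytes], first: int) -> None:
        """Queue a sequence from the selected (minimum-error) fragment onward."""
        self.queue.extend(sequence[first:])

    def on_vsync(self) -> bytes:
        """Return the bytes to write to the audio buffer for this frame."""
        return self.queue.popleft() if self.queue else bytes(self.fragment_size)


player = VsyncPlayer(fragment_size=4)
player.start([bytes(4), b"\x01\x00\x02\x00", b"\x03\x00\x04\x00"], first=1)
frames = [player.on_vsync() for _ in range(3)]
print(frames[2] == bytes(4))   # True: silence once the phrase has finished
```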

In the sound processing apparatus of the present invention, the command sequence may be expressed in MIDI (Musical Instrument Digital Interface) or MML (Music Macro Language). The fragment waveforms may be expressed in PCM (Pulse Code Modulation), ADPCM (Adaptive Differential PCM), MP3 (MPEG Audio Layer-3), or Vorbis.

This aspect is a preferred embodiment of the invention described above. It adopts typical formats for both the command sequence data and the waveform data.

A sound processing method according to another aspect of the present invention is executed by a sound processing apparatus having a storage unit, a command playback unit, an elapsed time measurement unit, a fragment waveform selection unit, and a waveform playback unit. The method comprises a command playback step, an elapsed time measurement step, a fragment waveform selection step, and a waveform playback step, and is configured as follows.

The fragment waveform sequence is associated with an elapsed time from a predetermined reference time. Each fragment waveform in the sequence is associated with its own elapsed time from the reference time. That time is obtained by multiplying the number of fragment waveforms that precede it in the sequence by the waveform time length and adding the elapsed time associated with the sequence.

In the command playback step, the command playback unit interprets the command sequence from the beginning. It starts reproducing sound waveforms with the pitches and durations specified in the sequence.

In the elapsed time measurement step, the elapsed time measurement unit measures the time elapsed since reproduction of the sound waveform began in the command playback step, with a precision of a predetermined command time length.

In the fragment waveform selection step, the fragment waveform selection unit selects the minimum-error fragment waveform from among the plurality of fragment waveforms. This is the fragment whose associated elapsed time has the smallest error when expressed with the precision of the command time length.

In the waveform playback step, the waveform playback unit waits until the elapsed time, measured with the precision of the command time length, reaches the elapsed time associated with the selected minimum-error fragment waveform. At that point it starts reproducing the fragment waveforms of the sequence from the selected minimum-error fragment waveform onward.

In the variant with a plurality of fragment waveform sequences, each sequence is associated with an elapsed time from a predetermined reference time. Each fragment waveform in a sequence is associated with its own elapsed time from the reference time. That time is obtained by multiplying the number of fragment waveforms that precede it in the sequence by the waveform time length and adding the elapsed time associated with the sequence.

In the command playback step, the command playback unit interprets the command sequence from the beginning. It starts reproducing sound waveforms with the pitches and durations specified in the sequence.

In the elapsed time measurement step, the elapsed time measurement unit measures the elapsed time since reproduction of the sound waveform started in the command reproduction step, with the precision of a predetermined command time length.

Further, in the fragment waveform selection step, if the elapsed time associated with any of the fragment waveform sequences is at least the measured elapsed time and less than the measured elapsed time plus the waveform time length, the fragment waveform selection unit examines each of the first predetermined number of fragment waveforms in that sequence and selects the minimum-error fragment waveform, that is, the one whose associated elapsed time incurs the smallest error when expressed with the precision of the command time length.

In the waveform reproduction step, when the elapsed time measured with the precision of the command time length reaches the elapsed time associated with the selected minimum-error fragment waveform, the waveform reproduction unit starts reproducing, within the fragment waveform sequence that contains it, the fragment waveforms from the selected minimum-error fragment waveform onward.

A program according to another aspect of the present invention causes a computer to function as each unit of the sound processing apparatus described above.

The program of the present invention can be recorded on a computer-readable information storage medium such as a compact disc, flexible disk, hard disk, magneto-optical disk, digital video disc, magnetic tape, or semiconductor memory.

The program can be distributed and sold over a computer communication network independently of the computer on which it is executed, and the information storage medium can be distributed and sold independently of the computer.

According to the present invention, it is possible to provide a sound processing apparatus, a sound processing method, and a program for realizing them on a computer, which are suitable for reproducing waveform data representing a human singing voice or the like in synchronization with command data representing its accompaniment or the like.

Embodiments of the present invention are described below. For ease of understanding, the embodiment described realizes the invention using an information processing device for games; it is illustrative and does not limit the scope of the invention. Those skilled in the art may adopt embodiments in which some or all of these elements are replaced with equivalents, and such embodiments are also included in the scope of the present invention.

FIG. 1 is a schematic diagram showing the outline configuration of a typical information processing apparatus that performs the functions of the sound processing apparatus of the present invention by executing a program. The description below refers to this figure.

The information processing apparatus 100 includes a CPU (Central Processing Unit) 101, a ROM 102, a RAM (Random Access Memory) 103, an interface 104, a controller 105, an external memory 106, an image processing unit 107, a DVD-ROM (Digital Versatile Disc ROM) drive 108, a NIC (Network Interface Card) 109, an audio processing unit 110, and a microphone 111.

Loading a DVD-ROM that stores a game program and data into the DVD-ROM drive 108 and turning on the information processing apparatus 100 executes the program, thereby realizing the sound processing apparatus of the present embodiment.

The CPU 101 controls the overall operation of the information processing apparatus 100 and is connected to each component to exchange control signals and data. Using an ALU (Arithmetic Logic Unit) (not shown), the CPU 101 can perform arithmetic operations such as addition, subtraction, multiplication, and division; logical operations such as logical OR, AND, and NOT; and bit operations such as bitwise OR, bitwise AND, bit inversion, bit shift, and bit rotation on fast-access storage areas called registers (not shown). In addition, to support multimedia processing, some CPUs 101 are themselves designed to perform saturating arithmetic, trigonometric functions, vector operations, and the like at high speed, while others achieve this with a coprocessor.

The ROM 102 stores an IPL (Initial Program Loader) that runs immediately after power-on; executing it reads the program recorded on the DVD-ROM into the RAM 103, and the CPU 101 begins executing it. The ROM 102 also stores the operating system program and various data needed to control the overall operation of the information processing apparatus 100.

The RAM 103 temporarily stores data and programs: it holds programs and data read from the DVD-ROM, as well as other data needed for game progress and chat communication. The CPU 101 also allocates variable areas in the RAM 103 and either applies the ALU directly to values stored in those variables, or first loads a value from the RAM 103 into a register, operates on the register, and writes the result back to memory.

The controller 105, connected via the interface 104, accepts the operation inputs the user makes while playing the game.

The external memory 106, detachably connected via the interface 104, rewritably stores data indicating play status (past results, etc.), data indicating the progress of the game, log data of chat communication in network matches, and the like. The user can record such data in the external memory 106 as appropriate by entering instructions via the controller 105.

The DVD-ROM mounted in the DVD-ROM drive 108 stores the program that implements the game, together with image data and sound data associated with the game. Under the control of the CPU 101, the DVD-ROM drive 108 reads the necessary programs and data from the mounted DVD-ROM, and they are temporarily stored in the RAM 103 or the like.

The image processing unit 107 processes data read from the DVD-ROM using the CPU 101 or an image arithmetic processor (not shown) included in the image processing unit 107, and then writes the result to a frame memory (not shown) included in the image processing unit 107. The image information recorded in the frame memory is converted into a video signal at a predetermined synchronization timing and output to a monitor (not shown) connected to the image processing unit 107. This enables various kinds of image display.

The image arithmetic processor can execute, at high speed, overlay operations on two-dimensional images, transparency operations such as α blending, and various saturating operations.

It can also render, by the Z-buffer method, polygon information placed in a virtual three-dimensional space with various texture information attached, and thereby rapidly compute a rendered image of those polygons as viewed from a predetermined viewpoint in a predetermined line-of-sight direction.

Furthermore, through cooperation between the CPU 101 and the image arithmetic processor, a character string can be drawn as a two-dimensional image into the frame memory, or onto the surface of each polygon, according to font information that defines the shapes of the characters.

The NIC 109 connects the information processing apparatus 100 to a computer communication network (not shown) such as the Internet. It consists of a device conforming to the 10BASE-T/100BASE-T standard used to build a LAN (Local Area Network); an analog modem, ISDN (Integrated Services Digital Network) modem, or ADSL (Asymmetric Digital Subscriber Line) modem for connecting to the Internet over a telephone line; or a cable modem for connecting over a cable television line; together with an interface (not shown) that mediates between such a device and the CPU 101.

The audio processing unit 110 converts sound data read from the DVD-ROM into an analog audio signal and outputs it from a speaker (not shown) connected to it. Under the control of the CPU 101, it also generates the sound effects and music data to be produced as the game progresses, and outputs the corresponding sound from the speaker, headphones (not shown), or earphones (not shown).

When the sound data recorded on the DVD-ROM is MIDI data, the audio processing unit 110 converts it into PCM data by referring to the sound source data it holds. When the data is compressed audio in ADPCM or Ogg Vorbis format, the unit decompresses it into PCM data. The PCM data is then output as sound by performing D/A (Digital/Analog) conversion at a timing corresponding to its sampling frequency and sending the result to the speaker.

In addition, a microphone 111 can be connected to the information processing apparatus 100 via the interface 104. In this case, the analog signal from the microphone 111 is A/D-converted at an appropriate sampling frequency into a PCM digital signal, so that the audio processing unit 110 can process it, for example by mixing.

Alternatively, the information processing apparatus 100 may be configured to use a large-capacity external storage device such as a hard disk to serve the same functions as the ROM 102, the RAM 103, the external memory 106, the DVD-ROM mounted in the DVD-ROM drive 108, and so on.

The information processing apparatus 100 described above corresponds to a so-called consumer video game console, but the present invention can be realized on any device that performs image processing to display a virtual space. It can therefore be realized on various computers, such as mobile phones, portable game devices, karaoke machines, and general business computers.

For example, a general computer, like the information processing apparatus 100, includes a CPU, RAM, ROM, a DVD-ROM drive, and a NIC, along with an image processing unit with simpler functions than that of the information processing apparatus 100. Besides a hard disk as external storage, it can use flexible disks, magneto-optical disks, magnetic tape, and the like, and it uses a keyboard, mouse, or similar input device instead of the controller 105.

(Sound processing apparatus)
FIG. 2 is an explanatory diagram showing the outline configuration of an embodiment of a sound processing apparatus realized by having the information processing apparatus 100 execute a program. The description below refers to this figure.

The sound processing apparatus 301 according to the present embodiment includes a storage unit 302, a command reproduction unit 303, an elapsed time measurement unit 304, a fragment waveform selection unit 305, a waveform reproduction unit 306, and an elapsed time input reception unit 307.

The storage unit 302 stores a command sequence for the accompaniment used in scenes such as one where a character sings a song, and waveform data for the voice performed by a voice actor.

The command reproduction unit 303 reproduces sound from this command sequence, and the elapsed time measurement unit 304 measures the elapsed time since reproduction of the sound based on the command sequence started.

Based on the measured elapsed time, the fragment waveform selection unit 305 decides at what time, and from which part of the waveform data, reproduction should start; the waveform reproduction unit 306 then starts reproducing the decided part at the decided time.

The elapsed time input reception unit 307 accepts the start position when the sound is to be reproduced from partway through rather than from the beginning; this unit may be omitted.

With these units, when realizing a scene in which a character sings a song, for example, the reproduction of the singing voice performed by a voice actor can be synchronized with the reproduction of the accompaniment played by electronic instruments. This is described in detail below.

That is, the storage unit 302 stores the following information.
(1) At least one command sequence containing commands that specify the pitch and duration of the sounds to be reproduced. The command sequence is typically expressed in MIDI (Musical Instrument Digital Interface) or MML (Music Macro Language) and represents, for example, the accompaniment reproduced in a scene where a character sings a song.
(2) A plurality of fragment waveform sequences, each made up of fragment waveforms that specify a waveform of a predetermined waveform time length to be reproduced. The fragment waveforms are typically expressed in PCM (Pulse Code Modulation), ADPCM (Adaptive Differential PCM), MP3 (MPEG Audio Layer-3), or Vorbis. They represent, for example, the character's voice (performed by a voice actor) reproduced in a scene where the character sings a song.

Each fragment waveform sequence corresponds to one phrase; in the above example, the voice actor's voice is divided into sections bounded by silences such as breaths.

A fragment waveform sequence is a series of fragment waveforms whose length matches the buffer length used as the processing unit of the waveform reproduction unit 306. Typically, a predetermined number of fragment waveforms at the start of each sequence are fragments of waveform data representing silence.

For example, suppose that N fragment waveform sequences A[1], A[2], …, A[N] for a piece of music are stored in the storage unit 302.

For each i = 1, 2, …, N, the following information is associated with the fragment waveform sequence A[i]:
(a) The position of A[i] within the music, that is, the elapsed time T[i], measured from the start of music (accompaniment) reproduction as the reference time, at which reproduction of A[i] should begin.
(b) The length L[i] of A[i], that is, the number of fragment waveforms that make up A[i].

Writing A[i][j] for the j-th fragment waveform (j = 1, 2, …, L[i]) of the sequence A[i] of length L[i], the time D needed to reproduce a fragment waveform A[i][j] is the same for all i and j.

This time length D is determined by constraints of the hardware and software that make up the waveform reproduction unit 306. For example, in a configuration that reproduces waveforms using the vertical synchronization occurring every 1/60 second, D is 1/60 second.

Starting reproduction of the fragment waveform sequence A[i] at elapsed time t = T[i] means the following:
(1) At t = T[i], reproduction of fragment waveform A[i][1] starts;
(2) At t = T[i] + D, reproduction of A[i][2] starts;
…
(j) At t = T[i] + (j-1)D, reproduction of A[i][j] starts;
…
(L[i]) At t = T[i] + (L[i]-1)D, reproduction of A[i][L[i]] starts;
(L[i]+1) At t = T[i] + L[i]D, reproduction of the whole sequence A[i] is complete.

That is, fragment waveform A[i][j] is associated with the elapsed time T[i] + (j-1)D.
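The association between fragments and elapsed times can be sketched as follows. This is a minimal illustration, assuming times are held as exact fractions of a second; the function names and example values are hypothetical.

```python
from fractions import Fraction
from typing import List


def fragment_start_times(t_i: Fraction, length: int, d: Fraction) -> List[Fraction]:
    """Elapsed time T[i] + (j - 1) * D associated with each fragment A[i][j], j = 1..L[i]."""
    return [t_i + (j - 1) * d for j in range(1, length + 1)]


def sequence_end_time(t_i: Fraction, length: int, d: Fraction) -> Fraction:
    """Time T[i] + L[i] * D at which the whole sequence A[i] has finished."""
    return t_i + length * d


# Hypothetical phrase: starts 1.5 s into the song, 4 fragments of 1/60 s each.
D = Fraction(1, 60)
print([str(x) for x in fragment_start_times(Fraction(3, 2), 4, D)])
print(sequence_end_time(Fraction(3, 2), 4, D))
```

Exact fractions are used so that times such as 1/60 s can later be compared against a command-time grid without floating-point drift.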

Each of the fragment waveform sequences A[1], A[2], …, A[N] is obtained by dividing one continuous piece of waveform data into phrases, removing breath sections as appropriate. Therefore, for any integers 1 ≤ i < j ≤ N:
T[i] + L[i]D ≤ T[j]
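The ordering constraint above can be checked mechanically when the phrase data is prepared. A hedged sketch with hypothetical names, where 0-based Python lists stand in for T[1], …, T[N] and L[1], …, L[N]:

```python
from fractions import Fraction
from typing import Sequence


def phrases_do_not_overlap(starts: Sequence[Fraction], lengths: Sequence[int], d: Fraction) -> bool:
    """Check T[i] + L[i] * D <= T[j] for all 1 <= i < j <= N.

    Checking consecutive pairs is enough: each pair condition already forces
    T[i] <= T[i + 1], so the bound carries over to every later sequence.
    """
    if len(starts) != len(lengths):
        raise ValueError("starts and lengths must have the same size")
    return all(
        starts[k] + lengths[k] * d <= starts[k + 1]
        for k in range(len(starts) - 1)
    )


D = Fraction(1, 60)
print(phrases_do_not_overlap([Fraction(0), Fraction(1), Fraction(2)], [60, 30, 10], D))
print(phrases_do_not_overlap([Fraction(0), Fraction(1)], [61, 30], D))
```

In the second call the first phrase would still be sounding one fragment past the start of the second, so the check fails.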

If each displacement sample of the waveform data is expressed in B bytes and the sampling rate of the waveform data is F [Hz], the length of fragment waveform A[i][j] is generally B×F×D bytes. This corresponds to the buffer length of the waveform data, and using it as the unit keeps waveform data management consistent.
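As a rough illustration of the B×F×D buffer length, the sketch below computes the byte size of one fragment waveform. It assumes uncompressed mono PCM and uses hypothetical example values (16-bit samples, so B = 2, and D = 1/60 s as in the vertical-sync example); exact fractions avoid rounding errors in F×D.

```python
from fractions import Fraction


def fragment_length_bytes(b: int, f: int, d: Fraction) -> int:
    """Byte length B x F x D of one fragment waveform.

    Assumes uncompressed mono PCM: b bytes per sample, f samples per second,
    d seconds per fragment. F x D must be a whole number of samples.
    """
    samples = f * d
    if samples.denominator != 1:
        raise ValueError("F x D is not a whole number of samples")
    return b * int(samples)


D = Fraction(1, 60)  # one vertical-sync period, as in the example above
print(fragment_length_bytes(2, 48000, D))  # 1600
print(fragment_length_bytes(2, 44100, D))  # 1470
```

A rate such as 22050 Hz would not divide evenly into 1/60 s fragments (367.5 samples), which is one reason F and D have to be chosen together.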

To prevent the popping noise that occurs at the start of waveform playback, it is common to place silent fragment waveforms at the beginning of each fragment waveform sequence A[i]. Therefore, in addition to the above, the following information is associated with A[i]:
(c) The number S[i] of silent fragment waveforms placed at the beginning of A[i].

Then the fragment waveforms A[i][1], A[i][2], …, A[i][S[i]] all represent silence; typically each is a run of F×D integers meaning zero displacement. Since this is merely a constant sequence of identical values, there is no need to store the F×D values in the storage unit 302. Instead, when reproduction of A[i][1], A[i][2], …, A[i][S[i]] starts, a constant sequence of F×D zero-displacement integers can be generated and reproduced as the waveform data.
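The following sketch, assuming signed 16-bit PCM samples and hypothetical helper names, shows how the leading S[i] silent fragments could be synthesized at playback time so that only the voiced fragments need to be stored.

```python
from array import array
from typing import Iterator, List


def silent_fragment(samples_per_fragment: int) -> array:
    """F x D samples of zero displacement, generated on demand rather than stored."""
    # 'h' = signed 16-bit integers (an assumed sample format); 0 = no displacement.
    return array("h", [0]) * samples_per_fragment


def iter_fragments(num_silent: int, voiced: List[array], samples_per_fragment: int) -> Iterator[array]:
    """Yield A[i][1], ..., A[i][L[i]]: S[i] generated silent fragments, then the stored ones."""
    for _ in range(num_silent):
        yield silent_fragment(samples_per_fragment)
    yield from voiced


# Hypothetical sequence: S[i] = 4 silent fragments, then one stored voiced fragment.
SAMPLES = 800  # F x D for F = 48000 Hz, D = 1/60 s
voiced = [array("h", [100]) * SAMPLES]
fragments = list(iter_fragments(4, voiced, SAMPLES))
print(len(fragments), sum(fragments[0]), fragments[4][0])
```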

FIG. 3 is a flowchart showing the control flow of the sound processing executed by the sound processing apparatus 301 according to the present embodiment. The description below refers to this figure. This sound processing is performed under the control of the CPU 101 of the information processing apparatus 100.

When the sound processing starts, the CPU 101 obtains the elapsed time at which reproduction should start, as specified when the processing was invoked (step S351). This start time may be specified based on user input, or as an argument to a function call in a program or the like. The CPU 101 thus functions, in cooperation with the RAM 103 and the like, as the elapsed time input reception unit 307.

Next, the CPU 101 has the command reproduction unit 303 interpret the command sequence stored in the storage unit 302 without producing sound, skipping ahead until the elapsed time for the reproduction start is reached (step S352).

The commands interpreted by the command reproduction unit 303 include the following:
(1) A command specifying the reproduction tempo.
(2) A command specifying the number of beats per measure.
(3) A command specifying the timbre (instrument) assigned to each channel.
(4) Start of a measure.
(5) Within the measure, which channel sounds which pitch for what duration.
(6) End of a measure.

When reproducing MIDI and similar data, the unit that can be skipped is typically one measure. In that case, by scanning commands (1), (2), (4), and (6) above, one can obtain the elapsed time, measured from the start of reproduction, at which each command is reproduced.
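A minimal sketch of how scanning the tempo and beats-per-measure commands yields the elapsed time at the start of each measure. The per-measure list of (tempo, beats) pairs is a hypothetical, pre-parsed representation, not the actual MIDI or MML encoding, and the tempo is assumed constant within a measure.

```python
from fractions import Fraction
from typing import List, Tuple


def measure_start_time(measures: List[Tuple[int, int]], m: int) -> Fraction:
    """Elapsed seconds at the start of measure m (0-based).

    measures[k] = (tempo in beats per minute, beats per measure) in effect for
    measure k, as obtained by scanning the tempo, beats-per-measure, and
    measure start/end commands.
    """
    t = Fraction(0)
    for bpm, beats in measures[:m]:
        t += Fraction(60 * beats, bpm)  # one measure lasts beats * 60 / bpm seconds
    return t


# Two 4-beat measures at 120 BPM, then 3-beat measures at 60 BPM.
song = [(120, 4), (120, 4), (60, 3), (60, 3)]
print([str(measure_start_time(song, m)) for m in range(len(song))])
```

Commands of type (3) and (5) do not affect timing, which is why only (1), (2), (4), and (6) need to be scanned when skipping.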

After skipping the required number of commands in the command sequence, the CPU 101 stores the elapsed time of the skipped portion, obtained by scanning the command sequence, in a counter variable t (step S353), and instructs the command reproduction unit 303 to begin reproducing the remaining commands in the sequence (step S354).

From then on, reproduction of the command sequence runs in parallel with this sound processing. As reproduction proceeds, the value of the counter variable t increases accordingly, so that referring to t tells how much time has elapsed since reproduction of the command sequence started.

This parallel sound processing is handled by the audio processing unit 110 of the information processing apparatus 100. The counter variable t is increased once every interval obtained by dividing one beat at the current reproduction tempo by a predetermined integer constant (hereinafter the "command time length ΔT"). This arrangement is common in sound reproduction mechanisms based on PSG, MIDI, or MML.

The counter variable t increases with the precision of the command time length ΔT, but ΔT itself changes with the tempo of the music.

It is therefore preferable to increase t not by 1 each time one command time length ΔT of reproduction elapses, but by the current value of ΔT. The counter variable t then represents the current elapsed time itself.
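The difference between counting ticks and accumulating ΔT can be seen in a small sketch. It assumes, hypothetically, that ΔT is 60/bpm seconds divided by a ticks-per-beat constant and that the tempo in effect at every tick is known; the names are illustrative.

```python
from fractions import Fraction
from typing import Iterable


def command_time_length(bpm: int, ticks_per_beat: int) -> Fraction:
    """ΔT: one beat (60 / bpm seconds) divided by a fixed integer constant."""
    return Fraction(60, bpm) / ticks_per_beat


def advance_counter(t: Fraction, tempo_at_each_tick: Iterable[int], ticks_per_beat: int) -> Fraction:
    """Add the *current* ΔT on every tick, so t stays an elapsed time, not a tick count."""
    for bpm in tempo_at_each_tick:
        t += command_time_length(bpm, ticks_per_beat)
    return t


# One beat at 120 BPM followed by one beat at 60 BPM, with 48 ticks per beat.
ticks = [120] * 48 + [60] * 48
print(advance_counter(Fraction(0), ticks, 48))  # 3/2
```

A raw count of 96 ticks could not be converted back to 1.5 s without replaying the tempo history, which is why t is advanced by ΔT rather than by 1.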

The process of increasing t and the process of reading its value are performed jointly by the audio processing unit 110, the RAM 103, and the CPU 101, which together function as the elapsed time measurement unit 304.

Once reproduction of the command sequence corresponding to the accompaniment has started, the elapsed time measurement unit 304 obtains the current elapsed time t (step S355), and the elapsed times T[1], T[2], …, T[N] associated with the fragment waveform sequences are searched for one that satisfies the following condition (step S356):
T[i] - K×D ≤ t < T[i]

Here K is a positive constant that sets a suitable threshold; it is chosen so that the time needed for the computations in steps S356 to S359 does not exceed K×D. On many computers, K = 1 is sufficient. Alternatively, the values S[1], S[2], …, S[N] may be set so that this requirement holds.
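Step S356 amounts to a search over T[1], …, T[N]. A minimal sketch with hypothetical names and example times, K defaulting to 1 as suggested above; a linear scan is an assumption, and a sorted search would also work since the T[i] are non-decreasing.

```python
from fractions import Fraction
from typing import Optional, Sequence


def find_upcoming_sequence(t: Fraction, starts: Sequence[Fraction], d: Fraction, k: int = 1) -> Optional[int]:
    """Step S356: return an index i with T[i] - K*D <= t < T[i], else None.

    Indices here are 0-based, unlike the 1-based A[1]..A[N] of the text.
    """
    for i, t_i in enumerate(starts):
        if t_i - k * d <= t < t_i:
            return i
    return None


D = Fraction(1, 60)
starts = [Fraction(1), Fraction(3)]  # hypothetical T[1], T[2] in seconds
print(find_upcoming_sequence(Fraction(119, 120), starts, D))  # 0
print(find_upcoming_sequence(Fraction(2), starts, D))  # None
```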

If some i satisfies this condition (step S356; Yes), reproduction of the fragment waveform sequence A[i] must begin shortly.

The errors e[1], e[2], …, e[S[i]] are therefore calculated for the first S[i] fragment waveforms of A[i] (in this embodiment, the silent fragment waveforms; in general, any predetermined number can be used), each being the error incurred when that fragment's associated elapsed time is expressed with the current command time length ΔT (step S357).

To calculate the error, consider the floor operation floor(·), which rounds its argument down to an integer. floor(x) returns the integer n that satisfies
n ≤ x < n + 1

Then the function e(T, ΔT), which gives the error when a time T is expressed with the precision of the command time length ΔT, can be computed as
e(T, ΔT) = min(T - ΔT×floor(T/ΔT), ΔT×(floor(T/ΔT)+1) - T)

Various other error functions may also be used, such as:
e(T, ΔT) = T - ΔT×floor(T/ΔT);
e(T, ΔT) = ΔT×(floor(T/ΔT)+1) - T

Besides the floor operation floor(·), a ceiling operation ceil(·), a rounding operation round(·), or the like may also be used.

Once the error function has been defined as described above, the errors for the first S[i] fragment waveforms can be calculated as follows:
(1) e[1] = e(T[i], Δt);
(2) e[2] = e(T[i] + D, Δt);
...;
(j) e[j] = e(T[i] + (j - 1)D, Δt);
...;
(S[i]) e[S[i]] = e(T[i] + (S[i] - 1)D, Δt)

Then, among the errors e[1], e[2], ..., e[S[i]], the one with the smallest value is selected (step S358).

FIG. 4 is an explanatory diagram showing the concept of the fragment waveform error for a given fragment waveform sequence A[i]. The following description refers to this figure.

In the example shown in this figure, the horizontal axis is the time axis 401, which is divided at every command time length Δt.

In this example, S[i] = 4 for the fragment waveform sequence A[i] (original fragment waveform sequence 402). The sequence begins with the silent fragment waveforms A[i][1], A[i][2], ..., A[i][4], drawn as white rectangles in the figure. These are followed by the sounded fragment waveforms A[i][5], A[i][6], ..., drawn as hatched rectangles.

In the example shown in this figure, e[3] is the smallest of the errors e[1], e[2], ..., e[4]. Each of these errors is obtained by measuring the start time of a silent fragment waveform in units of Δt.

Therefore, playback of part of the fragment waveform sequence, A[i][3], A[i][4], A[i][5], ..., A[i][L[i]] (reproduced fragment waveform sequence 404), starts at the boundary 403 nearest to A[i][3].

A time lag of e[3] arises between the original fragment waveform sequence 402 and the reproduced fragment waveform sequence 404. Some such lag is considered unavoidable in synchronized playback, but this method keeps it to a minimum.

The reproduced fragment waveform sequence 404 is the original fragment waveform sequence 402 with some of its leading silent fragments removed. Playback therefore starts partway through the sequence, yet the listener does not perceive anything unnatural.

Alternatively, the fragment waveform sequence need not begin with silent fragment waveforms. A predetermined number S[i] of leading sounded fragment waveforms may be used instead. In that case the sound represented by the waveform data starts abruptly partway through, but depending on the application such a simple approach may be adopted.

If several errors have the same magnitude, the one closest to the head is selected. In general, suppose that the k-th error e[k] among the leading fragment waveforms is the smallest.

The fragment waveform sequence A[i] is then reproduced by playing the fragment waveforms A[i][k], A[i][k+1], ..., A[i][L[i]] in order, starting at time T[i] + (k - 1)D. This synchronizes them with the command playback at minimum error.

Returning to FIG. 3, the elapsed time t measured by the elapsed time measuring unit 304 is acquired (step S359). It is then determined whether t satisfies
T[i] + (k - 1)D ≤ t,
that is, whether the elapsed time t, measured with the precision of the command time length Δt, has reached the elapsed time T[i] + (k - 1)D at which playback of the selected fragment waveform should start (step S360). Until it has (step S360; No), the process returns to step S359 and repeats.

When the time T[i] + (k - 1)D at which playback of the fragment waveform A[i][k] should start is reached (step S360; Yes), the CPU 101 instructs the audio processing unit 110 to reproduce the fragment waveforms A[i][k], A[i][k+1], ..., A[i][L[i]] in order (step S361). The process then returns to step S355.

Step S360 compares using the inequality (≤) rather than equality (=) in order to handle a highly exceptional situation. In that situation the tempo, and therefore the command time length, changes while steps S359 to S360 are being repeated.

Therefore, in ordinary circumstances, it can be assumed that when the condition in step S360 is satisfied,
T[i] + (k - 1)D = t.

If there is no fragment waveform sequence whose playback should start (step S356; No), the process simply returns to step S355.

In the above processing, the repeated portions that acquire the elapsed time t and then perform some processing may instead be implemented by an interrupt handler of the audio processing unit 110, which implements the elapsed time measuring unit 304.

As described above, this embodiment synchronizes the playback of waveform data with playback by the command sequence. It does so while using the time control unit of the command-sequence audio processing.

In this embodiment, the elapsed time input receiving unit 307 and steps S351 to S353 are provided so that playback can start partway through. These elements may be omitted. In that case, the counter variable t is initialized to 0 and step S354 and the subsequent steps are executed.

This embodiment can be applied to a scene in which a character sings. The character's singing voice is shared, and multiple MIDI or MML sequences that play accompaniments are prepared. Selecting any one of them allows the same singing voice to be reproduced with a variety of accompaniments.

As described above, the present invention provides an audio processing apparatus, an audio processing method, and a program for realizing them on a computer. They are suitable for reproducing, in synchronization, waveform data representing a human singing voice or the like and command data representing its accompaniment or the like.

FIG. 1 is a schematic diagram showing the general configuration of a typical information processing apparatus that performs the functions of the audio processing apparatus of the present invention by executing a program.
FIG. 2 is an explanatory diagram showing the general configuration of an embodiment of the audio processing apparatus realized by causing the information processing apparatus to execute the program.
FIG. 3 is a flowchart showing the flow of control of the audio processing executed by the audio processing apparatus according to this embodiment.
FIG. 4 is an explanatory diagram showing the concept of the fragment waveform error for a given fragment waveform sequence A[i].

Explanation of symbols

100 Information processing apparatus
101 CPU
102 ROM
103 RAM
104 Interface
105 Controller
106 External memory
107 Image processing unit
108 DVD-ROM drive
109 NIC
110 Audio processing unit
111 Microphone
301 Audio processing apparatus
302 Storage unit
303 Command playback unit
304 Elapsed time measuring unit
305 Fragment waveform selection unit
306 Waveform reproduction unit
307 Elapsed time input receiving unit
401 Time axis
402 Original fragment waveform sequence
403 Boundary nearest to the minimum-error fragment waveform (playback start point)
404 Reproduced fragment waveform sequence

Claims

An audio processing apparatus comprising:
a storage unit that stores one command sequence including commands specifying the pitch and tone length to be reproduced, and stores a fragment waveform sequence consisting of a plurality of fragment waveforms, each specifying a sound waveform of a predetermined waveform time length, to be reproduced in succession,
wherein the fragment waveform sequence is associated with an elapsed time from a predetermined reference time, and each fragment waveform included in the fragment waveform sequence is associated, as its elapsed time from the reference time, with the time obtained by multiplying the number of fragment waveforms preceding it in the fragment waveform sequence by the waveform time length and adding the elapsed time associated with the fragment waveform sequence;
a command playback unit that interprets the command sequence from its beginning and starts playback of a sound waveform with the pitch and tone length specified in the command sequence;
an elapsed time measuring unit that measures, with the precision of a predetermined command time length, the elapsed time since the command playback unit started playing the sound waveform;
a fragment waveform selection unit that selects, from among the plurality of fragment waveforms, the minimum-error fragment waveform, for which the error is smallest when the elapsed time associated with the fragment waveform is expressed with the precision of the command time length; and
a waveform reproduction unit that, when the elapsed time measured with the precision of the command time length reaches the elapsed time associated with the selected minimum-error fragment waveform, starts playback of the fragment waveforms from the selected minimum-error fragment waveform onward in the fragment waveform sequence.

An audio processing apparatus comprising:
a storage unit that stores one command sequence including commands specifying the pitch and tone length to be reproduced, and stores a plurality of fragment waveform sequences each consisting of fragment waveforms that specify a sound waveform of a predetermined waveform time length to be reproduced,
wherein, for each of the plurality of fragment waveform sequences, the fragment waveform sequence is associated with an elapsed time from a predetermined reference time, and each fragment waveform included in the fragment waveform sequence is associated, as its elapsed time from the reference time, with the time obtained by multiplying the number of fragment waveforms preceding it in the fragment waveform sequence by the waveform time length and adding the elapsed time associated with the fragment waveform sequence;
a command playback unit that interprets the command sequence from its beginning and starts playback of a sound waveform with the pitch and tone length specified in the command sequence;
an elapsed time measuring unit that measures, with the precision of a predetermined command time length, the elapsed time since the command playback unit started playing the sound waveform;
a fragment waveform selection unit that, when the elapsed time associated with any one of the plurality of fragment waveform sequences is equal to or greater than the measured elapsed time and less than the measured elapsed time plus the waveform time length, selects, from among a predetermined number of leading fragment waveforms of that fragment waveform sequence, the minimum-error fragment waveform, for which the error is smallest when the elapsed time associated with the fragment waveform is expressed with the precision of the command time length; and
a waveform reproduction unit that, when the elapsed time measured with the precision of the command time length reaches the elapsed time associated with the selected minimum-error fragment waveform, starts playback of the fragment waveforms from the selected minimum-error fragment waveform onward in the fragment waveform sequence containing it.

The audio processing apparatus according to claim 2,
wherein, for each of the plurality of fragment waveform sequences, the leading one or more fragment waveforms of the fragment waveform sequence are all silent fragment waveforms representing silence of the waveform time length, and
the fragment waveform selection unit selects the minimum-error fragment waveform using the number of silent fragment waveforms at the head of the fragment waveform sequence as the predetermined number.

The audio processing apparatus according to claim 2 or 3, further comprising
an elapsed time input receiving unit that receives an elapsed time input specifying an elapsed time from the reference time,
wherein, when the elapsed time input is received, the command playback unit interprets the command sequence from its beginning, ignores commands until the time corresponding to the accumulated tone lengths reaches the elapsed time specified by the elapsed time input, and, once that elapsed time is reached, starts playback of the sound waveform with the pitch and tone length specified in the command sequence, and
the elapsed time measuring unit takes as the measured elapsed time the sum of the time corresponding to the accumulated tone lengths and the elapsed time, measured with the precision of the command time length, since the command playback unit started playing the sound waveform.

The audio processing apparatus according to any one of claims 2 to 4,
wherein the elapsed time measuring unit measures the elapsed time since the command playback unit started playing the sound waveform by counting interrupts that occur at intervals of the command time length, and
each time an interrupt occurs, the fragment waveform selection unit determines whether a fragment waveform sequence is stored whose associated elapsed time is equal to or greater than the measured elapsed time and less than the measured elapsed time plus the waveform time length.

The audio processing apparatus according to any one of claims 2 to 5,
wherein the waveform reproduction unit reproduces one fragment waveform each time a vertical synchronization interrupt occurs.

The audio processing apparatus according to any one of claims 1 to 6,
wherein the command sequence is expressed in MIDI (Musical Instrument Digital Interface) or MML (Music Macro Language), and
the fragment waveforms are expressed in PCM (Pulse Code Modulation), ADPCM (Adaptive Differential PCM), MP3 (MPEG Audio Layer-3), or Vorbis.

An audio processing method executed by an audio processing apparatus having a storage unit, a command playback unit, an elapsed time measuring unit, a fragment waveform selection unit, and a waveform reproduction unit, wherein
the storage unit stores one command sequence including commands specifying the pitch and tone length to be reproduced, and stores a fragment waveform sequence consisting of a plurality of fragment waveforms, each specifying a sound waveform of a predetermined waveform time length, to be reproduced in succession, and
the fragment waveform sequence is associated with an elapsed time from a predetermined reference time, and each fragment waveform included in the fragment waveform sequence is associated, as its elapsed time from the reference time, with the time obtained by multiplying the number of fragment waveforms preceding it in the fragment waveform sequence by the waveform time length and adding the elapsed time associated with the fragment waveform sequence,
the method comprising:
a command playback step in which the command playback unit interprets the command sequence from its beginning and starts playback of a sound waveform with the pitch and tone length specified in the command sequence;
an elapsed time measuring step in which the elapsed time measuring unit measures, with the precision of a predetermined command time length, the elapsed time since playback of the sound waveform started in the command playback step;
a fragment waveform selection step in which the fragment waveform selection unit selects, from among the plurality of fragment waveforms, the minimum-error fragment waveform, for which the error is smallest when the elapsed time associated with the fragment waveform is expressed with the precision of the command time length; and
a waveform reproduction step in which, when the elapsed time measured with the precision of the command time length reaches the elapsed time associated with the selected minimum-error fragment waveform, the waveform reproduction unit starts playback of the fragment waveforms from the selected minimum-error fragment waveform onward in the fragment waveform sequence.

An audio processing method executed by an audio processing apparatus having a storage unit, a command playback unit, an elapsed time measuring unit, a fragment waveform selection unit, and a waveform reproduction unit, wherein
the storage unit stores one command sequence including commands specifying the pitch and tone length to be reproduced, and stores a plurality of fragment waveform sequences each consisting of fragment waveforms that specify a sound waveform of a predetermined waveform time length to be reproduced, and
for each of the plurality of fragment waveform sequences, the fragment waveform sequence is associated with an elapsed time from a predetermined reference time, and each fragment waveform included in the fragment waveform sequence is associated, as its elapsed time from the reference time, with the time obtained by multiplying the number of fragment waveforms preceding it in the fragment waveform sequence by the waveform time length and adding the elapsed time associated with the fragment waveform sequence,
the method comprising:
a command playback step in which the command playback unit interprets the command sequence from its beginning and starts playback of a sound waveform with the pitch and tone length specified in the command sequence;
an elapsed time measuring step in which the elapsed time measuring unit measures, with the precision of a predetermined command time length, the elapsed time since playback of the sound waveform started in the command playback step;
a fragment waveform selection step in which, when the elapsed time associated with any one of the plurality of fragment waveform sequences is equal to or greater than the measured elapsed time and less than the measured elapsed time plus the waveform time length, the fragment waveform selection unit selects, from among a predetermined number of leading fragment waveforms of that fragment waveform sequence, the minimum-error fragment waveform, for which the error is smallest when the elapsed time associated with the fragment waveform is expressed with the precision of the command time length; and
a waveform reproduction step in which, when the elapsed time measured with the precision of the command time length reaches the elapsed time associated with the selected minimum-error fragment waveform, the waveform reproduction unit starts playback of the fragment waveforms from the selected minimum-error fragment waveform onward in the fragment waveform sequence containing it.

A program causing a computer to function as:
a storage unit that stores one command sequence including commands specifying the pitch and tone length to be reproduced, and stores a fragment waveform sequence consisting of a plurality of fragment waveforms, each specifying a sound waveform of a predetermined waveform time length, to be reproduced in succession,
wherein the fragment waveform sequence is associated with an elapsed time from a predetermined reference time, and each fragment waveform included in the fragment waveform sequence is associated, as its elapsed time from the reference time, with the time obtained by multiplying the number of fragment waveforms preceding it in the fragment waveform sequence by the waveform time length and adding the elapsed time associated with the fragment waveform sequence;
a command playback unit that interprets the command sequence from its beginning and starts playback of a sound waveform with the pitch and tone length specified in the command sequence;
an elapsed time measuring unit that measures, with the precision of a predetermined command time length, the elapsed time since the command playback unit started playing the sound waveform;
a fragment waveform selection unit that selects, from among the plurality of fragment waveforms, the minimum-error fragment waveform, for which the error is smallest when the elapsed time associated with the fragment waveform is expressed with the precision of the command time length; and
a waveform reproduction unit that, when the elapsed time measured with the precision of the command time length reaches the elapsed time associated with the selected minimum-error fragment waveform, starts playback of the fragment waveforms from the selected minimum-error fragment waveform onward in the fragment waveform sequence.

A program causing a computer to function as:
a storage unit that stores one command sequence including commands specifying the pitch and tone length to be reproduced, and stores a plurality of fragment waveform sequences each consisting of fragment waveforms that specify a sound waveform of a predetermined waveform time length to be reproduced,
wherein, for each of the plurality of fragment waveform sequences, the fragment waveform sequence is associated with an elapsed time from a predetermined reference time, and each fragment waveform included in the fragment waveform sequence is associated, as its elapsed time from the reference time, with the time obtained by multiplying the number of fragment waveforms preceding it in the fragment waveform sequence by the waveform time length and adding the elapsed time associated with the fragment waveform sequence;
a command playback unit that interprets the command sequence from its beginning and starts playback of a sound waveform with the pitch and tone length specified in the command sequence;
an elapsed time measuring unit that measures, with the precision of a predetermined command time length, the elapsed time since the command playback unit started playing the sound waveform;
a fragment waveform selection unit that, when the elapsed time associated with any one of the plurality of fragment waveform sequences is equal to or greater than the measured elapsed time and less than the measured elapsed time plus the waveform time length, selects, from among a predetermined number of leading fragment waveforms of that fragment waveform sequence, the minimum-error fragment waveform, for which the error is smallest when the elapsed time associated with the fragment waveform is expressed with the precision of the command time length; and
a waveform reproduction unit that, when the elapsed time measured with the precision of the command time length reaches the elapsed time associated with the selected minimum-error fragment waveform, starts playback of the fragment waveforms from the selected minimum-error fragment waveform onward in the fragment waveform sequence containing it.