JP6304092B2

JP6304092B2 - Display control apparatus and program

Info

Publication number: JP6304092B2
Application number: JP2015062817A
Authority: JP
Inventors: 伸行浅野
Original assignee: Brother Industries Ltd
Current assignee: Brother Industries Ltd
Priority date: 2015-03-25
Filing date: 2015-03-25
Publication date: 2018-04-04
Anticipated expiration: 2035-03-25
Also published as: JP2016183999A

Description

The present invention relates to a display control device that displays a singing pitch, and to a program.

Conventionally, a display device is known that analyzes the singing voice for a song in which lyrics are assigned to at least some of a plurality of notes, estimates the singing pitch, and displays that singing pitch superimposed on a piano roll of the song (see Patent Document 1).

In the display device described in Patent Document 1, the singing pitch is estimated based on the result of frequency analysis (for example, an FFT (Fast Fourier Transform)) of the input voice.

Japanese Patent Laid-Open No. 2008-020798

As described above, because the display device of Patent Document 1 requires frequency analysis to estimate the singing pitch, it takes time from the input of the voice until the estimation of the singing pitch is complete and, consequently, until the singing pitch is displayed.

As a result, in the conventional display device, a gap arises between the section currently being sung and the section for which the singing pitch is displayed. This gap causes the problem that users of the conventional display device find the display unnatural.

Accordingly, an object of the present invention is to realize a display that feels less unnatural to the user.

One aspect of the present invention, made to achieve the above object, is a display control device including acquisition means, specifying means, calculation means, estimation means, and display control means.
The acquisition means acquires voice data, which is the voice input during the performance of a target song. The target song is a song in which lyrics are assigned to at least some of a plurality of notes, and is the song that has been designated.

Based on the voice data acquired by the acquisition means, the specifying means specifies the singing pitch in that voice data for each unit singing section, which is a prescribed section. The calculation means then calculates, based on model data, the singing pitch difference between the singing pitch specified by the specifying means and the pitch of the singing note corresponding to the unit singing section of that singing pitch. The model data is data representing the pitch of each singing note, that is, each note in the target song to which lyrics are assigned.

Further, the estimation means estimates an estimated singing pitch, which represents the singing pitch for a subsequent unit section, by adding the singing pitch difference calculated by the calculation means to the pitch of the singing note corresponding to the subsequent unit section. The subsequent unit section is a unit singing section that comes after the current unit section, which is the unit singing section at the present time.

The display control means then displays the estimated singing pitch estimated by the estimation means in association with the subsequent unit section.
The display control device estimates, based on past singing pitches, the estimated singing pitch, which is the singing pitch of the subsequent unit section including the unit singing section to be sung next. It then displays that estimated singing pitch in association with the singing note corresponding to the subsequent unit section.

Therefore, the display control device can display the estimated singing pitch for the subsequent unit section before that section is actually sung. The display control device can thus reduce the gap between the unit singing section currently being sung and the unit singing section for which the singing pitch is displayed.

Moreover, the display control device can shorten the time lag before the user adjusts his or her own singing pitch. As a result, the display control device can realize a display that feels less unnatural to the user.

The display control device may further include correction means.
The correction means corrects the estimated singing pitch estimated by the estimation means based on the pitch difference between the singing pitch in at least one previous unit section, which precedes the current unit section along the time axis, and the pitch of the singing note corresponding to that at least one previous unit section.

Such a display control device can correct the estimate based on the pitch difference between the singing pitch for a previous unit section, which precedes the current unit section along the time axis, and the pitch of the singing note corresponding to that previous unit section.

If the pitch difference between the singing pitch in the previous unit sections and the pitch of the singing notes corresponding to those previous unit sections is widening along the time axis, the correction means may correct the estimated singing pitch so that its pitch difference becomes larger than the pitch difference in the current unit section calculated by the calculation means.

For example, suppose that in the target song, among the notes to which lyrics are assigned, a sequence of notes in which the pitch rises sharply at a particular note is repeated. In this case, the user may be able to sing the sharply rising note at the first occurrences along the time axis. At later occurrences, however, the user may find it increasingly difficult to voice that note and may no longer be able to sing it. The difference between the note's pitch and the singing pitch then grows for later notes in the repeated sequence.

In contrast, if the pitch difference from the pitch of the singing notes corresponding to the previous unit sections is widening along the time axis, the display control device corrects the estimated singing pitch so that its pitch difference becomes larger than the pitch difference in the current unit section. The estimated singing pitch can therefore be corrected according to how the user has actually been singing. The display control device can thus bring the estimated singing pitch close to the user's actual singing.

Further, if the pitch difference between the singing pitch in the previous unit sections and the pitch of the singing notes corresponding to those previous unit sections is narrowing along the time axis, the correction means may correct the estimated singing pitch so that its pitch difference becomes smaller than the pitch difference in the current unit section calculated by the calculation means.

Assuming the same sequence of notes in the target song as above, the opposite case is also conceivable: the user cannot sing the sharply rising note at the first occurrences along the time axis, but becomes able to handle it, and to sing it, at later occurrences. The difference between the note's pitch and the singing pitch then shrinks for later notes in the repeated sequence.

In contrast, if the pitch difference from the pitch of the singing notes corresponding to the previous unit sections is narrowing along the time axis, the display control device corrects the estimated singing pitch so that its pitch difference becomes smaller than the pitch difference in the current unit section. The estimated singing pitch can therefore be corrected according to how the user has actually been singing.

The display control device can thus bring the estimated singing pitch close to the user's actual singing.
Another aspect of the present invention may be a program that causes a computer to execute an acquisition procedure, a specifying procedure, a calculation procedure, an estimation procedure, and a display control procedure.

In the acquisition procedure, voice data is acquired. In the specifying procedure, the singing pitch in the acquired voice data is specified for each unit singing section. In the calculation procedure, the singing pitch difference between the specified singing pitch and the pitch of the singing note corresponding to the unit singing section of that singing pitch is calculated based on the model data representing the pitch of each singing note.

In the estimation procedure, the estimated singing pitch is estimated by adding the singing pitch difference calculated in the calculation procedure to the pitch of the singing note corresponding to the subsequent unit section. In the display control procedure, the estimated singing pitch is displayed in association with the subsequent unit section.

When the present invention is implemented as a program in this way, it can be used by loading it onto a computer from a recording medium and starting it as needed, or by having the computer obtain it over a communication line and starting it as needed. By causing the computer to execute each procedure, the computer can be made to function as the display control device.

The recording medium referred to here includes computer-readable electronic media such as a DVD-ROM, a CD-ROM, and a hard disk.

FIG. 1 is a block diagram showing the schematic configuration of a karaoke device to which the present invention is applied.
FIG. 2 is a flowchart showing the procedure of the display process.
FIG. 3 is an explanatory diagram showing an example of how the singing pitch is specified in the display process.
FIG. 4 is an explanatory diagram showing an example of the display mode in the display process.
FIG. 5 is a flowchart showing the procedure of the estimation process.
FIG. 6 is an explanatory diagram showing an example of how the singing pitch is corrected.
FIG. 7 is an explanatory diagram showing an example of how the singing pitch is corrected.

Hereinafter, an embodiment of the present invention will be described with reference to the drawings.
<Karaoke device>
The karaoke device 30 shown in FIG. 1 plays a song designated by the user (hereinafter referred to as the target song) and emits the singing voice together with the performance sound. The karaoke device 30 also displays the singing voice.

The karaoke device 30 includes a communication unit 32, an input reception unit 34, a music playback unit 36, a storage unit 38, an audio control unit 40, a video control unit 46, and a control unit 50.
The communication unit 32 performs communication between the karaoke device 30 and the outside via a communication network. The input reception unit 34 is an input device that accepts the input of information and commands in response to external operations. Examples of such an input device are keys, switches, and the receiver of a remote controller.

The music playback unit 36 performs a song based on the MIDI music MD. The music playback unit 36 is, for example, a MIDI sound source. The audio control unit 40 is a device that controls audio input and output, and includes an output unit 42 and a microphone input unit 44.

A microphone 62 is connected to the microphone input unit 44, which thereby acquires the voice input through the microphone 62. A speaker 60 is connected to the output unit 42. The output unit 42 outputs to the speaker 60 the sound source signal of the song played by the music playback unit 36 and the sound source signal of the singing voice from the microphone input unit 44. The speaker 60 converts the sound source signal input from the output unit 42 into sound and outputs it.

The video control unit 46 outputs video or images based on the video data sent from the control unit 50. A display unit 64 that displays the video or images is connected to the video control unit 46.

The storage unit 38 is a well-known storage device whose contents can be read and written. The storage unit 38 stores at least the MIDI music MD.
The control unit 50 is built around a well-known computer having at least a ROM 52, a RAM 54, and a CPU 56. The ROM 52 stores processing programs and data whose contents must be retained even when the power is turned off. The RAM 54 temporarily stores processing programs and data. The CPU 56 executes each process according to the processing programs stored in the ROM 52 and the RAM 54.

The ROM 52 of the present embodiment stores a processing program with which the control unit 50 executes a playback process that, based on the MIDI music MD, plays the target song, emits the performance sound from the speaker 60, and displays the lyrics on the display unit 64. The ROM 52 also stores a processing program with which the control unit 50 executes a display process that displays the user's singing voice together with the singing melody of the target song.
<MIDI music>
The MIDI music MD is prepared in advance for each song and contains music data, lyrics data, and music information.

Of these, the music data represents the score of one song according to the well-known MIDI (Musical Instrument Digital Interface) standard. The music data has at least score tracks, each representing the score for one instrument used in the song.

For each performance sound output from the MIDI sound source, a score track specifies at least the pitch (the so-called note number) and the period during which the MIDI sound source outputs that performance sound (hereinafter referred to as the note length). The note length in a score track is defined by the note-on timing and the note-off timing of the performance sound.
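As a rough illustration of the note information described above, the following Python sketch models one performance sound with its note number and its note-on and note-off timings, derives the note length, and converts the note number to a frequency. The class name `Note`, its field names, and the use of seconds as the time unit are assumptions made for illustration and do not appear in the source. The conversion relies on the common convention that MIDI note 69 is A4 = 440 Hz in equal temperament.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Note:
    """One performance sound in a score track (hypothetical structure)."""

    note_number: int  # MIDI pitch, 0-127
    note_on: float    # note-on timing in seconds (assumed unit)
    note_off: float   # note-off timing in seconds (assumed unit)

    @property
    def length(self) -> float:
        """Note length: the period between note-on and note-off."""
        return self.note_off - self.note_on

    @property
    def frequency_hz(self) -> float:
        """Equal-tempered frequency, assuming note 69 = A4 = 440 Hz."""
        return 440.0 * 2.0 ** ((self.note_number - 69) / 12.0)
```

For instance, `Note(69, 1.0, 1.5)` would have a note length of 0.5 seconds under these assumptions.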

The score tracks include model data, which represents the score of the instrument responsible for the singing melody (that is, the melody line) and in which the performance sounds (that is, the notes) making up that singing melody are arranged along the time axis. A vibraphone, for example, may serve as the instrument responsible for the singing melody.

The lyrics data, on the other hand, is data relating to the lyrics of the song and includes lyrics text data and lyrics output data. The lyrics text data represents the lyrics of the song. The lyrics output data defines a timing correspondence that associates the lyrics output timing, which is the output timing of the characters making up the lyrics (hereinafter referred to as "lyric constituent characters"), with the performance of the music data. The timing correspondence also specifies that lyric constituent characters are assigned to at least some of the performance sounds (that is, the notes) making up the singing melody of the song.

The music information is information about the song (for example, the song title and the artist name) and includes identification information that identifies the song (that is, a song ID).
<Display process>
Next, the display process executed by the control unit 50 will be described.

Once the performance of the target song has started in the playback process, the display process is started repeatedly at a predetermined time interval.
When the display process is started, the control unit 50 acquires, as voice data, the voice input through the microphone 62 (that is, the microphone input), as shown in FIG. 2 (S110). The control unit 50 then calculates, based on the voice data acquired in S110, the pitch that represents the singing pitch (fundamental frequency) in that voice data (S120).

In S120 of the present embodiment, as shown in FIG. 3, the control unit 50 sets analysis windows, each representing a predetermined unit time, on the voice data so that they are adjacent to one another. An analysis window is shorter than the note value of a note and is an example of the unit singing section recited in the claims. The labels (1) to (9) in FIG. 3 denote the individual analysis windows.
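The placement of adjacent analysis windows can be sketched as follows. This is a minimal Python illustration, assuming the voice data is available as a sequence of samples and that the window length is given in samples. The function name `split_into_windows` and the policy of dropping a trailing partial window are assumptions, not details stated in the source.

```python
def split_into_windows(samples, window_size):
    """Split samples into adjacent, non-overlapping analysis windows.

    A trailing remainder shorter than window_size is dropped
    (an assumed policy; the source does not specify one).
    """
    if window_size <= 0:
        raise ValueError("window_size must be positive")
    full = len(samples) // window_size
    return [samples[i * window_size:(i + 1) * window_size] for i in range(full)]
```

Each returned window would then receive one pitch value in the step that follows.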

In S120, the control unit 50 then performs frequency analysis (for example, a DFT) on the voice data in each analysis window that has been set. The control unit 50 further takes the frequency component that is strongest as a result of autocorrelation as the pitch, thereby calculating one pitch per analysis window. The pitch calculation method is not limited to this, and various well-known methods may be used.
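One of the well-known alternatives mentioned above can be sketched in Python as follows. Note that this sketch swaps in a direct time-domain autocorrelation search rather than the DFT-based analysis named in the text, so it should be read as an illustration of the general idea only. The function name `estimate_pitch_hz` and the search range given by `fmin` and `fmax` are assumptions.

```python
def estimate_pitch_hz(window, sample_rate, fmin=80.0, fmax=1000.0):
    """Estimate one pitch for one analysis window by autocorrelation.

    Returns None when the window is too short or silent.
    fmin and fmax bound the search range (assumed defaults).
    """
    n = len(window)
    min_lag = max(1, int(sample_rate / fmax))
    max_lag = min(n - 1, int(sample_rate / fmin))
    if max_lag <= min_lag:
        return None
    best_lag, best_score = None, 0.0
    for lag in range(min_lag, max_lag + 1):
        # Unnormalised autocorrelation of the window at this lag.
        score = sum(window[i] * window[i + lag] for i in range(n - lag))
        if score > best_score:
            best_lag, best_score = lag, score
    # The lag with the strongest correlation is taken as the period.
    return None if best_lag is None else sample_rate / best_lag
```

The search is quadratic in the window length, which is acceptable for short windows; a production implementation would more likely compute the autocorrelation through an FFT.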

Further, in the display process, the control unit 50 stores the pitch calculated in S120 along the time axis (S130). The pitch may be stored in S130 in either the RAM 54 or the storage unit 38.

Next, in the display process, the control unit 50 determines whether the pitch for the comparison target section has been stored (S140). The comparison target section consists of the analysis windows corresponding to at least part of a note that makes up the singing melody and to which lyrics are assigned. The comparison target section may contain a single analysis window or a plurality of analysis windows.

If the determination in S140 finds that the pitch for the comparison target section has not been stored (S140: NO), the control unit 50 moves the display process to S170, which is described in detail later. If the determination in S140 finds that the pitch for the comparison target section has been stored (S140: YES), the control unit 50 moves the display process to S150.

In S150, the control unit 50 calculates the singing pitch difference based on the model data. The singing pitch difference is the difference between the pitch of the current analysis window (hereinafter referred to as the "current unit section") and the pitch of the note corresponding to that current unit section (hereinafter referred to as the "singing note"). In S150, the control unit 50 calculates the singing pitch difference by subtracting the pitch of the current unit section from the pitch center of the singing note represented by the model data.
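The subtraction in S150 can be sketched in Python as follows. The sketch assumes that both pitches are expressed in fractional MIDI note numbers (semitones), which the source does not state, and it includes a hypothetical helper `hz_to_note_number` for converting a measured frequency to that scale using the A4 = 440 Hz convention.

```python
import math


def hz_to_note_number(frequency_hz):
    """Convert a frequency to a fractional MIDI note number (A4 = 69 = 440 Hz)."""
    if frequency_hz <= 0:
        raise ValueError("frequency must be positive")
    return 69.0 + 12.0 * math.log2(frequency_hz / 440.0)


def singing_pitch_difference(note_pitch_center, window_pitch):
    """S150 as described: note pitch center minus the current window's pitch."""
    return note_pitch_center - window_pitch
```

Working in semitones rather than hertz keeps the difference comparable across high and low notes, which is one reason this unit was assumed here.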

The control unit 50 then stores the singing pitch difference calculated in S150 in association with the pitch of the singing note (S160). In S160, the control unit 50 stores the singing pitch difference along the time axis. The singing pitch difference may be stored in either the RAM 54 or the storage unit 38.

The control unit 50 then moves the display process to S170.
In S170, the control unit 50 determines whether the pitch for the estimation source section has been stored. The "estimation source section" consists of, among the analysis windows corresponding to at least part of a note that makes up the singing melody and to which lyrics are assigned, the analysis windows extending from the current unit section back along the time axis by a set number of windows. The "set number" may be one or more than one. A single phrase such as the "A melody" (verse) or the "B melody" (bridge) is a specific example of an estimation source section.
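A bounded history of recent values is one way to model the estimation source section and the S170 check. The following Python sketch keeps only the most recent `set_number` singing pitch differences and reports whether the section has been filled. The class name `DifferenceHistory` and its methods are hypothetical, and treating the section as a fixed-size sliding window is an assumption.

```python
from collections import deque


class DifferenceHistory:
    """Keeps the most recent pitch differences (hypothetical helper)."""

    def __init__(self, set_number):
        if set_number < 1:
            raise ValueError("set_number must be at least 1")
        # Older entries fall off automatically once maxlen is reached.
        self._items = deque(maxlen=set_number)

    def store(self, difference):
        """Append one value along the time axis (compare S130 and S160)."""
        self._items.append(difference)

    def is_ready(self):
        """True once the whole estimation source section is stored (S170: YES)."""
        return len(self._items) == self._items.maxlen

    def values(self):
        """Return the stored values, oldest first."""
        return list(self._items)
```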

In the present embodiment, the estimation process described in detail later estimates the singing pitch in the subsequent unit section based on the pitch and the singing pitch difference of the estimation source section. The "subsequent unit section" consists of at least one analysis window that comes after the current unit section, among the analysis windows corresponding to at least part of a note that makes up the singing melody and to which lyrics are assigned. A single phrase such as the "A melody" or the "B melody" is one example of a subsequent unit section.

If the determination in S170 finds that the pitch for the estimation source section has not been stored (S170: NO), that is, if the pitch of the voice data used to estimate the singing pitch in the subsequent unit section has not been stored, the control unit 50 ends the display process.

If, on the other hand, the determination in S170 finds that the pitch for the estimation source section has been stored (S170: YES), that is, if the pitches for the whole section of voice data used to estimate the singing pitch in the subsequent unit section have been stored, the control unit 50 moves the display process to S180.

In S180, the control unit 50 executes the estimation process, which estimates the singing pitch in the subsequent unit section. Hereinafter, the singing pitch in the subsequent unit section estimated in S180 is referred to as the estimated singing pitch.

The control unit 50 then displays the pitch calculated in S120 in association with the pitch of the singing note corresponding to the previous unit section or the pitch of the singing note corresponding to the current unit section (S190). In S190, the control unit 50 also displays the estimated singing pitch estimated in S180 in association with the pitch of the singing note corresponding to the subsequent unit section.

Specifically, in S190, the control unit 50 outputs the estimated singing pitch to the video control unit 46. Having received the estimated singing pitch, the video control unit 46 displays on the display unit 64, as shown in FIG. 4, the pitch sung up to the present time (that is, the pitch calculated in S120) and the estimated singing pitch, each in association with the singing notes. The singing notes here include the singing note corresponding to the current unit section, the singing notes corresponding to the previous unit section, which is at least one analysis window preceding the current unit section along the time axis, and the singing notes corresponding to the subsequent unit section. In S190, the video control unit 46 may also display on the display unit 64 the lyrics assigned to each singing note, as shown in FIG. 4. In FIG. 4, the pitch calculated in S120 is drawn as a solid line and the estimated singing pitch as a broken line.

The control unit 50 then ends the display process.
<Estimation process>
Next, the estimation process started in S180 of the display process will be described.

When the estimation process shown in FIG. 5 is started, the control unit 50 acquires the singing pitch differences in the estimation source section (S310). The control unit 50 then acquires the pitch of the singing note corresponding to the subsequent unit section (S320).

In the estimation process, the control unit 50 then estimates the estimated singing pitch in the subsequent unit section based on the acquired pitch of the singing note corresponding to the subsequent unit section and the singing pitch differences acquired in S310 (S330).

Specifically, in S330, the control unit 50 estimates the estimated singing pitch for the subsequent unit section by adding the singing pitch difference in the current unit section to the pitch center of the singing note corresponding to the subsequent unit section. The singing pitch difference added in S330 may instead be the singing pitch difference associated with the pitch of the singing note corresponding to the subsequent unit section, or a representative value of the singing pitch differences in the previous unit sections and the current unit section. The representative value may be the arithmetic mean, the median, or the mode.
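The addition in S330 with a selectable representative value can be sketched in Python as follows. The function name `estimate_singing_pitch`, the string keys used to select the representative value, and the default choice of the mean are assumptions. The sign convention of the stored differences is simply whatever S150 produced, and the sketch does not reinterpret it.

```python
import statistics

# Representative values named in the text, mapped to standard-library functions.
_REPRESENTATIVES = {
    "mean": statistics.mean,
    "median": statistics.median,
    "mode": statistics.mode,
}


def estimate_singing_pitch(next_note_pitch, differences, representative="mean"):
    """Add a representative pitch difference to the next note's pitch center."""
    if not differences:
        raise ValueError("at least one pitch difference is required")
    return next_note_pitch + _REPRESENTATIVES[representative](differences)
```

Passing a one-element list containing only the current unit section's difference reduces this to the basic case described first in the text.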

Further, in the estimation process, the control unit 50 corrects the estimated singing pitch estimated in S330 (S340). In S340, the control unit 50 corrects the estimated singing pitch estimated in S330 based on the singing pitch difference in the estimation source section.

Specifically, in S340, as shown in FIG. 6, if the pitch difference between the singing pitch in the estimation source section and the pitch of the singing note corresponding to that estimation source section is widening along the time axis, the control unit 50 corrects the estimated singing pitch so that the pitch difference becomes larger than the pitch difference in the current unit section.

Also in S340, as shown in FIG. 7, if the pitch difference between the singing pitch in the estimation source section and the pitch of the singing note corresponding to that estimation source section is narrowing along the time axis, the control unit 50 corrects the estimated singing pitch so that the pitch difference becomes smaller than the pitch difference in the current unit section.
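The two correction cases of S340 (widening and narrowing differences) can be sketched as below. The description does not state by how much the difference is enlarged or reduced, so the linear extrapolation from the last two analysis windows used here is purely an assumption for illustration, as are the function and parameter names.

```python
def correct_estimated_pitch(next_note_pitch, differences):
    """S340 sketch: adjust the estimate according to the trend of the difference.

    differences: singing pitch differences (sung pitch minus note pitch) over
    the estimation source section, oldest first; the last element belongs to
    the current unit section. The extrapolation rule is hypothetical.
    """
    current = differences[-1]
    if len(differences) < 2:
        # No trend is available, so fall back to the uncorrected estimate.
        return next_note_pitch + current

    # Positive step: the gap is widening; negative step: the gap is narrowing.
    step = abs(differences[-1]) - abs(differences[-2])

    # Extrapolate the size of the gap one window ahead, never below zero.
    magnitude = max(0.0, abs(current) + step)

    # Keep the direction (flat or sharp) of the current difference.
    sign = 1.0 if current >= 0 else -1.0
    return next_note_pitch + sign * magnitude
```

With a widening gap such as `[-0.2, -0.4]`, the corrected estimate lies further from the note than the current difference of 0.4; with a narrowing gap such as `[-0.6, -0.4]`, it lies closer.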

Further, in the estimation process, the control unit 50 connects the estimated singing pitch estimated in S330 and the estimated singing pitch corrected in S340 (S350). That is, because one value of the estimated singing pitch is obtained per analysis window in this embodiment, a process of smoothly connecting these values is executed in S350. The least squares method or another approximation method may be used for the process executed in S350.
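One possible way to connect the per-window values in S350 is a least-squares straight-line fit, sketched below. The description only names the least squares method as a candidate, so the choice of a first-order fit, the function names, and the representation of time as window indices are assumptions for this example.

```python
def fit_line(times, pitches):
    """Ordinary least-squares fit of pitch = slope * time + intercept."""
    n = len(times)
    mean_t = sum(times) / n
    mean_p = sum(pitches) / n
    var_t = sum((t - mean_t) ** 2 for t in times)
    if var_t == 0:
        # All samples share one time value: fall back to a constant pitch.
        return 0.0, mean_p
    cov = sum((t - mean_t) * (p - mean_p) for t, p in zip(times, pitches))
    slope = cov / var_t
    return slope, mean_p - slope * mean_t


def connect_pitches(times, pitches, query_times):
    """S350 sketch: evaluate the fitted line at finer-grained display times."""
    slope, intercept = fit_line(times, pitches)
    return [slope * t + intercept for t in query_times]
```

A higher-order polynomial or spline could be substituted where a straight line is too coarse; the description leaves the approximation method open.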

Thereafter, the control unit 50 ends the estimation process and shifts the process to S190 of the display process.
[Effects of the embodiment]
As described above, in the display process, the estimated singing pitch of the analysis window to be sung next (that is, the subsequent unit section) is estimated based on past singing pitches. The estimated singing pitch is then displayed in association with the singing note corresponding to the subsequent unit section.

Therefore, according to the display process, the estimated singing pitch for the subsequent unit section can be displayed before that subsequent unit section is actually sung. Accordingly, the karaoke apparatus 30 can reduce the gap between the unit singing section currently being sung and the unit singing section for which the singing pitch is displayed.

Furthermore, the karaoke apparatus 30 can reduce the time lag until the user adjusts his or her own singing pitch. As a result, the karaoke apparatus 30 can realize a display that feels less unnatural to the user.

In such a karaoke apparatus 30, the correction can be made based on the pitch difference between the singing pitch for a previous unit section, which precedes the current unit section along the time axis, and the pitch of the singing note corresponding to that previous unit section.

Now, consider a target song whose singing melody repeats a sequence of notes in which, among the notes to which lyrics are assigned, the pitch rises sharply at a specific note. For such a target song, the user may be able to sing the sharply rising note in the early stage along the time axis, but in later stages may find that note difficult to voice and become unable to sing it. In such a case, the singing pitch difference becomes larger for notes at later stages in the repetition of the note sequence in which the pitch rises sharply.

In contrast, in the estimation process, if the singing pitch difference in the estimation source section is widening along the time axis, the estimated singing pitch is corrected so that the pitch difference in the subsequent unit section becomes larger than the singing pitch difference in the current unit section. Therefore, according to the estimation process, the estimated singing pitch in the subsequent unit section can be corrected in accordance with how the user is actually singing.

Also, for a target song such as the one described above, the user may be unable to sing the sharply rising note in the early stage along the time axis, but in later stages may adapt to that pitch and become able to sing it. In such a case, the difference between the note pitch and the singing pitch becomes smaller for notes at later stages in the repetition of the note sequence in which the pitch rises sharply.

In contrast, according to the estimation process, if the singing pitch difference in the estimation source section is narrowing along the time axis, the estimated singing pitch is corrected so that the pitch difference in the subsequent unit section becomes smaller than the pitch difference in the current unit section. Therefore, according to the estimation process, the estimated singing pitch in the subsequent unit section can be corrected in accordance with how the user is actually singing.

As a result, according to the estimation process, the estimated singing pitch can be brought close to the user's actual singing.
[Other Embodiments]
Although an embodiment of the present invention has been described above, the present invention is not limited to the above embodiment and can be implemented in various modes without departing from the gist of the present invention.

That is, a mode in which part of the configuration of the above embodiment is omitted is also an embodiment of the present invention. A mode configured by appropriately combining the above embodiment and a modification is also an embodiment of the present invention. Furthermore, any mode conceivable without departing from the essence of the invention specified by the wording of the claims is also an embodiment of the present invention.

In addition to the display control device described above, the present invention can be realized in various forms, such as a program executed by a computer to control display, and a display method.
[Correspondence between the Embodiment and the Claims]
Finally, the relationship between the description of the above embodiment and the description of the claims will be explained.

The function obtained by executing S110 in the display process of the above embodiment is an example of the acquisition means recited in the claims, and the function obtained by executing S120 is an example of the specifying means recited in the claims. The function obtained by executing S150 in the display process is an example of the calculating means recited in the claims, and the function obtained by executing S180 of the display process, that is, S330 of the estimation process, is an example of the estimating means recited in the claims.

Further, the function obtained by executing S190 of the display process is an example of the display control means recited in the claims, and the function obtained by executing S340 of the estimation process is an example of the correcting means recited in the claims.

DESCRIPTION OF SYMBOLS: 30 ... Karaoke apparatus, 32 ... Communication unit, 34 ... Input reception unit, 36 ... Music reproduction unit, 38 ... Storage unit, 40 ... Audio control unit, 42 ... Output unit, 44 ... Microphone input unit, 46 ... Video control unit, 50 ... Control unit, 52 ... ROM, 54 ... RAM, 56 ... CPU, 60 ... Speaker, 62 ... Microphone, 64 ... Display unit

Claims

A display control device comprising:
acquisition means for acquiring audio data, which is voice input during performance of a target song, the target song being a designated song in which lyrics are assigned to at least some of a plurality of notes;
specifying means for specifying, based on the audio data acquired by the acquisition means, the singing pitch in the audio data for each unit singing section, which is a prescribed section;
calculating means for calculating a singing pitch difference between the singing pitch specified by the specifying means and the pitch of the singing note corresponding to the unit singing section of that singing pitch, based on model data representing the pitch of each singing note, a singing note being a note to which lyrics are assigned in the target song;
estimating means for estimating an estimated singing pitch, which represents the singing pitch for a subsequent unit section, by adding the singing pitch difference calculated by the calculating means to the pitch of the singing note corresponding to the subsequent unit section, the subsequent unit section being a unit singing section after the current unit section, which is the unit singing section at the present time; and
display control means for displaying the estimated singing pitch estimated by the estimating means in association with the subsequent unit section.

The display control device according to claim 1, further comprising correcting means for correcting the estimated singing pitch estimated by the estimating means, based on a pitch difference between the singing pitch in at least one previous unit section preceding the current unit section along the time axis and the pitch of the singing note corresponding to the at least one previous unit section.

The display control device according to claim 2, wherein the correcting means corrects the estimated singing pitch so that, if the pitch difference between the singing pitch in the previous unit section and the pitch of the singing note corresponding to the previous unit section is widening along the time axis, the pitch difference becomes larger than the pitch difference in the current unit section calculated by the calculating means.

The display control device according to claim 2, wherein the correcting means corrects the estimated singing pitch so that, if the pitch difference between the singing pitch in the previous unit section and the pitch of the singing note corresponding to the previous unit section is narrowing along the time axis, the pitch difference becomes smaller than the pitch difference in the current unit section calculated by the calculating means.

A program for causing a computer to execute:
an acquisition procedure for acquiring audio data, which is voice input during performance of a target song, the target song being a designated song in which lyrics are assigned to at least some of a plurality of notes;
a specifying procedure for specifying, based on the audio data acquired in the acquisition procedure, the singing pitch in the audio data for each unit singing section, which is a prescribed section;
a calculation procedure for calculating a singing pitch difference between the singing pitch specified in the specifying procedure and the pitch of the singing note corresponding to the unit singing section of that singing pitch, based on model data representing the pitch of each singing note, a singing note being a note to which lyrics are assigned in the target song;
an estimation procedure for estimating an estimated singing pitch, which represents the singing pitch for a subsequent unit section, by adding the singing pitch difference calculated in the calculation procedure to the pitch of the singing note corresponding to the subsequent unit section, the subsequent unit section being a unit singing section after the current unit section, which is the unit singing section at the present time; and
a display control procedure for displaying the estimated singing pitch estimated in the estimation procedure in association with the subsequent unit section.