JP2023122012A

JP2023122012A - karaoke device

Info

Publication number: JP2023122012A
Application number: JP2022025428A
Authority: JP
Inventors: 宇将永沼; Takamasa Naganuma; 幸裕金子; Yukihiro Kaneko
Original assignee: Daiichikosho Co Ltd
Current assignee: Daiichikosho Co Ltd
Priority date: 2022-02-22
Filing date: 2022-02-22
Publication date: 2023-09-01

Abstract

PROBLEM TO BE SOLVED: To provide a karaoke device capable of presenting appropriate breathing timings to a singer. SOLUTION: A karaoke device includes: an extraction unit that extracts timings at which breathing is possible in a song on the basis of reference data of the song; a detection unit that detects breathing information, including the timing at which the singer took a breath, on the basis of an audio signal corresponding to the singing voice of the singer and/or a video signal obtained by imaging the singer; a specifying unit that specifies the timing at which the singer can take the next breath on the basis of the extracted timings at which breathing is possible in the song and the detected breathing information; and a presentation unit that presents the specified timing for the next breath to the singer. SELECTED DRAWING: Figure 2

Description

The present invention relates to a karaoke device.

One singing support function provided by karaoke devices presents breathing timings to the singer.

For example, Patent Literature 1 discloses a technique in which, based on information indicating breathing timings included in reference data, the notes before and after each breathing timing are displayed apart from each other so that the singer can grasp the breathing timing intuitively.

JP 2016-031394 A

However, with the technique of Patent Literature 1, only predetermined breathing timings can be presented for any one song.

An object of the present invention is to provide a karaoke device capable of presenting appropriate breathing timings to a singer.

One invention for achieving the above object is a karaoke device including: an extraction unit that extracts, based on reference data of a song, timings at which breathing is possible in the song; a detection unit that detects breathing information, including the timing of a breath taken by a singer performing karaoke singing of the song, based on an audio signal corresponding to the singing voice of the singer and/or a video signal obtained by photographing the singer; a specifying unit that specifies, based on the extracted timings at which breathing is possible in the song and the detected breathing information, a timing at which the singer can take the next breath; and a presentation unit that presents the specified timing at which the next breath can be taken to the singer.
Other features of the present invention will become apparent from the description and drawings below.

According to the present invention, appropriate breathing timings can be presented to a singer.

FIG. 1 is a diagram showing a karaoke device according to the embodiment.
FIG. 2 is a diagram showing a karaoke main body according to the embodiment.
FIG. 3 is a diagram showing an example in which the presentation unit according to the embodiment presents the timing of the next breath.
FIG. 4 is a flowchart showing the processing of the karaoke device according to the embodiment.

<Embodiment>
A karaoke device according to an embodiment will be described with reference to FIGS. 1 to 4.

==Karaoke Device==
The karaoke device K is a device for playing karaoke accompaniment of songs and for singers to perform karaoke singing. As shown in FIG. 1, the karaoke device K includes a karaoke main body 10, a speaker 20, a display device 30, a microphone 40, a remote control device 50, and a photographing means 60.

The karaoke main body 10 performs various kinds of control related to karaoke performance and karaoke singing, such as controlling the karaoke performance of a selected song, controlling the display of lyrics, background video, and the like, and processing audio signals input through the microphone 40. The speaker 20 emits karaoke performance sounds and singing voices based on signals from the karaoke main body 10. The display device 30 displays video and images on its screen based on signals from the karaoke main body 10. The microphone 40 converts the singer's singing voice into an analog audio signal and inputs it to the karaoke main body 10. The remote control device 50 is a device for performing various operations on the karaoke main body 10. The photographing means 60 is a camera for photographing the singer.

As shown in FIG. 2, the karaoke main body 10 according to this embodiment includes a storage means 10a, a communication means 10b, an input means 10c, a performance means 10d, and a control means 10e. Each component is connected to a bus B via an interface (not shown).

[Storage means]
The storage means 10a is a large-capacity storage device that stores various data. The storage means 10a stores song data.

Each item of song data is assigned song identification information for identifying the individual song. The song identification information is information unique to each song, such as a song ID. The song data includes accompaniment data, reference data, and the like.

The accompaniment data is the data from which the karaoke performance sound is generated. The reference data is data indicating the main melody to be sung in the song played as karaoke. Each song (the reference data of each song) consists of a plurality of notes. Each note has a predetermined pitch, an utterance start timing, an utterance end timing, and so on. The utterance start timing is the timing at which utterance of the character corresponding to a note should start, and the utterance end timing is the timing at which utterance of that character should end. Both timings can be expressed as elapsed time from the start of the song's performance.

The storage means 10a also stores lyrics data for displaying the lyrics of a song on the display device 30 or the like in time with the karaoke performance of the song. The lyrics data includes information on the characters that make up the lyrics. Each character is associated with one note. The lyrics data also includes, for each character, timing information indicating when the character is to be displayed. The display timing of a character can be expressed as elapsed time from the start of the song's performance.
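The note and lyric records described above could be modelled as follows. This is a minimal sketch rather than the device's actual data format: the class names `Note` and `LyricChar`, the field names, the helper `char_for_note`, and the integer pitch encoding are all assumptions made for illustration. Only the existence of a pitch, utterance start and end timings, a per-character display timing, and the one-character-per-note association come from the description.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Note:
    """One note of the reference data (field names are hypothetical)."""
    pitch: int      # assumed integer pitch encoding, e.g. a MIDI note number
    start_ms: int   # utterance start timing, elapsed ms from start of performance
    end_ms: int     # utterance end timing, elapsed ms from start of performance


@dataclass(frozen=True)
class LyricChar:
    """One lyric character of the lyrics data (field names are hypothetical)."""
    char: str        # the character to display
    display_ms: int  # display timing, elapsed ms from start of performance
    note_index: int  # index of the note this character is associated with


def char_for_note(lyrics: List[LyricChar], note_index: int) -> Optional[LyricChar]:
    """Return the lyric character associated with the given note, if any."""
    for c in lyrics:
        if c.note_index == note_index:
            return c
    return None


# Illustrative data only; not taken from any real song.
notes = [Note(60, 0, 400), Note(62, 500, 900)]
lyrics = [LyricChar("ka", 0, 0), LyricChar("ra", 500, 1)]
```

Expressing every timing as milliseconds from the start of the performance keeps the later comparisons between notes, detected breaths, and lyric display times on a single time axis.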

Furthermore, the storage means 10a stores background video data, such as the background video displayed on the display device 30 or the like during karaoke performance, and attribute information of songs. The attribute information includes, for example, the singer's name, the music genre of the song, and the tempo of the song.

[Communication means, input means, performance means]
The communication means 10b provides an interface for communicating with the remote control device 50. The input means 10c allows the singer to input various instructions. The input means 10c is, for example, a button provided on the karaoke main body 10. Alternatively, the remote control device 50 may function as the input means 10c. Under the control of the control means 10e, the performance means 10d performs the karaoke performance of songs and processes signals based on the singing voice input through the microphone 40.

[Control means]
The control means 10e performs various kinds of control in the karaoke device K. The control means 10e includes a CPU and a memory (neither shown). The CPU implements various functions by executing programs stored in the memory.

In this embodiment, the CPU executes a program stored in the memory so that the control means 10e functions as an extraction unit 100, a detection unit 200, a specifying unit 300, and a presentation unit 400.

(Extraction unit)
The extraction unit 100 extracts, based on the reference data of a song, the timings at which breathing is possible in the song.

A timing at which breathing is possible is a timing at which the singer can take a breath comfortably (with time to spare). For example, the singer can breathe comfortably in a non-singing section (a section of a song in which no lyrics to be sung are set, such as the prelude, an interlude, or the postlude), or in a singing section (a section in which lyrics to be sung are set, such as a verse or the chorus) when the utterance end timing of one character and the utterance start timing of the next character are separated by at least a certain time. A timing at which breathing is possible can be expressed as elapsed time from the start of the song's performance.

Specifically, for the song selected by the singer, the extraction unit 100 obtains the time difference (TS_{n+1} − TE_n) between the utterance end timing TE_n of one note N_n in the reference data and the utterance start timing TS_{n+1} of the next note N_{n+1}. The extraction unit 100 checks whether the obtained time difference satisfies a first predetermined condition. If it does, the extraction unit 100 extracts the utterance end timing TE_n of note N_n as a timing at which breathing is possible. The extraction unit 100 performs this processing, in note order, for each note in the selected song (except the last note of the song), and thereby extracts a plurality of timings at which breathing is possible in the song. The first predetermined condition, such as 250 msec or more, or longer than 300 msec, is set in advance based on an assumed time in which a comfortable breath can be taken.
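The extraction step just described can be sketched in a few lines. This is an illustrative sketch under stated assumptions, not the device's implementation: the function name `extract_breath_points`, the representation of notes as `(start_ms, end_ms)` pairs, and the sample data are hypothetical. The 250 msec threshold is one of the example values given in the description.

```python
from typing import List, Tuple


def extract_breath_points(notes: List[Tuple[int, int]],
                          min_gap_ms: int = 250) -> List[int]:
    """Return the utterance end timings TE_n after which breathing is possible.

    notes: (start_ms, end_ms) pairs in note order, measured from the start
           of the performance (assumed representation).
    min_gap_ms: first predetermined condition, here "gap >= min_gap_ms".
                A strict condition such as "longer than 300 msec" would use
                ">" instead of ">=".
    """
    points = []
    # Pair each note with its successor; the last note has no successor
    # and is therefore skipped, as in the description.
    for (_, end_n), (start_next, _) in zip(notes, notes[1:]):
        if start_next - end_n >= min_gap_ms:
            points.append(end_n)
    return points


# Illustrative notes: gaps of 50 ms, 300 ms, and 50 ms between successive notes.
sample_notes = [(0, 400), (450, 900), (1200, 1600), (1650, 2000)]
breath_points = extract_breath_points(sample_notes)
print(breath_points)  # [900]
```

Because the reference data is fixed for a song, this pass can run once before the performance starts; the result is already sorted by time, which the later selection step can rely on.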

(Detection unit)
The detection unit 200 detects breathing information, including the timing of a breath taken by the singer, based on an audio signal corresponding to the singing voice of the singer performing karaoke singing of the song and/or a video signal obtained by photographing the singer. The timing of a breath taken by the singer is the timing of a breath actually taken during karaoke singing of the song. It can be expressed as elapsed time from the start of the song's performance.

Specifically, the detection unit 200 analyzes, by a known technique, the frequency characteristics of the audio signal corresponding to the singer's singing voice input through the microphone 40. If the detection unit 200 determines from the analysis result that the signal contains no audio signal produced by exhalation (an audio signal corresponding to overtone components) and does contain an audio signal corresponding to the inhalation noise characteristic of breathing, it detects the time at which the corresponding singing voice was input as the timing of a breath taken by the singer.

The detection unit 200 can also apply the technique described in JP 2007-271977 A and detect the timing of a breath taken by the singer based on the power of the singer's voice signal (corresponding to the frequency characteristics above) and musical score sound data (corresponding to the reference data above).

Alternatively, the detection unit 200 analyzes, by a known technique, the video signal obtained by photographing the singer with the photographing means 60. If the detection unit 200 determines from the analysis result that the video contains a movement characteristic of breathing (for example, vertical movement of the shoulders or head, or opening and closing of the mouth), it detects the time at which the movement was captured as the timing of a breath taken by the singer.

Alternatively, the detection unit 200 may compare the breath timing detected from the audio signal with the breath timing detected from the video signal and detect only timings that match or approximately match as the timing of a breath taken by the singer. The detection unit 200 may also compare the breath timing detected from the audio signal with the breath timing detected from the video signal and detect the earlier of the two (the one with the shorter elapsed time from the start of the song's performance) as the timing of a breath taken by the singer.
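The two ways of combining audio-based and video-based detections described above can be sketched as follows. This is only an illustration: the function names `fuse_matching` and `earlier_of` are hypothetical, and the 200 msec tolerance used to decide that two timings "approximately match" is an assumption, since the description does not give a value.

```python
from typing import List


def fuse_matching(audio_ms: List[int], video_ms: List[int],
                  tolerance_ms: int = 200) -> List[int]:
    """Keep only audio-detected breaths confirmed by a nearby video detection.

    tolerance_ms is an assumed value for "match or approximately match".
    """
    return [a for a in audio_ms
            if any(abs(a - v) <= tolerance_ms for v in video_ms)]


def earlier_of(audio_t: int, video_t: int) -> int:
    """Alternative rule: take the earlier of the two detected timings."""
    return min(audio_t, video_t)


# Illustrative detections, in ms from the start of the performance.
audio = [40_000, 47_000]
video = [40_100, 52_000]
confirmed = fuse_matching(audio, video)
print(confirmed)                      # [40000]
print(earlier_of(40_000, 40_100))     # 40000
```

Requiring agreement between the two signals trades missed detections for fewer false positives, whereas taking the earlier timing favours prompt detection.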

(Specifying unit)
The specifying unit 300 specifies the timing at which the singer can take the next breath, based on the extracted timings at which breathing is possible in the song and the detected breathing information. The timing at which the next breath can be taken can be expressed as elapsed time from the start of the song's performance.

Specifically, the specifying unit 300 obtains the time difference (TE_m − BT) between the timing BT of the breath taken by the singer, included in the detected breathing information, and a timing TE_m that is later than the timing BT among the extracted timings at which breathing is possible in the song. The specifying unit 300 determines whether the obtained time difference satisfies a second predetermined condition. If it does, the specifying unit 300 specifies the timing TE_m as a timing at which the next breath can be taken. The second predetermined condition, such as 10 seconds or less, or less than 15 seconds, is set in advance based on an assumed time after which the singer is likely to run out of breath (that is, the time for which karaoke singing is possible without taking a breath).

Note that a singer can continue karaoke singing without breathing for a certain period after taking a breath. In other words, the timings later than the timing BT at which breathing is possible include some that need not be specified as the timing at which the next breath can be taken. The second predetermined condition may therefore be set, for example as 10 seconds or more and 15 seconds or less, or longer than 9 seconds and less than 13 seconds, by taking into account not only the time after which the singer is likely to run out of breath (the time for which karaoke singing is possible without taking a breath) but also the time during which a breath is likely to be unnecessary.
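The selection performed by the specifying unit can be sketched as a filter over the extracted timings. The sketch below is an assumption-laden illustration, not the device's implementation: the function name `next_breath_points`, its parameters, and the sample timings are hypothetical. The default window of 10 to 15 seconds is one of the example conditions in the description, and setting `min_ms=0` gives the upper-bound-only form of the condition.

```python
from typing import List


def next_breath_points(breath_points: List[int], bt_ms: int,
                       min_ms: int = 10_000,
                       max_ms: int = 15_000) -> List[int]:
    """Return candidate timings TE_m with min_ms <= TE_m - BT <= max_ms.

    breath_points: extracted timings at which breathing is possible,
                   assumed sorted in ascending order.
    bt_ms: timing BT of the breath the singer actually took.
    """
    result = []
    for te in breath_points:
        diff = te - bt_ms
        if diff <= 0:
            continue      # not later than the detected breath
        if diff > max_ms:
            break         # sorted input: every later timing is also too late
        if diff >= min_ms:
            result.append(te)
    return result


# Illustrative timings in ms; the detected breath BT is at 2,000 ms.
points = [3_000, 7_500, 12_000, 14_500, 19_000, 25_000]
print(next_breath_points(points, bt_ms=2_000))            # [12000, 14500]
print(next_breath_points(points, bt_ms=2_000,
                         min_ms=0, max_ms=10_000))        # [3000, 7500, 12000]
```

The early `break` mirrors the observation that once one timing exceeds the upper bound, no later timing can satisfy the condition.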

(Presentation unit)
The presentation unit 400 presents the specified timing at which the next breath can be taken to the singer.

Specifically, the presentation unit 400 can display the specified timing at which the next breath can be taken in association with the display of the song's lyrics.

The presentation unit 400 refers to the timing information included in the lyrics data of the song and extracts the character whose set timing matches the specified timing at which the next breath can be taken. When the extracted character is displayed on the display screen of the display device 30, the presentation unit 400 produces a display that lets the singer see when the next breath can be taken. For example, as shown in FIG. 3, the presentation unit 400 can present the timing of the next breath to the singer by displaying the text "息継ぎ" ("breath") after the extracted character "が" (ga). The presentation unit 400 may display an icon prompting a breath (for example, a breath mark) instead of the text "息継ぎ".
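The lyric-linked presentation described above can be sketched as inserting a marker after the matching character. This is an illustrative sketch only: the function name `annotate_lyrics`, the `(text, timing_ms)` representation, the `[breath]` marker, and the romanized sample lyric are assumptions, and the timing attached to each character is assumed to be directly comparable with the specified breath timing.

```python
from typing import List, Tuple


def annotate_lyrics(chars: List[Tuple[str, int]], breath_ms: int,
                    marker: str = "[breath]") -> str:
    """Build a display string with a marker after the character whose
    timing equals the specified next-breath timing."""
    out = []
    for text, timing_ms in chars:
        out.append(text)
        if timing_ms == breath_ms:
            out.append(marker)  # could equally be a breath-mark icon
    return "".join(out)


# Hypothetical lyric fragment; timings in ms from the start of the performance.
line = [("ki", 51_000), ("mi", 51_800), ("ga", 52_500), ("i", 53_200)]
print(annotate_lyrics(line, 52_500))  # kimiga[breath]i
```

If no character carries the specified timing, the line is returned unchanged, so the display degrades to the ordinary lyrics.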

Alternatively, the presentation unit 400 may measure the elapsed time from the start of the song's performance, display a countdown toward the specified timing at which the next breath can be taken, and display the text "息継ぎ" ("breath") when that timing arrives.

Alternatively, instead of displaying the countdown, the presentation unit 400 can present the specified timing at which the next breath can be taken to the singer by having the speaker 20 emit sound corresponding to the countdown.
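The countdown presentation can be sketched as a function of the elapsed time. This is a hedged illustration: the function name `countdown_label`, the three-second lead time, the whole-second rounding, and the `"breath"` label are assumptions; the description only states that a countdown is shown (or sounded) and that the breath prompt appears when the timing arrives.

```python
from typing import Optional


def countdown_label(elapsed_ms: int, breath_ms: int,
                    lead_s: int = 3) -> Optional[str]:
    """Return the text to show (or speak) at the given elapsed time.

    Returns None outside the countdown window, "3", "2", "1" while counting
    down, and "breath" exactly when the specified timing arrives.
    """
    remaining_ms = breath_ms - elapsed_ms
    if remaining_ms < 0 or remaining_ms > lead_s * 1000:
        return None
    if remaining_ms == 0:
        return "breath"
    # Round up to whole seconds: 1..1000 ms -> "1", 1001..2000 ms -> "2", ...
    return str(-(-remaining_ms // 1000))


print(countdown_label(49_500, 52_500))  # 3
print(countdown_label(51_600, 52_500))  # 1
print(countdown_label(52_500, 52_500))  # breath
```

The same labels could drive either the on-screen countdown or the audio countdown from the speaker, since only the output channel differs.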

==Operation of karaoke device K==
Next, a specific example of the operation of the karaoke device K in this embodiment will be described with reference to FIG. 4. FIG. 4 is a flowchart showing an operation example of the karaoke device K.

The singer S operates the remote control device 50 to select song X. The karaoke device K reserves the karaoke performance of song X by registering the song ID of song X in the reservation queue (Reserve karaoke performance of song X. Step 10).

The extraction unit 100 extracts the timings at which breathing is possible in song X based on the reference data of song X read from the storage means 10a (Extract timings at which breathing is possible in song X. Step 11).

Specifically, for song X, the extraction unit 100 obtains the time difference (TS_2 − TE_1) between the utterance end timing TE_1 of the first note N_1 and the utterance start timing TS_2 of the next note N_2. The extraction unit 100 checks whether the obtained time difference satisfies the first predetermined condition. In this example, the first predetermined condition is 250 msec or more. That is, a comfortable breath is assumed to be possible if the interval from the utterance end timing of one note to the utterance start timing of the next note is 250 msec or more.

The extraction unit 100 repeats this processing in the order of the notes in song X and thereby extracts a plurality of timings at which breathing is possible in song X. In this example, the following are assumed to have been extracted as timings at which breathing is possible: the utterance end timing TE_4 (3,000 msec) of note N_4, the utterance end timing TE_8 (7,500 msec) of note N_8, ..., the utterance end timing TE_n (38,000 msec) of note N_n, the utterance end timing TE_o (42,000 msec) of note N_o, the utterance end timing TE_p (46,500 msec) of note N_p, the utterance end timing TE_q (49,000 msec) of note N_q, the utterance end timing TE_r (52,500 msec) of note N_r, the utterance end timing TE_s (55,500 msec) of note N_s, the utterance end timing TE_t (61,500 msec) of note N_t, and so on. These values are measured from the timing at which the performance of song X starts (0 msec).

The karaoke device K then starts the karaoke performance of song X (Start karaoke performance of song X. Step 12). The singer S sings song X using the microphone 40.

The detection unit 200 detects breathing information, including the timing of a breath taken by the singer S, based on the audio signal corresponding to the singing voice of the singer S performing karaoke singing of song X (Detect breathing information. Step 13).

Specifically, the detection unit 200 analyzes, by a known technique, the frequency characteristics of the audio signal corresponding to the singing voice of the singer S input through the microphone 40. If the detection unit 200 determines from the analysis result that the signal contains no audio signal produced by exhalation and does contain an audio signal corresponding to the inhalation noise characteristic of breathing, it detects the time at which the corresponding singing voice was input as the timing BT of the breath taken by the singer S. In this example, the breath is assumed to have been taken 40 seconds after the timing at which the performance of song X started (0 msec). In this case, the detection unit 200 detects 40,000 msec as the timing BT.

The specifying unit 300 specifies the timing at which the singer S can take the next breath, based on the timings at which breathing is possible in the song extracted in step 11 and the breathing information detected in step 13 (Specify timing at which the next breath can be taken. Step 14).

Specifically, the specifying unit 300 obtains the time difference between the timing BT (40,000 msec) of the breath taken by the singer S, included in the detected breathing information, and each timing later than the timing BT among the timings at which breathing is possible in song X extracted in step 11. The specifying unit 300 determines whether the obtained time difference satisfies the second predetermined condition. In this example, the second predetermined condition is 10 seconds or more and 15 seconds or less. That is, karaoke singing is assumed to be possible without breathing for up to 10 seconds after the breath actually taken, and the singer is assumed likely to run out of breath once 15 seconds have passed.

Here, among the timings at which breathing is possible in song X extracted in step 11, those up to and including the utterance end timing TE_n (38,000 msec) of note N_n precede the timing BT. The specifying unit 300 therefore obtains the time difference from the timing BT (40,000 msec) of the breath taken by the singer S in order, starting from the utterance end timing TE_o (42,000 msec) of note N_o, and determines whether each difference satisfies the second predetermined condition.

The time difference between the timing BT (40,000 msec) and the utterance end timing TE_o (42,000 msec) of note N_o is 2,000 msec. The specifying unit 300 therefore determines that the second predetermined condition is not satisfied. Similarly, the time difference between the timing BT (40,000 msec) and the utterance end timing TE_p (46,500 msec) of note N_p is 6,500 msec, and the time difference between the timing BT (40,000 msec) and the utterance end timing TE_q (49,000 msec) of note N_q is 9,000 msec. The specifying unit 300 therefore determines that the second predetermined condition is not satisfied for either of them.

In contrast, the time difference between the timing BT (40,000 msec) and the utterance end timing TE_r (52,500 msec) of note N_r is 12,500 msec. The specifying unit 300 therefore determines that the second predetermined condition is satisfied.

The time difference between the timing BT (40,000 msec) and the utterance end timing TE_s (55,500 msec) of note N_s is 15,500 msec. The specifying unit 300 therefore determines that the second predetermined condition is not satisfied. Since it is clear that no later utterance end timing can satisfy the second predetermined condition, the specifying unit 300 ends the processing.

The specifying unit 300 specifies the utterance end timing TE_r (52,500 msec) of note N_r, which it determined to satisfy the second predetermined condition, as the timing at which the next breath can be taken. The specifying unit 300 outputs the specified timing to the presentation unit 400.
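The arithmetic of this worked example can be checked with a short script. The timing values, the breath timing BT, and the 10 to 15 second condition are taken from the example above; the variable names and the dictionary layout are merely illustrative.

```python
BT = 40_000  # detected breath timing of singer S, in ms

# Extracted timings at which breathing is possible (ms), from the example.
candidates = {
    "TE_n": 38_000, "TE_o": 42_000, "TE_p": 46_500, "TE_q": 49_000,
    "TE_r": 52_500, "TE_s": 55_500, "TE_t": 61_500,
}

rows = []
for name, te in candidates.items():
    diff = te - BT
    # Second predetermined condition of the example: 10 s <= diff <= 15 s.
    satisfied = 10_000 <= diff <= 15_000
    rows.append((name, diff, satisfied))

selected = [name for name, _, satisfied in rows if satisfied]

for name, diff, satisfied in rows:
    print(f"{name}: diff = {diff:>6} ms, satisfies condition: {satisfied}")
print(selected)  # ['TE_r']
```

Only TE_r, with a difference of 12,500 msec, falls inside the window, which agrees with the timing the specifying unit passes to the presentation unit.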

提示部４００は、ステップ１４で特定された次の息継ぎを行うことが可能なタイミングを歌唱者Ｓに対して提示する（次の息継ぎを行うことが可能なタイミングを提示。ステップ１５）。 The presentation unit 400 presents to the singer S the timing at which the next breath specified in step 14 can be performed (the timing at which the next breath can be performed is presented, step 15).

具体的に、提示部４００は、楽曲Ｘの歌詞データに含まれるタイミング情報を参照し、ノートＮ_rの発声終了タイミングＴＥ_r（５２，５００ｍｓｅｃ）と一致するタイミングが設定されている文字（すなわち、ノートＮ_rに対応する文字）を抽出する。提示部４００は、抽出したノートＮ_rに対応する文字が表示装置３０の表示画面に表示される際、歌唱者Ｓが、次の息継ぎが可能なタイミング（ノートＮ_rの発声終了タイミングＴＥ_r）が分かるような表示を行う。 Specifically, the presentation unit 400 refers to the timing information included in the lyric data of the song X, and selects a character whose timing matches the utterance end timing TE _r (52,500 msec) of the note N _r (that is, character corresponding to note N _r ). When the character corresponding to the extracted note N _r is displayed on the display screen of the display device 30, the presentation unit 400 determines the timing at which the singer S can take a next breath (timing TE _r at which the vocalization of the note N _r ends). display in such a way that

カラオケ装置Ｋは、楽曲Ｘのカラオケ演奏が終了するまで（ステップ１６でＹの場合）、ステップ１３からステップ１５の処理を繰り返し行う。なお、次の息継ぎを行うことが可能なタイミングが複数特定された場合には、歌詞の表示にそれぞれ対応させて表示する。 The karaoke machine K repeats the processing from step 13 to step 15 until the karaoke performance of the piece of music X is completed (in the case of Y in step 16). Note that when a plurality of timings at which the next breath can be taken are identified, they are displayed in correspondence with the display of the lyrics.

以上から明らかなように、本実施形態に係るカラオケ装置Ｋは、楽曲のリファレンスデータに基づいて、当該楽曲における息継ぎが可能なタイミングを抽出する抽出部１００と、楽曲のカラオケ歌唱を行う歌唱者の歌唱音声に対応する音声信号及び／または当該歌唱者を撮影して得られた映像信号に基づいて、当該歌唱者が行った息継ぎのタイミングを含む息継ぎ情報を検出する検出部２００と、抽出された楽曲における息継ぎが可能なタイミング、及び検出された息継ぎ情報に基づいて、歌唱者が、次の息継ぎを行うことが可能なタイミングを特定する特定部３００と、特定された次の息継ぎを行うことが可能なタイミングを歌唱者に対して提示する提示部４００と、を有する。 As is clear from the above, the karaoke device K according to the present embodiment includes: an extraction unit 100 that extracts, based on the reference data of a song, the timings at which a breath can be taken in the song; a detection unit 200 that detects breath information including the timing of a breath taken by the singer, based on an audio signal corresponding to the singing voice of the singer performing karaoke of the song and/or a video signal obtained by photographing the singer; a specifying unit 300 that specifies the timing at which the singer can take the next breath, based on the extracted timings at which a breath can be taken in the song and the detected breath information; and a presentation unit 400 that presents the specified timing at which the next breath can be taken to the singer.

このようなカラオケ装置Ｋによれば、ある楽曲のカラオケ歌唱を行っている歌唱者が実際に行った息継ぎのタイミングに応じて、当該ある楽曲における息継ぎが可能なタイミングの中から、次の息継ぎのタイミングを特定し、歌唱者に提示することができる。よって、歌唱者は、次の息継ぎのタイミングとして適当なタイミングを把握できる。すなわち、本実施形態に係るカラオケ装置Ｋによれば、歌唱者に対して適当な息継ぎのタイミングを提示することができる。 According to such a karaoke device K, the timing of the next breath can be specified from among the timings at which a breath can be taken in a song, in accordance with the timing of the breath actually taken by the singer performing karaoke of that song, and can be presented to the singer. The singer can therefore grasp a suitable timing for the next breath. That is, the karaoke device K according to this embodiment can present the singer with an appropriate timing for taking a breath.

また、本実施形態に係るカラオケ装置Ｋにおいて、提示部４００は、楽曲の歌詞の表示に対応させて、特定された次の息継ぎを行うことが可能なタイミングを表示させることができる。このようなカラオケ装置Ｋによれば、歌唱者に対して息継ぎのタイミングを視覚的に提示することができる。 In addition, in the karaoke machine K according to the present embodiment, the presentation unit 400 can display the specified timing at which the next breath can be taken in correspondence with the display of the lyrics of the music. According to such a karaoke machine K, it is possible to visually present the timing of breathing to the singer.

なお、上記例では、次の息継ぎが可能なタイミングとしてノートＮ_rの発声終了タイミングＴＥ_rのみを特定した。一方、第２の所定条件は、歌唱者毎に設定されているものではない。従って、歌唱者の歌唱力や歌唱の仕方によっては、第２の所定条件に基づいて特定した次の息継ぎが可能なタイミングよりも前に息が続かなくなる可能性がある。 In the above example, only the utterance end timing TE _r of note N _r is identified as the timing at which the next breath can be taken. On the other hand, the second predetermined condition is not set for each singer. Therefore, depending on the singing ability and manner of singing of the singer, there is a possibility that the breath will stop before the next breath timing specified based on the second predetermined condition.

そこで、特定部３００は、抽出された楽曲における息継ぎが可能なタイミングのうち、第２の所定条件を満たすタイミングと併せて、第２の所定条件よりも早いタイミングを次の息継ぎが可能なタイミングとして特定してもよい。 Therefore, from among the extracted timings at which a breath can be taken in the song, the specifying unit 300 may specify, in addition to a timing that satisfies the second predetermined condition, a timing earlier than the range of the second predetermined condition as a timing at which the next breath can be taken.

たとえば上記例において、特定部３００は、第２の所定条件を満たすと判断したノートＮ_rの発声終了タイミングＴＥ_r（５２，５００ｍｓｅｃ）と併せて、第２の所定条件の範囲（１０秒以上１５秒以下）よりも早いノートＮ_qの発声終了タイミングＴＥ_q（４９，０００ｍｓｅｃ）を、次の息継ぎを行うことが可能なタイミングとして特定する。 For example, in the above example, the specifying unit 300 specifies, together with the utterance end timing TE_r (52,500 msec) of the note N_r determined to satisfy the second predetermined condition, the utterance end timing TE_q (49,000 msec) of the note N_q, which is earlier than the range of the second predetermined condition (10 seconds or more and 15 seconds or less), as timings at which the next breath can be taken.

このような構成によれば、歌唱者の歌唱力等による差がある場合を想定したうえで、適当な息継ぎのタイミングを提示することができる。 According to such a configuration, it is possible to present an appropriate timing for taking a breath, assuming that there is a difference due to the singing ability of the singer.
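A minimal sketch of this variation follows. The name `timings_with_earlier_option` is hypothetical, and the choice to add only the single candidate immediately preceding the range is an assumption inferred from the example (TE_q); the description does not state how many earlier timings are added.

```python
def timings_with_earlier_option(breath_ms, candidates_ms,
                                min_gap_ms=10_000, max_gap_ms=15_000):
    """Return the timings inside the second predetermined condition,
    preceded by the nearest candidate that falls before the range."""
    ordered = sorted(candidates_ms)
    in_range = [t for t in ordered
                if min_gap_ms <= t - breath_ms <= max_gap_ms]
    earlier = [t for t in ordered if 0 < t - breath_ms < min_gap_ms]
    # Assumption: only the latest of the earlier candidates is offered.
    return earlier[-1:] + in_range


candidates = [42_000, 46_500, 49_000, 52_500, 55_500]
print(timings_with_earlier_option(40_000, candidates))  # [49000, 52500]
```

With the values from the example, both TE_q (49,000 msec) and TE_r (52,500 msec) are returned.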

＜変形例１＞ <Modification 1>
抽出部１００は、楽曲のリファレンスデータ及び楽曲の歌詞データに基づいて、楽曲における息継ぎが可能なタイミングを抽出してもよい。 The extraction unit 100 may extract the timings at which a breath can be taken in a song based on the reference data of the song and the lyric data of the song.

まず、抽出部１００は、実施形態で述べた処理を実行することにより、第１の所定条件を満たすノートの発声終了タイミングを抽出する。次に、抽出部１００は、抽出されたノートの発声終了タイミングの中から、歌詞データに基づいて、楽曲における息継ぎが可能なタイミングの絞り込みを行う。 First, the extraction unit 100 extracts the utterance end timing of the note that satisfies the first predetermined condition by executing the processing described in the embodiment. Next, the extracting unit 100 narrows down the timings at which the user can take a breath in the music based on the lyric data from among the extracted utterance end timings of the notes.

具体的に、抽出部１００は、公知の技術（たとえば、特開２００４－０７０６３４号公報）を用いて、歌詞データに対応する歌詞を解析し、読点を挿入可能な文節を特定する。抽出部１００は、第１の所定条件を満たすノートの発声終了タイミングの中から、特定した文節の最後の文字が表示されるタイミングと一致するタイミングを、楽曲における息継ぎが可能なタイミングとして抽出する。 Specifically, the extracting unit 100 analyzes the lyrics corresponding to the lyrics data using a known technique (for example, JP-A-2004-070634) and identifies clauses into which commas can be inserted. The extraction unit 100 extracts, from among the utterance end timings of the notes that satisfy the first predetermined condition, the timing that matches the timing at which the last character of the specified phrase is displayed as the timing at which the user can take a breath in the music.
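The narrowing step can be sketched as a simple intersection. This sketch assumes that the clause analysis has already produced the display timings of the last character of each clause; the function name `filter_by_clause_ends` and the clause timings in the usage example are hypothetical.

```python
def filter_by_clause_ends(note_end_candidates_ms, clause_end_timings_ms):
    """Keep only the note-end timings (already satisfying the first
    predetermined condition) that coincide with the end of a clause."""
    clause_ends = set(clause_end_timings_ms)
    return [t for t in note_end_candidates_ms if t in clause_ends]


# Note-end timings from the embodiment; clause-end timings are invented here.
note_ends = [42_000, 46_500, 49_000, 52_500, 55_500]
clause_ends = [46_500, 52_500, 58_000]
print(filter_by_clause_ends(note_ends, clause_ends))  # [46500, 52500]
```

An exact match is used because the description speaks of timings that coincide; a real system might need a small tolerance, which is not covered here.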

一般的な会話文においても、読点を挿入可能な文節に含まれる最後の文字の発声終了から、次の文節に含まれる最初の文字の発声開始までは、話が区切られるので息を継ぐことが多い。すなわち、読点を挿入可能な文節に含まれる最後の文字に対応するノートの発声終了タイミングから、次の文節に含まれる最初の文字に対応するノートの発声開始タイミングまでの間は、自然に聴こえる息継ぎが可能である。よって、楽曲のリファレンスデータと合わせて歌詞データに基づいて抽出した息継ぎが可能なタイミングは、歌唱者にとってより息継ぎがしやすく、且つ息継ぎが自然に聴こえるタイミングとなる。 Even in ordinary conversation, speakers often take a breath between the end of the last character of a clause after which a comma can be inserted and the start of the first character of the next clause, because the speech is segmented there. In other words, a natural-sounding breath can be taken between the utterance end timing of the note corresponding to the last character of such a clause and the utterance start timing of the note corresponding to the first character of the next clause. Therefore, timings extracted based on the lyric data in addition to the reference data of the song are timings at which the singer can take a breath more easily and at which the breath sounds natural.

なお、上記例では、歌詞データに対応する歌詞を解析して読点を挿入可能な文節を特定したが、歌詞データが予め読点のタイミングに相当する情報を含んでいてもよい。この場合、抽出部１００は、歌詞データに含まれる読点のタイミングに基づいて、楽曲における息継ぎが可能なタイミングの絞り込みを行う。 In the above example, the lyrics corresponding to the lyric data are analyzed to identify the clauses after which a comma can be inserted; however, the lyric data may instead contain, in advance, information corresponding to the timing of commas. In this case, the extraction unit 100 narrows down the timings at which a breath can be taken in the song based on the comma timings included in the lyric data.

以上から明らかなように、本変形例に係るカラオケ装置Ｋにおいて、抽出部１００は、楽曲の歌詞データに基づいて、楽曲における息継ぎが可能なタイミングを抽出する。このようなカラオケ装置Ｋによれば、歌唱者に対してより適当な息継ぎのタイミングを提示することができる。 As is clear from the above, in the karaoke machine K according to this modified example, the extracting unit 100 extracts the timing at which it is possible to take a breath in a piece of music, based on the lyric data of the piece of music. According to such a karaoke machine K, it is possible to present the singer with a more suitable timing for taking a breath.

＜変形例２＞ <Modification 2>
検出部２００が検出する息継ぎ情報としては、歌唱者が行った息継ぎのタイミング及び歌唱者が行った息継ぎの長さを含んでいてもよい。 The breath information detected by the detection unit 200 may include the timing of the breath taken by the singer and the length of the breath taken by the singer.

たとえば、検出部２００は、マイク４０を通じて入力された歌唱者の歌唱音声に対応する音声信号の周波数特性を公知の手法により解析する。検出部２００は、解析結果に基づいて、呼気による音声信号（倍音成分に相当する音声信号）が含まれず、且つ息継ぎに特有の吸気ノイズに相当する音声信号が含まれていると判定した場合、当該音声信号に対応する歌唱音声が入力された時間を、歌唱者が行った息継ぎのタイミングとして検出する。この際、検出部２００は、息継ぎに特有の吸気ノイズに相当する音声信号が含まれている持続時間を求め、当該持続時間を、歌唱者が行った息継ぎの長さとして検出する。 For example, the detection unit 200 analyzes, by a known technique, the frequency characteristics of the audio signal corresponding to the singing voice of the singer input through the microphone 40. If the detection unit 200 determines from the analysis result that the signal contains no audio component due to exhalation (a component corresponding to overtones) but does contain a component corresponding to the inhalation noise peculiar to breathing, it detects the time at which the corresponding singing voice was input as the timing of the breath taken by the singer. At this time, the detection unit 200 obtains the duration for which the component corresponding to the inhalation noise is present and detects that duration as the length of the breath taken by the singer.

また、検出部２００は、撮影手段６０により歌唱者を撮影して得られた映像信号を公知の手法により解析する。検出部２００は、解析結果に基づいて、息継ぎに特有の動作（たとえば、肩や頭部の上下動、口の開閉）が含まれると判定した場合、当該動作が撮影された時間を、歌唱者が行った息継ぎのタイミングとして検出する。この際、検出部２００は、息継ぎに特有の動作の持続時間を求め、当該持続時間を、歌唱者が行った息継ぎの長さとして検出する。 The detection unit 200 also analyzes, by a known technique, the video signal obtained by photographing the singer with the photographing means 60. If the detection unit 200 determines from the analysis result that a movement peculiar to breathing (for example, vertical movement of the shoulders or head, or opening and closing of the mouth) is included, it detects the time at which that movement was captured as the timing of the breath taken by the singer. At this time, the detection unit 200 obtains the duration of the movement peculiar to breathing and detects that duration as the length of the breath taken by the singer.

或いは、検出部２００は、音声信号に基づいて検出した歌唱者が行った息継ぎの長さと、映像信号に基づいて検出した歌唱者が行った息継ぎの長さとを比較し、一致または近似する長さのみを歌唱者が行った息継ぎの長さとして検出してもよい。また、検出部２００は、音声信号に基づいて検出した歌唱者が行った息継ぎの長さと、映像信号に基づいて検出した歌唱者が行った息継ぎの長さとを比較し、より長いほうを歌唱者が行った息継ぎの長さとして検出してもよい。 Alternatively, the detection unit 200 may compare the breath length detected from the audio signal with the breath length detected from the video signal, and detect a length as the length of the breath taken by the singer only when the two match or are approximately equal. The detection unit 200 may also compare the breath length detected from the audio signal with the breath length detected from the video signal and detect the longer of the two as the length of the breath taken by the singer.
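The two ways of reconciling the audio-based and video-based breath lengths can be sketched as follows. The function name `combine_breath_lengths`, the `mode` switch, the 100 msec tolerance for "approximately equal", and the choice to return the audio-based value when the two agree are all assumptions; the description does not fix any of them.

```python
from typing import Optional


def combine_breath_lengths(audio_ms: float, video_ms: float,
                           mode: str = "agree",
                           tolerance_ms: float = 100.0) -> Optional[float]:
    """Reconcile breath lengths detected from audio and from video.

    mode="agree":  accept the length only if both detections match or are
                   close (within tolerance_ms); otherwise return None.
    mode="longer": return the longer of the two detections.
    """
    if mode == "agree":
        if abs(audio_ms - video_ms) <= tolerance_ms:
            return audio_ms  # assumption: report the audio-based value
        return None
    if mode == "longer":
        return max(audio_ms, video_ms)
    raise ValueError(f"unknown mode: {mode}")


print(combine_breath_lengths(600, 650))                 # 600
print(combine_breath_lengths(600, 900))                 # None
print(combine_breath_lengths(600, 900, mode="longer"))  # 900
```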

ここで、本変形例において、特定部３００が参照する第２の所定条件は、歌唱者が行った息継ぎの長さに応じて複数設定されている。歌唱者が行った息継ぎの長さが短い場合、息が続かなくなる可能性が高い時間（息継ぎなしでカラオケ歌唱が可能な時間）も短くなることが想定される。一方、歌唱者が行った息継ぎの長さが長い場合、息が続かなくなる可能性が高い時間（息継ぎなしでカラオケ歌唱が可能な時間）も長くなることが想定される。従って、第２の所定条件は、息継ぎの長さに応じた複数のパターンを含んでいることが好ましい。 Here, in this modified example, a plurality of second predetermined conditions referred to by the specifying unit 300 are set according to the length of the breath performed by the singer. If the length of the breath performed by the singer is short, it is assumed that the time during which there is a high possibility that the breath will not continue (the time during which karaoke can be sung without taking a breath) will also be shortened. On the other hand, if the length of the breath performed by the singer is long, it is assumed that the time during which there is a high possibility that the breath will not be continued (the time during which karaoke can be sung without taking a breath) will also increase. Therefore, it is preferable that the second predetermined condition includes a plurality of patterns corresponding to the length of breath.

たとえば、第２の所定条件が、息継ぎの長さが５００ｍｓｅｃ未満の場合は１０秒以上１５秒未満、息継ぎの長さが５００ｍｓｅｃ以上７５０ｍｓｅｃ未満の場合は１１秒以上１６秒未満、息継ぎの長さが７５０ｍｓｅｃ以上１，０００ｍｓｅｃ未満の場合は１２秒以上１７秒未満、息継ぎの長さが１，０００ｍｓｅｃ以上の場合は１３秒以上１８秒未満であるとする。 For example, the second predetermined condition is 10 seconds or more and less than 15 seconds when the length of breath is less than 500 msec, 11 seconds or more and less than 16 seconds when the length of breath is 500 msec or more and less than 750 msec, and the length of breath is If it is 750 msec or more and less than 1,000 msec, it is 12 seconds or more and less than 17 seconds, and if the breath length is 1,000 msec or more, it is 13 seconds or more and less than 18 seconds.

また、第１実施形態で述べたように、特定部３００は、ノートＮ_oの発声終了タイミングＴＥ_o（４２，０００ｍｓｅｃ）から順に、歌唱者Ｓが行った息継ぎのタイミングＢＴ（４０，０００ｍｓｅｃ）との時間差を求め、第２の所定条件を満たすかどうかを判断するとする。 As described in the first embodiment, assume that the specifying unit 300 obtains, in order starting from the utterance end timing TE_o (42,000 msec) of the note N_o, the time difference from the timing BT (40,000 msec) of the breath taken by the singer S, and determines whether the second predetermined condition is satisfied.

タイミングＢＴ（４０，０００ｍｓｅｃ）と、ノートＮ_oの発声終了タイミングＴＥ_o（４２，０００ｍｓｅｃ）との時間差は、２，０００ｍｓｅｃである。よって、特定部３００は、息継ぎの長さに関わらず、第２の所定条件を満たさないと判断する。同様に、タイミングＢＴ（４０，０００ｍｓｅｃ）と、ノートＮ_pの発声終了タイミングＴＥ_p（４６，５００ｍｓｅｃ）との時間差は、６，５００ｍｓｅｃであり、タイミングＢＴ（４０，０００ｍｓｅｃ）と、ノートＮ_qの発声終了タイミングＴＥ_q（４９，０００ｍｓｅｃ）との時間差は、９，０００ｍｓｅｃである。よって、特定部３００は、息継ぎの長さに関わらず、いずれも第２の所定条件を満たさないと判断する。 The time difference between the timing BT (40,000 msec) and the utterance end timing TE_o (42,000 msec) of the note N_o is 2,000 msec. Therefore, the specifying unit 300 determines that the second predetermined condition is not satisfied regardless of the length of the breath. Similarly, the time difference between the timing BT (40,000 msec) and the utterance end timing TE_p (46,500 msec) of the note N_p is 6,500 msec, and the time difference between the timing BT (40,000 msec) and the utterance end timing TE_q (49,000 msec) of the note N_q is 9,000 msec. Therefore, the specifying unit 300 determines that neither satisfies the second predetermined condition, regardless of the length of the breath.

一方、タイミングＢＴ（４０，０００ｍｓｅｃ）と、ノートＮ_rの発声終了タイミングＴＥ_r（５２，５００ｍｓｅｃ）との時間差は、１２，５００ｍｓｅｃである。よって、特定部３００は、息継ぎの長さが５００ｍｓｅｃ未満の場合、息継ぎの長さが５００ｍｓｅｃ以上７５０ｍｓｅｃ未満の場合、及び息継ぎの長さが７５０ｍｓｅｃ以上１，０００ｍｓｅｃ未満の場合、第２の所定条件を満たすと判断し、息継ぎの長さが１，０００ｍｓｅｃ以上の場合、第２の条件を満たさないと判断する。 On the other hand, the time difference between the timing BT (40,000 msec) and the utterance end timing TE _r (52,500 msec) of note N _r is 12,500 msec. Therefore, the specifying unit 300 sets the second predetermined condition when the length of breath is less than 500 msec, when the length of breath is 500 msec or more and less than 750 msec, and when the length of breath is 750 msec or more and less than 1,000 msec. It is determined that the condition is satisfied, and if the breath length is 1,000 msec or more, it is determined that the second condition is not satisfied.

また、タイミングＢＴ（４０，０００ｍｓｅｃ）と、ノートＮ_sの発声終了タイミングＴＥ_s（５５，５００ｍｓｅｃ）との時間差は、１５，５００ｍｓｅｃである。よって、特定部３００は、息継ぎの長さが５００ｍｓｅｃ未満の場合、第２の所定条件を満たさないと判断し、息継ぎの長さが５００ｍｓｅｃ以上７５０ｍｓｅｃ未満の場合、息継ぎの長さが７５０ｍｓｅｃ以上１，０００ｍｓｅｃ未満の場合、及び息継ぎの長さが１，０００ｍｓｅｃ以上の場合、第２の条件を満たすと判断する。 The time difference between the timing BT (40,000 msec) and the utterance end timing TE_s (55,500 msec) of the note N_s is 15,500 msec. Therefore, the specifying unit 300 determines that the second predetermined condition is not satisfied when the breath length is less than 500 msec, and that the second condition is satisfied when the breath length is 500 msec or more and less than 750 msec, 750 msec or more and less than 1,000 msec, or 1,000 msec or more.

更に、タイミングＢＴ（４０，０００ｍｓｅｃ）と、ノートＮ_tの発声終了タイミングＴＥ_t（６１，５００ｍｓｅｃ）との時間差は、２１，５００ｍｓｅｃである。よって、特定部３００は、息継ぎの長さに関わらず、第２の所定条件を満たさないと判断する。これ以降の発声終了タイミングは、息継ぎの長さに関わらず、第２の所定条件を満たさないことが明らかであるため、特定部３００は処理を終了する。 Furthermore, the time difference between the timing BT (40,000 msec) and the utterance end timing TE _t (61,500 msec) of the note N _t is 21,500 msec. Therefore, the identifying unit 300 determines that the second predetermined condition is not satisfied regardless of the length of the breath. Since it is clear that the second predetermined condition is not satisfied at the utterance end timing after this regardless of the length of the breath, the specifying unit 300 ends the processing.
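The breath-length-dependent judgment walked through above can be sketched as a table lookup followed by a range check. The names `BREATH_WINDOWS`, `window_for`, and `next_breath_timings_by_length` are hypothetical; the thresholds, the ranges (lower bound inclusive, upper bound exclusive), and the timing values come from this modification.

```python
# (exclusive upper bound of breath length in msec, (min gap, max gap) in msec)
BREATH_WINDOWS = [
    (500, (10_000, 15_000)),
    (750, (11_000, 16_000)),
    (1_000, (12_000, 17_000)),
    (float("inf"), (13_000, 18_000)),
]


def window_for(breath_length_ms):
    """Pick the second predetermined condition for a given breath length."""
    for upper_ms, window in BREATH_WINDOWS:
        if breath_length_ms < upper_ms:
            return window


def next_breath_timings_by_length(breath_ms, breath_length_ms, candidates_ms):
    """Return note-end timings whose gap from the breath satisfies
    min_gap <= gap < max_gap for the window matching the breath length."""
    min_gap, max_gap = window_for(breath_length_ms)
    return [t for t in sorted(candidates_ms)
            if min_gap <= t - breath_ms < max_gap]


# TE_o .. TE_t from the example, breath at BT = 40,000 msec.
candidates = [42_000, 46_500, 49_000, 52_500, 55_500, 61_500]
for length_ms in (400, 600, 800, 1_200):
    print(length_ms, next_breath_timings_by_length(40_000, length_ms, candidates))
# 400 [52500]
# 600 [52500, 55500]
# 800 [52500, 55500]
# 1200 [55500]
```

The printed results agree with the judgments stated for TE_r and TE_s, and TE_t (61,500 msec) is rejected for every breath length.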

このように、本変形例に係る検出部２００は、歌唱者が行った息継ぎの長さを含む息継ぎ情報を検出することができる。次の息継ぎを行うことが可能なタイミングを特定する際に、歌唱者が行った息継ぎの長さを考慮することにより、歌唱者に対してより適当な息継ぎのタイミングを提示することができる。 In this way, the detection unit 200 according to this modification can detect breath information including the length of the breath performed by the singer. By considering the length of the breath taken by the singer when specifying the timing at which the next breath can be taken, it is possible to present the singer with a more suitable timing for taking a breath.

＜その他＞ <Others>
上記実施形態は、例として提示したものであり、発明の範囲を限定するものではない。上記の構成は、適宜組み合わせて実施することが可能であり、発明の要旨を逸脱しない範囲で、種々の省略、置き換え、変更を行うことができる。上記実施形態やその変形は、発明の範囲や要旨に含まれると同様に、特許請求の範囲に記載された発明とその均等の範囲に含まれる。 The above embodiments are presented as examples and are not intended to limit the scope of the invention. The above configurations can be implemented in combination as appropriate, and various omissions, replacements, and modifications can be made without departing from the gist of the invention. The above-described embodiments and modifications thereof are included in the scope and gist of the invention, and likewise in the invention described in the claims and its equivalents.

１００ 抽出部 100 extraction unit
２００ 検出部 200 detection unit
３００ 特定部 300 specifying unit
４００ 提示部 400 presentation unit
Ｋ カラオケ装置 K karaoke device

Claims

1. A karaoke device comprising:
an extraction unit that extracts, based on reference data of a song, timings at which a breath can be taken in the song;
a detection unit that detects breath information including the timing of a breath taken by a singer, based on an audio signal corresponding to the singing voice of the singer performing karaoke of the song and/or a video signal obtained by photographing the singer;
a specifying unit that specifies a timing at which the singer can take a next breath, based on the extracted timings at which a breath can be taken in the song and the detected breath information; and
a presentation unit that presents the specified timing at which the next breath can be taken to the singer.

2. The karaoke apparatus according to claim 1, wherein the presentation unit displays the identified timing at which the next breath can be taken in correspondence with the display of the lyrics of the music piece.

3. The karaoke apparatus according to claim 1, wherein the extraction unit extracts a timing at which the user can take a breath in the music based on lyric data of the music.

4. The karaoke apparatus according to any one of claims 1 to 3, wherein the detection unit detects the breath information including the length of the breath performed by the singer.