JP6689705B2

JP6689705B2 - Karaoke song support device, Karaoke song support program

Info

Publication number: JP6689705B2
Application number: JP2016150933A
Authority: JP
Inventors: 永田　明峰; 明峰永田; 直孝野村
Original assignee: Daiichikosho Co Ltd
Current assignee: Daiichikosho Co Ltd
Priority date: 2016-08-01
Filing date: 2016-08-01
Publication date: 2020-04-28
Anticipated expiration: 2036-08-01
Also published as: JP2018021941A

Description

The present invention relates to a karaoke singing support device and a karaoke singing support program.

The display unit of a karaoke device shows the lyrics of a karaoke song superimposed on a background video, in time with the performance of the song. Because users can sing while reading the displayed lyrics, they do not need to memorize the lyrics and can sing karaoke easily.

However, a visually impaired user, or a user seated where the display unit is hard to see, has difficulty reading the lyrics shown on the display unit.

Patent Document 1 discloses a karaoke system with a singing assist function that reads out lyrics through bone conduction. Patent Documents 2 and 3 disclose techniques for reading out lyrics using speech synthesis.

Patent Document 1: JP 2005-242057 A
Patent Document 2: JP H10-161683 A
Patent Document 3: JP H11-133989 A

However, the karaoke system of Patent Document 1 requires a special device for bone conduction. Because it relies on bone conduction, it also restricts the singer's posture, which is cumbersome. The techniques of Patent Documents 2 and 3 require a program developed specifically for the karaoke device, so they cannot easily be used with karaoke devices already installed in karaoke venues and the like. In short, the conventional techniques make it difficult to sing karaoke easily.

An object of the present invention is to provide a karaoke singing support device and a karaoke singing support program that make it easy to sing karaoke even when the displayed lyrics cannot be read.

The main invention for achieving the above object is a karaoke singing support device comprising: a photographing unit that photographs an image displayed on a display unit of a karaoke device during a karaoke performance; a character recognition unit that recognizes lyric character strings included in the photographed image; a determination unit that compares the character strings included in a first image with the character strings included in a second image photographed later than the first image, and determines whether the character strings of the second image include a new character string absent from the first image; a voice data conversion unit that, when such a new character string is included, converts the new character string into voice data; and a voice output processing unit that outputs the voice data corresponding to the new character string.
Other features of the present invention will become apparent from the description and drawings below.

According to the present invention, karaoke can be sung easily even when the displayed lyrics cannot be read.

FIG. 1 is a diagram showing an example hardware configuration of the karaoke singing support device according to the first embodiment.
FIG. 2 is a diagram showing an example software configuration of the karaoke singing support device according to the first embodiment.
FIG. 3 is a diagram showing an example of data stored in the character storage unit according to the first embodiment.
FIG. 4A is a diagram showing character strings included in an image captured by the photographing unit according to the first embodiment.
FIG. 4B is a diagram showing character strings included in an image captured by the photographing unit according to the first embodiment.
FIG. 5 is a flowchart showing an example operation of the karaoke singing support device according to the first embodiment.
FIG. 6 is a diagram showing an example software configuration of the karaoke singing support device according to the second embodiment.
FIG. 7 is a diagram showing an example of data stored in the voice data storage unit according to the second embodiment.
FIG. 8A is a diagram showing character strings included in an image captured by the photographing unit according to the second embodiment.
FIG. 8B is a diagram showing character strings included in an image captured by the photographing unit according to the second embodiment.
FIG. 8C is a diagram showing character strings included in an image captured by the photographing unit according to the second embodiment.
FIG. 9A is a diagram showing character strings included in an image captured by the photographing unit according to the second embodiment.
FIG. 9B is a diagram showing character strings included in an image captured by the photographing unit according to the second embodiment.
FIG. 9C is a diagram showing character strings included in an image captured by the photographing unit according to the second embodiment.

In addition to the main invention described above, at least the following matters will become apparent from the description and drawings below.

That is, also apparent is a karaoke singing support device in which the determination unit determines whether the character strings included in a third image, captured later than the second image, contain a common character string included in both the first image and the second image, and the voice output processing unit outputs the voice data corresponding to the new character string when it is determined that the common character string is absent from the character strings of the third image. Such a karaoke singing support device can output the voice data near the timing at which singing of the new character string begins.

Also apparent is a karaoke singing support device in which the determination unit determines whether the character strings included in a third image, captured later than the second image, contain a character string that is common to both the first image and the second image and whose color has been changed by at least a predetermined proportion, and the voice output processing unit outputs the voice data corresponding to the new character string when such a color-changed character string is determined to be present in the third image. Such a karaoke singing support device can likewise output the voice data near the timing at which singing of the new character string begins.
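The color-change criterion can be sketched as a small decision function. This is a minimal sketch under stated assumptions, not the patented implementation: it assumes the ratio of already recolored characters for each string in the third image has already been measured (how that would be estimated from pixels is not described here), and the function name and the 0.8 default for the "predetermined proportion" are hypothetical.

```python
from typing import Mapping, Sequence


def release_by_color_wipe(
    first: Sequence[str],
    second: Sequence[str],
    third_wipe_ratio: Mapping[str, float],
    threshold: float = 0.8,  # hypothetical "predetermined proportion"
) -> bool:
    """Return True when the new string's voice data should be output.

    third_wipe_ratio maps each string recognized in the third image to the
    assumed fraction (0.0 to 1.0) of its characters already color-changed.
    """
    # Strings shown in both the first and second images.
    common = [s for s in first if s in second]
    # Release once any common string in the third image is recolored past the threshold.
    return any(third_wipe_ratio.get(s, 0.0) >= threshold for s in common)
```

With the lyrics of FIGS. 8A and 8B, C2 is the common string, so the voice data for C3 would be released once C2 is at least 80% recolored under this assumed threshold.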

Also apparent is a karaoke singing support program that causes a computer having a photographing unit to: cause the photographing unit to photograph an image displayed on the display unit of a karaoke device during a karaoke performance; recognize the lyric character strings included in the photographed image; compare the character strings included in a first image with those included in a second image photographed later than the first image, and determine whether the character strings of the second image include a new character string absent from the first image; when such a new character string is included, convert the new character string into voice data; and output the voice data corresponding to the new character string via a voice output unit. Such a karaoke singing support program makes it easy to sing karaoke even when the displayed lyrics cannot be read.

<First Embodiment>
The karaoke singing support device 1 according to the first embodiment will be described with reference to FIGS. 1 to 5.

== Karaoke singing support device ==
(Hardware configuration)
The karaoke singing support device 1 provides the user, by voice, with the lyrics to be referred to when singing karaoke. A mobile terminal owned by the user (a notebook personal computer, tablet terminal, smartphone, or the like) can be used as the karaoke singing support device 1.

As shown in FIG. 1, the karaoke singing support device 1 includes a photographing unit 10, a control unit 20, a storage unit 30, a voice output unit 40, an operation unit 50, a display unit 60, and a communication unit 70.

The photographing unit 10 captures the images displayed on the display unit of the karaoke device during a karaoke performance. The imaging element, imaging circuit, and similar components of the mobile terminal can be used as-is for the photographing unit 10. The image displayed on the karaoke device's display unit consists of lyrics superimposed on a background video, so it contains character strings corresponding to the lyrics. This image changes as the karaoke performance progresses, and its character strings change with it.

The control unit 20 includes a CPU 20a and a memory 20b. The CPU 20a implements various control functions by executing an operation program stored in the memory 20b. The memory 20b stores the programs executed by the CPU 20a and temporarily holds various information while a program runs. The storage unit 30 is a large-capacity storage device that stores various data.

The voice output unit 40 consists of a speaker that outputs sound based on voice data to the outside of the terminal, an amplifier that amplifies the voice data, and the like. Because "voice" and "voice data" correspond one-to-one, the following description may treat them as equivalent.

When using the karaoke singing support device 1 of this embodiment, the user sings karaoke while listening to the voice output from the voice output unit 40. Because the karaoke accompaniment and other sounds are playing where karaoke is sung, the voice may be hard to hear directly from the speaker. The voice from the voice output unit 40 is therefore preferably listened to through earphones or the like.

The operation unit 50 allows the user to input instructions to the karaoke singing support device 1. On a mobile terminal, a touch-panel display unit 60 may also serve as the operation unit 50. The display unit 60 displays images captured by the photographing unit 10 and a GUI for operation. The communication unit 70 provides an interface for communicating with a server (not shown).

(Software configuration)
The user downloads predetermined application software from a server (not shown) to the mobile terminal. Launching this application executes a predetermined program (the karaoke singing support program) and starts a shooting mode for photographing the display unit of the karaoke device. When the user inputs a shooting start instruction via the operation unit 50, the photographing unit 10 sequentially captures, according to the program, the images displayed on the display unit of the karaoke device. Each image (image data) captured by the photographing unit 10 is output to the character recognition unit 100 (described later). Images may be captured at predetermined intervals or continuously (as video).

FIG. 2 is a diagram showing an example software configuration of the karaoke singing support device 1. The karaoke singing support device 1 includes a character recognition unit 100, a character storage unit 200, a determination unit 300, a voice data conversion unit 400, and a voice output processing unit 500. The character storage unit 200 is configured as part of the storage area of the storage unit 30. The character recognition unit 100, determination unit 300, voice data conversion unit 400, and voice output processing unit 500 are implemented by the CPU 20a of the control unit 20 executing a program stored in the memory 20b. The karaoke singing support device 1 of this embodiment need only include at least the character recognition unit 100, determination unit 300, voice data conversion unit 400, and voice output processing unit 500.

[Character recognition unit]
The character recognition unit 100 recognizes the lyric character strings included in a captured image. Optical character recognition (OCR) techniques can be applied for this purpose. As a specific example, the character recognition unit 100 scans one image output from the photographing unit 10 and determines whether it contains a recognizable character string. Next, the character recognition unit 100 matches each character string judged recognizable against reference character data registered in advance in the storage unit 30 to identify the specific characters. The character recognition unit 100 then converts the recognized character strings into text data and outputs them to the character storage unit 200 in association with the image ID of the image that contains them.
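The step that turns OCR output into text data keyed by image ID can be sketched as follows. This is a minimal sketch that assumes an OCR engine (for example Tesseract) has already returned plain text for one captured image, and that each non-empty output line is one displayed lyric line; the function name `to_record` is hypothetical.

```python
from typing import List, Tuple


def to_record(image_id: int, ocr_text: str) -> Tuple[int, List[str]]:
    """Split raw OCR text for one captured image into lyric strings.

    Assumes each non-empty line of OCR output is one displayed lyric line.
    """
    strings = [line.strip() for line in ocr_text.splitlines()]
    # Drop blank lines produced by gaps between lyric lines on screen.
    return image_id, [s for s in strings if s]
```

For example, `to_record(1, "C1\n\n  C2 \n")` yields `(1, ["C1", "C2"])`, which is the shape of one row of the table in FIG. 3.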

[Character storage unit]
The character storage unit 200 stores the character strings recognized by the character recognition unit 100 in time series. FIG. 3 shows an example of the data stored in the character storage unit 200. Each character string (text data) is stored in time series in a table that associates it with its image ID.
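A time-series table of this kind can be modeled as an append-only list of rows. The sketch below is illustrative only: the class and method names are hypothetical, and it keeps everything in memory rather than in the storage unit 30.

```python
from typing import List, Optional, Tuple


class CharacterStore:
    """Hypothetical in-memory model of the (image ID, strings) table in FIG. 3."""

    def __init__(self) -> None:
        self._rows: List[Tuple[int, List[str]]] = []

    def append(self, image_id: int, strings: List[str]) -> None:
        # Rows are appended in capture order, so the list is a time series.
        self._rows.append((image_id, list(strings)))

    def latest(self) -> Optional[List[str]]:
        # Strings of the most recently captured image, if any.
        return self._rows[-1][1] if self._rows else None

    def previous(self) -> Optional[List[str]]:
        # Strings of the image captured immediately before the latest one.
        return self._rows[-2][1] if len(self._rows) >= 2 else None
```

Exposing only `latest` and `previous` reflects that the comparison described next needs just the two most recent images.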

[Determination unit]
The determination unit 300 compares the character strings included in a first image with those included in a second image captured later than the first image, and determines whether the character strings of the second image include a new character string absent from the first image. The determination unit 300 performs this comparison whenever a new image is captured. For example, suppose the photographing unit 10 captures images I1, I2, and I3 in that order. When image I2 is captured, the determination unit 300 compares the character strings in the immediately preceding image I1 with those in image I2; when image I3 is captured, it compares the character strings in the immediately preceding image I2 with those in image I3. Whether a new character string is included can be determined by checking whether the text data of the character strings match.
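The comparison based on matching text data reduces to a set-membership test. This minimal sketch assumes exact string equality (OCR noise is ignored) and uses a hypothetical function name; the "no previous image" case follows step 13 of the operation example below.

```python
from typing import List, Optional, Sequence


def new_strings(previous: Optional[Sequence[str]], current: Sequence[str]) -> List[str]:
    """Return strings in the current image that are absent from the previous one.

    With no previous image (the first image of a song), every string is new.
    """
    if previous is None:
        return list(current)
    seen = set(previous)
    # Exact text match; keep the on-screen order of the new strings.
    return [s for s in current if s not in seen]
```

Applied to images I1 and I2 of FIGS. 4A and 4B, `new_strings(["C1", "C2"], ["C2", "C3"])` returns `["C3"]`.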

[Voice data conversion unit]
When the second image contains a new character string absent from the first image, the voice data conversion unit 400 converts the new character string into voice data. Any known technique can be used to convert character strings (text data) into voice data.

[Voice output processing unit]
The voice output processing unit 500 outputs the voice data corresponding to a new character string via the voice output unit 40. By listening through earphones or the like to the voice based on this voice data, the user can learn the lyrics to be sung next. In this embodiment, the voice output processing unit 500 outputs the voice data as soon as the voice data corresponding to the new character string is input to it.

(Operation of the karaoke singing support device)
An operation example of the karaoke singing support device 1 according to this embodiment will now be described with reference to FIGS. 4A to 5.

FIGS. 4A and 4B show the character strings included in images captured by the photographing unit 10. Image I1 in FIG. 4A displays lyrics consisting of character strings C1 and C2. Image I2 in FIG. 4B was captured later than image I1 and displays lyrics consisting of character strings C2 and C3. Image I1 in FIG. 4A is an example of the "first image", and image I2 in FIG. 4B is an example of the "second image". FIG. 5 is a flowchart showing an operation example of the karaoke singing support device 1. Image I1 is assumed to be the first image displayed after the karaoke performance of a song starts.

First, the photographing unit 10 captures image I1 displayed on the display unit of the karaoke device (capture image I1; step 10). The captured image I1 is output to the character recognition unit 100.

The character recognition unit 100 recognizes the lyric character strings C1 and C2 included in image I1 captured in step 10 (recognize C1 and C2; step 11). The character recognition unit 100 outputs the recognized character strings C1 and C2 to the character storage unit 200 in association with the identifier of image I1.

The character storage unit 200 stores the character strings C1 and C2 recognized in step 11 (store C1 and C2; step 12).

The determination unit 300 reads the character strings C1 and C2 of image I1 from the character storage unit 200. In this example, no image was captured before image I1, so the determination unit 300 treats C1 and C2 as new character strings (determine new character strings; step 13) and outputs them to the voice data conversion unit 400.

The voice data conversion unit 400 converts the new character strings C1 and C2 determined in step 13 into voice data (convert C1 and C2 into voice data; step 14) and outputs the converted voice data to the voice output processing unit 500.

The voice output processing unit 500 outputs the voice data corresponding to character strings C1 and C2 converted in step 14 (output the voice data of C1 and C2; step 15).

Next, the photographing unit 10 captures image I2 displayed on the display unit of the karaoke device (capture image I2; step 16). The captured image I2 is output to the character recognition unit 100.

The character recognition unit 100 recognizes the lyric character strings C2 and C3 included in image I2 captured in step 16 (recognize C2 and C3; step 17). The character recognition unit 100 outputs the recognized character strings C2 and C3 to the character storage unit 200 in association with the identifier of image I2.

The character storage unit 200 stores the character strings C2 and C3 recognized in step 17 (store C2 and C3; step 18).

When the new image I2 has been captured, the determination unit 300 reads from the character storage unit 200 the character strings C2 and C3 of the new image I2 (stored in step 18) and the character strings C1 and C2 of the immediately preceding image I1 (stored in step 12). The determination unit 300 then determines whether the character strings of image I2 include a new character string absent from image I1 (determine new character string; step 19). In this example, image I2 contains the new character string C3, so the determination unit 300 outputs C3 to the voice data conversion unit 400. If no new character string is included, no further processing is performed. The determination unit 300 then performs the same processing on image I2 and the image captured after it.

The voice data conversion unit 400 converts the new character string C3 determined in step 19 into voice data (convert C3 into voice data; step 20) and outputs the converted voice data to the voice output processing unit 500.

The voice output processing unit 500 outputs the voice data corresponding to the new character string C3 converted in step 20 (output the voice data of C3; step 21). The karaoke singing support device 1 repeats the above processing until the karaoke performance of the song ends.
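Steps 10 to 21 can be condensed into one loop. This is a sketch under stated assumptions, not the device's actual program: OCR is assumed to have already produced the strings for each captured image, and `synthesize` and `play` are hypothetical placeholders for a text-to-speech engine and the voice output unit 40.

```python
from typing import Callable, Iterable, Optional, Sequence


def run_first_embodiment(
    frames: Iterable[Sequence[str]],
    synthesize: Callable[[str], bytes],
    play: Callable[[bytes], None],
) -> None:
    """Speak each newly appearing lyric string as soon as it is detected.

    frames yields the strings recognized in each captured image, in capture order.
    """
    previous: Optional[Sequence[str]] = None
    for current in frames:
        # With no previous image (step 13), every string counts as new.
        seen = set(previous) if previous is not None else set()
        for text in current:
            if text not in seen:  # steps 13 / 19: new string found
                play(synthesize(text))  # steps 14-15 / 20-21: convert, then output at once
        previous = current
```

Feeding it the strings of images I1 and I2 with a fake synthesizer, for example `lambda t: t.encode()`, plays C1, C2, and then C3, which mirrors the flowchart of FIG. 5.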

As described above, the karaoke singing support device 1 can output lyrics as voice data. Moreover, because the user's own mobile terminal serves as the karaoke singing support device 1, existing karaoke devices can be used unchanged. Thus, the karaoke singing support device 1 of this embodiment makes it easy to sing karaoke even when the user cannot read the lyrics shown on the display unit of the karaoke device.

<Second Embodiment>
The karaoke singing support device 1 according to the second embodiment will be described with reference to FIGS. 6 to 9C. Detailed description of configurations identical to those of the first embodiment is omitted.

In the first embodiment, the voice output processing unit 500 outputs the voice data corresponding to a new character string as soon as it is input from the voice data conversion unit 400. This output may therefore overlap with the user's singing. In the first embodiment's example, if the voice for character string C3 is output while the user is still singing C2, the user's singing voice and the voice for C3 are heard on top of each other. The user then has to check the voice while singing and cannot concentrate on either singing or listening. Furthermore, because the output voice data corresponds to the character string to be sung next, it is preferably output at the timing when singing of that character string begins. This embodiment describes an example of adjusting the timing at which voice data is output.

FIG. 6 is a diagram showing an example software configuration of the karaoke singing support device 1. The karaoke singing support device 1 includes a character recognition unit 100, a character storage unit 200, a determination unit 300, a voice data conversion unit 400, a voice output processing unit 500, and a voice data storage unit 600. The character storage unit 200 and the voice data storage unit 600 are configured as part of the storage area of the storage unit 30. The character recognition unit 100, determination unit 300, voice data conversion unit 400, and voice output processing unit 500 are implemented by the CPU 20a of the control unit 20 executing a program stored in the memory 20b. The karaoke singing support device 1 of this embodiment need only include at least the character recognition unit 100, determination unit 300, voice data conversion unit 400, and voice output processing unit 500.

[Voice data storage unit]
The voice data storage unit 600 stores the voice data converted by the voice data conversion unit 400. In this embodiment, the voice data conversion unit 400 outputs the converted voice data to the voice data storage unit 600, which temporarily stores each piece of voice data in association with the corresponding character string. FIG. 7 shows an example of the data stored in the voice data storage unit 600. Each piece of voice data (M1, M2, ...) is stored in time series in a table that associates it with its character string.
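The temporary holding of voice data until its output timing can be pictured as a small first-in, first-out buffer. This sketch is illustrative: the class name and methods are hypothetical, and the voice data is represented as opaque bytes.

```python
from typing import List, Tuple


class VoiceDataStore:
    """Hypothetical buffer holding (string, voice data) pairs as in FIG. 7."""

    def __init__(self) -> None:
        # A list keeps entries in time series (M1, M2, ...) and allows repeated lyrics.
        self._pending: List[Tuple[str, bytes]] = []

    def put(self, text: str, voice: bytes) -> None:
        self._pending.append((text, voice))

    def pop_all(self) -> List[Tuple[str, bytes]]:
        # Hand every pending entry to the output side and clear the buffer.
        items = self._pending
        self._pending = []
        return items
```

A list is used instead of a dictionary keyed by text so that a lyric line repeated in a chorus is not overwritten before it is output.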

[Determination unit]
The determination unit 300 according to the present embodiment has a function of adjusting the timing at which the voice data stored in the voice data storage unit 600 is output. Two specific examples are described below.

(Specific example 1)
As a first specific example, the determination unit 300 determines whether a common character string, contained in both the first image and the second image, is present among the character strings contained in a third image captured later than the second image. Specifically, the determination unit 300 reads out and compares the character strings contained in the first image and those contained in the second image to identify the character strings contained in both. When a common character string is identified, the determination unit 300 compares it with the character strings of the third image and determines whether the common character string is present in the third image.

When it is determined that the common character string is not present among the character strings contained in the third image, the voice output processing unit 500 outputs the voice data corresponding to the new character string.

FIGS. 8A to 8C show the character strings contained in images captured by the image capturing unit 10. Image I1 in FIG. 8A displays lyrics consisting of character strings C1 and C2. Image I2 in FIG. 8B was captured later than image I1 and displays lyrics consisting of character strings C2 and C3. Image I3 in FIG. 8C was captured later than image I2 and displays lyrics consisting of character string C3. Image I1 in FIG. 8A is an example of the “first image,” image I2 in FIG. 8B is an example of the “second image,” and image I3 in FIG. 8C is an example of the “third image.” Image I1 is assumed to be the first image displayed after the karaoke performance of a given song starts.

Through the processing described in the first embodiment, character strings C1 to C3 are each converted into voice data. In the present embodiment, this voice data is temporarily stored in the voice data storage unit 600.
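The first-embodiment comparison, as recited in claim 1, can be sketched as a set difference between the strings recognized in consecutive images. This minimal sketch assumes that character recognition has already reduced each image to a list of lyric strings. The function names are hypothetical, and `synthesize` is a placeholder standing in for whatever text-to-speech conversion the voice data conversion unit 400 actually performs.

```python
from typing import List, Sequence, Tuple


def new_strings(previous: Sequence[str], current: Sequence[str]) -> List[str]:
    """Strings in `current` that were not in `previous`, in display order."""
    seen = set(previous)
    return [text for text in current if text not in seen]


def synthesize(text: str) -> bytes:
    # Hypothetical placeholder for voice data conversion unit 400;
    # a real device would call a text-to-speech engine here.
    return ("<voice:" + text + ">").encode("utf-8")


def convert_frames(frames: Sequence[Sequence[str]]) -> List[Tuple[str, bytes]]:
    """Convert each newly appearing lyric string to voice data once."""
    converted: List[Tuple[str, bytes]] = []
    previous: Sequence[str] = []  # nothing is on screen before the first image
    for current in frames:
        for text in new_strings(previous, current):
            converted.append((text, synthesize(text)))
        previous = current
    return converted
```

With frames modeled on FIGS. 8A to 8C (`[["C1", "C2"], ["C2", "C3"], ["C3"]]`), C1 and C2 are converted when I1 is seen, C3 when I2 is seen, and nothing when I3 is seen.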

When comparing image I2 with image I3, the determination unit 300 according to the present embodiment determines whether a common character string contained in both image I1 and image I2 is present among the character strings contained in image I3. In this example, the common character string C2, contained in both image I1 and image I2, is not contained in image I3 (see FIG. 8C). In this case, the determination unit 300 outputs to the voice output processing unit 500 a determination result indicating that character string C2 is not contained in image I3.

Here, if character string C2, which was contained in images I1 and I2 captured earlier, is not contained in image I3 captured later, the singing (karaoke performance) corresponding to character string C2 can be considered to have ended by the time image I3 was captured. Meanwhile, character string C3, contained in images I2 and I3, is the one to be sung after character string C2. Therefore, by outputting the voice data corresponding to character string C3 when an image that does not contain character string C2 is captured, the user can hear the voice at the time of singing character string C3.

Accordingly, based on the determination result from the determination unit 300, the voice output processing unit 500 reads out from the voice data storage unit 600 the voice data based on the new character string C3, which was obtained by comparing images I1 and I2, and outputs it.

In this way, when an image that no longer contains the common character string is captured, the voice data of the new character string that was displayed together with the common character string is output, so that the voice data can be output close to the time at which singing of that character string begins.
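Specific example 1 can be sketched as follows, again assuming each image has already been reduced to a list of recognized lyric strings. The function name is hypothetical. The figures show only one common string; for the case of several common strings, this sketch makes the assumption that voicing is triggered only once all of them have left the screen.

```python
from typing import List, Sequence


def voice_on_disappearance(
    first: Sequence[str], second: Sequence[str], third: Sequence[str]
) -> List[str]:
    """Sketch of specific example 1.

    Returns the strings of `second` that are new relative to `first`
    once the strings common to `first` and `second` are no longer
    present in `third`; otherwise returns an empty list.
    """
    first_set = set(first)
    third_set = set(third)
    common = [text for text in second if text in first_set]
    new = [text for text in second if text not in first_set]
    if not common:
        return []  # no common string to track, so no trigger in this sketch
    # Assumption: with several common strings, all must have left the screen.
    if any(text in third_set for text in common):
        return []
    return new
```

For the images of FIGS. 8A to 8C, `voice_on_disappearance(["C1", "C2"], ["C2", "C3"], ["C3"])` yields `["C3"]`, which is the string whose stored voice data would be played.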

(Specific example 2)
As a second specific example, the determination unit 300 determines whether, among the character strings contained in a third image captured later than the second image, there is a common character string that is contained in both the first image and the second image and that has been color-changed by a predetermined ratio or more.

The karaoke device has a function of changing the color of the character strings displayed on the display unit in step with the karaoke performance (for example, displaying a character string in white outline before the corresponding part is performed and in blue after it has been performed). “Color change” refers to this change in the color of a character string. The predetermined ratio is a value that determines when to output the voice data corresponding to the next character string to be sung, and it can be set to any value. For example, for consecutive character strings A and B, singing of character string B begins once character string A has been entirely color-changed, so the predetermined ratio may be set to 100% (the preceding character string fully color-changed). However, when the interval between character strings A and B is short, such as when A and B are sung back to back, outputting the voice data only after a 100% color change may be too late. In that case, the predetermined ratio may be set to, for example, 80% (at least 80% of the preceding character string color-changed).
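The threshold test on the predetermined ratio reduces to a simple comparison. This sketch assumes that the color-changed portion of a string can be measured as a count of color-changed characters. How the device actually detects the color change (for example, from pixel colors) is not specified in this section. The function name and the 80% default, which mirrors the example above, are hypothetical.

```python
def color_change_reached(
    changed_chars: int, total_chars: int, threshold_percent: int = 80
) -> bool:
    """True if at least `threshold_percent` % of a string has been color-changed."""
    if total_chars <= 0:
        raise ValueError("total_chars must be positive")
    if not 0 <= changed_chars <= total_chars:
        raise ValueError("changed_chars must be between 0 and total_chars")
    # Integer arithmetic avoids floating-point edge cases at the boundary.
    return changed_chars * 100 >= threshold_percent * total_chars
```

For a 10-character string with 8 characters color-changed, the check passes with the 80% setting but not with the 100% setting. This difference is the earlier output timing described for short intervals between strings A and B.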

When it is determined that such a color-changed character string is present among the character strings contained in the third image, the voice output processing unit 500 outputs the voice data corresponding to the new character string.

FIGS. 9A to 9C show the character strings contained in images captured by the image capturing unit 10. Image I1 in FIG. 9A displays lyrics consisting of character strings C1 and C2. Image I2 in FIG. 9B was captured later than image I1 and displays lyrics consisting of character strings C2 and C3. Image I3 in FIG. 9C was captured later than image I2 and displays lyrics consisting of character strings C2 and C3. The hatching in FIG. 9C indicates the color-changed character string. Image I1 in FIG. 9A is an example of the “first image,” image I2 in FIG. 9B is an example of the “second image,” and image I3 in FIG. 9C is an example of the “third image.” Image I1 is assumed to be the first image displayed after the karaoke performance of a given song starts.

When comparing image I2 with image I3, the determination unit 300 according to the present embodiment determines whether, among the character strings contained in image I3, there is a common character string that is contained in both image I1 and image I2 and that has been color-changed by the predetermined ratio or more. In this example, the common character string C2, contained in both image I1 and image I2, has been 100% color-changed in image I3 (see FIG. 9C). In this case, the determination unit 300 outputs to the voice output processing unit 500 a determination result indicating that character string C2 has been color-changed by the predetermined ratio or more.

Here, if character string C2, which was contained in images I1 and I2 captured earlier, has been color-changed by the predetermined ratio or more in image I3 captured later, the singing (karaoke performance) corresponding to character string C2 can be considered to have ended by the time image I3 was captured. Meanwhile, character string C3, contained in images I2 and I3, is the one to be sung after character string C2. Therefore, by outputting the voice data corresponding to character string C3 when an image is captured in which character string C2 has been color-changed by the predetermined ratio or more, the user can hear the voice at the time of singing character string C3.

In this way, when the common character string has been color-changed by the predetermined ratio or more, the voice data of the new character string displayed together with the common character string is output, so that the voice data can be output close to the time at which singing of that character string begins.
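Specific example 2 can be sketched in the same style as specific example 1. Here the third image is modeled as a hypothetical mapping from each recognized string to the percentage of it that has been color-changed, for instance as computed by a ratio check like the one sketched earlier. The function name and input shape are assumptions. With several common strings, this sketch triggers as soon as any one of them reaches the predetermined ratio.

```python
from typing import List, Mapping, Sequence


def voice_on_color_change(
    first: Sequence[str],
    second: Sequence[str],
    third_changed_percent: Mapping[str, int],
    threshold_percent: int = 80,
) -> List[str]:
    """Sketch of specific example 2.

    Returns the strings of `second` that are new relative to `first`
    once a string common to `first` and `second` is present in the
    third image and color-changed by at least `threshold_percent` %.
    """
    first_set = set(first)
    common = [text for text in second if text in first_set]
    new = [text for text in second if text not in first_set]
    for text in common:
        percent = third_changed_percent.get(text)
        # A common string missing from the third image has no entry,
        # so it does not trigger here (that is specific example 1's case).
        if percent is not None and percent >= threshold_percent:
            return new
    return []
```

For FIGS. 9A to 9C, where C2 is 100% color-changed in I3, `voice_on_color_change(["C1", "C2"], ["C2", "C3"], {"C2": 100, "C3": 0})` yields `["C3"]`.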

The program of the above embodiment can also be supplied to a computer using a non-transitory computer readable medium with an executable program thereon. Examples of non-transitory computer readable media include magnetic recording media (for example, flexible disks, magnetic tapes, and hard disk drives) and CD-ROMs (Read Only Memory).

The above embodiments are presented as examples and do not limit the scope of the invention. The above configurations can be implemented in appropriate combinations, and various omissions, replacements, and changes can be made without departing from the gist of the invention. The above embodiments and their modifications are included in the scope and gist of the invention, and likewise in the invention described in the claims and its equivalents.

1 Karaoke singing support device
10 Image capturing unit
100 Character recognition unit
300 Determination unit
400 Voice data conversion unit
500 Voice output processing unit

Claims

A karaoke singing support device comprising:
an image capturing unit that captures images displayed on the display unit of a karaoke device in accordance with a karaoke performance;
a character recognition unit that recognizes the character strings of lyrics contained in the captured images;
a determination unit that compares the character strings contained in a first image with the character strings contained in a second image captured later than the first image, and determines whether the second image contains a new character string that is not contained in the first image;
a voice data conversion unit that, when the new character string is contained, converts the new character string into voice data; and
a voice output processing unit that outputs the voice data corresponding to the new character string.

The karaoke singing support device according to claim 1, wherein the determination unit determines whether a common character string contained in both the first image and the second image is present among the character strings contained in a third image captured later than the second image, and
the voice output processing unit outputs the voice data corresponding to the new character string when it is determined that the common character string is not present among the character strings contained in the third image.

The karaoke singing support device according to claim 1, wherein the determination unit determines whether, among the character strings contained in a third image captured later than the second image, there is a common character string that is contained in both the first image and the second image and that has been color-changed by a predetermined ratio or more, and
the voice output processing unit outputs the voice data corresponding to the new character string when it is determined that the color-changed character string is present among the character strings contained in the third image.

A karaoke singing support program that causes a computer equipped with an image capturing unit and a voice output unit to execute processing of:
capturing, with the image capturing unit, images displayed on the display unit of a karaoke device in accordance with a karaoke performance;
recognizing the character strings of lyrics contained in the captured images;
comparing the character strings contained in a first image with the character strings contained in a second image captured later than the first image, and determining whether the second image contains a new character string that is not contained in the first image;
converting the new character string into voice data when the new character string is contained; and
outputting the voice data corresponding to the new character string via the voice output unit.