JP6508567B2

JP6508567B2 - Karaoke apparatus, program for karaoke apparatus, and karaoke system

Info

Publication number: JP6508567B2
Application number: JP2015036340A
Authority: JP
Inventors: 勝巳戸田
Original assignee: Brother Industries Ltd
Current assignee: Brother Industries Ltd
Priority date: 2015-02-26
Filing date: 2015-02-26
Publication date: 2019-05-08
Anticipated expiration: 2035-02-26
Also published as: JP2016157373A

Description

The present invention relates to a karaoke apparatus that can search for songs by voice.

With a conventional karaoke apparatus, a singer can search for a song to sing by song title, artist name, or the like, and then select the song he or she actually wants to sing from the list of search results. Some conventional karaoke apparatuses can also recommend songs to a singer based on information extracted from the singer's singing data. As one example, Patent Document 1 discloses a karaoke apparatus that, based on a singer's singing data, can present candidate karaoke songs matching the singer's voice quality, singing habits, and level of skill.

Patent Document 1: JP 2008-003483 A

In general, however, voice quality and singing habits depend on the user's physical condition at the time of singing and on a singing style that varies with the chosen song. Because a user's physical condition changes from moment to moment, the user's condition while the singing data was being created may differ from the user's condition at the later time when a song is actually selected. The voice quality extracted from the singing data may therefore differ from the user's current voice quality.

Furthermore, the genre of the song sung when the singing data was created may differ from the genre of the song later selected for singing. When the genres differ, the way the user sang when the singing data was created may differ from the way the user sings the selected song. A different singing style may in turn mean different voice quality and singing habits. As a result, the singing user may find the selected song difficult to sing, and other users listening may feel a mismatch between the selected song and the singer's voice.

The present invention has been made in view of the above circumstances, and its object is to provide a karaoke apparatus that can present to a user songs that match the user's current situation at the time of song selection.

To achieve this object, the karaoke apparatus according to claim 1 comprises: storage means for storing instruction information representing specific instructions, including an instruction for having a user utter music-related information; instruction presenting means for presenting the specific instructions to the user in accordance with the instruction information stored in the storage means; acquisition means for acquiring voice information representing the voice uttered by the user in response to the specific instructions presented by the instruction presenting means; extraction means for extracting, from the voice information acquired by the acquisition means, the response content with which the user responded to the specific instructions, including the music-related information, and information indicating features of the user's current voice; search means for searching for a plurality of pieces of music information in accordance with the music-related information in the response content extracted by the extraction means; and determination means for determining, based on the response content other than the music-related information among the response content extracted by the extraction means and on the information indicating the features of the user's current voice, the priority order of the music information to be presented to the user among the plurality of pieces of music information found by the search means.

In the karaoke apparatus according to claim 2, the instruction information includes song utterance instruction information and at least one of thought utterance instruction information and usage-mode utterance instruction information. The song utterance instruction information represents an instruction for having the user utter the music-related information. The thought utterance instruction information represents an instruction for having the user utter thought information representing the user's thoughts and feelings about the song associated with the music-related information. The usage-mode utterance instruction information represents an instruction for having the user utter usage-mode information representing how the user is using the karaoke apparatus. The acquisition means acquires first voice information and at least one of second voice information and third voice information, each representing a voice uttered by the user: the first voice information represents a voice containing the music-related information, the second voice information represents a voice containing the thought information, and the third voice information represents a voice containing the usage-mode information. The extraction means extracts, from the first voice information and the at least one of the second voice information and the third voice information, the music-related information, at least one of the thought information and the usage-mode information, and the information indicating the features of the voice. The determination means determines, in accordance with the information indicating the features of the voice and the at least one of the thought information and the usage-mode information, the priority order of the music information to be presented to the user among the plurality of pieces of music information found by the search means.

In the karaoke apparatus according to claim 3, the thought utterance instruction information includes information representing a plurality of first response examples with which the user may respond, and the usage-mode utterance instruction information includes information representing a plurality of second response examples with which the user may respond. The acquisition means acquires the second voice information and the third voice information representing the user's voice when the user utters a first response example and a second response example, respectively. The extraction means executes at least one of a first comparison process, which for the first response example compares the second voice information with a first reference predetermined in relation to voice features, and a second comparison process, which for the second response example compares the third voice information with a second reference predetermined in relation to voice features. Through the executed comparison process, the extraction means extracts, as the features of the voice, at least one voice element among the loudness of the voice, its speed, and the plurality of frequencies it contains. The determination means determines the priority order of the plurality of pieces of music information to be presented to the user according to the degree of matching between the voice elements extracted by the extraction means and the voice elements associated with the music information.

In the karaoke apparatus according to claim 4, the acquisition means continues to acquire voice information representing the voice uttered by the user in response to the specific instructions until the extraction means has extracted from the voice information at least one piece of music-related information and information indicating at least one feature of the user's voice.

In the karaoke apparatus according to claim 5, when no music information is associated with a voice element that matches the voice element extracted by the extraction means, the priority order in which the plurality of pieces of music information are presented to the user is determined according to the degree of matching between the usage-mode information representing how the user is using the karaoke apparatus and the usage-mode information associated with the music information.

In the karaoke apparatus according to claim 6, the music information is stored in association with usage-mode information representing how users use the karaoke apparatus. The acquisition means acquires sound in the karaoke box in addition to the voice uttered by the user in response to the specific instructions, and the extraction means extracts the sound in the karaoke box. The more varied the sounds contained in the extracted karaoke-box sound, the higher the determination means ranks, for presentation to the user, music information associated with usage-mode information representing use of the karaoke apparatus by a plurality of users.

The program for a karaoke apparatus according to claim 7 causes a computer of a karaoke apparatus, which includes storage means for storing instruction information representing specific instructions including an instruction for having a user utter music-related information, to execute: an instruction presenting step of presenting the specific instructions to the user in accordance with the instruction information stored in the storage means; an acquisition step of acquiring voice information representing the voice uttered by the user in response to the specific instructions presented in the instruction presenting step; an extraction step of extracting, from the voice information acquired in the acquisition step, the response content with which the user responded to the specific instructions, including the music-related information, and information indicating features of the user's current voice; a search step of searching for a plurality of pieces of music information in accordance with the music-related information in the response content extracted in the extraction step; and a determination step of determining, based on the response content other than the music-related information among the response content extracted in the extraction step and on the information indicating the features of the user's current voice, the priority order of the music information to be presented to the user among the plurality of pieces of music information found in the search step.

The karaoke system according to claim 8 is a karaoke system in which a karaoke apparatus, a server device, and a portable terminal are provided so as to be able to communicate with one another over a network, and comprises: storage means for storing instruction information representing specific instructions, including an instruction for having a user utter music-related information; instruction presenting means for presenting the specific instructions to the user in accordance with the instruction information stored in the storage means; acquisition means for acquiring voice information representing the voice uttered by the user in response to the specific instructions presented by the instruction presenting means; extraction means for extracting, from the voice information acquired by the acquisition means, the response content with which the user responded to the specific instructions, including the music-related information, and information indicating features of the user's current voice; search means for searching for a plurality of pieces of music information in accordance with the music-related information in the response content extracted by the extraction means; and determination means for determining, based on the response content other than the music-related information among the response content extracted by the extraction means and on the information indicating the features of the user's current voice, the priority order of the music information to be presented to the user among the plurality of pieces of music information found by the search means.

According to the inventions of claims 1, 7, and 8, a plurality of pieces of music information are searched for in accordance with the music-related information in the extracted response content, and the priority order of the music information to be presented to the user among the retrieved pieces is determined based on the extracted response content and the information indicating the features of the voice. Songs that match the user's current situation at the time of song selection can thus be presented to the user.
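The two stages described above (search by music-related information, then prioritise by the remaining response content and the voice features) can be sketched as follows. This is a minimal illustration only: the `Song` class, its fields, the sample catalog, and the additive scoring rule are all assumptions made for the example and are not taken from the patent.

```python
from dataclasses import dataclass, field


@dataclass
class Song:
    # All fields are hypothetical stand-ins for the stored music information.
    title: str
    keywords: set[str]                                    # music-related information
    moods: set[str] = field(default_factory=set)          # thought information
    voice_traits: set[str] = field(default_factory=set)   # voice elements


def search(catalog: list[Song], keyword: str) -> list[Song]:
    """Search stage: keep songs whose keywords contain the spoken keyword."""
    return [s for s in catalog if keyword in s.keywords]


def prioritize(hits: list[Song], mood: str, voice_traits: set[str]) -> list[Song]:
    """Determination stage: order the hits by how well they fit the other
    response content (mood) and the current voice features.
    Ties keep their search order because sorted() is stable."""
    def score(song: Song) -> int:
        return (mood in song.moods) + len(voice_traits & song.voice_traits)
    return sorted(hits, key=score, reverse=True)


catalog = [
    Song("Song A", {"summer"}, {"calm"}, {"quiet"}),
    Song("Song B", {"summer"}, {"excited"}, {"loud", "fast"}),
    Song("Song C", {"winter"}, {"excited"}, {"loud"}),
]
ranked = prioritize(search(catalog, "summer"), mood="excited", voice_traits={"loud"})
print([s.title for s in ranked])
```

Keeping the search and the ranking as separate functions mirrors the separation between the search means and the determination means in the claims.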

The karaoke apparatus according to claim 2 determines the priority order of the music information to be presented to the user, among the plurality of pieces of music information found by the search means, in accordance with the information indicating the features of the voice and at least one of the thought information and the usage-mode information. Songs that match the user's current situation at the time of song selection even more closely can thus be presented to the user.

The karaoke apparatus according to claim 3 determines the priority order of the plurality of pieces of music information to be presented to the user according to the degree of matching between the voice elements extracted by the extraction means and the voice elements associated with the music information. Songs that match the user's current situation at the time of song selection can thus be presented to the user with high accuracy.
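One way to picture the comparison against a predetermined reference and the resulting degree of matching is the sketch below. The reference values, the element names, the "high"/"low" labels, and the fraction-based matching degree are all hypothetical; the patent states only that the references are predetermined and that the elements are loudness, speed, and frequency.

```python
# Hypothetical reference values; the patent only says they are predetermined.
REFERENCE = {"volume_db": 60.0, "syllables_per_sec": 5.0, "pitch_hz": 200.0}


def extract_voice_elements(measured: dict[str, float]) -> dict[str, str]:
    """Label each measured element as at/above ("high") or below ("low") its reference."""
    return {
        name: "high" if measured[name] >= ref else "low"
        for name, ref in REFERENCE.items()
    }


def match_degree(user: dict[str, str], song: dict[str, str]) -> float:
    """Fraction of voice elements on which the user and the song agree."""
    agree = sum(1 for name in user if song.get(name) == user[name])
    return agree / len(user)


user = extract_voice_elements(
    {"volume_db": 72.0, "syllables_per_sec": 4.0, "pitch_hz": 250.0}
)
songs = {
    "Song A": {"volume_db": "high", "syllables_per_sec": "low", "pitch_hz": "high"},
    "Song B": {"volume_db": "low", "syllables_per_sec": "low", "pitch_hz": "low"},
}
# Songs with a higher matching degree are presented first.
order = sorted(songs, key=lambda title: match_degree(user, songs[title]), reverse=True)
print(order)
```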

In the karaoke apparatus according to claim 4, the acquisition means continues to acquire voice information representing the voice uttered by the user in response to the specific instructions until the extraction means has extracted from the voice information at least one piece of music-related information and information indicating at least one feature of the user's voice. Songs that match the user's current situation at the time of song selection in at least one respect can thus be presented to the user.
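The "keep acquiring until both kinds of information are available" behaviour can be sketched as a simple loop. The dictionary shape of an extraction result and the `next_answer` callback are assumptions for illustration, and the early exit when input runs out is an addition of this sketch, not something the patent describes.

```python
from typing import Callable, Optional


def collect_answers(
    next_answer: Callable[[], Optional[dict]],
) -> tuple[list[str], list[str]]:
    """Acquire answers until at least one piece of music-related information
    and at least one voice feature have been extracted."""
    music_info: list[str] = []
    voice_features: list[str] = []
    while not (music_info and voice_features):
        answer = next_answer()
        if answer is None:  # assumption: stop when no more input is available
            break
        music_info.extend(answer.get("music_info", []))
        voice_features.extend(answer.get("voice_features", []))
    return music_info, voice_features


# Simulated extraction results for successive utterances.
answers = iter([
    {},                                  # nothing recognised
    {"voice_features": ["loud"]},        # voice feature only
    {"music_info": ["summer"]},          # keyword arrives last
    {"music_info": ["never reached"]},   # not consumed: loop has already ended
])
music_info, voice_features = collect_answers(lambda: next(answers, None))
print(music_info, voice_features)
```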

In the karaoke apparatus according to claim 5, when no music information is associated with a voice element that matches the extracted voice element, the priority order in which the plurality of pieces of music information are presented to the user is determined according to at least one of the degree of matching between the usage-mode information representing how the user is using the karaoke apparatus and the usage-mode information associated with the music information, and the degree of matching between the thought information representing the user's feelings and the thought information associated with the music information. Thus, even when no song matches the user's current voice elements at the time of song selection, songs that match in at least one of usage mode and thought can be presented to the user.
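The fallback described above might look like the following sketch. The dictionary layout of a song record and the overlap-count scoring are assumptions chosen for brevity; the patent does not specify how the degree of matching is computed.

```python
def rank_with_fallback(songs, voice, usage, mood):
    """Rank by voice-element overlap; if no song shares any voice element,
    fall back to usage-mode and thought (mood) agreement instead."""
    def voice_score(song):
        return len(voice & song["voice"])

    def fallback_score(song):
        return len(usage & song["usage"]) + len(mood & song["mood"])

    has_voice_match = any(voice_score(s) > 0 for s in songs)
    key = voice_score if has_voice_match else fallback_score
    return sorted(songs, key=key, reverse=True)


# Hypothetical song records.
songs = [
    {"title": "Song A", "voice": {"loud"}, "usage": {"alone"}, "mood": {"calm"}},
    {"title": "Song B", "voice": {"quiet"}, "usage": {"friends"}, "mood": {"excited"}},
]

# A voice element matches, so voice decides the order.
by_voice = rank_with_fallback(songs, voice={"loud"}, usage={"friends"}, mood={"excited"})
# No voice element matches any song, so usage mode and mood decide.
by_fallback = rank_with_fallback(songs, voice={"fast"}, usage={"friends"}, mood={"excited"})
print([s["title"] for s in by_voice], [s["title"] for s in by_fallback])
```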

The karaoke apparatus according to claim 6 extracts the sound in the karaoke box and, the more varied the sounds contained in the extracted karaoke-box sound, the higher it ranks, for presentation to the user, music information associated with usage-mode information representing use of the karaoke apparatus by a plurality of users. Thus, when several people are in the karaoke box at the time of song selection, songs that are often sung in karaoke boxes occupied by several people can be presented to the user.
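As a rough sketch of this idea, the ranking below boosts group-oriented songs in proportion to a diversity measure of the room sound. The measure itself (`distinct_sound_count`), the `base_score` field, and the additive boost are hypothetical; the patent does not say how the variety of sounds is quantified.

```python
def rank_by_room_sound(songs, distinct_sound_count):
    """Raise songs tagged for group use in proportion to how varied the room
    sound is. distinct_sound_count is a hypothetical diversity measure."""
    def score(song):
        boost = distinct_sound_count if song["usage"] == "group" else 0
        return song["base_score"] + boost
    return sorted(songs, key=score, reverse=True)


songs = [
    {"title": "Solo ballad", "usage": "alone", "base_score": 3},
    {"title": "Party anthem", "usage": "group", "base_score": 1},
]
quiet_room = [s["title"] for s in rank_by_room_sound(songs, 0)]
noisy_room = [s["title"] for s in rank_by_room_sound(songs, 4)]
print(quiet_room, noisy_room)
```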

FIG. 1 is a block diagram of the karaoke system 1 according to the first embodiment of the present invention.
FIG. 2 is a flowchart showing the search process performed by the control unit 100 of the karaoke apparatus 10 included in the karaoke system 1 according to the first embodiment.
FIG. 3 is a diagram showing an example of the screen display when the karaoke apparatus 10 displays instruction information.
FIG. 4 is a flowchart showing the search process performed by the control unit 100 of the karaoke apparatus 10 included in the karaoke system 1 according to the second embodiment of the present invention.
FIG. 5 is a diagram showing a modification of the screen display when the karaoke apparatus 10 displays instruction information.

Embodiments of the present invention are described below with reference to the drawings.
[First Embodiment]
[Configuration of karaoke system 1]
FIG. 1 is a block diagram of the karaoke system 1 according to the first embodiment of the present invention. The karaoke system 1 includes a karaoke apparatus 10, a terminal device 20, and a server device 30.

The karaoke apparatus 10 includes a control unit 100, a storage unit 110, an operation processing unit 120, an operation unit 121, a video reproduction unit 130, a video control unit 131, a sound control unit 140, and a communication unit 150.

The control unit 100 includes a ROM and a RAM and controls the various processes of the karaoke apparatus 10. The ROM stores the program executed by the control unit 100. The RAM is a temporary storage area that temporarily stores information related to control by the control unit 100.

The storage unit 110 has a storage area and stores various kinds of information: music information representing the accompaniment of songs played by the karaoke apparatus 10, image information representing images to be displayed, video information representing moving images to be played, instruction information, and so on. The music information is stored in association with music identification information, which includes an identification number for identifying the song and the song title. The music identification information is in turn stored in association with music-related information about the song, such as its keywords, genre, users' impressions of the song, characteristics of the song, and information indicating the features of the singer's voice. The instruction information is described later.
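The chain of associations in the storage unit 110 (music information, music identification information, music-related information) could be modelled as below. The class names, field names, and sample values are illustrative assumptions; the patent does not define a concrete storage layout.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SongId:
    """Music identification information: identification number and title."""
    number: int
    title: str


@dataclass
class RelatedInfo:
    """Music-related information associated with a song."""
    keywords: list[str]
    genre: str
    impressions: list[str]
    singer_voice: list[str]


# Both tables are keyed by the identification information.
accompaniment: dict[SongId, bytes] = {}
related: dict[SongId, RelatedInfo] = {}

song = SongId(1001, "Example Song")      # hypothetical entry
accompaniment[song] = b"..."             # placeholder for accompaniment data
related[song] = RelatedInfo(["summer"], "pop", ["uplifting"], ["high", "loud"])


def find_by_keyword(keyword: str) -> list[SongId]:
    """Return the identifiers of songs whose keywords include the given word."""
    return [sid for sid, info in related.items() if keyword in info.keywords]


print(find_by_keyword("summer"))
```

Keying both tables by the identification information lets a search return identifiers alone, without touching the much larger accompaniment data.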

The operation processing unit 120 receives operation information representing the user's operation of the operation unit 121, converts it into a control request that switches the operation of the control unit 100, and sends the control request to the control unit 100. The operation unit 121 includes switches or a touch panel that the user presses. The operation unit 121 sends to the operation processing unit 120 operation information representing, for example, the position of the switch the user pressed and the number of presses, or the position on the touch panel the user touched and the number of touches. The operation information sent from the operation unit 121 is converted into a control request by the operation processing unit 120 and sent to the control unit 100.
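The conversion from operation information to a control request can be pictured as a lookup. The position names, the request names, and the rule that a press count below one yields no request are all hypothetical and serve only to illustrate the data flow.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class OperationInfo:
    position: str  # which switch or touch-panel region was pressed
    count: int     # how many times it was pressed


# Hypothetical mapping from pressed position to control request.
CONTROL_REQUESTS = {
    "search_button": "START_SEARCH",
    "stop_button": "STOP_PLAYBACK",
}


def to_control_request(op: OperationInfo) -> Optional[str]:
    """Convert operation information into a control request, or None when
    the operation does not correspond to any request."""
    if op.count < 1:
        return None
    return CONTROL_REQUESTS.get(op.position)


print(to_control_request(OperationInfo("search_button", 1)))
```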

The video reproduction unit 130 decodes video information in accordance with a playback instruction sent from the control unit 100. The decoded video information is sent to the video control unit 131 or the memory 101 as playable moving-image information. Specifically, the playback instruction includes the video information, display start time information, and the destination of the decoded moving-image information. The video reproduction unit 130 decodes the video information in synchronization with the display start time information, and the moving image represented by the decoded video information is sent to the video control unit 131 or the memory 101. The video control unit 131 receives the images represented by image information such as tablature and lyric captions sent from the control unit 100, and the decoded moving-image information sent from the video reproduction unit 130 and the memory 101. The video control unit 131 combines the tablature, lyric captions, and the like with the moving image, converts the result into a moving image in a format that can be displayed on the monitor 132, and displays the converted moving image on the screen of the monitor 132.

The sound control unit 140 plays the accompaniment of a song in accordance with a playback instruction sent from the control unit 100. It combines the singing sound information sent from the singing microphone 141 with the music information to generate sound output information for the speaker 143, and sends that information to the speaker 143 to be output. The sound control unit 140 also performs frequency analysis on the voice information contained in the answer information sent from the answer microphone 142 and sends the analysis result to the control unit 100. The frequency analysis is described later.
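The patent describes its own frequency analysis later. Purely as a generic illustration of what frequency analysis of a voice signal can yield, the sketch below finds the strongest frequency component with a naive discrete Fourier transform. The function name, the sample rate, and the synthetic test tone are assumptions, and this is not presented as the analysis the sound control unit 140 actually performs.

```python
import cmath
import math


def dominant_frequency(samples: list[float], sample_rate: int) -> float:
    """Return the frequency in Hz of the strongest DFT bin below Nyquist."""
    n = len(samples)
    best_bin, best_mag = 0, 0.0
    for k in range(1, n // 2):  # skip the DC bin
        coeff = sum(
            x * cmath.exp(-2j * math.pi * k * i / n)
            for i, x in enumerate(samples)
        )
        if abs(coeff) > best_mag:
            best_bin, best_mag = k, abs(coeff)
    return best_bin * sample_rate / n


# Synthetic 440 Hz tone, 400 samples at 8 kHz (20 Hz per bin).
rate = 8000
tone = [math.sin(2 * math.pi * 440 * i / rate) for i in range(400)]
print(dominant_frequency(tone, rate))
```

A real implementation would normally use a fast Fourier transform; the direct sum is used here only to keep the example dependency-free.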

The communication unit 150 has a wireless communication function that enables communication with the terminal device 20 and the server device 30. The communication unit 150 receives control requests addressed to the control unit 100 from the terminal device 20 and passes them to the control unit 100. It also receives control requests addressed to the terminal device 20 from the control unit 100 and sends them to the terminal device 20. The communication unit 150 can transmit and receive various kinds of information such as image information and music information, and can receive music information, image information, video information, and other information from the storage unit of the server device 30.

The terminal device 20 is provided so as to be able to communicate with the karaoke apparatus 10 and the server device 30. As one example, the terminal device 20 is a dedicated karaoke remote control, or a terminal such as a user's mobile phone on which an application for controlling the karaoke apparatus 10 is installed. The terminal device 20 sends to the communication unit 150 a control request corresponding to the operation information received from the user.

The server device 30 is provided so as to be able to communicate with the karaoke apparatus 10 and the terminal device 20. The server device 30 includes a storage unit that stores music information, image information, video information, and instruction information. In response to a request sent from the karaoke apparatus 10 or the terminal device 20, the server device 30 sends the requested information, or extracts and sends part of that information. Extracting part of the information means, for example, extracting only the music identification information from music information that contains both the music identification information and information representing the accompaniment.

The instruction information includes a plurality of pieces each of song utterance instruction information, thought utterance instruction information, and usage-mode utterance instruction information. The song utterance instruction information represents an instruction for having the user input by voice the title, keywords, and the like of a song, in order to identify the song the user wants to sing. The thought utterance instruction information represents an instruction for having the user input by voice the user's impressions of the song he or she wants to sing, the user's current physical condition, and the like. The usage-mode utterance instruction information represents an instruction for having the user input by voice the relationship between the users in the karaoke box, the number of people, the time at which the karaoke box is being used, and the like. In the present embodiment, the instruction information is associated with image information that contains the instruction to be displayed on the monitor 132. The instruction information need only include at least one of the thought utterance instruction information and the usage-mode utterance instruction information; the search process described later is executed even if one of the two is absent.

［カラオケシステム１の検索処理］
カラオケシステム１の検索処理について、図２を参照して説明する。図２は本発明の第１実施形態における検索処理を表すフローチャートであり、フローチャートの各処理はカラオケ装置１０の制御部１００が行う処理である。フローチャートに記載された検索処理は、楽曲情報を検索する動作を制御部１００に実行させるための制御要求が操作処理部１２０または通信部１５０から制御部１００に送られることで開始される。 [Search process of karaoke system 1]
The search process of the karaoke system 1 will be described with reference to FIG. 2. FIG. 2 is a flowchart showing the search process in the first embodiment of the present invention, and each step of the flowchart is performed by the control unit 100 of the karaoke apparatus 10. The search process shown in the flowchart starts when a control request that causes the control unit 100 to search for music information is sent from the operation processing unit 120 or the communication unit 150 to the control unit 100.

ステップＳ１００では、音声録音処理が開始される。具体的には、回答用マイク１４２に入力される音声が取得され、取得された音声を表す回答情報が生成されて記憶部１１０に記憶される。また、回答用マイク１４２に入力された音声は、スピーカ１４３から発音されないように、回答用マイク１４２から入力された音声の出力がミュートされる。音声録音処理が開始されると、処理はステップＳ１０１に移される。 In step S100, a voice recording process is started. Specifically, the voice input to the response microphone 142 is acquired, and the response information representing the acquired voice is generated and stored in the storage unit 110. Further, the output of the voice input from the answer microphone 142 is muted so that the sound input to the answer microphone 142 is not pronounced from the speaker 143. When the voice recording process is started, the process proceeds to step S101.

ステップＳ１０１では、指示情報が表す画像がモニタ１３２に表示される。指示情報に含まれる楽曲発音指示情報、思考発音指示情報、および利用形態発音指示情報がそれぞれランダムで抽出される。抽出された指示情報に対応付けられている画像情報が記憶部１１０から読み出され、映像制御部１３１に送られる。映像制御部１３１に送られた画像情報が表す画像がモニタ１３２に表示される。具体的には、抽出された指示情報の１つが思考発音指示情報であって、どんな気分であるかをユーザに入力させる場合、モニタ１３２に表示される画像の一例は図３の（Ａ）に示される。また、抽出された指示情報の１つが利用形態発音指示情報であって、だれとカラオケボックスに来ているのかをユーザに入力させる場合、モニタ１３２に表示される画像の一例は図３の（Ｂ）に示される。ユーザは、質問文の下方に記載された回答例に従って回答用マイク１４２に向かって回答内容を発音する。それぞれの指示情報に対応付けられた画像がモニタ１３２に表示されると、ステップＳ１０１の処理はステップＳ１０２に移される。 In step S101, the image represented by the instruction information is displayed on the monitor 132. One piece each of the music sounding instruction information, the thought sounding instruction information, and the usage mode sounding instruction information included in the instruction information is extracted at random. The image information associated with the extracted instruction information is read from the storage unit 110 and sent to the video control unit 131. The image represented by the image information sent to the video control unit 131 is displayed on the monitor 132. Specifically, when one of the extracted pieces of instruction information is thought sounding instruction information that asks the user what mood he or she is in, an example of the image displayed on the monitor 132 is shown in FIG. 3(A). When one of the extracted pieces of instruction information is usage mode sounding instruction information that asks the user who he or she has come to the karaoke box with, an example of the image displayed on the monitor 132 is shown in FIG. 3(B). The user speaks the answer into the answer microphone 142, following the answer examples shown below the question. When the images associated with the respective pieces of instruction information have been displayed on the monitor 132, the process proceeds from step S101 to step S102.
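The random extraction in step S101 could be sketched as follows. This is only an illustration: the prompt texts, the dictionary layout, and the function name `pick_prompts` are assumptions and do not appear in the specification.

```python
import random

# Hypothetical instruction store: each category holds several prompts.
INSTRUCTIONS = {
    "song": ["Say the title of the song you want to sing.",
             "Say a keyword for the song you want to sing."],
    "thought": ["What mood are you in?",
                "How are you feeling today?"],
    "usage": ["Who did you come with?",
              "How many people are in the room?"],
}

def pick_prompts(instructions, rng=random):
    """Pick one prompt at random from every category that has prompts."""
    return {category: rng.choice(prompts)
            for category, prompts in instructions.items()
            if prompts}
```

Skipping empty categories reflects the statement that the search still runs when either the thought prompts or the usage mode prompts are absent.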

ステップＳ１０２では、回答情報から検索ワードが生成される。具体的には、ステップＳ１００において生成が開始され、記憶部１１０に記憶された回答情報が表す音声が周波数解析される。周波数解析による解析結果から導き出される波形が文字に置き換えられる。波形と文字との関係は、記憶部１１０に記憶される文字と波形との対応関係を表す情報から特定される。波形が文字に置き換えられ、生成された検索ワードは、メモリ１０１に記憶される。検索ワードが生成されると、処理はステップＳ１０３に移される。または、ステップＳ１０２において、検索ワードが生成されず所定の時間が経過すると処理はステップＳ１０３に移される。 In step S102, a search word is generated from the answer information. Specifically, generation is started in step S100, and the voice represented by the answer information stored in the storage unit 110 is frequency analyzed. The waveform derived from the analysis result by the frequency analysis is replaced with characters. The relationship between the waveform and the character is specified from the information representing the correspondence between the character and the waveform stored in the storage unit 110. The waveform is replaced with characters, and the generated search word is stored in the memory 101. When the search word is generated, the process proceeds to step S103. Alternatively, when the search word is not generated in step S102 and a predetermined time elapses, the process proceeds to step S103.

ステップＳ１０３では、ステップＳ１０２において検索ワードが生成されたか否かが判断される。具体的には、メモリ１０１に検索ワードが記憶され、検索ワードが生成できたと判断される場合、ステップＳ１０３の処理はＹＥＳであるとして、ステップＳ１０４に処理が移される。メモリ１０１にメモリが記憶されておらず、検索ワードが生成できなかったと判断される場合、ステップＳ１０３の処理はＮＯであるとして、ステップＳ１１１に処理が移される。 In step S103, it is determined whether a search word was generated in step S102. Specifically, if a search word is stored in the memory 101 and it is therefore determined that a search word was generated, the result of step S103 is YES and the process proceeds to step S104. If no search word is stored in the memory 101 and it is therefore determined that no search word could be generated, the result of step S103 is NO and the process proceeds to step S111.

ステップＳ１０４では、ステップＳ１０２で生成された検索ワードのうち、楽曲発音指示情報に従って生成された回答情報から生成された検索ワードによって、楽曲が検索される。具体的には、メモリ１０１に記憶される検索ワードが読み出される。読み出された検索ワードに従って、記憶部１１０から楽曲関連情報が読み出される。読み出された楽曲関連情報に対応付けられた楽曲識別情報がメモリ１０１に記憶され、処理がステップＳ１０５に移される。 In step S104, songs are searched for using, among the search words generated in step S102, the search word generated from the answer information given in response to the music sounding instruction information. Specifically, the search word stored in the memory 101 is read. Music related information is read from the storage unit 110 according to the read search word. The music identification information associated with the read music related information is stored in the memory 101, and the process proceeds to step S105.
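A minimal sketch of the lookup in step S104 is given below. The `Song` record, the field names, and the rule that a song is selected when any search word appears in its related information are assumptions made for illustration; the specification only states that music related information is read according to the search word.

```python
from dataclasses import dataclass, field

@dataclass
class Song:
    song_id: str                                  # music identification information
    related: set = field(default_factory=set)     # music related information

def search_by_song_words(catalog, song_words):
    """Return the IDs of songs whose related info contains any song search word."""
    return [song.song_id for song in catalog
            if any(word in song.related for word in song_words)]
```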

ステップＳ１０５では、ステップＳ１００において開始された音声録音処理が終了される。また、ステップＳ１０５では、回答用マイク１４２のミュートが解除される。音声録音処理が終了されると、処理はステップＳ１０６に移される。 In step S105, the voice recording process started in step S100 is ended. Further, in step S105, the mute of the response microphone 142 is released. When the voice recording process is completed, the process proceeds to step S106.

ステップＳ１０６では、録音音声からユーザの声の特徴を示す情報が抽出される。具体的には、声の大きさ、速さ、および声が含む複数の周波数の成分が抽出される。一例として、これらの声の特徴を示す情報を抽出するために、指示情報には、図３に示される回答例の情報が対応付けられる。各回答例の情報には、基準となる声の大きさ、および速さが対応付けられて記憶される。声の大きさの基準としては特定の音量レベルを表す情報が対応付けられ、速さの基準としては文字数に応じた発話期間を表す情報が対応付けられる。つまり、声の大きさを判断するための基準となる特定の音量レベルが記憶され、特定の音量レベルよりも声の大きさが大きいか、または小さいがが判断される。速さについては、検索ワードの文字数、および回答例の内容に応じた発音時間の長さの基準よりも発音時間が長いか短いか判断される。発音時間が短い場合は速いと判断され、発音時間が長い場合は遅いと判断される。周波数の成分を抽出する処理は、従来公知の周波数解析によって、録音音声に含まれる音の周波数を音ごとに分ける。これらの判断結果および解析結果により抽出された声の大きさと、速さと、声が含む複数の周波数の成分とがユーザの声の特徴を示す情報としてメモリに記憶される。一連の声の特徴を解析する処理が終了すると、ユーザの声の特徴を示す情報が抽出できたか否かにかかわらず、処理はステップＳ１０７に移される。 In step S106, information indicating the features of the user's voice is extracted from the recorded voice. Specifically, the loudness of the voice, its speed, and the plurality of frequency components contained in the voice are extracted. As an example, to extract this information, the instruction information is associated with the information of the answer examples shown in FIG. 3. The information of each answer example is stored in association with a reference loudness and a reference speed. Information representing a specific volume level is associated as the loudness reference, and information representing an utterance period corresponding to the number of characters is associated as the speed reference. That is, a specific volume level serving as the reference for judging loudness is stored, and it is determined whether the voice is louder or quieter than that level. As for speed, it is determined whether the utterance time is longer or shorter than a reference utterance time that depends on the number of characters in the search word and the content of the answer example. If the utterance time is shorter, the speech is judged to be fast; if it is longer, the speech is judged to be slow. The process of extracting frequency components separates the frequencies of the sounds contained in the recorded voice, sound by sound, using conventionally known frequency analysis. The loudness, the speed, and the plurality of frequency components obtained from these determinations and analyses are stored in the memory as information indicating the features of the user's voice. When this series of voice feature analyses ends, the process proceeds to step S107 regardless of whether information indicating the features of the user's voice could be extracted.
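The specification leaves the frequency analysis to conventionally known methods. As one possible illustration, the sketch below uses a naive discrete Fourier transform to pick out the strongest frequency components of a recorded signal. The function name, the `top_n` parameter, and the choice of a plain DFT are assumptions; a real implementation would more likely use an FFT library.

```python
import cmath
import math

def dominant_frequencies(samples, sample_rate, top_n=3):
    """Return the top_n strongest frequencies in Hz, found with a naive DFT."""
    n = len(samples)
    magnitudes = []
    for k in range(1, n // 2):  # skip the DC bin, keep positive frequencies
        coeff = sum(x * cmath.exp(-2j * math.pi * k * i / n)
                    for i, x in enumerate(samples))
        magnitudes.append((abs(coeff), k * sample_rate / n))
    magnitudes.sort(reverse=True)           # strongest components first
    return sorted(freq for _, freq in magnitudes[:top_n])
```

For a signal made of a 100 Hz tone and a weaker 200 Hz tone sampled at 800 Hz, the two strongest components returned are those two frequencies.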

ステップＳ１０７では、ステップＳ１０２で生成された検索ワードのうち、楽曲発音指示情報以外の指示情報に従って生成された回答情報から生成された検索ワードによって、ステップＳ１０４において検索された楽曲から、さらに楽曲が検索される。具体的には、思考発音指示情報と、利用形態発音指示情報とに従って生成された回答情報から生成された検索ワードがメモリ１０１から読み出される。また、ステップＳ１０４において検索され、メモリ１０１に記憶された楽曲識別情報に対応付けられる楽曲関連情報が記憶部１１０から読み出される。読み出された検索ワードと、読み出された楽曲関連情報とが比較され、それぞれの適合度が決定される。一例として、適合度は、複数の検索ワードのうち、楽曲関連情報にいくつ一致する項目が含まれているかによって算出される割合である。ステップＳ１０７では、この算出される割合が９割を超える楽曲関連情報と対応付けられた楽曲情報が検索され、メモリ１０１に記憶される。この処理において、一致とは、検索ワードと楽曲関連情報とが完全に一致する場合のみでなく、部分一致、および関連語句であると判断される場合にも一致と判断してよい。ステップＳ１０７の処理を終えると、処理はステップＳ１０８に移される。 In step S107, the songs found in step S104 are further narrowed down using, among the search words generated in step S102, the search words generated from the answer information given in response to instruction information other than the music sounding instruction information. Specifically, the search words generated from the answer information given in response to the thought sounding instruction information and the usage mode sounding instruction information are read from the memory 101. In addition, the music related information associated with the music identification information found in step S104 and stored in the memory 101 is read from the storage unit 110. The read search words are compared with the read music related information, and the respective matching degrees are determined. As an example, the matching degree is the proportion of the search words for which the music related information contains a matching item. In step S107, the music information associated with music related information whose matching degree exceeds 90% is retrieved and stored in the memory 101. In this process, a match may be recognized not only when a search word and the music related information match exactly, but also when they match partially or are judged to be related terms. When step S107 ends, the process proceeds to step S108.
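The matching degree described for step S107 can be illustrated with the short sketch below. It treats a substring relation in either direction as a partial match and omits related-term matching, which would need a thesaurus. The function names, the data layout, and the assumption that related items are non-empty strings are all illustrative.

```python
def matching_degree(search_words, related_info):
    """Fraction of search words that match some item of the related info."""
    if not search_words:
        return 0.0
    hits = sum(1 for word in search_words
               if any(word in item or item in word for item in related_info))
    return hits / len(search_words)

def narrow_down(candidates, search_words, threshold=0.9):
    """Keep song IDs whose matching degree exceeds the threshold (90% in the text)."""
    return [song_id for song_id, related in candidates.items()
            if matching_degree(search_words, related) > threshold]
```

With only a handful of search words, a threshold above 90% effectively requires every word to match, which is one reason the fallback in Modification 3 is useful.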

ステップＳ１０８では、ステップＳ１０４およびステップＳ１０７の処理によって楽曲が１曲以上取得できたか否かが判断される。具体的には、ステップＳ１０７において検索された楽曲の楽曲識別情報が１件以上メモリ１０１に記憶されているか否かが判断される。楽曲識別情報がメモリ１０１に記憶されている場合、ステップＳ１０８の処理はＹＥＳであるとして、ステップＳ１０９に処理が移される。楽曲識別情報がメモリ１０１に記憶されていない場合、ステップＳ１０８はＮＯであるとして、ステップＳ１１３に処理が移される。 In step S108, it is determined whether one or more pieces of music have been acquired by the processing of steps S104 and S107. Specifically, it is determined whether one or more pieces of music identification information of the music searched in step S107 are stored in the memory 101. If the music identification information is stored in the memory 101, the process of step S108 is determined as YES, and the process proceeds to step S109. If the music identification information is not stored in the memory 101, the process proceeds to step S113, since step S108 is NO.

ステップＳ１０９では、楽曲関連情報のうち歌手の声の特徴を示す情報と、ユーザの声の特徴を示す情報との一致度を算出する。一致度の算出はステップＳ１０７と同様に、ステップＳ１０６において抽出されたユーザの声の特徴を示す情報と、楽曲関連情報に含まれる歌手の声の特徴を示す情報とが比較される。具体的には、ステップＳ１０６において抽出されたユーザの声の特徴を示す情報がメモリ１０１から読み出される。また、ステップＳ１０７において検索され、メモリ１０１に記憶された楽曲識別情報に対応付けられている楽曲関連情報が記憶部１１０から読み出される。読み出された楽曲関連情報に含まれる歌手の声の特徴を示す情報と、読み出されたユーザの声の特徴を示す情報とが比較され、それぞれの一致度が決定される。一例として、一致度は、複数のユーザの声の特徴を示す情報のうち、楽曲関連情報に含まれる歌手の声の特徴を示す情報に一致する項目がいくつ含まれているかによって算出される割合である。ステップＳ１０９では、この割合が楽曲ごとに算出されて、割合と楽曲識別情報とが対応付けられてメモリ１０１に記憶される。この処理において、一致とは、ユーザの声の特徴を示す情報と歌手の声の特徴を示す情報とが完全に一致する場合のみでなく、似ていると判断される場合についても一致と判断してもよい。ステップＳ１０９の処理を終えると、処理はステップＳ１１０に移される。 In step S109, the degree of coincidence between the information indicating the features of the singer's voice in the music related information and the information indicating the features of the user's voice is calculated. As in step S107, the calculation compares the information indicating the features of the user's voice extracted in step S106 with the information indicating the features of the singer's voice included in the music related information. Specifically, the information indicating the features of the user's voice extracted in step S106 is read from the memory 101. In addition, the music related information associated with the music identification information found in step S107 and stored in the memory 101 is read from the storage unit 110. The information indicating the features of the singer's voice included in the read music related information is compared with the read information indicating the features of the user's voice, and the respective degrees of coincidence are determined. As an example, the degree of coincidence is the proportion of the items of information indicating the features of the user's voice that match the information indicating the features of the singer's voice included in the music related information. In step S109, this proportion is calculated for each song, and the proportion and the music identification information are stored in the memory 101 in association with each other. In this process, a match may be recognized not only when the information indicating the features of the user's voice and that of the singer's voice match exactly, but also when they are judged to be similar. When step S109 ends, the process proceeds to step S110.
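The per-song degree of coincidence in step S109 might be computed along the following lines. The feature names (`loudness`, `speed`, `pitch`), the dictionary layout, and the use of exact equality are assumptions; the text also allows features that are merely similar to count as a match, which this sketch does not model.

```python
def coincidence(user_features, singer_features):
    """Fraction of the user's voice features that equal the singer's features."""
    if not user_features:
        return 0.0
    same = sum(1 for name, value in user_features.items()
               if singer_features.get(name) == value)
    return same / len(user_features)

def score_songs(user_features, songs):
    """Map each song ID to its degree of coincidence with the user's voice."""
    return {song_id: coincidence(user_features, singer)
            for song_id, singer in songs.items()}
```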

ステップＳ１１０では、ステップＳ１０９において算出された一致度の高さの順番で楽曲が並べられて表示される。具体的には、ステップＳ１０９において算出された一致度を表す割合と楽曲識別情報とが読み出される。楽曲識別情報は、対応付けられた割合の大きな順に並べ替えられる。並べかえられた楽曲識別情報の内容を表示するための画像情報が映像制御部１３１に送られる。映像制御部１３１からモニタ１３２に送られて表示された楽曲識別情報は、ユーザによって選択されることで、楽曲の演奏予約が行われる。映像制御部１３１に楽曲識別情報が送られると、検索処理は終了される。 In step S110, the music pieces are arranged and displayed in the order of the degree of coincidence calculated in step S109. Specifically, the ratio representing the degree of coincidence calculated in step S109 and the music identification information are read out. The music identification information is sorted in descending order of the associated ratio. Image information for displaying the contents of the rearranged music identification information is sent to the video control unit 131. The music identification information sent from the video control unit 131 to the monitor 132 and displayed is selected by the user, and performance reservation of the music is performed. When the music identification information is sent to the video control unit 131, the search process is ended.
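The ordering in step S110 amounts to a descending sort on the stored proportion. A minimal sketch, assuming the scores are held in a dictionary keyed by music identification information (the function name is hypothetical):

```python
def rank_songs(scores):
    """Return (song_id, score) pairs sorted by score, highest first."""
    return sorted(scores.items(), key=lambda pair: pair[1], reverse=True)
```

Because Python's sort is stable, songs with equal scores keep their original relative order.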

次に、ステップＳ１０３においてＮＯと判断された場合の処理について説明する。ステップＳ１０３でＮＯと判断されると、処理はステップＳ１１１に移される。ステップＳ１１１では、終了指令が発生したか否かが判断される。具体的には、ユーザによるキャンセル操作によって処理の終了指令が発生したか否か、またはタイムアウトによって処理の終了指令が発生したか否かが判断される。終了指令が発生したと判断された場合ステップＳ１１１の処理はＹＥＳであるとして、ステップＳ１１２に処理が移される。終了指令が発生していないと判断された場合、ステップＳ１１１の処理はＮＯであるとして、ステップＳ１０１に処理が移される。 Next, the process when it is determined NO in step S103 will be described. If NO in step S103, the process proceeds to step S111. In step S111, it is determined whether an end command has been issued. Specifically, it is determined whether a process termination instruction has been generated by a cancel operation by the user, or whether a process termination instruction has been generated by a timeout. If it is determined that the end command has been issued, the process of step S111 is determined as YES, and the process proceeds to step S112. If it is determined that the end instruction has not been generated, the process of step S111 is determined as NO, and the process is transferred to step S101.

ステップＳ１１２では、ステップＳ１００で開始された音声録音処理を終了する。ステップＳ１１２の処理は、ステップＳ１０５と同様の処理のため、説明を省略する。ステップＳ１１２の処理を終えると、検索処理は終了される。 In step S112, the voice recording process started in step S100 is ended. The process of step S112 is the same as the process of step S105, and thus the description thereof is omitted. When the process of step S112 is finished, the search process is ended.

ステップＳ１０８においてＮＯと判断された場合の処理について説明する。ステップＳ１０８がＮＯと判断されると、処理はステップＳ１１３に移される。ステップＳ１１３では、ステップＳ１０７の処理によって楽曲が一曲も取得されなかったことをモニタ１３２に表示させる。具体的には、記憶部１１０に記憶される、検索できなかったことを表す映像が抽出され、映像制御部１３１に送られる。映像制御部１３１に送られた映像は、モニタ１３２に表示可能な形式に変換されて、モニタ１３２に送られる。モニタ１３２に送られた映像は、モニタ１３２に表示される。ステップＳ１１３の処理を終えると、検索処理は終了される。 The process performed when the result of step S108 is NO will now be described. If the result of step S108 is NO, the process proceeds to step S113. In step S113, the monitor 132 is caused to display that no song was obtained by the process of step S107. Specifically, a video stored in the storage unit 110 that indicates the search found nothing is extracted and sent to the video control unit 131. The video sent to the video control unit 131 is converted into a format displayable on the monitor 132 and sent to the monitor 132, where it is displayed. When step S113 ends, the search process ends.
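The branches of the first embodiment's flowchart (FIG. 2) can be summarized in a small control-flow sketch. The step functions are passed in as callables so that only the branching is shown; all function and parameter names are hypothetical, and the recording start and stop (steps S100, S105, S112) and the feature extraction (step S106) are left out for brevity.

```python
def run_search(recognize, end_requested, search, narrow, rank):
    """Mirror the YES/NO branches of FIG. 2 using injected step functions."""
    while True:
        words = recognize()              # S101-S102: prompt and recognize speech
        if words:                        # S103: YES
            break
        if end_requested():              # S111: YES, cancel or timeout
            return ("cancelled", [])
        # S111: NO, so the prompt is shown again (back to S101)
    candidates = narrow(search(words), words)   # S104 then S107
    if not candidates:                   # S108: NO leads to S113
        return ("not_found", [])
    return ("found", rank(candidates))   # S109-S110
```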

［第２実施形態］
第１実施形態の検索処理と異なる検索処理が実行される第２実施形態について図４を参照して説明する。図４は第２実施形態における検索処理を表すフローチャートであり、カラオケ装置１０の制御部１００が行う処理である。図４では、図２に示される第１実施形態の検索処理における処理と同じ処理を行うステップについては、共通の番号が付与される。共通の番号が付与されたステップについては、処理の説明を省略する。 Second Embodiment
A second embodiment, in which a search process different from that of the first embodiment is executed, will be described with reference to FIG. 4. FIG. 4 is a flowchart showing the search process in the second embodiment, which is performed by the control unit 100 of the karaoke apparatus 10. In FIG. 4, steps that perform the same processing as in the search process of the first embodiment shown in FIG. 2 are given the same numbers, and their description is omitted.

図４に示されるフローチャートの各処理は、楽曲情報を検索する動作を制御部１００に実行させるための制御要求が操作処理部１２０または通信部１５０から制御部１００に送られることで開始される。 Each process of the flowchart illustrated in FIG. 4 is started by sending a control request from the operation processing unit 120 or the communication unit 150 to the control unit 100 to cause the control unit 100 to execute an operation of searching for music information.

ステップＳ１００からステップＳ１０２までの処理を終えるとステップＳ１０６に処理が移される。つまり、検索ワードが生成されるとともに記憶されている回答情報から声の特徴を示す情報が抽出される。ステップＳ１０６の処理は第１実施形態と同じであるため説明を省略する。声の特徴を示す情報を抽出する処理を終えると処理はステップＳ２０３に移される。 When the processing from step S100 to step S102 ends, the process proceeds to step S106. That is, a search word is generated, and information indicating the features of the voice is extracted from the stored answer information. The processing of step S106 is the same as in the first embodiment, so its description is omitted. When the extraction of the information indicating the features of the voice ends, the process proceeds to step S203.

ステップＳ２０３では、検索ワードが生成され、かつ声の特徴を示す情報が抽出されたか否かが判断される。ステップＳ１０２において検索ワードが生成され、かつ、ステップＳ１０６において声の特徴を示す情報が抽出された場合、ステップＳ２０３はＹＥＳと判断されステップＳ１０４に処理が移される。ステップＳ１０２において検索ワードが生成されなかった、または、ステップＳ１０６において声の特徴を示す情報が抽出されなかった場合、ステップＳ２０３はＮＯと判断されステップＳ１１１に処理が移される。後の処理は、第１実施形態の処理と同じであるため、説明を省略する。 In step S203, it is determined whether a search word has been generated and information indicating voice characteristics has been extracted. When the search word is generated in step S102 and the information indicating the feature of the voice is extracted in step S106, YES is determined in step S203, and the process is moved to step S104. If the search word is not generated in step S102 or if the information indicating the feature of the voice is not extracted in step S106, NO is determined in step S203, and the process is moved to step S111. The subsequent processing is the same as the processing of the first embodiment, so the description will be omitted.

第２実施形態においては、検索ワードが生成され、かつ声の特徴を示す情報が抽出されるまで繰り返しステップＳ１０２およびステップＳ１０６が実行される。これにより、確実に声の特徴を示す情報を抽出し検索に反映することができる。さらに、声の特徴を示す情報を抽出するだけでなく、複数の検索ワードも生成できる。楽曲発音指示情報と、思考発音指示情報と、利用形態発音指示情報とにそれぞれ対応する回答情報を複数取得できるため、ユーザの声の特徴を示す情報および検索ワードが多数生成され、よりユーザの現在の状況にマッチした楽曲を提示することができる。 In the second embodiment, steps S102 and S106 are repeatedly performed until a search word is generated and information indicating voice characteristics is extracted. This makes it possible to reliably extract information indicating voice characteristics and reflect it in the search. Furthermore, in addition to extracting information indicative of voice characteristics, multiple search words can also be generated. Since a plurality of pieces of answer information respectively corresponding to the music sounding instruction information, the thinking sounding instruction information, and the usage mode sounding instruction information can be acquired, a large number of information indicating the characteristics of the user's voice and the search word are generated. It is possible to present songs that match the situation.

［第３実施形態］
第１実施形態、および第２実施形態とは、声の特徴を示す情報を取得する処理の方法が異なる第３実施形態について説明する。具体的には、第３実施形態においては、指示情報に回答例の情報が対応付けられていない場合を説明する。また、図５は第３実施形態においてモニタ１３２に表示される画像である。 Third Embodiment
A third embodiment, which differs from the first and second embodiments in how the information indicating the features of the voice is acquired, will be described. Specifically, the third embodiment covers the case where no answer example information is associated with the instruction information. FIG. 5 shows images displayed on the monitor 132 in the third embodiment.

第３実施形態におけるステップＳ１０１では、指示情報の内容を表す画像がモニタ１３２に表示される。指示情報に含まれる楽曲発音指示情報、思考発音指示情報、および利用形態発音指示情報がそれぞれランダムで抽出される。抽出された指示情報に対応付けられている画像情報が記憶部１１０から読み出され、映像制御部１３１に送られる。映像制御部１３１に送られた画像情報が表す画像がモニタ１３２に表示される。具体的には、抽出された指示情報の１つが思考発音指示情報であって、どんな気分であるかをユーザに入力させる場合、モニタ１３２に表示される画像の一例は図５の（Ａ）に示される。また、抽出された指示情報の１つが利用形態発音指示情報であって、だれとカラオケボックスに来ているのかをユーザに入力させる場合、モニタ１３２に表示される画像の一例は図５の（Ｂ）に示される。ユーザは、質問文に従って回答用マイク１４２に向かって回答内容を発音する。 In step S101 of the third embodiment, an image representing the content of the instruction information is displayed on the monitor 132. One piece each of the music sounding instruction information, the thought sounding instruction information, and the usage mode sounding instruction information included in the instruction information is extracted at random. The image information associated with the extracted instruction information is read from the storage unit 110 and sent to the video control unit 131. The image represented by the image information sent to the video control unit 131 is displayed on the monitor 132. Specifically, when one of the extracted pieces of instruction information is thought sounding instruction information that asks the user what mood he or she is in, an example of the image displayed on the monitor 132 is shown in FIG. 5(A). When one of the extracted pieces of instruction information is usage mode sounding instruction information that asks the user who he or she has come to the karaoke box with, an example of the image displayed on the monitor 132 is shown in FIG. 5(B). The user speaks the answer into the answer microphone 142 in response to the question.

ステップＳ１０６では、録音音声からユーザの声の特徴が解析される。具体的には、声の大きさ、速さ、および声が含む複数の周波数の成分が抽出される。まず、声の大きさについては、周波数解析によって周波数成分に分解された波形のパワースペクトラムから声の大きさが解析される。一例として、一般的な人の会話における音圧レベルである８０ｄＢを超える声の大きさである場合、声が大きいと判断され、８０ｄＢ以下の場合、声が小さいと判断される。速さについては、ステップＳ１０２で生成された検索ワードの文字数と、文字数に対応する発音時間の長さと、によって解析される。一例として、１文字の発音時間の長さが０．１秒であることが基準として定められる。検索ワードが「じょうねつ」の５文字である場合、発音時間の長さが０．５秒以内である場合、速いと判断され、０．５秒より長い場合、ゆっくりであると判断される。 In step S106, the features of the user's voice are analyzed from the recorded voice. Specifically, the loudness of the voice, its speed, and the plurality of frequency components contained in the voice are extracted. First, the loudness is analyzed from the power spectrum of the waveform decomposed into frequency components by frequency analysis. As an example, if the loudness exceeds 80 dB, which is given as the sound pressure level of ordinary conversation, the voice is judged to be loud, and if it is 80 dB or less, the voice is judged to be quiet. The speed is analyzed from the number of characters in the search word generated in step S102 and the utterance time corresponding to that number of characters. As an example, an utterance time of 0.1 second per character is set as the reference. If the search word is the five characters "じょうねつ" (jounetsu, "passion"), the speech is judged to be fast when the utterance time is 0.5 seconds or less and slow when it is longer than 0.5 seconds.
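The two threshold rules of the third embodiment can be written out directly. The constants come from the examples in the text (80 dB and 0.1 second per character); the function and constant names are assumptions, and how the sound level and utterance time are measured is outside this sketch.

```python
LOUD_THRESHOLD_DB = 80.0   # example level from the text
SECONDS_PER_CHAR = 0.1     # example reference utterance time per character

def classify_loudness(level_db):
    """Loud if the level exceeds the threshold, quiet if it is at or below it."""
    return "loud" if level_db > LOUD_THRESHOLD_DB else "quiet"

def classify_speed(search_word, duration_s):
    """Fast if the utterance took no longer than the per-character reference."""
    reference = len(search_word) * SECONDS_PER_CHAR
    return "fast" if duration_s <= reference else "slow"
```

For the five-character word "じょうねつ" the reference time is 0.5 seconds, so an utterance of 0.4 seconds counts as fast and one of 0.6 seconds as slow.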

［実施形態の効果］
従来のカラオケ装置においては、過去に歌唱した歌唱データ、および通話した通話データなどを基に声質を取得し、この声質を楽曲検索時に利用する方法がある。しかし、この方法においては、過去に取得された声の特徴を示す情報と、ユーザとの対応関係が不可欠であった。つまり、ユーザの識別情報と歌唱データまたは通話データとが対応付けて保存されなければならない。しかし、本実施形態によれば、ユーザと声の特徴を示す情報とを対応付けて記憶する必要がない。さらに、ユーザの現在の状況を表す情報に基づいて声の特徴を示す情報が抽出されるので、過去に取得された情報から抽出された声の特徴を示す情報よりも、ユーザの現状を表す情報として信頼性の高い情報が抽出できる。これにより、歌唱者は現在の声の特徴を示す情報に一致した楽曲を歌うことができ、聞き手は歌唱者の声と違和感の少ない楽曲の歌唱を聴くことができる。 [Effect of the embodiment]
Among conventional karaoke apparatuses, there is a method of acquiring voice quality from data of songs sung in the past, data of past telephone calls, and the like, and using this voice quality when searching for songs. In this method, however, the correspondence between the previously acquired voice feature information and the user is indispensable. That is, the user's identification information must be stored in association with the singing data or the call data. According to the present embodiment, by contrast, there is no need to store the user in association with the information indicating the features of the voice. Furthermore, because the information indicating the features of the voice is extracted from information representing the user's current situation, it represents the user's present state more reliably than voice feature information extracted from information acquired in the past. As a result, the singer can sing a song that matches the features of his or her current voice, and listeners hear a song that suits the singer's voice with little sense of incongruity.

［変形例１］
各実施形態におけるステップＳ１０１では、指示情報に対応付けられた画像情報が表す画像がモニタ１３２に表示された。しかし、指示情報に対応付けられた音声情報が表す音声をスピーカ１４３が発音することにより、指示情報の内容がユーザに提示されてもよい。具体的には、指示情報は音声情報と対応付けられて記憶部１１０に記憶される。指示情報に対応付けられた音声情報が記憶部１１０から読み出され、音響制御部１４０に送られる。音響制御部１４０は、音声情報に従って音声信号を生成し、スピーカ１４３に送る。スピーカ１４３は音声信号に従って音声を発音する。ユーザは、発音された質問に従って回答用マイク１４２に向かって回答内容を発音する。 [Modification 1]
In step S101 in each embodiment, an image represented by image information associated with the instruction information is displayed on the monitor 132. However, the content of the instruction information may be presented to the user by the speaker 143 pronouncing the audio represented by the audio information associated with the instruction information. Specifically, the instruction information is stored in the storage unit 110 in association with the voice information. The audio information associated with the instruction information is read from the storage unit 110 and sent to the sound control unit 140. The sound control unit 140 generates an audio signal according to the audio information and sends the audio signal to the speaker 143. The speaker 143 produces an audio according to the audio signal. The user utters the answer contents toward the answer microphone 142 in accordance with the uttered question.

［変形例２］
各実施形態において、図２に示されるフローチャートの検索処理は、カラオケ装置１０の制御部１００によって行われたが、端末装置２０の制御部において行われてもよい。または回答用マイク１４２は、カラオケ装置１０に接続されていたが、端末装置２０に接続され、または端末装置２０に装備されてもよい。さらに、カラオケ装置１０または端末装置２０において回答情報から抽出された声の特徴を表す特徴情報がサーバ装置３０に送られ、サーバ装置３０に記憶される楽曲情報が特徴情報に従って検索されてもよい。つまりは、フローチャートの検索処理は、カラオケ装置１０と端末装置２０とサーバ装置３０との組み合わせによって行われてもよい。 [Modification 2]
In each embodiment, the search process of the flowchart shown in FIG. 2 is performed by the control unit 100 of the karaoke apparatus 10, but may be performed by the control unit of the terminal device 20. Alternatively, although the response microphone 142 is connected to the karaoke apparatus 10, it may be connected to the terminal device 20 or may be equipped in the terminal device 20. Furthermore, the feature information representing the feature of the voice extracted from the answer information in the karaoke apparatus 10 or the terminal device 20 may be sent to the server device 30, and the music information stored in the server device 30 may be searched according to the feature information. That is, the search process of the flowchart may be performed by a combination of the karaoke apparatus 10, the terminal device 20, and the server device 30.

［変形例３］
各実施形態において、検索処理の手順は、検索ワードによって楽曲が選択され、思考、および利用形態によって楽曲が絞り込まれ、声の特徴によって楽曲の表示順が決定された。しかし、楽曲の表示順を確定するために参照する情報の順番を入れ替えてもよい。また、検索を特定の順番に行った結果、楽曲が１曲も取得できなかった場合、楽曲検索のための処理の、ひとつ前の結果をユーザに提示してもよい。つまり、一例としてステップＳ１０８において楽曲が取得できなかったと判断された場合、ステップＳ１０４において検索された楽曲をユーザに提示してもよい。また、ステップＳ１０４において検索された結果について、ステップＳ１０９およびステップＳ１１０の処理を行ってもよい。このように、前の結果を提示してもよいし、検索の順序を入れ替えて再度検索してもよい。これによりユーザの声の特徴を示す情報と歌手の声の特徴を示す情報との一致度が高い楽曲を提示できる可能性が高くなる。 [Modification 3]
In each embodiment, the search process first selects songs by search word, then narrows them down by thought and usage mode, and finally determines the display order of the songs by voice features. However, the order in which the information is consulted to determine the display order may be changed. In addition, if searching in a particular order yields no songs at all, the result of the immediately preceding search step may be presented to the user. For example, if it is determined in step S108 that no song was obtained, the songs found in step S104 may be presented to the user. The processing of steps S109 and S110 may also be applied to the result of the search in step S104. In this way, the previous result may be presented, or the search may be run again in a different order. This increases the likelihood of presenting songs for which the information indicating the features of the user's voice closely matches that of the singer's voice.
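The fallback described in this modification could be sketched as follows. The sketch generalizes slightly: given the results of each search stage in order, it returns the most recent stage that produced any songs. The function name and the list-of-lists layout are assumptions.

```python
def results_with_fallback(stage_results):
    """Return the latest non-empty result from an ordered list of search stages."""
    for result in reversed(stage_results):
        if result:
            return result
    return []
```

For example, with the step S104 result `["A", "B"]` and an empty step S107 result, the step S104 songs are presented.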

［変形例４］
各実施形態において、ステップＳ１１０では、ステップＳ１０９において算出された一致度の高さの順番で楽曲が並べられてモニタ１３２に表示された。ステップＳ１１０では、ステップＳ１０９において算出された一致度と楽曲識別情報とが対応付けられてモニタ１３２に表示されてもよい。これにより、ユーザの声の特徴を示す情報と歌手の声の特徴を示す情報との一致度の高さを順番で知ることができるだけでなく、どの程度一致しているのかを定量的に知ることができる。順番だけではわからない一致度が分かり、ユーザは、順番だけでなく、順番の信頼度も知ることができる。 [Modification 4]
In each embodiment, in step S110, the songs are arranged in descending order of the degree of coincidence calculated in step S109 and displayed on the monitor 132. In step S110, the degree of coincidence calculated in step S109 may also be displayed on the monitor 132 in association with the music identification information. This lets the user see not only the ranking by degree of coincidence between the user's voice features and the singer's voice features, but also, quantitatively, how closely they match. Because the degree of coincidence cannot be inferred from the ranking alone, the user learns not only the order but also how reliable the order is.
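A display list that shows the degree of coincidence next to each song might be built as shown below. The line format and the function name are assumptions; the input is a list of (song ID, score) pairs already sorted in descending order of score.

```python
def format_ranking(ranked):
    """Render ranked (song_id, score) pairs as numbered lines with percentages."""
    return [f"{position}. {song_id} ({score:.0%})"
            for position, (song_id, score) in enumerate(ranked, start=1)]
```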

[Modification 5]
In each embodiment, the identification information of the user does not need to be associated with the information indicating the features of the voice, so the user does not need to be identified by a conventional login process or the like. However, a login process may be executed, and the identification information of the user may be stored in association with the information indicating the features of the voice. This makes it possible, in the process of extracting the information indicating the features of the user's voice, to compare the previously stored information indicating the features of the voice with the information indicating the features of the current voice. Accordingly, when the voice feature information that could be extracted this time completely matches the voice feature information extracted in the past, the past voice feature information can be substituted for any voice feature information that could not be extracted this time, so that music that better matches the features of the voice can be extracted.
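The substitution described in Modification 5 can be sketched in Python as follows. The feature names (`volume`, `speed`, `pitch`), the use of `None` for a feature that could not be extracted, and the `complete_features` helper are assumptions for illustration. The exact-equality test reflects the "completely matches" condition in the description, though a real system would need some definition of equality for measured values.

```python
def complete_features(current, stored):
    """Fill in features missing from `current` using a stored profile.

    Both arguments map a feature name to a value; a value of None in
    `current` means the feature could not be extracted this time.
    Stored values are reused only when every feature that was extracted
    this time is identical to its stored counterpart.
    """
    extracted = {k: v for k, v in current.items() if v is not None}
    if not extracted:
        return dict(current)  # nothing to compare against the profile
    if any(stored.get(k) != v for k, v in extracted.items()):
        return dict(current)  # features differ, so do not reuse past data
    completed = dict(current)
    for name, value in current.items():
        if value is None and name in stored:
            completed[name] = stored[name]
    return completed


stored_profile = {"volume": 0.7, "speed": 1.1, "pitch": 210.0}
today = {"volume": 0.7, "speed": None, "pitch": 210.0}
print(complete_features(today, stored_profile))
# {'volume': 0.7, 'speed': 1.1, 'pitch': 210.0}
```

If any extracted feature differs from the stored profile, the function returns the current features unchanged, leaving the missing feature unfilled.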

[Modification 6]
In each embodiment, the instruction information includes the music pronunciation instruction information, the thought pronunciation instruction information, and the usage form pronunciation instruction information, but it may further include login instruction information. The login instruction information is displayed or sounded, and the user speaks the user's identification information and password into the answer microphone 142. The identification information and password received by voice are converted into text in step S102, and a conventionally known login process is then executed.

[Modification 7]
In each embodiment, the answer microphone is a microphone dedicated to answers, but it may also serve as the singing microphone. As one example, a microphone that is switched on after the voice-based music search process has started may be determined to be the answer microphone.

The embodiments and modifications are not limited to being carried out independently and may be combined as appropriate.

[Correspondence between the claims and the embodiments]
The storage unit 110, the storage unit of the terminal device, and the storage unit of the server device are examples of the storage means. The monitor 132 and the speaker 14 are examples of the instruction presentation means. The answer microphone 142 is an example of the acquisition means. The voice processing unit 140 is an example of the extraction means. The control unit 100 is an example of the search means and the determination means. The combination of the control unit 100 and the memory 101 is an example of the computer.

The instruction information is an example of the instruction information, the music pronunciation instruction information is an example of the music pronunciation instruction information, the thought pronunciation instruction information is an example of the thought pronunciation instruction information, and the usage form pronunciation instruction information is an example of the usage form pronunciation instruction information. The answer information is an example of the voice information, the first voice information, the second voice information, and the third voice information.

The process of step S106 is an example of the first comparison process and the second comparison process. The process of step S101 is an example of the instruction presenting step. The voice recording process that starts in step S100 and ends in step S105 is an example of the acquisition step. The processes of steps S102 and S106 are examples of the extraction step. The process of step S104 is an example of the search step. Step S107, and steps S109 and S110, are examples of the determination step.

Description of Symbols
1 karaoke system
10 karaoke apparatus
20 terminal device
30 server device
100 control unit
101 memory
110 storage unit
120 operation control unit
130 video reproduction unit
131 video processing unit
140 acoustic control unit
141 singing microphone
142 answer microphone

Claims

A karaoke apparatus comprising:
storage means for storing instruction information representing a specific instruction including an instruction for causing a user to pronounce music related information;
instruction presentation means for presenting the specific instruction to the user according to the instruction information stored in the storage means;
acquisition means for acquiring voice information representing a voice uttered by the user in response to the specific instruction presented by the instruction presentation means;
extraction means for extracting, from the voice information acquired by the acquisition means, response contents of the user to the specific instruction, including the music related information, and information indicating features of the current voice of the user;
search means for searching for a plurality of pieces of music information according to the music related information in the response contents extracted by the extraction means; and
determination means for determining, based on the response contents other than the music related information among the response contents extracted by the extraction means and on the information indicating the features of the current voice of the user, a priority order of the music information to be presented to the user among the plurality of pieces of music information found by the search means.

The karaoke apparatus according to claim 1, wherein:
the instruction information includes at least one of music pronunciation instruction information, thought pronunciation instruction information, and usage form pronunciation instruction information;
the music pronunciation instruction information represents an instruction for causing the user to pronounce the music related information;
the thought pronunciation instruction information represents an instruction for causing the user to pronounce thought information representing the thoughts and emotions of the user about the music associated with the music related information;
the usage form pronunciation instruction information represents an instruction for causing the user to pronounce usage form information representing a usage form of the karaoke apparatus by the user;
the acquisition means acquires first voice information and at least one of second voice information and third voice information, each representing a voice uttered by the user;
the first voice information is information representing a voice including the music related information;
the second voice information is information representing a voice including the thought information;
the third voice information is information representing a voice including the usage form information;
the extraction means extracts the music related information, at least one of the thought information and the usage form information, and the information indicating the features of the voice from the first voice information and at least one of the second voice information and the third voice information; and
the determination means determines, according to the information indicating the features of the voice and at least one of the thought information and the usage form information, the priority order of the music information to be presented to the user among the plurality of pieces of music information found by the search means.

The karaoke apparatus according to claim 2, wherein:
the thought pronunciation instruction information includes information representing a plurality of first response examples with which the user responds, and the usage form pronunciation instruction information includes information representing a plurality of second response examples with which the user responds;
the acquisition means acquires the second voice information and the third voice information representing the voice of the user when the user pronounces the first response example and the second response example, respectively;
the extraction means executes at least one of a first comparison process of comparing the second voice information with a first reference determined in advance, for the first response example, with respect to features of the voice, and a second comparison process of comparing the third voice information with a second reference determined in advance, for the second response example, with respect to features of the voice;
the extraction means extracts, according to the executed comparison process, at least one voice element among the volume of the voice, the speed of the voice, and a plurality of frequencies included in the voice as the information indicating the features of the voice; and
the determination means determines the priority order in which the plurality of pieces of music information are presented to the user according to the degree of matching between the voice element extracted by the extraction means and the voice element associated with the music information.

The karaoke apparatus according to any one of claims 1 to 3, wherein the acquisition means acquires voice information representing a voice uttered by the user in response to the specific instruction until at least one piece of the music related information and at least one piece of the information indicating a feature of the voice of the user have each been extracted from the voice information by the extraction means.

The karaoke apparatus according to claim 3 or 4, wherein, when there is no music information associated with a voice element that matches the voice element extracted by the extraction means, the priority order in which the plurality of pieces of music information are presented to the user is determined according to at least one of the degree of matching between usage form information representing a usage form of the karaoke apparatus by the user and usage form information associated with the music information, and the degree of matching between the thought information and thought information associated with the music information.

The karaoke apparatus according to any one of claims 1 to 5, wherein:
the music information is stored in association with usage form information representing a usage form of the karaoke apparatus by a user;
the acquisition means acquires sound in the karaoke box in addition to the voice uttered by the user in response to the specific instruction;
the extraction means extracts the sound in the karaoke box; and
the determination means assigns a higher priority to music information associated with usage form information representing a form in which a plurality of users use the karaoke apparatus, the more varied the sounds included in the sound in the karaoke box extracted by the extraction means, and presents that music information to the user.

A program for a karaoke apparatus, the program causing a computer of a karaoke apparatus that comprises storage means for storing instruction information representing a specific instruction including an instruction for causing a user to pronounce music related information to execute:
an instruction presenting step of presenting the specific instruction to the user according to the instruction information stored in the storage means;
an acquisition step of acquiring voice information representing a voice uttered by the user in response to the specific instruction presented in the instruction presenting step;
an extraction step of extracting, from the voice information acquired in the acquisition step, response contents of the user to the specific instruction, including the music related information, and information indicating features of the current voice of the user;
a search step of searching for a plurality of pieces of music information according to the music related information in the response contents extracted in the extraction step; and
a determination step of determining, based on the response contents other than the music related information among the response contents extracted in the extraction step and on the information indicating the features of the current voice of the user, a priority order of the music information to be presented to the user among the plurality of pieces of music information found in the search step.

A karaoke system in which a karaoke apparatus, a server apparatus, and a portable terminal are provided so as to be able to communicate with one another via a network, the karaoke system comprising:
storage means for storing instruction information representing a specific instruction including an instruction for causing a user to pronounce music related information;
instruction presentation means for presenting the specific instruction to the user according to the instruction information stored in the storage means;
acquisition means for acquiring voice information representing a voice uttered by the user in response to the specific instruction presented by the instruction presentation means;
extraction means for extracting, from the voice information acquired by the acquisition means, response contents of the user to the specific instruction, including the music related information, and information indicating features of the current voice of the user;
search means for searching for a plurality of pieces of music information according to the music related information in the response contents extracted by the extraction means; and
determination means for determining, based on the response contents other than the music related information among the response contents extracted by the extraction means and on the information indicating the features of the current voice of the user, a priority order of the music information to be presented to the user among the plurality of pieces of music information found by the search means.