JP3854263B2 - Karaoke device, karaoke method, and program

Info

Publication number: JP3854263B2
Application number: JP2003379463A
Authority: JP
Inventors: 一喜冨永
Original assignee: Konami Digital Entertainment Co Ltd
Current assignee: Konami Digital Entertainment Co Ltd
Priority date: 2003-11-10
Filing date: 2003-11-10
Publication date: 2006-12-06
Anticipated expiration: 2023-11-10
Also published as: JP2005141122A

Description

The present invention relates to a karaoke apparatus, a karaoke method, and a program for realizing these on a computer.

Various karaoke apparatuses have conventionally been provided. A karaoke apparatus holds, prepared in advance, music data in a format such as MIDI (Musical Instrument Digital Interface) or audio waveform data in a format such as PCM (Pulse Code Modulation). It outputs this data as accompaniment audio and displays the lyrics on the screen of a television set or the like in time with the audio output. It also receives the singer's voice waveform data from a microphone connected to the karaoke apparatus, mixes it with the accompaniment data as appropriate, and outputs the result from a speaker. The accompaniment data may be recorded on an information recording medium such as a CD-ROM (Compact Disk Read Only Memory) or a DVD-ROM (Digital Versatile Disk ROM), or may be obtainable from another storage device via a computer communication network. Technology for such karaoke apparatuses is disclosed, for example, in the following document.
Japanese Patent No. 2925759

[Patent Document 1] discloses a technique for scoring a singer's singing ability in such a karaoke apparatus.

Conventionally, such karaoke apparatuses have been used in dedicated karaoke facilities called karaoke boxes and in various restaurants and bars, where the accompaniment and the singing are generally output from speakers at high volume.

Furthermore, with the development of computer technology, an environment is emerging in which karaoke can be enjoyed using the game device found in each home. In karaoke using such a game device, the video output is connected to a television set or the like and is used to display lyrics and so on. The microphone is connected directly to the game device, and the audio input from the microphone is mixed with accompaniment data stored on a DVD-ROM for the game device or downloaded via a computer communication network, and is then output from a speaker via the audio input of a television set or stereo system.

However, audio is not necessarily output only from speakers. For example, a user who wants to enjoy karaoke late at night or to practice alone may listen to the audio output from the game device through headphones or earphones. In such an environment, the user often wants to sing in a quiet voice rather than a loud one.

On the other hand, it is too cumbersome for the singer to operate the game device each time to specify whether audio is output from speakers or from headphones or earphones.

Therefore, there is a strong demand for a technique that appropriately estimates the environment in which karaoke is being performed and makes a selection accordingly.
The present invention has been made to solve this problem. Its object is to provide a karaoke apparatus and a karaoke method suitable for appropriately estimating the environment in which karaoke is performed and performing karaoke in a manner suited to that environment, as well as a program for realizing these on a computer.

To achieve the above object, the following invention is disclosed in accordance with the principle of the present invention.
The karaoke apparatus according to the first aspect of the present invention includes an input receiving unit, a storage unit, a mixing unit, an output unit, and a control unit, and is configured as follows.

That is, the input receiving unit receives input of first audio data. For example, a microphone functions as the input receiving unit, and the audio data of the song sung by the singer (singing data) forms the first audio data. The first audio data, however, also contains various noises from the environment in which karaoke is being performed, as well as audio not directly related to karaoke. Audio that enters the microphone other than the singing data is referred to as "environmental sound".

Meanwhile, the storage unit stores second audio data in advance. Typically, the second audio data is recorded on an information recording medium such as a CD-ROM or DVD-ROM, and serves as the karaoke accompaniment data. When accompaniment data is temporarily downloaded via a network to a hard disk or the like of the game device, that hard disk or the like corresponds to the storage unit.

Further, the mixing unit mixes the first audio data whose input has been received with the second audio data stored in advance. When the second audio data (accompaniment data) is MIDI data, the MIDI data is converted into PCM data, which is audio waveform data, by referring to separately prepared sound source data. This PCM data and the PCM data of the first audio data (singing data) are then added, or added with weights, with saturation arithmetic applied as necessary, to perform the mixing.
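The weighted addition with saturation described above can be sketched as follows. This is a minimal illustration, assuming 16-bit signed PCM samples held in Python lists; the function name `mix_pcm` and its weight parameters are hypothetical and do not appear in the specification.

```python
INT16_MIN, INT16_MAX = -32768, 32767  # assumed 16-bit signed PCM range


def mix_pcm(singing, accompaniment, singing_weight=1.0, accompaniment_weight=1.0):
    """Mix two PCM sample sequences by weighted addition with saturation."""
    mixed = []
    for s, a in zip(singing, accompaniment):
        value = int(round(singing_weight * s + accompaniment_weight * a))
        # Saturation arithmetic: clamp instead of letting the sum wrap around.
        mixed.append(max(INT16_MIN, min(INT16_MAX, value)))
    return mixed
```

For example, `mix_pcm([1000, 30000], [2000, 10000])` yields `[3000, 32767]`: the second sum exceeds the 16-bit range and is clamped rather than wrapped.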

The output unit then outputs the audio data resulting from the mixing. That is, the accompaniment data and the singing data are output from the speakers of a stereo system or television set connected to the game device, or from headphones, earphones, or the like, so that the accompaniment and the singing are heard together.

Meanwhile, the control unit controls the mixing ratio of the first audio data according to the intensity of the component of the pre-stored second audio data that is contained in the received first audio data. When audio is output from speakers, the user is presumed to be enjoying karaoke in the usual way; when audio is output from headphones or earphones, the user is presumed to be, for example, singing late at night or practicing alone.

When audio is output from speakers, the sound of the accompaniment data enters the microphone as environmental sound. When headphones or the like are used, on the other hand, the sound of the accompaniment data does not become environmental sound (unless sound leaks from the headphones). Therefore, the apparatus examines how much of the second audio data (accompaniment data) component is contained in the first audio data input from the microphone (the sum of the singing data and the environmental sound). If the intensity is large, it infers that karaoke is being performed with speakers; if it is small, it infers that headphones or the like are in use. When headphones or the like are in use, the input receiving unit and the mixing unit are controlled so that the proportion of the audio input from the microphone is increased, so that the singer does not need to sing loudly.
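One way to measure how much of the accompaniment is contained in the microphone signal is a least-squares projection of the microphone samples onto the accompaniment samples. The sketch below is only an illustration under stated assumptions: the two sequences are assumed to be time-aligned (a real system would have to compensate for the acoustic delay), and the function names and the threshold of 0.05 are hypothetical, not taken from the specification.

```python
import math


def accompaniment_component(mic, accompaniment):
    """Least-squares estimate of how strongly `accompaniment` appears in `mic`.

    Returns g such that g * accompaniment best explains the microphone
    signal (assumes the two sequences are time-aligned).
    """
    energy = sum(a * a for a in accompaniment)
    if energy == 0.0:
        return 0.0  # silent accompaniment: nothing to detect
    return sum(m * a for m, a in zip(mic, accompaniment)) / energy


def guess_output_device(component, threshold=0.05):
    """Hypothetical decision rule: strong leakage suggests speakers."""
    return "speaker" if abs(component) >= threshold else "headphones"


# Synthetic check: the "voice" and "accompaniment" are sinusoids at
# different frequencies, so they are orthogonal over the window.
n = 480
accomp = [math.sin(2 * math.pi * 10 * i / n) for i in range(n)]
voice = [0.5 * math.sin(2 * math.pi * 3 * i / n) for i in range(n)]
with_speaker = [v + 0.3 * a for v, a in zip(voice, accomp)]

print(guess_output_device(accompaniment_component(with_speaker, accomp)))  # speaker
print(guess_output_device(accompaniment_component(voice, accomp)))         # headphones
```

In practice the threshold would have to be tuned for the room, the microphone, and the playback volume.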

According to the present invention, it is possible to provide a karaoke apparatus that appropriately estimates whether the karaoke audio output is being enjoyed through speakers or through headphones or the like, and adjusts the intensity of the singing data in the mix accordingly.

In the karaoke apparatus of the present invention, the control unit can be configured to control the mixing ratio of the first audio data so that the intensity of the first audio data in the mix becomes an intensity associated in advance with the intensity of the second audio data component. For example, the intensity of the singing data is divided into several levels in advance, each level is associated with a target output intensity for the singing data, and a corresponding mixing ratio is defined. The singing data is then mixed and output at the associated output intensity.
This is one of the preferred embodiments of the above invention. It can be applied, for example, when the way karaoke is performed in typical environments has been investigated by experiment and the volume of the singing data from the microphone is to be controlled appropriately on the basis of the results.
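A pre-associated, level-based mapping of this kind can be sketched as a lookup table. The specification gives no concrete values, so the thresholds and gains below are hypothetical. Keying the table on the detected accompaniment component, and assigning a higher singing gain to weaker detection, are also assumptions, taken from the headphone case described earlier.

```python
import bisect

# Hypothetical level boundaries for the detected accompaniment component.
COMPONENT_THRESHOLDS = [0.02, 0.10]
# Hypothetical singing gain for each level (one more entry than thresholds):
# little leakage (headphones likely) -> higher gain for the microphone input.
SINGING_GAINS = [2.0, 1.5, 1.0]


def singing_gain_for(component_strength):
    """Return the pre-associated singing gain for a detected component strength."""
    level = bisect.bisect_right(COMPONENT_THRESHOLDS, component_strength)
    return SINGING_GAINS[level]
```

A table like this would normally be filled in from measurements made in typical listening environments, as the paragraph above suggests.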

In the karaoke apparatus of the present invention, the control unit can also be configured to control the mixing ratio of the first audio data so that the intensity of the first audio data in the mix increases monotonically with the intensity of the second audio data component.
That is, the mixing ratio is controlled so that the louder the accompaniment sound entering the microphone, the louder the singing is made.
This is one of the preferred embodiments of the above invention, and makes it possible to provide a karaoke apparatus that automatically and appropriately adjusts the accompaniment volume and the singing volume when mixing them.

In the karaoke apparatus of the present invention, the control unit controls the mixing ratio of the first audio data by changing the sensitivity of the input receiving unit.
Many microphones, not only those used with game devices but general-purpose ones as well, allow their directivity or sensitivity to be adjusted. In this configuration, the output intensity of the singing data is therefore adjusted by lowering the microphone sensitivity when speakers are used and raising it when headphones or the like are used.
This is one of the preferred embodiments of the above invention, and makes it possible to provide a karaoke apparatus that automatically and appropriately adjusts the microphone sensitivity when mixing accompaniment and singing.

In the karaoke apparatus of the present invention, the control unit can be configured to control the mixing ratio of the first audio data by changing the amplification factor applied to the first audio data in the mix.
As described above, mixing is performed by weighted addition, and the ratio between the output volumes of the accompaniment and the singing can be adjusted by changing the weights. In this configuration, the output intensity of the singing data is therefore adjusted by lowering the amplification factor of the microphone input when speakers are used and raising it when headphones or the like are used.
This is one of the preferred embodiments of the above invention, and makes it possible to provide a karaoke apparatus that automatically and appropriately adjusts the weight of the microphone input when mixing accompaniment and singing.

In the karaoke apparatus of the present invention, the control unit can be configured so that, each time predetermined test audio data within the second audio data is mixed and output, it acquires the intensity of the component of that test audio data contained in the first audio data, and controls the subsequent mixing ratio of the first audio data according to the acquired intensity.

That is, rather than using the accompaniment sound itself to determine whether speakers or headphones are in use, special data called test audio data is output at predetermined timings, such as before the karaoke performance begins or partway through it, and the output intensity of the singing data is determined by how much of this data comes back in through the microphone.
By adopting dedicated test audio data, this configuration makes it possible to determine more accurately whether speakers or headphones or the like are being used.

In the karaoke apparatus of the present invention, the storage unit can be configured to store in advance, as the predetermined test audio data, audio data outside the human audible range.
That is, sound that humans cannot hear is used to determine whether speakers or headphones or the like are in use.
This configuration makes it possible to determine accurately whether speakers or headphones or the like are in use, even in the middle of a karaoke song, without disturbing the singer.
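Detecting a single-frequency test tone in the microphone signal can be sketched with the Goertzel algorithm, which evaluates one DFT bin cheaply. The specification does not name a detection method, so the use of Goertzel here is a substitution for illustration. The sample rate of 48 kHz, the 20 kHz tone frequency (near the upper edge of human hearing), and the window length are assumptions, as is the premise that the speaker and microphone can reproduce and capture that frequency.

```python
import math


def tone_amplitude(samples, sample_rate, freq):
    """Estimate the amplitude of a sinusoid at `freq` with the Goertzel algorithm."""
    n = len(samples)
    if n == 0:
        return 0.0
    k = round(n * freq / sample_rate)  # nearest DFT bin to the target frequency
    coeff = 2.0 * math.cos(2.0 * math.pi * k / n)
    s_prev, s_prev2 = 0.0, 0.0
    for x in samples:
        s = x + coeff * s_prev - s_prev2
        s_prev2, s_prev = s_prev, s
    # Squared magnitude of DFT bin k.
    power = s_prev * s_prev + s_prev2 * s_prev2 - coeff * s_prev * s_prev2
    # A sinusoid of amplitude A centered on bin k has |X[k]| = A * n / 2.
    return 2.0 * math.sqrt(max(power, 0.0)) / n


SAMPLE_RATE = 48000  # assumed sampling rate in Hz
TEST_FREQ = 20000    # assumed test-tone frequency in Hz
WINDOW = 480         # 10 ms window; the tone falls exactly on bin 200

test_tone = [0.1 * math.sin(2 * math.pi * TEST_FREQ * i / SAMPLE_RATE)
             for i in range(WINDOW)]
```

Comparing `tone_amplitude(mic_samples, SAMPLE_RATE, TEST_FREQ)` with the amplitude at which the tone was emitted then gives a measure of how much of the test audio reaches the microphone.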

The karaoke apparatus of the present invention can also be configured to include, in place of the storage unit, an acquisition unit that acquires the second audio data via a computer communication network, and to use "second audio data acquired via a computer communication network" in place of "second audio data stored in advance".

This relates to a modified embodiment of the above invention. It applies when accompaniment data is streamed via a computer communication network, with delivery and audio output performed in a pipelined manner, so that the accompaniment data is not downloaded to a hard disk or the like of the game device.
Even when karaoke accompaniment data is provided by streaming, this configuration makes it possible to determine whether speakers or headphones or the like are in use and to adjust the intensity of the microphone input appropriately.

The karaoke method according to the second aspect of the present invention includes an input receiving step, a mixing step, an output step, and a control step, and is configured as follows.
That is, in the input receiving step, input of first audio data is received.
In the mixing step, the first audio data whose input has been received is mixed with second audio data stored in advance.
Further, in the output step, the audio data resulting from the mixing is output.

In the control step, the mixing ratio of the first audio data is controlled according to the intensity of the component of the pre-stored second audio data that is contained in the received first audio data.
A program according to another aspect of the present invention is configured to cause a computer to function as each unit of the above karaoke apparatus, or to cause a computer to execute each step of the above karaoke method.

The program of the present invention can be recorded on a computer-readable information recording medium such as a compact disc, flexible disk, hard disk, magneto-optical disk, digital video disc, magnetic tape, or semiconductor memory. The program can be distributed and sold via a computer communication network independently of the computer on which it is executed. The information recording medium can likewise be distributed and sold independently of that computer.

According to the present invention, it is possible to provide a karaoke apparatus and a karaoke method suitable for appropriately estimating the environment in which karaoke is performed and performing karaoke in a manner suited to that environment, as well as a program for realizing these on a computer.

Embodiments of the present invention are described below. For ease of understanding, the embodiment described applies the present invention to a game device, but the invention can be applied in the same way to information processing devices such as various computers, PDAs, and mobile phones. That is, the embodiments described below are for explanation and do not limit the scope of the present invention. Those skilled in the art can therefore adopt embodiments in which some or all of these elements are replaced with equivalents, and such embodiments are also included in the scope of the present invention.

FIG. 1 is a schematic diagram showing the general configuration of a typical game device on which a karaoke apparatus according to one embodiment of the present invention is realized. The following description refers to this figure.

The game device 100 includes a CPU (Central Processing Unit) 101, a ROM (Read Only Memory) 102, a RAM (Random Access Memory) 103, an interface 104, a controller 105, an external memory 106, an image processing unit 107, a DVD (Digital Versatile Disk)-ROM drive 108, and a NIC (Network Interface Card) 109.

When a DVD-ROM storing a game program and data is loaded into the DVD-ROM drive 108 and the game device 100 is powered on, the program is executed and the karaoke apparatus of this embodiment is realized.

The CPU 101 controls the operation of the game device 100 as a whole, and is connected to each component to exchange control signals and data.

The ROM 102 stores an IPL (Initial Program Loader) that is executed immediately after power-on. When the IPL is executed, the program recorded on the DVD-ROM is read into the RAM 103 and its execution by the CPU 101 begins. The ROM 102 also stores the operating system program and various data needed to control the operation of the game device 100 as a whole.

The RAM 103 temporarily stores data and programs. It holds the programs and data read from the DVD-ROM, as well as other data needed for game progress and chat communication.

The controller 105, connected via the interface 104, receives the operation input that the user performs when running a game such as karaoke.

The external memory 106, detachably connected via the interface 104, rewritably stores data indicating the karaoke play history (songs sung in the past, scoring results, and so on), data indicating the progress of games, chat communication logs, and the like. The user can record this data in the external memory 106 as needed by entering instructions via the controller 105.

The DVD-ROM loaded into the DVD-ROM drive 108 stores the program for realizing the game, together with the image data and audio data that accompany the game. Under the control of the CPU 101, the DVD-ROM drive 108 reads the loaded DVD-ROM to retrieve the necessary programs and data, which are temporarily stored in the RAM 103 or the like.

The image processing unit 107 processes data read from the DVD-ROM using the CPU 101 or an image arithmetic processor (not shown) included in the image processing unit 107, and then records the result in a frame memory (not shown) included in the image processing unit 107. The image information recorded in the frame memory is converted into a video signal at predetermined synchronization timing and output to a monitor (not shown) connected to the image processing unit 107. This enables various kinds of image display.

The image arithmetic processor can perform, at high speed, two-dimensional image overlay operations, transparency operations such as α blending, and various saturation operations.
It can also perform, at high speed, the operation of rendering polygon information that is arranged in a virtual three-dimensional space and has various texture information attached, using the Z-buffer method, to obtain a rendered image of the polygons as viewed from a predetermined viewpoint position.

Furthermore, through the cooperation of the CPU 101 and the image arithmetic processor, character strings can be drawn as two-dimensional images into the frame memory or onto polygon surfaces according to font information that defines character shapes. The font information is recorded in the ROM 102, but dedicated font information recorded on the DVD-ROM can also be used.

The NIC 109 connects the game device 100 to a computer communication network (not shown) such as the Internet. It consists of a device conforming to the 10BASE-T/100BASE-T standards used to build a LAN (Local Area Network), an analog modem, ISDN (Integrated Services Digital Network) modem, or ADSL (Asymmetric Digital Subscriber Line) modem for connecting to the Internet over a telephone line, a cable modem for connecting to the Internet over a cable television line, or the like, together with an interface (not shown) that mediates between these and the CPU 101.

The audio processing unit 110 converts audio data read from the DVD-ROM into an analog audio signal and outputs it from a speaker (not shown) connected to the unit. Under the control of the CPU 101, it also generates the sound effects and music data to be produced as the game progresses, and outputs the corresponding sound from the speaker.

When the audio data recorded on the DVD-ROM is MIDI data, the audio processing unit 110 converts it into PCM data by referring to the sound source data that the unit holds. When the audio data is compressed, for example in ADPCM or Ogg Vorbis format, the unit decompresses it into PCM data. PCM data can be output as sound by performing D/A (Digital/Analog) conversion at timing corresponding to its sampling frequency and sending the result to a speaker.

In addition, a microphone 111 can be connected to the game device 100 via the interface 104. In this case, the analog signal from the microphone 111 is A/D converted at an appropriate sampling frequency into a PCM-format digital signal, so that it can be processed, for example mixed, by the audio processing unit 110.

When the game device 100 is used as a karaoke apparatus, audio data read from the DVD-ROM, or audio data acquired from a computer communication network via the NIC 109, serves as the accompaniment data, and audio data input from the microphone serves as the singing data. The audio processing unit 110 mixes the accompaniment data and the singing data and outputs the result from the speaker. Headphones (not shown) or earphones (not shown) can also be used in place of the speaker to output the sound.

The game device 100 may also be configured to use a large-capacity external storage device such as a hard disk to perform the same functions as the ROM 102, the RAM 103, the external memory 106, the DVD-ROM loaded into the DVD-ROM drive 108, and the like.

(Configuration of the karaoke apparatus)

FIG. 2 is an explanatory diagram showing the general configuration of the karaoke apparatus realized on the game device 100 or the like. FIG. 3 is a flowchart showing the flow of control of the karaoke method executed by the karaoke apparatus. The following description refers to these figures.

The karaoke apparatus 201 according to this embodiment includes an input receiving unit 202, a storage unit 203, a mixing unit 204, an output unit 205, and a control unit 206.

Here, the storage unit 203 stores the second audio data in advance. In this embodiment, the DVD-ROM loaded into the DVD-ROM drive 108, a hard disk or the like of another computer connected via the NIC 109, the RAM 103 or the like that temporarily stores downloaded music data, and so on function as the storage unit 203. The second audio data corresponds to the karaoke accompaniment data.

なお、コンピュータ通信網を介したストリーム配信により伴奏データを取得する場合には、ストリーム配信を受け付ける取得部（図示せず。）を、当該記憶部２０３のかわりに設けることとし、当該取得部が、ストリーム配信された伴奏データを、第２の音声データとして、随時混合部２０４に与えるような形態を採用することもできる。 When accompaniment data is acquired by stream distribution via a computer communication network, an acquisition unit (not shown) that accepts stream distribution is provided instead of the storage unit 203, and the acquisition unit A form in which the stream-accompanied accompaniment data is given to the mixing unit 204 as the second audio data may be employed.

First, the input receiving unit 202 receives input of first audio data of a predetermined time length (step S301). In the present embodiment, the microphone 111 functions as the input receiving unit 202. The first audio data contains both singing data (the audio of the song sung by the singer) and environmental sound (various noise, external sounds unrelated to the karaoke, and so on).

A highly directional microphone is often used as the karaoke microphone 111, and when the microphone 111 is placed in front of the singer's mouth, for example with a headset, the average intensity of the singing data is generally far greater than the average intensity of the environmental sound.

The input receiving unit 202 stores the A/D-converted first audio data from the microphone 111 in a buffer of a predetermined size prepared in the RAM 103 or the like. The subsequent processing is therefore repeated in units of the time span that corresponds to this buffer size.

The subsequent processing does not start until one buffer length of input from the microphone 111 has accumulated. Because the input from the microphone 111 is A/D converted at a suitable sampling rate and accumulated as described above, this capture is typically carried out using interrupts or concurrent processing.

Accordingly, in step S301 the CPU 101 waits, as far as this processing is concerned, until a buffer-length fragment of the first audio data has accumulated, and it can execute other processing in parallel in the meantime.

Next, the mixing unit 204 reads the weights stored in the variable areas a and b prepared in advance in the RAM 103 or the like (step S302).

It then obtains a fragment of the above predetermined time length from the second audio data (accompaniment data), taking fragments in order from the beginning (step S303). Because the audio data input from the microphone 111 has been A/D converted into PCM data as described above, accompaniment data of the same length as that PCM data is obtained each time.

The first audio data whose input was received (singing data plus environmental sound) is then mixed, at the mixing ratio a:b given by these weights, with the fragment of the pre-stored second audio data (accompaniment data) that corresponds to that first audio data (step S304). In other words, the CPU 101 works together with the audio processing unit 110 to function as the mixing unit 204.

Specifically, if the accompaniment data is MIDI data, it is converted into PCM data, that is, audio waveform data, by referring to separately prepared sound source data; if it is compressed data, it is decompressed and converted into PCM data.

The PCM data corresponding to the first audio data and the PCM data corresponding to the second audio data are then combined by weighted saturating addition, a technique used for mixing audio data.

PCM data is a sequence of values obtained by sampling an analog signal at a suitable sampling frequency and digitizing each sample with a predetermined precision. Typical sampling frequencies are 44100 Hz, 48000 Hz, or these values divided by an integer, and typical precisions are 16, 24, or 32 bits.

For ease of understanding, let the sequence of PCM data corresponding to the first audio data be, in order from the beginning,
v[0], v[1], v[2], …
and likewise let the sequence of PCM data corresponding to the second audio data be
s[0], s[1], s[2], …

In the simplest mixing method, the mixing result that is output is
v[0]+s[0], v[1]+s[1], v[2]+s[2], …

In weighted mixing, where the weight for the first audio data is a and the weight for the second audio data is b, the mixing result that is output is
a·v[0]+b·s[0], a·v[1]+b·s[1], a·v[2]+b·s[2], …
With weighted mixing in karaoke, increasing a makes the human voice louder in playback, and increasing b makes the accompaniment louder.

Saturation arithmetic is arithmetic that takes the sampling precision into account. For example, when samples are 16-bit signed integers, every value in the PCM data sequence lies in the range -32768 to 32767. Because A/D and D/A conversion are performed with 16-bit signed integers, the result of mixing must also stay within this range.

Consider therefore the function f(·) defined by
f(x) = 32767 (x > 32767);
f(x) = -32768 (x < -32768);
f(x) = x (otherwise)
With saturation arithmetic, the mixing result is
f(v[0]+s[0]), f(v[1]+s[1]), f(v[2]+s[2]), …
or
f(a·v[0]+b·s[0]), f(a·v[1]+b·s[1]), f(a·v[2]+b·s[2]), …
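
The weighted saturating addition of step S304 can be sketched as follows. This is a minimal illustration, not the apparatus's actual implementation: the function names `saturate` and `mix` are hypothetical, and rounding the weighted sum to the nearest integer is an assumption, since the description does not say how fractional results are handled.

```python
INT16_MIN, INT16_MAX = -32768, 32767

def saturate(x):
    """f(x): clamp a value to the 16-bit signed range."""
    return max(INT16_MIN, min(INT16_MAX, x))

def mix(v, s, a=1.0, b=1.0):
    """Weighted saturating mix of voice samples v and accompaniment samples s.

    Rounding to the nearest integer is an assumption of this sketch.
    """
    return [saturate(round(a * vi + b * si)) for vi, si in zip(v, s)]

voice = [30000, -30000, 100]
accompaniment = [10000, -10000, 50]
print(mix(voice, accompaniment))                # [32767, -32768, 150]
print(mix(voice, accompaniment, a=0.5, b=1.0))  # [25000, -25000, 100]
```

With equal weights the first two samples overflow the 16-bit range and are clamped; halving the voice weight a keeps them in range.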

The output unit 205 then outputs the mixed audio data (step S305). In the present embodiment, the audio processing unit 110 works together with the speaker, headphones, earphones, or the like connected to it to function as the output unit 205.

Specifically, the PCM data produced by the mixing described above is D/A converted at the sampling frequency into an analog signal, which is applied to the speaker or the like to produce sound.

In general, when mixed audio data is present in the RAM 103, the audio processing unit 110 uses interrupts or the like to process that data sequentially and output it from the speaker or the like. In that case the processing of step S305 completes immediately, and the above processing in the audio processing unit 110 proceeds in parallel with the subsequent steps.

In the present embodiment, the control unit 206 compares the first audio data whose input was received,
v[0], v[1], v[2], …
with the second audio data,
s[0], s[1], s[2], …
and adjusts the weights a and b according to the result (step S306). The CPU 101 thus works together with the audio processing unit 110 to function as the control unit 206.

An example of a specific comparison method, and of a method for changing the weights based on it, is described below.

First, consider the situation in which the second audio data is output from a speaker or the like via the audio processing unit 110 and that sound enters the microphone 111 as environmental sound. A delay arises in this case; let the maximum delay time be Tc (seconds).

If the sampling frequency of the audio data is f (Hz), the maximum delay offset T at which a component of the second audio data can appear in the first audio data can be expressed as T = Tc·f.
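
As a worked example of T = Tc·f, the short sketch below converts a delay in seconds to an offset in samples. The function name and the delay values of 0.05 s and 0.1 s are hypothetical, chosen only for illustration, and rounding to a whole number of samples is an assumption.

```python
def max_delay_offset(tc_seconds, sample_rate_hz):
    """T = Tc * f, rounded to a whole number of samples (rounding is an assumption)."""
    return round(tc_seconds * sample_rate_hz)

print(max_delay_offset(0.05, 44100))  # 2205
print(max_delay_offset(0.1, 48000))   # 4800
```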

Let W be the width over which the degree to which the second audio data component appears is examined. This is what is called a "window" in fields such as synchronization for wireless communication. In the present embodiment, this window is slid along the data to estimate how strong the second audio data component is.

First, let i be the index of the sample of the second audio data that is about to be processed at a given moment. The second audio data used for the comparison is the sequence of W values
s[i], s[i+1], …, s[i+W-1]

This is compared, for t in the range 0 < t ≤ T, with the first audio data
v[i+t], v[i+t+1], …, v[i+t+W-1]

Regard each of these W-element sequences as a vector. That is, let
S(i) = (s[i], s[i+1], …, s[i+W-1]);
V(i,t) = (v[i+t], v[i+t+1], …, v[i+t+W-1])
and compute the following evaluation value e for them:
e(i,t) = (S(i)·V(i,t)) / (|S(i)| |V(i,t)|)

The evaluation value e(i,t) is the inner product of the vectors S(i) and V(i,t) divided by the product of their magnitudes. If θ is the angle between the two vectors, e(i,t) equals cos θ. The evaluation value e(i,t) therefore takes a value between -1 and 1, and when it is 1 the vectors S(i) and V(i,t) point in the same direction.

In terms of audio data, e(i,t) = 1 means that the two signals are in phase and one is an amplified copy of the other, while e(i,t) = -1 means that they are in antiphase.

Now vary t over the range 0 < t ≤ T and find the value of t at which the evaluation value e(i,t) is largest. Call this value τ(i).

τ(i) can be presumed to correspond to the delay between the accompaniment data being output as sound and its returning through the microphone 111.
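
The evaluation value e(i,t) and the search for τ(i) might be sketched as below. The function names, the synthetic test signal, and the choice to return 0 when either vector has zero magnitude are assumptions of this sketch; the description does not address the zero-magnitude case.

```python
import math

def evaluate(s, v, i, t, w):
    """e(i,t): cosine of the angle between S(i) and V(i,t)."""
    S = s[i:i + w]
    V = v[i + t:i + t + w]
    dot = sum(x * y for x, y in zip(S, V))
    norm = math.sqrt(sum(x * x for x in S)) * math.sqrt(sum(y * y for y in V))
    return dot / norm if norm else 0.0  # zero-magnitude guard is an assumption

def estimate_delay(s, v, i, w, t_max):
    """Return (tau, e(i,tau)), where tau maximizes e(i,t) over 0 < t <= t_max."""
    tau = max(range(1, t_max + 1), key=lambda t: evaluate(s, v, i, t, w))
    return tau, evaluate(s, v, i, tau, w)

# Synthetic check: the "microphone" signal is the accompaniment at half
# amplitude, delayed by 5 samples.
s = [((k * 37) % 101) - 50 for k in range(64)]
v = [0.0] * 5 + [0.5 * x for x in s]
tau, e = estimate_delay(s, v, i=0, w=16, t_max=10)
print(tau, round(e, 6))  # 5 1.0
```

The caller is expected to supply at least i + t_max + w samples of v so that every window has the full width W.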

At t = τ(i), the proportion of the second audio data (accompaniment data) component contained in the first audio data (singing data plus environmental sound) is e(i,τ(i)). This value indicates the "correlation" between the two.

The mixing weights for the first and second audio data are changed according to the magnitude of this value.

The simplest method is to determine a threshold in advance by experiment, measuring how much e(i,τ(i)) differs in a typical environment between using a speaker and using headphones or the like. If e(i,τ(i)) is at or above the threshold, the output is presumed to be from a speaker; if it is below the threshold, the output is presumed to be from headphones or the like.

Values of the weights a and b are prepared in advance for speaker output and for headphone output, and the apparatus switches between them according to the value of e(i,τ(i)).
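
The threshold-based switching could look like the following sketch. The threshold and the two weight pairs are hypothetical placeholders; the description says only that such values are obtained by experiment and prepared in advance.

```python
# Hypothetical values; in practice these would come from experiments.
THRESHOLD = 0.2
SPEAKER_WEIGHTS = (0.6, 1.0)    # (a, b) used when a speaker is presumed
HEADPHONE_WEIGHTS = (1.0, 1.0)  # (a, b) used when headphones are presumed

def select_weights(e, threshold=THRESHOLD):
    """Pick the (a, b) pair according to whether e(i,tau(i)) reaches the threshold."""
    return SPEAKER_WEIGHTS if e >= threshold else HEADPHONE_WEIGHTS

print(select_weights(0.5))   # (0.6, 1.0)
print(select_weights(0.05))  # (1.0, 1.0)
```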

Another method uses suitable positive constants A and C and varies the weight a as
a = A - C·e(i,τ(i))
With headphone output, the evaluation value e(i,τ(i)) is expected to be nearly 0, so the weight is a = A. With speaker output, e(i,τ(i)) takes some positive value, so the weight is smaller than with headphone output.

As a result, the singer needs less vocal volume with headphone output than with speaker output.
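
A minimal sketch of the linear rule a = A - C·e(i,τ(i)) follows. The default values A = 1.0 and C = 0.5 are hypothetical; the description leaves the constants to be tuned for the environment.

```python
def voice_weight(e, A=1.0, C=0.5):
    """Weight a for the singing data: a = A - C * e, with A and C positive constants."""
    return A - C * e

print(voice_weight(0.0))  # headphone-like case: a equals A
print(voice_weight(0.4))  # speaker-like case: a is smaller than A
```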

The values of these constants can be changed as appropriate for the environment in which the invention is applied.

In an ordinary karaoke setting, the output is unlikely to be switched frequently between a speaker and headphones. The sequence corresponding to the change in delay time,
τ(0), τ(1), τ(2), …
and the sequence representing the degree to which the accompaniment data re-enters through the microphone 111,
e(0,τ(0)), e(1,τ(1)), e(2,τ(2)), …
should therefore both be nearly constant, and any abrupt change in them can be attributed to chance causes such as noise.

For this reason, instead of using
e(0,τ(0)), e(1,τ(1)), e(2,τ(2)), …
directly to determine the weights, a method such as taking the average over a certain interval may be used.

For example, using a positive integer N of suitable size, the delay time can be averaged as
μ(i) = (1/N) Σ_{j=0}^{N-1} τ(i+j)
and from this,
e(0,μ(0)), e(1,μ(1)), e(2,μ(2)), …
or, using a suitable positive integer M,
e(M,μ(0)), e(M+1,μ(1)), e(M+2,μ(2)), …
gives the proportion of the second audio data component contained in the first audio data (and its change over time).

Alternatively, letting
E(i) = (1/N) Σ_{j=0}^{N-1} e(i+j,τ(i+j))
the sequence
E(0), E(1), E(2), …
may be adopted as the proportion of the second audio data component contained in the first audio data.
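
The averaged sequence E(i) can be sketched as a simple moving average over the per-step evaluation values. The function name and the sample values are hypothetical; the input list is assumed to already hold e(0,τ(0)), e(1,τ(1)), and so on.

```python
def smoothed(e_values, n):
    """E(i) = (1/N) * sum of e_values[i .. i+N-1], for every i with a full window."""
    return [sum(e_values[i:i + n]) / n for i in range(len(e_values) - n + 1)]

# A single outlier (-0.25) is damped by averaging over N = 4 steps.
print(smoothed([0.5, 0.5, -0.25, 0.5, 0.5], 4))  # [0.3125, 0.3125]
```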

These methods remove the influence of noise and the like, while still coping with a switch between speaker output and headphone output partway through a karaoke performance.

When the processing for the predetermined time length has finished in this way, it is determined whether the karaoke play has ended (step S307). If it has ended (step S307; Yes), this processing terminates. If it has not (step S307; No), the processing returns to step S301.
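
The overall loop of steps S301 to S307 might be organized as in the sketch below. It is a deliberately simplified, synchronous illustration: the description has capture and playback running through interrupts or parallel processing, whereas here microphone blocks arrive from an iterable and the output is collected in a list. The names `run_karaoke`, `clamp16`, and the `adjust` callback are hypothetical.

```python
def clamp16(x):
    """Clamp to the 16-bit signed range."""
    return max(-32768, min(32767, x))

def run_karaoke(mic_blocks, accompaniment, block_len, adjust, a=1.0, b=1.0):
    """Process one block per iteration, following steps S301 to S307."""
    output = []
    pos = 0
    for v in mic_blocks:                        # S301: next block of microphone input
        wa, wb = a, b                           # S302: read the current weights
        s = accompaniment[pos:pos + block_len]  # S303: matching accompaniment fragment
        mixed = [clamp16(round(wa * x + wb * y)) for x, y in zip(v, s)]  # S304
        output.extend(mixed)                    # S305: hand the block to the output stage
        a, b = adjust(v, s, a, b)               # S306: compare v with s, adjust weights
        pos += block_len
        if pos >= len(accompaniment):           # S307: has the song finished?
            break
    return output

# Toy run: the adjust callback simply halves the voice weight after each block.
halve_voice = lambda v, s, a, b: (a / 2, b)
print(run_karaoke([[100, 100], [100, 100]], [10, 20, 30, 40], 2, halve_voice))
# [110, 120, 80, 90]
```

In a fuller version, the `adjust` callback would compute the evaluation value from v and s and derive new weights from it.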

The present embodiment changes the weights a and b, which determine the mixing ratio, according to the component proportion. Depending on its construction, however, the microphone 111 may allow its sensitivity itself, or the gain of an amplifier it contains, to be changed. Instead of changing the weights a and b, the apparatus may therefore control the sensitivity of the microphone 111 or the gain of the amplifier.

As described above, the present embodiment makes it possible to realize a karaoke apparatus that, while karaoke is being enjoyed, appropriately estimates whether the sound is being output from a speaker or from headphones or the like and automatically changes the mixing proportion of the singing data accordingly, prompting the singer to sing loudly in the former case and quietly in the latter case.

In the embodiment above, the intensity at which the first audio data is output was controlled by obtaining the correlation between the first and second audio data, but a simpler method can also be adopted.

That method is to include in the accompaniment data "audio data that the singer does not regard as part of the song", that is, "audio data that is output in a section during which the singer does not sing". This is used as the test audio data.

Typical candidates for the test audio data are the rhythm-marking section that is often placed at the beginning of the accompaniment data, or the so-called intro section.

While the test audio data is being output, the audio data coming from the microphone 111 can be assumed to contain almost none of the singer's voice. The microphone input is therefore the sum of the speaker output component and noise when a speaker is in use, and noise alone when headphones or the like are in use.

Before that section is played (before the intro), the input from the microphone 111 can be assumed to be noise alone for both speaker output and headphone output, because before the intro no accompaniment is being output from the speaker either and the output should be silent.

The intensity of the input signal from the microphone 111 before the test audio data is played is therefore compared with its intensity while the test audio data is being played. If the two differ markedly (by at least a predetermined threshold), the output is judged to be from a speaker; otherwise it is judged to be from headphones or the like.
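
This before-and-during comparison could be sketched as follows. Measuring intensity as the root mean square of the samples and comparing the difference (rather than, say, a ratio) against the threshold are both assumptions of this sketch; the description specifies only that the two intensities are compared against a predetermined threshold.

```python
import math

def rms(samples):
    """Root-mean-square level of a block of samples (0.0 for an empty block)."""
    return math.sqrt(sum(x * x for x in samples) / len(samples)) if samples else 0.0

def is_speaker_output(before, during, threshold):
    """True if the microphone level rises by at least `threshold` while test audio plays."""
    return rms(during) - rms(before) >= threshold

noise = [10, -10, 10, -10]
print(is_speaker_output(noise, [1000, -1000, 1000, -1000], threshold=500))  # True
print(is_speaker_output(noise, [12, -12, 12, -12], threshold=500))          # False
```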

Compared with the embodiment above, this embodiment requires very little processing for the judgment, so the invention can be applied even to a game apparatus 100 with limited computing power.

The embodiment described next is suited to the case where the game apparatus 100 implements frequency decomposition, such as the Fourier transform, in hardware.

A sound component of a predetermined frequency and predetermined intensity is embedded in the accompaniment data and used as the test audio data.

Meanwhile, the input from the microphone 111 is subjected to a fast Fourier transform to obtain the intensity of that predetermined frequency component.

This intensity indicates how much of the test audio data is returning to the microphone 111, and comparing it with a predetermined threshold determines whether the output is from a speaker or from headphones or the like.
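
The description relies on a fast Fourier transform, possibly in hardware. As a software illustration of the same idea, the sketch below substitutes a direct single-bin discrete Fourier transform, which measures only the one frequency of interest. The function names, the 1000 Hz test tone, the 8000 Hz sampling rate, and the threshold are all hypothetical.

```python
import math

def tone_strength(samples, freq_hz, sample_rate_hz):
    """Amplitude of the freq_hz component, computed as a single DFT bin."""
    n = len(samples)
    if n == 0:
        return 0.0
    w = 2.0 * math.pi * freq_hz / sample_rate_hz
    re = sum(x * math.cos(w * k) for k, x in enumerate(samples))
    im = sum(x * math.sin(w * k) for k, x in enumerate(samples))
    return 2.0 * math.hypot(re, im) / n

def is_speaker_output(mic_samples, freq_hz, sample_rate_hz, threshold):
    """True if the test tone comes back through the microphone at or above the threshold."""
    return tone_strength(mic_samples, freq_hz, sample_rate_hz) >= threshold

RATE, FREQ = 8000, 1000
tone = [1000.0 * math.sin(2 * math.pi * FREQ * k / RATE) for k in range(800)]
print(is_speaker_output(tone, FREQ, RATE, threshold=100.0))         # True
print(is_speaker_output([0.0] * 800, FREQ, RATE, threshold=100.0))  # False
```

The amplitude estimate is exact only when the block holds a whole number of cycles of the test tone, as it does here (100 cycles in 800 samples).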

In this embodiment, the test audio data is preferably at a constant volume. In particular, if a sound at an inaudible frequency is used as the test audio data, it does not interfere with the enjoyment of karaoke, and it allows speaker use and headphone use to be distinguished more accurately.

As described above, the present invention can provide a karaoke apparatus 201, a karaoke method, and a program for realizing these on a computer that appropriately estimate the environment in which karaoke is taking place and are suited to performing karaoke in a way that matches that environment. The invention can be applied not only to a karaoke apparatus 201 used in a dedicated facility such as a karaoke box, but also to a karaoke apparatus 201 implemented on a general-purpose game apparatus or a general-purpose computer.

FIG. 1 is a schematic diagram showing the outline configuration of a typical game apparatus on which the karaoke apparatus according to an embodiment of the present invention is realized.
FIG. 2 is an explanatory diagram showing the schematic configuration of the karaoke apparatus of the present embodiment.
FIG. 3 is a flowchart showing the control flow of the karaoke method executed by the karaoke apparatus of the present embodiment.

Explanation of symbols

100 Game apparatus
101 CPU
102 ROM
103 RAM
104 Interface
105 Controller
106 External memory
107 Image processing unit
108 DVD-ROM drive
109 NIC
110 Audio processing unit
111 Microphone
201 Karaoke apparatus
202 Input receiving unit
203 Storage unit
204 Mixing unit
205 Output unit
206 Control unit

Claims

A karaoke apparatus that receives input of a singing sound uttered by a user to an accompaniment sound, mixes at least the singing sound and the accompaniment sound, and outputs the result to the user, the karaoke apparatus comprising:
an input receiving unit that receives input of first audio data including at least the singing sound;
a storage unit that stores in advance second audio data representing the accompaniment sound;
a control unit that acquires the intensity of the component of the pre-stored second audio data contained in the first audio data whose input was received and, based on that intensity, controls the mixing ratio of the first audio data to the second audio data, such that the mixing ratio decreases when the intensity increases;
a mixing unit that mixes the first audio data whose input was received with the pre-stored second audio data at the controlled mixing ratio; and
an output unit that outputs the mixed audio data to the user.

The karaoke apparatus according to claim 1, wherein
the control unit controls the mixing ratio so that the intensity of the first audio data contained in the mixed result becomes an intensity associated in advance with the intensity of the component of the second audio data.

The karaoke apparatus according to claim 1, wherein
the control unit has a predetermined mixing ratio for the case where the intensity is at or above a predetermined threshold and a predetermined mixing ratio for the case where the intensity is below the predetermined threshold, and controls the mixing ratio according to whether the intensity is at or above the predetermined threshold.

The karaoke apparatus according to any one of claims 1 to 3, wherein
the control unit controls the mixing ratio by changing the sensitivity of the input receiving unit.

The karaoke apparatus according to any one of claims 1 to 3, wherein
the control unit controls the mixing ratio by changing the amplification factor of the first audio data mixed in the mixing unit.

The karaoke apparatus according to any one of claims 1 to 5, wherein
each time predetermined test audio data within the second audio data is mixed and output, the control unit acquires the intensity of the component of the predetermined test audio data contained in the first audio data, and controls the subsequent mixing ratio according to the acquired intensity.

The karaoke apparatus according to any one of claims 1 to 6, comprising, in place of the storage unit,
an acquisition unit that acquires the second audio data via a computer communication network, wherein
"second audio data acquired via the computer communication network" is used in place of "second audio data stored in advance".

A karaoke method executed by a karaoke apparatus that receives input of a singing sound uttered by a user to an accompaniment sound, mixes at least the singing sound and the accompaniment sound, and outputs the result to the user, the karaoke apparatus having an input receiving unit, a control unit, a mixing unit, and an output unit, the karaoke method comprising:
an input receiving step in which the input receiving unit receives input of first audio data including at least the singing sound;
a control step in which the control unit acquires the intensity of the component of pre-stored second audio data contained in the first audio data whose input was received and, based on that intensity, controls the mixing ratio of the first audio data to the second audio data, such that the mixing ratio decreases when the intensity increases;
a mixing step in which the mixing unit mixes the first audio data whose input was received with the pre-stored second audio data representing the accompaniment sound at the controlled mixing ratio; and
an output step in which the output unit outputs the mixed audio data.

A program that causes a computer to receive input of a singing sound uttered by a user to an accompaniment sound, mix at least the singing sound and the accompaniment sound, and output the result to the user, the program causing the computer to function as:
an input receiving unit that receives input of first audio data including at least the singing sound;
a storage unit that stores in advance second audio data representing the accompaniment sound;
a control unit that acquires the intensity of the component of the pre-stored second audio data contained in the first audio data whose input was received and, based on that intensity, controls the mixing ratio of the first audio data to the second audio data, such that the mixing ratio decreases when the intensity increases;
a mixing unit that mixes the first audio data whose input was received with the pre-stored second audio data at the controlled mixing ratio; and
an output unit that outputs the mixed audio data to the user.