KR101666535B1

KR101666535B1 - Performance evaluation device, karaoke device, and server device

Info

Publication number: KR101666535B1
Application number: KR1020147025532A
Authority: KR
Inventors: Shuichi Matsumoto
Original assignee: Yamaha Corporation
Priority date: 2012-04-18
Filing date: 2013-04-18
Publication date: 2016-10-14
Also published as: WO2013157602A1; TWI497484B; KR20140124843A; CN104170006B; JP2013222140A; TW201407602A; JP5958041B2; CN104170006A

Abstract

A performance evaluation device comprising: expressive-performance reference data acquisition means for acquiring expressive-performance reference data that indicates an expressive performance to be carried out during performance of a piece of music, and the timing at which that expressive performance should be carried out in the piece, relative to the sounding start time of a note or note group included in the piece; pitch/volume data generation means for generating, from the sound of a performer's performance of the piece, pitch/volume data indicating the pitch and volume of that sound; and performance evaluation means for raising the evaluation of the performer's performance of the piece when, within a predetermined time range in the piece indicated by the expressive-performance reference data, a characteristic of at least one of the pitch and the volume indicated by the pitch/volume data generated by the pitch/volume data generation means exhibits the characteristic of the expressive performance that the expressive-performance reference data indicates should be carried out.

Description

Performance evaluation device, karaoke device, and server device

The present invention relates to a technique for evaluating the skill of a musical performance.

Various techniques have been proposed, for example, for singing karaoke devices (hereinafter simply "karaoke devices" unless otherwise noted) that have a scoring function for grading the skill of a singer's vocal performance. Patent Document 1 discloses a technique of this kind. The karaoke device disclosed in that document calculates, for each note of the song, the difference between the pitch extracted from the user's singing voice and the pitch extracted from data prepared in advance as a guide melody, and calculates a basic score from this difference. In addition, when singing that uses a technique such as vibrato or portamento is performed, the karaoke device calculates bonus points according to the number of times such singing was performed. The karaoke device presents the sum of the basic score and the bonus points to the user as the final evaluation result. With this technique, singing that uses difficult techniques such as vibrato and portamento can be reflected in the evaluation result.

Patent Documents 2 to 6, for example, disclose techniques for detecting, from the waveform of a singing voice, that singing using a technique such as vibrato or portamento has been performed.

Patent Document 1: Japanese Patent Application Laid-Open No. 2005-107334
Patent Document 2: Japanese Patent Application Laid-Open No. 2005-107330
Patent Document 3: Japanese Patent Application Laid-Open No. 2005-107087
Patent Document 4: Japanese Patent Application Laid-Open No. 2008-268370
Patent Document 5: Japanese Patent Application Laid-Open No. 2005-107336
Patent Document 6: Japanese Patent Application Laid-Open No. 2008-225115

However, with the technique of Patent Document 1, bonus points are added even when singing that uses a technique such as vibrato or portamento is performed at a passage where such singing is inherently undesirable. As a result, the score presented as the evaluation result diverges from an evaluation based on human sensibility.

The present invention has been made in view of this problem. Its object is to make it possible, when evaluating a musical performance such as karaoke singing, to present an evaluation result closer to one based on human sensibility.

To solve the above problem, the present invention provides a performance evaluation device comprising: expressive-performance reference data acquisition means for acquiring expressive-performance reference data that indicates an expressive performance to be carried out during performance of a piece of music, and the timing at which that expressive performance should be carried out in the piece, relative to the sounding start time of a note or note group included in the piece; pitch/volume data generation means for generating, from the sound of a performer's performance of the piece, pitch/volume data indicating the pitch and volume of that sound; and performance evaluation means for raising the evaluation of the performer's performance of the piece when, within a predetermined time range in the piece indicated by the expressive-performance reference data, a characteristic of at least one of the pitch and the volume indicated by the pitch/volume data generated by the pitch/volume data generation means exhibits the characteristic of the expressive performance that the expressive-performance reference data indicates should be carried out.

The present invention also provides a karaoke device comprising: the above performance evaluation device; accompaniment data acquisition means for acquiring accompaniment data that specifies the accompaniment of a piece of music; and sound signal output means for outputting a sound signal representing the musical tones of the accompaniment as specified by the accompaniment data, wherein the pitch/volume data generation means generates pitch/volume data indicating the pitch and volume of the sound of the piece as performed by the performer along with the accompaniment emitted from a speaker in accordance with the sound signal output from the sound signal output means.

The present invention also provides a server device comprising: expressive-performance appearance data acquisition means for acquiring, for each of the performances of a piece of music by an arbitrary number of arbitrary performers, expressive-performance appearance data indicating that one expressive performance appeared at one timing relative to the sounding start time of a note or note group included in the piece; expressive-performance reference data generation means for identifying, on the basis of the arbitrary number of items of expressive-performance appearance data acquired by the expressive-performance appearance data acquisition means, for each note or note group included in the piece, which expressive performance appears at which timing relative to the sounding start time of that note or note group and with what frequency, and for generating, in accordance with the identified information, expressive-performance reference data that indicates the expressive performance to be carried out during performance of the piece, and the timing at which that expressive performance should be carried out in the piece, relative to the sounding start time of a note or note group included in the piece; and transmission means for transmitting the expressive-performance reference data generated by the expressive-performance reference data generation means to a performance evaluation device.

The present invention also provides a singing evaluation system comprising: expressive-performance reference data acquisition means for acquiring first expressive-performance reference data that indicates an expressive performance to be carried out during performance of a piece of music, and the timing at which that expressive performance should be carried out in the piece, relative to the sounding start time of a note or note group included in the piece; pitch/volume data generation means for generating, from the sound of a performer's performance of the piece, pitch/volume data indicating the pitch and volume of that sound; performance evaluation means for raising the evaluation of the performer's performance of the piece when, within a predetermined time range in the piece indicated by the first expressive-performance reference data, a characteristic of at least one of the pitch and the volume indicated by the pitch/volume data generated by the pitch/volume data generation means exhibits the characteristic of the expressive performance that the first expressive-performance reference data indicates should be carried out; expressive-performance appearance data acquisition means for acquiring, for each of the performances of a piece of music by an arbitrary number of arbitrary performers, expressive-performance appearance data indicating that one expressive performance appeared at one timing relative to the sounding start time of a note or note group included in the piece performed by the arbitrary performers; and expressive-performance reference data generation means for identifying, on the basis of the arbitrary number of items of expressive-performance appearance data acquired by the expressive-performance appearance data acquisition means, for each note or note group included in the piece performed by the arbitrary performers, which expressive performance appears at which timing relative to the sounding start time of that note or note group and with what frequency, and for generating, in accordance with the identified information, second expressive-performance reference data that indicates the expressive performance to be carried out during performance of the piece by the arbitrary performers, and the timing at which that expressive performance should be carried out in that piece, relative to the sounding start time of a note or note group included in that piece.

The present invention also provides a performance evaluation method comprising: acquiring expressive-performance reference data that indicates an expressive performance to be carried out during performance of a piece of music, and the timing at which that expressive performance should be carried out in the piece, relative to the sounding start time of a note or note group included in the piece; generating, from the sound of a performer's performance of the piece, pitch/volume data indicating the pitch and volume of that sound; and raising the evaluation of the performer's performance of the piece when, within a predetermined time range in the piece indicated by the expressive-performance reference data, a characteristic of at least one of the pitch and the volume indicated by the pitch/volume data exhibits the characteristic of the expressive performance that the expressive-performance reference data indicates should be carried out.

The present invention also provides a computer-executable program that causes a computer to execute: an expressive-performance reference data acquisition process of acquiring expressive-performance reference data that indicates an expressive performance to be carried out during performance of a piece of music, and the timing at which that expressive performance should be carried out in the piece, relative to the sounding start time of a note or note group included in the piece; a pitch/volume data generation process of generating, from the sound of a performer's performance of the piece, pitch/volume data indicating the pitch and volume of that sound; and a performance evaluation process of raising the evaluation of the performer's performance of the piece when, within a predetermined time range in the piece indicated by the expressive-performance reference data, a characteristic of at least one of the pitch and the volume indicated by the generated pitch/volume data exhibits the characteristic of the expressive performance that the expressive-performance reference data indicates should be carried out.

(Effects of the Invention)

According to the present invention, a performance evaluation device is realized that gives the performer a high evaluation when a desirable expressive performance is carried out at a desirable timing in the performance of an individual piece of music. As a result, when the performer carries out an expressive performance, the evaluation diverges little from human sensibility.

Fig. 1 is a diagram showing the configuration of a singing evaluation system according to an embodiment of the present invention.
Fig. 2 is a diagram showing the waveform of a singing voice with delay.
Fig. 3 is a diagram showing the waveform of a singing voice with vibrato.
Fig. 4 is a diagram showing the waveform of a singing voice with an ornament.
Fig. 5 is a diagram showing the waveform of a singing voice with portamento.
Fig. 6 is a diagram showing the waveform of a singing voice with a fall.
Fig. 7 is a flowchart showing the operation of the singing evaluation system according to an embodiment of the present invention.
Fig. 8 is an example of statistical data generated for delay.
Fig. 9 is an example of statistical data generated for vibrato.
Fig. 10 is an example of statistical data generated for ornaments.
Fig. 11 is an example of statistical data generated for portamento.
Fig. 12 is an example of statistical data generated for falls.
Fig. 13 is a block diagram showing the performance evaluation device of the present invention.

Embodiments of the present invention are described below with reference to the drawings.

Fig. 1 is a diagram showing the configuration of a singing evaluation system 1 according to an embodiment of the present invention. The singing evaluation system 1 has karaoke devices 10-m (m = 1, 2, …, M, where M is the total number of karaoke devices) and a server device 30. One or more karaoke devices 10-m are installed at each karaoke establishment. The server device 30 is installed in a system operation center. The karaoke devices 10-m and the server device 30 are connected to a network 90 and can exchange various data with each other.

The karaoke device 10-m emits accompaniment that supports the user's singing, stages the singing by displaying lyrics, and evaluates the skill of the user's singing. In evaluating singing skill, the karaoke device 10-m performs two evaluations: one that assesses the quality of the pitch and volume of the user's singing voice, and one that assesses the quality of the five types of expressive singing listed below. It presents the score resulting from the two evaluations to the user together with a comment message.

a1. Delay

This is an expressive singing technique that deliberately delays the start of a specific note in the song. As shown in Fig. 2, when this technique is used, the time at which the pitch of the singing voice changes from the preceding note to the note in question is slightly later than the transition time between the two corresponding notes in the score (the model singing).

b1. Vibrato

This is an expressive singing technique that makes a specific note in the song oscillate finely while maintaining its apparent pitch. As shown in Fig. 3, when this technique is used, the pitch of the singing voice varies periodically above and below the height of the corresponding note in the score.

c1. Ornament

This is an expressive singing technique that changes the tone of a specific note in the song partway through its sounding, as if drawing it out. As shown in Fig. 4, when this technique is used, the pitch of the singing voice rises transiently partway through the corresponding note in the score.

d1. Portamento

This is a singing technique in which a specific note in the song is first sounded lower than its proper height and then brought toward that height. As shown in Fig. 5, when this technique is used, the pitch of the singing voice at the sounding start time is lower than the height of the corresponding note in the score. After the start of sounding, the pitch rises gently and reaches almost the same height as the note.

e1. Fall

This is a singing technique in which a specific note in the song is first sounded higher than its proper height and then brought toward that height. As shown in Fig. 6, when this technique is used, the pitch of the singing voice at the sounding start time is higher than the height of the corresponding note in the score. After the start of sounding, the pitch descends gently and reaches almost the same height as the note.
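The portamento and fall descriptions above differ only in whether the note starts below or above its score height before settling. The following is a minimal illustrative sketch of that distinction, not the detection method of this document or of the cited patent documents. The function name, the semitone unit, and both tolerance values are assumptions made for the example.

```python
from typing import List, Optional


def classify_onset_glide(contour: List[float], note_pitch: float,
                         onset_tol: float = 0.5,
                         settle_tol: float = 0.3) -> Optional[str]:
    """Label a note's pitch contour as 'portamento', 'fall', or None.

    contour: pitches in semitones (e.g. MIDI note numbers) sampled from the
        sounding start of the note onward.
    note_pitch: height of the corresponding score note, in the same unit.
    onset_tol, settle_tol: hypothetical thresholds, not values from the source.
    """
    if len(contour) < 2:
        return None
    start, end = contour[0], contour[-1]
    # Both techniques end at almost the same height as the score note.
    if abs(end - note_pitch) > settle_tol:
        return None
    if start < note_pitch - onset_tol:
        return "portamento"  # starts low, then rises toward the note
    if start > note_pitch + onset_tol:
        return "fall"        # starts high, then descends toward the note
    return None
```

A contour that starts within `onset_tol` of the score note is left unlabeled, since neither description applies to it.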

Returning to Fig. 1, the description of the overall singing evaluation system 1 continues. The karaoke device 10-m has a sound source 11, a speaker 12, a microphone 13, a display unit 14, a communication interface 15, a vocal adapter 16, a CPU 17, a RAM 18, a ROM 19, a hard disk 20, and a sequencer 21. The sound source 11 outputs a sound signal S_A in accordance with various MIDI (Musical Instrument Digital Interface) messages. The speaker 12 emits a supplied signal as sound. The microphone 13 picks up sound and outputs a pickup signal S_M. The display unit 14 displays an image in accordance with an image signal S_I. The communication interface 15 transmits data to and receives data from devices connected to the network 90.

The vocal adapter 16 measures the pitch and volume of the sound signal S_M and generates pitch/volume data representing how they change over time. Specifically, the vocal adapter 16 detects the pitch of the sound signal S_M supplied from the microphone 13 at every interval T_S (for example, T_S = 30 milliseconds) and outputs the detection result as a signal S_P. The vocal adapter 16 also detects the volume of the sound signal S_M supplied from the microphone 13 at every interval T_S and outputs the detection result as a signal S_L.
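The frame-by-frame measurement performed by the vocal adapter 16 can be illustrated with a short sketch. The source does not say how pitch or volume is detected, so the autocorrelation pitch estimate and the RMS volume below are assumptions chosen for simplicity, as are the function name and the `fmin`/`fmax` search range.

```python
import math
from typing import List, Tuple


def frame_pitch_volume(samples: List[float], sample_rate: int,
                       frame_ms: int = 30,
                       fmin: float = 80.0,
                       fmax: float = 1000.0) -> List[Tuple[float, float]]:
    """Return one (pitch_hz, rms_volume) pair per frame of length T_S."""
    frame_len = sample_rate * frame_ms // 1000
    lag_min = max(1, int(sample_rate / fmax))
    lag_max = min(frame_len - 1, int(sample_rate / fmin))
    result = []
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        # Volume (the S_L analogue): root mean square of the frame.
        rms = math.sqrt(sum(x * x for x in frame) / frame_len)
        # Pitch (the S_P analogue): lag with the largest autocorrelation.
        best_lag, best_corr = 0, 0.0
        for lag in range(lag_min, lag_max + 1):
            corr = sum(frame[i] * frame[i + lag]
                       for i in range(frame_len - lag))
            if corr > best_corr:
                best_lag, best_corr = lag, corr
        pitch = sample_rate / best_lag if best_lag else 0.0
        result.append((pitch, rms))
    return result
```

A pitch of 0.0 marks a frame in which no positive autocorrelation peak was found, for example silence.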

The CPU 17 executes programs stored in the ROM 19 and the hard disk 20 while using the RAM 18 as a work area. The operation of the CPU 17 is described in detail later. The ROM 19 stores an IPL (Initial Program Loader) and the like. The hard disk 20 stores song data MD-n (n = 1 to N, where N is the total number of songs) for various songs, a reference database DBRK, and a singing evaluation program VPG. The song data MD-n of each song records the accompaniment content, the lyrics, and the model singing content of the song in SMF (Standard MIDI File) format.

Specifically, as shown within Fig. 1, the song data MD-n has a header HD, an accompaniment track TR_AC, a lyrics track TR_LY, and a model singing reference track TR_NR. The header HD contains information such as the song number, song title, genre, performance time, and time base (the number of ticks corresponding to the duration of one quarter note).

The accompaniment track TR_AC contains, in chronological order, events EV(i)_ON that instruct the sounding of each note NT(i) in the accompaniment part of the song's score (i is the ordinal number counted from the first note NT(1) of that part), events EV(i)_OFF that instruct its silencing, and delta times DT that indicate the execution time difference (in ticks) between consecutive events.

The lyrics track TR_LY contains, in chronological order, data D_LY representing the lyrics of the song and delta times DT indicating the display time of each lyric (more specifically, the time difference in ticks between the display time of each lyric and that of the preceding lyric).

The model singing reference track TR_NR contains, in chronological order, events EV(i)_ON that instruct the sounding of each note NT(i) in the singing part of the song's score, events EV(i)_OFF that instruct its silencing, and delta times DT that indicate the execution time difference (in ticks) between consecutive events.
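Because each delta time DT is relative to the preceding event, the sounding start time of a note NT(i) is obtained by accumulating the delta times up to its EV(i)_ON event. The sketch below shows this accumulation on a simplified in-memory track. The tuple layout and the function name are assumptions for illustration and do not reproduce the SMF byte format.

```python
from typing import Dict, List, Tuple

# Simplified event: (delta_ticks, kind, note_index), kind is "ON" or "OFF".
Event = Tuple[int, str, int]


def note_spans(track: List[Event]) -> Dict[int, Tuple[int, int]]:
    """Map each note index i to (onset_tick, release_tick)."""
    now = 0                        # absolute tick position
    onsets: Dict[int, int] = {}
    spans: Dict[int, Tuple[int, int]] = {}
    for delta, kind, i in track:
        now += delta               # delta times accumulate into absolute ticks
        if kind == "ON":
            onsets[i] = now
        elif kind == "OFF" and i in onsets:
            spans[i] = (onsets.pop(i), now)
    return spans
```

For a track with a time base of 480, the events `(0, "ON", 1)`, `(480, "OFF", 1)` describe a quarter note that starts at tick 0 and ends at tick 480.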

The reference database DBRK stores five types of expressive-singing reference data DD_a1, DD_a2, DD_a3, DD_a4, and DD_a5. The expressive-singing reference data DD_a1 represents pairs consisting of each time t on a time axis whose reference point t_BS is the sounding start time of a note NT(i) included in the song, and the evaluation point VSR(t) awarded when singing with delay is performed at that time t. The expressive-singing reference data DD_a2 represents the same kind of pairs of time t and evaluation point VSR(t) for singing with vibrato. The expressive-singing reference data DD_a3 represents the same kind of pairs for singing with an ornament. The expressive-singing reference data DD_a4 represents the same kind of pairs for singing with portamento. The expressive-singing reference data DD_a5 represents the same kind of pairs for singing with a fall. In the following, the five types of expressive-singing reference data DD_a1, DD_a2, DD_a3, DD_a4, and DD_a5 are referred to as expressive-singing reference data DD when they need not be distinguished.

The singing evaluation program VPG has the following three functions.

a2. Standard evaluation function

This function compares the pitch and volume indicated by the output signals S_L and S_P of the vocal adapter 16 with the model pitch PCH_REF and model volume LV_REF of each note NT(i), which are determined by the events EV(i)_ON and EV(i)_OFF in the model singing reference track TR_NR, and evaluates the skill of the singing on the basis of the comparison.
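As a rough illustration of the standard evaluation function, the sketch below compares measured pitch and volume with the model values frame by frame. The source does not specify how the comparison is turned into a result, so the hit-ratio measure, the units (semitones and decibels), the tolerances, and the function name are all assumptions.

```python
from typing import List, Tuple


def standard_score(measured: List[Tuple[float, float]],
                   model: List[Tuple[float, float]],
                   pitch_tol: float = 0.5,
                   level_tol: float = 6.0) -> float:
    """Return the fraction of frames close to the model pitch and volume.

    measured: (pitch, level) per frame, standing in for S_P and S_L.
    model: (PCH_REF, LV_REF) per frame, taken from the reference track.
    pitch_tol, level_tol: hypothetical tolerances, not from the source.
    """
    if not measured or len(measured) != len(model):
        raise ValueError("measured and model must be non-empty and aligned")
    hits = sum(
        1
        for (pitch, level), (pitch_ref, level_ref) in zip(measured, model)
        if abs(pitch - pitch_ref) <= pitch_tol
        and abs(level - level_ref) <= level_tol
    )
    return hits / len(measured)
```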

b2. Expressive singing evaluation function

Each time a characteristic waveform of an expressive singing technique appears in the pitch waveform indicated by the output signal S_P of the vocal adapter 16, this function determines the appearance time of that characteristic waveform on a time axis whose reference point t_BS is the sounding start time of the note NT(i) targeted by the technique. It then selects the evaluation point VSR(t) corresponding to this appearance time from among the evaluation points VSR(t) of the relevant expressive-singing reference data DD in the reference database DBRK, and evaluates the skill of the singing on the basis of this evaluation point VSR(t).
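The lookup performed by the expressive singing evaluation function can be sketched as follows. The table layout (one list of (t, VSR(t)) pairs per technique for a single note), the nearest-time selection rule, and the names used are assumptions for illustration. The source only states that the evaluation point corresponding to the appearance time is selected.

```python
from typing import Dict, List, Tuple

# Technique name -> list of (t, VSR(t)), with t measured from t_BS in seconds.
ReferenceData = Dict[str, List[Tuple[float, float]]]


def expression_points(technique: str, appearance_time: float,
                      note_onset: float,
                      reference: ReferenceData) -> float:
    """Return VSR(t) for a technique detected at appearance_time."""
    # Shift onto the time axis whose reference point t_BS is the note onset.
    t = appearance_time - note_onset
    table = reference.get(technique, [])
    if not table:
        return 0.0  # no reference data for this technique: no points
    # Assumed rule: take the entry whose time is nearest to t.
    _, vsr = min(table, key=lambda pair: abs(pair[0] - t))
    return vsr
```

With a table in which vibrato late in a note scores higher than vibrato at its start, the same technique earns different points depending on where in the note it appears, which is the behavior the reference data is meant to capture.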

c2. Evaluation result presentation function

This function calculates a score from the evaluation results of a2 and b2 and displays the score on the display unit 14 together with a comment message.

When a singing start operation for a song is performed on a remote controller (not shown) and the song data MD-n of that song is transferred from the hard disk 20 to the RAM 18, the sequencer 21 supplies the events EV(i)_ON and EV(i)_OFF and the data D_LY in the song data MD-n to the respective units of the device. Specifically, once the song data MD-n is stored in the RAM 18, the sequencer 21 determines the time length of one tick from the time base described in the header HD of the song data MD-n and the tempo specified on the remote controller (not shown), and performs the following three processes while counting ticks as that time length elapses.
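The tick length mentioned above can be illustrated with a minimal sketch. The text does not state the units of the time base or the tempo, so this sketch assumes the Standard MIDI File convention (time base in ticks per quarter note, tempo in quarter notes per minute). The function name `tick_seconds` is hypothetical.

```python
def tick_seconds(time_base: int, tempo_bpm: float) -> float:
    """Return the length of one tick in seconds.

    Assumption: time_base is ticks per quarter note and tempo_bpm is
    quarter notes per minute; the source text does not give the units.
    """
    if time_base <= 0 or tempo_bpm <= 0:
        raise ValueError("time_base and tempo_bpm must be positive")
    return 60.0 / (tempo_bpm * time_base)


# Example: a time base of 480 at a tempo of 120
print(tick_seconds(480, 120.0))
```

Under these assumptions, raising the tempo on the remote controller shortens every tick, which speeds up all three tracks together.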

In the first process, each time the tick count matches a delta time DT in the accompaniment track TR_AC, the sequencer 21 reads the event EV(i)_ON or EV(i)_OFF that follows it and supplies the event to the sound source 11. When the event EV(i)_ON is supplied from the sequencer 21, the sound source 11 supplies the sound signal S_A specified by that event to the speaker 12; when the event EV(i)_OFF is supplied, it stops supplying the sound signal S_A to the speaker 12.

In the second process, each time the tick count matches a delta time DT in the lyrics track TR_LY, the sequencer 21 reads the data D_LY that follows it and supplies the data to the display unit 14. When the data D_LY is supplied from the sequencer 21, the display unit 14 converts it into a lyrics caption image and shows the image on a display (not shown).

As the sequencer 21 performs the first and second processes, emission of the accompaniment sound from the speaker 12 and display of the lyrics on the display proceed. The user sings the lyrics shown on the display into the microphone 13 while listening to the accompaniment sound emitted from the speaker 12. While the user is singing into the microphone 13, the microphone 13 outputs a pickup signal S_M of the user's singing voice, and the vocal adapter 16 outputs the signals S_P and S_L indicating the pitch and volume of the signal S_M.

In the third process, each time the tick count matches a delta time DT in the model singing reference track TR_NR, the sequencer 21 reads the event EV(i)_ON or EV(i)_OFF that follows it and supplies the event to the CPU 17. The CPU 17 evaluates the skill of the user's singing using the events EV(i)_ON and EV(i)_OFF supplied from the sequencer 21 and the output signals S_P and S_L of the vocal adapter 16. Details are described later.
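The three sequencer processes can be pictured with the following sketch, which routes the contents of each track to its destination in tick order. It assumes that each delta time DT counts ticks since the previous entry of the same track, as in a Standard MIDI File; the text does not say so explicitly. The track representation and the name `dispatch` are hypothetical.

```python
from typing import Dict, Iterator, List, Tuple

Track = List[Tuple[int, str]]  # (delta time DT in ticks, event or data)

# Destination of each track, following the three processes in the text.
DESTINATION = {
    "TR_AC": "sound source 11",
    "TR_LY": "display unit 14",
    "TR_NR": "CPU 17",
}


def dispatch(tracks: Dict[str, Track]) -> Iterator[Tuple[int, str, str]]:
    """Yield (tick, destination, payload) in playback order.

    Assumption: delta times are relative to the previous entry of the
    same track, so they are accumulated into absolute tick positions.
    """
    schedule = []
    for name, track in tracks.items():
        tick = 0
        for delta, payload in track:
            tick += delta
            schedule.append((tick, DESTINATION[name], payload))
    # The sort is stable, so entries on the same tick keep track order.
    schedule.sort(key=lambda item: item[0])
    yield from schedule


song = {
    "TR_AC": [(0, "EV(1)_ON"), (480, "EV(1)_OFF")],
    "TR_LY": [(0, "D_LY")],
    "TR_NR": [(0, "EV(1)_ON"), (480, "EV(1)_OFF")],
}
for tick, destination, payload in dispatch(song):
    print(tick, destination, payload)
```

A real sequencer would emit each entry when its tick count is reached rather than sorting in advance; the precomputed schedule is used here only to keep the sketch short.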

The server device 30 supports the provision of services at karaoke establishments. The server device 30 has a communication interface 35, a CPU 37, a RAM 38, a ROM 39, and a hard disk 40. The communication interface 35 exchanges data with devices connected to the network 90. The CPU 37 executes various programs stored in the ROM 39 and the hard disk 40 while using the RAM 38 as a work area. The operation of the CPU 37 is described in detail later. The ROM 39 stores an IPL and the like.

The hard disk 40 stores a singing sample database DBS, a reference database DBRS, and a singing analysis program APG. The singing sample database DBS individually stores groups of singing sample data DS, each group corresponding to one song. Singing sample data DS is data recording the pitch waveform and volume waveform of the singing voice produced when a person with at least a certain level of singing ability sang the song. The reference database DBRS stores the latest expressive singing reference data DD to be stored in the reference database DBRK of each karaoke device 10-m.

The singing analysis program (APG) has the following three functions.

a3. Accumulation function

This function acquires the singing sample data DS of each song from the karaoke devices 10-m one song at a time, and accumulates the acquired singing sample data DS in the singing sample database DBS.

b3. Rewrite function

For each item of singing sample data DS accumulated in the singing sample database DBS, this function searches the waveform indicated by that singing sample data DS for characteristic waveforms of expressive singing. From the search result, it generates statistical data indicating the relationship between each time t, on a time axis whose reference point t_BS is the pronunciation start time of the note NT(i) targeted by the expressive singing, and the number of appearances Num of expressive singing at that time t. It then rewrites the evaluation point VSR(t) corresponding to each time t in the expressive singing reference data DD in the reference database DBR based on the contents of the statistical data.

c3. Transmission function

This function transmits the expressive singing reference data DD rewritten by the rewrite function to a karaoke device 10-m in response to a request from that karaoke device 10-m.

Next, the operation of this embodiment is described. Fig. 7 is a flowchart showing the operation of this embodiment. In Fig. 7, when a singing start operation for a song is performed (S100: Yes), the CPU 17 of the karaoke device 10-m supplies a control signal S_O to the sequencer 21 to make the sequencer 21 start its processing (the first to third processes described above) (S120). Once the processing by the sequencer 21 has started, the CPU 17 performs two processes: a standard singing evaluation process (S130) and an expressive singing evaluation process (S140). These two processes are detailed below.

a4. Standard singing evaluation process (S130)

In this process, the CPU 17 takes the time from when the event EV(i)_ON is supplied from the sequencer 21 until the next event EV(i)_OFF is supplied as the pronunciation time T_NT(i) of the sound corresponding to the i-th note NT(i). The CPU 17 obtains the difference PCH_DEF between the pitch indicated by the output signal S_P of the vocal adapter 16 during the pronunciation time T_NT(i) and the model pitch PCH_REF converted from the note number of the event EV(i)_ON, and the difference LV_DEF between the volume indicated by the signal S_L during that time and the model volume LV_REF converted from the velocity of the event EV(i)_ON. When the differences PCH_DEF and LV_DEF fall within predetermined ranges, the CPU 17 judges the singing of the note NT(i) as passing. The CPU 17 performs this note judgment from the start to the end of the user's singing, and at the end of the singing takes as the basic score SR_BASE the number of notes NT(i) judged as passing divided by the total number of notes NT(i), multiplied by 100.
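The note judgment and basic score can be sketched as follows. This sketch assumes that SR_BASE is meant as the percentage of notes judged as passing, and that each note is summarized by a single sung pitch and volume. The tolerances `pitch_tol` and `level_tol` stand in for the unspecified "predetermined ranges" and are hypothetical, as are the class and function names.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class NoteResult:
    pitch_sung: float  # pitch indicated by S_P during T_NT(i)
    level_sung: float  # volume indicated by S_L during T_NT(i)
    pitch_ref: float   # model pitch PCH_REF (from the note number)
    level_ref: float   # model volume LV_REF (from the velocity)


def basic_score(notes: List[NoteResult],
                pitch_tol: float = 0.5,
                level_tol: float = 20.0) -> float:
    """Return SR_BASE as the percentage of notes judged as passing.

    pitch_tol and level_tol are hypothetical tolerances standing in
    for the predetermined ranges of PCH_DEF and LV_DEF.
    """
    if not notes:
        return 0.0
    passed = sum(
        1 for n in notes
        if abs(n.pitch_sung - n.pitch_ref) <= pitch_tol      # PCH_DEF check
        and abs(n.level_sung - n.level_ref) <= level_tol     # LV_DEF check
    )
    return 100.0 * passed / len(notes)


notes = [
    NoteResult(60.2, 80.0, 60.0, 90.0),  # within both tolerances
    NoteResult(62.0, 80.0, 60.0, 90.0),  # pitch too far from the model
]
print(basic_score(notes))
```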

Also in this process, the CPU 17 determines whether a characteristic waveform of any of the expressive singing techniques (delay, vibrato, ornament, portamento, or fall) has appeared in the pitch waveform indicated by the output signal S_P of the vocal adapter 16. For details of the method of determining each characteristic waveform, see Patent Document 2 for delay, Patent Document 3 for vibrato, Patent Document 4 for ornament, Patent Document 5 for portamento, and Patent Document 6 for fall. The CPU 17 performs this characteristic waveform determination from the start to the end of the user's singing, and at the end of the singing takes the number of appearances of expressive singing multiplied by a predetermined coefficient as the additional points SR_ADD. The sum of the basic score SR_BASE and the additional points SR_ADD is then taken as the standard score SR_NOR.
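As a small worked example of the standard score, the sketch below adds the additional points to the basic score. The coefficient value 0.5 is a hypothetical placeholder for the unspecified "predetermined coefficient", and the function name is illustrative only.

```python
def standard_score(sr_base: float, expression_count: int,
                   coefficient: float = 0.5) -> float:
    """Return SR_NOR = SR_BASE + SR_ADD.

    SR_ADD is the number of detected expressive singing appearances
    multiplied by a coefficient; 0.5 is a hypothetical value.
    """
    sr_add = expression_count * coefficient
    return sr_base + sr_add


# Example: basic score 80 with six detected appearances
print(standard_score(80.0, 6))
```

Note that in this score only the count of expressive techniques matters, not when they occur; the timing is handled by the separate expressive singing evaluation process.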

b4. Expressive singing evaluation process (S140)

In this process, the CPU 17 takes the time from the output of the event EV(i)_ON until the output of the next event EV(i)_OFF as the pronunciation time T_NT(i) of the sound corresponding to the i-th note NT(i). When a characteristic waveform of expressive singing appears in the pitch waveform indicated by the output signal S_P of the vocal adapter 16 during the pronunciation time T_NT(i), the CPU 17 determines the appearance time of the expressive singing within the pronunciation time T_NT(i) and the type of expressive singing that appeared. The CPU 17 then generates expressive singing appearance data indicating the type and appearance time so identified.

The CPU 17 then selects, from the series of evaluation points VSR(t) indicated by the expressive singing reference data DD, the evaluation point VSR(t) that corresponds to the expressive singing and its appearance time indicated by the generated expressive singing appearance data. The CPU 17 makes this selection from the start to the end of the user's singing, and at the end of the singing takes the average of the selected evaluation points VSR(t) as the expression score SR_EX.
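The selection and averaging of evaluation points can be sketched as below. The sketch assumes that each item of reference data is stored as a list of (time offset from t_BS, VSR) pairs sorted by time, and that the point nearest to the appearance time is selected; neither the storage layout nor the lookup rule is given in the text. All names are hypothetical.

```python
from bisect import bisect_left
from statistics import mean
from typing import Dict, List, Tuple

Curve = List[Tuple[float, float]]  # (time offset from t_BS, VSR), sorted by time


def lookup_vsr(curve: Curve, offset: float) -> float:
    """Return the evaluation point whose time is nearest to offset.

    Nearest-neighbour lookup is an assumption of this sketch.
    """
    times = [t for t, _ in curve]
    i = bisect_left(times, offset)
    # Only the neighbours on either side of the insertion point can be nearest.
    candidates = curve[max(i - 1, 0): i + 1]
    return min(candidates, key=lambda p: abs(p[0] - offset))[1]


def expression_score(reference: Dict[str, Curve],
                     appearances: List[Tuple[str, float]]) -> float:
    """Return SR_EX, the mean of the points selected for each appearance.

    reference maps an expression type to its curve (standing in for
    DD_a1 to DD_a5); appearances lists (type, offset from t_BS).
    """
    points = [lookup_vsr(reference[kind], t) for kind, t in appearances]
    return mean(points) if points else 0.0


reference = {
    "vibrato": [(0.0, 10.0), (0.5, 90.0), (1.0, 30.0)],
    "fall": [(0.0, 0.0), (1.0, 80.0)],
}
print(expression_score(reference, [("vibrato", 0.45), ("fall", 0.9)]))
```

Because the points depend on the offset from the note onset, the same technique earns more or fewer points depending on when it is performed.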

When the user finishes singing the song, the CPU 17 performs an evaluation result presentation process (S150). In this process, the CPU 17 selects the higher of the standard score SR_NOR given by the standard singing evaluation process and the expression score SR_EX given by the expressive singing evaluation process. When the standard score SR_NOR is selected, the CPU 17 causes the display unit 14 to display the score SR_NOR together with a comment message suited to the score SR_NOR, for example "Cool and precise singing." When the expression score SR_EX is selected, the CPU 17 causes the display unit 14 to display the score SR_EX together with a comment message suited to the expression score SR_EX, for example "Full of human warmth."
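The choice between the two scores can be sketched as follows. The messages are English renderings of the example comments in the text; the behavior on a tie is not specified there, so preferring the standard score in that case is an assumption of this sketch. The function name is hypothetical.

```python
from typing import Tuple


def present_result(sr_nor: float, sr_ex: float) -> Tuple[float, str]:
    """Return the higher score and a matching comment message.

    Assumption: on a tie the standard score SR_NOR is chosen.
    """
    if sr_nor >= sr_ex:
        return sr_nor, "Cool and precise singing"
    return sr_ex, "Full of human warmth"


print(present_result(82.0, 90.0))
```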

Next, the CPU 17 performs a sample transmission process (S160). In this process, the CPU 17 treats the signals S_P and S_L output by the vocal adapter 16 from the start to the end of the singing as the singing sample data DS of the song, and transmits a message MS1 containing this singing sample data DS and the basic score SR_BASE obtained in step S130 (singing evaluation data) to the server device 30.

When the CPU 37 of the server device 30 acquires the message MS1 from a karaoke device 10-m (S200: Yes), it extracts the singing sample data DS and the basic score SR_BASE from the message MS1 and compares the basic score SR_BASE with a reference score SR_TH (for example, 80 points) that separates advanced singers from others (S220). When the basic score SR_BASE is higher than the reference score SR_TH (S220: Yes), the CPU 37 accumulates the singing sample data DS extracted from the message MS1 in the singing sample database DBS (S230).
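The filtering in steps S220 and S230 can be sketched as below. The message is modeled as a plain dictionary and the database as a dictionary of lists keyed by song; the field names `song_id`, `ds`, and `sr_base` are hypothetical, since the text does not define the layout of the message MS1.

```python
from typing import Dict, List

SR_TH = 80.0  # example reference score given in the text


def handle_ms1(message: dict, dbs: Dict[str, List[object]]) -> bool:
    """Store the sample only when SR_BASE is higher than SR_TH.

    Returns True when the sample was accumulated in dbs.
    The message field names are hypothetical.
    """
    if message["sr_base"] > SR_TH:
        dbs.setdefault(message["song_id"], []).append(message["ds"])
        return True
    return False


dbs: Dict[str, List[object]] = {}
handle_ms1({"song_id": "MD-1", "ds": "sample A", "sr_base": 92.0}, dbs)
handle_ms1({"song_id": "MD-1", "ds": "sample B", "sr_base": 65.0}, dbs)
print(dbs)
```

Only samples from singers above the threshold reach the database, so the statistics built later reflect how advanced singers perform.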

Next, the CPU 37 performs a rewriting process (S240), which consists of the following five processes. In the first process, the CPU 37 searches the pitch waveform indicated by each item of singing sample data DS accumulated in the singing sample database DBS for characteristic waveforms of delay, and generates expressive singing appearance data representing the search result (data indicating each time t on a time axis whose reference point t_BS is the pronunciation start time of the note NT(i) in which the delay appeared). Based on the expressive singing appearance data generated for delay, the CPU 37 then generates statistical data indicating the relationship between each time t on that time axis and the number of appearances Num of the expressive singing "delay" at that time t, and rewrites the evaluation point VSR(t) corresponding to each time t in the expressive singing reference data DD_a1 based on the contents of this statistical data.

Fig. 8 shows an example of statistical data for delay. In this example, the number of appearances Num of expressive singing is distributed between time t1_a1, which precedes the reference point t_BS by time T1_a1, and time t4_a1, which follows the reference point t_BS by time T4_a1. The largest peak of the number of appearances Num occurs at time t2_a1, immediately after the reference point t_BS, and the second peak occurs at time t3_a1, later than time t2_a1. Accordingly, in the expressive singing reference data DD_a1 rewritten from this statistical data, the evaluation point VSR(t2_a1) at time t2_a1 is the highest and the evaluation point VSR(t3_a1) at time t3_a1 is the second highest.
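The step from appearance times to rewritten evaluation points can be sketched as follows. The text says only that the points are rewritten "based on" the statistics and that the time with the most appearances receives the highest point. This sketch therefore assumes a fixed-width histogram and points proportional to the count, scaled so that the peak receives 100. The bin width, the scaling, and the function names are all hypothetical.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def build_statistics(offsets: Iterable[float], bin_width: float = 0.05) -> Counter:
    """Count appearances (Num) per time bin relative to t_BS.

    offsets are appearance times minus the note's pronunciation start
    time; negative values precede the reference point.
    """
    return Counter(round(t / bin_width) for t in offsets)


def rewrite_vsr(stats: Counter, bin_width: float = 0.05,
                max_points: float = 100.0) -> List[Tuple[float, float]]:
    """Return (time offset, VSR) pairs proportional to the counts.

    Proportional scaling is an assumption of this sketch.
    """
    if not stats:
        return []
    peak = max(stats.values())
    return [(b * bin_width, max_points * n / peak)
            for b, n in sorted(stats.items())]


# Hypothetical appearance offsets gathered from several samples
offsets = [0.05, 0.05, 0.05, 0.2, 0.2, -0.1]
for t, vsr in rewrite_vsr(build_statistics(offsets)):
    print(t, vsr)
```

With this scaling, the most common timing among accumulated samples earns the full points and rarer timings earn proportionally less, which matches the ordering described for Fig. 8.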

In the second process, the CPU 37 searches the pitch waveform indicated by each item of singing sample data DS accumulated in the singing sample database DBS for characteristic waveforms of vibrato, and generates expressive singing appearance data representing the search result (data indicating each time t on a time axis whose reference point t_BS is the pronunciation start time of the note NT(i) in which the vibrato appeared). Based on the expressive singing appearance data generated for vibrato, the CPU 37 then generates statistical data indicating the relationship between each time t on that time axis and the number of appearances Num of expressive singing at that time t, and rewrites the evaluation point VSR(t) corresponding to each time t in the expressive singing reference data DD_a2 based on the contents of this statistical data.

Fig. 9 shows an example of statistical data for vibrato. In this example, the number of appearances Num of expressive singing is distributed between the reference point t_BS and time t2_a2, which follows the reference point t_BS by time T2_a2. The largest peak of the number of appearances Num occurs at time t1_a2, which follows the reference point t_BS by time T1_a2. Accordingly, in the expressive singing reference data DD_a2 rewritten from this statistical data, the evaluation point VSR(t1_a2) at time t1_a2 is the highest.

In the third process, the CPU 37 searches the pitch waveform indicated by each item of singing sample data DS accumulated in the singing sample database DBS for characteristic waveforms of ornament, and generates expressive singing appearance data representing the search result (data indicating each time t on a time axis whose reference point t_BS is the pronunciation start time of the note NT(i) in which the ornament appeared). Based on the expressive singing appearance data generated for ornament, the CPU 37 then generates statistical data indicating the relationship between each time t on that time axis and the number of appearances Num of expressive singing at that time t, and rewrites the evaluation point VSR(t) corresponding to each time t in the expressive singing reference data DD_a3 based on the contents of this statistical data.

Fig. 10 shows an example of statistical data for ornament. In this example, the number of appearances Num of expressive singing is distributed between the reference point t_BS and time t2_a3, which follows the reference point t_BS by time T2_a3. The largest peak of the number of appearances Num occurs at time t1_a3, which follows the reference point t_BS by time T1_a3. Accordingly, in the expressive singing reference data DD_a3 rewritten from this statistical data, the evaluation point VSR(t1_a3) at time t1_a3 is the highest.

In the fourth process, the CPU 37 searches the pitch waveform indicated by each item of singing sample data DS accumulated in the singing sample database DBS for characteristic waveforms of portamento, and generates expressive singing appearance data representing the search result (data indicating each time t on a time axis whose reference point t_BS is the pronunciation start time of the note NT(i) in which the portamento appeared). Based on the expressive singing appearance data generated for portamento, the CPU 37 then generates statistical data indicating the relationship between each time t on that time axis and the number of appearances Num of expressive singing at that time t, and rewrites the evaluation point VSR(t) corresponding to each time t in the expressive singing reference data DD_a4 based on the contents of this statistical data.

Fig. 11 shows an example of statistical data for portamento. In this example, the number of appearances Num of expressive singing is distributed between the reference point t_BS and time t2_a4, which follows the reference point t_BS by time T2_a4. The largest peak of the number of appearances Num occurs at the reference point t_BS, and the second peak occurs at time t1_a4, which follows the reference point t_BS by time T1_a4. Accordingly, in the expressive singing reference data DD_a4 rewritten from this statistical data, the evaluation point VSR(t_BS) at time t_BS is the highest and the evaluation point VSR(t1_a4) at time t1_a4 is the second highest.

In the fifth process, the CPU 37 searches the pitch waveform indicated by each item of singing sample data DS accumulated in the singing sample database DBS for characteristic waveforms of fall, and generates expressive singing appearance data representing the search result (data indicating each time t on a time axis whose reference point t_BS is the pronunciation start time of the note NT(i) in which the fall appeared). Based on the expressive singing appearance data generated for fall, the CPU 37 then generates statistical data indicating the relationship between each time t on that time axis and the number of appearances Num of expressive singing at that time t, and rewrites the evaluation point VSR(t) corresponding to each time t in the expressive singing reference data DD_a5 based on the contents of this statistical data.

Fig. 12 shows an example of statistical data for fall. In this example, the number of appearances Num of expressive singing is distributed between time t1_a5, which follows the reference point t_BS by time T1_a5, and time t2_a5, which follows the reference point t_BS by time T2_a5. The largest peak of the number of appearances Num occurs at time t2_a5. Accordingly, in the expressive singing reference data DD_a5 rewritten from this statistical data, the evaluation point VSR(t2_a5) at time t2_a5 is the highest.

In Fig. 7, the CPU 17 of the karaoke device 10-m performs an inquiry process each time a predetermined inquiry time arrives (S110: Yes) (S170). In this inquiry process, the CPU 17 transmits a message MS2 requesting transmission of the latest data to the server device 30 (S170). When the CPU 37 of the server device 30 receives the message MS2 from a karaoke device 10-m (S210: Yes), it transmits the expressive singing reference data DD whose contents were rewritten between the reception time of the previous message MS2 and the reception time of the current message MS2 to the karaoke device 10-m that sent the message MS2 (S250). When the CPU 17 of the karaoke device 10-m receives the expressive singing reference data DD from the server device 30, it overwrites the reference database DBRK with this data to update its contents (S180).
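The inquiry exchange can be sketched as below. The server keeps the rewrite time of each item of reference data and the time of each device's previous request, and returns only the items rewritten in between. Times are passed in explicitly to keep the sketch deterministic. The class, the method names, and the key format are hypothetical, and the text does not say how the first request of a device is handled; returning everything in that case is an assumption.

```python
from typing import Dict, Tuple


class ReferenceServer:
    """Minimal stand-in for the server device 30 and its database DBRS."""

    def __init__(self) -> None:
        self.dbrs: Dict[str, Tuple[float, object]] = {}  # key -> (rewrite time, DD)
        self.last_request: Dict[str, float] = {}         # device -> previous MS2 time

    def rewrite(self, key: str, data: object, now: float) -> None:
        """Record rewritten reference data together with its rewrite time."""
        self.dbrs[key] = (now, data)

    def handle_ms2(self, device_id: str, now: float) -> Dict[str, object]:
        """Return the data rewritten since the device's previous request.

        Assumption: a device's first request returns all stored data.
        """
        since = self.last_request.get(device_id, float("-inf"))
        self.last_request[device_id] = now
        return {k: d for k, (t, d) in self.dbrs.items() if since < t <= now}


def apply_update(dbrk: Dict[str, object], update: Dict[str, object]) -> None:
    """Overwrite the device-side database DBRK with the received data (S180)."""
    dbrk.update(update)


server = ReferenceServer()
dbrk: Dict[str, object] = {}
server.rewrite("song1/DD_a1", [10, 100, 60], now=10.0)
apply_update(dbrk, server.handle_ms2("10-1", now=20.0))
server.rewrite("song1/DD_a2", [0, 80], now=25.0)
apply_update(dbrk, server.handle_ms2("10-1", now=30.0))
print(sorted(dbrk))
```

Sending only the items rewritten since the previous request keeps each update small while still bringing the device-side database up to date.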

The above is the detailed configuration of this embodiment. This embodiment provides the following effects.

First, in the expressive singing evaluation process of this embodiment, each time a characteristic waveform of expressive singing appears in the waveform of the output signal of the vocal adapter 16, the appearance time of that characteristic waveform is determined on a time axis whose reference point is the pronunciation start time of the note NT(i) targeted by the expressive singing. The evaluation point VSR(t) corresponding to this appearance time is selected from among the evaluation points VSR(t) in the expressive singing reference data DD, and the skill of the singing is evaluated based on the selected evaluation point VSR(t). Consequently, even if the user performs expressive singing, a good evaluation is not obtained unless the timing is appropriate. This embodiment can therefore present evaluation results that are closer to those based on human sensibility.

Second, in this embodiment, for each item of singing sample data DS accumulated in the singing sample database DBS, the waveform indicated by that data is searched for characteristic waveforms of expressive singing. From the search result, statistical data is generated indicating the relationship between each time, on a time axis whose reference point is the pronunciation start time of the note NT(i) targeted by the expressive singing, and the number of appearances of expressive singing at that time. The evaluation point VSR(t) corresponding to each time in the expressive singing reference data DD is then rewritten based on the contents of the statistical data. This embodiment can therefore reflect, in the evaluation results, changes in the singing tendencies of advanced singers who sing a song repeatedly.

An embodiment of the present invention has been described above, but other embodiments are also possible. Examples follow.

(1) In the above embodiment, the CPU 17 detected five types of expressive singing, namely delay, vibrato, ornament (grace notes), portamento, and fall, from the output signal S_P of the vocal adapter 16. However, expressive singing other than these five types may also be detected. For example, singing with added inflection may be detected.

(2) In the above embodiment, the CPU 17 performed the standard singing evaluation process using both output signals S_P and S_L of the vocal adapter 16, and performed the expressive singing evaluation process using only the signal S_P, which indicates pitch. However, the CPU 17 may perform the standard singing evaluation process using only one of the signals S_P and S_L. The CPU 17 may also perform the expressive singing evaluation process using both signals S_P and S_L.

(3) In the expressive singing evaluation process of the above embodiment, the skill of the singing was evaluated on the basis of the appearance time of the characteristic waveform of the expressive singing. However, the evaluation may also take into account factors other than the appearance time (for example, the length or depth of each delay, vibrato, ornament, portamento, or fall).

(4) The expressive singing evaluation process of the above embodiment detects expressive singing that appears in the sung sound corresponding to each individual note in the song, but a configuration may instead be adopted that detects expressive singing appearing in the sung sound corresponding to a series of consecutive notes (a note group). For example, expressive singing such as crescendo and decrescendo is performed across a series of notes, so its detection and evaluation are preferably performed in units of note groups. The expressive singing reference data DD for such expressive singing is accordingly also preferably organized in units of note groups.
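To illustrate why crescendo and decrescendo call for note-group units, the sketch below classifies a group from the mean level of each note in it. The detection rule (strictly monotonic levels plus a minimum overall change), the 3 dB threshold, and the name `classify_dynamics` are hypothetical; the text does not specify how such techniques would be detected.

```python
def classify_dynamics(note_levels_db, min_change_db=3.0):
    """Classify a note group from the mean level (dB) of each note, in order.

    Returns "crescendo", "decrescendo", or None. A single note cannot show
    either technique, which is why the unit here is a note group.
    """
    if len(note_levels_db) < 2:
        return None
    steps = [b - a for a, b in zip(note_levels_db, note_levels_db[1:])]
    total = note_levels_db[-1] - note_levels_db[0]
    if all(s > 0 for s in steps) and total >= min_change_db:
        return "crescendo"
    if all(s < 0 for s in steps) and -total >= min_change_db:
        return "decrescendo"
    return None

rising = classify_dynamics([-20, -17, -15, -12])
falling = classify_dynamics([-10, -14, -18])
```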

(5) In the above embodiment, the karaoke device 10 transmits to the server device 30 singing sample data DS (pitch volume data) containing the signals S_P and S_L output by the vocal adapter 16 from the start to the end of the singing of the song, and the server device 30 detects each instance of expressive singing and identifies its timing of appearance from the singing sample data DS. Instead, the karaoke device 10 may transmit to the server device 30 the sound signal S_M representing the sound picked up by the microphone 13 (audio waveform data representing the sung sound), and the server device 30 may generate the signals S_P and S_L from the sound signal S_M (the processing performed by the vocal adapter 16 in the above embodiment). Alternatively, the karaoke device 10 may transmit to the server device 30 data indicating the type of each expressive singing identified during the expressive singing evaluation process (S140) performed according to the singing evaluation program VPG and the timing of its appearance (expressive singing appearance data); in that case the server device 30 does not detect expressive singing itself and updates the expressive singing reference data DD on the basis of the expressive singing appearance data received from the karaoke device 10.
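The three transmission options in variation (5) differ in how much processing is left to the server device 30. The sketch below models them as payload types and lists the steps the server would still have to run. The class names, field layouts, and step labels are illustrative assumptions, not data formats defined in the text.

```python
from dataclasses import dataclass
from typing import List, Tuple

@dataclass
class SoundSignal:            # sound signal S_M picked up by the microphone
    samples: List[float]

@dataclass
class SingingSampleData:      # singing sample data DS (signals S_P and S_L)
    pitch: List[float]
    level: List[float]

@dataclass
class AppearanceData:         # expressive singing appearance data
    events: List[Tuple[str, int, int]]   # (technique, note index, offset in ms)

def remaining_server_steps(payload):
    """List the processing steps left to the server for a given payload."""
    steps = ["generate_pitch_and_level", "detect_expressive_singing",
             "update_reference_data"]
    if isinstance(payload, SoundSignal):
        return steps
    if isinstance(payload, SingingSampleData):
        return steps[1:]
    if isinstance(payload, AppearanceData):
        return steps[2:]
    raise TypeError("unknown payload type")

from_appearances = remaining_server_steps(AppearanceData([("vibrato", 12, 650)]))
```

Sending appearance data keeps the payload small and the server simple, while sending the raw sound signal keeps all analysis in one place; the text presents these as equally acceptable alternatives.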

(6) In the above embodiment, the server device 30 generated the statistical data and rewrote the expressive singing reference data DD on that basis. However, each karaoke device 10-m may store on its hard disk 20 the sound signals S_M representing sung sound that it generated itself in the past or acquired from other karaoke devices 10-m, either directly or via the server device 30, the signals S_P and S_L generated from those sound signals S_M, or data indicating the type and timing of appearance of expressive singing identified using those signals (expressive singing appearance data). The CPU 17 may then read and use them to perform the same processing that the server device 30 performs in S240, that is, generating the statistical data and rewriting the expressive singing reference data DD on that basis.

(7) The method of evaluating the singing and the form in which the evaluation result is presented to the singer in the above embodiment can be varied in many ways. For example, in the above embodiment the standard singing evaluation process (S130) calculates the standard score SR_NOR by adding the bonus points SR_ADD, which are calculated from the number of appearances of expressive singing, to the base score SR_BASE. However, the standard singing evaluation process may instead calculate only the base score SR_BASE without considering the appearance of expressive singing. Also, in the above embodiment the singer is shown the higher of the standard score SR_NOR given by the standard singing evaluation process and the expression score SR_EX given by the expressive singing evaluation process. However, the evaluation result may be presented to the singer in other forms, such as showing both scores or their total.
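The scoring options in variation (7) amount to simple arithmetic on the scores named in the text. In the sketch below, the function names and the `mode` and `use_bonus` parameters are hypothetical; only the score symbols and the three presentation options come from the description.

```python
def standard_score(sr_base, sr_add=0, use_bonus=True):
    """SR_NOR = SR_BASE + SR_ADD, or SR_BASE alone when the bonus is ignored."""
    return sr_base + sr_add if use_bonus else sr_base

def presented_result(sr_nor, sr_ex, mode="higher"):
    """Choose what is shown to the singer from SR_NOR and SR_EX."""
    if mode == "higher":     # behaviour of the embodiment
        return max(sr_nor, sr_ex)
    if mode == "both":       # alternative: show both scores
        return (sr_nor, sr_ex)
    if mode == "total":      # alternative: show their sum
        return sr_nor + sr_ex
    raise ValueError(f"unknown mode: {mode}")

sr_nor = standard_score(80, 5)
shown = presented_result(sr_nor, 90)
```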

(8) In the above embodiment, when the expressive singing reference data DD is updated, singers whose base score SR_BASE is higher than the reference score SR_TH are treated as advanced singers, and only the singing sample data DS of advanced singers is used for the update. The method of selecting the singing sample data DS used for updating the expressive singing reference data DD is not limited to this. For example, the standard score SR_NOR, which is the sum of the base score SR_BASE and the bonus points SR_ADD, may be used instead of the base score SR_BASE as the criterion for identifying advanced singers. Also, to exclude advanced singers whose base score SR_BASE is high because they performed no expressive singing at all, an upper threshold may be set in addition to the lower threshold (the reference score SR_TH), and the singing sample data DS of singers whose base score SR_BASE (or other score) exceeds the upper threshold may be left out of the update of the expressive singing reference data DD. Furthermore, instead of dividing singers into advanced singers and others as described above, the singing sample data DS of singers with a higher base score SR_BASE may, for example, be given a larger weight when used to update the expressive singing reference data DD.
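The selection rules in variation (8) can be expressed as a weight assigned to each singing sample. The following sketch assumes scores on a 0 to 100 scale and a weight proportional to the score in the weighted case; both are assumptions, as are the names `sample_weight`, `sr_upper`, and `weighted`.

```python
def sample_weight(sr_base, sr_th, sr_upper=None, weighted=False):
    """Weight of one singing sample DS in the reference data update.

    weighted=False: 1.0 for advanced singers (SR_BASE above SR_TH and not
    above the optional upper threshold), otherwise 0.0.
    weighted=True: no hard split; higher scores simply count for more
    (proportional weighting on an assumed 0-100 scale).
    """
    if weighted:
        return sr_base / 100.0
    if sr_base <= sr_th:
        return 0.0
    if sr_upper is not None and sr_base > sr_upper:
        return 0.0
    return 1.0

advanced = sample_weight(85, 80)
excluded = sample_weight(99, 80, sr_upper=95)
```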

(9) The above embodiment showed, as an example of a performance evaluation device that evaluates the performance of a music piece, a performance evaluation device that is installed in a karaoke device for singing and evaluates sung performance. However, the performance evaluation device according to the present invention is not limited to evaluating singing and can also be applied to evaluating music performed on various instruments. That is, the word "singing" used in the above embodiment can be replaced by the more general word "performance". In a performance evaluation device that evaluates instrumental performance, expressive performance specific to each instrument, such as string bending (choking) on a guitar, is evaluated. When the music piece is an instrumental piece rather than a song, the karaoke device for instrumental performance is configured so that the song data MD contains, instead of the lyrics track TR_LY, a score track, which is data describing in chronological order, for example, data representing the score and delta times indicating the display time of each section of the score (for example, a block of two or four measures). The sequencer 21 and the display unit 14 are then configured to output to the display, according to the score track, an image signal showing the score for the part currently being accompanied as the piece progresses. In both the karaoke device for singing and the karaoke device for instrumental performance, when display of lyrics or score is unnecessary, the sequencer 21 and the display unit 14 need not output the image signal.
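As a rough picture of the score track in variation (9), the sketch below converts delta times into absolute display times. It assumes that each delta time is the interval since the previous score block, as in Standard MIDI Files, and that times are counted in ticks; the text itself only says the delta times indicate the display time of each section. The block labels and the name `display_times` are hypothetical.

```python
from itertools import accumulate

def display_times(delta_ticks):
    """Convert per-block delta times into absolute display times (ticks)."""
    return list(accumulate(delta_ticks))

# Hypothetical score track: (score block, delta time in ticks).
score_track = [
    ("block 1 (bars 1-2)", 0),
    ("block 2 (bars 3-4)", 1920),
    ("block 3 (bars 5-6)", 1920),
]
times = display_times([delta for _, delta in score_track])
schedule = list(zip(times, [block for block, _ in score_track]))
```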

(10) As understood from the above examples, the performance evaluation device according to a preferred aspect of the present invention is expressed comprehensively, as illustrated in FIG. 13, as a device comprising: expressive performance reference data acquiring means 101 for acquiring expressive performance reference data that indicates the expressive performance to be performed during a music piece and the timing at which that expressive performance should be performed in the music piece, referenced to the sounding start time of a note or note group included in the music piece; pitch volume data generating means 102 for generating, from the performance sound of the music piece played by a performer, pitch volume data indicating the pitch and volume of the performance sound; and performance evaluation means 103 for raising the evaluation of the performer's performance of the music piece when a characteristic of at least one of the pitch and the volume represented by the pitch volume data generated by the pitch volume data generating means 102 exhibits, within a predetermined time range in the music piece indicated by the expressive performance reference data, the characteristic of the expressive performance that the expressive performance reference data indicates should be performed. The presence or absence of other elements and their specific forms are arbitrary.

(11) In the above embodiment, the performance evaluation device according to the present invention was installed in a karaoke device serving as a so-called dedicated machine, but the performance evaluation device according to the present invention is not limited to dedicated machines. For example, the performance evaluation device may be realized by having various devices, such as a personal computer, a portable information terminal (for example, a mobile phone or smartphone), or a game device, execute processing according to a program. The program may be distributed stored on a recording medium such as a CD-ROM, or distributed over a telecommunication line such as the Internet.

This application is based on Japanese Patent Application No. 2012-094853, filed on April 18, 2012, the contents of which are incorporated herein by reference.

(Industrial Applicability)

According to the present invention, when a performer gives an expressive performance, an evaluation that diverges little from human sensibility can be made.

1: singing evaluation system 10: karaoke device
11: sound source 12: speaker
13: microphone 14: display unit
15: communication interface 16: vocal adapter
17: CPU 18: RAM
19: ROM 20: hard disk
21: sequencer 30: server device
35: communication interface 37: CPU
38: RAM 39: ROM
40: hard disk 90: network

Claims

An expressive performance reference data generation device comprising:
pitch volume data acquiring means for acquiring, with respect to a performance sound of a music piece played by a performer, pitch volume data indicating the pitch and the volume of the performance sound;
expressive performance appearance data generating means for generating, when a characteristic of at least one of the pitch and the volume represented by the pitch volume data acquired by the pitch volume data acquiring means exhibits the characteristic of one of at least one predetermined expressive performance, expressive performance appearance data indicating that expressive performance in the music piece paired with its timing referenced to the sounding start time of a note or note group included in the music piece; and
expressive performance reference data generating means for identifying, on the basis of the expressive performance appearance data generated by the expressive performance appearance data generating means and for each note or note group included in the music piece, which expressive performance appears at which timing, and with what frequency, on a time axis referenced to the sounding start time of that note or note group, and for generating, using the identified information, expressive performance reference data that expresses, as pairs of a time on the time axis referenced to the sounding start time and an evaluation point, the evaluation given when an expressive performance to be performed during the music piece is performed at each such time.

A performance evaluation device comprising:
the expressive performance reference data generation device according to claim 1;
expressive performance reference data acquiring means for acquiring the expressive performance reference data generated by the expressive performance reference data generation device;
pitch volume data generating means for generating, from the performance sound of the music piece played by the performer, pitch volume data representing the pitch and the volume of the performance sound; and
performance evaluation means for performing, when a characteristic of at least one of the pitch and the volume indicated by the pitch volume data generated by the pitch volume data generating means exhibits, within a predetermined time range in the music piece indicated by the expressive performance reference data, the characteristic of the expressive performance that the expressive performance reference data indicates should be performed, an evaluation using the evaluation point corresponding to the appearance time of that expressive performance.

The performance evaluation device according to claim 2,
further comprising expressive performance reference data storing means for storing expressive performance reference data,
wherein the expressive performance reference data stored in the expressive performance reference data storing means is rewritten on the basis of the expressive performance reference data generated by the expressive performance reference data generating means.

The performance evaluation device according to claim 2 or 3,
further comprising model performance reference data acquiring means for acquiring model performance reference data indicating the pitch that serves as a model for the music piece,
wherein the performance evaluation means evaluates the performance of the music piece by the performer on the basis of a comparison between the pitch indicated by the pitch volume data generated by the pitch volume data generating means and the pitch indicated by the model performance reference data.

The performance evaluation device according to claim 2,
further comprising model performance reference data acquiring means for acquiring model performance reference data indicating the pitch that serves as a model for the music piece,
wherein the performance evaluation means evaluates the performance of the music piece by the performer on the basis of a comparison between the pitch indicated by the pitch volume data generated by the pitch volume data generating means and the pitch indicated by the model performance reference data,
the pitch volume data acquired by the pitch volume data acquiring means is accompanied by performance evaluation data indicating the result of the evaluation performed by the performance evaluation means using the model performance reference data, or the result of an equivalent evaluation performed in another performance evaluation device using data equivalent to the model performance reference data, and
the expressive performance reference data generating means generates the expressive performance reference data on the basis of the expressive performance appearance data that the expressive performance appearance data generating means generates using, among the pitch volume data acquired by the pitch volume data acquiring means, the pitch volume data accompanied by performance evaluation data that satisfies a predetermined condition.

A karaoke device comprising:
the performance evaluation device according to any one of claims 2, 3, and 5;
accompaniment data acquiring means for acquiring accompaniment data indicating the accompaniment of a music piece; and
sound signal output means for outputting a sound signal representing the accompaniment sound as directed by the accompaniment data,
wherein the pitch volume data generating means generates pitch volume data indicating the pitch and the volume of the performance sound of the music piece played by the performer along with the accompaniment emitted from a speaker in accordance with the sound signal output from the sound signal output means.

The karaoke device according to claim 6,
wherein the music piece is a song,
the karaoke device further comprising:
lyrics data acquiring means for acquiring lyrics data representing the lyrics of the song; and
image signal output means for outputting an image signal showing, among the lyrics represented by the lyrics data, the lyrics to be sung along with the accompaniment represented by the sound signal currently being output by the sound signal output means.

The karaoke device according to claim 6,
wherein the music piece is a piece to be played on a musical instrument,
the karaoke device further comprising:
score data acquiring means for acquiring score data representing the score of the music piece; and
image signal output means for outputting an image signal showing, among the score represented by the score data, the part of the score to be played along with the accompaniment represented by the sound signal currently being output by the sound signal output means.

A server device comprising:
expressive performance appearance data acquiring means for acquiring, with respect to a performance sound of a music piece played by a performer, expressive performance appearance data indicating an expressive performance that appeared paired with its timing referenced to the sounding start time of a note or note group included in the music piece;
expressive performance reference data generating means for identifying, on the basis of the expressive performance appearance data acquired by the expressive performance appearance data acquiring means and for each note or note group included in the music piece, which expressive performance appears at which timing, and with what frequency, on a time axis referenced to the sounding start time of that note or note group, and for generating, using the identified information, expressive performance reference data that expresses, as pairs of a time on the time axis referenced to the sounding start time and an evaluation point, the evaluation given when an expressive performance to be performed during the music piece is performed at each such time; and
transmission means for transmitting the expressive performance reference data generated by the expressive performance reference data generating means to a performance evaluation device.

A performance evaluation device comprising:
expressive performance reference data acquiring means for acquiring first expressive performance reference data that indicates the expressive performance to be performed during a music piece and the timing at which that expressive performance should be performed in the music piece, referenced to the sounding start time of a note or note group included in the music piece;
pitch volume data generating means for generating, from the performance sound of the music piece played by a performer, pitch volume data representing the pitch and the volume of the performance sound;
performance evaluation means for raising the evaluation of the performer's performance of the music piece when a characteristic of at least one of the pitch and the volume indicated by the pitch volume data generated by the pitch volume data generating means exhibits, within a predetermined time range in the music piece indicated by the first expressive performance reference data, the characteristic of the expressive performance that the first expressive performance reference data indicates should be performed;
expressive performance appearance data acquiring means for acquiring, with respect to the performance sound of the music piece played by the performer, expressive performance appearance data indicating an expressive performance that appeared paired with its timing referenced to the sounding start time of a note or note group included in the music piece; and
expressive performance reference data generating means for identifying, on the basis of the expressive performance appearance data acquired by the expressive performance appearance data acquiring means and for each note or note group included in the music piece, which expressive performance appears at which timing, and with what frequency, on a time axis referenced to the sounding start time of that note or note group, and for generating, using the identified information, second expressive performance reference data that indicates the expressive performance to be performed during the music piece and the timing at which that expressive performance should be performed in the music piece, referenced to the sounding start time of a note or note group included in the music piece.

An expressive performance reference data generation method comprising:
acquiring, with respect to a performance sound of a music piece played by a performer, pitch volume data indicating the pitch and the volume of the performance sound;
generating, when a characteristic of at least one of the pitch and the volume represented by the acquired pitch volume data exhibits the characteristic of one of at least one predetermined expressive performance, expressive performance appearance data indicating that expressive performance in the music piece paired with its timing referenced to the sounding start time of a note or note group included in the music piece; and
identifying, on the basis of the generated expressive performance appearance data and for each note or note group included in the music piece, which expressive performance appears at which timing, and with what frequency, on a time axis referenced to the sounding start time of that note or note group, and generating, using the identified information, expressive performance reference data that expresses, as pairs of a time on the time axis referenced to the sounding start time and an evaluation point, the evaluation given when an expressive performance to be performed during the music piece is performed at each such time.

A pitch volume data acquiring process for acquiring, with respect to a performance sound of a music piece played by a performer, pitch volume data indicating the pitch and the volume of the performance sound;
an expressive performance appearance data generating process for generating, when a characteristic of at least one of the pitch and the volume represented by the pitch volume data acquired by the pitch volume data acquiring process exhibits the characteristic of one of at least one predetermined expressive performance, expressive performance appearance data indicating that expressive performance in the music piece paired with its timing referenced to the sounding start time of a note or note group included in the music piece; and
an expressive performance reference data generating process for identifying, on the basis of the expressive performance appearance data generated by the expressive performance appearance data generating process and for each note or note group included in the music piece, which expressive performance appears at which timing, and with what frequency, on a time axis referenced to the sounding start time of that note or note group, and for generating, using the identified information, expressive performance reference data that expresses, as pairs of a time on the time axis referenced to the sounding start time and an evaluation point, the evaluation given when an expressive performance to be performed during the music piece is performed at each such time.

The performance evaluation device according to claim 2,
further comprising model performance reference data acquiring means for acquiring model performance reference data indicating the volume or the pitch that serves as a model for the music piece,
wherein the performance evaluation means:
obtains a first score (SR_NOR) on the basis of the result of comparing the volume or pitch indicated by the pitch volume data generated by the pitch volume data generating means with the volume or pitch indicated by the model performance reference data;
obtains a second score (SR_NOR) on the basis of the result of comparing the volume or pitch indicated by the pitch volume data generated by the pitch volume data generating means with the volume or pitch indicated by the expressive performance reference data; and
evaluates the performance of the music piece on the basis of the first score and the second score.