WO2013157602A1 - Performance evaluation device, karaoke device, and server device - Google Patents


Info

Publication number
WO2013157602A1
Authority
WO
WIPO (PCT)
Prior art keywords
performance
facial expression
data
pitch
music
Prior art date
Application number
PCT/JP2013/061488
Other languages
French (fr)
Japanese (ja)
Inventor
松本 秀一
Original Assignee
ヤマハ株式会社
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by ヤマハ株式会社
Priority to KR1020147025532A priority Critical patent/KR101666535B1/en
Priority to CN201380015347.7A priority patent/CN104170006B/en
Publication of WO2013157602A1 publication Critical patent/WO2013157602A1/en

Classifications

    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10KSOUND-PRODUCING DEVICES; METHODS OR DEVICES FOR PROTECTING AGAINST, OR FOR DAMPING, NOISE OR OTHER ACOUSTIC WAVES IN GENERAL; ACOUSTICS NOT OTHERWISE PROVIDED FOR
    • G10K15/00Acoustics not otherwise provided for
    • G10K15/04Sound-producing devices
    • GPHYSICS
    • G09EDUCATION; CRYPTOGRAPHY; DISPLAY; ADVERTISING; SEALS
    • G09BEDUCATIONAL OR DEMONSTRATION APPLIANCES; APPLIANCES FOR TEACHING, OR COMMUNICATING WITH, THE BLIND, DEAF OR MUTE; MODELS; PLANETARIA; GLOBES; MAPS; DIAGRAMS
    • G09B15/00Teaching music
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10HELECTROPHONIC MUSICAL INSTRUMENTS; INSTRUMENTS IN WHICH THE TONES ARE GENERATED BY ELECTROMECHANICAL MEANS OR ELECTRONIC GENERATORS, OR IN WHICH THE TONES ARE SYNTHESISED FROM A DATA STORE
    • G10H1/00Details of electrophonic musical instruments
    • G10H1/36Accompaniment arrangements
    • G10H1/361Recording/reproducing of accompaniment for use with an external source, e.g. karaoke systems
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/48Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use
    • G10L25/51Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/90Pitch determination of speech signals
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10HELECTROPHONIC MUSICAL INSTRUMENTS; INSTRUMENTS IN WHICH THE TONES ARE GENERATED BY ELECTROMECHANICAL MEANS OR ELECTRONIC GENERATORS, OR IN WHICH THE TONES ARE SYNTHESISED FROM A DATA STORE
    • G10H2210/00Aspects or methods of musical processing having intrinsic musical character, i.e. involving musical theory or musical parameters or relying on musical knowledge, as applied in electrophonic musical tools or instruments
    • G10H2210/031Musical analysis, i.e. isolation, extraction or identification of musical elements or musical parameters from a raw acoustic signal or from an encoded audio signal
    • G10H2210/091Musical analysis, i.e. isolation, extraction or identification of musical elements or musical parameters from a raw acoustic signal or from an encoded audio signal for performance evaluation, i.e. judging, grading or scoring the musical qualities or faithfulness of a performance, e.g. with respect to pitch, tempo or other timings of a reference performance
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10HELECTROPHONIC MUSICAL INSTRUMENTS; INSTRUMENTS IN WHICH THE TONES ARE GENERATED BY ELECTROMECHANICAL MEANS OR ELECTRONIC GENERATORS, OR IN WHICH THE TONES ARE SYNTHESISED FROM A DATA STORE
    • G10H2220/00Input/output interfacing specifically adapted for electrophonic musical tools or instruments
    • G10H2220/155User input interfaces for electrophonic musical instruments
    • G10H2220/441Image sensing, i.e. capturing images or optical patterns for musical purposes or musical control purposes
    • G10H2220/455Camera input, e.g. analyzing pictures from a video camera and using the analysis results as control data

Definitions

  • This invention relates to a technique for evaluating the skill of music performance.
  • For example, various techniques have been proposed relating to karaoke apparatuses for singing that have a scoring function for scoring the skill of a singer's singing performance (hereinafter simply referred to as "karaoke apparatus" unless otherwise specified).
  • Patent Document 1 is one document disclosing this kind of technology.
  • The karaoke device disclosed in that document calculates, for each note of the song, the difference between the pitch extracted from the user's singing sound and the pitch extracted from data prepared in advance as the guide melody, and calculates a basic score based on this difference.
  • In addition, when singing that uses techniques such as vibrato or shakuri is performed, this karaoke apparatus calculates bonus points according to the number of times such singing occurred.
  • This karaoke device presents the total score of the basic score and bonus points to the user as the final evaluation result.
  • Patent Documents 2 to 6, for example, disclose techniques for detecting, from a waveform representing a singing sound, that singing using a technique such as vibrato or shakuri has been performed.
  • Japanese Unexamined Patent Application Publication Nos. 2005-107334, 2005-107330, 2005-107087, 2008-268370, 2005-107336, and 2008-225115
  • However, with the technique of Patent Document 1, bonus points are added even when such singing is performed at a point in the song where it is not preferable to sing using techniques such as vibrato or shakuri, so the score presented as the evaluation result can deviate from an evaluation based on human sensibility. The present invention has been made in view of this problem, and an object of the present invention is to make it possible to present an evaluation result closer to one based on human sensibility in the evaluation of music performances such as karaoke singing.
  • In order to solve the above problem, the present invention provides a performance evaluation device comprising: facial expression performance reference data acquisition means for acquiring facial expression performance reference data that indicates a facial expression performance to be performed during the performance of a piece of music and the timing at which the facial expression performance should be performed in the piece, with reference to the sounding start time of a note or note group included in the piece; pitch volume data generation means for generating, from the performance sound of the piece by a performer, pitch volume data indicating the pitch and volume of the performance sound; and performance evaluation means for raising the evaluation of the performer's performance of the piece when at least one of the pitch and volume characteristics indicated by the pitch volume data generated by the pitch volume data generation means exhibits, within a predetermined time range indicated by the facial expression performance reference data in the piece, the characteristics of the facial expression performance that the facial expression performance reference data indicates should be performed.
  • The present invention also provides a karaoke device comprising: the above performance evaluation device; accompaniment data acquisition means for acquiring accompaniment data that instructs the accompaniment of a piece of music; and sound signal output means for outputting a sound signal indicating the musical sound of the accompaniment in accordance with the instruction of the accompaniment data, wherein the pitch volume data generation means generates pitch volume data indicating the pitch and volume of the performance sound of the piece performed by the performer in accordance with the accompaniment emitted from a speaker according to the sound signal output from the sound signal output means.
  • The present invention further provides a server device comprising: facial expression performance appearance data acquisition means for acquiring, for each of the performance sounds of a piece of music by an arbitrary number of arbitrary performers, facial expression performance appearance data indicating that one facial expression performance appeared at one timing expressed with reference to the sounding start time of a note or note group included in the piece; facial expression performance reference data generating means for identifying, based on the arbitrary number of facial expression performance appearance data acquired by the facial expression performance appearance data acquisition means, which facial expression performance appeared at which timing, and at what frequency, with reference to the sounding start time of each note or note group included in the piece, and for generating, according to the identified information, facial expression performance reference data that indicates the facial expression performance to be performed during the performance of the piece and the timing at which the facial expression performance should be performed in the piece, with reference to the sounding start time of a note or note group included in the piece; and transmitting means for transmitting the facial expression performance reference data generated by the facial expression performance reference data generating means to the performance evaluation device.
  • The present invention further provides a singing evaluation system comprising: facial expression performance reference data acquisition means for acquiring first facial expression performance reference data that indicates a facial expression performance to be performed during the performance of a piece of music and the timing at which the facial expression performance should be performed in the piece, with reference to the sounding start time of a note or note group included in the piece; pitch volume data generation means for generating, from the performance sound of the piece by a performer, pitch volume data indicating the pitch and volume of the performance sound; performance evaluation means for raising the evaluation of the performer's performance of the piece when at least one of the pitch and volume characteristics indicated by the pitch volume data generated by the pitch volume data generation means exhibits, within a predetermined time range indicated by the first facial expression performance reference data in the piece, the characteristics of the facial expression performance that the first facial expression performance reference data indicates should be performed; facial expression performance appearance data acquisition means for acquiring, for each of the performance sounds of the piece by an arbitrary number of arbitrary performers, facial expression performance appearance data indicating that one facial expression performance appeared at one timing expressed with reference to the sounding start time of a note or note group included in the piece; and facial expression performance reference data generating means for identifying, based on the arbitrary number of facial expression performance appearance data acquired by the facial expression performance appearance data acquisition means, which facial expression performance appeared at which timing, and at what frequency, by the arbitrary performers with reference to the sounding start time of each note or note group included in the piece, and for generating, according to the identified information, second facial expression performance reference data that indicates the facial expression performance to be performed during the performance of the piece by the arbitrary performers and the timing at which it should be performed, with reference to the sounding start time of a note or note group included in the piece.
  • The present invention also provides a performance evaluation method comprising: acquiring facial expression performance reference data that indicates a facial expression performance to be performed during the performance of a piece of music and the timing at which the facial expression performance should be performed in the piece, with reference to the sounding start time of a note or note group included in the piece; generating, from the performance sound of the piece by a performer, pitch volume data indicating the pitch and volume of the performance sound; and raising the evaluation of the performer's performance of the piece when at least one of the pitch and volume characteristics indicated by the pitch volume data exhibits, within a predetermined time range indicated by the facial expression performance reference data in the piece, the characteristics of the facial expression performance that the reference data indicates should be performed.
  • The present invention also provides a computer-executable program for causing a computer to execute: a facial expression performance reference data acquisition process of acquiring facial expression performance reference data that indicates a facial expression performance to be performed during the performance of a piece of music and the timing at which the facial expression performance should be performed in the piece, with reference to the sounding start time of a note or note group included in the piece; a pitch volume data generation process of generating, from the performance sound of the piece by a performer, pitch volume data indicating the pitch and volume of the performance sound; and a performance evaluation process of raising the evaluation of the performer's performance of the piece when at least one of the pitch and volume characteristics indicated by the generated pitch volume data exhibits, within a predetermined time range indicated by the facial expression performance reference data in the piece, the characteristics of the facial expression performance that the reference data indicates should be performed.
  • According to the present invention, a performance evaluation device is realized that gives a high evaluation to a performer who performs facial expression performances at appropriate timings, so the evaluation deviates little from an evaluation based on human sensibility.
  • FIG. 1 is a diagram showing a configuration of a singing evaluation system 1 according to an embodiment of the present invention.
  • The singing evaluation system 1 includes karaoke apparatuses 10-m (m = 1, 2, ..., M, where M is the total number of karaoke apparatuses) and a server device 30.
  • One or a plurality of karaoke apparatuses 10-m are installed in each karaoke store.
  • the server device 30 is installed in the system management center.
  • the karaoke apparatus 10-m and the server apparatus 30 are connected to the network 90, and can transmit and receive various data to and from each other.
  • The karaoke device 10-m is a device that stages the user's singing by emitting accompaniment music supporting the singing and displaying lyrics, and that evaluates the skill of the user's singing.
  • The karaoke apparatus 10-m evaluates the skill of the user's singing by evaluating both the pitch and volume of the user's singing sound and the following five types of facial expression singing, and presents the score resulting from these two evaluations to the user together with a comment message.
  • a1. Tame: a facial expression singing that intentionally delays the onset of a specific sound in the song. As shown in FIG. 2, when this singing is performed, the time at which the pitch changes from that of the preceding sound to that of the target sound comes later than the boundary between the two corresponding notes in the score (the model singing).
  • b1. Vibrato: a facial expression singing that finely oscillates the pitch of a specific sound in the song while maintaining its apparent pitch. As shown in FIG. 3, when this singing is performed, the pitch of the singing sound changes periodically across the height of the note corresponding to that sound in the score.
  • c1. Kobushi: a facial expression singing that momentarily ornaments the pitch of a specific sound in the song during its pronunciation. As shown in FIG. 4, when this singing is performed, the pitch of the singing sound rises temporarily in the middle of the note corresponding to that sound in the score.
  • d1. Shakuri: a singing technique in which a specific sound in the song is first pronounced at a pitch lower than its original pitch and then brought up to the original pitch. As shown in FIG. 5, when this singing is performed, the pitch of the singing sound at the sounding start time is lower than the height of the note corresponding to that sound in the score; the pitch then rises slowly after the start of sounding and reaches almost the same height as the note.
  • e1. Fall: a singing technique in which a specific sound in the song is first pronounced at a pitch higher than its original pitch and then brought down to the original pitch. As shown in FIG. 6, when this singing is performed, the pitch of the singing sound at the sounding start time is higher than the height of the note corresponding to that sound in the score; the pitch then falls gradually after the start of sounding and reaches almost the same height as the note.
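The patent itself relies on previously published methods (Patent Documents 2 to 6) to detect these characteristic waveforms and does not restate them. Purely as an illustration of the kind of check such a detector might perform, the Python sketch below flags a vibrato-like oscillation around a note's nominal pitch; the function name, thresholds, and sampling assumptions are hypothetical and are not taken from the patent.

```python
import math

def looks_like_vibrato(pitch_cents, note_pitch_cents, min_cycles=3,
                       min_depth_cents=20, max_depth_cents=300):
    """Rough vibrato check: the sung pitch should cross the note's nominal
    pitch repeatedly with a modest, roughly periodic depth.
    pitch_cents: sampled pitch of the singing sound, one value per frame."""
    deviation = [p - note_pitch_cents for p in pitch_cents]
    # Count sign changes of the deviation, i.e. crossings of the note pitch.
    crossings = sum(1 for a, b in zip(deviation, deviation[1:]) if a * b < 0)
    depth = (max(deviation) - min(deviation)) / 2 if deviation else 0
    # Require at least `min_cycles` full oscillations and a plausible depth.
    return crossings >= 2 * min_cycles and min_depth_cents <= depth <= max_depth_cents

# Example: 1.5 s of pitch oscillating +/-50 cents around the note at about 6 Hz
# (100 frames per second), which the check accepts.
samples = [50 * math.sin(2 * math.pi * 6 * t / 100) for t in range(150)]
print(looks_like_vibrato(samples, note_pitch_cents=0))  # True
```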
  • the karaoke apparatus 10-m includes a sound source 11, a speaker 12, a microphone 13, a display unit 14, a communication interface 15, a vocal adapter 16, a CPU 17, a RAM 18, a ROM 19, a hard disk 20, and a sequencer 21.
  • The sound source 11 outputs a sound signal S A in accordance with various MIDI (Musical Instrument Digital Interface) messages.
  • the speaker 12 emits a given signal as sound.
  • the microphone 13 collects sound and outputs a sound collection signal S M.
  • the display unit 14 displays an image corresponding to the image signal S I.
  • the communication interface 15 transmits / receives data to / from devices connected to the network 90.
  • the CPU 17 executes a program stored in the ROM 19 or the hard disk 20 while using the RAM 18 as a work area. Details of the operation of the CPU 17 will be described later.
  • the ROM 19 stores IPL (Initial Program Loader) and the like.
  • The hard disk 20 stores the music data MD-n of each song, the reference database DBRK, and the singing evaluation program VPG. The music data MD-n of a song is data in which the accompaniment content of the song, the lyrics of the song, and the model singing content of the song are recorded in the SMF (Standard MIDI File) format.
  • The music data MD-n has a header HD, an accompaniment track TR AC , a lyrics track TR LY , and a model singing reference track TR NR .
  • In the header HD, information such as the song number, song title, genre, performance time, and time base (the number of ticks corresponding to the duration of one quarter note) is described.
  • In the accompaniment track TR AC , an event EV(i) ON instructing the sounding of each note NT(i) in the accompaniment part of the score of the song (i indicates the order of the note NT counted from the beginning of the score of that part), an event EV(i) OFF instructing the muting of that note, and delta times DT indicating the difference in execution time (number of ticks) between successive events are described in chronological order.
  • In the lyrics track TR LY , data D LY indicating the lyrics of the song and delta times DT indicating the display time of each lyric (more precisely, the time difference (number of ticks) between the display time of each lyric and the display time of the preceding lyric) are described in chronological order.
  • In the model singing reference track TR NR , an event EV(i) ON instructing the sounding of each note NT(i) in the singing part of the score of the song, an event EV(i) OFF instructing the muting of that note, and delta times DT indicating the difference in execution time (number of ticks) between successive events are described in chronological order.
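As a minimal sketch of how the track and event layout described above might be represented in memory, the following Python structures mirror the description: each track is a chronological list of (delta time in ticks, payload) pairs. All class and field names here are illustrative assumptions, not the patent's own data format.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

@dataclass
class Header:                      # header HD
    song_number: int
    title: str
    genre: str
    performance_time_sec: int
    time_base: int                 # ticks per quarter note

@dataclass
class NoteEvent:                   # EV(i) ON / EV(i) OFF
    note_index: int                # i: order of the note NT(i) within its part
    on: bool                       # True = start sounding, False = mute
    note_number: int               # pitch, later converted to the model pitch PCH_REF
    velocity: int                  # later converted to the model volume LV_REF

@dataclass
class SongData:                    # music data MD-n, SMF-style
    header: Header
    # Each track is a chronological list of (delta time in ticks, payload).
    accompaniment: List[Tuple[int, NoteEvent]] = field(default_factory=list)  # TR_AC
    lyrics: List[Tuple[int, str]] = field(default_factory=list)               # TR_LY: lyric text D_LY
    model_singing: List[Tuple[int, NoteEvent]] = field(default_factory=list)  # TR_NR
```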
  • the reference database DBRK stores five types of facial expression singing reference data DD a1 , DD a2 , DD a3 , DD a4 , DD a5 .
  • The facial expression singing reference data DD a1 is data indicating pairs of each time t on a time axis whose reference point t BS is the sounding start time of a note NT(i) included in the song and the evaluation point VSR(t) awarded when tame singing is performed at that time t.
  • Likewise, the facial expression singing reference data DD a2 , DD a3 , DD a4 , and DD a5 are data indicating, for each time t on the same time axis, the evaluation point VSR(t) awarded when, respectively, vibrato, kobushi, shakuri, or fall singing is performed at that time t.
  • Hereinafter, the five types of facial expression singing reference data DD a1 , DD a2 , DD a3 , DD a4 , DD a5 are collectively referred to as facial expression singing reference data DD.
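A hedged sketch of one way the facial expression singing reference data DD could be held and queried: a mapping from the offset of a time t from the reference point t BS (the note's sounding start time) to the evaluation point VSR(t). The concrete numbers, the dictionary layout, and the nearest-neighbour lookup are illustrative assumptions only.

```python
# Reference data for one technique (for example DD_a1 for tame):
# offset of a time t from the reference point t_BS  ->  evaluation point VSR(t).
dd_a1 = {
    -40: 10,   # slightly before the note starts: low score
      0: 40,   # exactly at the note onset
     20: 95,   # just after the onset, where skilled singers tend to place tame
     60: 70,
    120: 30,
}

def evaluation_point(dd, offset):
    """Return VSR(t) for the stored offset closest to the observed one
    (a simple nearest-neighbour lookup; interpolation would also work)."""
    nearest = min(dd, key=lambda t: abs(t - offset))
    return dd[nearest]

print(evaluation_point(dd_a1, 25))   # -> 95
```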
  • The singing evaluation program VPG has the following three functions.
  • a2. Standard evaluation function: a function that compares the pitch and volume indicated by the output signals S P and S L of the vocal adapter 16 with the model pitch PCH REF and the model volume LV REF of each note NT(i) determined by the events EV(i) ON and EV(i) OFF in the model singing reference track TR NR , and evaluates the skill of the singing based on the result of this comparison.
  • b2. Facial expression singing evaluation function: a function that, each time a characteristic waveform of a facial expression singing appears in the pitch waveform indicated by the output signal S P of the vocal adapter 16, obtains the appearance time of that characteristic waveform on the time axis whose reference point t BS is the sounding start time of the note NT(i) targeted by the facial expression singing, selects the evaluation point VSR(t) corresponding to this appearance time from the evaluation points VSR(t) of the corresponding facial expression singing reference data DD in the reference database DBRK, and evaluates the skill of the singing based on this evaluation point VSR(t).
  • c2. Evaluation result presentation function: a function that calculates a score from the evaluation result of a2 and the evaluation result of b2 and displays the score on the display unit 14 together with a comment message.
  • When the music data MD-n of the selected song is transferred from the hard disk 20 to the RAM 18 in response to a singing start operation for the song on a remote controller (not shown), the sequencer 21 supplies the events EV(i) ON and EV(i) OFF and the data D LY in the music data MD-n to the respective parts of the apparatus. Specifically, when the music data MD-n is stored in the RAM 18, the sequencer 21 determines the time length of one tick based on the time base described in the header HD of the music data MD-n and the tempo designated by the remote controller (not shown), and performs the following three processes while counting ticks as that time length elapses.
  • First, each time the tick count matches a delta time DT in the accompaniment track TR AC , the sequencer 21 reads the following event EV(i) ON (or EV(i) OFF ) and supplies it to the sound source 11.
  • When the event EV(i) ON is supplied from the sequencer 21, the sound source 11 supplies the sound signal S A specified by that event to the speaker 12; when the event EV(i) OFF is supplied from the sequencer 21, it stops supplying the sound signal S A to the speaker 12.
  • Second, each time the tick count matches a delta time DT in the lyrics track TR LY , the sequencer 21 reads the following data D LY and supplies it to the display unit 14.
  • the display unit 14 converts the data D LY into a lyrics telop image, and displays the image on a display (not shown).
  • the accompaniment sound is emitted from the speaker 12 and the lyrics are displayed on the display.
  • The user sings the lyrics displayed on the display into the microphone 13 while listening to the accompaniment sound emitted from the speaker 12. While the user is singing into the microphone 13, the microphone 13 outputs a collected sound signal S M of the user's singing sound, and the vocal adapter 16 outputs a signal S P indicating the pitch of the signal S M and a signal S L indicating its volume.
  • Third, each time the tick count matches a delta time DT in the model singing reference track TR NR , the sequencer 21 reads the following event EV(i) ON (or EV(i) OFF ) and supplies it to the CPU 17.
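Taken together, the three processes amount to a delta-time scheduler: the sequencer converts the time base and the selected tempo into the duration of one tick and dispatches the next event of each track whenever the tick count reaches that track's next delta time DT. The sketch below illustrates that idea for a single track under the assumption of a constant tempo; the function names are mine, not the patent's.

```python
import time

def tick_seconds(time_base, tempo_bpm):
    # time_base = ticks per quarter note; tempo = quarter notes per minute,
    # so one tick lasts 60 / (tempo_bpm * time_base) seconds.
    return 60.0 / (tempo_bpm * time_base)

def play_track(track, time_base, tempo_bpm, dispatch):
    """track: chronological list of (delta_ticks, event) pairs.
    dispatch(event): whatever consumes the event (sound source, display, CPU)."""
    tick = tick_seconds(time_base, tempo_bpm)
    for delta_ticks, event in track:
        time.sleep(delta_ticks * tick)   # wait until this event's delta time elapses
        dispatch(event)

# Example: 480 ticks per quarter note at 120 BPM is roughly 1.04 ms per tick.
print(round(tick_seconds(480, 120) * 1000, 3), "ms per tick")
```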
  • the CPU 17 evaluates the skill of the user's singing using the events EV (i) ON and EV (i) OFF supplied from the sequencer 21 and the output signals S P and S L of the vocal adapter 16. Details will be described later.
  • the server device 30 is a device that plays a role of supporting the provision of services at a karaoke store.
  • the server device 30 includes a communication interface 35, a CPU 37, a RAM 38, a ROM 39, and a hard disk 40.
  • the communication interface 35 transmits / receives data to / from devices connected to the network 90.
  • the CPU 37 executes various programs stored in the ROM 39 and the hard disk 40 while using the RAM 38 as a work area. Details of the operation of the CPU 37 will be described later.
  • the ROM 39 stores IPL and the like.
  • the hard disk 40 stores a song sample database DBS, a reference database DBRS, and a song analysis program APG.
  • In the singing sample database DBS, groups of singing sample data DS, each group corresponding to one song, are individually stored.
  • Singing sample data DS is data recording the pitch waveform and volume waveform of the singing sound produced when a person whose singing ability exceeds a certain level sings the song.
  • the reference database DBRS stores the latest facial expression singing reference data DD to be stored in the reference database DBRK of each karaoke apparatus 10-m.
  • The song analysis program APG has the following three functions. a3. Accumulation function: a function for acquiring singing sample data DS for each song from the karaoke apparatuses 10-m and accumulating the acquired singing sample data DS in the singing sample database DBS. b3. Rewriting function: a function for searching, for each singing sample data DS stored in the singing sample database DBS, the waveform indicated by that singing sample data DS for the characteristic waveforms of the facial expression singings, generating from the search results statistical data on the times at which each facial expression singing appears relative to the sounding start time of the targeted note, and rewriting the facial expression singing reference data DD based on that statistical data. c3. Transmission function: a function for transmitting the rewritten facial expression singing reference data DD to the karaoke apparatuses 10-m in response to their inquiries (described later in connection with S250).
  • FIG. 7 is a flowchart showing the operation of this embodiment.
  • When a singing start operation for a song is performed (S100: Yes), the CPU 17 of the karaoke apparatus 10-m supplies a control signal S O to the sequencer 21 and causes the sequencer 21 to start its processing (the first to third processes described above) (S120). The CPU 17 then performs the standard song evaluation process (S130) and the facial expression singing evaluation process (S140).
  • Standard song evaluation process (S130): In this process, the CPU 17 treats the time from when the event EV(i) ON is supplied from the sequencer 21 until the next event EV(i) OFF is supplied as the sounding time T NT (i) of the sound corresponding to the i-th note NT(i).
  • The CPU 17 obtains the difference PCH DEF between the pitch indicated by the output signal S P of the vocal adapter 16 during the sounding time T NT (i) and the model pitch PCH REF obtained by converting the note number of the event EV(i) ON , and the difference LV DEF between the volume indicated by the signal S L during that time and the model volume LV REF obtained by converting the velocity of the event EV(i) ON . If the difference PCH DEF and the difference LV DEF are both within predetermined ranges, the CPU 17 determines that the note NT(i) has been sung successfully.
  • The CPU 17 performs this determination for each note from the start to the end of the user's singing, divides the number of notes NT(i) determined to be sung successfully by the total number of notes at the end of the singing, and defines the value obtained by multiplying the result by 100 as the basic score SR BASE .
  • Next, the CPU 17 determines whether a characteristic waveform of any of the facial expression singings (tame, vibrato, kobushi, shakuri, or fall) has appeared in the pitch waveform indicated by the output signal S P of the vocal adapter 16.
  • Details of the method for determining the characteristic waveform of tame are disclosed in Patent Document 2, those for vibrato in Patent Document 3, those for kobushi in Patent Document 4, those for shakuri in Patent Document 5, and those for fall in Patent Document 6.
  • The CPU 17 performs this characteristic waveform determination from the start to the end of the user's singing, and sets the value obtained by multiplying the number of facial expression singings that appeared by a predetermined coefficient as the addition points SR ADD . The sum of the basic score SR BASE and the addition points SR ADD is the standard score SR NOR .
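As a compact illustration of the standard song evaluation process (S130) described above, the sketch below treats a note as sung successfully when both its pitch and volume differences stay within tolerances, derives the basic score SR BASE as the pass ratio multiplied by 100, and adds SR ADD in proportion to the number of detected facial expression singings. The tolerance values and the coefficient are placeholders; the patent does not specify them.

```python
def standard_score(per_note_results, expression_count,
                   pitch_tol=50, volume_tol=10, add_coeff=2.0):
    """per_note_results: one (pitch_diff, volume_diff) pair per note NT(i),
    where pitch_diff = |sung pitch - model pitch PCH_REF| and
    volume_diff = |sung volume - model volume LV_REF| during T_NT(i).
    expression_count: number of expression-singing characteristic waveforms found."""
    passed = sum(1 for p_diff, v_diff in per_note_results
                 if p_diff <= pitch_tol and v_diff <= volume_tol)
    sr_base = passed / len(per_note_results) * 100   # basic score SR_BASE
    sr_add = expression_count * add_coeff            # addition points SR_ADD
    return sr_base + sr_add                          # standard score SR_NOR

# 8 of 10 notes within tolerance and 3 expression singings detected: 80 + 6 = 86.
notes = [(10, 2)] * 8 + [(120, 3), (30, 40)]
print(standard_score(notes, expression_count=3))
```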
  • Facial expression singing evaluation process (S140): In this process, the CPU 17 treats the time from when the event EV(i) ON is supplied until the next event EV(i) OFF is supplied as the sounding time T NT (i) corresponding to the i-th note NT(i). Then, when a characteristic waveform of a facial expression singing appears during the sounding time T NT (i) in the pitch waveform indicated by the output signal S P of the vocal adapter 16, the CPU 17 finds the appearance time of that facial expression singing within the sounding time T NT (i) and the type of facial expression singing that appeared, and generates facial expression singing appearance data indicating the type and appearance time so identified.
  • The CPU 17 then selects, from the series of evaluation points VSR(t) indicated by the facial expression singing reference data DD for the facial expression singing indicated in the generated appearance data, the evaluation point VSR(t) corresponding to the appearance time.
  • The CPU 17 selects such evaluation points VSR(t) from the start to the end of the user's singing, and the average of the selected evaluation points VSR(t) at the end of the singing is taken as the facial expression score SR EX .
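A minimal sketch of the facial expression singing evaluation process (S140): for each facial expression singing detected within a note's sounding time, the evaluation point VSR(t) for its appearance time relative to the note onset is looked up in the corresponding reference data, and the selected points are averaged to give the facial expression score SR EX. The data shapes reuse the illustrative mapping sketched earlier and are not the patent's literal implementation.

```python
def expression_score(appearances, reference_data):
    """appearances: (technique, offset_from_note_onset) pairs taken from the
    generated expression-singing appearance data.
    reference_data: dict mapping technique -> {offset: VSR(t)} (the data DD)."""
    selected = []
    for technique, offset in appearances:
        dd = reference_data[technique]
        nearest = min(dd, key=lambda t: abs(t - offset))   # closest stored offset
        selected.append(dd[nearest])                       # chosen VSR(t)
    return sum(selected) / len(selected) if selected else 0.0   # SR_EX

reference_data = {
    "tame":    {0: 40, 20: 95, 60: 70},
    "vibrato": {100: 90, 200: 60},
}
appearances = [("tame", 25), ("vibrato", 110)]
print(expression_score(appearances, reference_data))   # (95 + 90) / 2 = 92.5
```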
  • After the user finishes singing the song, the CPU 17 performs an evaluation result presentation process.
  • In this process, the CPU 17 selects the higher of the standard score SR NOR obtained in the standard song evaluation process and the facial expression score SR EX obtained in the facial expression singing evaluation process. If the standard score SR NOR is selected, the CPU 17 displays this score SR NOR on the display unit 14 together with a comment message corresponding to the score SR NOR , for example "A cool and refined way of singing."
  • If the facial expression score SR EX is selected, the CPU 17 displays this score SR EX on the display unit 14 together with a comment message corresponding to the facial expression score SR EX , for example "A way of singing full of warmth."
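The presentation step simply keeps the better of the two scores and pairs it with a comment of the matching kind, as in this small sketch (the comment strings follow the examples quoted above).

```python
def present_result(sr_nor, sr_ex):
    """Keep the higher of the standard score SR_NOR and the expression score
    SR_EX, together with a comment message of the corresponding kind."""
    if sr_nor >= sr_ex:
        return sr_nor, "A cool and refined way of singing."
    return sr_ex, "A way of singing full of warmth."

score, comment = present_result(86.0, 92.5)
print(score, comment)   # 92.5 A way of singing full of warmth.
```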
  • the CPU 17 performs a sample transmission process (S160).
  • Sample transmission process (S160): The CPU 17 takes the signals S P and S L that the vocal adapter 16 output between the start and end of the singing as the singing sample data DS of the sung song, and transmits to the server device 30 a message MS1 containing this singing sample data DS and the basic score SR BASE (singing evaluation data) obtained in step S130.
  • When the CPU 37 of the server device 30 obtains the message MS1 from the karaoke device 10-m (S200: Yes), it extracts the singing sample data DS and the basic score SR BASE from this message MS1 and compares the basic score SR BASE with a reference score SR TH (for example, 80 points) that separates advanced singers from those who are not (S220). When the basic score SR BASE is higher than the reference score SR TH (S220: Yes), the CPU 37 accumulates the singing sample data DS extracted from the message MS1 in the singing sample database DBS (S230).
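On the server side, the accumulation step (S220/S230) is a simple gate: a singing sample is kept only when the accompanying basic score clears the reference score separating advanced singers from the rest. A minimal sketch, assuming the reference score of 80 points mentioned above:

```python
REFERENCE_SCORE = 80   # SR_TH: separates advanced singers from the rest

def accumulate_sample(sample_db, song_id, sample, basic_score):
    """Store the singing sample data DS only when SR_BASE exceeds SR_TH."""
    if basic_score > REFERENCE_SCORE:
        sample_db.setdefault(song_id, []).append(sample)
        return True
    return False

db = {}
print(accumulate_sample(db, "song-42", {"pitch": [], "volume": []}, 86))  # True
print(accumulate_sample(db, "song-42", {"pitch": [], "volume": []}, 70))  # False
```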
  • the CPU 37 performs a rewriting process (S240).
  • the CPU 37 performs the following five processes.
  • First, the CPU 37 searches the pitch waveform indicated by each singing sample data DS stored in the singing sample database DBS for the characteristic waveform of tame, and generates facial expression singing appearance data indicating the search results (data indicating each time t, on the time axis whose reference point t BS is the sounding start time of the note NT(i), at which tame appeared).
  • Subsequently, based on the facial expression singing appearance data generated for tame, the CPU 37 generates statistical data indicating the relationship between each time t on the time axis whose reference point t BS is the sounding start time of the note NT(i) and the number of appearances Num of the facial expression singing at that time t, and rewrites the evaluation point VSR(t) corresponding to each time t in the facial expression singing reference data DD a1 based on the contents of this statistical data.
  • FIG. 8 is a diagram illustrating an example of the statistical data on tame.
  • In the statistical data of this example, the appearance counts Num of the facial expression singing are distributed between the time t1 a1 , which is earlier than the reference point t BS by the time T1 a1 , and the time t4 a1 , which is later than the reference point t BS by the time T4 a1 . The largest peak of the appearance count Num appears at the time t2 a1 immediately after the reference point t BS , and the second peak of the appearance count Num appears at the time t3 a1 , later than the time t2 a1 . Therefore, in the facial expression singing reference data DD a1 after rewriting based on the statistical data of this example, the evaluation point VSR(t2 a1 ) at the time t2 a1 is the highest and the evaluation point VSR(t3 a1 ) at the time t3 a1 is the second highest.
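The rewriting step turns the appearance-time statistics into new evaluation points: offsets at which many advanced singers placed the technique receive a high VSR(t), rarer offsets a lower one. One plausible, simplified realization is sketched below; the 100-point ceiling and the proportional scaling rule are my assumptions and are not stated in the patent.

```python
from collections import Counter

def rewrite_reference_data(appearance_offsets, max_point=100):
    """appearance_offsets: appearance times t (relative to t_BS) of one technique,
    collected from the accumulated singing samples of advanced singers.
    Returns new reference data {offset: VSR(t)} proportional to how often the
    technique was placed at each offset, scaled so the peak gets max_point."""
    counts = Counter(appearance_offsets)        # statistical data: Num per time t
    peak = max(counts.values())
    return {t: round(max_point * n / peak) for t, n in sorted(counts.items())}

# Shaped like FIG. 8: the largest peak just after t_BS and a second, later peak.
offsets = [20] * 12 + [60] * 7 + [0] * 3 + [120] * 2
print(rewrite_reference_data(offsets))   # {0: 25, 20: 100, 60: 58, 120: 17}
```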
  • Second, the CPU 37 searches the pitch waveform indicated by each singing sample data DS stored in the singing sample database DBS for the characteristic waveform of vibrato, and generates facial expression singing appearance data indicating the search results (data indicating each time t, on the time axis whose reference point t BS is the sounding start time of the note NT(i), at which vibrato appeared). Subsequently, based on the facial expression singing appearance data generated for vibrato, the CPU 37 generates statistical data indicating the relationship between each time t on that time axis and the number of appearances Num of the facial expression singing at that time t, and rewrites the evaluation point VSR(t) corresponding to each time t in the facial expression singing reference data DD a2 based on the contents of the statistical data.
  • FIG. 9 is a diagram illustrating an example of statistical data on vibrato.
  • In the statistical data of this example, the appearance counts Num of the facial expression singing are distributed between the reference point t BS and the time t2 a2 , which is later than the reference point t BS by the time T2 a2 .
  • The largest peak of the appearance count Num appears at the time t1 a2 , which is later than the reference point t BS by the time T1 a2 . Therefore, in the facial expression singing reference data DD a2 after rewriting based on the statistical data of this example, the evaluation point VSR(t1 a2 ) at the time t1 a2 is the highest.
  • Third, the CPU 37 searches the pitch waveform indicated by each singing sample data DS stored in the singing sample database DBS for the characteristic waveform of kobushi, and generates facial expression singing appearance data indicating the search results (data indicating each time t, on the time axis whose reference point t BS is the sounding start time of the note NT(i), at which kobushi appeared). Subsequently, based on the facial expression singing appearance data generated for kobushi, the CPU 37 generates statistical data indicating the relationship between each time t on that time axis and the number of appearances Num of the facial expression singing at that time t, and rewrites the evaluation point VSR(t) corresponding to each time t in the facial expression singing reference data DD a3 based on the contents of the statistical data.
  • FIG. 10 is a diagram illustrating an example of statistical data regarding Kobushi.
  • In the statistical data of this example, the appearance counts Num of the facial expression singing are distributed between the reference point t BS and the time t2 a3 , which is later than the reference point t BS by the time T2 a3 .
  • The largest peak of the appearance count Num appears at the time t1 a3 , which is later than the reference point t BS by the time T1 a3 . Therefore, in the facial expression singing reference data DD a3 after rewriting based on the statistical data of this example, the evaluation point VSR(t1 a3 ) at the time t1 a3 is the highest.
  • Fourth, the CPU 37 searches the pitch waveform indicated by each singing sample data DS stored in the singing sample database DBS for the characteristic waveform of shakuri, and generates facial expression singing appearance data indicating the search results (data indicating each time t, on the time axis whose reference point t BS is the sounding start time of the note NT(i), at which shakuri appeared). Subsequently, based on the facial expression singing appearance data generated for shakuri, the CPU 37 generates statistical data indicating the relationship between each time t on that time axis and the number of appearances Num of the facial expression singing at that time t, and rewrites the evaluation point VSR(t) corresponding to each time t in the facial expression singing reference data DD a4 based on the contents of the statistical data.
  • FIG. 11 is a diagram illustrating an example of the statistical data on shakuri.
  • In the statistical data of this example, the appearance counts Num of the facial expression singing are distributed between the reference point t BS and the time t2 a4 , which is later than the reference point t BS by the time T2 a4 . The largest peak of the appearance count Num appears at the reference point t BS , and the second peak of the appearance count Num appears at the time t1 a4 , which is later than the reference point t BS by the time T1 a4 . Therefore, in the facial expression singing reference data DD a4 after rewriting based on the statistical data of this example, the evaluation point VSR(t BS ) at the time t BS is the highest and the evaluation point VSR(t1 a4 ) at the time t1 a4 is the second highest.
  • Fifth, the CPU 37 searches the pitch waveform indicated by each singing sample data DS stored in the singing sample database DBS for the characteristic waveform of fall, and generates facial expression singing appearance data indicating the search results (data indicating each time t, on the time axis whose reference point t BS is the sounding start time of the note NT(i), at which fall appeared).
  • Subsequently, based on the facial expression singing appearance data generated for fall, the CPU 37 generates statistical data indicating the relationship between each time t on the time axis whose reference point t BS is the sounding start time of the note NT(i) and the number of appearances Num of the facial expression singing at that time t, and rewrites the evaluation point VSR(t) corresponding to each time t in the facial expression singing reference data DD a5 based on the contents of the statistical data.
  • FIG. 12 is a diagram illustrating an example of statistical data regarding a fall.
  • In the statistical data of this example, the appearance counts Num of the facial expression singing are distributed between the time t1 a5 , which is later than the reference point t BS by the time T1 a5 , and the time t2 a5 , which is later than the reference point t BS by the time T2 a5 .
  • The largest peak of the appearance count Num appears at the time t2 a5 . Therefore, in the facial expression singing reference data DD a5 after rewriting based on the statistical data of this example, the evaluation point VSR(t2 a5 ) at the time t2 a5 is the highest.
  • Each time a predetermined inquiry time arrives (S110: Yes), the CPU 17 of the karaoke apparatus 10-m performs an inquiry process (S170). In this inquiry process, the CPU 17 transmits to the server device 30 a message MS2 requesting transmission of the latest data.
  • When the CPU 37 of the server device 30 receives the message MS2 from a karaoke device 10-m (S210: Yes), it transmits the facial expression singing reference data DD whose contents were rewritten between the reception of the previous message MS2 and the reception of the current message MS2 to the karaoke apparatus 10-m that transmitted the message MS2 (S250).
  • Upon receiving this facial expression singing reference data DD, the CPU 17 of the karaoke apparatus 10-m overwrites it onto the reference database DBRK and updates its contents (S180).
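The update exchange is therefore a periodic pull: each karaoke device sends an inquiry message MS2 at fixed intervals, the server replies with whatever reference data DD it has rewritten since that device's previous inquiry, and the device overwrites its local reference database DBRK. The following schematic sketch captures that flow; the message format and the version bookkeeping are illustrative assumptions.

```python
class ReferenceServer:
    """Schematic stand-in for the server device 30 answering MS2 inquiries."""

    def __init__(self):
        self.reference_data = {}    # technique -> (version, reference data DD)
        self.last_served = {}       # karaoke device id -> last version it received

    def rewrite(self, technique, data, version):
        self.reference_data[technique] = (version, data)

    def handle_inquiry(self, device_id):
        """Return only the DD entries rewritten since this device last asked."""
        since = self.last_served.get(device_id, 0)
        updates = {name: dd for name, (ver, dd) in self.reference_data.items() if ver > since}
        self.last_served[device_id] = max(
            (ver for ver, _ in self.reference_data.values()), default=since)
        return updates

server = ReferenceServer()
server.rewrite("tame", {0: 25, 20: 100}, version=1)
local_dbrk = {}
local_dbrk.update(server.handle_inquiry("karaoke-10-1"))   # device updates DBRK
print(local_dbrk)                            # {'tame': {0: 25, 20: 100}}
print(server.handle_inquiry("karaoke-10-1")) # {} - nothing new since last inquiry
```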
  • As described above, in this embodiment, each time the characteristic waveform of a facial expression singing appears in the waveform of the output signal of the vocal adapter 16, the appearance time of that characteristic waveform is obtained on the time axis whose reference point is the sounding start time of the note NT(i) targeted by the facial expression singing, the evaluation point VSR(t) corresponding to this appearance time is selected from the evaluation points VSR(t) in the facial expression singing reference data DD, and the skill of the singing is evaluated based on the selected evaluation point VSR(t).
  • Therefore, according to this embodiment, even if the user performs a facial expression singing, a good evaluation is not obtained unless its timing is appropriate, so an evaluation result closer to one based on human sensibility can be presented.
  • Also, in this embodiment, the characteristic waveforms of the facial expression singings are searched for in the waveforms indicated by the singing sample data DS of advanced singers, statistical data are generated from the search results indicating the relationship between each time on the time axis whose reference point is the sounding start time of the note NT(i) and the number of facial expression singings appearing at that time, and the evaluation points VSR(t) corresponding to each time in the facial expression singing reference data DD are rewritten based on the contents of the statistical data.
  • Therefore, according to this embodiment, changes in how advanced users tend to sing a song can be reflected in the evaluation results.
  • The present invention is not limited to the above embodiment, and may be modified, for example, as follows.
  • In the above embodiment, five types of facial expression singing were detected and evaluated, but facial expression singings other than these five types may be detected; for example, singing with inflection may be detected and evaluated.
  • In the above embodiment, the CPU 17 performs the standard singing evaluation process using both output signals S P and S L of the vocal adapter 16, and performs the facial expression singing evaluation process using only the signal S P , which indicates the pitch, of the output signals S P and S L . However, the CPU 17 may perform the standard singing evaluation process using only one of the signals S P and S L , and may perform the facial expression singing evaluation process using both signals S P and S L .
  • In the above embodiment, the skill of the singing was evaluated based on the appearance time of the characteristic waveform of the facial expression singing.
  • However, the evaluation may also take into account elements other than the appearance time of the characteristic waveform (for example, the length and depth of each tame, vibrato, kobushi, shakuri, or fall).
  • In the above embodiment, a configuration is adopted in which a facial expression singing that appears in the singing sound corresponding to each individual note included in the song is detected; however, a configuration may be employed that detects a facial expression singing appearing in the singing sound corresponding to a series of plural notes (a note group) included in the song.
  • For example, a facial expression singing such as a crescendo or decrescendo is performed over a series of notes, and it is desirable that the detection and evaluation of such facial expression singings be performed in units of note groups; accordingly, it is desirable that the facial expression singing reference data DD relating to such facial expression singings also be configured in units of note groups.
  • In the above embodiment, the singing sample data DS (pitch and volume data) consisting of the signals S P and S L output by the vocal adapter 16 from the start to the end of the singing is transmitted to the server device 30, and the server device 30 detects each facial expression singing from this singing sample data DS and identifies the timing of its appearance.
  • Instead, a configuration may be employed in which the karaoke device 10 transmits to the server device 30 the sound signal S M indicating the sound picked up by the microphone 13 (sound waveform data indicating the singing sound), and the server device 30 performs the processing of generating the signal S P and the signal S L from the sound signal S M (the processing performed by the vocal adapter 16 in the above embodiment).
  • Alternatively, the karaoke device 10 may transmit to the server device 30 data (facial expression singing appearance data) indicating the type of facial expression singing identified in the facial expression singing evaluation process (S140) performed in accordance with the singing evaluation program VPG and the timing of its appearance, and the server device 30 may update the facial expression singing reference data DD based on the facial expression singing appearance data transmitted from the karaoke device 10 without performing its own facial expression singing detection processing.
  • In the above embodiment, the server device 30 generates the statistical data and rewrites the facial expression singing reference data DD based on it.
  • Instead, each karaoke apparatus 10-m may store on its hard disk 20 the sound signals S M indicating singing sounds produced in the past on its own apparatus or obtained from other karaoke apparatuses 10-m, either directly or via the server apparatus 30, or the signals S P and S L generated from those sound signals S M , or data (facial expression singing appearance data) indicating the type and appearance timing of facial expression singings identified using those signals; the CPU 17 may then read them and perform processing similar to the processing performed by the server device 30 in S240, that is, the generation of statistical data and the rewriting of the facial expression singing reference data DD based on it.
  • In the above embodiment, the standard score SR NOR is calculated in the standard song evaluation process (S130) by adding the addition points SR ADD , calculated based on the number of appearances of facial expression singings, to the basic score SR BASE . However, a configuration may be adopted in which the appearance of facial expression singings is not taken into account and only the basic score SR BASE is calculated.
  • In the above embodiment, the higher of the standard score SR NOR obtained in the standard song evaluation process and the facial expression score SR EX obtained in the facial expression singing evaluation process is displayed to the singer. However, the evaluation result may be presented to the singer in other manners, such as displaying both scores or displaying their total.
  • In the above embodiment, a singer whose basic score SR BASE is higher than the reference score SR TH is regarded as an advanced singer, and the facial expression singing reference data DD is updated using only the singing sample data DS of such advanced singers.
  • However, the method of selecting the singing sample data DS used for updating the facial expression singing reference data DD is not limited to this.
  • For example, instead of the basic score SR BASE , the standard score SR NOR , which is the sum of the basic score SR BASE and the addition points SR ADD , may be used as the basis for judging whether a singer is advanced.
  • Also, an upper threshold may be provided in addition to the lower threshold (the reference score SR TH ), so that the singing sample data DS of singers whose basic score SR BASE (or other score) exceeds the upper threshold is not used for updating the facial expression singing reference data DD.
  • Alternatively, the singing sample data DS of singers with a high basic score SR BASE may be given a high weight when it is used for updating the facial expression singing reference data DD.
  • In the above embodiment, as an example of a performance evaluation device that evaluates a music performance, a performance evaluation device that is provided in a karaoke device for singing and evaluates a singing performance was shown.
  • However, the present invention is not limited to the evaluation of singing performances and can be applied to the evaluation of musical performances using various musical instruments; that is, the term "singing" used in the above embodiment may be read as the more general term "performance". In a performance evaluation device that evaluates instrumental performances, evaluation is performed regarding facial expression performances corresponding to each instrument, such as choking (string bending) on a guitar.
  • In a karaoke apparatus for musical instrument performance, for example, the song data MD may be configured to include, instead of the lyrics track TR LY , a score track in which data indicating the score and delta times indicating the display times of each section of the score (for example, every two or four bars) are described in chronological order, and the sequencer 21 and the display unit 14 may output to the display, following the score track, an image signal indicating the portion of the score corresponding to the current accompaniment position as the music progresses.
  • Alternatively, this image signal output processing by the sequencer 21 and the display unit 14 may be omitted.
  • In short, the performance evaluation apparatus according to the present invention only needs to comprise, as illustrated in FIG.: facial expression performance reference data acquisition means 101 for acquiring facial expression performance reference data that indicates a facial expression performance to be performed during the performance of a piece of music and the timing at which the facial expression performance should be performed in the piece, with reference to the sounding start time of a note or note group included in the piece; pitch volume data generation means 102 for generating, from the performance sound of the piece by a performer, pitch volume data indicating the pitch and volume of the performance sound; and performance evaluation means 103 for raising the evaluation of the performer's performance of the piece when at least one of the pitch and volume characteristics indicated by the pitch volume data generated by the pitch volume data generation means 102 exhibits, within a predetermined time range indicated by the facial expression performance reference data in the piece, the characteristics of the facial expression performance that the reference data indicates should be performed. The presence or absence of other elements and their specific forms are arbitrary.
  • the performance evaluation device according to the present invention is not limited to a dedicated device.
  • For example, a configuration may be adopted in which the performance evaluation device according to the present invention is realized by causing various devices such as a personal computer, a portable information terminal (for example, a mobile phone or a smartphone), or a game device to perform processing according to a program.
  • This program can be distributed stored in a recording medium such as a CD-ROM, or distributed via a telecommunication line such as the Internet.
  • This application is based on Japanese Patent Application No. 2012-094853 filed on Apr. 18, 2012, the contents of which are incorporated herein by reference.
  • DESCRIPTION OF SYMBOLS: 1 ... Singing evaluation system, 10 ... Karaoke apparatus, 11 ... Sound source, 12 ... Speaker, 13 ... Microphone, 14 ... Display unit, 15 ... Communication interface, 16 ... Vocal adapter, 17 ... CPU, 18 ... RAM, 19 ... ROM, 20 ... Hard disk, 21 ... Sequencer, 30 ... Server apparatus, 35 ... Communication interface, 37 ... CPU, 38 ... RAM, 39 ... ROM, 40 ... Hard disk, 90 ... Network

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Health & Medical Sciences (AREA)
  • Signal Processing (AREA)
  • Computational Linguistics (AREA)
  • Business, Economics & Management (AREA)
  • Educational Administration (AREA)
  • Educational Technology (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Auxiliary Devices For Music (AREA)
  • Reverberation, Karaoke And Other Acoustics (AREA)
  • Electrophonic Musical Instruments (AREA)

Abstract

This performance evaluation device is provided with: an expressiveness reference data acquisition means that acquires expressiveness reference data that shows expressiveness that should be achieved during the performance of a piece of music and timing at which the expressiveness should be achieved for that piece of music with utterance start timing for notes or note groups included in the piece of music as a standard; a pitch and volume data generating means that generates pitch and volume data showing the pitch and volume of the performance sound from the performance sound of the piece of music by a performer; and a performance evaluation means that improves evaluation of the performance of the piece of music by the performer when at least one of the characteristics of pitch and volume shown by the pitch and volume data generated by the pitch and volume data generating means exhibits expressiveness characteristics that should be achieved according to the expressiveness reference data within a prescribed time range shown by the expressiveness reference data for the piece of music.

Description

演奏評価装置、カラオケ装置及びサーバ装置Performance evaluation apparatus, karaoke apparatus, and server apparatus
 この発明は、楽曲演奏の巧拙を評価する技術に関する。 This invention relates to a technique for evaluating the skill of music performance.
 例えば、歌唱者の歌唱演奏の巧拙を採点する採点機能を備えた歌唱用のカラオケ装置(以下、特に断らない限り、単に「カラオケ装置」という)に関わる技術が各種提案されている。この種の技術を開示した文献として、特許文献1がある。同文献に開示されたカラオケ装置は、利用者の歌唱音から抽出したピッチとガイドメロディとしてあらかじめ準備されたデータから抽出したピッチとの差分を歌唱曲のノート毎に算出し、この差分に基づいて基本得点を算出する。また、このカラオケ装置は、ビブラートやしゃくりなどの技法を駆使した歌唱が行われた場合にはその歌唱が行われた回数に応じたボーナスポイントを算出する。このカラオケ装置は、基本得点とボーナスポイントの合計点を最終的な評価結果として利用者に提示する。この技術によると、ビブラートやしゃくりなどといった難度の高い技法を駆使した歌唱を評価結果に反映させることができる。 For example, various techniques relating to a karaoke apparatus for singing having a scoring function for scoring the skill of a singer's singing performance (hereinafter simply referred to as “karaoke apparatus” unless otherwise specified) have been proposed. There is Patent Document 1 as a document disclosing this kind of technology. The karaoke device disclosed in this document calculates the difference between the pitch extracted from the user's singing sound and the pitch extracted from the data prepared in advance as the guide melody for each note of the singing song, and based on this difference Calculate the basic score. Moreover, this karaoke apparatus calculates the bonus point according to the frequency | count that the singing was performed, when the singing using techniques, such as a vibrato and a shawl, was performed. This karaoke device presents the total score of the basic score and bonus points to the user as the final evaluation result. According to this technology, singing that makes full use of highly difficult techniques such as vibrato and shackle can be reflected in the evaluation results.
Patent Documents 2 to 6, for example, disclose techniques for detecting, from a waveform representing a singing sound, that singing using a technique such as vibrato or shakuri has been performed.
Patent Document 1: Japanese Unexamined Patent Application Publication No. 2005-107334
Patent Document 2: Japanese Unexamined Patent Application Publication No. 2005-107330
Patent Document 3: Japanese Unexamined Patent Application Publication No. 2005-107087
Patent Document 4: Japanese Unexamined Patent Application Publication No. 2008-268370
Patent Document 5: Japanese Unexamined Patent Application Publication No. 2005-107336
Patent Document 6: Japanese Unexamined Patent Application Publication No. 2008-225115
With the technique of Patent Document 1, however, bonus points are added even when singing with techniques such as vibrato or shakuri is performed at passages where such singing is not really desirable. As a result, there is a problem in that the score presented as the evaluation result diverges from what human sensibility would suggest.
The present invention has been made in view of this problem, and an object thereof is to make it possible, in evaluating a musical performance such as karaoke singing, to present an evaluation result that is closer to what human sensibility would suggest.
To solve the above problem, the present invention provides a performance evaluation apparatus comprising: expressive performance reference data acquisition means for acquiring expressive performance reference data that indicates an expressive performance to be performed during the performance of a piece of music and the timing at which the expressive performance should be performed in the piece, the timing being expressed relative to the sounding start time of a note or note group included in the piece; pitch/volume data generation means for generating, from the performance sound of the piece by a performer, pitch/volume data indicating the pitch and volume of the performance sound; and performance evaluation means for raising the evaluation of the performer's performance of the piece when at least one of the pitch and volume characteristics indicated by the pitch/volume data generated by the pitch/volume data generation means exhibits, within a predetermined time range indicated by the expressive performance reference data for the piece, the characteristics of the expressive performance that the expressive performance reference data indicates should be performed.

The present invention also provides a karaoke apparatus comprising: the above performance evaluation apparatus; accompaniment data acquisition means for acquiring accompaniment data that directs the accompaniment of a piece of music; and sound signal output means for outputting a sound signal representing the accompaniment tones in accordance with the accompaniment data, wherein the pitch/volume data generation means generates pitch/volume data indicating the pitch and volume of the performance sound produced by the performer along with the accompaniment emitted from a speaker in accordance with the sound signal output from the sound signal output means.

The present invention also provides a server apparatus comprising: expressive performance appearance data acquisition means for acquiring, for each of the performance sounds of a piece of music by any number of performers, expressive performance appearance data indicating that a certain expressive performance appeared at a certain timing relative to the sounding start time of a note or note group included in the piece; expressive performance reference data generation means for identifying, for each note or note group included in the piece, based on the expressive performance appearance data acquired by the expressive performance appearance data acquisition means, which expressive performance appears at which timing relative to the sounding start time of that note or note group and with what frequency, and for generating, in accordance with the identified information, expressive performance reference data that indicates the expressive performance to be performed during the performance of the piece and the timing at which it should be performed, relative to the sounding start time of a note or note group included in the piece; and transmission means for transmitting the expressive performance reference data generated by the expressive performance reference data generation means to a performance evaluation apparatus.

The present invention also provides a singing evaluation system comprising: expressive performance reference data acquisition means for acquiring first expressive performance reference data that indicates an expressive performance to be performed during the performance of a piece of music and the timing at which it should be performed in the piece, relative to the sounding start time of a note or note group included in the piece; pitch/volume data generation means for generating, from the performance sound of the piece by a performer, pitch/volume data indicating the pitch and volume of the performance sound; performance evaluation means for raising the evaluation of the performer's performance of the piece when at least one of the pitch and volume characteristics indicated by the generated pitch/volume data exhibits, within a predetermined time range indicated by the first expressive performance reference data for the piece, the characteristics of the expressive performance that the first expressive performance reference data indicates should be performed; expressive performance appearance data acquisition means for acquiring, for each of the performance sounds of a piece of music by any number of performers, expressive performance appearance data indicating that a certain expressive performance appeared at a certain timing relative to the sounding start time of a note or note group included in the piece performed by those performers; and expressive performance reference data generation means for identifying, for each note or note group included in the piece performed by those performers, based on the acquired expressive performance appearance data, which expressive performance appears at which timing relative to the sounding start time of that note or note group and with what frequency, and for generating, in accordance with the identified information, second expressive performance reference data that indicates the expressive performance to be performed during the performance of the piece and the timing at which it should be performed, relative to the sounding start time of a note or note group included in the piece.

The present invention also provides a performance evaluation method comprising: acquiring expressive performance reference data that indicates an expressive performance to be performed during the performance of a piece of music and the timing at which it should be performed in the piece, relative to the sounding start time of a note or note group included in the piece; generating, from the performance sound of the piece by a performer, pitch/volume data indicating the pitch and volume of the performance sound; and raising the evaluation of the performer's performance of the piece when at least one of the pitch and volume characteristics indicated by the pitch/volume data exhibits, within a predetermined time range indicated by the expressive performance reference data for the piece, the characteristics of the expressive performance that the expressive performance reference data indicates should be performed.

The present invention also provides a computer-executable program that causes a computer to execute: an expressive performance reference data acquisition process of acquiring expressive performance reference data that indicates an expressive performance to be performed during the performance of a piece of music and the timing at which it should be performed in the piece, relative to the sounding start time of a note or note group included in the piece; a pitch/volume data generation process of generating, from the performance sound of the piece by a performer, pitch/volume data indicating the pitch and volume of the performance sound; and a performance evaluation process of raising the evaluation of the performer's performance of the piece when at least one of the pitch and volume characteristics indicated by the generated pitch/volume data exhibits, within a predetermined time range indicated by the expressive performance reference data for the piece, the characteristics of the expressive performance that the expressive performance reference data indicates should be performed.
According to the present invention, a performance evaluation apparatus is realized that gives the performer a high evaluation when a desirable expressive performance is performed at a desirable timing in the performance of an individual piece of music. As a result, when the performer performs expressively, the evaluation deviates little from human sensibility.
FIG. 1 is a diagram showing the configuration of a singing evaluation system according to one embodiment of the invention.
FIG. 2 is a diagram showing the waveform of a singing sound with tame.
FIG. 3 is a diagram showing the waveform of a singing sound with vibrato.
FIG. 4 is a diagram showing the waveform of a singing sound with kobushi.
FIG. 5 is a diagram showing the waveform of a singing sound with shakuri.
FIG. 6 is a diagram showing the waveform of a singing sound with fall.
FIG. 7 is a flowchart showing the operation of the singing evaluation system according to the embodiment.
FIG. 8 is an example of statistical data generated for tame.
FIG. 9 is an example of statistical data generated for vibrato.
FIG. 10 is an example of statistical data generated for kobushi.
FIG. 11 is an example of statistical data generated for shakuri.
FIG. 12 is an example of statistical data generated for fall.
FIG. 13 is a block diagram showing the performance evaluation apparatus of the invention.
Embodiments of the present invention will be described below with reference to the drawings.

FIG. 1 is a diagram showing the configuration of a singing evaluation system 1 according to one embodiment of the present invention. The singing evaluation system 1 includes karaoke apparatuses 10-m (m = 1, 2, ... M, where M is the total number of karaoke apparatuses) and a server apparatus 30. One or more karaoke apparatuses 10-m are installed in each karaoke store. The server apparatus 30 is installed in a system management center. The karaoke apparatuses 10-m and the server apparatus 30 are connected to a network 90 and can exchange various data with one another.
The karaoke apparatus 10-m is an apparatus that stages the user's singing, by emitting the accompaniment that supports the singing and displaying the lyrics, and that evaluates the skill of the user's singing. In evaluating singing skill, the karaoke apparatus 10-m performs an evaluation that judges the pitch and volume of the user's singing sound and an evaluation that judges the five types of expressive singing described below, and presents the scores resulting from these two evaluations to the user together with a comment message.

a1. Tame
This is expressive singing in which the start of a particular sound in the song is deliberately delayed. As shown in FIG. 2, when this singing is performed, the time at which the pitch of the singing sound changes from that of the preceding sound to that of the sound in question lags slightly behind the transition time between the two corresponding notes in the score (the model singing).

b1. Vibrato
This is expressive singing in which a particular sound in the song is finely oscillated while its apparent pitch is maintained. As shown in FIG. 3, when this singing is performed, the pitch of the singing sound changes periodically back and forth across the pitch of the corresponding note in the score.

c1. Kobushi
This is expressive singing in which the voice is made to waver partway through the pronunciation of a particular sound in the song. As shown in FIG. 4, when this singing is performed, the pitch of the singing sound rises transiently in the middle of the corresponding note in the score.

d1. Shakuri
This is a singing technique in which a particular sound in the song is first produced at a pitch lower than its proper pitch and then brought up toward the proper pitch. As shown in FIG. 5, when this singing is performed, the pitch of the singing sound at its sounding start time is lower than the pitch of the corresponding note in the score; the pitch then rises gradually after the start of sounding and reaches approximately the pitch of the note.

e1. Fall
This is a singing technique in which a particular sound in the song is first produced at a pitch higher than its proper pitch and then brought down toward the proper pitch. As shown in FIG. 6, when this singing is performed, the pitch of the singing sound at its sounding start time is higher than the pitch of the corresponding note in the score; the pitch then falls gradually after the start of sounding and reaches approximately the pitch of the note.
Returning to FIG. 1, the description of the singing evaluation system 1 as a whole continues. The karaoke apparatus 10-m includes a sound source 11, a speaker 12, a microphone 13, a display unit 14, a communication interface 15, a vocal adapter 16, a CPU 17, a RAM 18, a ROM 19, a hard disk 20, and a sequencer 21. The sound source 11 outputs a sound signal SA in accordance with various MIDI (Musical Instrument Digital Interface) messages. The speaker 12 emits a given signal as sound. The microphone 13 picks up sound and outputs a picked-up sound signal SM. The display unit 14 displays an image corresponding to an image signal SI. The communication interface 15 transmits and receives data to and from apparatuses connected to the network 90.
The vocal adapter 16 measures the pitch and volume of the sound signal SM and generates pitch/volume data indicating their changes over time. Specifically, the vocal adapter 16 detects the pitch of the sound signal SM supplied from the microphone 13 every time TS (for example, TS = 30 milliseconds) and outputs the detection result as a signal SP. Likewise, the vocal adapter 16 detects the volume of the sound signal SM every time TS and outputs the detection result as a signal SL.
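The vocal adapter is described only at this functional level, so the following is a minimal sketch of how such pitch/volume data could be produced in software: the input is framed every 30 ms, volume is taken as RMS level, and pitch is estimated with a plain autocorrelation search. The frame length, pitch search range, and silence threshold are assumptions made for the sketch, not values taken from the embodiment.

    import numpy as np

    def pitch_volume_data(samples, sr, t_s=0.030, fmin=80.0, fmax=1000.0):
        """Return a list of (time_sec, pitch_hz_or_None, volume_rms) every t_s seconds.
        samples: 1-D array of audio samples, sr: sampling rate in Hz."""
        samples = np.asarray(samples, dtype=float)
        hop = int(sr * t_s)
        lag_min = int(sr / fmax)
        lag_max = int(sr / fmin)
        out = []
        for start in range(0, len(samples) - lag_max, hop):
            frame = samples[start:start + lag_max * 2]
            rms = float(np.sqrt(np.mean(frame ** 2)))
            pitch = None
            if rms > 1e-3:                           # skip near-silent frames
                ac = np.correlate(frame, frame, mode="full")[len(frame) - 1:]
                lag = lag_min + int(np.argmax(ac[lag_min:lag_max]))
                pitch = sr / lag                     # naive autocorrelation pitch estimate
            out.append((start / sr, pitch, rms))
        return out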
The CPU 17 executes programs stored in the ROM 19 and on the hard disk 20, using the RAM 18 as a work area. Details of the operation of the CPU 17 will be described later. The ROM 19 stores an IPL (Initial Program Loader) and the like. The hard disk 20 stores song data MD-n (n = 1 to N, where N is the total number of songs), a reference database DBRK, and a singing evaluation program VPG. The song data MD-n of each song is data in which the accompaniment of the song, the lyrics of the song, and a model singing of the song are recorded in SMF (Standard MIDI File) format.
More specifically, as shown in the box in FIG. 1, the song data MD-n has a header HD, an accompaniment track TRAC, a lyrics track TRLY, and a model singing reference track TRNR. The header HD describes information such as the song number, song title, genre, performance time, and time base (the number of ticks corresponding to the duration of one quarter note).
In the accompaniment track TRAC, for each note NT(i) in the accompaniment part of the song's score (where i indicates the position counted from the first note NT(1) of that part), an event EV(i)ON instructing the sounding of the note, an event EV(i)OFF instructing its muting, and delta times DT indicating the difference in execution time (in ticks) between successive events are described in chronological order.
In the lyrics track TRLY, data DLY representing the lyrics of the song and delta times DT indicating the display time of each lyric (more precisely, the time difference, in ticks, between the display time of each lyric and that of the preceding lyric) are described in chronological order.
In the model singing reference track TRNR, for each note NT(i) in the singing part of the song's score, an event EV(i)ON instructing the sounding of the note, an event EV(i)OFF instructing its muting, and delta times DT indicating the difference in execution time (in ticks) between successive events are described in chronological order.
The reference database DBRK stores five types of expressive-singing reference data: DDa1, DDa2, DDa3, DDa4, and DDa5. The expressive-singing reference data DDa1 indicates, for each time t on a time axis whose reference point tBS is the sounding start time of a note NT(i) included in the song, the pair of that time t and the evaluation score VSR(t) awarded when tame singing occurs at that time. DDa2, DDa3, DDa4, and DDa5 are the corresponding data for vibrato, kobushi, shakuri, and fall, respectively: each indicates, for each time t relative to the reference point tBS, the evaluation score VSR(t) awarded when that type of expressive singing occurs at time t. Hereinafter, when the five types DDa1 to DDa5 need not be distinguished, they are referred to simply as expressive-singing reference data DD.
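The description fixes only what this reference data must express (a score VSR(t) for each offset t from the note's sounding start tBS, per expressive-singing type); the storage format itself is not specified. The following is one possible in-memory sketch: per type and per note, a sorted list of (offset, score) pairs, with a lookup that returns the score for the nearest stored offset. The 10 ms granularity, the per-note keying, and all numeric values are placeholders assumed for the sketch.

    from bisect import bisect_left

    # reference_dd[technique][note_index] -> sorted list of (offset_ms, score) pairs,
    # where offset_ms is measured from the note's sounding start time t_BS.
    reference_dd = {
        "tame":    {5: [(-120, 20), (0, 60), (30, 100), (90, 70), (200, 10)]},   # toy numbers
        "vibrato": {5: [(0, 30), (150, 100), (400, 40)]},
        # ... kobushi, shakuri, fall ...
    }

    def vsr(technique, note_index, offset_ms):
        """Evaluation score VSR(t) for an expressive-singing event at offset_ms from t_BS."""
        pairs = reference_dd.get(technique, {}).get(note_index)
        if not pairs:
            return 0
        offsets = [o for o, _ in pairs]
        i = bisect_left(offsets, offset_ms)
        # pick whichever stored offset is nearest to the observed one
        candidates = [j for j in (i - 1, i) if 0 <= j < len(pairs)]
        best = min(candidates, key=lambda j: abs(pairs[j][0] - offset_ms))
        return pairs[best][1]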
The singing evaluation program VPG has the following three functions.

a2. Standard evaluation function
This function compares the pitch and volume indicated by the output signals SP and SL of the vocal adapter 16 with the model pitch PCHREF and the model volume LVREF of each note NT(i), which are determined from the events EV(i)ON and EV(i)OFF in the model singing reference track TRNR, and evaluates the skill of the singing based on the result of this comparison.

b2. Expressive-singing evaluation function
Each time a characteristic waveform of expressive singing appears in the pitch waveform indicated by the output signal SP of the vocal adapter 16, this function obtains the appearance time of that characteristic waveform on a time axis whose reference point tBS is the sounding start time of the note NT(i) targeted by the expressive singing, selects the evaluation score VSR(t) corresponding to that appearance time from among the evaluation scores VSR(t) of the relevant expressive-singing reference data DD in the reference database DBRK, and evaluates the skill of the singing based on this evaluation score VSR(t).

c2. Evaluation result presentation function
This function calculates a score from the evaluation result of a2 and the evaluation result of b2, and displays this score on the display unit 14 together with a comment message.
When song data MD-n for a song is transferred from the hard disk 20 to the RAM 18 in response to a singing start operation for that song on a remote controller (not shown), the sequencer 21 supplies the events EV(i)ON and EV(i)OFF and the data DLY in the song data MD-n to the relevant parts of the apparatus. Specifically, once the song data MD-n is stored in the RAM 18, the sequencer 21 determines the duration of one tick based on the time base described in the header HD of the song data MD-n and the tempo specified with the remote controller (not shown), and performs the following three processes while counting ticks as this duration elapses.
In the first process, each time the tick count matches a delta time DT in the accompaniment track TRAC, the sequencer 21 reads out the event EV(i)ON (or EV(i)OFF) that follows it and supplies the event to the sound source 11. When the event EV(i)ON is supplied from the sequencer 21, the sound source 11 supplies the sound signal SA specified by that event to the speaker 12; when the event EV(i)OFF is supplied, the sound source 11 stops supplying the sound signal SA to the speaker 12.
In the second process, each time the tick count matches a delta time DT in the lyrics track TRLY, the sequencer 21 reads out the data DLY that follows it and supplies the data to the display unit 14. When the data DLY is supplied from the sequencer 21, the display unit 14 converts the data DLY into a lyrics-caption image and displays this image on a display (not shown).
As the sequencer 21 performs the first and second processes, the emission of the accompaniment from the speaker 12 and the display of the lyrics on the display proceed. The user sings the lyrics displayed on the display into the microphone 13 while listening to the accompaniment emitted from the speaker 12. While the user is singing into the microphone 13, the microphone 13 outputs the picked-up signal SM of the user's singing sound, and the vocal adapter 16 outputs the signals SP and SL indicating the pitch and volume of this signal SM.
In the third process, each time the tick count matches a delta time DT in the model singing reference track TRNR, the sequencer 21 reads out the event EV(i)ON (or EV(i)OFF) that follows it and supplies the event to the CPU 17. The CPU 17 evaluates the skill of the user's singing using the events EV(i)ON and EV(i)OFF supplied from the sequencer 21 and the output signals SP and SL of the vocal adapter 16. Details will be described later.
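A compact way to picture these three processes is a single tick-driven loop that walks the three tracks in parallel. The sketch below, under the assumption that each track is a list of (delta_ticks, payload) pairs, computes the tick duration from the time base and the tempo and dispatches each payload when its cumulative delta time comes due; the track names and handler callbacks are placeholders for this sketch, not the API of any real sequencer.

    import time

    def tick_seconds(timebase_ticks_per_quarter, tempo_bpm):
        # one quarter note lasts 60 / tempo seconds and spans `timebase` ticks
        return 60.0 / (tempo_bpm * timebase_ticks_per_quarter)

    def run_sequencer(tracks, handlers, timebase, tempo_bpm):
        """tracks/handlers keyed e.g. 'accomp', 'lyrics', 'reference'.
        Each track is a list of (delta_ticks, payload); handlers map a track name to a callback."""
        tick_len = tick_seconds(timebase, tempo_bpm)
        due = {name: (trk[0][0] if trk else None) for name, trk in tracks.items()}
        index = {name: 0 for name in tracks}
        tick = 0
        while any(due[name] is not None for name in tracks):
            time.sleep(tick_len)          # advance by one tick of real time
            tick += 1
            for name, trk in tracks.items():
                # dispatch every event whose cumulative delta time has come due
                while due[name] is not None and tick >= due[name]:
                    handlers[name](trk[index[name]][1])
                    index[name] += 1
                    if index[name] < len(trk):
                        due[name] += trk[index[name]][0]
                    else:
                        due[name] = None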
The server apparatus 30 is an apparatus that supports the provision of services at karaoke stores. The server apparatus 30 includes a communication interface 35, a CPU 37, a RAM 38, a ROM 39, and a hard disk 40. The communication interface 35 transmits and receives data to and from apparatuses connected to the network 90. The CPU 37 executes various programs stored in the ROM 39 and on the hard disk 40, using the RAM 38 as a work area. Details of the operation of the CPU 37 will be described later. The ROM 39 stores an IPL and the like.
The hard disk 40 stores a singing sample database DBS, a reference database DBRS, and a singing analysis program APG. The singing sample database DBS individually stores groups of singing sample data DS, each group corresponding to one song. The singing sample data DS records the pitch waveform and volume waveform of the singing sound produced when a person whose singing ability is above a certain level sang the song. The reference database DBRS stores the latest expressive-singing reference data DD that should be stored in the reference database DBRK of each karaoke apparatus 10-m.
The singing analysis program APG has the following three functions.

a3. Accumulation function
This function acquires the singing sample data DS of each song, one song at a time, from the karaoke apparatuses 10-m and accumulates the acquired singing sample data DS in the singing sample database DBS.

b3. Rewriting function
For each of the singing sample data DS accumulated in the singing sample database DBS, this function searches the waveform indicated by that singing sample data DS for characteristic waveforms of expressive singing, generates from the search results statistical data indicating the relationship between each time t on a time axis whose reference point tBS is the sounding start time of the note NT(i) targeted by the expressive singing and the number of appearances Num of that expressive singing at each time t, and rewrites the evaluation score VSR(t) corresponding to each time t in the expressive-singing reference data DD in the reference database DBRS based on the contents of the statistical data.

c3. Transmission function
This function transmits the expressive-singing reference data DD rewritten by the rewriting function to a karaoke apparatus 10-m in response to a request from that karaoke apparatus 10-m.
Next, the operation of the present embodiment will be described. FIG. 7 is a flowchart showing the operation of the present embodiment. In FIG. 7, when a singing start operation for a song is performed (S100: Yes), the CPU 17 of the karaoke apparatus 10-m supplies a control signal SO to the sequencer 21 and causes the sequencer 21 to start its processing (the first to third processes described above) (S120). Once the processing by the sequencer 21 has started, the CPU 17 performs two processes: the standard singing evaluation process (S130) and the expressive-singing evaluation process (S140). The details of these two processes are as follows.
a4. Standard singing evaluation process (S130)
In this process, the CPU 17 takes the time from when the event EV(i)ON is supplied from the sequencer 21 until the next event EV(i)OFF is supplied as the sounding time TNT(i) of the sound corresponding to the i-th note NT(i). The CPU 17 obtains the difference PCHDEF between the pitch indicated by the output signal SP of the vocal adapter 16 during the sounding time TNT(i) and the model pitch PCHREF obtained by converting the note number of the event EV(i)ON, and the difference LVDEF between the volume indicated by the signal SL during that time and the model volume LVREF obtained by converting the velocity of the event EV(i)ON, and judges the singing of the note NT(i) to be acceptable when the difference PCHDEF and the difference LVDEF fall within predetermined ranges. The CPU 17 performs this note judgment from the start to the end of the user's singing, and takes, as the basic score SRBASE, the value obtained by dividing the number of notes NT(i) judged acceptable by the total number of notes NT(i) at the end of the singing and multiplying the result by 100.

In this process, the CPU 17 also judges whether a characteristic waveform of any of the expressive-singing types tame, vibrato, kobushi, shakuri, or fall has appeared in the pitch waveform indicated by the output signal SP of the vocal adapter 16. For details of the method of judging the characteristic waveform of tame, see Patent Document 2; for vibrato, Patent Document 3; for kobushi, Patent Document 4; for shakuri, Patent Document 5; and for fall, Patent Document 6. The CPU 17 performs this characteristic-waveform judgment from the start to the end of the user's singing, and takes, as the bonus points SRADD, the value obtained by multiplying the number of appearances of expressive singing at the end of the singing by a predetermined coefficient. In this process, the sum of the basic score SRBASE and the bonus points SRADD is taken as the standard score SRNOR.
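Read as an algorithm, the standard evaluation reduces to a per-note pass/fail test plus a bonus term. The sketch below is a simplified reading of S130 under assumed tolerance values and bonus coefficient (the embodiment does not state them): SRBASE is the percentage of passed notes, SRADD is derived from the count of detected expressive-singing events, and SRNOR is their sum.

    def standard_score(notes, expression_count,
                       pitch_tol_cents=50.0, vol_tol_db=6.0, bonus_coeff=0.5):
        """notes: list of dicts with keys 'pitch_dev_cents' and 'vol_dev_db'
        (deviation from the model pitch/volume over the note's sounding time).
        expression_count: number of expressive-singing appearances detected in the take."""
        passed = sum(1 for n in notes
                     if abs(n["pitch_dev_cents"]) <= pitch_tol_cents
                     and abs(n["vol_dev_db"]) <= vol_tol_db)
        sr_base = 100.0 * passed / len(notes) if notes else 0.0
        sr_add = bonus_coeff * expression_count
        sr_nor = sr_base + sr_add
        return sr_base, sr_add, sr_nor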
b4. Expressive-singing evaluation process (S140)
In this process, the CPU 17 takes the time from the output of the event EV(i)ON until the output of the next event EV(i)OFF as the sounding time TNT(i) of the sound corresponding to the i-th note NT(i). When a characteristic waveform of expressive singing appears in the pitch waveform indicated by the output signal SP of the vocal adapter 16 during the sounding time TNT(i), the CPU 17 obtains the appearance time of the expressive singing within the sounding time TNT(i) and the type of expressive singing that appeared. The CPU 17 generates expressive-singing appearance data indicating the type and appearance time identified in this way.

The CPU 17 then selects, from the series of evaluation scores VSR(t) indicated by the expressive-singing reference data DD, the evaluation score VSR(t) corresponding to the expressive singing indicated by the generated expressive-singing appearance data and its appearance time. The CPU 17 performs this selection of evaluation scores VSR(t) from the start to the end of the user's singing, and takes the average of the evaluation scores VSR(t) at the end of the singing as the expression score SREX.
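Put as code, S140 is a map from detected expressive-singing occurrences to reference scores followed by an average. A minimal sketch, assuming the occurrences are already available as (technique, note_index, offset_ms from tBS) tuples and that vsr() is a lookup into the expressive-singing reference data DD such as the one sketched earlier:

    def expression_score(occurrences, vsr):
        """occurrences: iterable of (technique, note_index, offset_ms) tuples detected in the take.
        vsr: callable (technique, note_index, offset_ms) -> evaluation score VSR(t)."""
        scores = [vsr(tech, note, offset) for tech, note, offset in occurrences]
        # SR_EX is the average of the selected evaluation scores; no occurrences -> 0
        return sum(scores) / len(scores) if scores else 0.0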
When the user's singing of the song ends, the CPU 17 performs the evaluation result presentation process (S150). In the evaluation result presentation process, the CPU 17 selects the higher of the standard score SRNOR obtained by the standard singing evaluation process and the expression score SREX obtained by the expressive-singing evaluation process. When the standard score SRNOR is selected, the CPU 17 causes the display unit 14 to display this score SRNOR together with a comment message corresponding to the score SRNOR, for example, "A cool, precise performance." When the expression score SREX is selected, the CPU 17 causes the display unit 14 to display this score SREX together with a comment message corresponding to the expression score SREX, for example, "Full of warmth and feeling."
Next, the CPU 17 performs the sample transmission process (S160). In the sample transmission process, the CPU 17 takes the signals SP and SL output by the vocal adapter 16 from the start to the end of the singing of the song as the singing sample data DS of that song, and transmits to the server apparatus 30 a message MS1 containing this singing sample data DS and the basic score SRBASE (singing evaluation data) obtained in step S130.
When the CPU 37 of the server apparatus 30 acquires the message MS1 from the karaoke apparatus 10-m (S200: Yes), it extracts the singing sample data DS and the basic score SRBASE from the message MS1 and compares the basic score SRBASE with a reference score SRTH (for example, 80 points) that separates advanced singers from others (S220). When the basic score SRBASE is higher than the reference score SRTH (S220: Yes), the CPU 37 accumulates the singing sample data DS extracted from the message MS1 in the singing sample database DBS (S230).
Subsequently, the CPU 37 performs the rewriting process (S240). In the rewriting process, the CPU 37 performs the following five processes. In the first process, the CPU 37 searches the pitch waveforms indicated by the singing sample data DS accumulated in the singing sample database DBS for the characteristic waveform of tame, and generates expressive-singing appearance data indicating the search results (data indicating each time t, on a time axis whose reference point tBS is the sounding start time of the note NT(i), at which tame appeared). Based on the expressive-singing appearance data generated for tame, the CPU 37 then generates statistical data indicating the relationship between each time t on the time axis whose reference point tBS is the sounding start time of the note NT(i) and the number of appearances Num of the expressive singing "tame" at each time t, and rewrites the evaluation score VSR(t) corresponding to each time t in the expressive-singing reference data DDa1 based on the contents of this statistical data.
FIG. 8 is a diagram showing an example of the statistical data for tame. In the statistical data of this example, the appearance count Num of the expressive singing is distributed between a time t1a1, which precedes the reference point tBS by a time T1a1, and a time t4a1, which follows the reference point tBS by a time T4a1. The largest peak of the appearance count Num appears at a time t2a1 immediately after the reference point tBS, and the second peak appears at a time t3a1 later than the time t2a1. Accordingly, in the expressive-singing reference data DDa1 rewritten according to the statistical data of this example, the evaluation score VSR(t2a1) at time t2a1 is the highest and the evaluation score VSR(t3a1) at time t3a1 is the second highest.
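The embodiment states only that the evaluation scores VSR(t) are rewritten "based on the contents of" the appearance-count statistics, without fixing the mapping. One plausible reading, sketched below, is to bin the observed offsets from tBS into a histogram and scale the counts so that the most frequent offset receives the maximum score; the 10 ms bin width and the 0-100 score scale are assumptions of the sketch, not details taken from the patent.

    from collections import Counter

    def rewrite_vsr(offsets_ms, bin_ms=10, max_score=100):
        """offsets_ms: appearance times of one expressive-singing type, in ms relative to t_BS,
        collected from the accumulated singing sample data DS.
        Returns a sorted list of (bin_start_ms, score) pairs usable as reference data DD."""
        counts = Counter((int(o) // bin_ms) * bin_ms for o in offsets_ms)
        if not counts:
            return []
        peak = max(counts.values())
        # scale counts so the most frequent timing gets max_score
        return sorted((t, round(max_score * n / peak)) for t, n in counts.items())

    # e.g. offsets clustered just after t_BS, like the tame example of FIG. 8:
    # rewrite_vsr([12, 15, 18, 22, 25, 60, 65, -40]) -> highest score at the +10 ms bin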
In the second to fifth processes, the CPU 37 proceeds in the same way for the remaining types of expressive singing. That is, it searches the pitch waveforms indicated by the singing sample data DS accumulated in the singing sample database DBS for the characteristic waveforms of vibrato, kobushi, shakuri, and fall, respectively; generates expressive-singing appearance data indicating each search result (data indicating each time t, on a time axis whose reference point tBS is the sounding start time of the note NT(i), at which that expressive singing appeared); generates from that appearance data statistical data indicating the relationship between each time t relative to the reference point tBS and the number of appearances Num of that expressive singing at each time t; and rewrites the evaluation scores VSR(t) corresponding to each time t in the expressive-singing reference data DDa2 (vibrato), DDa3 (kobushi), DDa4 (shakuri), and DDa5 (fall) based on the contents of the respective statistical data.

FIG. 9 is a diagram showing an example of the statistical data for vibrato. In this example, the appearance count Num is distributed between the reference point tBS and a time t2a2 that follows tBS by a time T2a2, and its largest peak appears at a time t1a2 that follows tBS by a time T1a2. Accordingly, in the expressive-singing reference data DDa2 rewritten according to this statistical data, the evaluation score VSR(t1a2) at time t1a2 is the highest.

FIG. 10 is a diagram showing an example of the statistical data for kobushi. In this example, the appearance count Num is distributed between the reference point tBS and a time t2a3 that follows tBS by a time T2a3, and its largest peak appears at a time t1a3 that follows tBS by a time T1a3. Accordingly, in the expressive-singing reference data DDa3 rewritten according to this statistical data, the evaluation score VSR(t1a3) at time t1a3 is the highest.

FIG. 11 is a diagram showing an example of the statistical data for shakuri. In this example, the appearance count Num is distributed between the reference point tBS and a time t2a4 that follows tBS by a time T2a4; its largest peak appears at the reference point tBS itself, and its second peak appears at a time t1a4 that lags tBS by a time T1a4. Accordingly, in the expressive-singing reference data DDa4 rewritten according to this statistical data, the evaluation score VSR(tBS) at time tBS is the highest and the evaluation score VSR(t1a4) at time t1a4 is the second highest.

FIG. 12 is a diagram showing an example of the statistical data for fall. In this example, the appearance count Num is distributed between a time t1a5 that follows tBS by a time T1a5 and a time t2a5 that follows tBS by a time T2a5, and its largest peak appears at time t2a5. Accordingly, in the expressive-singing reference data DDa5 rewritten according to this statistical data, the evaluation score VSR(t2a5) at time t2a5 is the highest.
In FIG. 7, each time a predetermined inquiry time arrives (S110: Yes), the CPU 17 of the karaoke apparatus 10-m performs the inquiry process (S170). In the inquiry process, the CPU 17 transmits to the server apparatus 30 a message MS2 requesting transmission of the latest data (S170). When the CPU 37 of the server apparatus 30 receives the message MS2 from the karaoke apparatus 10-m (S210: Yes), it transmits to the karaoke apparatus 10-m that sent the message MS2 the expressive-singing reference data DD whose contents were rewritten between the reception time of the previous message MS2 and the reception time of the current message MS2 (S250). When the CPU 17 of the karaoke apparatus 10-m receives the expressive-singing reference data DD from the server apparatus 30, it overwrites the reference database DBRK with this expressive-singing reference data DD to update its contents (S180).
The details of the configuration of the present embodiment are as described above. The present embodiment provides the following effects.

First, in the expressive-singing evaluation process of the present embodiment, each time a characteristic waveform of expressive singing appears in the waveform of the output signal of the vocal adapter 16, the appearance time of the characteristic waveform is obtained on a time axis whose reference point is the sounding start time of the note NT(i) targeted by the expressive singing, the evaluation score VSR(t) corresponding to this appearance time is selected from among the evaluation scores VSR(t) in the expressive-singing reference data DD, and the skill of the singing is evaluated based on the selected evaluation score VSR(t). Thus, according to the present embodiment, even if the user performs expressive singing, a good evaluation is not obtained unless its timing is appropriate. The present embodiment can therefore present an evaluation result that is closer to what human sensibility would suggest.
Second, in the present embodiment, for each of the data DD accumulated in the singing sample database DBS, a characteristic waveform of facial expression singing is searched for within the waveform indicated by that data DD; from the search result, statistical data is generated that indicates the relationship between each time on a time axis whose reference point is the sound generation start time of the note NT(i) targeted by the facial expression singing and the number of appearances of the facial expression singing at those times; and the evaluation points VSR(t) corresponding to the respective times in the singing reference data DD are rewritten on the basis of the contents of the statistical data. Therefore, according to the present embodiment, changes in the singing tendencies of advanced singers who have sung the song many times can be reflected in the evaluation results.
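The statistics-gathering step amounts to accumulating a histogram of appearance offsets across all stored samples. The sketch below is a minimal illustration, assuming each sample exposes its detected expression events as (note onset, appearance time) pairs; the attribute name and the bin width are assumptions.

```python
from collections import Counter

def build_statistics(samples, bin_width=0.05):
    """For every stored sample, take each detected facial-expression event and
    count how often it appears at each time offset (quantised to bin_width)
    from the onset of the note it belongs to."""
    counts = Counter()
    for sample in samples:
        for note_onset, appearance_time in sample.expression_events:
            offset = appearance_time - note_onset
            counts[round(offset / bin_width) * bin_width] += 1
    return counts  # this histogram can then drive rewrite_reference_points() above
```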
An embodiment of the present invention has been described above, but the present invention can have other embodiments, for example as follows.
(1) In the above embodiment, the CPU 17 detects five kinds of facial expression singing, namely tame (holding back), vibrato, kobushi, shakuri, and fall, from the output signal SP of the vocal adapter 16. However, facial expression singing other than these five kinds may be detected; for example, singing with inflection may be detected.
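As an illustration of how a further expression type could be added, the sketch below flags "inflection" when the volume envelope of a phrase varies by more than a threshold. Both the feature definition and the threshold are assumptions made for this example, not part of the embodiment.

```python
def has_inflection(volume_envelope, min_range_db=6.0):
    """Very coarse inflection detector: a phrase counts as sung with
    inflection if its volume envelope spans at least min_range_db decibels."""
    return (max(volume_envelope) - min(volume_envelope)) >= min_range_db

assert has_inflection([-20.0, -14.0, -10.0, -16.0])       # about 10 dB of movement
assert not has_inflection([-15.0, -14.5, -15.2, -14.8])   # nearly flat
```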
(2) In the above embodiment, the CPU 17 performs the standard singing evaluation process using both the output signals SP and SL of the vocal adapter 16, and performs the facial expression singing evaluation process using only the signal SP, which indicates the pitch, of the two output signals SP and SL. However, the CPU 17 may perform the standard singing evaluation process using only one of the signals SP and SL. The CPU 17 may also perform the facial expression singing evaluation process using both the signals SP and SL.
(3) In the facial expression singing evaluation process of the above embodiment, the skill of singing is evaluated on the basis of the appearance time of the characteristic waveform of the facial expression singing. However, the evaluation may also take into account elements other than the appearance time of the characteristic waveform (for example, the length and depth of each of tame, vibrato, kobushi, shakuri, and fall).
(4) The facial expression singing evaluation process of the above embodiment adopts a configuration that detects facial expression singing appearing in the singing sound corresponding to each individual note included in the song, but a configuration may also be adopted that detects facial expression singing appearing in the singing sound corresponding to a series of notes (a note group) included in the song. For example, facial expression singing such as a crescendo or decrescendo is performed over a series of notes, so it is desirable that such facial expression singing be detected and evaluated in units of note groups. Accordingly, it is also desirable that the facial expression singing reference data DD for such facial expression singing be configured in units of note groups.
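A crescendo detector over a note group could, for example, check that the per-note volume rises monotonically across the group. The sketch below uses that deliberately simplified criterion purely for illustration; the step threshold is an assumption.

```python
def is_crescendo(note_volumes, min_step_db=1.0):
    """Treat a note group as a crescendo if each note is louder than the
    previous one by at least min_step_db (a deliberately simple criterion)."""
    return all(b - a >= min_step_db for a, b in zip(note_volumes, note_volumes[1:]))

assert is_crescendo([-18.0, -15.0, -12.0, -8.0])
assert not is_crescendo([-12.0, -13.0, -11.0])
```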
(5) In the above embodiment, a configuration is adopted in which the karaoke apparatus 10 transmits to the server apparatus 30 the singing sample data DS (pitch-volume data) containing the signals SP and SL output by the vocal adapter 16 from the start to the end of the singing of the song, and the server apparatus 30 detects each facial expression singing and identifies the timing of its appearance from the singing sample data DS. Instead, a configuration may be adopted in which the karaoke apparatus 10 transmits to the server apparatus 30 the sound signal SM representing the sound picked up by the microphone 13 (sound waveform data representing the singing sound), and the server apparatus 30 performs the process of generating the signals SP and SL from the sound signal SM (the process performed by the vocal adapter 16 in the above embodiment). Alternatively, a configuration may be adopted in which the karaoke apparatus 10 transmits to the server apparatus 30 data (facial expression singing appearance data) indicating the kinds of facial expression singing identified in the facial expression singing evaluation process (S140) performed in accordance with the singing evaluation program VPG and the timings of their appearance, and the server apparatus 30 updates the facial expression singing reference data DD on the basis of the facial expression singing appearance data transmitted from the karaoke apparatus 10 without performing the facial expression singing detection process itself.
(6) In the above embodiment, the server apparatus 30 generates the statistical data and rewrites the facial expression singing reference data DD on the basis of it. However, each karaoke apparatus 10-m may instead store in its hard disk 20 the sound signals SM representing singing sounds that the apparatus itself generated in the past or acquired from other karaoke apparatuses 10-m directly or via the server apparatus 30, the signals SP and SL generated from those sound signals SM, or data indicating the kinds of facial expression singing identified using those signals and the timings of their appearance (facial expression singing appearance data); the CPU 17 may then read them out and use them to perform processing similar to that performed by the server apparatus 30 in S240, that is, the generation of statistical data and the rewriting of the facial expression singing reference data DD based on it.
(7) The singing evaluation method and the manner of presenting the evaluation result to the singer in the above embodiment can be modified in various ways. For example, in the above embodiment, the standard score SRNOR is calculated by adding the addition points SRADD, which are calculated on the basis of the number of appearances of facial expression singing in the standard singing evaluation process (S130), to the basic score SRBASE; however, a configuration may be adopted in which the appearance of facial expression singing is not taken into account in the standard singing evaluation process and only the basic score SRBASE is calculated. Also, in the above embodiment, the higher of the standard score SRNOR obtained by the standard singing evaluation process and the facial expression score SREX obtained by the facial expression singing evaluation process is displayed to the singer, but the evaluation result may be presented to the singer in other manners, such as displaying both of them or displaying their total.
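The scoring alternatives in (7) reduce to how the individual scores are combined and which of them is shown. The sketch below merely labels those options; the function names and the "max/both/total" switch are illustrative, not part of the embodiment.

```python
def standard_score(sr_base, sr_add, count_expression=True):
    """SRNOR = SRBASE + SRADD, or SRBASE alone when expression is ignored."""
    return sr_base + sr_add if count_expression else sr_base

def presented_result(sr_nor, sr_ex, mode="max"):
    """Choose what to show the singer: the higher score, both scores, or the total."""
    if mode == "max":
        return max(sr_nor, sr_ex)
    if mode == "both":
        return (sr_nor, sr_ex)
    return sr_nor + sr_ex  # "total"
```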
(8) In the above embodiment, when the facial expression singing reference data DD is updated, a singer whose basic score SRBASE is higher than the reference score SRTH is regarded as an advanced singer, and only the singing sample data DS of advanced singers is used to update the facial expression singing reference data DD. The method of selecting the singing sample data DS used for updating the facial expression singing reference data DD is not limited to this. For example, instead of the basic score SRBASE, the standard score SRNOR obtained by adding the addition points SRADD to the basic score SRBASE may be used as the criterion for estimating advanced singers. Also, in order to exclude singers whose basic score SRBASE is high because they perform no facial expression singing at all, an upper threshold may be provided in addition to the lower threshold (the reference score SRTH), and the singing sample data DS of singers whose basic score SRBASE (or other score) exceeds the upper threshold may be excluded from the update of the facial expression singing reference data DD. Furthermore, instead of dividing singers into the two groups of advanced singers and others as described above, for example, the singing sample data DS of singers with a high basic score SRBASE may be given a larger weight when used for updating the facial expression singing reference data DD.
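The selection and weighting strategies in (8) can be captured in a few lines. In the sketch below the two thresholds and the particular weighting formula are illustrative assumptions only.

```python
def sample_weight(sr_base, sr_th_low, sr_th_high=None):
    """Weight a singer's sample data for the reference-data update:
    0 below the lower threshold (not advanced), 0 above an optional upper
    threshold (high score but no expression singing), otherwise a weight
    that grows with the basic score."""
    if sr_base <= sr_th_low:
        return 0.0
    if sr_th_high is not None and sr_base > sr_th_high:
        return 0.0
    return (sr_base - sr_th_low) / 100.0
```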
(9) In the above embodiment, a performance evaluation device provided in a karaoke apparatus for singing and evaluating a singing performance was shown as an example of a performance evaluation device that evaluates a music performance, but the performance evaluation device according to the present invention is not limited to the evaluation of singing performances and can also be applied to the evaluation of music performances using various musical instruments. In other words, the word "singing" used in the above embodiment can be replaced with the more general word "performance". In a performance evaluation device that evaluates instrumental performances, the evaluation relates to facial expression performance appropriate to each instrument, such as choking (string bending) on a guitar. When the music is not a song but a piece for an instrument, the karaoke apparatus for instrumental performance is configured so that the music data MD includes, instead of the lyrics track TRLY, a score track, which is data in which, for example, data representing the musical score and delta times indicating the display times of the respective sections of the score (for example, blocks of two or four measures) are described in chronological order, and the sequencer 21 and the display unit 14 output to the display, in accordance with the score track, an image signal representing the score corresponding to the accompaniment position as the music progresses. In the karaoke apparatus for singing and the karaoke apparatus for instrumental performance, if the display of lyrics or score is unnecessary, the image signal output processing by the sequencer 21 and the display unit 14 need not be performed.
(10) As understood from the above examples, the performance evaluation device according to a preferred aspect of the present invention is, as illustrated in FIG. 13, comprehensively expressed as a device comprising: facial expression performance reference data acquisition means 101 for acquiring facial expression performance reference data indicating a facial expression performance to be performed during the performance of a piece of music and the timing at which that facial expression performance is to be performed in the piece, the timing being indicated with reference to the sound generation start time of a note or note group included in the piece; pitch-volume data generation means 102 for generating, from the performance sound of the piece by a performer, pitch-volume data indicating the pitch and volume of that performance sound; and performance evaluation means 103 for raising the evaluation of the performer's performance of the piece when at least one of the pitch and volume characteristics indicated by the pitch-volume data generated by the pitch-volume data generation means 102 exhibits, within a predetermined time range in the piece indicated by the facial expression performance reference data, the characteristics of the facial expression performance that the facial expression performance reference data indicates should be performed. The presence or absence of other elements and the specific form of those elements are arbitrary.
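Read as software, the three means in FIG. 13 map naturally onto three components. The skeleton below is only a structural sketch of that decomposition; the interfaces (`get`, `extract`, `matches`, `evaluation_point`) are invented for illustration and are not the patent's definitions.

```python
class PerformanceEvaluator:
    """Structural sketch of the device in FIG. 13: reference data acquisition
    (means 101), pitch/volume extraction (means 102), and evaluation (means 103)."""

    def __init__(self, reference_source, analyzer):
        self.reference_source = reference_source  # plays the role of means 101
        self.analyzer = analyzer                  # plays the role of means 102

    def evaluate(self, song_id, performance_audio):
        reference = self.reference_source.get(song_id)             # acquire reference data
        pitch, volume = self.analyzer.extract(performance_audio)   # pitch-volume data
        score = 0.0
        for expression in reference.expressions:                   # means 103
            if expression.matches(pitch, volume):                  # within its time range
                score += expression.evaluation_point
        return score
```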
(11) In the above embodiment, an example was shown in which the performance evaluation device according to the present invention is provided in a karaoke apparatus as a so-called dedicated machine, but the performance evaluation device according to the present invention is not limited to a dedicated machine. For example, a configuration may be adopted in which the performance evaluation device according to the present invention is realized by causing various devices such as a personal computer, a portable information terminal (for example, a mobile phone or a smartphone), or a game device to perform processing in accordance with a program. This program can be distributed by being stored on a recording medium such as a CD-ROM, or via a telecommunication line such as the Internet.
This application is based on Japanese Patent Application No. 2012-094853 filed on April 18, 2012, the contents of which are incorporated herein by reference.
According to the present invention, when a facial expression performance is performed by a performer, it is possible to perform an evaluation with little deviation from human sensibility.
DESCRIPTION OF REFERENCE SYMBOLS: 1: singing evaluation system; 10: karaoke apparatus; 11: sound source; 12: speaker; 13: microphone; 14: display unit; 15: communication interface; 16: vocal adapter; 17: CPU; 18: RAM; 19: ROM; 20: hard disk; 21: sequencer; 30: server apparatus; 35: communication interface; 37: CPU; 38: RAM; 39: ROM; 40: hard disk; 90: network

Claims (12)

  1.  A performance evaluation device comprising:
     facial expression performance reference data acquisition means for acquiring facial expression performance reference data indicating a facial expression performance to be performed during the performance of a piece of music and the timing at which the facial expression performance is to be performed in the piece, the timing being indicated with reference to the sound generation start time of a note or note group included in the piece;
     pitch-volume data generation means for generating, from the performance sound of the piece by a performer, pitch-volume data indicating the pitch and volume of the performance sound; and
     performance evaluation means for raising the evaluation of the performer's performance of the piece when at least one of the pitch and volume characteristics indicated by the pitch-volume data generated by the pitch-volume data generation means exhibits, within a predetermined time range in the piece indicated by the facial expression performance reference data, the characteristics of the facial expression performance that the facial expression performance reference data indicates should be performed.
  2.  The performance evaluation device according to claim 1, further comprising:
     pitch-volume data acquisition means for acquiring, for each of the performance sounds of the piece by an arbitrary number of arbitrary performers, pitch-volume data indicating the pitch and volume of that performance sound;
     facial expression performance appearance data generation means for generating, when at least one of the pitch and volume characteristics indicated by the pitch-volume data acquired by the pitch-volume data acquisition means exhibits, at an arbitrary timing in the piece, the characteristics of one of one or more predetermined facial expression performances, facial expression performance appearance data indicating a pair of that facial expression performance and that timing referenced to the sound generation start time of a note or note group included in the piece; and
     facial expression performance reference data generation means for identifying, on the basis of an arbitrary number of facial expression performance appearance data generated by the facial expression performance appearance data generation means, for each note or note group included in the piece, which facial expression performance appears with which frequency at which timing referenced to the sound generation start time of that note or note group, and generating facial expression performance reference data in accordance with the identified information.
  3.  The performance evaluation device according to claim 2, further comprising:
     facial expression performance reference data storage means for storing facial expression performance reference data,
     wherein the facial expression performance reference data stored in the facial expression performance reference data storage means is rewritten on the basis of the facial expression performance reference data generated by the facial expression performance reference data generation means.
  4.  The performance evaluation device according to any one of claims 1 to 3, further comprising:
     model performance reference data acquisition means for acquiring model performance reference data indicating a model pitch of the piece,
     wherein the performance evaluation means evaluates the performer's performance of the piece on the basis of a comparison between the pitch indicated by the pitch-volume data generated by the pitch-volume data generation means and the pitch indicated by the model performance reference data.
  5.  The performance evaluation device according to claim 2, further comprising:
     model performance reference data acquisition means for acquiring model performance reference data indicating a model pitch of the piece,
     wherein the performance evaluation means evaluates the performer's performance of the piece on the basis of a comparison between the pitch indicated by the pitch-volume data generated by the pitch-volume data generation means and the pitch indicated by the model performance reference data,
     the pitch-volume data acquired by the pitch-volume data acquisition means is accompanied by performance evaluation data indicating the result of an evaluation performed by the performance evaluation means using the model performance reference data, or the result of an evaluation performed, using data similar to the model performance reference data, by another device having means similar to the performance evaluation means, and
     the facial expression performance reference data generation means generates the facial expression performance reference data on the basis of facial expression performance appearance data generated by the facial expression performance appearance data generation means using, among the pitch-volume data acquired by the pitch-volume data acquisition means, pitch-volume data accompanied by performance evaluation data satisfying a predetermined condition.
  6.  A karaoke apparatus comprising:
     the performance evaluation device according to any one of claims 1 to 5;
     accompaniment data acquisition means for acquiring accompaniment data instructing the accompaniment of a piece of music; and
     sound signal output means for outputting a sound signal representing the musical sound of the accompaniment in accordance with the instructions of the accompaniment data,
     wherein the pitch-volume data generation means generates pitch-volume data indicating the pitch and volume of the performance sound of the piece performed by the performer along with the accompaniment emitted from a speaker in accordance with the sound signal output from the sound signal output means.
  7.  The karaoke apparatus according to claim 6, wherein the piece of music is a song, the apparatus further comprising:
     lyrics data acquisition means for acquiring lyrics data indicating the lyrics of the song; and
     image signal output means for outputting an image signal representing the lyrics, indicated by the lyrics data, that are to be sung along with the accompaniment represented by the sound signal currently being output by the sound signal output means.
  8.  The karaoke apparatus according to claim 6, wherein the piece of music is a piece performed with a musical instrument, the apparatus further comprising:
     score data acquisition means for acquiring score data indicating the musical score of the piece; and
     image signal output means for outputting an image signal representing the score, indicated by the score data, that indicates the performance to be played along with the accompaniment represented by the sound signal currently being output by the sound signal output means.
  9.  A server device comprising:
     facial expression performance appearance data acquisition means for acquiring, for each of the performance sounds of a piece of music by an arbitrary number of arbitrary performers, facial expression performance appearance data indicating that one facial expression performance appeared at one timing referenced to the sound generation start time of a note or note group included in the piece;
     facial expression performance reference data generation means for identifying, on the basis of an arbitrary number of facial expression performance appearance data acquired by the facial expression performance appearance data acquisition means, for each note or note group included in the piece, which facial expression performance appears with which frequency at which timing referenced to the sound generation start time of that note or note group, and generating, in accordance with the identified information, facial expression performance reference data indicating a facial expression performance to be performed during the performance of the piece and the timing at which the facial expression performance is to be performed in the piece, the timing being indicated with reference to the sound generation start time of a note or note group included in the piece; and
     transmission means for transmitting the facial expression performance reference data generated by the facial expression performance reference data generation means to a performance evaluation device.
  10.  A singing evaluation system comprising:
     facial expression performance reference data acquisition means for acquiring first facial expression performance reference data indicating a facial expression performance to be performed during the performance of a piece of music and the timing at which the facial expression performance is to be performed in the piece, the timing being indicated with reference to the sound generation start time of a note or note group included in the piece;
     pitch-volume data generation means for generating, from the performance sound of the piece by a performer, pitch-volume data indicating the pitch and volume of the performance sound;
     performance evaluation means for raising the evaluation of the performer's performance of the piece when at least one of the pitch and volume characteristics indicated by the pitch-volume data generated by the pitch-volume data generation means exhibits, within a predetermined time range in the piece indicated by the first facial expression performance reference data, the characteristics of the facial expression performance that the first facial expression performance reference data indicates should be performed;
     facial expression performance appearance data acquisition means for acquiring, for each of the performance sounds of a piece of music by an arbitrary number of arbitrary performers, facial expression performance appearance data indicating that one facial expression performance appeared at one timing referenced to the sound generation start time of a note or note group included in the piece performed by the arbitrary performers; and
     facial expression performance reference data generation means for identifying, on the basis of an arbitrary number of facial expression performance appearance data acquired by the facial expression performance appearance data acquisition means, for each note or note group included in the piece performed by the arbitrary performers, which facial expression performance appears with which frequency at which timing referenced to the sound generation start time of that note or note group, and generating, in accordance with the identified information, second facial expression performance reference data indicating a facial expression performance to be performed during the performance of the piece by the arbitrary performers and the timing at which the facial expression performance is to be performed in that piece, the timing being indicated with reference to the sound generation start time of a note or note group included in that piece.
  11.  A performance evaluation method comprising:
     acquiring facial expression performance reference data indicating a facial expression performance to be performed during the performance of a piece of music and the timing at which the facial expression performance is to be performed in the piece, the timing being indicated with reference to the sound generation start time of a note or note group included in the piece;
     generating, from the performance sound of the piece by a performer, pitch-volume data indicating the pitch and volume of the performance sound; and
     raising the evaluation of the performer's performance of the piece when at least one of the pitch and volume characteristics indicated by the pitch-volume data exhibits, within a predetermined time range in the piece indicated by the facial expression performance reference data, the characteristics of the facial expression performance that the facial expression performance reference data indicates should be performed.
  12.  A program executable by a computer, the program causing the computer to execute:
     a facial expression performance reference data acquisition process of acquiring facial expression performance reference data indicating a facial expression performance to be performed during the performance of a piece of music and the timing at which the facial expression performance is to be performed in the piece, the timing being indicated with reference to the sound generation start time of a note or note group included in the piece;
     a pitch-volume data generation process of generating, from the performance sound of the piece by a performer, pitch-volume data indicating the pitch and volume of the performance sound; and
     a performance evaluation process of raising the evaluation of the performer's performance of the piece when at least one of the pitch and volume characteristics indicated by the pitch-volume data generated by the pitch-volume data generation process exhibits, within a predetermined time range in the piece indicated by the facial expression performance reference data, the characteristics of the facial expression performance that the facial expression performance reference data indicates should be performed.
PCT/JP2013/061488 2012-04-18 2013-04-18 Performance evaluation device, karaoke device, and server device WO2013157602A1 (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
KR1020147025532A KR101666535B1 (en) 2012-04-18 2013-04-18 Performance evaluation device, karaoke device, and server device
CN201380015347.7A CN104170006B (en) 2012-04-18 2013-04-18 Performance evaluation device, karaoke device, and server device

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
JP2012094853A JP5958041B2 (en) 2012-04-18 2012-04-18 Expression performance reference data generation device, performance evaluation device, karaoke device and device
JP2012-094853 2012-04-18

Publications (1)

Publication Number Publication Date
WO2013157602A1 true WO2013157602A1 (en) 2013-10-24

Family

ID=49383554

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/JP2013/061488 WO2013157602A1 (en) 2012-04-18 2013-04-18 Performance evaluation device, karaoke device, and server device

Country Status (5)

Country Link
JP (1) JP5958041B2 (en)
KR (1) KR101666535B1 (en)
CN (1) CN104170006B (en)
TW (1) TWI497484B (en)
WO (1) WO2013157602A1 (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2020122949A (en) * 2019-01-31 2020-08-13 株式会社第一興商 Karaoke device
JP2020122948A (en) * 2019-01-31 2020-08-13 株式会社第一興商 Karaoke device
JP2020166162A (en) * 2019-03-29 2020-10-08 株式会社第一興商 Karaoke device

Families Citing this family (21)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR101459324B1 (en) * 2013-08-28 2014-11-07 이성호 Evaluation method of sound source and Apparatus for evaluating sound using it
JP6428066B2 (en) * 2014-09-05 2018-11-28 オムロン株式会社 Scoring device and scoring method
JP6352164B2 (en) * 2014-11-28 2018-07-04 株式会社第一興商 Karaoke scoring system considering listener evaluation
CN104392731A (en) * 2014-11-30 2015-03-04 陆俊 Singing practicing method and system
CN104485090B (en) * 2014-12-12 2020-01-17 上海斐讯数据通信技术有限公司 Music score generation method and device and mobile terminal
JP5715296B1 (en) * 2014-12-16 2015-05-07 行秘 大田 Akatsuki communication karaoke server and a karaoke communication karaoke system
US10380657B2 (en) 2015-03-04 2019-08-13 International Business Machines Corporation Rapid cognitive mobile application review
JP6113231B2 (en) * 2015-07-15 2017-04-12 株式会社バンダイ Singing ability evaluation device and storage device
JP6701864B2 (en) * 2016-03-25 2020-05-27 ヤマハ株式会社 Sound evaluation device and sound evaluation method
WO2018016582A1 (en) * 2016-07-22 2018-01-25 ヤマハ株式会社 Musical performance analysis method, automatic music performance method, and automatic musical performance system
JP6776788B2 (en) * 2016-10-11 2020-10-28 ヤマハ株式会社 Performance control method, performance control device and program
CN108665747A (en) * 2017-04-01 2018-10-16 上海伍韵钢琴有限公司 A kind of online piano training mate system and application method
JP6867900B2 (en) * 2017-07-03 2021-05-12 株式会社第一興商 Karaoke equipment
JP6708180B2 (en) * 2017-07-25 2020-06-10 ヤマハ株式会社 Performance analysis method, performance analysis device and program
CN108694384A (en) * 2018-05-14 2018-10-23 芜湖岭上信息科技有限公司 A kind of viewer satisfaction investigation apparatus and method based on image and sound
CN109903778B (en) * 2019-01-08 2020-09-25 北京雷石天地电子技术有限公司 Method and system for scoring singing in real time
CN109887524A (en) * 2019-01-17 2019-06-14 深圳壹账通智能科技有限公司 A kind of singing marking method, device, computer equipment and storage medium
CN110083772A (en) * 2019-04-29 2019-08-02 北京小唱科技有限公司 Singer's recommended method and device based on singing skills
CN110120216B (en) * 2019-04-29 2021-11-12 北京小唱科技有限公司 Audio data processing method and device for singing evaluation
WO2021176925A1 (en) * 2020-03-04 2021-09-10 ヤマハ株式会社 Method, system and program for inferring audience evaluation of performance data
CN112037609B (en) * 2020-08-26 2022-10-11 怀化学院 Music teaching device based on thing networking

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2005241848A (en) * 2004-02-25 2005-09-08 Daiichikosho Co Ltd Model vocal offer system of contribution work editing type in online karaoke system
JP2007271977A (en) * 2006-03-31 2007-10-18 Yamaha Corp Evaluation standard decision device, control method, and program
JP2007334364A (en) * 2007-08-06 2007-12-27 Yamaha Corp Karaoke machine
JP2008139426A (en) * 2006-11-30 2008-06-19 Yamaha Corp Data structure of data for evaluation, karaoke machine, and recording medium

Family Cites Families (18)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2915773B2 (en) * 1993-12-25 1999-07-05 日本コロムビア株式会社 Karaoke equipment
JP3299890B2 (en) * 1996-08-06 2002-07-08 ヤマハ株式会社 Karaoke scoring device
JP3293745B2 (en) * 1996-08-30 2002-06-17 ヤマハ株式会社 Karaoke equipment
JP3690224B2 (en) * 2000-01-13 2005-08-31 ヤマハ株式会社 Mobile phone and mobile phone system
CN1380642A (en) * 2001-04-11 2002-11-20 华邦电子股份有限公司 Single-following learning scoring device and method
JP2003058155A (en) * 2001-08-13 2003-02-28 Casio Comput Co Ltd Musical performance practicing device and program for musical performance practicing process
JP4222915B2 (en) * 2003-09-30 2009-02-12 ヤマハ株式会社 Singing voice evaluation device, karaoke scoring device and programs thereof
JP4163584B2 (en) 2003-09-30 2008-10-08 ヤマハ株式会社 Karaoke equipment
JP4204941B2 (en) 2003-09-30 2009-01-07 ヤマハ株式会社 Karaoke equipment
JP4209751B2 (en) * 2003-09-30 2009-01-14 ヤマハ株式会社 Karaoke equipment
TWI232430B (en) * 2004-03-19 2005-05-11 Sunplus Technology Co Ltd Automatic grading method and device for audio source
JP2007256617A (en) * 2006-03-23 2007-10-04 Yamaha Corp Musical piece practice device and musical piece practice system
JP2008015388A (en) * 2006-07-10 2008-01-24 Dds:Kk Singing skill evaluation method and karaoke machine
JP2008026622A (en) * 2006-07-21 2008-02-07 Yamaha Corp Evaluation apparatus
JP4865607B2 (en) 2007-03-13 2012-02-01 ヤマハ株式会社 Karaoke apparatus, singing evaluation method and program
JP4910854B2 (en) 2007-04-17 2012-04-04 ヤマハ株式会社 Fist detection device, fist detection method and program
TWI394141B (en) * 2009-03-04 2013-04-21 Wen Hsin Lin Karaoke song accompaniment automatic scoring method
JP5244738B2 (en) * 2009-08-24 2013-07-24 株式会社エクシング Singing evaluation device, singing evaluation method, and computer program

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2005241848A (en) * 2004-02-25 2005-09-08 Daiichikosho Co Ltd Model vocal offer system of contribution work editing type in online karaoke system
JP2007271977A (en) * 2006-03-31 2007-10-18 Yamaha Corp Evaluation standard decision device, control method, and program
JP2008139426A (en) * 2006-11-30 2008-06-19 Yamaha Corp Data structure of data for evaluation, karaoke machine, and recording medium
JP2007334364A (en) * 2007-08-06 2007-12-27 Yamaha Corp Karaoke machine

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2020122949A (en) * 2019-01-31 2020-08-13 株式会社第一興商 Karaoke device
JP2020122948A (en) * 2019-01-31 2020-08-13 株式会社第一興商 Karaoke device
JP7232653B2 (en) 2019-01-31 2023-03-03 株式会社第一興商 karaoke device
JP7232654B2 (en) 2019-01-31 2023-03-03 株式会社第一興商 karaoke equipment
JP2020166162A (en) * 2019-03-29 2020-10-08 株式会社第一興商 Karaoke device
JP7169243B2 (en) 2019-03-29 2022-11-10 株式会社第一興商 karaoke device

Also Published As

Publication number Publication date
JP2013222140A (en) 2013-10-28
TWI497484B (en) 2015-08-21
KR101666535B1 (en) 2016-10-14
CN104170006B (en) 2017-05-17
CN104170006A (en) 2014-11-26
KR20140124843A (en) 2014-10-27
TW201407602A (en) 2014-02-16
JP5958041B2 (en) 2016-07-27

Similar Documents

Publication Publication Date Title
JP5958041B2 (en) Expression performance reference data generation device, performance evaluation device, karaoke device and device
JP2012103603A (en) Information processing device, musical sequence extracting method and program
JP6060867B2 (en) Information processing apparatus, data generation method, and program
JP2009104097A (en) Scoring device and program
JP6288197B2 (en) Evaluation apparatus and program
JP6102076B2 (en) Evaluation device
JP5428459B2 (en) Singing evaluation device
JP6944357B2 (en) Communication karaoke system
JP2008003483A (en) Karaoke device
JP5994343B2 (en) Performance evaluation device and karaoke device
JP6459162B2 (en) Performance data and audio data synchronization apparatus, method, and program
JP5618743B2 (en) Singing voice evaluation device
JP3879524B2 (en) Waveform generation method, performance data processing method, and waveform selection device
JP6365483B2 (en) Karaoke device, karaoke system, and program
JP6074835B2 (en) Music practice support device
JP5585320B2 (en) Singing voice evaluation device
JP6011506B2 (en) Information processing apparatus, data generation method, and program
JP5416396B2 (en) Singing evaluation device and program
JP2007233078A (en) Evaluation device, control method, and program
JP2004184506A (en) Karaoke machine and program
JP2004279462A (en) Karaoke machine
JP6514868B2 (en) Karaoke apparatus and karaoke scoring system
JP2017181661A (en) Support device
JP5012269B2 (en) Performance clock generating device, data reproducing device, performance clock generating method, data reproducing method and program
JP2017067998A (en) Singing evaluation system

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 13777807

Country of ref document: EP

Kind code of ref document: A1

ENP Entry into the national phase

Ref document number: 20147025532

Country of ref document: KR

Kind code of ref document: A

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 13777807

Country of ref document: EP

Kind code of ref document: A1