WO2015111671A1

WO2015111671A1 - Singing evaluation device, singing evaluation method, and singing evaluation program

Info

Publication number: WO2015111671A1
Application number: PCT/JP2015/051731
Authority: WO
Inventors: 片寄　晴弘; 達矢的場; 隆一成山; 松本　秀一
Original assignee: ヤマハ株式会社
Priority date: 2014-01-23
Filing date: 2015-01-22
Publication date: 2015-07-30
Also published as: JP6304650B2; JP2015138177A

Abstract

A singing evaluation device (10) equipped with a consonant length measurement unit (11), a consonant length distribution creation unit (12), and an evaluation unit (13). The consonant length measurement unit (11) detects consonants in singing data and measures the length of each consonant. The consonant length distribution creation unit (12) stores, over a prescribed time period, the consonant lengths that have been input, and creates a frequency distribution for the consonant length. The evaluation unit (13) utilizes the fact that the singing is more in the groove when the variation in the consonant length is greater and the singing is less in the groove when the variation in the consonant length is less to determine whether the singing is in the groove. Specifically, the evaluation unit (13) calculates an index that is based on the variation in the consonant length, from the frequency distribution for the consonant length, and determines whether the singing is in the groove from the size of the index that is based on the variation in the consonant length.

Description

Singing evaluation device, singing evaluation method, and singing evaluation program

The present invention relates to a technique for evaluating the singing content of a singer.

Karaoke apparatuses that are currently in widespread use employ singing evaluation apparatuses that adopt various evaluation methods. In these conventional singing evaluation apparatuses, the basic evaluation item is a comparison between singing data and score data. Singing data is voice data obtained by picking up the singing of a singer. Score data is data that specifies the pitch, volume, and sounding timing of each note in the song sung by the singer. A conventional singing evaluation apparatus evaluates singing according to how closely the pitch, volume, and sounding timing of each note in the singing data match those of the corresponding note in the score data. For example, the closer the match, the higher the score the conventional apparatus gives.

Moreover, in the karaoke apparatus described in Patent Document 1, the singing is evaluated by detecting the rhythm of the singing data and determining whether it matches the rhythm contained in the music data of the song being sung.

Patent Document 1: Japanese Unexamined Patent Publication No. 2007-114492

However, conventional singing evaluation apparatuses, including the karaoke apparatus described in Patent Document 1, cannot evaluate the dynamism of singing, in particular its groove (“nori”). Although such dynamism is pleasing to both singers and listeners, it cannot be evaluated by a simple comparison with score data as described above. That is, even when the singer or a listener feels that the singing is good and full of dynamism, the evaluation may be low (for example, a low score).

An object of the present invention is to provide a technique for evaluating the dynamism of singing, in particular its groove, which cannot be evaluated by a simple comparison with score data or music data.

The singing evaluation apparatus according to the present invention includes a consonant length distribution creation unit and an evaluation unit. The consonant length distribution creating unit creates a consonant length distribution of each consonant included in the singing sound based on the singing data indicating the singing sound. The evaluation unit detects a consonant length variation that is a degree of spread of the consonant length distribution using the consonant length distribution, and evaluates the singing sound using the consonant length variation.

This configuration exploits the fact that the consonant length variation differs between singing with a sense of dynamism and singing without it, in particular between singing with good groove and singing with poor groove. Therefore, by detecting and using the consonant length variation, the quality of the singing can be determined accurately.

Moreover, the singing evaluation apparatus of the present invention includes a consonant length measuring unit that measures the consonant length of the singing data obtained from the singing sound of the singer and outputs the measured consonant length to the consonant length distribution creating unit.

In this configuration, the consonant length of the singing data can be measured from the singing sound sung by the user.

The singing evaluation apparatus according to the present invention includes a consonant length measuring unit, a consonant length distribution creating unit, and an evaluating unit. The consonant length measurement unit measures the consonant length of the singing data obtained from the singing sound of the singer. The consonant length distribution creating unit creates a consonant length distribution. The evaluation unit detects a consonant length variation that is a degree of spread of the consonant length distribution using the consonant length distribution, and evaluates the singing sound using the consonant length variation.

Moreover, the singing evaluation apparatus of the present invention includes a determination target detection unit that detects a consonant to be determined from at least one of music score data and music data of a song to be sung. The consonant length distribution creating unit creates a consonant length distribution for the consonant to be determined.

This configuration exploits the fact that, for specific sound types and rhythms, the degree of consonant length variation differs between singing with good groove and singing with poor groove. Therefore, by detecting and using the consonant length variation for the specific consonants to be determined, the singing can be judged more accurately.

Further, the determination target detection unit of the singing evaluation apparatus of the present invention includes at least one of a sound type analysis unit, a rhythm analysis unit, and a specific section extraction unit. The sound type analysis unit analyzes, from the score data, the sound types to be sung. The rhythm analysis unit analyzes, from the score data, the rhythm to be sung. The specific section extraction unit extracts a specific section of the song to be sung from the music data. The determination target detection unit determines the determination target from at least one of the sound type, the rhythm, and the specific section.

This configuration shows a more specific preferable example of the determination target detection unit. As described above, by determining the sound type, rhythm, and specific section as determination targets, more accurate determination can be performed.

Moreover, the singing evaluation apparatus of the present invention includes a vowel pronunciation timing acquisition unit and a vowel pronunciation timing distribution creation unit. The vowel pronunciation timing acquisition unit detects the vowel pronunciation timing using the singing data. The vowel pronunciation timing distribution creation unit detects the timing difference between the vowel pronunciation timing and the beat timing of the music, and creates a distribution of the vowel pronunciation timing difference. The evaluation unit uses the distribution of the vowel pronunciation timing difference for the evaluation of the singing.

This configuration exploits the fact that, in singing with good groove and singing with poor groove alike, the vowel pronunciation timing is almost the same as the beat timing, and that singing sounds poor when the vowel pronunciation timing frequently deviates from the beat. Therefore, by detecting and using the difference between the vowel pronunciation timing and the beat, the skill of the singing can be evaluated more accurately.

Moreover, in the singing evaluation apparatus of the present invention, the evaluation unit determines that the larger the degree of spread of the consonant length distribution (the consonant length variation), the better the singing, and evaluates the singing more highly.

This configuration gives an example of singing evaluation. As shown in FIG. 2, described later, it exploits the fact that singing with good groove has a large consonant length variation, whereas singing with poor groove, for example singing performed mechanically according to the score, has a small consonant length variation. Therefore, by determining that the greater the consonant length variation, the better the singing, an evaluation closer to a singer's or listener's sense of good and poor singing can be performed.

Another aspect of the present invention provides a singing evaluation method that creates a consonant length distribution of each consonant included in the singing sound based on the singing data indicating the singing sound, detects the degree of spread of the consonant length distribution, and evaluates the singing sound using the degree of spread of the distribution.

According to another aspect of the present invention, there is provided a singing evaluation program for causing a computer to execute the functions of: creating a consonant length distribution of each consonant included in the singing sound based on the singing data indicating the singing sound; detecting the degree of spread of the consonant length distribution; and evaluating the singing sound using the degree of spread of the consonant length distribution.

According to the present invention, it is possible to realize singing evaluation that is closer to a singer's or listener's sense of good and poor singing.

FIG. 1 is a block diagram showing the main configuration of the singing evaluation apparatus according to the first embodiment of the present invention.
FIG. 2 is a diagram showing the reference concept of singing evaluation of the singing evaluation apparatus according to the first embodiment of the present invention.
FIG. 3 is a flowchart of the singing evaluation method according to the first embodiment of the present invention.
FIG. 4 is a block diagram showing the main configuration of the singing evaluation apparatus according to the second embodiment of the present invention.
FIG. 5 is a diagram showing the concept of associating the vowel pronunciation start timing with the beat timing according to the second embodiment of the present invention.
FIG. 6 is a frequency distribution diagram showing the relationship between singing skill and the distribution of the time difference between the vowel pronunciation start timing and the beat timing according to the second embodiment of the present invention.
FIG. 7 is a flowchart of the singing evaluation method according to the second embodiment of the present invention.
FIG. 8 is a block diagram showing the main configuration of the singing evaluation apparatus according to the third embodiment of the present invention.
FIG. 9 is a frequency distribution diagram showing the consonant length distribution for each beat according to the third embodiment of the present invention.
FIG. 10 is a flowchart of the singing evaluation method according to the third embodiment of the present invention.
FIG. 11 is a block diagram showing the main configuration of the singing evaluation apparatus according to the fourth embodiment of the present invention.
FIG. 12 is a diagram showing a specific example of the concept of setting the consonants used for determining the consonant length according to the fourth embodiment of the present invention.
FIG. 13 is a flowchart of the singing evaluation method according to the fourth embodiment of the present invention.

The singing evaluation apparatus according to the first embodiment of the present invention will be described with reference to the drawings. FIG. 1 is a block diagram showing the main configuration of a singing evaluation apparatus according to the first embodiment of the present invention. FIG. 2 is a diagram illustrating a reference concept of singing evaluation of the singing evaluation apparatus according to the first embodiment of the present invention. In FIG. 2, the horizontal axis is the consonant length, and the vertical axis is the frequency. That is, FIG. 2 is a diagram showing a frequency distribution of consonant lengths included in a predetermined time section of a sung song.

The singing evaluation apparatus according to the present invention makes it possible to evaluate the dynamism of singing, in particular the “nori” (groove) of the singing.

In the present invention, the “nori” of singing is a singing expression that, by means of the advanced technique of musically controlling deviations from the positions (timings) obtained mainly by dividing a measure evenly in time, produces effects such as:
・ the singer's absorption in, and elation with, the song being sung,
・ a rhythm that makes listeners want to sing and dance along just by listening,
・ human, free, or lively dynamism.
It is sometimes called “groove” or “groove feeling”.

First, the reference concept of singing evaluation of the singing evaluation apparatus according to the first embodiment of the present invention will be described with reference to FIG. 2. The singing with poor groove in FIG. 2 represents singing performed mechanically, following the pitch, volume, and sounding timing specified by the score. The singing with good groove represents singing in which the groove described above can be felt, arising from, for example, the singer's emotion at the time of singing.

As shown in FIG. 2, in singing with good groove, the consonant length variation is large and the average consonant length is large (consonants are long on average). In singing with poor groove, the consonant length variation is small and the average consonant length is small (consonants are short on average).

Therefore, by calculating an index based on the consonant length variation, it is possible to determine whether the groove is good or bad. Specifically, for example, the variance or the standard deviation of the consonant length is calculated. When the variance or standard deviation of the consonant length is large, it is determined that the groove is good and the singing is good; when it is small, it is determined that the groove is bad and the singing is poor. In addition, the score for the singing is set higher as the variance or standard deviation increases, and lower as the variance or standard deviation decreases.
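As a minimal, non-limiting sketch of the variation index described above, the following Python snippet computes the variance and standard deviation of a list of consonant lengths. The function name, the use of population (rather than sample) statistics, and the millisecond values are assumptions made for illustration; the description does not specify them.

```python
from statistics import pstdev, pvariance

def consonant_length_variation(lengths_ms):
    """Return (variance, standard deviation) of the given consonant lengths."""
    if len(lengths_ms) < 2:
        raise ValueError("need at least two consonant lengths")
    return pvariance(lengths_ms), pstdev(lengths_ms)

# Hypothetical consonant lengths in milliseconds (illustrative values only).
mechanical = [40, 42, 41, 39, 40, 41]    # sung strictly according to the score
expressive = [35, 80, 55, 120, 60, 95]   # sung with more timing freedom

var_mech, sd_mech = consonant_length_variation(mechanical)
var_expr, sd_expr = consonant_length_variation(expressive)

# Larger variation is treated as better groove, following the text.
groove_is_better = var_expr > var_mech
```

Under this sketch, the second list would be judged to have the better groove simply because its consonant lengths are more widely spread.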

Also, by calculating an index based on the average value of the consonant length, it is possible to determine whether the groove is good or bad. Specifically, for example, the average value of the consonant length is calculated. When the average value of the consonant length is large, it is determined that the groove is good and the singing is good; when it is small, it is determined that the groove is bad and the singing is poor. In addition, the score for the singing is set higher as the average value increases, and lower as the average value decreases.

Also, as shown in FIG. 2, in singing with good groove, the consonant length at which the frequency is maximal (the mode) is long. In singing with poor groove, the consonant length at which the frequency is maximal (the mode) is short. Therefore, by calculating an index based on the consonant length that is the mode, it is possible to determine whether the groove is good or bad.

Specifically, for example, the consonant length that is the mode is calculated. When the mode of the consonant length is large (the consonant length is long), it is determined that the groove is good and the singing is good; when the mode of the consonant length is small (the consonant length is short), it is determined that the groove is bad and the singing is poor. Further, the score for the singing is set higher as the mode of the consonant length increases, and lower as it decreases.

Further, the singing may be evaluated by determining the groove while taking into account at least two of the index based on variation, the index based on the average value, and the index based on the mode. For example, the index based on variation and the index based on the average value may each be expressed as a numerical value, and the result of combining them by arithmetic operations may be used to determine the groove and evaluate the singing. As a specific example, the numerical values representing these indices may be set so that they increase as the variation and the average value increase, and the average or a weighted average of these indices may be used.
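One possible way to combine the indices, sketched here under stated assumptions, is a weighted average of a variation index and an average-value index. The normalization constant `scale_ms`, the default weights, and the function name are hypothetical; the description only states that the numerical indices may be combined, for example by an average or a weighted average.

```python
from statistics import mean, pstdev

def combined_groove_index(lengths_ms, w_variation=0.6, w_mean=0.4, scale_ms=100.0):
    """Weighted average of a variation index and an average-value index.

    Both indices grow with the underlying statistic, as the text suggests.
    Dividing by scale_ms (an assumed constant) puts them on comparable scales.
    """
    variation_index = pstdev(lengths_ms) / scale_ms
    mean_index = mean(lengths_ms) / scale_ms
    total_weight = w_variation + w_mean
    return (w_variation * variation_index + w_mean * mean_index) / total_weight

# Hypothetical consonant lengths in milliseconds (illustrative values only).
mechanical = [40, 42, 41, 39, 40, 41]
expressive = [35, 80, 55, 120, 60, 95]

index_mech = combined_groove_index(mechanical)
index_expr = combined_groove_index(expressive)
```

Setting both weights to the same value reduces this to the plain average mentioned in the text.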

In order to realize such singing evaluation, the singing evaluation apparatus according to the embodiment of the present invention has the configuration shown in FIG. 1. The singing evaluation device 10 includes a consonant length measuring unit 11, a consonant length distribution creating unit 12, and an evaluating unit 13.

The consonant length measuring unit 11 receives singing data obtained by picking up the singing of the singer. The consonant length measuring unit 11 detects consonants from the singing data using a known method, and measures the consonant length, that is, the length of each consonant. The consonant length measuring unit 11 outputs the measured consonant lengths to the consonant length distribution creating unit 12.

An example of a known method for detecting consonants is disclosed in Japanese Patent Application Laid-Open No. 2008-32933. This document discloses a method of determining a section with periodicity to be a vowel section and determining the other sections to be consonant sections. It is also possible to specify the consonant sections of the singing data manually on a GUI (Graphical User Interface) and measure the length of each specified consonant section. Alternatively, the consonant sections may be identified manually while listening to the singing data, without using a GUI, and the length of each identified consonant section measured.
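Once each analysis frame has been classified, measuring consonant lengths reduces to measuring runs of consonant frames. The following is a minimal sketch of that step only; the frame labels (`'V'` for periodic/vowel, `'C'` for consonant, `'S'` for silence), the separate handling of silence, and the 10 ms hop are assumptions for illustration and are not taken from the cited publication.

```python
from itertools import groupby

def consonant_lengths_ms(frame_labels, hop_ms=10):
    """Measure the length of each run of consecutive consonant frames.

    frame_labels: per-frame labels, assumed to be 'V', 'C', or 'S'.
    hop_ms: assumed time step between frames, in milliseconds.
    """
    lengths = []
    for label, run in groupby(frame_labels):
        n_frames = sum(1 for _ in run)
        if label == "C":
            lengths.append(n_frames * hop_ms)
    return lengths

# Hypothetical labelling: silence, consonant, vowel, consonant, vowel, silence.
labels = list("SSCCCVVVVCCVVVSS")
lengths = consonant_lengths_ms(labels)
```

How the frames are classified in the first place (for example, by a periodicity measure) is outside the scope of this sketch.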

The consonant length distribution creating unit 12 stores the input consonant lengths over a preset time interval. The time interval over which they are stored may be, for example, the whole piece of music or a predetermined phrase. When a predetermined phrase is used, score data may be acquired separately and the phrase identified by referring to the score data. The consonant length distribution creating unit 12 creates a frequency distribution of the consonant lengths and outputs it to the evaluation unit 13.
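The accumulation and binning performed by the consonant length distribution creating unit might be sketched as follows. The representation of each consonant as an `(onset_s, length_ms)` pair, the half-open time window, and the 10 ms bin width are assumptions; the description does not fix a data format or bin width.

```python
from collections import Counter

def consonant_length_distribution(consonants, start_s, end_s, bin_ms=10):
    """Build a frequency distribution of consonant lengths within a time window.

    consonants: iterable of (onset_s, length_ms) pairs (assumed format).
    Returns a dict mapping the lower edge of each bin (ms) to its count.
    """
    kept = [length for onset, length in consonants if start_s <= onset < end_s]
    counts = Counter(int(length // bin_ms) * bin_ms for length in kept)
    return dict(sorted(counts.items()))

# Hypothetical consonants; the last one falls outside the 0-8 s phrase.
consonants = [(0.5, 42), (1.2, 47), (2.0, 63), (9.0, 41)]
distribution = consonant_length_distribution(consonants, 0.0, 8.0)
```

Passing the start and end of the whole piece instead of a phrase gives the whole-song distribution mentioned in the text.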

The evaluation unit 13 determines whether the groove is good or bad from the frequency distribution of the consonant length, and performs singing evaluation based on the groove. The determination criteria for the groove are as described above. Specifically, the evaluation unit 13 calculates an index based on the consonant length variation (for example, the variance or standard deviation) from the frequency distribution of the consonant length, and determines whether the groove is good or bad from this index as described above. Note that, as described above, the evaluation unit 13 may instead determine whether the groove is good or bad using the average value of the consonant length or the consonant length at which the frequency is maximal (the mode).
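For illustration, the indices named above can be derived directly from a binned frequency distribution rather than from the raw lengths. In this sketch the distribution is assumed to be a mapping from a representative bin value (ms) to a count; the function name and the returned dictionary keys are hypothetical.

```python
from math import sqrt

def distribution_indices(freq):
    """Compute mean, variance, standard deviation, and mode from a histogram.

    freq: dict mapping a representative consonant length per bin (ms) to its count.
    Ties for the mode resolve to the first such bin in the dict's iteration order.
    """
    total = sum(freq.values())
    if total == 0:
        raise ValueError("empty distribution")
    mean = sum(b * c for b, c in freq.items()) / total
    variance = sum(c * (b - mean) ** 2 for b, c in freq.items()) / total
    mode = max(freq, key=freq.get)
    return {"mean": mean, "variance": variance, "std": sqrt(variance), "mode": mode}

# Hypothetical distribution: two consonants near 40 ms, one near 60 ms, one near 80 ms.
indices = distribution_indices({40: 2, 60: 1, 80: 1})
```

Because the values are binned, these statistics are approximations of the ones computed from the unbinned consonant lengths.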

As described above, by using the configuration of the present embodiment, it is possible to accurately determine whether the groove is good or bad, and to perform singing evaluation that takes the quality of the groove into account. That is, it is possible to realize a singing evaluation that is closer to a singer's or listener's sense of good and poor singing.

The following method can be considered as a method of singing evaluation by the evaluation unit 13. For example, singing data of the same song at various skill levels, from skilled singers to unskilled singers, are collected and stored in a storage device. A computer reads each item of singing data from the storage device and calculates the variance of the consonant lengths obtained from that singing data. The computer then rates the variance obtained from each item of singing data on a 10-stage scale and assigns the corresponding score (1 to 10) to each item of singing data. This makes it possible to evaluate each item of singing data with a score on a 10-stage scale.

The staged evaluation of the variance is not limited to 10 stages; a finer scale (20 stages, 50 stages, etc.) or a coarser scale (8 stages, 5 stages, etc.) may be adopted. The interval between stages (the difference between the minimum and maximum variance values judged to correspond to a given score) may be set at equal intervals, or weighted intervals may be used (for example, the interval for 5 points differs from the interval for 10 points). Alternatively, the stage intervals may be equal while the points awarded at each stage are weighted (for example, in a 3-stage evaluation, the lowest stage is worth 5 points, the middle stage 10 points, and the highest stage 30 points).
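The staged scoring can be sketched as a mapping from a variance value to a stage, followed by an optional lookup of weighted points. The bounds `lo` and `hi` are assumed to come from the collected reference singing data, and the clamping of out-of-range values is an added assumption; neither is specified in the description.

```python
def stage_of(variance, lo, hi, stages=10):
    """Map a variance to a stage 1..stages using equal-width intervals on [lo, hi].

    Values below lo or above hi are clamped to the first or last stage (assumption).
    """
    if hi <= lo:
        raise ValueError("hi must exceed lo")
    width = (hi - lo) / stages
    stage = int((variance - lo) // width) + 1
    return max(1, min(stages, stage))

def weighted_points(variance, lo, hi, points=(5, 10, 30)):
    """Equal stage intervals, but unequal points per stage (3-stage example in the text)."""
    return points[stage_of(variance, lo, hi, stages=len(points)) - 1]

# Hypothetical variance range of 0 to 1000 taken from reference recordings.
score = stage_of(250, 0, 1000)          # 10-stage score
bonus = weighted_points(900, 0, 1000)   # weighted 3-stage points
```

A finer or coarser scale is obtained by changing `stages`; unequal stage intervals would instead require an explicit list of thresholds.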

In the above description, each functional unit is individually provided. However, these may be stored as a program and the program may be executed by an arithmetic processing element such as a CPU. In this case, the following processing flow may be used. FIG. 3 is a flowchart of the singing evaluation method according to the first embodiment of the present invention.

First, the singing evaluation device 10 acquires singing data and measures the consonant length of each consonant included in the singing data (S101). Next, the singing evaluation device 10 creates a distribution of the acquired consonant lengths (S102). Next, the singing evaluation device 10 calculates an index based on the consonant length variation from the consonant length distribution, determines whether the groove is good or bad using the index, and performs singing evaluation (S103). In FIG. 3, the quality of the groove is determined based on the consonant length variation; however, as described above, it can also be determined using the average value or the mode of the consonant length.

Next, a singing evaluation apparatus according to the second embodiment of the present invention will be described with reference to the drawings. FIG. 4 is a block diagram showing the main configuration of the singing evaluation apparatus according to the second embodiment of the present invention.

The singing evaluation device 10A of this embodiment adds a vowel pronunciation timing acquisition unit 21 and a vowel pronunciation timing distribution creation unit 22 to the singing evaluation device 10 shown in the first embodiment; the rest of the configuration is the same as that of the singing evaluation device 10 shown in the first embodiment. Therefore, only the parts that differ from the singing evaluation device 10 according to the first embodiment will be specifically described below.

The same singing data as that input to the consonant length measuring unit 11 is input to the vowel pronunciation timing acquisition unit 21. The vowel pronunciation timing acquisition unit 21 detects vowels by a known method and detects the pronunciation start timing of each vowel. Specifically, for a vowel accompanied by a consonant, the vowel pronunciation timing acquisition unit 21 detects the timing at which the consonant switches to the vowel. When vowels are consecutive, it detects the timing at which one vowel switches to the next. When a vowel is produced from silence, it detects the timing at which the vowel begins from the silent state. The vowel pronunciation timing acquisition unit 21 outputs the detected pronunciation start timing of each vowel to the vowel pronunciation timing distribution creation unit 22.

The vowel pronunciation timing distribution creation unit 22 receives the pronunciation start timing of each vowel and the beat timing of the song being sung, and compares the two. In doing so, it associates the pronunciation start timing of each vowel with the timing of the beat closest to it. FIG. 5 is a diagram showing the concept of associating the vowel pronunciation start timing with the beat timing. In the case of FIG. 5, for example, the pronunciation start timing of the vowel V01 is closest to the timing of the first beat, so the vowel pronunciation timing distribution creation unit 22 sets the first beat as the beat corresponding to the vowel V01. Similarly, in the case of FIG. 5, it sets the third beat as the beat corresponding to the vowel V02, the fourth beat for the vowel V03, the fifth beat for the vowel V04, the seventh beat for the vowel V05, and the eighth beat for the vowel V06.
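The association of each vowel with its closest beat can be illustrated with a short sketch. The beat times and vowel onsets below are hypothetical values chosen so that the resulting beat numbers follow the same pattern as the example described for FIG. 5; they are not read from the figure.

```python
def nearest_beat(onset_s, beat_times_s):
    """Return the 1-based number of the beat closest in time to a vowel onset."""
    index = min(range(len(beat_times_s)), key=lambda i: abs(beat_times_s[i] - onset_s))
    return index + 1

# Hypothetical beat grid: eight beats, 0.5 s apart.
beat_times = [0.0, 0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5]

# Hypothetical pronunciation start timings of six vowels (seconds).
vowel_onsets = [0.03, 0.96, 1.52, 2.05, 2.97, 3.48]

assigned_beats = [nearest_beat(t, beat_times) for t in vowel_onsets]
```

Beats with no nearby vowel, such as the second and sixth in this example, simply receive no assignment.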

The vowel pronunciation timing distribution creation unit 22 calculates the time difference between the pronunciation start timing of each vowel and the timing of its corresponding beat. The vowel pronunciation timing distribution creation unit 22 creates a distribution of these time differences and outputs the distribution to the evaluation unit 13.

The evaluation unit 13 determines whether the singing is good or poor using the distribution of the time difference between the vowel pronunciation start timing and the beat timing. FIG. 6 is a frequency distribution diagram showing the relationship between singing skill and the distribution of the time difference between the vowel pronunciation start timing and the beat timing. As shown in FIG. 6, when the singing is good, the variation in the time difference between the vowel pronunciation start timing and the beat timing is small, and when the singing is poor, the variation in the time difference is large. In addition, when the singing is good, the mode of the time difference is substantially 0, and when the singing is poor, the mode of the time difference deviates greatly from 0.

Using this characteristic, the evaluation unit 13 detects the variation in the time difference and determines that the smaller the variation, the better the singing. The evaluation unit 13 also detects the mode of the time difference and determines that the closer the mode is to zero, the better the singing. The evaluation unit 13 may also determine the skill level of the singing using both the variation and the mode of the time difference.
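A minimal sketch of the two timing indices follows: the spread of the time differences and their mode. The 20 ms bin width used to find the mode, the use of the population standard deviation as the measure of variation, and the sample values are all assumptions for illustration.

```python
from collections import Counter
from statistics import pstdev

def timing_indices(diffs_ms, bin_ms=20):
    """Return (spread, mode) of vowel-onset-to-beat time differences.

    spread: population standard deviation of the differences.
    mode: centre of the most frequent bin, found by rounding to the nearest bin_ms.
    """
    spread = pstdev(diffs_ms)
    counts = Counter(round(d / bin_ms) * bin_ms for d in diffs_ms)
    mode = max(counts, key=counts.get)
    return spread, mode

# Hypothetical time differences in milliseconds (onset minus beat).
tight = [-5, 3, 8, -2, 4, 0]         # onsets close to the beat
loose = [60, 62, 128, 65, -40, 92]   # onsets drifting away from the beat

spread_tight, mode_tight = timing_indices(tight)
spread_loose, mode_loose = timing_indices(loose)
```

Following the text, the first list would be judged the better singing on both counts: its spread is smaller and its mode is at zero.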

The evaluation unit 13 reflects the singing evaluation result based on the time difference between the vowel pronunciation start timing and the beat timing in the singing evaluation result based on the consonant length described above, and performs the singing evaluation in an integrated manner. As a result, the skill of the singing can be determined more accurately.

Note that the evaluation unit 13 may perform the singing evaluation based on the consonant length only when it determines that the singing is good from the singing evaluation result based on the time difference between the vowel pronunciation start timing and the beat timing. In this case, the evaluation unit 13 does not perform the singing evaluation based on the consonant length when that evaluation is considered unnecessary. Therefore, the processing load of the singing evaluation can be reduced.

It should be noted that the processing of this embodiment may also be stored as a program, and the program may be executed by an arithmetic processing element such as a CPU. In this case, the following processing flow may be used. FIG. 7 is a flowchart of the singing evaluation method according to the second embodiment of the present invention. FIG. 7 shows a case where whether or not to perform the singing evaluation based on the consonant length is switched depending on the singing evaluation result based on the time difference between the vowel sound generation timing and the beat timing.

First, the singing evaluation device 10A acquires singing data and measures the consonant length of each consonant included in the singing data (S201). Separately from the measurement of the consonant length, the singing evaluation device 10A detects the pronunciation start timing of each vowel included in the singing data (S202). Next, the singing evaluation device 10A creates a distribution of the time difference between the pronunciation start timing of each vowel and the timing of the corresponding beat (S203). When the singing evaluation device 10A detects that the time differences are concentrated in the vicinity of 0 (S204: YES), it calculates an index based on the consonant length variation from the consonant length distribution, determines whether the groove is good or bad using this index, and performs singing evaluation (S205). When the singing evaluation device 10A detects that the time differences are not concentrated in the vicinity of 0 (S204: NO), it does not perform the singing evaluation based on the consonant length and evaluates the singing as poor.
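The branching in steps S204 and S205 might look like the following sketch. The test for "concentrated in the vicinity of 0" (a minimum fraction of differences within a tolerance) and every threshold value are assumptions; the description does not define how the concentration is detected or where the variance threshold lies.

```python
from statistics import pvariance

def evaluate_singing(consonant_lengths_ms, timing_diffs_ms,
                     tolerance_ms=30, min_fraction=0.7, variance_threshold=200.0):
    """Gate the consonant-length evaluation on the timing distribution.

    All thresholds are hypothetical and would need tuning on real data.
    """
    if not timing_diffs_ms:
        raise ValueError("no timing differences given")
    near_zero = sum(1 for d in timing_diffs_ms if abs(d) <= tolerance_ms)
    if near_zero / len(timing_diffs_ms) < min_fraction:
        return "poor"  # S204: NO, consonant-length evaluation is skipped
    variance = pvariance(consonant_lengths_ms)  # S205
    return "good groove" if variance >= variance_threshold else "poor groove"

# Hypothetical inputs (milliseconds).
on_beat = [-5, 3, 8, -2, 4, 0]
off_beat = [60, 62, 128, 65, -40, 92]
varied = [35, 80, 55, 120, 60, 95]
uniform = [40, 42, 41, 39, 40, 41]

result = evaluate_singing(varied, on_beat)
```

Because the variance is computed only on the S204: YES branch, off-beat singing is rejected without touching the consonant-length data, which is the source of the reduced processing load.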

Next, a singing evaluation apparatus according to the third embodiment will be described with reference to the drawings. FIG. 8 is a block diagram showing the main configuration of the singing evaluation apparatus according to the third embodiment of the present invention.

The singing evaluation device 10B of the present embodiment differs from the singing evaluation device 10A shown in the second embodiment in the connection configuration of the vowel pronunciation timing acquisition unit 21 and the vowel pronunciation timing distribution creation unit 22; the rest of the configuration is the same as that of the singing evaluation device 10A shown in the second embodiment. Therefore, only the parts that differ from the singing evaluation device 10A according to the second embodiment will be specifically described.

The vowel pronunciation timing acquisition unit 21 outputs the detected vowel pronunciation start timing to the consonant length distribution creation unit 12 as well as to the vowel pronunciation timing distribution creation unit 22.

The vowel pronunciation timing distribution creation unit 22 outputs beats corresponding to each vowel pronunciation start timing to the consonant length distribution creation unit 12.

The consonant length distribution creation unit 12 detects the beat corresponding to each consonant from the input vowel pronunciation timing and the corresponding beat. Specifically, in the case of FIG. 5, the consonant length distribution creation unit 12 detects that the beat corresponding to the consonant C01 attached to the vowel V01 is the first beat. Similarly, the consonant length distribution creation unit 12 detects that the beat corresponding to the consonant C02 attached to the vowel V02 is the third beat, that the beat corresponding to the consonant C04 attached to the vowel V04 is the fifth beat, that the beat corresponding to the consonant C05 attached to the vowel V05 is the seventh beat, and that the beat corresponding to the consonant C06 attached to the vowel V06 is the eighth beat.

The consonant length distribution creating unit 12 creates a consonant length distribution for each beat. FIG. 9 is a frequency distribution diagram showing the consonant length distribution for each beat according to the third embodiment of the present invention. The consonant length distribution creation unit 12 outputs the consonant length distribution for each beat to the evaluation unit 13.
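As a rough illustration of creating a consonant length distribution for each beat, the following Python sketch groups measured consonant lengths by the beat assigned to each consonant. The function name and the input layout (two parallel lists, lengths in seconds and beat numbers) are assumptions made for this example.

```python
from collections import defaultdict


def consonant_lengths_by_beat(consonant_lengths, beat_numbers):
    """Group consonant lengths by the beat each consonant is associated with.

    consonant_lengths[i] is the length of the i-th consonant and
    beat_numbers[i] is the beat detected for it (hypothetical input layout).
    """
    if len(consonant_lengths) != len(beat_numbers):
        raise ValueError("each consonant length needs exactly one beat number")
    groups = defaultdict(list)
    for length, beat in zip(consonant_lengths, beat_numbers):
        groups[beat].append(length)
    return dict(groups)
```

Each list in the returned dictionary is the raw sample from which a per-beat frequency distribution such as the one in FIG. 9 could be drawn.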

The evaluation unit 13 uses the consonant length distribution for each beat to determine whether the groove is good or bad and performs the singing evaluation. As a specific example, the evaluation unit 13 determines the quality of the groove from the difference between the variance or standard deviation of the consonant length in the beat having the smallest consonant length variation and the variance or standard deviation of the consonant length in the beat having the largest consonant length variation. In the specific example shown in FIG. 9, the quality of the groove is determined based on the difference between the variance or standard deviation of the first beat and that of the sixth beat. Here, the evaluation unit 13 may use as the difference not only the arithmetic difference of the variances or standard deviations but also the arithmetic ratio between the variance or standard deviation in the beat having the smallest consonant length variation and that in the beat having the largest consonant length variation. The evaluation unit 13 determines that the larger the difference, the better the groove, and the smaller the difference, the worse the groove.
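The comparison between the least variable and the most variable beat can be sketched as follows. This is only an illustration under stated assumptions: the function name is hypothetical, the population standard deviation is used as the measure of variation, beats with fewer than two samples are ignored, and a zero minimum spread is mapped to an infinite ratio.

```python
from statistics import pstdev


def beat_spread_contrast(lengths_by_beat):
    """Return (difference, ratio) between the largest and smallest per-beat spread.

    lengths_by_beat maps a beat number to the consonant lengths observed on it.
    """
    spreads = {beat: pstdev(values)
               for beat, values in lengths_by_beat.items()
               if len(values) >= 2}  # a single sample has no variation to compare
    if len(spreads) < 2:
        raise ValueError("at least two beats with two or more samples are required")
    smallest = min(spreads.values())
    largest = max(spreads.values())
    difference = largest - smallest
    ratio = largest / smallest if smallest > 0 else float("inf")
    return difference, ratio
```

Either returned value could then be compared with a threshold, with a larger value read as a better groove, as described above.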

In this way, by determining the quality of the groove based on the variation for each beat, the quality of the groove can be determined more accurately, and a singing evaluation closer to how skilled or unskilled the singing feels to the singer and the listeners can be realized.

It should be noted that the processing of this embodiment may also be stored as a program, and the program may be executed by an arithmetic processing element such as a CPU. In this case, the following processing flow may be used. FIG. 10 is a flowchart of the singing evaluation method according to the third embodiment of the present invention. FIG. 10 shows a case where singing evaluation based on the time difference between the vowel sound generation timing and the beat timing is not performed. Thus, in this embodiment, singing evaluation based on the time difference between the vowel sound generation timing and the beat timing may or may not be performed.

First, the singing evaluation device 10B acquires singing data and measures the consonant length of each consonant included in the singing data (S301). Separately from the measurement of the consonant length, the singing evaluation device 10B detects the pronunciation start timing of each vowel included in the singing data (S302). Next, the singing evaluation device 10B detects the association between the pronunciation start timing of each vowel and the timing of the corresponding beat (S303). The singing evaluation device 10B creates a consonant length distribution for each beat (S304). The singing evaluation device 10B determines whether the groove is good or bad by using the variation in the consonant length distribution for each beat, and performs the singing evaluation (S305).

Next, a singing evaluation device according to the fourth embodiment of the present invention will be described with reference to the drawings. FIG. 11 is a block diagram showing the main configuration of the singing evaluation device according to the fourth embodiment of the present invention. The singing evaluation device 10C of the present embodiment adds a determination target detection unit 23 to the singing evaluation device 10B shown in the third embodiment, and the processing of the consonant length distribution creation unit 12 differs accordingly. Therefore, only the parts that differ from the singing evaluation device 10B according to the third embodiment will be specifically described.

The determination target detection unit 23 includes a sound type analysis unit, a rhythm analysis unit, and a specific section extraction unit. The determination target detection unit 23 receives at least one of score data and music data. The score data includes the pitch, volume, and sounding timing of each sound of the song to be sung. The music data includes the structure of the music, such as a chorus section, the genre of the music, and the like.

The determination target detection unit 23 analyzes the score data and sets a consonant to be used for determination of the consonant length. FIG. 12 is a diagram showing a specific example of a consonant setting concept used for determination of consonant length. FIG. 12A is a diagram showing a consonant setting concept based on sound type, and FIG. 12B is a diagram showing a consonant setting concept based on rhythm.

When the sound type is used, the sound type analysis unit of the determination target detection unit 23 detects, from the score data, a sound type to be used for the determination of the consonant length, and sets the section of that sound type and the timing of the consonant used for the determination of the consonant length. For example, as shown in FIG. 12A, when a rising sound type is detected, the consonant of the third sound among three sounds whose pitches rise continuously is set as the consonant used for the determination of the consonant length. The timing of that consonant is given to the consonant length distribution creation unit 12.
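The rising sound type example can be sketched in Python as a scan over the score pitches. The representation of pitches as numbers that increase with pitch (for instance MIDI note numbers), the requirement that each rise be strict, and the function name are assumptions of this sketch, not details given in the description.

```python
def rising_triplet_targets(pitches):
    """Indices of the third note in every run of three consecutively rising pitches.

    pitches is a list of numeric pitches in score order (assumed representation).
    The consonant of each returned note would be used for the consonant length
    determination.
    """
    return [i + 2
            for i in range(len(pitches) - 2)
            if pitches[i] < pitches[i + 1] < pitches[i + 2]]
```

Given note onset times in the same order, the timings to hand to the consonant length distribution creation unit 12 would be `[onsets[i] for i in rising_triplet_targets(pitches)]`.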

When the rhythm is used, the rhythm analysis unit of the determination target detection unit 23 detects, from the score data, a rhythm to be used for the determination of the consonant length, and sets the section of that rhythm and the timing of the consonant used for the determination of the consonant length. For example, as shown in FIG. 12B, when syncopation is detected, the consonant of the third sound (the sound with the short sound length) among three sounds in a section where different sound lengths are repeated is set as the consonant used for the determination of the consonant length, and the timing of that consonant is given to the consonant length distribution creation unit 12.

In the present embodiment, the sound type and rhythm are specified over a section of three consecutive sounds, but any section composed of two or more sounds may be used. The fewer the sounds in the section, the more often the same sound type or the same rhythm occurs in the song being sung, which makes it easier to create an accurate consonant length distribution and is therefore more useful. In addition, instead of the last sound of the section, the consonant of the first or a middle sound may be set as the consonant used for the determination of the consonant length. This allows a more appropriate evaluation according to the sound type and rhythm.

Also, the determination target detection unit 23 can analyze music data and set a consonant to be used for determination of the consonant length. For example, the determination target detection unit 23 detects a chorus section from the music data, and gives the time of the chorus section to the consonant length distribution creation unit 12.

The consonant length distribution creation unit 12 refers to the timing or section given from the determination target detection unit 23 and to the timing of each consonant obtained from the vowel pronunciation timing acquisition unit 21 and the vowel pronunciation timing distribution creation unit 22, and uses the consonant lengths from the consonant length measurement unit 11 to create a consonant length distribution for the given timing or section.
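The restriction of the consonant length distribution to the given timings or sections can be sketched as a simple filter. The function name, the (onset time, length) pair layout, the half-open section intervals, and the matching tolerance for individual timings are assumptions made for this illustration.

```python
def select_consonant_lengths(consonants, sections=(), timings=(), tolerance=0.05):
    """Keep the lengths of consonants that fall in a target section or near a target timing.

    consonants: iterable of (onset_time, length) pairs in seconds (assumed layout).
    sections:   iterable of (start, end) intervals, e.g. a chorus section.
    timings:    iterable of single target times, e.g. from sound type or rhythm analysis.
    tolerance:  hypothetical matching window around each target timing, in seconds.
    """
    selected = []
    for onset, length in consonants:
        in_section = any(start <= onset < end for start, end in sections)
        at_timing = any(abs(onset - t) <= tolerance for t in timings)
        if in_section or at_timing:
            selected.append(length)
    return selected
```

The returned lengths are the sample from which the distribution for the determination target would be built; consonants outside the targets are simply left out.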

It has been found that groove tends to appear at a specific timing in a specific sound type, a specific rhythm, or a specific section of a piece of music. Therefore, by using the configuration of the present embodiment, the groove can be determined from the consonant length at the timings at which groove is likely to appear, and whether the groove is good or bad can be determined more accurately.

It should be noted that the processing of this embodiment may also be stored as a program, and the program may be executed by an arithmetic processing element such as a CPU. In this case, the following processing flow may be used. FIG. 13 is a flowchart of the singing evaluation method according to the fourth embodiment of the present invention. Note that FIG. 13 shows a case in which whether or not to perform the singing evaluation based on the consonant length is switched depending on the singing evaluation result based on the time difference between the vowel pronunciation timing and the beat timing. However, the singing evaluation based on the time difference from the beat timing may be omitted.

First, the singing evaluation device 10C acquires singing data and measures the consonant length of each consonant included in the singing data (S401). Separately from the measurement of the consonant length, the singing evaluation device 10C analyzes the score data or the music data and sets the timing or section for which the consonant length distribution is to be created, based on the sound type, rhythm, or specific section (S402). Separately from the measurement of the consonant length and the setting of the target of the consonant length distribution, the singing evaluation device 10C detects the pronunciation start timing of each vowel included in the singing data (S403). Next, the singing evaluation device 10C creates a distribution of the time difference between the pronunciation start timing of each vowel and the timing of the corresponding beat (S404). When the singing evaluation device 10C detects that the time difference is distributed mostly in the vicinity of 0 (S405: YES), it creates a consonant length distribution for the timing or section to be determined (S406). The singing evaluation device 10C then calculates an index based on the variation of the consonant length from the consonant length distribution, determines whether the groove is good or bad using the index, and performs the singing evaluation (S407). When the singing evaluation device 10C detects that the time difference is not distributed mostly in the vicinity of 0 (S405: NO), it does not perform the singing evaluation based on the consonant length and evaluates the singing as poor. Alternatively, the singing evaluation based on the consonant length may still be performed, with the groove evaluation greatly reduced.

In the above description, an example of creating a consonant length distribution for each beat has been shown, but a consonant length distribution may instead be created for each consonant by identifying the individual consonants. Alternatively, a consonant category may be identified from the music data, and a consonant length distribution may be created for each consonant category. For example, consonants may be classified into categories of voiced and unvoiced sounds, or categories such as plosives and sibilants, and a consonant length distribution may be created for each category.
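A per-category variant can be sketched in the same way as the per-beat grouping. The category table below is a hypothetical and deliberately incomplete mapping used only for illustration, as are the function name, the "other" fallback category, and the choice of population standard deviation as the spread measure.

```python
from collections import defaultdict
from statistics import pstdev

# Hypothetical, incomplete consonant-to-category table for illustration only.
CONSONANT_CATEGORY = {
    "p": "plosive", "t": "plosive", "k": "plosive",
    "s": "sibilant", "sh": "sibilant", "z": "sibilant",
}


def spread_by_category(labelled_lengths, category_of=CONSONANT_CATEGORY):
    """Spread of consonant length per consonant category.

    labelled_lengths: iterable of (consonant_label, length) pairs (assumed layout).
    Categories with fewer than two samples are omitted because no spread
    can be computed for them.
    """
    groups = defaultdict(list)
    for consonant, length in labelled_lengths:
        groups[category_of.get(consonant, "other")].append(length)
    return {category: pstdev(values)
            for category, values in groups.items()
            if len(values) >= 2}
```

Passing a different `category_of` table, for example one that maps consonants to "voiced" and "unvoiced", gives the other classification mentioned above without changing the function.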

Further, in the above description, the consonant length is measured and its variation is determined for a singing sound collected by a microphone or the like, but the above-described configuration can also be applied to a singing sound created by artificially imitating a human voice, and similar effects can be obtained.

In the above description, the consonant length is measured automatically. However, the user may manually identify the consonants and vowels while viewing the waveform, measure the consonant length, and input it to the device (so that the device acquires it).

In the above description, the vowel pronunciation start timing is measured automatically. However, the user may manually detect the vowel pronunciation start timing while viewing the waveform and input data representing the detected timing to the device (so that the device acquires it), or the user may listen to the singing sound, manually detect the vowel pronunciation start timing, and input data representing the detected timing to the device (so that the device acquires it). In this case, the vowel pronunciation timing acquisition unit may use an operation input unit (not shown).

This application is based on a Japanese patent application (Japanese Patent Application No. 2014-010223) filed on January 23, 2014, the contents of which are incorporated herein by reference.

10, 10A, 10B, 10C: singing evaluation device
11: consonant length measurement unit
12: consonant length distribution creation unit
13: evaluation unit
21: vowel pronunciation timing acquisition unit
22: vowel pronunciation timing distribution creation unit
23: determination target detection unit

Claims

A singing evaluation device comprising:
a consonant length distribution creation unit that creates a distribution of the consonant lengths of the consonants included in a singing sound, based on singing data indicating the singing sound; and
an evaluation unit that detects the degree of spread of the consonant length distribution and evaluates the singing sound using the degree of spread of the consonant length distribution.
The singing evaluation device according to claim 1, further comprising a consonant length measurement unit that measures the consonant length from the singing data obtained from the singing sound and outputs the measured consonant length to the consonant length distribution creation unit.
The singing evaluation device according to claim 1 or 2, further comprising a determination target detection unit that detects a consonant to be determined from at least one of musical score data and music data of the song to be sung,
wherein the consonant length distribution creation unit creates the consonant length distribution for the consonant that is the determination target.
The singing evaluation device according to claim 3, wherein the determination target detection unit comprises at least one of:
a sound type analysis unit that analyzes a sound type to be sung from the score data;
a rhythm analysis unit that analyzes a rhythm to be sung from the score data; and
a specific section extraction unit that extracts a specific section of the song to be sung from the music data,
and determines the determination target from at least one of the sound type, the rhythm, and the specific section.
The singing evaluation device according to any one of claims 1 to 4, further comprising:
a vowel pronunciation timing acquisition unit that acquires a vowel pronunciation timing of the singing sound; and
a vowel pronunciation timing distribution creation unit that detects a timing difference between the vowel pronunciation timing and the beat timing of the music and creates a distribution of the vowel pronunciation timing difference,
wherein the evaluation unit uses the distribution of the vowel pronunciation timing difference for the evaluation of the singing sound.
The singing evaluation device according to any one of claims 1 to 5, wherein the evaluation unit judges that the greater the degree of spread of the consonant length distribution, the better the groove, and evaluates the singing sound more highly.
A singing evaluation method comprising:
creating a distribution of the consonant lengths of the consonants included in a singing sound, based on singing data indicating the singing sound; and
detecting the degree of spread of the consonant length distribution and evaluating the singing sound using the degree of spread of the consonant length distribution.
The singing evaluation method according to claim 7, wherein a consonant length of singing data obtained from the singing sound is measured, and a distribution of the consonant length is created based on the measured consonant length.
The singing evaluation method according to claim 7 or 8, wherein a consonant to be determined is detected from at least one of musical score data and music data of the song to be sung, and
the distribution of the consonant length is created for the consonant that is the determination target.
The singing evaluation method according to claim 9, wherein the determination target is determined from at least one of a sung sound type analyzed from the score data, a sung rhythm analyzed from the score data, and a specific section of the song extracted from the music data.
The singing evaluation method according to any one of claims 7 to 10, wherein a vowel pronunciation timing of the singing sound is acquired,
a timing difference between the vowel pronunciation timing and the beat timing of the music is detected and a distribution of the vowel pronunciation timing difference is created, and
the distribution of the vowel pronunciation timing difference is used for the evaluation of the singing sound.
The singing evaluation method according to any one of claims 7 to 11, wherein it is judged that the greater the degree of spread of the consonant length distribution, the better the groove, and the singing sound is evaluated more highly.
A singing evaluation program that causes a computer to function as:
a consonant length distribution creation unit that creates a distribution of the consonant lengths of the consonants included in a singing sound, based on singing data indicating the singing sound; and
an evaluation unit that detects the degree of spread of the consonant length distribution and evaluates the singing sound using the degree of spread of the consonant length distribution.