WO2016148256A1

WO2016148256A1 - Evaluation device and program

Info

Publication number: WO2016148256A1
Application number: PCT/JP2016/058583
Authority: WO
Inventors: 松本　秀一; 辰弥寺島
Original assignee: ヤマハ株式会社 (Yamaha Corporation)
Priority date: 2015-03-17
Filing date: 2016-03-17
Publication date: 2016-09-22

Abstract

This evaluation device according to an embodiment of the present invention is provided with: an input sound acquisition unit which acquires an input sound; a pitch calculating unit which calculates the input sound pitch of the input sound acquired by the input sound acquisition unit; a pitch comparing unit which compares the input sound pitch calculated by the pitch calculating unit with a plurality of reference pitches, in a prescribed evaluation section; and an evaluating unit which calculates an evaluation value for the input sound on the basis of the comparison result obtained by the pitch comparing unit.

Description

Evaluation apparatus and program

The present invention relates to a technique for evaluating sound.

Karaoke devices often have a function for analyzing and evaluating a singing voice. For example, the pitch of the singing voice is compared with the pitch of the melody to be sung, and the singing is evaluated based on how closely the two coincide (see, for example, Patent Document 1).

Patent Document 1: Japanese Unexamined Patent Publication No. 2005-215493

However, because the technique of Patent Document 1 uses a melody as the criterion for determination, information serving as that criterion must be prepared for each song. Similarly, evaluating a sound that is not accompanied by singing, such as an instrument sound, requires preparing reference information for each piece of music.

One object of the present invention is to evaluate sound without depending on a melody.

According to an embodiment of the present invention, there is provided an evaluation apparatus including: an input sound acquisition unit that acquires an input sound; a pitch calculation unit that calculates the pitch of the input sound acquired by the input sound acquisition unit; a pitch comparison unit that, in a predetermined evaluation section, compares a plurality of reference pitches with the pitch of the input sound calculated by the pitch calculation unit; and an evaluation unit that calculates an evaluation value for the input sound based on the result of the comparison by the pitch comparison unit.

According to an embodiment of the present invention, there is provided a program that causes a computer to acquire an input sound, calculate the pitch of the input sound, compare a plurality of reference pitches with the calculated pitch of the input sound in a predetermined evaluation section, and calculate an evaluation value for the input sound based on the result of the comparison.

According to one embodiment of the present invention, sound can be evaluated without depending on a melody.

FIG. 1 is a block diagram showing the configuration of the evaluation apparatus 1 in the first embodiment of the present invention.
FIG. 2 is a block diagram showing the configuration of the evaluation function in the first embodiment of the present invention.
FIGS. 3(A) to 3(C) are diagrams explaining the evaluation method in the first embodiment of the present invention.
FIG. 4 is a diagram explaining the evaluation method in the second embodiment of the present invention.
FIGS. 5(A) and 5(B) are diagrams explaining the evaluation method in the third embodiment of the present invention.
FIGS. 6(A) and 6(B) are diagrams explaining the evaluation method in the fourth embodiment of the present invention.
FIG. 7 is a block diagram showing the configuration of the evaluation function in the fifth embodiment of the present invention.
FIG. 8 is a diagram explaining the evaluation method in the fifth embodiment of the present invention.
FIG. 9 is a block diagram showing the configuration of the evaluation function in the sixth embodiment of the present invention.
FIG. 10 is a diagram explaining the evaluation method in the sixth embodiment of the present invention.
FIG. 11 is a block diagram showing the configuration of the evaluation function in the seventh embodiment of the present invention.
FIG. 12 is a block diagram showing the configuration of the evaluation function in another embodiment of the present invention.
FIG. 13 is a flowchart showing the evaluation method in another embodiment of the present invention.

Hereinafter, an evaluation apparatus according to an embodiment of the present invention will be described in detail with reference to the drawings. The following embodiments are examples of the embodiments of the present invention, and the present invention is not limited to these embodiments.

<First Embodiment>
The evaluation apparatus according to the first embodiment of the present invention will be described in detail with reference to the drawings. The evaluation apparatus according to the first embodiment evaluates the singing voice of a user who sings (hereinafter sometimes called a singer). It evaluates the singing by comparing a plurality of pitches defined at predetermined intervals (hereinafter sometimes referred to as reference pitches) with the pitch of the singer's singing voice (hereinafter sometimes referred to as the singing pitch). This allows an evaluation that does not depend on the melody. Furthermore, in this example, even when the singer intentionally shifts the singing pitch away from the reference pitch by using a singing technique, the evaluation can be prevented from being lowered. Such an evaluation apparatus is described below.

[Hardware]
FIG. 1 is a block diagram showing the configuration of the evaluation apparatus 1 in the first embodiment of the present invention. The evaluation apparatus 1 is, for example, a karaoke device; it may instead be a portable device such as a smartphone. The evaluation apparatus 1 includes a control unit 11, a storage unit 13, an operation unit 15, a display unit 17, a communication unit 19, and a signal processing unit 21, which are connected to one another via a bus. A microphone 23 and a speaker 25 are connected to the signal processing unit 21.

The control unit 11 includes an arithmetic processing circuit such as a CPU. The control unit 11 realizes various functions of the evaluation device 1 by causing the CPU to execute a control program stored in the storage unit 13. These functions include a singing voice evaluation function. The storage unit 13 is a storage device such as a nonvolatile memory or a hard disk, and stores the control program for realizing the evaluation function. The control program may be provided stored on a computer-readable recording medium such as a magnetic recording medium, an optical recording medium, a magneto-optical recording medium, or a semiconductor memory; in that case, the evaluation device 1 need only include a device that reads the recording medium. The control program may also be downloaded via a network.

The storage unit 13 also stores music data, singing voice data, and evaluation reference information as data relating to songs. The music data includes data related to karaoke songs, for example guide melody data, accompaniment data, and lyrics data. The guide melody data indicates the melody of the song, and the accompaniment data indicates the accompaniment of the song. The guide melody data and the accompaniment data may be expressed in MIDI format. The lyrics data consists of data for displaying the lyrics of the song and data indicating the timing at which the color of the displayed lyrics telop (on-screen caption) changes. The singing voice data indicates the singing voice input by the singer through the microphone 23. In this example, the singing voice data is buffered in the storage unit 13 until the singing voice is evaluated by the evaluation function.

The evaluation reference information is information used by the evaluation function as a reference for evaluating the singing voice. For example, it includes information specifying changes in the singing pitch (singing pitch waveforms) used to detect singing techniques. Singing techniques such as vibrato, kobushi, shakuri, and fall exhibit, for example, the following singing pitch waveforms.
(1) Vibrato: The pitch oscillates finely (with a period equal to or shorter than a predetermined period). A specific example of vibrato detection is disclosed in Japanese Patent Application Laid-Open No. 2005-107087.
(2) Kobushi: The pitch rises temporarily (within a predetermined time) and then returns to the original pitch. A specific example of kobushi detection is disclosed in Japanese Patent Application Laid-Open No. 2008-268370.
(3) Shakuri: The pitch rises over a predetermined time and then stabilizes. A specific example of shakuri detection is disclosed in Japanese Patent Application Laid-Open No. 2005-107334.
(4) Fall: The pitch falls over a predetermined time, and then the singing stops. A specific example of fall detection is disclosed in Japanese Patent Application Laid-Open No. 2008-225115.
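The actual detectors for these techniques are described in the publications cited above, not here. As a rough illustration only, the following Python sketch tests whether a pitch segment looks like vibrato, assuming the pitch is already expressed in cents per analysis frame. The function name `looks_like_vibrato` and every threshold (a 4 to 8 Hz oscillation rate, a 20-cent depth) are hypothetical choices, not values taken from the patent.

```python
def looks_like_vibrato(pitch_cents, frame_rate, min_rate_hz=4.0,
                       max_rate_hz=8.0, min_depth_cents=20.0):
    """Heuristic vibrato test for one pitch segment.

    pitch_cents: singing pitch per analysis frame, in cents.
    frame_rate: analysis frames per second.
    Thresholds are illustrative assumptions, not values from the patent.
    """
    if len(pitch_cents) < 3:
        return False
    mean = sum(pitch_cents) / len(pitch_cents)
    dev = [p - mean for p in pitch_cents]  # oscillation around the segment mean
    # Count sign changes of the deviation; a full vibrato cycle has two.
    crossings = sum(1 for a, b in zip(dev, dev[1:]) if (a < 0) != (b < 0))
    duration_s = len(pitch_cents) / frame_rate
    rate_hz = crossings / (2.0 * duration_s)
    depth_cents = (max(dev) - min(dev)) / 2.0  # half the peak-to-peak swing
    return (min_rate_hz <= rate_hz <= max_rate_hz
            and depth_cents >= min_depth_cents)
```

For example, a segment that alternates between +50 and -50 cents every 9 frames at 100 frames per second oscillates at about 5.5 Hz and is classified as vibrato, whereas a steady or slowly rising pitch is not.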

The evaluation reference information also defines the reference pitches to be compared with the singing pitch. The reference pitches consist of a plurality of pitches. In this example, they include 440 Hz and are spaced at 100-cent intervals relative to 440 Hz. When the reference pitch of a song deviates from 440 Hz, for example when it is 442 Hz, the reference pitches need only be defined at 100-cent intervals relative to 442 Hz. The amount of this shift may be stored, for example, as tuning information in the music data.
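A minimal sketch of the reference-pitch grid described above, assuming pitches are handled in cents (1200 cents per octave). The helper names `hz_to_cents` and `nearest_reference` are hypothetical; the `ref_hz` parameter plays the role of the tuning information (for example 442 Hz instead of 440 Hz), and `interval_cents` allows grids other than 100 cents.

```python
import math


def hz_to_cents(freq_hz, ref_hz=440.0):
    """Pitch of freq_hz in cents relative to ref_hz (1200 cents per octave)."""
    return 1200.0 * math.log2(freq_hz / ref_hz)


def nearest_reference(freq_hz, ref_hz=440.0, interval_cents=100.0):
    """Return (reference_hz, deviation_cents) for the closest grid pitch.

    The grid contains ref_hz and every pitch an integer number of
    interval_cents above or below it (100 cents = one equal-tempered semitone).
    """
    cents = hz_to_cents(freq_hz, ref_hz)
    steps = round(cents / interval_cents)
    reference_hz = ref_hz * 2.0 ** (steps * interval_cents / 1200.0)
    return reference_hz, cents - steps * interval_cents
```

With the default 440 Hz grid, `nearest_reference(442.0)` reports a deviation of about 7.85 cents from 440 Hz; passing `ref_hz=442.0` for a song tuned to 442 Hz reports no deviation.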

The operation unit 15 consists of devices such as operation buttons, a keyboard, and a mouse provided on an operation panel or a remote controller, and outputs a signal corresponding to an input operation to the control unit 11. The display unit 17 is a display device such as a liquid crystal display or an organic EL display, and displays screens under the control of the control unit 11. The operation unit 15 and the display unit 17 may be integrated into a touch panel. The communication unit 19 connects to a communication line such as the Internet under the control of the control unit 11 and transmits and receives information to and from external devices such as a server. The function of the storage unit 13 may be realized by an external device that can communicate via the communication unit 19.

The signal processing unit 21 includes a sound source that generates an audio signal from a MIDI-format signal, an A/D converter, a D/A converter, and the like. The singing voice is converted into an electric signal by the microphone 23, input to the signal processing unit 21, A/D-converted there, and output to the control unit 11. As described above, the singing voice is buffered in the storage unit 13 as singing voice data. The accompaniment data is read out by the control unit 11, D/A-converted by the signal processing unit 21, and output from the speaker 25 as the accompaniment of the song. A guide melody may also be output from the speaker 25 at this time.

[Evaluation function]
An evaluation function realized by the control unit 11 of the evaluation apparatus 1 executing a control program will be described. A part or all of the configuration for realizing the evaluation function described below may be realized by hardware.

FIG. 2 is a block diagram showing the configuration of the evaluation function in the first embodiment of the present invention. The evaluation function 100 includes an accompaniment output unit 101, an input sound acquisition unit 103, a pitch calculation unit 105, a specific section detection unit 107, a pitch comparison unit 109, and an evaluation unit 111. The accompaniment output unit 101 reads the accompaniment data corresponding to the song designated by the singer and causes the speaker 25 to output the accompaniment sound via the signal processing unit 21. The input sound acquisition unit 103 acquires singing voice data indicating the singing voice input from the microphone 23. In this example, sound input to the microphone 23 while the accompaniment sound is being output is treated as the singing voice to be evaluated. The input sound acquisition unit 103 acquires the singing voice data buffered in the storage unit 13; it may do so after the singing voice data of the entire song has been stored in the storage unit 13, or it may acquire the data directly from the signal processing unit 21. The input sound acquisition unit 103 is also not limited to acquiring singing voice data for sound input to the microphone 23; it may acquire, via the communication unit 19 and a network, singing voice data indicating sound input to an external device.

The pitch calculation unit 105 analyzes the singing voice data acquired by the input sound acquisition unit 103 and calculates how the singing pitch (frequency) changes over time, that is, the singing pitch waveform. Specifically, the singing pitch waveform is calculated by a known method, such as one using the zero crossings of the singing voice waveform or one using an FFT (Fast Fourier Transform). The specific section detection unit 107 analyzes the singing pitch waveform and detects, within the singing voice input period, sections (specific sections) containing the singing techniques defined by the evaluation reference information. Each detected specific section may be associated with the type of singing technique it contains.
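The zero-crossing approach mentioned above can be sketched in Python as follows. This is an illustrative assumption rather than the patent's implementation: it measures the spacing of upward zero crossings in each frame, so it only works on clean, nearly periodic input; practical pitch trackers usually add filtering or use autocorrelation- or FFT-based analysis. The function names, frame length, and hop size are hypothetical.

```python
def zero_crossing_pitch(samples, sample_rate):
    """Estimate F0 of one frame from the spacing of upward zero crossings.

    Returns None when fewer than two upward crossings are found (e.g. silence).
    """
    ups = [i for i in range(1, len(samples))
           if samples[i - 1] < 0.0 <= samples[i]]
    if len(ups) < 2:
        return None
    # (number of whole periods) / (time spanned by them)
    return (len(ups) - 1) * sample_rate / (ups[-1] - ups[0])


def pitch_waveform(samples, sample_rate, frame_len=2048, hop=512):
    """Singing pitch waveform: one F0 estimate (Hz or None) per frame."""
    return [zero_crossing_pitch(samples[start:start + frame_len], sample_rate)
            for start in range(0, len(samples) - frame_len + 1, hop)]
```

For instance, a square wave with a 200-sample period at 44.1 kHz yields 220.5 Hz in every frame. Real singing would need pre-filtering, because harmonics and noise add spurious crossings.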

The pitch comparison unit 109 sets, as the evaluation section, the singing voice input period excluding the specific sections detected by the specific section detection unit 107, and compares the singing pitch waveform in the evaluation section with the reference pitches. In this example, the comparison result is a degree of mismatch between the singing pitch waveform and the reference pitches. Because there are multiple reference pitches at 100-cent intervals, the reference pitch closest to the singing pitch is selected as the comparison target. The greater the difference between the singing pitch waveform and the reference pitch, the higher the degree of mismatch. For example, the degree of mismatch is calculated by summing, over the evaluation section, the difference between the singing pitch and the reference pitch at each sample of the singing pitch waveform, and dividing the sum by the number of samples in the evaluation section.
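The mismatch calculation just described (nearest reference pitch per sample, summed difference divided by the sample count) might look like the sketch below. It assumes the singing pitch is given in cents relative to the tuning reference, so reference pitches lie at multiples of 100 cents, and it uses the absolute difference, which the text implies but does not state. `mismatch_degree` and its arguments are hypothetical names.

```python
def mismatch_degree(pitch_cents, in_evaluation, interval_cents=100.0):
    """Mean absolute distance (cents) from each sample to its nearest reference.

    pitch_cents: singing pitch per sample, in cents relative to the tuning
        reference, so reference pitches lie at multiples of interval_cents.
        None marks a sample without a detected pitch.
    in_evaluation: same-length booleans; False for samples in specific sections.
    """
    total = 0.0
    count = 0
    for pitch, keep in zip(pitch_cents, in_evaluation):
        if not keep or pitch is None:
            continue
        nearest = round(pitch / interval_cents) * interval_cents
        total += abs(pitch - nearest)
        count += 1
    return total / count if count else 0.0
```

Marking specific-section samples as `False` in `in_evaluation` keeps them out of both the sum and the sample count, mirroring the exclusion of specific sections from the evaluation section.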

Alternatively, the pitch range extending a predetermined width from each reference pitch toward the sharp (higher) side and the flat (lower) side may be treated as a pass area and everything else as a reject area, and the degree of mismatch may be calculated by dividing the number of samples in the reject area by the number of samples in the evaluation section. The width of the pass area may differ between the sharp side and the flat side. Furthermore, different weights may be applied to the sample counts on the sharp side and on the flat side of the reject area. For example, the count of samples shifted to the sharp side may be weighted so that it has a greater influence on the degree of mismatch. This makes the evaluation stricter for singing that is shifted to the sharp side, which is considered more conspicuous to the ear. As another alternative, the degree of mismatch may be calculated by summing the difference between the singing pitch and the reference pitch over only those samples of the singing pitch waveform that fall in the reject area within the evaluation section, and dividing the sum by the number of samples in the evaluation section.
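A sketch of the pass-area variant, assuming the same cents representation and that only evaluation-section samples are passed in. The widths and weights below are hypothetical values chosen to show an asymmetric pass area and a heavier penalty on the sharp side; the function name is likewise illustrative.

```python
def reject_area_mismatch(pitch_cents, sharp_width=20.0, flat_width=30.0,
                         sharp_weight=1.5, flat_weight=1.0,
                         interval_cents=100.0):
    """Weighted fraction of samples lying outside the pass area.

    pitch_cents: evaluation-section samples, in cents relative to the tuning
    reference. The pass area spans flat_width below to sharp_width above the
    nearest reference pitch; both widths must be below interval_cents / 2.
    """
    if not pitch_cents:
        return 0.0
    weighted = 0.0
    for pitch in pitch_cents:
        deviation = pitch - round(pitch / interval_cents) * interval_cents
        if deviation > sharp_width:      # reject area, sharp side
            weighted += sharp_weight
        elif deviation < -flat_width:    # reject area, flat side
            weighted += flat_weight
    return weighted / len(pitch_cents)
```

With these example settings, a sample 25 cents sharp is penalized while one 25 cents flat is not, which reflects the idea of judging sharp singing more strictly.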

In this way, singing can be evaluated independently of the melody. In this example, the singing pitch is compared with the reference pitches not over the entire singing voice input period but only in the section excluding the specific sections. This prevents an intentional pitch shift produced by a singing technique in a specific section from increasing the degree of mismatch. The evaluation section may be further divided into a plurality of sections and the degree of mismatch calculated for each; these sections may partially overlap one another.
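Dividing the evaluation section into partially overlapping sub-sections can be sketched as a sliding window over evaluation-section samples (pitch in cents relative to the tuning reference, as an assumption). Sub-sections overlap whenever `hop` is smaller than `window`; both parameters and the function name are illustrative.

```python
def windowed_mismatch(pitch_cents, window, hop, interval_cents=100.0):
    """Mismatch degree for each sub-section of `window` samples.

    Sub-sections start every `hop` samples, so they overlap when hop < window.
    """
    def distance(pitch):
        return abs(pitch - round(pitch / interval_cents) * interval_cents)

    return [sum(distance(p) for p in pitch_cents[start:start + window]) / window
            for start in range(0, len(pitch_cents) - window + 1, hop)]
```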

The evaluation unit 111 calculates an evaluation value, which serves as an index for evaluating the singing voice, based on the comparison result of the pitch comparison unit 109. In this example, the higher the degree of mismatch calculated by the pitch comparison unit 109, the lower the calculated evaluation value, meaning the singing voice is rated worse. The evaluation unit 111 need not base the evaluation value on the degree of mismatch alone; it may also take other elements into account, such as singing techniques and other parameters that can be extracted from the singing voice data. When singing techniques are reflected in the evaluation value, the techniques corresponding to the specific sections detected by the specific section detection unit 107 may be used. Another parameter is, for example, volume change; using it allows the inflection of the singing to be included in the evaluation. The evaluation result of the evaluation unit 111 may be presented on the display unit 17. As described above, executing the processing of each component of the evaluation function 100 in sequence provides an evaluation method that runs from the input of the singing voice data to the calculation of the evaluation value.

[Example of singing evaluation]
The singing voice evaluation method of the evaluation function 100 described above is explained below using the specific example of a singing pitch shown in FIG. 3.

FIG. 3 is a diagram for explaining the evaluation method in the first embodiment of the present invention. The waveform shown in FIG. 3(A) is an example of the singing pitch waveform in part of a song. The vertical axis represents pitch, and the broken lines spaced every 100 cents in the pitch direction indicate the reference pitches. The horizontal axis represents the passage of time. The specific section detection unit 107 detects, from the singing pitch waveform, the specific sections in which singing techniques occur. In FIG. 3(A), section S is a specific section corresponding to “shakuri”, section F to “fall”, section K to “kobushi”, and section V to “vibrato”. The evaluation section is therefore everything other than the specific sections S, F, K, and V.

The degree of mismatch calculated by the pitch comparison unit 109 corresponds to the sum of the differences between the singing pitch and the reference pitch at each sample, that is, the area of the hatched portion shown in FIG. 3. In section V, the hatched area shown in FIG. 3(C) becomes large because of the characteristic pitch oscillation of vibrato. Consequently, if section V were included in the comparison between the singing pitch and the reference pitch, a large degree of mismatch would be calculated, and the singing could be rated lower despite being expressive singing that uses the vibrato technique. Even in such a case, comparing the singing pitch with the reference pitch only in the evaluation section that excludes the specific sections, including section V, as the evaluation device 1 of the present embodiment does, prevents the evaluation from being lowered because a singing technique was used. Vibrato is used as the example here, but the same applies to “shakuri”, “kobushi”, and “fall”.

<Second Embodiment>
FIG. 4 is a diagram for explaining the evaluation method according to the second embodiment of the present invention. In the second embodiment, the pitch comparison unit 109 first removes the specific sections, generates a singing pitch waveform consisting only of the evaluation section, and then compares the singing pitch with the reference pitches. This method is efficient when the singing voice data of an entire song is analyzed in one batch.
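A minimal sketch of this preprocessing step: remove the specific sections first, then run any of the comparisons on what remains. Specific sections are assumed here to be half-open sample-index ranges, which is an illustrative representation not specified in the text; the function name is hypothetical.

```python
def extract_evaluation_section(pitch_cents, specific_sections):
    """Return the samples outside all specific sections, in original order.

    specific_sections: iterable of half-open (start, end) sample-index ranges.
    """
    excluded = [False] * len(pitch_cents)
    for start, end in specific_sections:
        for i in range(max(0, start), min(end, len(pitch_cents))):
            excluded[i] = True
    return [p for p, skip in zip(pitch_cents, excluded) if not skip]
```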

<Third Embodiment>
The third embodiment differs from the embodiments above in how the degree of mismatch used for singing evaluation is calculated. In the embodiments described above, the pitch comparison unit 109 calculates the degree of mismatch from the sum of the differences between the singing pitch and the reference pitch at each sample. In the third embodiment, the pitch axis is divided into ranges of a predetermined width (for example, 2 cents), and the frequency of the singing pitch in each range, that is, the number of samples whose singing pitch falls within that range, is calculated. As in the embodiments above, only singing pitches in the evaluation section are used to calculate these frequencies.

FIG. 5 is a diagram for explaining the evaluation method according to the third embodiment of the present invention. The vertical axis represents pitch (each pitch range), and the broken lines spaced every 100 cents in the pitch direction indicate the reference pitches. The horizontal axis shows the passage of time. FIG. 5(A) shows a frequency distribution of the singing pitch that yields a high singing evaluation, and FIG. 5(B) shows one that yields a low singing evaluation.

Comparing FIG. 5(A) with FIG. 5(B), the peak positions are almost the same, but the frequency distribution in FIG. 5(A) has higher peaks (a larger difference between peaks and dips) and narrower peak widths. This indicates that the singing pitch is stable near the reference pitches, so the calculated degree of mismatch is low. In contrast, FIG. 5(B) shows a large spread of the singing pitch around the reference pitches, so the degree of mismatch is high. The degree of mismatch may also be increased the further a peak position is shifted from the reference pitch. In this way, the pitch comparison unit 109 according to the third embodiment may calculate the degree of mismatch based on the frequency distribution of the singing pitch over the pitch ranges.
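One possible way to turn the frequency distribution into a mismatch value is sketched below, under stated assumptions: pitches are in cents relative to the tuning reference, bins are 2 cents wide as in the example above, and the mismatch is the sample-weighted mean distance from each bin centre to the nearest reference pitch. That formula is an illustration, not the patent's; it does, however, penalize both a wider peak and a peak shifted away from the reference pitch. Function names are hypothetical.

```python
from collections import Counter


def pitch_histogram(pitch_cents, bin_cents=2.0):
    """Frequency distribution: sample count per pitch range of bin_cents."""
    return Counter(int(p // bin_cents) for p in pitch_cents)


def histogram_mismatch(pitch_cents, bin_cents=2.0, interval_cents=100.0):
    """Sample-weighted mean distance from bin centres to the nearest reference.

    A tall, narrow peak on a reference pitch gives a small value; a wider
    spread or a peak offset from the reference pitch gives a larger one.
    """
    hist = pitch_histogram(pitch_cents, bin_cents)
    total = sum(hist.values())
    if total == 0:
        return 0.0
    weighted = 0.0
    for bin_index, count in hist.items():
        centre = (bin_index + 0.5) * bin_cents
        nearest = round(centre / interval_cents) * interval_cents
        weighted += count * abs(centre - nearest)
    return weighted / total
```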

<Fourth Embodiment>
The fourth embodiment modifies the evaluation method of the third embodiment so that, even when the peaks of the frequency distribution are offset from the reference pitches, the degree of mismatch is not increased as long as a predetermined condition is satisfied.

FIG. 6 is a diagram for explaining the evaluation method in the fourth embodiment of the present invention. FIG. 6(A) differs from FIG. 5(A) in that the peak positions (arrows) of the frequency distribution are lower than the reference pitches. However, all of the peaks are lower than the reference pitches overall. In such a case, the pitch comparison unit 109 adjusts the relative position of the frequency distribution and the reference pitches in the pitch direction. The adjustment is made so that the sum, over all peaks of the frequency distribution, of the pitch difference between each peak and its nearest reference pitch is minimized. For example, by lowering the reference pitches from the state of FIG. 6(A) to that of FIG. 6(B), the peaks and the reference pitches can be brought into substantial agreement. The degree of mismatch is calculated after this adjustment.
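The adjustment amounts to shifting the whole reference grid. A brute-force sketch, assuming peak positions in cents have already been found and that a 1-cent search step is fine enough (both hypothetical choices, as is the function name):

```python
def best_reference_offset(peak_cents, interval_cents=100.0, step_cents=1.0):
    """Grid shift (cents) minimizing the summed peak-to-nearest-reference distance.

    Candidate shifts cover [-interval/2, interval/2) in step_cents increments.
    """
    def cost(offset):
        total = 0.0
        for peak in peak_cents:
            shifted = peak - offset
            total += abs(shifted - round(shifted / interval_cents) * interval_cents)
        return total

    n = int(round(interval_cents / step_cents))
    candidates = [-interval_cents / 2.0 + i * step_cents for i in range(n)]
    return min(candidates, key=cost)
```

For peaks at -30, 70, and 170 cents the function returns -30.0, meaning the reference pitches are lowered by 30 cents, which corresponds to the adjustment from FIG. 6(A) to FIG. 6(B); the degree of mismatch would then be computed on pitches with that offset subtracted.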

Adjusting the relationship between the frequency distribution and the reference pitches in this way means that, even if the singing voice deviates from absolute pitch, the singing can be rated highly as long as the singing pitch moves correctly in multiples of 100 cents. Likewise, when the reference pitch of a song deviates from 440 Hz, singing along with an accompaniment based on the shifted pitch can still be rated highly. With the evaluation method of the third embodiment, the frequency distribution of FIG. 6(A) would yield a low singing evaluation because of the pitch differences between the peaks and the reference pitches. By adjusting the relationship to that of FIG. 6(B), as in the fourth embodiment, the singing evaluation can be prevented from being lowered.

<Fifth Embodiment>
In the embodiments above, the evaluation section was the singing voice input period excluding the specific sections. In the fifth embodiment, the singing pitch waveform is instead changed into a different waveform, at least within the specific sections, and then evaluated.

FIG. 7 is a block diagram showing the configuration of the evaluation function in the fifth embodiment of the present invention. Only the components of the evaluation function 100A that differ from the above embodiments (the pitch changing unit 113 and the pitch comparison unit 109A) are described. The pitch changing unit 113 changes the singing pitch waveform in each specific section detected by the specific section detection unit 107 according to a change rule corresponding to the singing technique. A specific example is described with reference to FIG. 8.

FIG. 8 is a diagram for explaining the evaluation method in the fifth embodiment of the present invention. In FIG. 8, the singing pitch waveform shown by the broken line is changed within the specific sections into the singing pitch waveform shown by the solid line. As this example shows, the singing pitch waveform is changed so as to reduce the portions in which the singing pitch deviates from the reference pitch. For example, for “shakuri” and “fall”, the waveform is changed so that the pitch changes abruptly. For “kobushi”, the waveform is changed so as to smooth out the sharp rise and fall of the singing pitch. For “vibrato”, the waveform is changed so as to smooth out the periodic fluctuation of the singing pitch (for example, by replacing each period with its average). The change may be applied not only within the specific section but also to the portions around it. Alternatively, within a specific section, only the portion in which the singing technique changes the singing pitch waveform may be removed, so that only the portion in which the waveform changes little, that is, is nearly flat, is extracted and evaluated.
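As one example of a change rule, the vibrato case (smoothing to an average per period) might be sketched as follows. The section boundaries and the vibrato period in samples are assumed to be known from the detection step; the function name and argument layout are hypothetical, and the other techniques would need their own rules.

```python
def smooth_vibrato(pitch_cents, start, end, period_samples):
    """Replace each vibrato period in [start, end) with its mean pitch.

    Returns a new list; samples outside the section are left unchanged.
    """
    out = list(pitch_cents)
    for s in range(start, end, period_samples):
        e = min(s + period_samples, end)
        mean = sum(pitch_cents[s:e]) / (e - s)
        out[s:e] = [mean] * (e - s)
    return out
```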

The pitch comparison unit 109A compares the singing pitch changed by the pitch changing unit 113 with the reference pitches. Unlike the above embodiments, the comparison covers the entire singing voice input period; that is, the specific sections are not excluded. Instead, the singing pitch is changed in the specific sections in which a singing technique was detected, which shortens the period during which the singing pitch deviates from the reference pitch. Therefore, even without excluding the specific sections, the degree of mismatch is unlikely to increase. Singing can be evaluated in this way as well. The singing pitch need not be changed in every specific section; some specific sections may instead be excluded from evaluation as in the above embodiments.

<Sixth Embodiment>
In the embodiments above, the reference pitches compared with the singing pitch did not change over the whole song. In the sixth embodiment, the reference pitches change depending on the position within the song.

FIG. 9 is a block diagram showing the configuration of the evaluation function in the sixth embodiment of the present invention. Only the components of the evaluation function 100B that differ from the above embodiments (the chord detection unit 115 and the pitch comparison unit 109B) are described. The chord detection unit 115 analyzes the accompaniment sound output by the accompaniment output unit 101 and detects the chord in each period of the song. The pitch comparison unit 109B sets the reference pitches for each period based on the chord detected by the chord detection unit 115 and compares them with the singing pitch. In this example, the chord tones are set as the reference pitches. A specific example is described with reference to FIG. 10.

FIG. 10 is a diagram for explaining the evaluation method according to the sixth embodiment of the present invention. In this example, the chord in the first half of the period is detected as “Dm7” and the chord in the second half as “G7”. Therefore, in the first half (“Dm7”), the reference pitches are set to “D”, “F”, “A”, and “C”, and in the second half (“G7”), they are set to “G”, “B”, “D”, and “F”. In the evaluation section, the pitch comparison unit 109B compares the singing pitch with the reference pitches set according to the chord and calculates the degree of mismatch.
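A sketch of deriving chord-tone reference pitches, assuming chords have already been detected as symbol strings such as "Dm7" and that the singing pitch is expressed in cents with C = 0 (for example, MIDI note number times 100). Only a few chord qualities are tabulated; the tables and function names are illustrative assumptions, and detecting chords from audio is not shown.

```python
PITCH_CLASSES = {"C": 0, "C#": 1, "Db": 1, "D": 2, "D#": 3, "Eb": 3, "E": 4,
                 "F": 5, "F#": 6, "Gb": 6, "G": 7, "G#": 8, "Ab": 8,
                 "A": 9, "A#": 10, "Bb": 10, "B": 11}
# Semitone offsets of chord tones from the root, per chord quality suffix.
CHORD_INTERVALS = {"": (0, 4, 7), "m": (0, 3, 7), "7": (0, 4, 7, 10),
                   "m7": (0, 3, 7, 10), "maj7": (0, 4, 7, 11)}


def chord_tones(chord):
    """Pitch classes (0 = C) of a chord symbol such as 'Dm7' or 'G7'."""
    root = chord[:2] if len(chord) > 1 and chord[1] in "#b" else chord[:1]
    base = PITCH_CLASSES[root]
    return {(base + iv) % 12 for iv in CHORD_INTERVALS[chord[len(root):]]}


def chord_mismatch(pitch_cents_from_c, chord):
    """Distance in cents from a pitch to the nearest chord tone in any octave."""
    pc = pitch_cents_from_c % 1200.0
    best = 1200.0
    for tone in chord_tones(chord):
        d = abs(pc - tone * 100.0)
        best = min(best, d, 1200.0 - d)  # wrap around the octave
    return best
```

With these tables, "Dm7" yields the pitch classes of D, F, A, and C, and "G7" yields G, B, D, and F, matching the example above.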

Thus, the reference pitches need not be fixed over the entire song and may change within the evaluation section. When the reference pitches are determined from the accompaniment sound as described above, they may be chord tones or scale tones. Alternatively, the key of the song may be detected from the accompaniment sound and the reference pitches determined based on that key. Whether the reference pitches are fixed over the entire song or change within the evaluation section, they may also be determined in advance. For example, reference pitches may be determined in advance based on a key preset for the song; in this case, a subset of the pitches defined at 100-cent intervals is used. In the above embodiments, the reference pitches are defined at 100-cent (semitone) intervals, but they may be defined at other intervals that are equal on a logarithmic frequency scale, for example 50-cent or 200-cent intervals.

<Seventh Embodiment>
The seventh embodiment describes an evaluation function 100C that also reflects in the singing evaluation the timing at which the singer utters each note of the melody.

FIG. 11 is a block diagram showing the configuration of the evaluation function in the seventh embodiment of the present invention. Only the components of the evaluation function 100C that differ from the above embodiments (the timing evaluation unit 117 and the evaluation unit 111C) are described. The timing evaluation unit 117 analyzes the accompaniment sound and detects the beats in each period of the song, that is, the beat timing. The timing evaluation unit 117 also detects note-on timings (the timings at which the singer starts each note of the melody) based on changes in the volume of the singing voice, and compares the detected beat timings with the note-on timings. The evaluation unit 111C reflects this comparison result in the evaluation value: the more closely the note-on timings match the beat timings, the higher the evaluation value.
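A minimal sketch of the timing comparison, assuming per-frame volume in dB for the singing voice and beat times in seconds already detected from the accompaniment. Treating a rise of at least 6 dB between consecutive frames as a note-on, and accepting onsets within 50 ms of a beat, are hypothetical choices for illustration, as are the function names.

```python
def note_onsets(volume_db, frame_rate, rise_db=6.0):
    """Note-on times (s): frames whose volume rises by rise_db or more."""
    return [i / frame_rate for i in range(1, len(volume_db))
            if volume_db[i] - volume_db[i - 1] >= rise_db]


def timing_score(onsets, beats, tolerance_s=0.05):
    """Fraction of note-ons lying within tolerance_s of some beat."""
    if not onsets:
        return 0.0
    hits = sum(1 for t in onsets
               if any(abs(t - b) <= tolerance_s for b in beats))
    return hits / len(onsets)
```

A higher `timing_score` would then raise the evaluation value, in line with the behavior of the evaluation unit 111C described above.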

<Eighth Embodiment>
In the embodiments above, the specific section detection unit 107 detects, as specific sections, sections in which the singing pitch contains a waveform corresponding to a singing technique. In the eighth embodiment, specific sections to be excluded from the evaluation section are also detected on other grounds. For example, the specific sections may be classified into sections corresponding to singing techniques that earn additional points (technique sections) and sections that do not (illegal sections). An illegal section is a section detected as a case where the sound input as the singing voice should not be evaluated as singing (for example, a sound other than singing, such as accompaniment sound picked up by the microphone 23), or as a case of singing performed in a special way to obtain a high evaluation.

An input singing voice is detected as an illegal section when, for example, the same singing pitch continues for a predetermined time or longer. Specifically, the time series of the singing pitch is differentiated, the positive and negative values of the derivative are integrated separately, and the section is detected when the result shows no undulation of the singing pitch above a certain level. It is relatively easy to keep singing at one pitch and match a reference pitch, so continuing to sing at the same pitch would earn a high evaluation; such singing should therefore be detected as illegal. On the other hand, it is not easy to sing with a varying pitch that is unrelated to the melody of the song. Therefore, if the singing pitch undulates above a certain level, it is presumed that the singer is singing the melody of the song.
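The flat-pitch test described above (differentiate the pitch series, then integrate the positive and negative parts separately) might be sketched as follows. The pitch track is assumed to be in cents at a fixed frame rate, the derivative is approximated by first differences, and `is_flat_pitch`, `flat_sections`, `window`, and `level` are hypothetical names and thresholds rather than values from the source.

```python
from typing import List


def is_flat_pitch(pitch_cents: List[float], level: float) -> bool:
    """True if the pitch shows no undulation above `level` cents.

    The positive and negative parts of the first difference are summed
    separately; the section is flat when neither total exceeds `level`.
    """
    rises = 0.0  # integral of the positive part of the derivative
    falls = 0.0  # integral of the negative part, stored as a magnitude
    for prev, cur in zip(pitch_cents, pitch_cents[1:]):
        d = cur - prev
        if d > 0:
            rises += d
        else:
            falls -= d
    return rises <= level and falls <= level


def flat_sections(pitch_cents: List[float], window: int,
                  level: float) -> List[int]:
    """Start indices of `window`-frame windows judged flat (illegal candidates)."""
    return [s for s in range(len(pitch_cents) - window + 1)
            if is_flat_pitch(pitch_cents[s:s + window], level)]


if __name__ == "__main__":
    print(is_flat_pitch([6900.0, 6901.0, 6900.0, 6899.0, 6900.0], level=5.0))
    print(is_flat_pitch([6900.0, 7000.0, 7200.0, 7000.0, 6900.0], level=5.0))
```

Summing rises and falls separately means small jitter around a held note (which cancels out in a plain sum) still stays below the threshold, while a real melodic line accumulates large totals on both sides.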

A section is also detected as an illegal section when the accompaniment sound leaks into the input sound and the input is highly likely not to be a singing voice. When the sound input to the microphone 23 is highly likely to contain the accompaniment sound, sections whose singing pitch waveform shows any of the following are detected: the singing pitch does not continue for a predetermined time; the singing pitch changes too rapidly; or the singing pitch changes discontinuously and greatly within a short period (a jump in pitch).
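One way to flag the leakage patterns listed above is sketched below. It assumes unvoiced frames (no detectable pitch) are marked `None` and covers two criteria: voiced runs that are too short, and frame-to-frame changes that are too large (a rapid change or a jump). The function `suspicious_frames` and the thresholds `min_run` and `max_step` are hypothetical.

```python
from typing import List, Optional


def suspicious_frames(pitch: List[Optional[float]], min_run: int,
                      max_step: float) -> List[int]:
    """Return indices of frames that look like accompaniment leakage."""
    flagged = set()
    # Criterion 1: the pitch does not continue for min_run frames.
    run_start = None
    for i, p in enumerate(pitch + [None]):  # trailing None closes the last run
        if p is not None and run_start is None:
            run_start = i
        elif p is None and run_start is not None:
            if i - run_start < min_run:
                flagged.update(range(run_start, i))
            run_start = None
    # Criterion 2: the pitch changes by more than max_step cents in one frame.
    for i in range(1, len(pitch)):
        prev, cur = pitch[i - 1], pitch[i]
        if prev is not None and cur is not None and abs(cur - prev) > max_step:
            flagged.add(i)
    return sorted(flagged)


if __name__ == "__main__":
    track = [6900.0, 6905.0, 6910.0, None, 7000.0, None, 6900.0, 7600.0, 7605.0]
    print(suspicious_frames(track, min_run=2, max_step=300.0))
```

In the example, frame 4 is an isolated one-frame pitch and frame 7 follows a 700-cent jump; both would be candidates for an illegal section.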

An illegal section may also be detected from parameters other than the singing pitch extracted from the singing voice data. For example, when the volume stays the same (or hardly changes) for a predetermined time or longer, when a note (a single melody sound) identified from volume changes lasts for a predetermined time (for example, 10 seconds) or longer, or when the formants differ significantly from those obtained in normal singing (for example, when the correlation between peak distributions falls below a certain value), the input is highly likely to contain sounds other than normal singing (for example, accompaniment sounds or instrument sounds). A section satisfying such a condition is therefore detected as an illegal section.
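The three non-pitch conditions could be combined roughly as in the sketch below. The formant comparison is reduced here to a Pearson correlation between two peak-distribution vectors, which is an assumption about what "peak distribution correlation" means; `non_pitch_illegal`, `volume_range`, and `min_corr` are hypothetical, and only the 10-second note limit comes from the text.

```python
import math
from typing import Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient; 0.0 if either input is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def non_pitch_illegal(volume: Sequence[float], note_durations: Sequence[float],
                      peaks: Sequence[float], reference_peaks: Sequence[float],
                      volume_range: float = 0.05, max_note: float = 10.0,
                      min_corr: float = 0.5) -> bool:
    """True if any non-pitch condition suggests the input is not normal singing."""
    static_volume = max(volume) - min(volume) <= volume_range  # volume barely changes
    long_note = any(d >= max_note for d in note_durations)     # note of 10 s or more
    odd_formant = pearson(peaks, reference_peaks) < min_corr   # unlike normal singing
    return static_volume or long_note or odd_formant
```

The section is treated as illegal if any one condition holds, matching the "or" in the description; how the volume window and note segmentation are obtained is left open.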

Exceptional sections such as those described above are excluded from the evaluation section as specific sections. As mentioned above, for the parts of a specific section in which a singing technique is detected (technique sections), the evaluation unit 111 may calculate the evaluation value so that the use of the singing technique is credited regardless of the singing pitch. Illegal sections, on the other hand, may be prevented from affecting the calculation of the evaluation value at all. Alternatively, since an illegal section may be an improper input sound intentionally aimed at a high evaluation, the evaluation value may be calculated so as to lower the evaluation.
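The different treatment of technique sections and illegal sections could be expressed as in the sketch below. The `Section` type, the bonus and penalty values, and the 0 to 100 clamp are assumptions for illustration; `pitch_score` stands for a score already computed over the evaluation section. Setting `illegal_penalty` to 0 corresponds to letting illegal sections have no effect, and a positive value lowers the evaluation.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class Section:
    start: int  # first frame, inclusive
    end: int    # last frame, exclusive
    kind: str   # "technique" or "illegal" (hypothetical labels)


def evaluation_value(pitch_score: float, sections: Iterable[Section],
                     technique_bonus: float = 2.0,
                     illegal_penalty: float = 0.0,
                     max_score: float = 100.0) -> float:
    """Adjust a pitch-based score for detected specific sections."""
    score = pitch_score
    for s in sections:
        if s.kind == "technique":
            score += technique_bonus   # credit the technique regardless of pitch
        elif s.kind == "illegal":
            score -= illegal_penalty   # 0.0 means "no effect on the evaluation"
    return max(0.0, min(max_score, score))


if __name__ == "__main__":
    secs = [Section(0, 10, "technique"), Section(20, 30, "illegal")]
    print(evaluation_value(80.0, secs))                       # illegal ignored
    print(evaluation_value(80.0, secs, illegal_penalty=5.0))  # illegal penalized
```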

<Other embodiments>
In the above embodiments, the evaluation section is set as the input period of the singing voice excluding the specific sections, but this is not limiting. For example, the evaluation section may be set so that a predetermined section before or after each specific section is also excluded from evaluation. Part of a specific section may also be included in the evaluation section. The evaluation section may instead be determined in advance as a predetermined section, independently of any specific section. In this case, since detection of specific sections is unnecessary, an evaluation function 100D may be realized as shown in FIG. 12: it has no specific section detection unit and includes a comparison unit 109D that compares the singing pitch waveform with the reference pitches in an evaluation section determined independently of the specific sections.
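Excluding specific sections together with a margin before and after them can be sketched as a per-frame mask. Frame-indexed `(start, end)` ranges with an exclusive end, and the `before`/`after` margins, are assumptions; `evaluation_mask` is a hypothetical name.

```python
from typing import List, Tuple


def evaluation_mask(n_frames: int, specific: List[Tuple[int, int]],
                    before: int = 0, after: int = 0) -> List[bool]:
    """True for frames inside the evaluation section.

    Each specific section (start, end) is removed, widened by `before`
    frames on the left and `after` frames on the right.
    """
    mask = [True] * n_frames
    for start, end in specific:
        lo = max(0, start - before)
        hi = min(n_frames, end + after)
        for i in range(lo, hi):
            mask[i] = False
    return mask


if __name__ == "__main__":
    print(evaluation_mask(10, [(4, 6)], before=1, after=2))
```

With `before=after=0` this reproduces the basic rule (input period minus specific sections); for the FIG. 12 variant, the mask would simply be fixed in advance instead of being derived from detected sections.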

The evaluation method in this example is shown in the flowchart of FIG. That is, the evaluation method in this example includes the steps of acquiring an input sound (step S101), calculating the pitch of the acquired input sound (step S103), comparing, in a predetermined evaluation section, a plurality of reference pitches with the calculated pitch of the input sound (step S105), and calculating an evaluation value for the input sound based on the comparison result (step S107). As in the first embodiment, the method may also include a step of detecting, within the input period of the input sound, a specific section in which the calculated pitch of the input sound changes in a specific manner; in this case, the predetermined evaluation section may be determined based on the specific section in the input period. In addition, as in the fifth embodiment, the method may include a step of changing the pitch of the input sound in the specific section according to a change rule, in which case the pitch of the input sound to be compared includes the pitch after the change.
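Steps S101 to S107 can be strung together in a small end-to-end sketch. Everything concrete here is an assumption: S101 synthesizes a sine tone instead of reading a microphone, S103 uses a plain autocorrelation pitch estimator with integer-lag resolution (coarse; a real implementation would refine it), S105 measures the distance to the nearest reference pitch on a grid 100 cents apart anchored at A4 = 440 Hz (the 100-cent spacing follows claim 4), and S107 maps the mean deviation (0 to 50 cents) linearly to 0 to 100. The whole input period is used as the evaluation section.

```python
import math
from typing import List

SAMPLE_RATE = 8000  # Hz (assumed)
FRAME = 400         # samples per analysis frame, 50 ms (assumed)


def acquire_input(freq: float, seconds: float) -> List[float]:
    """S101 stand-in: synthesize a sine tone instead of reading a microphone."""
    n = int(SAMPLE_RATE * seconds)
    return [math.sin(2 * math.pi * freq * i / SAMPLE_RATE) for i in range(n)]


def frame_pitch(frame: List[float], fmin: float = 80.0,
                fmax: float = 1000.0) -> float:
    """S103 stand-in: frequency of the lag with the largest autocorrelation."""
    min_lag = int(SAMPLE_RATE / fmax)
    max_lag = int(SAMPLE_RATE / fmin)
    best_lag, best_r = min_lag, -math.inf
    for lag in range(min_lag, max_lag + 1):
        r = sum(frame[i] * frame[i + lag] for i in range(len(frame) - lag))
        if r > best_r:
            best_lag, best_r = lag, r
    return SAMPLE_RATE / best_lag


def to_cents(freq_hz: float) -> float:
    """Cents on a scale where A4 = 440 Hz = 6900 cents (MIDI note x 100)."""
    return 6900.0 + 1200.0 * math.log2(freq_hz / 440.0)


def deviation_from_reference(cents: float, spacing: float = 100.0) -> float:
    """S105: distance in cents to the nearest reference pitch on the grid."""
    nearest = round(cents / spacing) * spacing
    return abs(cents - nearest)


def evaluate(samples: List[float]) -> float:
    """S107: map the mean deviation (0..50 cents) to a 0..100 evaluation value."""
    devs = []
    for start in range(0, len(samples) - FRAME + 1, FRAME):
        f0 = frame_pitch(samples[start:start + FRAME])
        devs.append(deviation_from_reference(to_cents(f0)))
    if not devs:
        return 0.0
    mean_dev = sum(devs) / len(devs)
    return 100.0 * (1.0 - mean_dev / 50.0)


if __name__ == "__main__":
    print(round(evaluate(acquire_input(200.0, 0.5)), 1))
```

A 200 Hz tone lies about 35 cents above the nearest grid pitch, so it scores about 30; a tone exactly on the grid would score near 100, subject to the estimator's lag resolution.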

The degree of mismatch calculated by the pitch comparison unit 109 is obtained by summing the difference between the reference pitch and the singing pitch for each sample, but the differences may be weighted so that the degree of mismatch grows faster as the difference increases. For example, a difference of 20 cents may contribute three times as much as a difference of 10 cents, instead of twice as much. Conversely, when the difference between the reference pitch and the singing pitch is within a predetermined range (for example, 2 cents or less), it may be treated as a match (a difference of 0 cents) so that the degree of mismatch does not increase.
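One weighting that reproduces the "20 cents counts three times as much as 10 cents" example is a power law with exponent log2(3) ≈ 1.585, because (2d)^e / d^e = 2^e = 3. The sketch below uses that choice plus the 2-cent dead zone from the text; the power-law form itself and the name `weighted_mismatch` are assumptions, and any function growing faster than linearly would fit the description.

```python
import math
from typing import Sequence

EXPONENT = math.log2(3)  # makes a 20-cent difference weigh 3x a 10-cent one


def weighted_mismatch(reference: Sequence[float], sung: Sequence[float],
                      dead_zone: float = 2.0,
                      exponent: float = EXPONENT) -> float:
    """Sum of per-sample pitch differences (cents) with super-linear weighting."""
    total = 0.0
    for ref, p in zip(reference, sung):
        diff = abs(p - ref)
        if diff <= dead_zone:
            continue  # treated as a match (difference of 0 cents)
        total += diff ** exponent
    return total


if __name__ == "__main__":
    w10 = weighted_mismatch([0.0], [10.0])
    w20 = weighted_mismatch([0.0], [20.0])
    print(round(w20 / w10, 6))
```

Passing `exponent=1.0` and `dead_zone=0.0` recovers the unweighted sum of differences.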

The differences between the reference pitch and the singing pitch may also be summed separately for the flat side and the sharp side, yielding a flat-side degree of mismatch and a sharp-side degree of mismatch. If the mismatch is biased toward one side, it can be determined whether the singing tends to be sharp or flat.
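Splitting the accumulated difference by sign could look like the following sketch. The bias test, which calls one side dominant when it is more than `ratio` times the other, is a hypothetical rule; the source only says the mismatch may be "biased" toward one side.

```python
from typing import Sequence, Tuple


def flat_sharp_mismatch(reference: Sequence[float],
                        sung: Sequence[float]) -> Tuple[float, float]:
    """Return (flat-side, sharp-side) sums of pitch differences in cents."""
    flat = sharp = 0.0
    for ref, p in zip(reference, sung):
        d = p - ref
        if d < 0:
            flat += -d   # sung below the reference pitch
        else:
            sharp += d   # sung at or above the reference pitch
    return flat, sharp


def tendency(flat: float, sharp: float, ratio: float = 2.0) -> str:
    """Hypothetical bias rule: one side dominates if it exceeds ratio x the other."""
    if flat > ratio * sharp:
        return "flat"
    if sharp > ratio * flat:
        return "sharp"
    return "balanced"


if __name__ == "__main__":
    f, s = flat_sharp_mismatch([6900.0, 7000.0, 7200.0], [6880.0, 6990.0, 7205.0])
    print(f, s, tendency(f, s))
```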

The sound represented by the singing voice data acquired by the input sound acquisition unit 103 is not limited to a singer's voice; it may be a voice produced by singing synthesis or an instrument sound. An instrument sound is preferably a monophonic performance (one note at a time). For instrument sounds, techniques detected as specific sections include, for example, vibrato, staccato, bend up (shakuri), bend down (fall), and slide (portamento). Of these, vibrato, bend up, bend down, and slide, which involve pitch changes, are detected in the same manner as in the embodiments. Because such sections would affect the degree of mismatch calculated by the pitch comparison unit 109, the specific sections detected in this way are excluded from the evaluation section, as in the case of singing. In addition, note ornaments such as trills and very short grace notes, and timbre-related techniques such as saxophone growl and guitar cutting, affect the accuracy of pitch acquisition, so they may also be detected as specific sections and excluded from evaluation. Furthermore, for instruments capable of producing polyphonic sounds, polyphony detection may be performed to prevent malfunction. In all the embodiments, including the detection of technique sections and illegal sections in the eighth embodiment, the sound input to the microphone 23 is not limited to a singing voice and may be an instrument sound.
Although the present invention has been described in detail and with reference to specific embodiments, it will be apparent to those skilled in the art that various changes and modifications can be made without departing from the spirit and scope of the invention.

This application is based on Japanese Patent Application No. 2015-053020 filed on March 17, 2015 and Japanese Patent Application No. 2016-013318 filed on January 27, 2016, the contents of which are incorporated herein by reference.

DESCRIPTION OF SYMBOLS 1 ... Evaluation apparatus, 11 ... Control unit, 13 ... Storage unit, 15 ... Operation unit, 17 ... Display unit, 19 ... Communication unit, 21 ... Signal processing unit, 23 ... Microphone, 25 ... Speaker, 100 ... Evaluation function, 101 ... Accompaniment output unit, 103 ... Input sound acquisition unit, 105 ... Pitch calculation unit, 107 ... Specific section detection unit, 109 ... Pitch comparison unit, 111 ... Evaluation unit, 113 ... Pitch changing unit, 115 ... Chord detection unit, 117 ... Timing evaluation unit

Claims

1. An evaluation apparatus comprising:
an input sound acquisition unit that acquires an input sound;
a pitch calculation unit that calculates the pitch of the input sound acquired by the input sound acquisition unit;
a pitch comparison unit that compares a plurality of reference pitches with the pitch of the input sound calculated by the pitch calculation unit in a predetermined evaluation section; and
an evaluation unit that calculates an evaluation value for the input sound based on the result of the comparison by the pitch comparison unit.
2. The evaluation apparatus according to claim 1, further comprising a specific section detection unit that detects, within the input period of the input sound, a specific section in which the pitch of the input sound calculated by the pitch calculation unit has a specific change,
wherein the predetermined evaluation section is determined based on the specific section in the input period.
3. The evaluation apparatus as described, further comprising:
a specific section detection unit that detects, within the input period of the input sound, a specific section in which the pitch of the input sound changes in a specific manner; and
a pitch changing unit that changes the pitch of the input sound in the specific section according to a change rule,
wherein the pitch of the input sound calculated by the pitch calculation unit and compared by the pitch comparison unit includes the pitch of the input sound after being changed by the pitch changing unit.
4. The evaluation apparatus according to claim 1, wherein adjacent reference pitches are spaced at intervals of 100 cents.
5. The evaluation apparatus according to claim 1, further comprising an accompaniment sound output unit that outputs an accompaniment sound to a sound output device during the input period,
wherein the reference pitches are determined based on the output accompaniment sound.
6. The evaluation apparatus according to claim 5, wherein the reference pitches are changed in accordance with the accompaniment sound.
7. The evaluation apparatus according to claim 2 or 3, wherein the evaluation unit calculates the evaluation value based on the detection of the specific section and the comparison result.
8. The evaluation apparatus according to claim 2, wherein the specific section includes a technique section and an illegal section, classified according to the pitch of the input sound contained in the specific section, and
the evaluation unit gives the technique section and the illegal section different effects on the calculation of the evaluation value.
9. A program for causing a computer to:
acquire an input sound;
calculate the pitch of the input sound;
compare, in a predetermined evaluation section, a plurality of reference pitches with the calculated pitch of the input sound; and
calculate an evaluation value for the input sound based on the result of the comparison.