WO2016148256A1 - Evaluation device and program - Google Patents

Evaluation device and program

Info

Publication number
WO2016148256A1
Authority
WO
WIPO (PCT)
Prior art keywords
pitch
evaluation
unit
input sound
singing
Application number
PCT/JP2016/058583
Other languages
French (fr)
Japanese (ja)
Inventor
松本 秀一
辰弥 寺島
Original Assignee
Yamaha Corporation (ヤマハ株式会社)
Priority claimed from JP2016013318A (JP2016173562A)
Application filed by Yamaha Corporation (ヤマハ株式会社)
Publication of WO2016148256A1

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10K SOUND-PRODUCING DEVICES; METHODS OR DEVICES FOR PROTECTING AGAINST, OR FOR DAMPING, NOISE OR OTHER ACOUSTIC WAVES IN GENERAL; ACOUSTICS NOT OTHERWISE PROVIDED FOR
    • G10K15/00 Acoustics not otherwise provided for
    • G10K15/04 Sound-producing devices
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/48 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use
    • G10L25/51 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination

Definitions

  • The present invention relates to a technique for evaluating sound.
  • Karaoke devices often have a function for analyzing and evaluating singing voices. For example, singing is evaluated by comparing the pitch of the singing voice with the pitch of the melody to be sung, and then scoring how closely the two match (for example, Patent Document 1).
  • One object of the present invention is to evaluate sound without depending on the melody.
  • According to an embodiment, an evaluation device includes an input sound acquisition unit that acquires an input sound;
  • a pitch calculation unit that calculates the pitch of the input sound acquired by the input sound acquisition unit;
  • a pitch comparison unit that compares, in a predetermined evaluation section, a plurality of reference pitches with the pitch of the input sound calculated by the pitch calculation unit; and
  • an evaluation unit that calculates an evaluation value for the input sound based on the comparison result from the pitch comparison unit.
  • According to another embodiment, a program is provided that causes a computer to acquire an input sound, calculate the pitch of the input sound, compare a plurality of reference pitches with the calculated pitch in a predetermined evaluation section, and calculate an evaluation value for the input sound based on the comparison result.
  • As a result, sound can be evaluated without depending on the melody.
  • FIG. 1 is a block diagram showing the configuration of the evaluation apparatus 1 in the first embodiment of the present invention. FIG. 2 is a block diagram showing the configuration of the evaluation function in the first embodiment.
  • FIGS. 3(a) to 3(c) illustrate the evaluation method in the first embodiment. FIG. 4 illustrates the evaluation method in the second embodiment.
  • FIGS. 5(a) and 5(b) illustrate the evaluation method in the third embodiment.
  • FIGS. 6(a) and 6(b) illustrate the evaluation method in the fourth embodiment. FIG. 7 is a block diagram showing the configuration of the evaluation function in the fifth embodiment.
  • The evaluation apparatus according to the first embodiment evaluates the singing voice of a user who sings (hereinafter also called a singer).
  • The evaluation device evaluates singing by comparing a plurality of pitches determined for each predetermined period (hereinafter also called reference pitches) with the pitch of the singer's singing voice (hereinafter also called the singing pitch). This allows an evaluation that does not depend on the melody. In this example, the evaluation is also not lowered when the singer uses a singing technique that intentionally shifts the singing pitch away from the reference pitch.
  • Such an evaluation apparatus is described below.
  • FIG. 1 is a block diagram showing a configuration of an evaluation apparatus 1 in the first embodiment of the present invention.
  • The evaluation device 1 is, for example, a karaoke device.
  • It may also be a portable device such as a smartphone.
  • The evaluation device 1 includes a control unit 11, a storage unit 13, an operation unit 15, a display unit 17, a communication unit 19, and a signal processing unit 21. These components are connected to one another via a bus.
  • A microphone 23 and a speaker 25 are connected to the signal processing unit 21.
  • The control unit 11 includes an arithmetic processing circuit such as a CPU.
  • The control unit 11 realizes the various functions of the evaluation device 1 by having the CPU execute the control program stored in the storage unit 13.
  • These functions include a singing voice evaluation function.
  • The storage unit 13 is a storage device such as a nonvolatile memory or a hard disk.
  • The storage unit 13 stores the control program for realizing the evaluation function.
  • The control program may be provided stored on a computer-readable recording medium such as a magnetic recording medium, an optical recording medium, a magneto-optical recording medium, or a semiconductor memory.
  • In that case, the evaluation device 1 only needs to include a device that reads the recording medium.
  • The control program may also be downloaded via a network.
  • The storage unit 13 also stores music data, singing voice data, and evaluation reference information.
  • The music data includes data related to a karaoke song, for example guide melody data, accompaniment data, and lyrics data.
  • The guide melody data indicates the melody of the song.
  • The accompaniment data indicates the accompaniment of the song.
  • The guide melody data and accompaniment data may be expressed in MIDI format.
  • The lyrics data is used to display the lyrics of the song. It also indicates when the color of the displayed lyrics telop changes.
  • The singing voice data represents the singing voice that the singer inputs through the microphone 23. In this example, the singing voice data is buffered in the storage unit 13 until the evaluation function evaluates the singing voice.
  • The evaluation reference information is what the evaluation function uses as the reference for evaluating the singing voice.
  • The evaluation reference information includes information that specifies changes in singing pitch (singing pitch waveforms) used to detect singing techniques.
  • For singing techniques such as vibrato, kobushi, shakuri, and fall, the characteristic singing pitch waveforms are as follows.
  • Vibrato: the pitch changes finely (at a period no longer than a predetermined period).
  • A specific example of vibrato detection is disclosed in Japanese Patent Application Laid-Open No. 2005-107087.
  • Kobushi: the pitch rises temporarily (within a predetermined time) and then returns to the original pitch.
  • Kobushi detection is disclosed in Japanese Patent Laid-Open No.
  • The reference pitches that the singing pitch is compared with are defined in the evaluation reference information.
  • A plurality of reference pitches are defined.
  • The reference pitches include 440 Hz and are spaced at 100-cent intervals relative to 440 Hz.
  • The reference pitches may instead be defined relative to a pitch shifted away from 440 Hz.
  • In that case, the shift amount may be stored in the music data as tune information.
  • The operation unit 15 consists of devices such as operation buttons, a keyboard, and a mouse on an operation panel or remote controller. It outputs a signal corresponding to each input operation to the control unit 11.
  • The display unit 17 is a display device such as a liquid crystal display or an organic EL display. It displays screens under the control of the control unit 11. The operation unit 15 and the display unit 17 may be combined into a touch panel.
  • Under the control of the control unit 11, the communication unit 19 connects to a communication line such as the Internet and exchanges information with external devices such as a server.
  • The function of the storage unit 13 may be provided by an external device that can communicate with the communication unit 19.
  • The signal processing unit 21 includes a sound source that generates audio signals from MIDI-format signals, an A/D converter, a D/A converter, and the like.
  • The microphone 23 converts the singing voice into an electrical signal. The signal processing unit 21 receives this signal, A/D converts it, and outputs it to the control unit 11.
  • The singing voice is buffered in the storage unit 13 as singing voice data.
  • The control unit 11 reads the accompaniment data. The signal processing unit 21 D/A converts it, and the speaker 25 outputs it as the accompaniment of the song.
  • A guide melody may also be output from the speaker 25.
  • Evaluation function: the following describes the evaluation function that is realized when the control unit 11 of the evaluation apparatus 1 executes the control program. Part or all of the configuration for realizing this evaluation function may be implemented in hardware.
  • FIG. 2 is a block diagram showing the configuration of the evaluation function in the first embodiment of the present invention.
  • The evaluation function 100 includes an accompaniment output unit 101, an input sound acquisition unit 103, a pitch calculation unit 105, a specific section detection unit 107, a pitch comparison unit 109, and an evaluation unit 111.
  • The accompaniment output unit 101 reads the accompaniment data for the song designated by the singer. It then outputs the accompaniment sound from the speaker 25 via the signal processing unit 21.
  • The input sound acquisition unit 103 acquires singing voice data representing the singing voice input through the microphone 23. In this example, any sound input to the microphone 23 while the accompaniment is playing is treated as the singing voice to be evaluated.
  • The input sound acquisition unit 103 acquires the singing voice data buffered in the storage unit 13. It may do so after the data for the entire song has been stored there, or it may acquire the data directly from the signal processing unit 21.
  • The input sound acquisition unit 103 is not limited to singing voice data for sound input to the microphone 23. It may also use the communication unit 19 to acquire, over the network, singing voice data for sound input to an external device.
  • The pitch calculation unit 105 analyzes the singing voice data acquired by the input sound acquisition unit 103. It calculates how the singing pitch (frequency) changes over time, that is, the singing pitch waveform. The waveform is calculated by a known method, such as one based on zero crossings of the singing voice waveform or one using an FFT (Fast Fourier Transform).
  • The specific section detection unit 107 analyzes the singing pitch waveform. Within the singing voice input period, it detects sections (specific sections) that contain the singing techniques defined in the evaluation reference information. Each detected specific section may be associated with its type of singing technique.
  • The pitch comparison unit 109 sets the evaluation section to the singing voice input period minus the specific sections detected by the specific section detection unit 107.
  • The pitch comparison unit 109 compares the singing pitch waveform in the evaluation section with the reference pitches.
  • Specifically, it calculates the degree of mismatch between the singing pitch waveform and the reference pitches.
  • Because the reference pitches are spaced at 100-cent intervals, the one closest to the singing pitch is selected as the comparison target. The greater the difference between the singing pitch waveform and that reference pitch, the higher the degree of mismatch.
  • The degree of mismatch is calculated in two steps. First, the difference between the singing pitch and the reference pitch at each sample of the singing pitch waveform is summed over the evaluation section. Second, that sum is divided by the number of samples in the evaluation section.
  • Alternatively, a pass area can be defined as the pitch range extending a predetermined amount from the reference pitch toward the sharp (higher) side and the flat (lower) side. Everything outside it is the reject area. The degree of mismatch is then the number of samples in the reject area divided by the number of samples in the evaluation section.
  • The width of the pass area may differ between the sharp side and the flat side.
  • The sample counts on the sharp and flat sides of the reject area may be weighted differently. For example, samples that deviate to the sharp side may be given more influence on the degree of mismatch.
  • Another option considers only samples in the reject area. The differences between their singing pitch and the reference pitch are summed over the evaluation section, and the sum is divided by the number of samples in the evaluation section.
  • The evaluation section may be further divided into multiple sections, with a degree of mismatch calculated for each.
  • These divided sections may partially overlap.
  • The evaluation unit 111 uses the comparison result from the pitch comparison unit 109 to calculate an evaluation value, which serves as the index for evaluating the singing voice.
  • The higher the degree of mismatch calculated by the pitch comparison unit 109, the lower the evaluation value and the worse the evaluation of the singing voice.
  • The evaluation unit 111 need not rely on the degree of mismatch alone; it may use other elements when calculating the evaluation value.
  • Such elements include singing techniques and other parameters that can be extracted from the singing voice data.
  • The singing techniques corresponding to the specific sections detected by the specific section detection unit 107 may be used for this.
  • Another parameter is, for example, volume change. Using volume change lets the singer's expressive dynamics count toward the evaluation.
  • The evaluation result from the evaluation unit 111 may be shown on the display unit 17. In this way, the components of the evaluation function 100 execute their processing in sequence. Together they provide the evaluation method, from input of the singing voice data through calculation of the evaluation value.
  • FIG. 3 is a diagram for explaining an evaluation method in the first embodiment of the present invention.
  • The waveform in FIG. 3(a) is an example of the singing pitch waveform for part of a song.
  • The vertical axis represents pitch.
  • The broken lines spaced every 100 cents in the pitch direction indicate the reference pitches.
  • The horizontal axis represents the passage of time.
  • The specific section detection unit 107 detects the specific sections of the singing pitch waveform in which singing techniques occur.
  • In FIG. 3(a), section S is a specific section corresponding to shakuri, section F corresponds to fall, section K to kobushi, and section V to vibrato. The evaluation section is therefore everything except the specific sections S, F, K, and V.
  • The degree of mismatch calculated by the pitch comparison unit 109 corresponds to the sum, over the samples, of the difference between the singing pitch and the reference pitch. This is the area of the hatched portion shown in FIG. 3. In section V, the hatched area shown in FIG. 3(c) becomes large because of the pitch fluctuation typical of vibrato. If section V were included in the comparison, the degree of mismatch would come out large and the evaluation would drop, even though the singer is using vibrato to sing expressively. The evaluation device 1 of this embodiment avoids this by comparing the singing pitch with the reference pitch only in the evaluation section, which excludes specific sections such as section V. As a result, using singing techniques does not lower the evaluation.
  • Vibrato is used as the example here, but the same applies to shakuri, kobushi, and fall.
  • FIG. 4 is a diagram for explaining an evaluation method according to the second embodiment of the present invention.
  • In the second embodiment, the pitch comparison unit 109 removes the specific sections in advance and generates a singing pitch waveform that contains only the evaluation section. It then compares that singing pitch with the reference pitches.
  • This approach is efficient when the singing voice data for an entire song is analyzed at once.
  • In the embodiments above, the pitch comparison unit 109 calculates the degree of mismatch from the sum of the differences between the singing pitch and the reference pitch at each sample.
  • In the third embodiment, the pitch axis is instead divided into bins of a predetermined width (for example, 2 cents). The frequency of the singing pitch in each bin is then calculated, that is, the number of samples whose singing pitch falls in that bin.
  • As in the embodiments above, only the singing pitch in the evaluation section is used.
  • FIG. 5 is a diagram for explaining an evaluation method according to the third embodiment of the present invention.
  • The vertical axis represents pitch (each pitch bin).
  • The broken lines spaced every 100 cents in the pitch direction indicate the reference pitches.
  • The horizontal axis represents the passage of time.
  • FIG. 5(a) shows a frequency distribution of the singing pitch that yields a high singing evaluation.
  • FIG. 5(b) shows a frequency distribution of the singing pitch that yields a low singing evaluation.
  • The pitch comparison unit 109 may therefore calculate the degree of mismatch from the frequency distribution of the singing pitch across the pitch bins.
  • The fourth embodiment modifies the evaluation method of the third embodiment. When a predetermined condition is satisfied, the degree of mismatch is not increased even if the peaks of the frequency distribution deviate from the reference pitches.
  • FIG. 6 is a diagram for explaining an evaluation method in the fourth embodiment of the present invention.
  • FIG. 6(a) differs from FIG. 5(a) in that the peak positions (arrows) of the frequency distribution are lower than the reference pitches. However, all of the peaks are lower than the reference pitches as a whole.
  • The pitch comparison unit 109 therefore adjusts the relationship between the frequency distribution and the reference pitches in the pitch direction. The adjustment minimizes the sum of the pitch differences between each peak of the frequency distribution and its closest reference pitch. For example, lowering the reference pitches from the state in FIG. 6(a) to the state in FIG. 6(b) brings the peaks and the reference pitches into near agreement. The degree of mismatch is calculated after this adjustment.
  • In this way, the singing evaluation can be kept high when the singer sings the intervals correctly, even though the singing pitch as a whole is shifted away from the 100-cent grid.
  • For example, suppose the standard pitch of a song has been shifted and the singer sings along with an accompaniment based on that shifted pitch. The singing evaluation can still be high.
  • Without the adjustment, the pitch differences between the peaks and the reference pitches would lower the singing evaluation. Adjusting the relationship as in FIG. 6(b) prevents this.
  • FIG. 7 is a block diagram showing the configuration of the evaluation function in the fifth embodiment of the present invention.
  • For the evaluation function 100A, only the components that differ from the embodiments above are described: the pitch changing unit 113 and the pitch comparison unit 109A.
  • The pitch changing unit 113 changes the singing pitch waveform in each specific section detected by the specific section detection unit 107. It follows a change rule for the corresponding singing technique.
  • A specific example is described with reference to FIG. 8.
  • FIG. 8 is a diagram for explaining an evaluation method in the fifth embodiment of the present invention.
  • Within the specific sections, the singing pitch waveform shown by the broken line is changed into the waveform shown by the solid line.
  • The changes reduce the portions where the singing pitch deviates from the reference pitch.
  • For example, the singing pitch may be changed so that it changes abruptly.
  • For kobushi, the change smooths the sharp rise and fall of the singing pitch.
  • For vibrato, the change smooths the periodic fluctuation of the singing pitch (for example, by replacing each period with its average). The change may be applied not only within the specific section but also around it.
  • The pitch comparison unit 109A compares the singing pitch changed by the pitch changing unit 113 with the reference pitches.
  • Unlike the embodiments above, the comparison covers the entire singing voice input period; the specific sections are not excluded.
  • Because the singing pitch is changed in the specific sections where singing techniques were detected, the time during which it deviates from the reference pitch becomes shorter. The degree of mismatch is therefore unlikely to increase even without excluding the specific sections, so singing can be evaluated this way as well. The singing pitch need not be changed in every specific section; some specific sections may instead be excluded from evaluation as in the embodiments above.
  • FIG. 9 is a block diagram showing the configuration of the evaluation function in the sixth embodiment of the present invention.
  • The chord detection unit 115 analyzes the accompaniment sound output by the accompaniment output unit 101 and detects the chord in each period of the song.
  • The pitch comparison unit 109B sets the reference pitches for each period based on the chord detected by the chord detection unit 115, then compares them with the singing pitch. In this example, the chord tones are used as the reference pitches.
  • A specific example is described with reference to FIG. 10.
  • FIG. 10 is a diagram for explaining an evaluation method according to the sixth embodiment of the present invention.
  • The chord in the first half is detected as "Dm7".
  • The chord in the second half is detected as "G7". The reference pitches for the first half ("Dm7") are therefore "D", "F", "A", and "C", and those for the second half ("G7") are "G", "B", "D", and "F".
  • The pitch comparison unit 109B compares the singing pitch with the reference pitches set according to the chord and calculates the degree of mismatch.
  • The reference pitches thus need not be fixed over the entire song and may change within the evaluation section.
  • When the reference pitches are determined from the accompaniment sound, they may be chord tones or scale tones.
  • Alternatively, the key of the song may be detected from the accompaniment sound, and the reference pitches may be determined from that key.
  • The reference pitches may also be predetermined. For example, they may be set in advance based on a key preset for the song. In that case, a subset of the pitches predefined at 100-cent intervals is used.
  • In the embodiments above, the reference pitches are defined at 100-cent (semitone) intervals.
  • The reference pitches may instead use other logarithmically equal intervals, such as 50-cent or 200-cent intervals.
  • FIG. 11 is a block diagram showing the configuration of the evaluation function in the seventh embodiment of the present invention.
  • The timing evaluation unit 117 analyzes the accompaniment sound and detects the beats in each period of the song, that is, the beat timing.
  • The timing evaluation unit 117 detects note-on timing (the moment the singer starts a note of the melody) from changes in the volume of the singing voice.
  • The timing evaluation unit 117 compares the detected beat timing with the note-on timing.
  • The evaluation unit 111C reflects the comparison result from the timing evaluation unit 117 in the evaluation value. The more closely the note-on timing matches the beat timing, the higher the evaluation.
  • In the embodiments above, the specific section detection unit 107 detects specific sections in which the singing pitch contains a waveform corresponding to a singing technique.
  • Specific sections to be excluded from the evaluation section may also be detected on other grounds.
  • Specific sections may be classified into two kinds: sections corresponding to singing techniques that should earn extra points (technique sections), and sections that should not (illegal sections).
  • An illegal section is detected in two cases. The first is when the sound input as singing voice should not be evaluated as singing at all (for example, non-singing sound such as accompaniment picked up by the microphone 23). The second is when the singing is performed specifically to exploit the evaluation.
  • An input singing voice is detected as an illegal section when, for example, the same singing pitch continues for a predetermined time or longer.
  • To detect this, the sequence of singing pitch values is differentiated, and the positive and negative values are integrated separately. The section is flagged when the singing pitch shows no undulation beyond a certain level. Holding one singing pitch steady and aligning it with a reference pitch is relatively easy, so such singing would otherwise earn a high evaluation. It should therefore be detected as illegal singing.
  • In contrast, singing with a varying pitch while ignoring the melody of the song is not easy. If the singing pitch varies beyond a certain level, the singer is therefore presumed to be singing the song's melody.
  • Likewise, when the accompaniment sound leaks into the input sound and the input is likely not a singing voice, the section is detected as an illegal section.
  • In this case, sections with the following singing pitch waveforms are detected: the singing pitch is not continuous for a predetermined time; the singing pitch changes too rapidly; or the singing pitch changes discontinuously and greatly within a short period (a pitch jump).
  • In the embodiments above, the evaluation section is the singing voice input period minus the specific sections, but other definitions are possible.
  • The evaluation section may be set so that a predetermined section before or after each specific section is also excluded from evaluation. Conversely, part of a specific section may be included in the evaluation section.
  • The evaluation section may also be a predetermined section fixed independently of the specific sections. Specific sections then need not be detected. As shown in FIG. 12, an evaluation function 100D may be realized that has no specific section detection unit. It includes a comparison unit 109D that compares the singing pitch waveform with the reference pitches in an evaluation section determined independently of the specific sections.
  • The evaluation method in this example is shown in the flowchart in FIG. It consists of four steps: acquiring an input sound (step S101), calculating the pitch of the acquired input sound (step S103), comparing a plurality of reference pitches with the calculated pitch of the input sound in a predetermined evaluation section (step S105), and calculating an evaluation value for the input sound based on the comparison result (step S107).
  • The method may further include a step of detecting, within the input period of the input sound, a specific section in which the calculated pitch of the input sound changes.
  • The predetermined evaluation section may be determined based on the specific section in the input period.
  • The pitch of the input sound used in the comparison may include the pitch of the input sound after it has been changed.
  • The pitch comparison unit 109 calculates the degree of mismatch by summing the difference between the reference pitch and the singing pitch for each sample. Weighting may be applied so that the degree of mismatch grows faster as the difference increases. For example, a 20-cent difference may count three times as much as a 10-cent difference rather than twice as much. Conversely, a difference within a predetermined range (for example, 2 cents or less) may be treated as a match (a difference of 0 cents), so the degree of mismatch does not increase.
  • The differences between the reference pitch and the singing pitch may be summed separately for the flat side and the sharp side, giving a flat-side degree of mismatch and a sharp-side degree of mismatch. If the mismatch is biased toward one side, this shows whether the singer tends to sing sharp or flat.
  • The sound represented by the singing voice data acquired by the input sound acquisition unit 103 is not limited to a singer's voice. It may also be synthesized singing or an instrument sound. For an instrument sound, a single-note (monophonic) performance is desirable.
  • For instrument sounds, techniques detected as specific sections include, for example, vibrato, staccato, bend-up (shakuri), bend-down (fall), and slide (portamento). Vibrato, bend-up, bend-down, and slide involve pitch changes and are detected in the same way as in the embodiments. So that these specific sections do not affect the degree of mismatch calculated by the pitch comparison unit 109, they are excluded from the evaluation section, just as for singing.
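The first embodiment defines its reference pitches in two ways: they include 440 Hz, and they are spaced 100 cents apart. For each sample, the reference pitch closest to the singing pitch becomes the comparison target. Below is a minimal Python sketch of that grid. The function names and the `shift` parameter are illustrative assumptions, not part of the patent; `shift` stands in for the shift amount that the music data may store as tune information.

```python
import math

A4_HZ = 440.0          # the reference grid contains 440 Hz
SPACING_CENTS = 100.0  # reference pitches every 100 cents (one semitone)


def hz_to_cents(freq_hz: float, ref_hz: float = A4_HZ) -> float:
    """Pitch of freq_hz in cents above ref_hz (1200 cents per octave)."""
    return 1200.0 * math.log2(freq_hz / ref_hz)


def nearest_reference(cents: float, spacing: float = SPACING_CENTS,
                      shift: float = 0.0) -> float:
    """Reference pitch (in cents) closest to `cents` on the grid shift + k*spacing.

    `shift` is a hypothetical parameter for a grid moved away from 440 Hz;
    shift=0 keeps 440 Hz on the grid. Exact ties follow Python's round()
    (half to even).
    """
    k = round((cents - shift) / spacing)
    return shift + k * spacing


def deviation_cents(freq_hz: float, spacing: float = SPACING_CENTS,
                    shift: float = 0.0) -> float:
    """Signed distance in cents to the nearest reference pitch (+ sharp, - flat)."""
    c = hz_to_cents(freq_hz)
    return c - nearest_reference(c, spacing, shift)


# 450 Hz lies roughly 39 cents sharp of the 440 Hz reference pitch.
print(round(deviation_cents(450.0), 1))
```

Working in cents rather than hertz makes the grid uniform. The same "nearest grid point" rule then applies in every octave.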
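The first embodiment describes two ways to compute the degree of mismatch over the evaluation section. One is the mean absolute deviation. The other is the pass/reject-area ratio, whose widths and weights may differ between the sharp and flat sides. The sketch below covers both under three assumptions. First, the singing pitch has already been converted to signed deviations in cents from the nearest reference pitch. Second, specific sections arrive as hypothetical half-open sample ranges. Third, the 20-cent default widths are placeholders, because the patent only says "a predetermined range".

```python
from typing import List, Sequence, Tuple


def evaluation_mask(n_samples: int,
                    specific_sections: Sequence[Tuple[int, int]]) -> List[bool]:
    """True for samples in the evaluation section, i.e. outside every specific
    section. Sections are half-open sample ranges [start, end)."""
    mask = [True] * n_samples
    for start, end in specific_sections:
        for i in range(max(start, 0), min(end, n_samples)):
            mask[i] = False
    return mask


def mismatch_mean(devs: Sequence[float], mask: Sequence[bool]) -> float:
    """Sum of |deviation| over the evaluation section divided by its sample count."""
    kept = [abs(d) for d, keep in zip(devs, mask) if keep]
    return sum(kept) / len(kept) if kept else 0.0


def mismatch_reject_ratio(devs: Sequence[float], mask: Sequence[bool],
                          sharp_width: float = 20.0, flat_width: float = 20.0,
                          sharp_weight: float = 1.0, flat_weight: float = 1.0) -> float:
    """Weighted count of reject-area samples divided by the number of samples
    in the evaluation section. The pass area is [-flat_width, +sharp_width]
    cents around the nearest reference pitch."""
    n = 0
    rejected = 0.0
    for d, keep in zip(devs, mask):
        if not keep:
            continue
        n += 1
        if d > sharp_width:
            rejected += sharp_weight
        elif d < -flat_width:
            rejected += flat_weight
    return rejected / n if n else 0.0


# Deviations (cents) from the nearest reference pitch. Samples 3-4 lie in a
# detected vibrato section, so they are left out of the evaluation section.
devs = [0.0, 10.0, -10.0, 40.0, 40.0, -30.0]
mask = evaluation_mask(len(devs), [(3, 5)])
print(mismatch_mean(devs, mask), mismatch_reject_ratio(devs, mask))
```

Setting `sharp_weight` above `flat_weight` gives the variant in which sharp deviations count more heavily.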
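The third and fourth embodiments can be illustrated together. The sketch builds the frequency distribution of the singing pitch in 2-cent bins and picks its peaks. It then searches for the grid shift that minimizes the summed distance from each peak to its nearest reference pitch. The patent does not say how peaks are found or how the minimization is done, so three choices here are assumptions: local-maximum peak picking, the `min_count` filter, and a 1-cent exhaustive search.

```python
import math
from collections import Counter
from typing import List, Sequence


def pitch_histogram(cents: Sequence[float], bin_width: float = 2.0) -> Counter:
    """Frequency distribution of the singing pitch: bin k counts the samples
    whose pitch (in cents) lies in [k*bin_width, (k+1)*bin_width)."""
    return Counter(math.floor(c / bin_width) for c in cents)


def histogram_peaks(hist: Counter, bin_width: float = 2.0,
                    min_count: int = 1) -> List[float]:
    """Centres (in cents) of bins that are local maxima with >= min_count samples."""
    peaks = []
    for k, count in sorted(hist.items()):
        # Counter returns 0 for missing neighbour bins without inserting them.
        if count >= min_count and count >= hist[k - 1] and count >= hist[k + 1]:
            peaks.append((k + 0.5) * bin_width)
    return peaks


def best_grid_shift(peaks: Sequence[float], spacing: float = 100.0,
                    step: float = 1.0) -> float:
    """Grid shift in [-spacing/2, spacing/2) minimising the summed distance from
    each peak to its nearest reference pitch (exhaustive search in `step` cents)."""
    def cost(shift: float) -> float:
        return sum(abs(p - (shift + round((p - shift) / spacing) * spacing))
                   for p in peaks)
    candidates = [-spacing / 2 + i * step for i in range(int(spacing / step))]
    return min(candidates, key=cost)


# A singer who is uniformly about 20 cents flat: peaks near -21 and 179 cents,
# plus one stray sample that min_count filters out.
singing = [-20.5] * 10 + [179.5] * 10 + [50.0]
peaks = histogram_peaks(pitch_histogram(singing), min_count=2)
shift = best_grid_shift(peaks)
print(peaks, shift)
```

The search only needs shifts within plus or minus 50 cents, because moving a 100-cent grid by a full 100 cents produces the same grid. The degree of mismatch would then be computed against the shifted grid.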
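In the fifth embodiment, one example of smoothing vibrato before comparison is replacing each period with its average. Below is a minimal sketch. It assumes the specific section's sample range and the vibrato period (in samples) are already known from detection. `smooth_vibrato` is a hypothetical name.

```python
from typing import List, Sequence


def smooth_vibrato(pitch: Sequence[float], start: int, end: int,
                   period: int) -> List[float]:
    """Return a copy of the singing pitch waveform in which samples [start, end)
    are replaced, one vibrato period at a time, by the mean of that period.
    A final partial period is averaged over the samples it actually has."""
    if period <= 0:
        raise ValueError("period must be a positive number of samples")
    out = list(pitch)
    start, end = max(start, 0), min(end, len(out))
    for s in range(start, end, period):
        e = min(s + period, end)
        mean = sum(out[s:e]) / (e - s)
        out[s:e] = [mean] * (e - s)
    return out


# Vibrato of +/-30 cents around 100 cents with a 4-sample period, then a held note.
wave = [100.0, 130.0, 100.0, 70.0] * 2 + [100.0, 100.0]
print(smooth_vibrato(wave, 0, 8, 4))
```

After smoothing, the vibrato section sits on its centre pitch. Comparing it with the reference pitches therefore no longer inflates the degree of mismatch.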
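In the sixth embodiment the chord tones become the reference pitches: D, F, A, and C for Dm7, and G, B, D, and F for G7. The sketch below makes three assumptions. Chord names are already available from the chord detection step. Tuning is 12-tone equal temperament with A4 = 440 Hz. A small, hypothetical chord-quality table is enough for the example. The code measures the singing pitch against the nearest chord tone in any octave.

```python
from typing import Dict, List, Tuple

# Pitch classes (semitones above C) and chord-quality intervals. Both tables
# are illustrative assumptions that cover only a few qualities.
NOTE_PC: Dict[str, int] = {"C": 0, "D": 2, "E": 4, "F": 5, "G": 7, "A": 9, "B": 11}
QUALITY_INTERVALS: Dict[str, Tuple[int, ...]] = {
    "": (0, 4, 7), "m": (0, 3, 7), "7": (0, 4, 7, 10), "m7": (0, 3, 7, 10),
}


def chord_tones(name: str) -> List[int]:
    """Pitch classes of a chord name such as 'Dm7', 'G7', or 'Bb'."""
    root = NOTE_PC[name[0]]
    quality = name[1:]
    if quality.startswith("#"):
        root, quality = (root + 1) % 12, quality[1:]
    elif quality.startswith("b"):
        root, quality = (root - 1) % 12, quality[1:]
    return sorted({(root + iv) % 12 for iv in QUALITY_INTERVALS[quality]})


def chord_deviation(cents_from_a4: float, chord: str) -> float:
    """Signed distance (cents) from the singing pitch to the nearest chord tone
    in any octave."""
    cents_from_c = cents_from_a4 + 900.0  # A lies 900 cents above C
    best = None
    for pc in chord_tones(chord):
        # Wrap into [-600, 600) so every octave of the chord tone is considered.
        d = (cents_from_c - pc * 100.0 + 600.0) % 1200.0 - 600.0
        if best is None or abs(d) < abs(best):
            best = d
    return best


# Singing A4 matches a Dm7 chord tone but is 200 cents from the nearest G7 tone.
print(chord_deviation(0.0, "Dm7"), chord_deviation(0.0, "G7"))
```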
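The seventh embodiment compares note-on timing, detected from volume changes, with beat timing detected from the accompaniment. The sketch below assumes a precomputed volume envelope and treats each upward threshold crossing as a note-on. Beat times are taken as given. The threshold, the 50 ms tolerance, and the function names are illustrative choices, not values from the patent.

```python
from typing import List, Sequence


def note_on_times(volume: Sequence[float], frame_sec: float,
                  threshold: float) -> List[float]:
    """Times (s) at which the volume envelope rises across `threshold`,
    a simple stand-in for note-on detection from volume change."""
    return [i * frame_sec for i in range(1, len(volume))
            if volume[i - 1] < threshold <= volume[i]]


def timing_agreement(note_ons: Sequence[float], beats: Sequence[float],
                     tolerance: float = 0.05) -> float:
    """Fraction of note-ons lying within `tolerance` seconds of some beat;
    a higher fraction would raise the evaluation value."""
    if not note_ons or not beats:
        return 0.0
    hits = sum(1 for t in note_ons
               if min(abs(t - b) for b in beats) <= tolerance)
    return hits / len(note_ons)


volume = [0, 0, 1, 1, 0, 0, 1, 1, 0, 0, 0, 1]  # envelope sampled every 0.5 s
ons = note_on_times(volume, 0.5, 0.5)            # rises at 1.0, 3.0 and 5.5 s
print(ons, timing_agreement(ons, beats=[1.0, 3.0, 5.0]))
```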
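The illegal-section test works in three steps: differentiate the singing pitch, integrate the positive and negative parts separately, and flag the section when neither shows undulation beyond a certain level. It might look like the following sketch. The window length, which stands in for the "predetermined time", and the threshold are hypothetical parameters.

```python
from typing import List, Sequence


def has_no_undulation(pitch: Sequence[float], threshold: float) -> bool:
    """Integrate the positive and negative parts of the pitch derivative
    separately; True if both stay below `threshold` (cents)."""
    rise = fall = 0.0
    for prev, cur in zip(pitch, pitch[1:]):
        step = cur - prev
        if step > 0:
            rise += step
        else:
            fall -= step
    return rise < threshold and fall < threshold


def flag_illegal(pitch: Sequence[float], window: int,
                 threshold: float) -> List[bool]:
    """Mark every sample covered by a window of `window` samples in which the
    singing pitch stays essentially constant."""
    flags = [False] * len(pitch)
    for s in range(len(pitch) - window + 1):
        if has_no_undulation(pitch[s:s + window], threshold):
            for i in range(s, s + window):
                flags[i] = True
    return flags


# A held pitch for 7 samples, then a melody-like line.
pitch = [0.0] * 7 + [30.0, 0.0, -30.0, 0.0, 30.0]
print(flag_illegal(pitch, window=4, threshold=10.0))
```

Rise and fall are integrated separately so that a line which goes up and then comes back down still registers as undulation. A single net-change total would cancel it out.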
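The section also describes three weighting variations for the degree of mismatch: a 20-cent difference counting three times a 10-cent one, a dead zone of about 2 cents, and separate flat-side and sharp-side totals. They can be sketched with a power-law weight. The power law is an assumption; the patent gives no formula. An exponent of log2(3) is simply one way to satisfy the three-times example.

```python
import math
from typing import Sequence, Tuple

# Exponent p with 2**p == 3, so a 20-cent error counts three times a 10-cent one.
EXPONENT = math.log2(3)


def weighted_mismatch(devs: Sequence[float], dead_zone: float = 2.0,
                      exponent: float = EXPONENT) -> Tuple[float, float]:
    """Return (flat_side, sharp_side) weighted mismatch sums for signed
    deviations in cents (+ sharp). Deviations within +/-dead_zone count as 0."""
    flat = sharp = 0.0
    for d in devs:
        if abs(d) <= dead_zone:
            continue                  # treated as a match (0 cents)
        weight = abs(d) ** exponent   # grows faster than linearly with |d|
        if d > 0:
            sharp += weight
        else:
            flat += weight
    return flat, sharp


flat, sharp = weighted_mismatch([-1.5, 12.0, 25.0, -8.0])
print("tends sharp" if sharp > flat else "tends flat")
```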

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Computational Linguistics (AREA)
  • Signal Processing (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Reverberation, Karaoke And Other Acoustics (AREA)
  • Auxiliary Devices For Music (AREA)

Abstract

This evaluation device according to an embodiment of the present invention is provided with: an input sound acquisition unit which acquires an input sound; a pitch calculating unit which calculates the input sound pitch of the input sound acquired by the input sound acquisition unit; a pitch comparing unit which compares the input sound pitch calculated by the pitch calculating unit with a plurality of reference pitches, in a prescribed evaluation section; and an evaluating unit which calculates an evaluation value for the input sound on the basis of the comparison result obtained by the pitch comparing unit.

Description

Evaluation apparatus and program
The present invention relates to a technique for evaluating sound.
Karaoke devices often have a function for analyzing and evaluating a singing voice. Singing is evaluated, for example, by comparing the pitch of the singing voice with the pitch of the melody to be sung and scoring the degree of agreement between them (see, for example, Patent Document 1).
Patent Document 1: Japanese Unexamined Patent Publication No. 2005-215493
However, because the technique of Patent Document 1 uses the melody as the criterion for judgment, reference information must be prepared for each song. Likewise, when evaluating an instrument sound that is not accompanied by singing, reference information must be prepared for each piece of music.
One object of the present invention is to evaluate sound without depending on the melody.
According to an embodiment of the present invention, there is provided an evaluation apparatus including: an input sound acquisition unit that acquires an input sound; a pitch calculation unit that calculates the pitch of the input sound acquired by the input sound acquisition unit; a pitch comparison unit that, in a predetermined evaluation section, compares a plurality of reference pitches with the pitch of the input sound calculated by the pitch calculation unit; and an evaluation unit that calculates an evaluation value for the input sound based on the result of the comparison by the pitch comparison unit.
According to an embodiment of the present invention, there is also provided a program that causes a computer to acquire an input sound, calculate the pitch of the input sound, compare a plurality of reference pitches with the calculated pitch of the input sound in a predetermined evaluation section, and calculate an evaluation value for the input sound based on the result of the comparison.
According to an embodiment of the present invention, sound can be evaluated without depending on the melody.
FIG. 1 is a block diagram showing the configuration of the evaluation apparatus 1 according to the first embodiment of the present invention.
FIG. 2 is a block diagram showing the configuration of the evaluation function according to the first embodiment of the present invention.
FIGS. 3(a) to 3(c) are diagrams explaining the evaluation method according to the first embodiment of the present invention.
FIG. 4 is a diagram explaining the evaluation method according to the second embodiment of the present invention.
FIGS. 5(a) and 5(b) are diagrams explaining the evaluation method according to the third embodiment of the present invention.
FIGS. 6(a) and 6(b) are diagrams explaining the evaluation method according to the fourth embodiment of the present invention.
FIG. 7 is a block diagram showing the configuration of the evaluation function according to the fifth embodiment of the present invention.
FIG. 8 is a diagram explaining the evaluation method according to the fifth embodiment of the present invention.
FIG. 9 is a block diagram showing the configuration of the evaluation function according to the sixth embodiment of the present invention.
FIG. 10 is a diagram explaining the evaluation method according to the sixth embodiment of the present invention.
FIG. 11 is a block diagram showing the configuration of the evaluation function according to the seventh embodiment of the present invention.
FIG. 12 is a block diagram showing the configuration of the evaluation function according to another embodiment of the present invention.
FIG. 13 is a flowchart showing the evaluation method according to another embodiment of the present invention.
Hereinafter, an evaluation apparatus according to an embodiment of the present invention will be described in detail with reference to the drawings. The embodiments below are examples of embodiments of the present invention, and the present invention is not limited to them.
<First Embodiment>
The evaluation apparatus according to the first embodiment of the present invention will be described in detail with reference to the drawings. The evaluation apparatus according to the first embodiment evaluates the singing voice of a user who sings (hereinafter sometimes called the singer). It evaluates the singing by comparing a plurality of pitches defined for each predetermined period (hereinafter sometimes called reference pitches) with the pitch of the singer's singing voice (hereinafter sometimes called the singing pitch). This allows an evaluation that does not depend on the melody. In this example, moreover, even when the singing pitch includes intentional deviations from the reference pitches produced by singing techniques, the evaluation can be kept from being lowered. Such an evaluation apparatus is described below.
[Hardware]
FIG. 1 is a block diagram showing the configuration of the evaluation apparatus 1 according to the first embodiment of the present invention. The evaluation apparatus 1 is, for example, a karaoke device; it may also be a portable device such as a smartphone. The evaluation apparatus 1 includes a control unit 11, a storage unit 13, an operation unit 15, a display unit 17, a communication unit 19, and a signal processing unit 21. These components are connected via a bus. A microphone 23 and a speaker 25 are connected to the signal processing unit 21.
The control unit 11 includes an arithmetic processing circuit such as a CPU. The control unit 11 causes the CPU to execute a control program stored in the storage unit 13, thereby implementing various functions in the evaluation apparatus 1, including a singing voice evaluation function. The storage unit 13 is a storage device such as a nonvolatile memory or a hard disk, and stores the control program for implementing the evaluation function. The control program may be provided stored on a computer-readable recording medium such as a magnetic recording medium, an optical recording medium, a magneto-optical recording medium, or a semiconductor memory; in that case, the evaluation apparatus 1 need only include a device that reads the recording medium. The control program may also be downloaded via a network.
The storage unit 13 also stores song data, singing voice data, and evaluation criterion information as data related to singing. The song data is data related to karaoke songs and includes, for example, guide melody data, accompaniment data, and lyrics data. The guide melody data represents the melody of the song. The accompaniment data represents the accompaniment of the song. The guide melody data and accompaniment data may be expressed in MIDI format. The lyrics data consists of data for displaying the lyrics of the song and data indicating when to change the color of the displayed lyric telop. The singing voice data represents the singing voice input by the singer through the microphone 23. In this example, the singing voice data is buffered in the storage unit 13 until the singing voice is evaluated by the evaluation function.
The evaluation criterion information is information that the evaluation function uses as the criterion for evaluating the singing voice. For example, it includes information for identifying the changes in singing pitch over time (the singing pitch waveform) that indicate a singing technique. Singing techniques such as vibrato, kobushi, shakuri, and fall exhibit, for example, the following singing pitch waveforms:
(1) Vibrato: the pitch alternates up and down in small, rapid oscillations (with a period at or below a predetermined value). A specific example of vibrato detection is disclosed in Japanese Unexamined Patent Publication No. 2005-107087.
(2) Kobushi: the pitch rises temporarily (within a predetermined time) and then returns to the original pitch. A specific example of kobushi detection is disclosed in Japanese Unexamined Patent Publication No. 2008-268370.
(3) Shakuri: the pitch rises over a predetermined time and then stabilizes. A specific example of shakuri detection is disclosed in Japanese Unexamined Patent Publication No. 2005-107334.
(4) Fall: the pitch drops over a predetermined time, after which the singing breaks off. A specific example of fall detection is disclosed in Japanese Unexamined Patent Publication No. 2008-225115.
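As a rough illustration of what "pitch oscillating with a period at or below a threshold" can mean in practice, the following Python sketch flags a pitch segment as vibrato-like. It is not the detector of JP 2005-107087; the mean-crossing heuristic, the 4 Hz minimum oscillation rate, the 20-cent minimum depth, and the function name are all assumptions chosen for illustration.

```python
def looks_like_vibrato(pitch_cents, frame_rate_hz,
                       min_rate_hz=4.0, min_depth_cents=20.0):
    """Heuristic: does this pitch segment oscillate fast and deep enough?

    pitch_cents: singing pitch per frame, in cents.
    frame_rate_hz: number of pitch frames per second.
    The thresholds are hypothetical defaults, not values from the patent.
    """
    n = len(pitch_cents)
    if n < 3:
        return False
    mean = sum(pitch_cents) / n
    centered = [p - mean for p in pitch_cents]
    # Count sign changes of the mean-removed contour (mean crossings).
    crossings = sum(1 for a, b in zip(centered, centered[1:]) if (a < 0) != (b < 0))
    duration_s = n / frame_rate_hz
    rate_hz = crossings / 2 / duration_s  # two crossings per oscillation cycle
    depth_cents = (max(pitch_cents) - min(pitch_cents)) / 2
    return rate_hz >= min_rate_hz and depth_cents >= min_depth_cents
```

A slow scoop such as shakuri crosses its own mean only once or twice, so it fails the rate test; detecting kobushi, shakuri, and fall would need their own shape rules.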
The evaluation criterion information also defines the reference pitches against which the singing pitch is compared. The reference pitches are defined as a plurality of pitches; in this example, they include 440 Hz and are spaced at 100-cent intervals relative to 440 Hz. If a song's reference pitch deviates from 440 Hz, for example if it is 442 Hz, the reference pitches may instead be spaced at 100-cent intervals relative to 442 Hz. This deviation may be stored, for example, as tuning information in the song data.
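The reference-pitch grid can be made concrete with a few lines of Python. This is a minimal sketch, assuming pitches are handled in cents relative to the tuning pitch; the function names and the `tuning_hz` parameter (standing in for the tuning information in the song data) are hypothetical.

```python
import math


def hz_to_cents(freq_hz: float, tuning_hz: float = 440.0) -> float:
    """Pitch in cents relative to the tuning pitch (440 Hz by default)."""
    return 1200.0 * math.log2(freq_hz / tuning_hz)


def nearest_reference_cents(pitch_cents: float) -> float:
    """Nearest reference pitch on the 100-cent grid anchored at the tuning pitch."""
    # Note: Python's round() breaks exact 50-cent ties toward the even step.
    return 100.0 * round(pitch_cents / 100.0)


def reference_pitch_hz(steps: int, tuning_hz: float = 440.0) -> float:
    """Frequency of the reference pitch lying steps x 100 cents from the tuning pitch."""
    return tuning_hz * 2.0 ** (steps / 12.0)
```

For a song tuned to 442 Hz, passing `tuning_hz=442.0` puts 442 Hz exactly on the grid, whereas measured against 440 Hz it sits about 7.85 cents sharp.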
The operation unit 15 is a device such as operation buttons provided on an operation panel or remote controller, a keyboard, or a mouse, and outputs a signal corresponding to the input operation to the control unit 11. The display unit 17 is a display device such as a liquid crystal display or an organic EL display, and displays screens under the control of the control unit 11. The operation unit 15 and the display unit 17 may be integrated into a touch panel. Under the control of the control unit 11, the communication unit 19 connects to a communication line such as the Internet and exchanges information with external devices such as servers. The function of the storage unit 13 may be implemented by an external device with which the communication unit 19 can communicate.
The signal processing unit 21 includes a sound source that generates audio signals from MIDI-format signals, an A/D converter, a D/A converter, and the like. The singing voice is converted into an electrical signal by the microphone 23, input to the signal processing unit 21, A/D converted there, and output to the control unit 11. As described above, the singing voice is buffered in the storage unit 13 as singing voice data. The accompaniment data is read out by the control unit 11, D/A converted by the signal processing unit 21, and output from the speaker 25 as the accompaniment of the song. The guide melody may also be output from the speaker 25 at this time.
[Evaluation Function]
The evaluation function, implemented by the control unit 11 of the evaluation apparatus 1 executing the control program, is described below. Part or all of the configuration that implements the evaluation function described below may be implemented in hardware.
FIG. 2 is a block diagram showing the configuration of the evaluation function according to the first embodiment of the present invention. The evaluation function 100 includes an accompaniment output unit 101, an input sound acquisition unit 103, a pitch calculation unit 105, a specific section detection unit 107, a pitch comparison unit 109, and an evaluation unit 111. The accompaniment output unit 101 reads the accompaniment data for the song designated by the singer and outputs the accompaniment sound from the speaker 25 via the signal processing unit 21. The input sound acquisition unit 103 acquires singing voice data representing the singing voice input from the microphone 23. In this example, the sound input to the microphone 23 while the accompaniment is being output is treated as the singing voice to be evaluated. The input sound acquisition unit 103 acquires the singing voice data buffered in the storage unit 13; it may do so after the singing voice data for the entire song has been stored in the storage unit 13, or it may acquire the data directly from the signal processing unit 21. Moreover, the input sound acquisition unit 103 is not limited to acquiring singing voice data representing sound input to the microphone 23; it may also use the communication unit 19 to acquire, via a network, singing voice data representing sound input to an external device.
The pitch calculation unit 105 analyzes the singing voice data acquired by the input sound acquisition unit 103 and calculates the temporal change of the singing pitch (frequency), that is, the singing pitch waveform. Specifically, the singing pitch waveform is calculated by a known method, such as a method using zero crossings of the singing voice waveform or a method using an FFT (Fast Fourier Transform). The specific section detection unit 107 analyzes the singing pitch waveform and detects, within the singing voice input period, the sections (specific sections) that contain singing techniques defined by the evaluation criterion information. Each detected specific section may be associated with the type of singing technique it contains.
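Of the two known methods named above, zero-crossing analysis is the simpler to sketch. The Python below is a minimal illustration, assuming a mono signal given as a list of floats; the frame length, hop size, and function names are hypothetical choices, not values from the patent. Zero-crossing counting is fragile on real voices with strong harmonics or noise, so an FFT- or autocorrelation-based estimator is usually preferred in practice.

```python
def zero_crossing_pitch(frame, sample_rate):
    """Estimate F0 (Hz) of one frame from rising zero crossings; None if none found."""
    rising = [i for i in range(1, len(frame)) if frame[i - 1] < 0 <= frame[i]]
    if len(rising) < 2:
        return None  # silent or unvoiced frame: no measurable period
    # Average spacing between successive rising crossings = one period in samples.
    period_samples = (rising[-1] - rising[0]) / (len(rising) - 1)
    return sample_rate / period_samples


def pitch_track(signal, sample_rate, frame_len=1024, hop=256):
    """Singing pitch waveform: one F0 estimate per hop-spaced analysis frame."""
    return [zero_crossing_pitch(signal[s:s + frame_len], sample_rate)
            for s in range(0, len(signal) - frame_len + 1, hop)]
```

The resulting list of per-frame frequencies is what the later steps convert to cents and compare with the reference pitches.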
The pitch comparison unit 109 sets, as the evaluation section, the part of the singing voice input period that excludes the specific sections detected by the specific section detection unit 107. The pitch comparison unit 109 compares the singing pitch waveform in the evaluation section with the reference pitches. In this example, the comparison result is the degree of mismatch between the singing pitch waveform and the reference pitches. Because the reference pitches exist at 100-cent intervals, the reference pitch closest to the singing pitch is selected as the comparison target. The degree of mismatch is calculated so that it increases as the difference between the singing pitch waveform and the reference pitch increases. For example, the magnitude of the difference between the singing pitch and the reference pitch at each sample of the singing pitch waveform is summed over the evaluation section, and the sum is divided by the number of samples in the evaluation section.
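The mean-difference calculation just described can be sketched as follows. This is an illustration under assumptions: pitches are in cents relative to the tuning pitch, the specific sections arrive as a per-sample boolean mask, unvoiced samples are marked `None` and skipped, and the function name is hypothetical.

```python
from typing import Optional, Sequence


def mismatch_degree(pitch_cents: Sequence[Optional[float]],
                    in_specific_section: Sequence[bool]) -> Optional[float]:
    """Mean |singing pitch - nearest reference pitch| over the evaluation section.

    pitch_cents: singing pitch per sample, in cents relative to the tuning
        pitch; None marks an unvoiced sample (an assumption of this sketch).
    in_specific_section: True where the sample lies in a detected specific
        section (vibrato, kobushi, shakuri, fall, ...).
    """
    total = 0.0
    count = 0
    for p, excluded in zip(pitch_cents, in_specific_section):
        if excluded or p is None:
            continue  # not part of the evaluation section
        ref = 100.0 * round(p / 100.0)  # closest reference pitch on the 100-cent grid
        total += abs(p - ref)
        count += 1
    return total / count if count else None  # None: nothing left to evaluate
```

For example, samples at 10, -20, and 130 cents lie 10, 20, and 30 cents from their nearest reference pitches, giving a degree of mismatch of 20 cents; a fourth sample inside a vibrato section would not change that figure.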
Alternatively, a pitch range extending a predetermined width above the reference pitch (the sharp, higher-pitch side) and below it (the flat, lower-pitch side) may be treated as a pass zone and everything outside it as a fail zone, and the degree of mismatch may be calculated by dividing the number of samples in the fail zone by the number of samples in the evaluation section. The width of the pass zone may differ between the sharp side and the flat side. Within the fail zone, different weights may also be applied to the sample counts on the sharp side and the flat side; for example, the weighting may be set so that samples deviating to the sharp side affect the degree of mismatch more strongly. This allows stricter evaluation of singing that drifts sharp, which is said to be more noticeable to the ear. As a further alternative, the degree of mismatch may be calculated by summing, over the evaluation section, the differences between the singing pitch and the reference pitch for the samples of the singing pitch waveform that fall in the fail zone, and dividing the sum by the number of samples in the evaluation section.
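Both pass-zone variants can be sketched as below. The zone widths (10 cents sharp, 15 cents flat) and the 1.5x sharp-side weight are hypothetical values picked only to show an asymmetric, sharp-penalizing setup; the input is assumed to be per-sample deviations from the nearest reference pitch over the evaluation section.

```python
def pass_zone_mismatch(diffs_cents, sharp_width=10.0, flat_width=15.0,
                       sharp_weight=1.5, flat_weight=1.0):
    """Weighted count of fail-zone samples divided by all evaluation-section samples.

    diffs_cents: singing pitch minus nearest reference pitch, per sample.
    Widths and weights are illustrative assumptions, not patent values.
    """
    if not diffs_cents:
        return None
    weighted_fails = 0.0
    for d in diffs_cents:
        if d > sharp_width:
            weighted_fails += sharp_weight  # sharp misses count more
        elif d < -flat_width:
            weighted_fails += flat_weight
    return weighted_fails / len(diffs_cents)


def fail_zone_deviation(diffs_cents, sharp_width=10.0, flat_width=15.0):
    """Variant: sum |deviation| of fail-zone samples, divided by all samples."""
    if not diffs_cents:
        return None
    total = sum(abs(d) for d in diffs_cents if d > sharp_width or d < -flat_width)
    return total / len(diffs_cents)
```

With these settings a 12-cent sharp sample fails while a 12-cent flat one passes, which is the asymmetry the paragraph above describes.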
In this way, singing can be evaluated independently of the melody. In this example, moreover, the singing pitch is compared with the reference pitch not over the entire singing voice input period but only in the section that excludes the specific sections. Therefore, intentional deviations of the singing pitch caused by singing techniques in the specific sections can be kept from increasing the degree of mismatch. The evaluation section may also be divided further into multiple subsections, with the degree of mismatch calculated in each. These subsections may partially overlap one another.
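Dividing the evaluation section into overlapping subsections can be sketched with fixed-size windows. The window and hop sizes are hypothetical parameters (overlap = window minus hop), and the input is assumed to be per-sample absolute deviations already restricted to the evaluation section.

```python
def subsection_bounds(n_samples, win, hop):
    """(start, end) index pairs covering n_samples; neighbours overlap by win - hop."""
    if n_samples == 0:
        return []
    if n_samples <= win:
        return [(0, n_samples)]
    starts = list(range(0, n_samples - win + 1, hop))
    if starts[-1] + win < n_samples:
        starts.append(n_samples - win)  # make the last window reach the end
    return [(s, s + win) for s in starts]


def subsection_mismatch(abs_devs, win, hop):
    """Mean absolute deviation (cents) within each, possibly overlapping, subsection."""
    return [sum(abs_devs[s:e]) / (e - s)
            for s, e in subsection_bounds(len(abs_devs), win, hop)]
```

Per-subsection values make it possible to see where in a song the pitch drifted, rather than only how much on average.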
The evaluation unit 111 calculates an evaluation value, which serves as an index for evaluating the singing voice, based on the comparison result from the pitch comparison unit 109. In this example, the higher the degree of mismatch calculated by the pitch comparison unit 109, the lower the calculated evaluation value, and thus the worse the singing voice is rated. The evaluation unit 111 need not calculate the evaluation value from the degree of mismatch alone; it may also take other factors into account, such as singing techniques and other parameters that can be extracted from the singing voice data. To reflect singing techniques in the evaluation value, the techniques corresponding to the specific sections detected by the specific section detection unit 107 may be used. Another such parameter is, for example, volume change, which allows the expressive dynamics of the singing to be included in the evaluation. The evaluation result from the evaluation unit 111 may be presented on the display unit 17. As described above, the evaluation method, from input of the singing voice data to calculation of the evaluation value, is provided by executing the processing of each component of the evaluation function 100 in sequence.
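The text only requires that a higher degree of mismatch yield a lower evaluation value. One hypothetical mapping, a linear ramp from 100 points at 0 cents down to 0 points at an assumed 50-cent mean deviation, is sketched below; the scale, the clamp, and the function name are illustrative choices, not the patent's scoring rule.

```python
def evaluation_value(mismatch_cents, zero_score_at=50.0):
    """Map a degree of mismatch (mean cents) to a 0-100 score; more mismatch, lower score."""
    if mismatch_cents is None:
        return None  # nothing was evaluated
    ratio = min(max(mismatch_cents / zero_score_at, 0.0), 1.0)
    return 100.0 * (1.0 - ratio)
```

Any monotonically decreasing function would satisfy the description; additional terms for detected techniques or volume change could be added on top.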
[Example of Singing Evaluation]
The method by which the evaluation function 100 described above evaluates a singing voice will now be explained using the specific singing pitch example shown in FIG. 3.
FIG. 3 is a diagram explaining the evaluation method according to the first embodiment of the present invention. The waveform in FIG. 3(a) is an example of the singing pitch waveform for part of a song. The vertical axis represents pitch; the broken lines placed every 100 cents along the pitch axis indicate the reference pitches. The horizontal axis represents time. The specific section detection unit 107 detects, from the singing pitch waveform, the specific sections in which singing techniques occur. In FIG. 3(a), section S is the specific section corresponding to shakuri, section F to fall, section K to kobushi, and section V to vibrato. The evaluation section is therefore everything other than the specific sections S, F, K, and V.
The degree of mismatch calculated by the pitch comparison unit 109 corresponds to the sum of the differences between the singing pitch and the reference pitch at each sample, that is, to the area of the hatched portion in FIG. 3(b). In section V, the characteristic pitch oscillation of vibrato makes the hatched area large, as shown in FIG. 3(c). Consequently, if section V were included in the comparison between the singing pitch and the reference pitch, a large degree of mismatch would be calculated and the singing would be rated lower, even though it is rich singing that uses vibrato. Even in such a case, comparing the singing pitch with the reference pitch only in the evaluation section, which excludes the specific sections including section V, as the evaluation apparatus 1 of the present embodiment does, keeps the use of singing techniques from lowering the evaluation. Vibrato is used as the example here, but the same applies to shakuri, kobushi, and fall.
<Second Embodiment>
FIG. 4 is a diagram explaining the evaluation method according to the second embodiment of the present invention. In the second embodiment, the pitch comparison unit 109 first removes the specific sections to generate a singing pitch waveform containing only the evaluation section, and then compares the singing pitch with the reference pitch. This method is efficient when the singing voice data for an entire song is analyzed at once.
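The second embodiment's "remove first, then compare" order can be sketched with `itertools.compress`, which filters a sequence by a parallel selector sequence. As before, the per-sample boolean mask and the function names are assumptions of this sketch.

```python
from itertools import compress


def evaluation_only(pitch_cents, in_specific_section):
    """Build the evaluation-only pitch waveform by dropping specific-section samples up front."""
    return list(compress(pitch_cents, [not m for m in in_specific_section]))


def mean_abs_deviation(pitch_cents):
    """Mean distance (cents) from each sample to its nearest 100-cent reference pitch."""
    devs = [abs(p - 100.0 * round(p / 100.0)) for p in pitch_cents]
    return sum(devs) / len(devs) if devs else None
```

Once the whole song's waveform has been filtered, the comparison runs over one contiguous list with no per-sample section checks, which is where the efficiency for whole-song analysis comes from.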
<Third Embodiment>
The third embodiment uses a different method of calculating the degree of mismatch used for the singing evaluation. In the embodiments above, the pitch comparison unit 109 calculated the degree of mismatch from the sum of the differences between the singing pitch and the reference pitch at each sample. In the third embodiment, the pitch axis is divided into bins of a predetermined width (for example, 2 cents), and the frequency of the singing pitch in each bin, that is, the number of samples whose singing pitch falls within each pitch range, is counted. As in the embodiments above, the singing pitches in the evaluation section are used for this count.
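Counting samples per 2-cent pitch bin is a plain histogram. The sketch below keys bins by integer index (bin `k` covers `[k * width, (k + 1) * width)` cents); the function name and the use of `collections.Counter` are illustrative choices.

```python
from collections import Counter


def pitch_histogram(pitch_cents, bin_width=2.0):
    """Number of evaluation-section samples per pitch bin (bin index -> count)."""
    # Floor division keeps negative pitches in the correct lower bin.
    return Counter(int(p // bin_width) for p in pitch_cents)
```

For example, samples at 0.5, 1.9, 2.0, and -0.1 cents land in bins 0, 0, 1, and -1.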
FIG. 5 is a diagram explaining the evaluation method according to the third embodiment of the present invention. The vertical axis represents pitch (each pitch range); the broken lines placed every 100 cents along the pitch axis indicate the reference pitches. The horizontal axis represents time. FIG. 5(a) shows a frequency distribution of the singing pitch that yields a high singing evaluation, and FIG. 5(b) shows one that yields a low singing evaluation.
Comparing FIG. 5(a) and FIG. 5(b), the peak positions are nearly the same, but the distribution in FIG. 5(a) has higher peaks (a larger difference between peaks and dips) and narrower peak widths. This indicates that the singing pitch is stable near the reference pitches, so the calculated degree of mismatch is low. In contrast, FIG. 5(b) shows a large spread of the singing pitch around the reference pitches, so the degree of mismatch is high. The degree of mismatch may also be increased the further the peak positions deviate from the reference pitches. In this way, the pitch comparison unit 109 of the third embodiment may calculate the degree of mismatch based on the frequency distribution of the singing pitch over the pitch ranges.
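The patent does not give a formula for turning the distribution into a degree of mismatch. The sketch below is one hypothetical reading: it folds each sample into its deviation from the nearest reference pitch, finds the tallest 2-cent bin, and adds the spread around that peak (wide peaks score worse) to the peak's offset from the reference (displaced peaks score worse). The folding step, the RMS spread, and `offset_weight` are all assumptions.

```python
from collections import Counter


def folded_deviation(p):
    """Deviation (cents) of pitch p from its nearest 100-cent reference pitch."""
    return p - 100.0 * round(p / 100.0)


def distribution_mismatch(pitch_cents, bin_width=2.0, offset_weight=1.0):
    """Hypothetical distribution-based mismatch: peak spread plus peak offset."""
    devs = [folded_deviation(p) for p in pitch_cents]
    if not devs:
        return None
    hist = Counter(int(d // bin_width) for d in devs)
    peak_bin = max(hist, key=hist.get)  # tallest bin (first one on ties)
    peak_center = (peak_bin + 0.5) * bin_width
    # RMS spread of samples around the peak: narrow, tall peaks give small values.
    spread = (sum((d - peak_center) ** 2 for d in devs) / len(devs)) ** 0.5
    return spread + offset_weight * abs(peak_center)
```

A tightly clustered distribution such as the one in FIG. 5(a) yields a small value, while a scattered one such as FIG. 5(b) yields a large one.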
<Fourth Embodiment>
The fourth embodiment modifies the evaluation method of the third embodiment so that, even when the peaks of the frequency distribution deviate from the reference pitches, the degree of mismatch is not increased if a predetermined condition is satisfied.
FIG. 6 is a diagram explaining the evaluation method according to the fourth embodiment of the present invention. In FIG. 6(a), unlike FIG. 5(a), the peak positions of the frequency distribution (arrows) are lower than the reference pitches. However, all of the peaks are shifted below the reference pitches as a whole. In such a case, the pitch comparison unit 109 adjusts the relationship between the frequency distribution and the reference pitches along the pitch axis. The adjustment is made so that the sum, over all peaks of the frequency distribution, of the pitch difference between each peak and its nearest reference pitch is minimized. For example, by lowering the reference pitches from the state of FIG. 6(a) as shown in FIG. 6(b), the peaks and the reference pitches can be brought into near agreement. The degree of mismatch is calculated after this adjustment.
Adjusting the relationship between the frequency distribution and the reference pitches in this way allows a high singing evaluation even when the singing voice deviates from absolute pitch, as long as the singing pitch moves correctly in multiples of 100 cents. The same adjustment helps when the tuning reference of the song deviates from 440 Hz: a singer who sings along with an accompaniment based on that shifted tuning can still receive a high evaluation. With the evaluation method of the third embodiment, the frequency distribution of FIG. 6(a) would lower the singing evaluation because of the pitch difference between the peaks and the reference pitches. Adjusting it to the frequency distribution of FIG. 6(b), as in the fourth embodiment, prevents the singing evaluation from being lowered.
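The grid adjustment of the fourth embodiment amounts to a one-dimensional search. The goal is the offset of the reference pitches that minimizes the summed distance from each histogram peak to its nearest reference pitch. The sketch below assumes the peak positions are already known in cents. It uses a brute-force search in 1-cent steps over one semitone; the step size and the brute-force strategy are assumptions for illustration.

```python
def nearest_grid_distance(cents, offset, grid=100.0):
    """Distance in cents from `cents` to the nearest pitch of a grid shifted by `offset`."""
    return abs((cents - offset + grid / 2) % grid - grid / 2)


def best_grid_offset(peaks_cents, grid=100.0, step=1.0):
    """Return (offset, cost): the grid shift minimizing the summed peak-to-grid distance.

    Only offsets in [-grid/2, grid/2) are tried, because shifting by a whole
    grid step yields the same set of reference pitches.
    """
    best_offset, best_cost = 0.0, float("inf")
    for i in range(int(grid / step)):
        offset = -grid / 2 + i * step
        cost = sum(nearest_grid_distance(p, offset, grid) for p in peaks_cents)
        if cost < best_cost:
            best_offset, best_cost = offset, cost
    return best_offset, best_cost
```

Consider peaks that all sit 30 cents below the nominal reference pitches, for example at -30, 70, and 170 cents. For these, the search returns an offset of -30 cents with zero residual cost, and the degree of mismatch would then be computed against the shifted grid. For comparison, an accompaniment tuned to A4 = 430 Hz instead of 440 Hz corresponds to an offset of roughly -40 cents (1200 log2(430/440)).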
<Fifth embodiment>
In the embodiments above, the evaluation section was the input period of the singing voice minus the specific sections. In the fifth embodiment, the singing pitch waveform is instead changed into a different waveform, at least in the specific sections, and then evaluated.
FIG. 7 is a block diagram showing the configuration of the evaluation function in the fifth embodiment of the present invention. The components of the evaluation function 100A that differ from the embodiments above (the pitch changing unit 113 and the pitch comparison unit 109A) are described below. The pitch changing unit 113 changes the singing pitch waveform of a specific section detected by the specific section detection unit 107. The change follows a change rule that corresponds to the singing technique. A specific example is described with reference to FIG. 8.
FIG. 8 is a diagram for explaining the evaluation method in the fifth embodiment of the present invention. In FIG. 8, the singing pitch waveform shown by the broken line is changed in the specific sections into the waveform shown by the solid line. As this example shows, the change reduces the portions of the singing pitch that deviate from the reference pitch. For example, for "shakuri" and "fall", the waveform is changed so that the singing pitch changes abruptly. For "kobushi", it is changed so as to smooth out the sharp rises and falls of the singing pitch. For "vibrato", it is changed so as to smooth out the periodic fluctuation of the singing pitch (for example, to the average over each period). The change may be applied not only to the specific section but also to the portions surrounding it. Alternatively, within a specific section, only the portion in which the singing pitch waveform makes the change characteristic of the singing technique may be removed. Only the portions where the singing pitch changes little, that is, where the singing pitch waveform is nearly flat, are then extracted and evaluated.
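The vibrato rule of the fifth embodiment, which smooths each period to its average, can be sketched as below. The vibrato period is assumed to be known in frames, for instance from the technique detection step. How the specific section detection unit 107 would estimate that period is not specified here, and the function name and signature are hypothetical.

```python
def smooth_vibrato(pitch, start, end, period):
    """Return a copy of `pitch` with frames [start, end) replaced by per-period means.

    Each block of `period` frames inside the section is flattened to its
    average. This removes the periodic vibrato swing while keeping the pitch
    the singer was centered on. A trailing partial block is averaged on its own.
    """
    out = list(pitch)
    for block_start in range(start, end, period):
        block_end = min(block_start + period, end)
        block = pitch[block_start:block_end]
        mean = sum(block) / len(block)
        for i in range(block_start, block_end):
            out[i] = mean
    return out
```

For example, a vibrato swinging ±30 cents around 6000 cents with a 4-frame period is flattened to a constant 6000 inside the section. The frames outside the section are left untouched.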
The pitch comparison unit 109A compares the singing pitch, as changed by the pitch changing unit 113, with the reference pitch. Unlike in the embodiments above, the section compared is the whole input period of the singing voice, and the specific sections are not excluded. Instead, the singing pitch is changed in the specific sections where a singing technique was detected, which shortens the time during which the singing pitch deviates from the reference pitch. The degree of mismatch is therefore unlikely to rise even though the specific sections are not excluded from evaluation. Singing can be evaluated in this way as well. The singing pitch need not be changed in every specific section; some specific sections may instead be excluded from evaluation as in the embodiments above.
<Sixth embodiment>
In the embodiments above, the reference pitches compared with the singing pitch did not change over the whole song. In the sixth embodiment, the reference pitches change depending on the position in the song.
FIG. 9 is a block diagram showing the configuration of the evaluation function in the sixth embodiment of the present invention. The components of the evaluation function 100B that differ from the embodiments above (the chord detection unit 115 and the pitch comparison unit 109B) are described below. The chord detection unit 115 analyzes the accompaniment sound output by the accompaniment output unit 101 and detects the chord in each period of the song. The pitch comparison unit 109B sets the reference pitches for each period based on the chord detected by the chord detection unit 115 and compares them with the singing pitch. In this example, the chord tones are set as the reference pitches. A specific example is described with reference to FIG. 10.
FIG. 10 is a diagram for explaining the evaluation method in the sixth embodiment of the present invention. This example shows a case where the chord in the first half is detected as "Dm7" and the chord in the second half as "G7". Accordingly, in the first half ("Dm7"), the reference pitches are set to "D", "F", "A", and "C". In the second half ("G7"), they are set to "G", "B", "D", and "F". Within the evaluation section, the pitch comparison unit 109B compares the singing pitch with the reference pitches set according to the chord and calculates the degree of mismatch.
Thus, the reference pitches need not be fixed over the entire song and may change within the evaluation section. As described above, when the reference pitches are determined from the accompaniment sound, they may be either the chord tones or the scale tones. Alternatively, the key of the song may be detected from the accompaniment sound and the reference pitches determined from that key. Whether the reference pitches are fixed over the entire song or change within the evaluation section, they may also be determined in advance. For example, reference pitches may be predetermined from a key preset for the song. In this case, a subset of the pitches spaced at 100-cent intervals is used as the reference pitches. In the embodiments above, the reference pitches were set at 100-cent (semitone) intervals. However, they may be set at other intervals that are equal on a logarithmic frequency scale (for example, 50-cent or 200-cent intervals).
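Setting reference pitches from a detected chord, as in the Dm7/G7 example of FIG. 10, can be sketched as follows. The chord detection itself is assumed to have already produced a chord name. Several parts are assumptions for illustration: the small table of chord qualities, the name parsing, and the convention of measuring pitch in cents from C (MIDI note number x 100). Distances are folded by octave, so a sung pitch matches a chord tone in any octave.

```python
# Semitone offsets from C for natural note names.
NOTE_TO_SEMITONE = {"C": 0, "D": 2, "E": 4, "F": 5, "G": 7, "A": 9, "B": 11}

# Hypothetical, deliberately small table of chord qualities (intervals in semitones).
CHORD_INTERVALS = {
    "": (0, 4, 7),          # major triad
    "m": (0, 3, 7),         # minor triad
    "7": (0, 4, 7, 10),     # dominant seventh
    "m7": (0, 3, 7, 10),    # minor seventh
    "maj7": (0, 4, 7, 11),  # major seventh
}


def chord_pitch_classes(name):
    """Pitch classes (0 = C) of a chord name such as 'Dm7', 'G7' or 'Bb'.

    Raises KeyError for chord qualities missing from CHORD_INTERVALS.
    """
    root = NOTE_TO_SEMITONE[name[0]]
    quality = name[1:]
    if quality.startswith("#"):
        root, quality = root + 1, quality[1:]
    elif quality.startswith("b"):
        root, quality = root - 1, quality[1:]
    return sorted({(root + step) % 12 for step in CHORD_INTERVALS[quality]})


def distance_to_chord(pitch_cents, name):
    """Smallest octave-folded distance in cents from a pitch to any chord tone.

    pitch_cents is measured from C, i.e. MIDI note number x 100.
    """
    pc = pitch_cents % 1200
    best = 1200.0
    for semitone in chord_pitch_classes(name):
        d = abs(pc - semitone * 100) % 1200
        best = min(best, d, 1200 - d)
    return best
```

With "Dm7" active, a sung G (6700) is 200 cents from the nearest chord tone, while under "G7" it is 0. Switching the reference set per chord period is what lets the degree of mismatch follow the harmony.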
<Seventh embodiment>
The seventh embodiment describes an evaluation function 100C whose singing evaluation also takes into account the timing at which the singer produces each note of the melody.
FIG. 11 is a block diagram showing the configuration of the evaluation function in the seventh embodiment of the present invention. The components of the evaluation function 100C that differ from the embodiments above (the timing evaluation unit 117 and the evaluation unit 111C) are described below. The timing evaluation unit 117 analyzes the accompaniment sound and detects the beats, that is, the beat timings, in each period of the song. The timing evaluation unit 117 also detects note-on timings from changes in the volume of the singing voice. A note-on timing is the timing at which the singer is assumed to have started a note of the melody. The timing evaluation unit 117 then compares the detected beat timings with the note-on timings. The evaluation unit 111C reflects the comparison result from the timing evaluation unit 117 in the evaluation value: the more closely the note-on timings match the beat timings, the higher the evaluation value.
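A minimal sketch of the timing comparison in the seventh embodiment follows. Beat times are assumed to be supplied by a separate beat tracker run on the accompaniment. Note-ons are detected with a simple threshold-crossing rule on a frame-wise volume envelope. Both the onset rule and the tolerance-based score are hypothetical stand-ins for whatever the timing evaluation unit 117 actually uses.

```python
def note_on_times(volume, frame_sec, threshold):
    """Times (s) of frames where the volume rises across `threshold` from below."""
    return [
        i * frame_sec
        for i in range(1, len(volume))
        if volume[i - 1] < threshold <= volume[i]
    ]


def timing_score(onsets, beats, tolerance):
    """Fraction of note-ons lying within `tolerance` seconds of some beat."""
    if not onsets or not beats:
        return 0.0
    hits = sum(1 for t in onsets if min(abs(t - b) for b in beats) <= tolerance)
    return hits / len(onsets)
```

The evaluation unit 111C could then blend this fraction into the overall evaluation value; the weighting is left open here.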
<Eighth embodiment>
In the embodiments above, the specific section detection unit 107 detected specific sections in which the singing pitch contains a waveform corresponding to a singing technique. In the eighth embodiment, specific sections that should be excluded from the evaluation section are also detected on other grounds. For example, specific sections may be classified into two kinds: sections corresponding to singing techniques that earn bonus points (technique sections), and sections that do not (illegitimate sections). An illegitimate section is detected in either of two cases. In the first, the sound input as the singing voice is something that should not be evaluated as singing, for example accompaniment or other non-singing sound picked up by the microphone 23. In the second, the singer sang in an unusual way aimed at obtaining a high evaluation.
A section of the input singing voice is detected as illegitimate when, for example, the same singing pitch continues for a predetermined time or longer. Specifically, the time series of the singing pitch is differentiated, and the positive and negative values of the derivative are integrated separately. The section is flagged when these integrals show that the singing pitch undulates by no more than a certain amount. Holding a single singing pitch and matching it to a reference pitch is relatively easy, so singing at one constant pitch would earn a high evaluation. Such singing should therefore be detected as illegitimate. On the other hand, it is not easy to keep varying the singing pitch without following the melody of the song. If the singing pitch undulates by at least a certain amount, it is therefore presumed that the singer is singing the melody of the song.
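The flat-pitch test above can be written directly: differentiate the pitch sequence, integrate the positive and negative parts separately, and flag the section if there is too little undulation. The threshold value, and the rule that both integrals must stay below it, are assumptions. The source only says that the undulation must not exceed a certain level.

```python
def rise_and_fall(pitch_cents):
    """Sum the positive and the negative frame-to-frame pitch changes separately."""
    rise = fall = 0.0
    for prev, cur in zip(pitch_cents, pitch_cents[1:]):
        diff = cur - prev
        if diff > 0:
            rise += diff
        else:
            fall -= diff
    return rise, fall


def lacks_undulation(pitch_cents, min_undulation_cents):
    """Hypothetical rule: flag a section whose total rise and total fall are both small."""
    rise, fall = rise_and_fall(pitch_cents)
    return rise < min_undulation_cents and fall < min_undulation_cents
```

A held note with a few cents of jitter is flagged. A phrase that climbs and descends by whole tones is not.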
A section is also detected as illegitimate when accompaniment sound leaks into the input sound and the input is probably not singing. When the sound input to the microphone 23 is likely to contain accompaniment, sections with singing pitch waveforms like the following are detected. For example, the singing pitch may be fragmented and never continue for a predetermined time, it may change too rapidly, or it may change discontinuously by a large amount within a short time (a pitch jump).
Illegitimate sections may also be detected from parameters other than the singing pitch extracted from the singing voice data. Several conditions suggest that the input probably contains sound other than normal singing, such as accompaniment or instrument sounds:
- the volume stays the same (or hardly changes) for a predetermined time or longer;
- a note (one melody tone) determined from the volume change lasts continuously for a predetermined time (for example, 10 seconds) or longer;
- the formants differ greatly from those obtained in normal singing (for example, when the correlation of the peak distributions is below a certain value).
Sections satisfying such conditions are therefore detected as illegitimate sections.
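The waveform and volume heuristics of the two preceding paragraphs can be expressed as simple rule checks. The sketch below covers the pitch-jump, fragmented-pitch, and unchanging-volume rules and omits the formant-correlation rule. Every threshold (700 cents, 0.1 s, the volume tolerance) is an illustrative assumption. The 10-second default borrows the example duration the source gives for an over-long note. The function name and its reason strings are hypothetical.

```python
def illegitimate_reasons(pitch_cents, volume, frame_sec, max_jump_cents=700.0,
                         min_voiced_sec=0.1, max_flat_sec=10.0, volume_eps=1e-3):
    """Return the hypothetical rules that a section trips.

    pitch_cents uses None for frames with no detected pitch.
    """
    reasons = []

    # Pitch jump: a large, discontinuous change between adjacent voiced frames.
    voiced_pairs = [
        (a, b) for a, b in zip(pitch_cents, pitch_cents[1:])
        if a is not None and b is not None
    ]
    if any(abs(b - a) > max_jump_cents for a, b in voiced_pairs):
        reasons.append("pitch jump")

    # Fragmentation: no voiced run lasts at least min_voiced_sec.
    longest_run = run = 0
    for p in pitch_cents:
        run = run + 1 if p is not None else 0
        longest_run = max(longest_run, run)
    if longest_run * frame_sec < min_voiced_sec:
        reasons.append("fragmented pitch")

    # Flat volume: the level stays (almost) unchanged for max_flat_sec or longer.
    longest_flat = flat = 0
    for prev, cur in zip(volume, volume[1:]):
        flat = flat + 1 if abs(cur - prev) <= volume_eps else 0
        longest_flat = max(longest_flat, flat)
    if longest_flat * frame_sec >= max_flat_sec:
        reasons.append("unchanging volume")

    return reasons
```

An empty list means none of these rules fired; any returned reason would mark the section as a candidate illegitimate section.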
Illegitimate sections such as those above are excluded from the evaluation section as specific sections. As described above, the evaluation unit 111 may raise the evaluation value for the parts of the specific sections in which a singing technique was detected (technique sections). The raise reflects the use of the singing technique, regardless of the singing pitch. Illegitimate sections, on the other hand, need not affect the calculation of the evaluation value. Alternatively, because an illegitimate section may be an improper input intended to obtain a high evaluation, the evaluation value may be calculated so as to lower the evaluation.
<Other embodiments>
In the embodiments above, the evaluation section was set as the input period of the singing voice minus the specific sections, but this is not limiting. For example, the evaluation section may be set so that a predetermined section before or after a specific section is also excluded from evaluation. Part of a specific section may also be included in the evaluation section. The evaluation section may also be predetermined independently of the specific sections. In this case, detection of specific sections is unnecessary. An evaluation function 100D may then be realized, as shown in FIG. 12, that has no specific section detection unit. It includes a comparison unit 109D that compares the singing pitch waveform with the reference pitches in an evaluation section determined independently of the specific sections.
The evaluation method in this example is shown in the flowchart of FIG. 13. The method includes the following steps:
- acquiring an input sound (step S101);
- calculating the pitch of the acquired input sound (step S103);
- comparing, in a predetermined evaluation section, a plurality of reference pitches with the calculated pitch of the input sound (step S105);
- calculating an evaluation value for the input sound based on the comparison result (step S107).
As in the first embodiment, the method may also include a step of detecting a specific section within the input period of the input sound, in which the calculated pitch makes a specific change. In this case, the predetermined evaluation section may be determined based on the specific section within the input period. As in the fifth embodiment, the method may further include a step of changing the pitch of the input sound in the specific section according to a change rule. The pitch of the input sound being compared then includes the changed pitch.
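Steps S101 to S107 can be strung together in a short, self-contained sketch. The source does not specify the pitch estimator, so a crude, unnormalized autocorrelation search stands in for the pitch calculation unit 105. Several other choices are likewise assumptions: the 100-cent reference grid anchored at A4 = 440 Hz, the frame length, and the scoring formula (100 minus the mean absolute deviation in cents, floored at zero).

```python
import math


def estimate_pitch_hz(frame, sample_rate, fmin=80.0, fmax=1000.0):
    """S103 stand-in: pick the lag with the largest raw autocorrelation."""
    min_lag = max(1, int(sample_rate / fmax))
    max_lag = min(int(sample_rate / fmin), len(frame) - 1)
    best_lag, best_corr = None, 0.0
    for lag in range(min_lag, max_lag + 1):
        corr = sum(frame[i] * frame[i + lag] for i in range(len(frame) - lag))
        if corr > best_corr:
            best_lag, best_corr = lag, corr
    return sample_rate / best_lag if best_lag else None


def evaluate(samples, sample_rate, frame_len, eval_frames, grid=100.0):
    """Score the acquired input sound `samples` (S101) over the frames in `eval_frames`."""
    deviations = []
    for k in range(len(samples) // frame_len):
        if k not in eval_frames:
            continue  # outside the predetermined evaluation section
        frame = samples[k * frame_len:(k + 1) * frame_len]
        f0 = estimate_pitch_hz(frame, sample_rate)  # S103
        if f0 is None:
            continue  # silent frame: no positive autocorrelation peak
        cents = 1200.0 * math.log2(f0 / 440.0)
        # S105: distance to the nearest reference pitch on the uniform grid
        deviations.append(abs((cents + grid / 2) % grid - grid / 2))
    if not deviations:
        return None
    # S107: hypothetical mapping from mean deviation to a 0-100 evaluation value
    return max(0.0, 100.0 - sum(deviations) / len(deviations))
```

A pure 440 Hz tone scores 100, while a 400 Hz tone (about 35 cents from the nearest grid pitch) scores about 65. In a fuller version, `eval_frames` could be built from the input period minus the frames of any detected specific sections. A real estimator would also need to guard against octave errors, which this crude one does not.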
The pitch comparison unit 109 obtained the degree of mismatch by summing the difference between the reference pitch and the singing pitch for each sample. The differences may instead be weighted so that the degree of mismatch grows faster as the difference grows. For example, a difference of 20 cents may contribute three times, rather than twice, as much as a difference of 10 cents. Conversely, when the difference between the reference pitch and the singing pitch is smaller than a predetermined range (for example, 2 cents or less), it may be treated as a match (a difference of 0 cents) so that the degree of mismatch does not increase.
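One way to obtain the weighting in the example above, where doubling the difference triples its contribution, is a power law with exponent log2(3), about 1.585. It can be combined with a dead zone that treats differences of up to 2 cents as a match. The power law is an assumption chosen because it satisfies the stated 10-cent versus 20-cent ratio; the source does not give a formula.

```python
import math

# With this exponent, doubling a difference multiplies its penalty by 2 ** log2(3) = 3.
EXPONENT = math.log2(3)


def weighted_penalty(diff_cents, dead_zone_cents=2.0):
    """Penalty for one sample; differences inside the dead zone count as a match."""
    d = abs(diff_cents)
    if d <= dead_zone_cents:
        return 0.0
    return d ** EXPONENT


def weighted_mismatch(singing_cents, reference_cents, dead_zone_cents=2.0):
    """Sum of per-sample penalties between singing and reference pitch sequences."""
    return sum(
        weighted_penalty(s - r, dead_zone_cents)
        for s, r in zip(singing_cents, reference_cents)
    )
```

With this combination, the penalty jumps from 0 to about 3 just above the 2-cent threshold. A smoother taper could be substituted if that discontinuity matters.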
The differences between the reference pitch and the singing pitch may also be summed separately for the flat side and the sharp side, yielding a flat-side degree of mismatch and a sharp-side degree of mismatch. If the degree of mismatch is biased toward one side, it can be determined whether the singing tends to be sharp or flat.
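Splitting the summed difference into flat-side and sharp-side totals is straightforward. Deciding when the split counts as biased needs a rule the source leaves open, so the 2:1 ratio below is an assumption.

```python
def split_mismatch(singing_cents, reference_cents):
    """Return (flat_total, sharp_total): summed cents below and above the reference."""
    flat = sharp = 0.0
    for s, r in zip(singing_cents, reference_cents):
        d = s - r
        if d < 0:
            flat -= d
        else:
            sharp += d
    return flat, sharp


def tendency(flat, sharp, ratio=2.0):
    """Label the singing 'flat', 'sharp', or 'balanced' using a hypothetical ratio test."""
    if flat > 0 and flat >= ratio * sharp:
        return "flat"
    if sharp > 0 and sharp >= ratio * flat:
        return "sharp"
    return "balanced"
```

For a singer who is 5 and 10 cents under on two notes and 5 cents over on a third, the totals are 15 cents flat and 5 cents sharp, which this rule labels as a flat tendency.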
The sound represented by the singing voice data acquired by the input sound acquisition unit 103 is not limited to a singer's voice; it may be synthesized singing or an instrument sound. For an instrument sound, a monophonic performance is desirable. Techniques detected as specific sections in instrument sounds include, for example, vibrato, staccato, bend-up (shakuri), bend-down (fall), and slide (portamento). Of these, the techniques that involve a pitch change (vibrato, bend-up, bend-down, and slide) are detected in the same manner as in the embodiments. Such specific sections affect the degree of mismatch calculated by the pitch comparison unit 109. They are therefore excluded from evaluation within the evaluation section, as in the case of singing. In addition, note expressions such as trills and very short grace notes may also be detected as specific sections and excluded from evaluation. The same applies to timbre-related techniques such as saxophone growl and guitar cutting, because they affect the accuracy of pitch acquisition. Furthermore, for instruments capable of sounding several notes at once, polyphonic passages may be detected and excluded to prevent erroneous operation. In all of the embodiments, including the detection of technique sections and illegitimate sections described in the eighth embodiment, the sound input to the microphone 23 is not limited to a singing voice and may be an instrument sound.
Although the present invention has been described in detail with reference to specific embodiments, it will be apparent to those skilled in the art that various changes and modifications can be made without departing from the spirit and scope of the invention.
This application is based on Japanese Patent Application No. 2015-053020 filed on March 17, 2015 and Japanese Patent Application No. 2016-013318 filed on January 27, 2016, the contents of which are incorporated herein by reference.
DESCRIPTION OF SYMBOLS
1 … evaluation device, 11 … control unit, 13 … storage unit, 15 … operation unit, 17 … display unit, 19 … communication unit, 21 … signal processing unit, 23 … microphone, 25 … speaker, 100 … evaluation function, 101 … accompaniment output unit, 103 … input sound acquisition unit, 105 … pitch calculation unit, 107 … specific section detection unit, 109 … pitch comparison unit, 111 … evaluation unit, 113 … pitch changing unit, 115 … chord detection unit, 117 … timing evaluation unit

Claims (9)

  1.  An evaluation device comprising:
     an input sound acquisition unit that acquires an input sound;
     a pitch calculation unit that calculates a pitch of the input sound acquired by the input sound acquisition unit;
     a pitch comparison unit that compares, in a predetermined evaluation section, a plurality of reference pitches with the pitch of the input sound calculated by the pitch calculation unit; and
     an evaluation unit that calculates an evaluation value for the input sound based on a result of the comparison by the pitch comparison unit.
  2.  The evaluation device according to claim 1, further comprising a specific section detection unit that detects, within an input period of the input sound, a specific section in which the pitch of the input sound calculated by the pitch calculation unit makes a specific change,
     wherein the predetermined evaluation section is determined based on the specific section within the input period.
  3.  The evaluation device according to claim 1, further comprising:
     a specific section detection unit that detects, within an input period of the input sound, a specific section in which the pitch of the input sound makes a specific change; and
     a pitch changing unit that changes the pitch of the input sound in the specific section according to a change rule,
     wherein the pitch of the input sound calculated by the pitch calculation unit and compared by the pitch comparison unit includes the pitch of the input sound after it has been changed by the pitch changing unit.
  4.  The evaluation device according to claim 1, wherein adjacent reference pitches are spaced at intervals of 100 cents.
  5.  The evaluation device according to claim 1, further comprising an accompaniment sound output unit that outputs an accompaniment sound to a sound output device during the input period,
     wherein the reference pitches are determined based on the output accompaniment sound.
  6.  The evaluation device according to claim 5, wherein the reference pitches are changed in accordance with the accompaniment sound.
  7.  The evaluation device according to claim 2 or claim 3, wherein the evaluation unit calculates the evaluation value based on the detection of the specific section and the result of the comparison.
  8.  The evaluation device according to claim 2 or claim 3, wherein the specific section includes a technique section and an illegitimate section, classified according to the pitch of the input sound contained in the specific section, and
     the technique section and the illegitimate section have different effects on the calculation of the evaluation value by the evaluation unit.
  9.  A program for causing a computer to execute:
     acquiring an input sound;
     calculating a pitch of the input sound;
     comparing, in a predetermined evaluation section, a plurality of reference pitches with the calculated pitch of the input sound; and
     calculating an evaluation value for the input sound based on a result of the comparison.
PCT/JP2016/058583 2015-03-17 2016-03-17 Evaluation device and program WO2016148256A1 (en)

Applications Claiming Priority (4)

Application Number Priority Date Filing Date Title
JP2015053020 2015-03-17
JP2015-053020 2015-03-17
JP2016013318A JP2016173562A (en) 2015-03-17 2016-01-27 Evaluation device and program
JP2016-013318 2016-01-27

Publications (1)

Publication Number Publication Date
WO2016148256A1 true WO2016148256A1 (en) 2016-09-22

Family

ID=56919024

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/JP2016/058583 WO2016148256A1 (en) 2015-03-17 2016-03-17 Evaluation device and program

Country Status (1)

Country Link
WO (1) WO2016148256A1 (en)

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2004102149A (en) * 2002-09-12 2004-04-02 Taito Corp Karaoke scoring device having sobbing grading function
JP2005107328A (en) * 2003-09-30 2005-04-21 Yamaha Corp Karaoke machine
JP2005107330A (en) * 2003-09-30 2005-04-21 Yamaha Corp Karaoke machine
JP2005107087A (en) * 2003-09-30 2005-04-21 Yamaha Corp Singing voice evaluating device, karaoke scoring device and program for these devices
JP2008015214A (en) * 2006-07-06 2008-01-24 Dds:Kk Singing skill evaluation method and karaoke machine
JP2008026622A (en) * 2006-07-21 2008-02-07 Yamaha Corp Evaluation apparatus
JP2010230854A (en) * 2009-03-26 2010-10-14 Brother Ind Ltd Scoring device and program
JP2014092550A (en) * 2012-10-31 2014-05-19 Daiichikosho Co Ltd Voice evaluation device for evaluating singing with shout technique

Legal Events

Date Code Title Description
121 Ep: the EPO has been informed by WIPO that EP was designated in this application. Ref document number: 16765085; Country of ref document: EP; Kind code of ref document: A1
NENP Non-entry into the national phase. Ref country code: DE
122 Ep: PCT application non-entry in European phase. Ref document number: 16765085; Country of ref document: EP; Kind code of ref document: A1