WO2015060340A1 - Singing voice synthesis - Google Patents

Singing voice synthesis

Info

Publication number
WO2015060340A1
Authority
WO
WIPO (PCT)
Prior art keywords
voice
pitch
singing
volume
singing voice
Prior art date
Application number
PCT/JP2014/078080
Other languages
French (fr)
Japanese (ja)
Inventor
土屋 豪
毅彦 川原
純也 浦
Original Assignee
Yamaha Corporation (ヤマハ株式会社)
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Yamaha Corporation
Publication of WO2015060340A1

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L13/00 Speech synthesis; Text to speech systems
    • G10L13/02 Methods for producing synthetic speech; Speech synthesisers
    • G10L13/033 Voice editing, e.g. manipulating the voice of the synthesiser
    • G10L13/0335 Pitch control
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L13/00 Speech synthesis; Text to speech systems
    • G10L13/08 Text analysis or generation of parameters for speech synthesis out of text, e.g. grapheme to phoneme translation, prosody generation or stress or intonation determination
    • G10L13/10 Prosody rules derived from text; Stress or intonation
    • G10L2013/105 Duration
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/90 Pitch determination of speech signals

Definitions

  • The present invention relates to an apparatus and method for synthesizing a singing voice, and further to a non-transitory computer-readable storage medium storing a program executable by a processor to realize the method.
  • The following technique is known for converting a singer's (user's) singing voice into another person's singing voice: formant sequence data obtained when a specific person (for example, the original singer) sings is stored in advance as source data, and when the singing voice of the singer (user) is converted, a singing voice is synthesized by shaping formants based on the original singer's formant sequence in accordance with the pitch and volume of the singer's (user's) voice (see, for example, Patent Document 1).
  • The present invention has been made in view of the above circumstances, and its purpose is to provide an unprecedented experience in technology that generates, from an input voice (for example, a user's singing voice), a singing voice having a voice quality different from that input voice.
  • To achieve this, a singing voice synthesizing apparatus includes: a pitch detection unit that detects the pitch of an input voice; a volume detection unit that detects the volume of the input voice; and a voice synthesis unit that synthesizes a singing voice based on lyrics data supplied as the performance progresses, controlling the pitch and volume of the synthesized singing voice according to the pitch detected by the pitch detection unit and the volume detected by the volume detection unit.
  • According to this apparatus, the pitch and volume of the synthesized singing voice are controlled according to the pitch and volume detected from the input voice, and the synthesized singing voice is not tied to the way any original singer sings. Since the singing voice is synthesized with a voice quality different from that of the singer (user) while reflecting the pitch and volume of the singer's (user's) own singing, the singer's expressive range is expanded and a new kind of singing experience becomes possible.
  • The voice synthesis unit may synthesize artificial singing voice corresponding to the characters of the lyrics data using speech segment data stored in a library.
  • The voice synthesis unit may synthesize the singing voice at, for example, the same pitch as that detected by the pitch detection unit, or at a pitch shifted from the detected pitch by a predetermined relationship. Likewise, it may synthesize the singing voice at the same volume as that detected by the volume detection unit, or at a volume having a predetermined relationship to the detected volume; alternatively, it may synthesize at a volume corresponding to the detected volume only when the detected volume exceeds a threshold.
  • The apparatus may further include a sound source unit that generates an accompaniment sound as the performance progresses, and an output unit that acoustically outputs the accompaniment sound and the singing voice.
  • The voice synthesis unit may synthesize the singing voice according to the utterance timing of the lyrics data. Alternatively, it may shift the utterance timing of the lyrics data according to the volume detected by the volume detection unit. With this configuration, the singer can control, to some extent, the timing at which the lyrics are synthesized, rather than following the utterance timing defined by the lyrics data, which makes it possible to improvise (ad-lib) the timing of the synthesized singing.
  • The present invention can be embodied not only as a singing voice synthesizing apparatus but also as a computer-implemented method, and as a non-transitory computer-readable storage medium storing a program executable by a processor to realize the method.
  • FIG. 1 is a functional block diagram showing the configuration of the singing voice synthesizing apparatus 10 according to the first embodiment.
  • As shown in this figure, the singing voice synthesizing apparatus 10 is, for example, a notebook or tablet computer, and includes a voice input unit 102, a pitch detection unit 104, a volume detection unit 108, an operation unit 112, a control unit 120, a database 130, a voice synthesis unit 140, a sound source unit 160, and speakers 172 and 174.
  • Of these, the voice input unit 102, the operation unit 112, the voice synthesis unit 140, and the speakers 172 and 174 are implemented in hardware, while the pitch detection unit 104, the volume detection unit 108, the control unit 120, the database 130, and the sound source unit 160 are implemented by a CPU (Central Processing Unit, not shown) executing a preinstalled application program.
  • In practice, the singing voice synthesizing apparatus 10 also has a display unit so that the user can check the status and settings of the apparatus.
  • Although details are omitted, the voice input unit 102 includes a microphone that converts the singing voice of the singer (user) into an electrical singing voice signal, an LPF (Low-Pass Filter) that cuts the high-frequency components of the converted signal, and an A/D converter that converts the filtered singing voice signal into a digital signal.
  • The pitch detection unit 104 performs frequency analysis, for example an FFT (Fast Fourier Transform), on the singing voice signal (input voice) converted into a digital signal, and outputs pitch data indicating the pitch (frequency) obtained by the analysis in almost real time.
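The patent names only "frequency analysis (FFT)" and leaves the rest of the pitch detector open. A minimal sketch of FFT-based pitch estimation, assuming NumPy and simple spectral peak-picking (both are illustrative choices, not the claimed implementation), might look like:

```python
import numpy as np

def detect_pitch(frame, sample_rate):
    """Estimate the pitch of one frame of the digitized singing-voice
    signal by locating the dominant peak of its FFT magnitude spectrum."""
    windowed = frame * np.hanning(len(frame))        # reduce spectral leakage
    spectrum = np.abs(np.fft.rfft(windowed))
    freqs = np.fft.rfftfreq(len(frame), d=1.0 / sample_rate)
    return freqs[np.argmax(spectrum)]                # frequency of strongest bin

# A pure 440 Hz test tone should be detected near 440 Hz
# (accuracy here is limited by the FFT bin width, ~10.8 Hz).
sr = 44100
t = np.arange(4096) / sr
pitch = detect_pitch(np.sin(2 * np.pi * 440.0 * t), sr)
```

A production detector would refine the peak (e.g. with autocorrelation or parabolic interpolation) to get below bin resolution; the sketch only shows the FFT step the text mentions.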
  • The volume detection unit 108 detects the amplitude envelope of the singing voice signal, for example by filtering the digitized signal with a low-pass filter, and outputs volume data indicating the volume of the singing voice almost in real time.
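The envelope follower described here could be sketched as rectification followed by a one-pole low-pass filter; the filter type and 30 Hz cutoff are illustrative assumptions, since the patent only says "low-pass filter":

```python
import numpy as np

def volume_envelope(signal, sample_rate, cutoff_hz=30.0):
    """Track the amplitude envelope of a singing-voice signal by
    full-wave rectifying it and smoothing with a one-pole low-pass."""
    alpha = 1.0 - np.exp(-2 * np.pi * cutoff_hz / sample_rate)
    env = np.empty_like(signal)
    level = 0.0
    for i, x in enumerate(np.abs(signal)):   # full-wave rectification
        level += alpha * (x - level)         # one-pole low-pass smoothing
        env[i] = level
    return env

sr = 8000
t = np.arange(sr) / sr
loud = np.sin(2 * np.pi * 220 * t)           # full-scale tone
quiet = 0.1 * loud                           # the same tone, 20 dB quieter
env_loud = volume_envelope(loud, sr)[-1]
env_quiet = volume_envelope(quiet, sr)[-1]
```

The final envelope values track the two input levels, which is all the downstream threshold comparison (described later for step Sa13) needs.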
  • The operation unit 112 receives operations by the singer, for example an operation for selecting a song to sing, and supplies information indicating the operation to the control unit 120.
  • The database 130 stores music data for a plurality of songs. The music data for one song consists of accompaniment data that defines the song's accompaniment in one or more tracks, and lyrics data indicating the song's lyrics.
  • During a performance, the control unit 120 functions as a sequencer: it interprets the accompaniment data in the music data read from the database 130 and supplies musical tone information, which defines the musical tones to be generated, to the sound source unit 160 in time-series order as the performance progresses from its start.
  • In this embodiment, accompaniment data conforming to the MIDI standard is used: it is defined as a sequence of events and durations, where a duration indicates the time interval between events. The control unit 120 therefore supplies tone information indicating the content of each event to the sound source unit 160 every time the time indicated by the preceding duration elapses.
  • In this way, the control unit 120 advances the performance of the song by interpreting the accompaniment data and supplying musical tone information to the sound source unit 160. The control unit 120 also accumulates the durations from the start of the performance; from this accumulated value it can determine the progress of the performance, that is, which part of the song is currently being played.
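The duration accumulation described above can be sketched as follows; the event names and tick values are hypothetical, since the patent only specifies the MIDI-style event/duration structure:

```python
def play_events(events):
    """Walk a MIDI-style event list (each entry is a duration in ticks
    followed by an event) and pair each event with the accumulated tick
    count -- the value the control unit uses to know how far the
    performance has progressed."""
    elapsed = 0
    timeline = []
    for duration, event in events:
        elapsed += duration          # integrate durations since the start
        timeline.append((elapsed, event))
    return timeline

# Hypothetical three-event accompaniment track.
track = [(0, "note_on C4"), (480, "note_off C4"), (480, "note_on E4")]
timeline = play_events(track)
```

Comparing the accumulated value against a target tick is then enough to decide "which part of the song is being played".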
  • The sound source unit 160 synthesizes a musical tone signal representing the accompaniment sound according to the musical tone information supplied from the control unit 120. (Because it is not always necessary to output an accompaniment sound, the sound source unit 160 is not essential.) The musical tone signal output from the sound source unit 160 is converted into an analog signal by a D/A conversion unit (not shown) and then output acoustically through the speaker 174.
  • As the performance progresses, the control unit 120 supplies musical tone information to the sound source unit 160 and also supplies lyrics data to the voice synthesis unit 140.
  • The voice synthesis unit 140 synthesizes the singing voice according to the lyrics data supplied from the control unit 120, the pitch data supplied from the pitch detection unit 104, and the volume data supplied from the volume detection unit 108, and outputs it as a singing voice signal. The singing voice signal output from the voice synthesis unit 140 is converted into an analog signal by a D/A conversion unit (not shown) and then output acoustically through the speaker 172.
  • FIG. 2 is a diagram showing an example of lyrics data.
  • In this example, the lyrics data of the song "Sakura" is shown together with its melody (the score displayed above the lyrics). Note that the copyright protection period of "Sakura" has already expired under Articles 51 and 57 of the Copyright Act of Japan.
  • The lyrics data consists of a character string in which the characters of the lyrics to be sung are arranged in order from the start of the performance, together with information defining the utterance timing of each character or character group (that is, of each syllable) constituting the lyrics. It is sufficient that the lyrics data contains the characters of the lyrics divided in time so that their temporal arrangement can be identified. The lyrics data may also include information that associates each character of the lyrics with a note of the melody, that is, with the timing at which it is to be sung and the pitch at which it is to be sung.
  • In FIG. 2, one note is assigned to each of the lyric characters 51 to 57 (only characters 51 to 57 are labeled in the figure; the rest are omitted), but depending on the song, a plurality of notes may be assigned to one character or character group (that is, to one syllable), or a plurality of characters or character groups (that is, a plurality of syllables) may be assigned to one note.
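One way to picture the lyrics data described above is as a list of syllable events, each carrying its characters, utterance timing, and note pitch. The field names, tick values, and note numbers below are hypothetical; the patent does not fix a concrete format:

```python
from dataclasses import dataclass

@dataclass
class LyricEvent:
    """One syllable of the lyrics with its note assignment."""
    syllable: str     # character or character group to sing
    tick: int         # utterance timing within the performance
    midi_note: int    # pitch at which the lyrics data says to sing it

# Hypothetical opening of "Sakura": one note per syllable here, but the
# structure also allows several syllables per note or notes per syllable.
sakura = [
    LyricEvent("sa", 0, 69),
    LyricEvent("ku", 480, 69),
    LyricEvent("ra", 960, 71),
]

# Which syllables are due by tick 480 of the performance?
due = [e.syllable for e in sakura if e.tick <= 480]
```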
  • When the performance reaches a singing timing, the control unit 120 supplies the lyric character or character group corresponding to that note (that is, the characters constituting the syllable) and the pitch of the note to the voice synthesis unit 140. Because the control unit 120 accumulates the durations as described above, it can determine whether the accumulated value has reached the value corresponding to a singing timing.
  • When no accompaniment sound is output (when accompaniment data is not used), the progress of the performance cannot be determined from the accumulated durations of the accompaniment data. In that case, the singing timings of the lyrics may themselves be defined, like the accompaniment data, by events (lyric-singing events) and durations indicating the time intervals between them, and whether a singing timing has arrived can be judged from these.
  • The voice synthesis unit 140 synthesizes the characters of the lyrics data supplied from the control unit 120 using speech segment data registered in a library (not shown). In this library, speech segment data defining the waveforms of various speech units that serve as the raw material of singing voice, such as single phonemes and transitions from one phoneme to another, are registered in advance. Specifically, the voice synthesis unit 140 converts the phoneme sequence indicated by the characters of the supplied lyrics data into a speech unit sequence, selects the speech segment data corresponding to those units from the library, connects them to one another, converts the pitch of the connected segment data according to the designated pitch, and synthesizes a singing voice signal representing the singing voice.
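The phoneme-to-unit expansion step can be sketched as below. The "#" silence marker and diphone-style naming are assumptions for illustration; the patent does not specify the library's unit inventory or naming:

```python
def to_unit_sequence(phonemes):
    """Expand a syllable's phoneme list into the speech-unit sequence the
    synthesizer would look up in the library: a transition unit into each
    phoneme, followed by the steady phoneme itself."""
    units = []
    prev = "#"                         # '#' marks preceding silence
    for ph in phonemes:
        units.append(f"{prev}-{ph}")   # transition (e.g. silence-to-/s/)
        units.append(ph)               # stationary part of the phoneme
        prev = ph
    return units

# The syllable "sa" expands to silence->s, s, s->a, a.
units = to_unit_sequence(["s", "a"])
```

The selected units would then be concatenated and resampled or otherwise pitch-shifted to the designated pitch, which the sketch leaves out.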
  • In this example the singing voice is output from the speaker 172 and the accompaniment sound from the separate speaker 174, but the singing voice and the accompaniment sound may instead be mixed and output from the same speaker.
  • When a song is selected, the control unit 120 reads the corresponding music data from the database 130. It interprets the accompaniment data in that music data and supplies musical tone information for the accompaniment to the sound source unit 160, which synthesizes the musical tone signal; in parallel, in synchronization with the progress of the performance, it supplies the lyrics data of the music data to the voice synthesis unit 140, which synthesizes the singing voice signal.
  • That is, when a performance is started, the singing voice synthesizing apparatus 10 executes two processes independently of each other: first, a musical tone synthesis process that synthesizes musical tone signals as the performance progresses, and second, a singing voice synthesis process that synthesizes the singing voice from the lyrics data supplied as the performance progresses.
  • The musical tone synthesis process is one in which the control unit 120 supplies musical tone information as the performance progresses and the sound source unit 160 synthesizes a musical tone signal based on that information. This process itself is well known (see, for example, JP-A-7-199975), so its description is omitted here.
  • When a song is selected via the operation unit 112, the control unit 120 automatically starts supplying the accompaniment data and lyrics data of that song; this serves as the instruction to start the performance. If the performance of another song is already in progress when a song is selected, however, the control unit 120 waits until that song ends before starting the performance of the selected song.
  • FIG. 3 is a flowchart showing the singing voice synthesis process.
  • This singing voice synthesis process is executed by the control unit 120 and the voice synthesis unit 140.
  • In this singing voice synthesis process, the control unit 120 first determines whether the current stage of the performance is a singing timing (step Sa11). If it is not (if the determination in step Sa11 is "No"), the control unit 120 returns to step Sa11; in other words, the process waits at step Sa11 until the performance reaches a singing timing. If the performance has reached a singing timing (if the determination in step Sa11 is "Yes"), the control unit 120 supplies the relevant lyrics data, that is, the data defining the characters and pitch to be sung at that timing, to the voice synthesis unit 140 (step Sa12).
  • When the lyrics data is supplied from the control unit 120, the voice synthesis unit 140 synthesizes voice based on it, but controls the pitch and volume as follows (step Sa13). If the volume indicated by the volume data supplied from the volume detection unit 108 is at or below a threshold, the voice synthesis unit 140 synthesizes the characters of the lyrics data at the pitch given by the lyrics data and at the volume indicated by the volume data, and outputs the result as a singing voice signal. This threshold is a small value, so when the volume indicated by the volume data is at or below it, the singing voice signal, even if output from the speaker 172, is at a level that is negligible to the ear.
  • On the other hand, if the volume exceeds the threshold, the voice synthesis unit 140 changes the pitch of the lyrics data supplied from the control unit 120 to the pitch indicated by the pitch data supplied from the pitch detection unit 104, synthesizes the characters of the lyrics data at the volume indicated by the volume data supplied from the volume detection unit 108, and outputs the result as a singing voice signal. The singing voice signal heard from the speaker 172 is therefore the characters of the lyrics data synthesized at the pitch sung by the singer and with volume changes that follow the changes in the singer's volume.
  • In step Sa14, the control unit 120 determines whether any lyrics data remains to be sung. If some remains (if the determination in step Sa14 is "No"), the control unit 120 returns to step Sa11, so that steps Sa12 and Sa13 are executed when the performance reaches the next singing timing. When no lyrics data remains (if the determination in step Sa14 is "Yes"), the control unit 120 ends the singing voice synthesis process.
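The loop of steps Sa11 to Sa14, including the volume-threshold branch of step Sa13, can be condensed into a sketch. The threshold value, pitches, and volumes below are hypothetical, and real-time waiting is replaced by iterating over precollected per-timing measurements:

```python
def singing_synthesis(lyric_events, frames, threshold=0.05):
    """Sketch of the FIG. 3 loop: at each singing timing, take the
    syllable and notated pitch from the lyrics data (Sa12), then let the
    detected pitch and volume of the input voice override them when the
    detected volume exceeds the threshold (Sa13), until no lyrics data
    remains (Sa14)."""
    output = []
    for (syllable, note_pitch), (sung_pitch, volume) in zip(lyric_events, frames):
        if volume <= threshold:
            # inaudibly quiet input: keep the notated pitch, negligible level
            output.append((syllable, note_pitch, volume))
        else:
            # audible input: follow the singer's pitch and volume instead
            output.append((syllable, sung_pitch, volume))
    return output

events = [("sa", 440.0), ("ku", 440.0)]          # (syllable, notated pitch)
frames = [(452.0, 0.8), (430.0, 0.01)]           # (detected pitch, detected volume)
result = singing_synthesis(events, frames)
```

The first syllable follows the singer's 452 Hz because the volume is above threshold; the second keeps the notated pitch at a negligible level.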
  • FIG. 4 is a diagram showing a specific synthesis example of singing voice. This figure is an example in the case where “Sakura” (see FIG. 2) is selected as a song sung by the singer.
  • In this case, the singing voice shown in (c) of the figure is output. That is, when the singer sings with rising volume at a timing slightly delayed from the start of "sa" (lyric character 51) relative to the progress of the performance, the voice synthesis unit 140 adjusts the amplitude of the singing voice signal according to the volume supplied from the volume detection unit 108, so the "sa" (reference numeral 61) of the singing voice shown in (c) does not fall on the correct singing timing defined by the lyrics data in (a) (lyric character 51).
  • Note that the singing voice synthesizing apparatus 10 of the first embodiment uses only the singer's pitch and volume when synthesizing the singing voice. Therefore, even if the singer (user) sings in a scat or humming style, for example "Ah, Ah" instead of "Sakura, Sakura", the singing voice synthesized by the apparatus still carries the correct lyrics "Sakura, Sakura".
  • In the first embodiment, a singing voice with a voice quality different from the singer's is output while reflecting the singer's intentions (pitch and volume), so the singer's expressive range is expanded and a new kind of singing experience becomes possible. Note, however, that only the pitch and volume of the singing are reflected; the singer's actual singing voice itself is not used at all.
  • <Second Embodiment> The second embodiment described below is configured so that a chorus is formed from the singer's actual singing voice together with synthesized singing voices. It can be summarized as follows: the singer's actual singing voice serves as the root, while voices a third above and a fifth above the root are synthesized, so that a triad sounds even though the singer is singing alone.
  • FIG. 5 is a functional block diagram showing the configuration of the singing voice synthesizing apparatus 10 according to the second embodiment.
  • The singing voice synthesizing apparatus 10 shown in this figure differs from the first embodiment shown in FIG. 1 in that pitch conversion units 106a and 106b are provided, two voice synthesis units 140a and 140b are provided, and a mixer 150 is provided. The description of the second embodiment therefore focuses on these differences.
  • The pitch conversion unit 106a converts the pitch indicated by the pitch data supplied from the pitch detection unit 104 into a pitch in a predetermined relationship to it, for example a third above. Similarly, the pitch conversion unit 106b converts the pitch into, for example, a fifth above. A third above the root may be a minor third or a major third, and a fifth above the root may be a perfect fifth, a diminished fifth, or an augmented fifth.
  • For this reason, the pitch conversion units 106a and 106b may tabulate in advance the converted pitch for each root pitch, and convert the pitch indicated by the pitch data supplied from the pitch detection unit 104 by referring to this table.
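The interval conversion can be sketched with semitone offsets under equal temperament (an assumption; the patent only names the intervals, and which third or fifth applies for a given root would come from the precomputed table):

```python
# Semitone offsets for the intervals the text mentions.
INTERVALS = {
    "minor_third": 3,
    "major_third": 4,
    "diminished_fifth": 6,
    "perfect_fifth": 7,
    "augmented_fifth": 8,
}

def shift_pitch(freq_hz, semitones):
    """Convert a detected pitch to one a fixed interval above it;
    in equal temperament one semitone is a frequency ratio of 2**(1/12)."""
    return freq_hz * 2.0 ** (semitones / 12.0)

# From an A4 root (440 Hz): a major third (~554.4 Hz) and a perfect
# fifth (~659.3 Hz) above, as unit 106a / 106b might supply.
third = shift_pitch(440.0, INTERVALS["major_third"])
fifth = shift_pitch(440.0, INTERVALS["perfect_fifth"])
```

A lookup table keyed on the quantized root pitch, as the paragraph suggests, avoids recomputing the ratio per frame and lets the choice between minor/major third (or the three fifths) depend on the root.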
  • The voice synthesis units 140a and 140b are functionally the same as the voice synthesis unit 140 of the first embodiment and receive the same lyrics data from the control unit 120, but the voice synthesis unit 140a is given the pitch converted by the pitch conversion unit 106a, while the voice synthesis unit 140b is given the pitch converted by the pitch conversion unit 106b.
  • The mixer 150 mixes the singing voice signal from the voice input unit 102, the singing voice signal from the voice synthesis unit 140a, and the singing voice signal from the voice synthesis unit 140b. The mixed singing voice signal is converted into an analog signal by a D/A converter (not shown) and then output acoustically through the speaker 172.
  • FIG. 6 is a diagram showing a specific synthesis example of the singing voice according to the second embodiment.
  • This is an example in which "Sakura" (see FIG. 2) is selected as the song, and the singer, listening to the accompaniment as the performance progresses, sings the lyrics indicated by reference numerals 71, 72, 73, ... at the pitches indicated by the keyboard in the left column of the figure, that is, at the pitches and singing timings of the score (lyrics data) shown in the upper part of the figure. In this case, the voice synthesis unit 140a synthesizes voice at a pitch a third above the pitch of the singer's singing, as indicated by reference numerals 61a, 62a, 63a, ..., and the voice synthesis unit 140b synthesizes voice at a pitch a fifth above the pitch of the singer's singing, as indicated by reference numerals 61b, 62b, 63b, ....
  • For example, reference numeral 61a is a minor third above reference numeral 71, and reference numeral 61b is a major third above reference numeral 61a, so reference numerals 71, 61a and 61b form a minor triad. Reference numerals 72, 62a and 62b likewise form a minor triad. Reference numeral 63a is a minor third above reference numeral 73, and reference numeral 63b is a minor third above reference numeral 63a, so reference numerals 73, 63a and 63b form a diminished triad. In this way, when the singer sings at a volume exceeding the threshold and at the pitches and timings shown in the figure, the speaker 172 outputs singing voices forming a triad with the singer's own singing as the root.
  • Note that the voice synthesis is not limited to two systems: a single system converting to one pitch in a predetermined relationship may be used, or three or more systems may be used. In the second embodiment, the singer's own singing voice and the singing voices from the voice synthesis units 140a and 140b are mixed and output from the speaker 172, while the accompaniment sound from the sound source unit 160 is output from the separate speaker 174.
  • The pitch conversion unit 106a converts the pitch indicated by the pitch data supplied from the pitch detection unit 104 into a pitch in a predetermined relationship, and this relationship may be changed by an instruction from the control unit 120 or the operation unit 112. The same applies to the pitch conversion unit 106b: the pitch relationship into which it converts may be changed by an instruction from the control unit 120 or the operation unit 112.
  • <Third Embodiment> In the first embodiment, when the performance reaches a singing timing, the data (characters, pitch) to be sung at that timing is supplied from the lyrics data to the voice synthesis unit 140; from the singer's point of view, the timing of the synthesized lyrics could not be controlled. In the third embodiment described below, by contrast, the singer can control the timing of the synthesized lyrics to some extent.
  • FIG. 7 is a functional block diagram showing the configuration of the singing voice synthesizing apparatus 10 according to the third embodiment.
  • The singing voice synthesizing apparatus 10 shown in this figure differs from the first embodiment shown in FIG. 1 in that the volume data output from the volume detection unit 108 is supplied to the control unit 120 as well as to the voice synthesis unit 140. The description of the third embodiment therefore focuses on this difference.
  • In the third embodiment, the control unit 120 supplies the lyrics data corresponding to the next note to the voice synthesis unit 140, triggered by the volume indicated by the volume data supplied from the volume detection unit 108 exceeding a threshold, or by the temporal change in that volume exceeding a predetermined value. That is, even if the performance has not yet reached the singing timing of that lyrics data, the control unit 120 supplies the lyrics data corresponding to the next note to the voice synthesis unit 140 when, for example, the singer's singing volume exceeds the threshold.
  • A specific synthesis example of the singing voice according to the third embodiment will now be described with reference to FIG. 4. Suppose the singer selects "Sakura" as the song to sing and, listening to the accompaniment, sings at the volume shown in (b) of the figure as the performance progresses. In the third embodiment, a singing voice is then output as shown in (d) of the figure.
  • For example, the control unit 120 supplies the lyrics data of the next "sa" (reference numeral 54) to the voice synthesis unit 140 in response to the change in the volume data supplied from the volume detection unit 108, so that "sa" (reference numeral 64) is synthesized at a timing earlier than the singing timing defined by the lyrics data.
  • The supply of lyrics data was triggered here by the volume indicated by the volume data exceeding a threshold, or by the temporal change in the volume exceeding a predetermined value; it may instead be triggered when the slope (acceleration) of the temporal change of the volume exceeds a predetermined value.
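The trigger conditions just described can be sketched over a sequence of volume frames; the threshold and slope limit values are hypothetical, as the patent does not fix them:

```python
def lyric_trigger(volumes, threshold=0.3, slope_limit=0.15):
    """Return the frame indices at which the next syllable would be
    released: when the volume crosses the threshold upward, or when its
    frame-to-frame change (slope) exceeds slope_limit."""
    triggers = []
    prev = 0.0
    for i, v in enumerate(volumes):
        crossed = prev <= threshold < v        # upward threshold crossing
        steep = (v - prev) > slope_limit       # sudden swell in volume
        if crossed or steep:
            triggers.append(i)
        prev = v
    return triggers

# A quiet passage, a sudden swell, a dip, then another rise.
hits = lyric_trigger([0.0, 0.05, 0.5, 0.55, 0.1, 0.35])
```

Each returned index is a point where the control unit would hand the next note's lyrics data to the voice synthesis unit ahead of (or at) its notated timing.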
  • Alternatively, the pitch data output from the pitch detection unit 104 may also be supplied to the control unit 120, and while, for example, the pitch indicated by that pitch data continues unchanged, the control unit 120 may withhold the next lyrics data from the voice synthesis unit 140 for a predetermined time (or until the volume drops). With this configuration, the singer can have the current syllable of the synthesized singing voice sustained beyond the timing defined by the lyrics data.
  • In the third embodiment, the singer can thus control the timing of the synthesized lyrics to some extent, rather than following the timing defined by the lyrics data, and can therefore improvise (ad-lib) changes to the timing of the synthesized singing.
  • The third embodiment is not restricted to being combined with the first embodiment; it may also be combined with the second embodiment, which mixes the singer's own singing with the synthesized singing.
  • In the embodiments, the control unit 120 supplies the lyrics data (characters and pitches) corresponding to a singing timing to the voice synthesis unit 140 when the performance reaches that timing; however, the control unit 120 need not supply the pitch to the voice synthesis unit 140. This is because the voice synthesis unit 140 outputs substantially no singing voice signal when the volume indicated by the volume data is at or below the threshold, and when the volume exceeds the threshold it replaces the pitch of the lyrics data with the pitch indicated by the pitch data output from the pitch detection unit 104.
  • In other words, the voice synthesis unit 140 may synthesize the characters of the lyrics data supplied from the control unit 120 at the pitch indicated by the pitch data of the input voice and at a volume corresponding to the input voice, whenever the volume indicated by the volume data of the input voice exceeds the threshold.
  • In the embodiments, MIDI data is used as the accompaniment data, but the present invention is not limited to this; for example, the musical tone signal may be obtained by playing back a compact disc. In that case, elapsed-time information and remaining-time information can be used to determine the progress of the performance, and the control unit 120 need only supply the lyrics data to the voice synthesis unit 140 (140a, 140b) according to the progress so determined.
  • the voice input unit 102 is configured to input a singer's singing with a microphone and convert it into a singing voice signal.
  • the singing voice signal (input voice) is input or inputted in some form. Any configuration can be used.
  • the voice input unit 102 may be configured to input a singing voice signal processed by another processing unit, a singing voice signal supplied (or transferred) from another device, or simply a singing voice. It may be an input interface circuit that receives a signal and transfers it to a subsequent stage.
  • the input voice is not limited to a voice uttered by the user of the singing voice synthesizing apparatus, but may be a voice uttered by another person (a friend or a third party).
  • the pitch detection unit 104, the pitch conversion units 106a and 106b, and the volume detection unit 108 are implemented in software, but may be implemented in hardware. Likewise, the voice synthesis unit 140 (140a, 140b) may be implemented in software. Furthermore, in addition to controlling the pitch and volume of the synthesized singing voice according to the pitch and volume of the input voice, other voice elements such as timbre may be controlled according to the pitch and/or volume of the input voice.
  • the processor according to the present invention is not limited to a processor that executes a software program, such as the CPU described in the above embodiments; it may be a processor that executes a microprogram, such as a DSP, or a processor built from dedicated hardware circuits (an integrated circuit or a group of discrete circuits) so as to realize the desired processing functions.

Landscapes

  • Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Reverberation, Karaoke And Other Acoustics (AREA)

Abstract

A user arbitrarily sings the melody of a given song with the lyrics of that song, made-up lyrics, scat syllables, etc. A pitch detection unit (104) receives the voice of the singing user as input voice and detects its pitch. A volume detection unit (108) detects the volume of the input voice. A voice synthesis unit (140) synthesizes an artificial singing voice on the basis of lyrics data supplied according to the progress of the performance, and controls the pitch and volume of the synthesized singing voice according to the detected pitch and volume of the input voice. The synthesized singing voice is acoustically output. An artificial singing voice is thus generated according to the pitch the user vocalizes.

Description

Singing voice synthesis
The present invention relates to an apparatus and a method for synthesizing a singing voice, and further to a non-transitory computer-readable storage medium storing a program executable by a processor for implementing the method.
Conventionally, the following technique is known for converting a singer's (user's) singing voice into another person's singing voice: formant sequence data obtained when a specific person (for example, the original singer) sang is stored in advance as source data, and when the singer's (user's) singing voice is converted, a singing voice is synthesized by shaping formants based on the original singer's formant sequence in accordance with the pitch and volume of the singer's (user's) singing voice (see, for example, Patent Document 1).
JP-A-10-268895
In the above technique, since formants based on the original singer's formant sequence data are shaped, the influence of the original singer's way of singing inevitably remains in the output singing voice. As a result, the singing user cannot obtain a sufficient or varied singing experience. On the other hand, it is also known to synthesize a singing voice artificially by voice synthesis technology. However, conventional artificial singing voice synthesis techniques synthesize a singing voice corresponding to lyrics data that the user enters with a keyboard or the like. The user therefore cannot intervene in the artificial singing voice synthesis with his or her own voice, and the sense of participation is poor.
The present invention has been made in view of the above circumstances, and an object thereof is to provide an unprecedented experience in the technology of generating, from an input voice (for example, a user's singing voice), a singing voice having a voice quality different from that of the input voice. Another object is to prevent the synthesized singing voice from being influenced by the original singer's way of singing.
To achieve the above objects, a singing voice synthesizing apparatus according to the present invention includes: a pitch detection unit that detects the pitch of an input voice; a volume detection unit that detects the volume of the input voice; and a voice synthesis unit that synthesizes a singing voice based on lyrics data supplied in accordance with the progress of a performance, the voice synthesis unit controlling the pitch and volume of the synthesized singing voice in accordance with the pitch detected by the pitch detection unit and the volume detected by the volume detection unit.
According to this configuration, while a singing voice is artificially synthesized based on the lyrics data, the pitch and volume of the synthesized singing voice are controlled in accordance with the pitch and volume detected from the input voice. There is therefore no notion of an original singer's way of singing, and the synthesized singing voice is not influenced by it. Furthermore, since the singing voice is synthesized with a voice quality different from the singer's (user's) while reflecting the pitch and volume of the singer's (user's) singing, the singer can broaden his or her singing expression and enjoy a new singing experience.
In one embodiment, the voice synthesis unit may synthesize an artificial singing voice corresponding to the characters of the lyrics data using voice segment data stored in a library.
The voice synthesis unit may synthesize the singing voice, for example, at the same pitch as the pitch detected by the pitch detection unit, or at a pitch shifted in a predetermined relationship from the detected pitch. Likewise, the voice synthesis unit may synthesize the singing voice at the same volume as the volume detected by the volume detection unit, at a volume having a predetermined relationship with the detected volume, or in accordance with the detected volume when it exceeds a threshold.
In one embodiment, the apparatus may further include a sound source unit that generates an accompaniment sound in accordance with the progress of the performance, and an output unit that acoustically outputs the accompaniment sound and the singing voice. According to this configuration, the singing voice synthesized by the voice synthesis unit and the accompaniment sound corresponding to the progress of the performance are output, allowing the singer to experience a new way of singing. The output unit may further output the input voice acoustically.
In an embodiment, the voice synthesis unit may synthesize the singing voice in accordance with the utterance timing defined in the lyrics data. Alternatively, the voice synthesis unit may synthesize the singing voice while changing the singing timing of the lyrics data in accordance with the volume detected by the volume detection unit. According to the latter configuration, the singer can control the synthesized lyric voice to some extent, rather than it following exactly the utterance timing defined by the lyrics data. This makes it possible to vary the timing of the synthesized singing in an improvised (ad-lib) manner.
The present invention can be embodied not only as a singing voice synthesizing apparatus, but also as a computer-implemented method, and further as a non-transitory computer-readable storage medium storing a program executable by a processor for carrying out the method.
FIG. 1 is a functional block diagram showing the configuration of a singing voice synthesizing apparatus according to a first embodiment of the present invention.
FIG. 2 is a diagram showing lyrics data and the like in the singing voice synthesizing apparatus.
FIG. 3 is a flowchart showing a singing voice synthesis process in the singing voice synthesizing apparatus.
FIG. 4 is a diagram showing an output example of a singing voice in the singing voice synthesizing apparatus.
FIG. 5 is a functional block diagram showing the configuration of a singing voice synthesizing apparatus according to a second embodiment of the present invention.
FIG. 6 is a diagram showing an output example of a singing voice in the singing voice synthesizing apparatus.
FIG. 7 is a functional block diagram showing the configuration of a singing voice synthesizing apparatus according to a third embodiment of the present invention.
Hereinafter, embodiments of the present invention will be described with reference to the drawings.
<First Embodiment>
FIG. 1 is a functional block diagram showing the configuration of a singing voice synthesizing apparatus 10 according to the first embodiment. In this figure, the singing voice synthesizing apparatus 10 is a computer such as a notebook or tablet computer and includes a voice input unit 102, a pitch detection unit 104, a volume detection unit 108, an operation unit 112, a control unit 120, a database 130, a voice synthesis unit 140, a sound source unit 160, and speakers 172 and 174. Among these functional blocks, the voice input unit 102, the operation unit 112, the voice synthesis unit 140, and the speakers 172 and 174, for example, are implemented in hardware, while the pitch detection unit 104, the volume detection unit 108, the control unit 120, the database 130, and the sound source unit 160 are implemented by a CPU (Central Processing Unit, not shown) executing a pre-installed application program. Although not shown, the singing voice synthesizing apparatus 10 also has a display unit that allows the user to check the status and settings of the apparatus.
Although details are omitted, the voice input unit 102 includes a microphone that converts the singing voice of a singer (user) into an electrical singing voice signal, an LPF (low-pass filter) that cuts the high-frequency components of the converted singing voice signal, and an A/D converter that converts the filtered singing voice signal into a digital signal.
The pitch detection unit 104 performs frequency analysis on the singing voice signal (input voice) converted into a digital signal, and outputs, in near real time, pitch data indicating the pitch (frequency) obtained by the analysis. For the frequency analysis, FFT (Fast Fourier Transform) or other known methods can be used.
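Since the description leaves the frequency analysis to FFT "or other known methods," here is a minimal, illustrative sketch of one such known method: frame-based pitch estimation by time-domain autocorrelation. This is not the patent's implementation; the function name, frame size, and search range are assumptions for illustration only.

```python
import math

def detect_pitch(samples, sample_rate, fmin=80.0, fmax=1000.0):
    """Estimate the fundamental frequency of one frame by finding the
    lag (within the vocal range) that maximizes the autocorrelation."""
    n = len(samples)
    lag_min = int(sample_rate / fmax)
    lag_max = min(int(sample_rate / fmin), n - 1)
    best_lag, best_corr = 0, 0.0
    for lag in range(lag_min, lag_max + 1):
        corr = sum(samples[i] * samples[i + lag] for i in range(n - lag))
        if corr > best_corr:
            best_corr, best_lag = corr, lag
    return sample_rate / best_lag if best_lag else 0.0

# Synthetic test frame: a 220 Hz sine sampled at 8 kHz.
rate = 8000
frame = [math.sin(2 * math.pi * 220 * t / rate) for t in range(1024)]
pitch = detect_pitch(frame, rate)  # close to 220 Hz (8000/36 ≈ 222.2)
```

In a real-time system this function would run repeatedly on short overlapping frames, producing the stream of pitch data that the pitch detection unit 104 supplies.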
The volume detection unit 108 detects the amplitude envelope of the singing voice signal, for example by filtering the digitized singing voice signal with a low-pass filter, and outputs, in near real time, volume data indicating the volume of the singing voice. The operation unit 112 receives operations by the singer, for example an operation of selecting the song to sing, and supplies information indicating the operation to the control unit 120. The database 130 stores music data for a plurality of songs. The music data for one song consists of accompaniment data that defines the accompaniment sound of the song in one or more tracks, and lyrics data indicating the lyrics of the song.
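One common way to realize the low-pass-filter approach described here is to rectify the signal and smooth it with a one-pole filter. The following sketch illustrates the idea; the smoothing coefficient is an assumed value, not one given in the patent.

```python
def volume_envelope(samples, alpha=0.05):
    """Rectify the signal, then smooth with a one-pole low-pass filter
    to track the amplitude envelope in (near) real time."""
    env, out = 0.0, []
    for s in samples:
        env += alpha * (abs(s) - env)  # y[n] = y[n-1] + alpha * (|x[n]| - y[n-1])
        out.append(env)
    return out

# A burst of full-scale samples followed by silence: the envelope rises
# toward 1.0 during the burst, then decays back toward 0.0.
signal = [1.0] * 200 + [0.0] * 200
env = volume_envelope(signal)
```

The resulting envelope values are the "volume data" that can then be compared against the threshold described later.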
In addition to managing the database 130, the control unit 120 functions as a sequencer while a performance is in progress. Functioning as a sequencer, the control unit 120 interprets the accompaniment data in the music data read from the database 130 and supplies musical tone information, which defines the musical tones to be generated, to the sound source unit 160 in chronological order as the performance progresses from its start. Here, accompaniment data conforming, for example, to the MIDI standard is used. When conforming to the MIDI standard, the accompaniment data is defined by combinations of events and durations indicating the time intervals between events. The control unit 120 therefore supplies the musical tone information indicating the content of an event to the sound source unit 160 each time the time indicated by a duration elapses. In other words, the control unit 120 advances the performance of the song by interpreting the accompaniment data and supplying musical tone information to the sound source unit 160. When interpreting the accompaniment data, the control unit 120 also accumulates the durations from the start of the performance. From this accumulated value, the control unit 120 can grasp the progress of the performance, that is, which part of the song is being played.
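The duration accumulation described above can be sketched as follows (event strings and tick values are hypothetical placeholders; real MIDI parsing is omitted): converting (duration, event) pairs into absolute times yields both the playback schedule and, at any moment, the accumulated value that tells which part of the song is playing.

```python
def schedule(track):
    """Turn (duration, event) pairs into (absolute_time, event) pairs by
    accumulating the durations from the start of the performance."""
    elapsed, out = 0, []
    for duration, event in track:
        elapsed += duration          # accumulated duration = performance progress
        out.append((elapsed, event))
    return out

# Toy track: durations in ticks, events as placeholder strings.
track = [
    (0, "note_on C4"), (480, "note_off C4"),
    (0, "note_on E4"), (480, "note_off E4"),
]
timeline = schedule(track)
```

Comparing the current accumulated value against such a timeline is what lets the control unit 120 decide when each event (and, analogously, each singing timing) has arrived.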
The sound source unit 160 synthesizes a musical tone signal representing the accompaniment sound in accordance with the musical tone information supplied from the control unit 120. In the present embodiment, the accompaniment sound does not necessarily have to be output, so the sound source unit 160 is not essential. The musical tone signal output from the sound source unit 160 is converted into an analog signal by a D/A converter (not shown) and is then acoustically converted and output by the speaker 174.
In addition to supplying musical tone information to the sound source unit 160, the control unit 120 supplies lyrics data to the voice synthesis unit 140 as the performance progresses. The voice synthesis unit 140 synthesizes a singing voice in accordance with the lyrics data supplied from the control unit 120, the pitch data supplied from the pitch detection unit 104, and the volume data supplied from the volume detection unit 108, and outputs it as a singing voice signal. The singing voice signal output from the voice synthesis unit 140 is converted into an analog signal by a D/A converter (not shown) and is then acoustically converted and output by the speaker 172.
FIG. 2 is a diagram showing an example of lyrics data. In this example, the lyrics data of the song "Sakura" is shown together with its melody (the score displayed above the lyrics). Note that the copyright protection period of the song "Sakura" has already expired under Articles 51 and 57 of the Copyright Act of Japan.
As shown in the figure, the lyrics data includes a character string in which the character information of the lyrics to be sung is arranged in order from the start of the performance, and information defining the utterance timing of each character or character group constituting the lyrics (that is, the utterance timing of each syllable). As an example, the lyrics data may include character information indicating the lyrics, divided in time so that the temporal placement of each character or character group can be identified. The lyrics data may also include information associating each character of the lyrics with a note of the melody, that is, with the singing timing and the pitch at which the lyrics are to be sung. In the example of FIG. 2, one note is assigned to each of the lyric characters 51 to 57 (characters 51 through 57 are shown in the figure; the rest are omitted), but depending on the song (lyrics), multiple notes may be assigned to one character or character group (one syllable), or multiple characters or character groups (multiple syllables) may be assigned to one note. When the progress of the performance reaches the singing timing (utterance timing) indicated by a note, the control unit 120 supplies the voice synthesis unit 140 with data indicating the lyric character or character group corresponding to that note (that is, the characters constituting the syllable) and the pitch of the note.
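As a concrete (hypothetical) illustration of lyrics data of this kind, each syllable can be stored with its singing timing and the pitch of its assigned note. The field names, tick values, and MIDI note numbers below are assumptions for illustration, not the patent's actual format.

```python
# One entry per syllable: character(s), singing timing (ticks from the
# start of the performance), and the pitch of the assigned note.
lyrics_data = [
    {"chars": "sa", "onset": 0,   "pitch": 69},  # A4
    {"chars": "ku", "onset": 480, "pitch": 69},  # A4
    {"chars": "ra", "onset": 960, "pitch": 71},  # B4
]

def lyric_at(lyrics_data, elapsed):
    """Return the syllable whose singing timing equals the current
    accumulated performance time, or None if no timing has arrived."""
    for entry in lyrics_data:
        if entry["onset"] == elapsed:
            return entry
    return None
```

With such a structure, the control unit can look up, at each tick of progress, whether a syllable and its note pitch are due to be handed to the voice synthesis unit.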
Whether the progress of the performance has reached a singing timing can be determined by the control unit 120 if the accumulated duration value used in interpreting the accompaniment data is associated in advance with the singing timings of the lyrics data: the control unit 120 checks whether the accumulated value has reached the value associated with a singing timing. On the other hand, when no accompaniment sound is output (when no accompaniment data is used), the progress of the performance cannot be grasped from the accumulated durations of the accompaniment data. In that case, the singing timings of the lyrics may be defined, like the accompaniment data, by events (lyric singing events) and durations indicating the time intervals between them, and whether a singing timing has arrived may be determined by whether an event to be sung has arrived in the lyrics data.
In FIG. 1, the voice synthesis unit 140 synthesizes the characters of the lyrics data supplied from the control unit 120 using voice segment data registered in a library (not shown). In this library, voice segment data defining the waveforms of various voice segments that serve as the material of the singing voice, such as single phonemes and transitions from one phoneme to another, is registered in advance. Specifically, the voice synthesis unit 140 converts the phoneme sequence indicated by the characters of the supplied lyrics data into a sequence of voice segments, selects the voice segment data corresponding to these segments from the library, concatenates them, and converts the pitch of each piece of concatenated voice segment data to the designated pitch, thereby synthesizing a singing voice signal representing the singing voice. The pitch and volume of the singing voice in the voice synthesis unit 140 are described later.
In this embodiment, the singing voice is output from the speaker 172 and the accompaniment sound from the speaker 174 separately, but the singing voice and the accompaniment sound may instead be mixed and output from the same speaker.
Next, the operation of the singing voice synthesizing apparatus 10 according to this embodiment will be described. In this apparatus, when the singer operates the operation unit 112 to select a desired song, the control unit 120 reads the music data corresponding to the song from the database 130. It interprets the accompaniment data of the music data and supplies the musical tone information of the accompaniment sounds to be synthesized to the sound source unit 160, causing the sound source unit 160 to synthesize the musical tone signal; meanwhile, it supplies the lyrics data of the music data to the voice synthesis unit 140 as the performance progresses, causing the voice synthesis unit 140 to synthesize the singing voice signal. That is, when a performance starts in the singing voice synthesizing apparatus 10, two processes run independently of each other: first, a musical tone synthesis process that synthesizes the musical tone signal as the performance progresses, and second, a singing voice synthesis process that synthesizes the singing voice by supplying lyrics data as the performance progresses. Of these, the musical tone synthesis process, in which the control unit 120 supplies musical tone information as the performance progresses while the sound source unit 160 synthesizes the musical tone signal based on that information, is itself well known (see, for example, JP-A-7-199975). Its details are therefore omitted, and only the singing voice synthesis process is described below.
When a song is selected via the operation unit 112, the control unit 120 automatically starts supplying the accompaniment data and lyrics data of the song, which constitutes an instruction to start the performance of the song. However, even when a song is selected, if another song is still being performed, the control unit 120 makes the selected song wait until the other song ends.
FIG. 3 is a flowchart showing the singing voice synthesis process. This process is executed by the control unit 120 and the voice synthesis unit 140. When the performance starts, the control unit 120 first determines whether the progress stage of the performance is a singing timing (step Sa11).
If it determines that the progress stage of the performance is not a singing timing (if the result of step Sa11 is "No"), the control unit 120 returns the procedure to step Sa11. In other words, it waits at step Sa11 until the progress stage of the performance reaches a singing timing. On the other hand, if it determines that the progress stage of the performance has reached a singing timing (if the result of step Sa11 is "Yes"), the control unit 120 supplies the lyrics data, that is, the data defining the character and pitch to be sung at that singing timing, to the voice synthesis unit 140 (step Sa12).
When lyrics data is supplied from the control unit 120, the voice synthesis unit 140 synthesizes a voice based on that lyrics data, controlling the pitch and volume as follows (step Sa13). If the volume indicated by the volume data supplied from the volume detection unit 108 is at or below a threshold, the voice synthesis unit 140 synthesizes the characters of the lyrics data at the pitch of the lyrics data and at the volume indicated by the volume data supplied from the volume detection unit 108, and outputs the result as a singing voice signal. This threshold is a small value; therefore, when the volume indicated by the volume data is at or below the threshold, even if the singing voice signal is output from the speaker 172, it is output at a level that is negligible to the ear.
On the other hand, when lyrics data is supplied from the control unit 120 and the volume indicated by the volume data is greater than the threshold, the voice synthesis unit 140 changes the pitch of the lyrics data supplied from the control unit 120 to the pitch indicated by the pitch data supplied from the pitch detection unit 104, synthesizes the characters of the lyrics data at the volume indicated by the volume data supplied from the volume detection unit 108, and outputs the result as a singing voice signal. The singing voice signal heard from the speaker 172 is therefore a synthesis of the characters of the lyrics data at the pitch at which the singer sang, with volume changes that follow the changes in the volume at which the singer sang.
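The pitch and volume decision of step Sa13 amounts to the following rule. The threshold value and the function signature are illustrative assumptions; the description only says the threshold is "a small value."

```python
THRESHOLD = 0.05  # "a small value" per the description; exact value assumed

def synthesis_pitch_and_volume(lyric_pitch, input_pitch, input_volume):
    """Step Sa13: at or below the threshold, keep the pitch from the lyrics
    data (the output is inaudibly quiet anyway); above it, replace it with
    the pitch detected from the singer's input voice. The detected volume
    is used in both cases, so the output follows the singer's dynamics."""
    if input_volume <= THRESHOLD:
        return lyric_pitch, input_volume
    return input_pitch, input_volume
```

For example, with a quiet input the score pitch is kept, while a loud input pulls the synthesized pitch to the singer's detected pitch.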
Meanwhile, after supplying the lyrics data whose singing timing has arrived to the voice synthesis unit 140, the control unit 120 determines whether there is no lyrics data to be sung next (step Sa14). If there is (if the result of step Sa14 is "No"), the control unit 120 returns the procedure to step Sa11. As a result, the processing of steps Sa12 and Sa13 is executed when the progress of the performance reaches the next singing timing. Finally, if there is no data to be sung next (if the result of step Sa14 is "Yes"), the control unit 120 ends the singing voice synthesis process.
FIG. 4 is a diagram showing a specific synthesis example of a singing voice, for the case where the singer has selected "Sakura" (see FIG. 2) as the song to sing. When the singer, listening to the accompaniment and following the progress of the performance, sings at the volume shown in (b), this embodiment outputs the singing voice as shown in (c). That is, when the singer raises the volume at a timing slightly later than the beginning of "sa" (lyric character 51) relative to the progress of the performance, the voice synthesis unit 140 adjusts the amplitude of the singing voice signal to that volume only once the volume indicated by the volume data supplied from the volume detection unit 108 exceeds the threshold; the "sa" (61) of the singing voice shown in (c) therefore does not follow the correct singing timing defined by the lyrics data (lyric character 51) in (a).
 Likewise, when the singer lowers the volume between "ku" (lyric character 52) and "ra" (lyric character 53) relative to the progress of the performance (or moves the microphone of the voice input unit 102 away from the mouth), a gap appears between "ku" (symbol 62) and "ra" (symbol 63-1) in the singing voice of (c). When the singer lowers the volume in the middle of "ra" (lyric character 53), "ra" in the singing voice of (c) is, for the same reason, split into symbols 63-1 and 63-2. Note that the later "ra" (symbol 63-2) is written as "ra" for convenience of explanation, but is actually heard as "a", the vowel of "ra".
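 The gating behavior just described can be sketched in a few lines of Python. The threshold value and the frame volumes below are illustrative assumptions, not figures taken from the specification.

```python
THRESHOLD = 0.1  # assumed minimum input volume for the synthesizer to sound

def gate_amplitude(volume_frames, threshold=THRESHOLD):
    """Follow the input volume while it exceeds the threshold;
    output silence (0.0) otherwise."""
    return [v if v > threshold else 0.0 for v in volume_frames]

# A late entry and a mid-note fade produce a delayed onset and a gap,
# analogous to symbols 61 and 62/63-1 in FIG. 4(c).
print(gate_amplitude([0.0, 0.05, 0.4, 0.6, 0.05, 0.0, 0.5]))
# -> [0.0, 0.0, 0.4, 0.6, 0.0, 0.0, 0.5]
```

 Because the output amplitude simply tracks the gated input volume, any frame where the singer is too quiet yields silence, which is exactly why the onsets and gaps in FIG. 4(c) deviate from the score.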
 The example of FIG. 4 has been explained from the viewpoint of how the singing voice is synthesized in response to the volume at which the singer sings. The example does not show at what pitch the singing voice is synthesized in response to the pitch at which the singer sings, but this requires no particularly detailed explanation: the voice is synthesized at the pitch of the input voice produced by the singer (user).
 The singing synthesis apparatus 10 of the first embodiment uses only the singer's pitch and volume when synthesizing the singing voice. Therefore, even if the singer (user) does not sing the lyrics "Sakura, Sakura..." but instead sings scat- or humming-style syllables such as "ah, ah, ah...", or sings nonsense lyrics, the singing voice synthesized by the singing synthesis apparatus 10 carries the correct lyrics "Sakura, Sakura...".
 When formant sequence data as described in the background art is used, data must be collected from the original singer's performance. Moreover, because the formants based on the formant sequence data are shaped according to the pitch and volume at which the singer sings, the result inevitably reflects the original singer's style of singing. In contrast, the present embodiment synthesizes the singing voice from a library of speech segments, so it is not influenced by how any model person sings, and no model person needs to sing the song in the first place; in addition, the singing voice can be synthesized faithfully to the pitch and volume at which the singer (user) actually sang on the spot. Furthermore, according to the present embodiment, a singing voice synthesized with a voice quality different from the singer's own is output while reflecting the singer's intention (pitch and volume), so the singer's expressive range is extended and a new singing experience is provided.
<Second Embodiment>
 The first embodiment synthesizes the singing voice by reflecting the pitch and volume of the singer's singing; information other than pitch and volume, in short the singer's actual singing voice itself, is not used at all. In contrast, the second embodiment described next is configured so that the singer's actual singing voice and the synthesized singing voice form a chorus. In outline, the second embodiment takes, for example, the singer's actual singing voice as the root note and synthesizes one voice a third above that root and another a fifth above it, so that the singer harmonizes in a triad even though singing alone.
 FIG. 5 is a functional block diagram showing the configuration of the singing synthesis apparatus 10 according to the second embodiment. The apparatus shown in this figure differs from the first embodiment of FIG. 1 in that pitch conversion units 106a and 106b are provided, that two voice synthesis units 140a and 140b are provided, and that a mixer 150 is provided. The description of the second embodiment therefore focuses on these differences.
 The pitch conversion unit 106a converts the pitch indicated by the pitch data supplied from the pitch detection unit 104 into a pitch in a predetermined relationship to it, for example a pitch a third above, and supplies the result to the voice synthesis unit 140a. The pitch conversion unit 106b converts the pitch indicated by the pitch data supplied from the pitch detection unit 104 into a pitch in another predetermined relationship, for example a pitch a fifth above, and supplies the result to the voice synthesis unit 140b. Note that a third above the root may be a minor third or a major third, and a fifth above the root may be a perfect fifth, a diminished fifth, or an augmented fifth. Which interval applies is determined by the pitch of the root (and the key signature), so the pitch conversion units 106a and 106b may, for example, hold a precomputed table mapping each root pitch to its converted pitch and convert the pitch indicated by the pitch data supplied from the pitch detection unit 104 by referring to that table.
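 Such a conversion table can be sketched as follows. The choice of C major and MIDI note numbers is an illustrative assumption (the patent specifies neither); the offsets are the ordinary diatonic intervals of that key.

```python
# Semitone offset to the diatonic third above, keyed by pitch class in C major.
THIRD_ABOVE = {0: 4, 2: 3, 4: 3, 5: 4, 7: 4, 9: 3, 11: 3}   # C D E F G A B
# Semitone offset to the diatonic fifth above (B receives a diminished fifth).
FIFTH_ABOVE = {0: 7, 2: 7, 4: 7, 5: 7, 7: 7, 9: 7, 11: 6}

def transpose(midi_note, table):
    """Look up the diatonic offset for the note's pitch class and apply it."""
    return midi_note + table[midi_note % 12]

# Root E4 (MIDI 64): the third above is G4 (67) and the fifth above is B4 (71),
# so the three voices together form an E minor triad.
print(transpose(64, THIRD_ABOVE), transpose(64, FIFTH_ABOVE))
# -> 67 71
```

 The table lookup is what lets the same "a third above" instruction yield a minor third on some scale degrees and a major third on others, as the paragraph above requires.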
 The voice synthesis units 140a and 140b are functionally equivalent to the voice synthesis unit 140 of the first embodiment and receive the same lyric data from the control unit 120, but the voice synthesis unit 140a is given the pitch converted by the pitch conversion unit 106a, while the voice synthesis unit 140b is given the pitch converted by the pitch conversion unit 106b. The mixer 150 mixes the singing voice signal from the voice input unit 102, the singing voice signal from the voice synthesis unit 140a, and the singing voice signal from the voice synthesis unit 140b. The mixed singing voice signal is converted into an analog signal by a D/A conversion unit (not shown) and then acoustically converted and output by the speaker 172.
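 The mixer's role can be sketched as plain sample-wise summation with hard clipping; the list-based buffers here are an illustrative stand-in for real PCM audio frames.

```python
def mix(*signals):
    """Mix equal-length sample sequences by summation, clamped to [-1, 1]."""
    return [max(-1.0, min(1.0, sum(samples))) for samples in zip(*signals)]

singer = [0.2, 0.5, -0.3]    # voice input unit 102
harmony3 = [0.1, 0.4, -0.2]  # voice synthesis unit 140a (third above)
harmony5 = [0.1, 0.3, -0.1]  # voice synthesis unit 140b (fifth above)
print(mix(singer, harmony3, harmony5))
```

 A production mixer would also apply per-channel gains before summing; the clamp merely prevents the summed signal from exceeding full scale.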
 FIG. 6 shows a specific example of singing voice synthesis according to the second embodiment. The figure illustrates the case where the singer selects "Sakura" (see FIG. 2) and, while listening to the accompaniment and following the progress of the performance, sings the lyrics indicated by symbols 71, 72, 73, ... at the pitches indicated by the keyboard in the left column of the figure, that is, at the pitches and singing timings of the score (lyric data) shown in the upper part of the figure. In this case, the voice synthesis unit 140a synthesizes a voice a third above the sung pitch, as indicated by symbols 61a, 62a, 63a, ..., and the voice synthesis unit 140b synthesizes a voice a fifth above the singer's sung pitch, as indicated by symbols 61b, 62b, 63b, ....
 In the example of FIG. 6, symbol 61a is a minor third above symbol 71 in C major, and symbol 61b is a major third above symbol 61a, so symbols 71, 61a, and 61b form a minor triad. Symbols 72, 62a, and 62b likewise form a minor triad. Symbol 63a is a minor third above symbol 73, and symbol 63b is a minor third above symbol 63a, so symbols 73, 63a, and 63b form a diminished triad. Thus, when the singer sings at a volume exceeding the threshold and at the pitches and timings of the score shown in the figure, the speaker 172 outputs a singing voice harmonized as a triad whose root is the singer's own singing.
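 The interval arithmetic above can be checked with a small classifier. The concrete MIDI notes used (E4 for symbol 71, B4 for symbol 73) are assumptions chosen to match the C-major description, not values given in the figure.

```python
def triad_quality(root, third, fifth):
    """Classify a three-note chord by its stacked semitone intervals."""
    lower, upper = third - root, fifth - third
    return {(3, 4): "minor", (4, 3): "major",
            (3, 3): "diminished", (4, 4): "augmented"}.get((lower, upper), "other")

# E-G-B (symbols 71, 61a, 61b): minor third then major third -> minor triad.
print(triad_quality(64, 67, 71))
# B-D-F (symbols 73, 63a, 63b): two stacked minor thirds -> diminished triad.
print(triad_quality(71, 74, 77))
```

 The classifier makes the point of the paragraph concrete: the triad quality is fully determined by the two stacked intervals, which is why the table-driven conversion of units 106a and 106b must pick its offsets per scale degree.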
 Thus, according to the second embodiment, the singer can harmonize despite singing alone, further expanding the singer's expressive range. The pitch conversion described above is merely one example: the conversion may produce something other than a chord, or may be an octave transposition. The voice synthesis units are not limited to two systems; a single unit converting to one pitch in a predetermined relationship may be used, or three or more units may be provided.
 In the second embodiment, the singer's singing voice and the singing voices of the voice synthesis units 140a and 140b are mixed and output from the speaker 172, while the accompaniment sound from the sound source unit 160 is output from a separate speaker 174; however, the singing voice and the accompaniment sound may instead be mixed and output from a single speaker. In other words, it does not matter whether the output unit that outputs the singing voice and the accompaniment sound consists of separate speakers or the same speaker. Furthermore, although the pitch conversion unit 106a converts the pitch indicated by the pitch data supplied from the pitch detection unit 104 into a pitch in a predetermined relationship to it, that relationship may be made changeable by an instruction from the control unit 120 or the operation unit 112. The same applies to the pitch conversion unit 106b: the pitch relationship it applies may be made changeable by an instruction from the control unit 120 or the operation unit 112.
<Third Embodiment>
 In the first embodiment, when the progress of the performance reaches a singing timing, the portion of the lyric data to be sung at that timing (characters and pitch) is supplied to the voice synthesis unit 140, so from the singer's point of view the timing of the synthesized lyrics cannot be controlled. In contrast, in the third embodiment described next, the singer can control the timing of the synthesized lyrics to some extent.
 FIG. 7 is a functional block diagram showing the configuration of the singing synthesis apparatus 10 according to the third embodiment. The apparatus shown in this figure differs from the first embodiment of FIG. 1 in that the volume data output from the volume detection unit 108 is supplied to the control unit 120 as well as to the voice synthesis unit 140. The description of the third embodiment therefore focuses on this difference.
 In the third embodiment, the control unit 120 supplies the lyric data corresponding to the next note to the voice synthesis unit 140 when triggered by the volume indicated by the volume data supplied from the volume detection unit 108 exceeding a threshold, or by the temporal change of that volume exceeding a predetermined value. That is, when the volume of the singer's singing exceeds the threshold or the like, the control unit 120 supplies the lyric data corresponding to the next note to the voice synthesis unit 140 even if the progress of the performance has not yet reached the singing timing of that lyric data.
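 This trigger condition can be sketched as follows; the threshold and change-limit constants are illustrative assumptions rather than values from the specification.

```python
VOLUME_THRESHOLD = 0.3  # assumed absolute-volume trigger level
DELTA_THRESHOLD = 0.25  # assumed frame-to-frame volume-rise trigger level

def should_advance(prev_volume, volume,
                   threshold=VOLUME_THRESHOLD, delta=DELTA_THRESHOLD):
    """True when the singer's volume itself, or its rate of increase,
    signals the onset of the next syllable."""
    return volume > threshold or (volume - prev_volume) > delta

print(should_advance(0.05, 0.35))  # loud enough on its own: advance
print(should_advance(0.05, 0.31))  # also a steep rise: advance
print(should_advance(0.20, 0.25))  # quiet and gradual: hold
```

 In the paragraph that follows, a third disjunct could be added for the slope (acceleration) of the volume change; it would compare a second difference of the volume trace against another assumed limit.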
 A specific example of singing voice synthesis according to the third embodiment will now be described. As in the first embodiment, consider the case shown in FIG. 4(a) where the singer selects "Sakura" and, while listening to the accompaniment and following the progress of the performance, sings at the volume shown in FIG. 4(b); in the third embodiment the singing voice is then output as shown in FIG. 4(d).
 The characteristic behavior of the third embodiment is as follows. When the singer, relative to the progress of the performance, lowers the volume in the middle of "ra" (lyric 53) and then raises it again before the next "sa" (lyric 54) (that is, when the temporal change of the volume exceeds the predetermined value), the control unit 120 supplies the lyric data of the next "sa" (symbol 54) to the voice synthesis unit 140 in response to the change in the volume data supplied from the volume detection unit 108. As a result, "sa" (symbol 64) is synthesized earlier than the singing timing defined by the lyric data. The reading of the lyric data for the next note may be triggered not only by the volume indicated by the volume data exceeding the threshold or by the temporal change of that volume exceeding the predetermined value, but also by the slope (acceleration) of that temporal change exceeding a predetermined value.
 Incidentally, when the singer sustains a lyric at roughly the same pitch and roughly the same volume for longer than the timing defined by the lyric data, the lyric can be considered to be deliberately prolonged for expressive effect. To handle this case, the configuration indicated by the broken lines in FIG. 7 may be used: the pitch data output from the pitch detection unit 104 is supplied to the control unit 120 as well as to the voice synthesis unit 140, and when the pitch indicated by that pitch data remains constant within a predetermined range and the volume indicated by the volume data supplied from the volume detection unit 108 also remains constant within a predetermined range, the control unit 120 withholds the next lyric data from the voice synthesis unit 140 for a predetermined time (or until the volume falls), even if the next singing timing has arrived. With this configuration, the singer can have the desired lyric synthesized for longer than the timing defined by the lyric data.
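 The hold condition can be sketched as follows; the tolerance values, and the use of fractional-semitone pitch traces, are illustrative assumptions.

```python
PITCH_TOL = 0.5   # assumed pitch stability band, in semitones
VOLUME_TOL = 0.05  # assumed volume stability band

def is_sustained(pitches, volumes,
                 pitch_tol=PITCH_TOL, volume_tol=VOLUME_TOL):
    """True when both recent pitch and volume traces are nearly constant,
    i.e. the singer is deliberately holding the current syllable."""
    steady = lambda xs, tol: max(xs) - min(xs) <= tol
    return steady(pitches, pitch_tol) and steady(volumes, volume_tol)

# A held note: pitch wobble under half a semitone, volume nearly flat -> hold.
print(is_sustained([60.0, 60.1, 60.2], [0.5, 0.52, 0.49]))
# A fresh attack: the volume excursion breaks the hold condition.
print(is_sustained([60.0, 60.1, 60.2], [0.5, 0.2, 0.6]))
```

 In the broken-line configuration of FIG. 7, the control unit would evaluate a condition like this over a short window of recent frames before deciding whether to defer the next lyric.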
 Thus, according to the third embodiment, the singer can control the timing of the synthesized lyrics to some extent rather than being bound to the timing defined by the lyric data, making it possible to vary the timing of the synthesized singing improvisationally (ad lib). The third embodiment is not limited to combination with the first embodiment; it may also be combined with the second embodiment, in which the singer's own singing and the synthesized singing are mixed.
<Applications and Modifications>
 The present invention is not limited to the first to third embodiments described above; various applications and modifications, for example those described below, are possible. One or more of the following aspects may be selected arbitrarily and combined as appropriate.
 In the first (and second) embodiment, the control unit 120 supplies the lyric data (characters and pitch) corresponding to a singing timing to the voice synthesis unit 140 when the progress of the performance reaches that timing; of these, however, the pitch need not be supplied to the voice synthesis unit 140. The reason is that the voice synthesis unit 140 outputs substantially no singing voice signal while the volume indicated by the volume data is at or below the threshold, and when the volume exceeds the threshold, it uses not the pitch of the lyric data but the pitch indicated by the pitch data output from the pitch detection unit 104. Even in a configuration in which the control unit 120 does not supply the pitch of the lyrics, the voice synthesis unit 140 need only synthesize the characters of the lyric data supplied from the control unit 120, at the pitch indicated by the pitch data of the input voice and at an amplitude according to its volume, when the volume indicated by the volume data of the input voice exceeds the threshold.
 Although MIDI data is used as the accompaniment data in each embodiment, the present invention is not limited to this. For example, the musical tone signal may be obtained by playing back a compact disc. In this configuration, elapsed time information or remaining time information can be used to track the progress of the performance, so the control unit 120 need only supply the lyric data to the voice synthesis unit 140 (140a, 140b) in accordance with the progress of the performance as determined from the elapsed time or remaining time information.
 In each embodiment, the voice input unit 102 receives the singer's singing through a microphone and converts it into a singing voice signal; however, any configuration in which a singing voice signal (input voice) is input in some form will do. For example, the voice input unit 102 may receive a singing voice signal processed by another processing unit or supplied (or transferred) from another apparatus, or may simply be an input interface circuit that receives a singing voice signal and forwards it downstream. The input voice is also not limited to one uttered by the user of the singing synthesis apparatus; it may be uttered by another person (a friend or a third party).
 In each embodiment, the pitch detection unit 104, the pitch conversion units 106a and 106b, and the volume detection unit 108 are implemented in software, but they may be implemented in hardware. The voice synthesis unit 140 (140a, 140b) may likewise be implemented in software. Moreover, control is not limited to the pitch and volume of the synthesized singing voice according to the pitch and volume of the input voice; other voice attributes, such as timbre, may also be controlled according to the pitch and/or volume of the input voice.
 The processor in the present invention is not limited to a processor that can execute a software program, such as the CPU described in the above embodiments; it may be a processor that can execute a microprogram, such as a DSP, or a processor constituted by a dedicated hardware circuit (an integrated circuit or a group of discrete circuits) so as to realize the intended processing function.

Claims (9)

  1.  A singing synthesis apparatus comprising:
     a pitch detection unit that detects a pitch of an input voice;
     a volume detection unit that detects a volume of the input voice; and
     a voice synthesis unit that synthesizes a singing voice based on lyric data supplied according to progress of a performance, the voice synthesis unit controlling a pitch and a volume of the singing voice according to the pitch detected by the pitch detection unit and the volume detected by the volume detection unit.
  2.  The singing synthesis apparatus according to claim 1, wherein
     the lyric data has information defining utterance timings of the lyrics, and
     the voice synthesis unit synthesizes the singing voice according to the utterance timings of the lyric data.
  3.  The singing synthesis apparatus according to claim 1 or 2, wherein
     the voice synthesis unit changes an utterance timing of the singing voice synthesized based on the lyric data according to the volume detected by the volume detection unit.
  4.  The singing synthesis apparatus according to any one of claims 1 to 3, further comprising:
     a sound source unit that generates an accompaniment sound according to the progress of the performance; and
     an output unit that acoustically outputs the accompaniment sound and the singing voice.
  5.  The singing synthesis apparatus according to any one of claims 1 to 4, wherein
     the voice synthesis unit converts the pitch of the singing voice to be synthesized into a pitch having a given interval with respect to the pitch detected by the pitch detection unit, and
     the input voice is acoustically output together with the synthesized singing voice.
  6.  The singing synthesis apparatus according to any one of claims 1 to 5, wherein the voice synthesis unit synthesizes the singing voice based on speech segment data corresponding to characters of the lyric data.
  7.  The singing synthesis apparatus according to any one of claims 1 to 5, wherein the input voice is a voice uttered by a user.
  8.  A computer-implemented method comprising:
     detecting a pitch of an input voice;
     detecting a volume of the input voice; and
     synthesizing a singing voice based on lyric data supplied according to progress of a performance, while controlling a pitch and a volume of the singing voice according to the detected pitch and the detected volume.
  9.  A non-transitory computer-readable storage medium storing a program executable by a processor to perform a method comprising:
     detecting a pitch of an input voice;
     detecting a volume of the input voice; and
     synthesizing a singing voice based on lyric data supplied according to progress of a performance, while controlling a pitch and a volume of the singing voice according to the detected pitch and the detected volume.
PCT/JP2014/078080 2013-10-23 2014-10-22 Singing voice synthesis WO2015060340A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
JP2013219805A JP2015082028A (en) 2013-10-23 2013-10-23 Singing synthetic device and program
JP2013-219805 2013-10-23

Publications (1)

Publication Number Publication Date
WO2015060340A1 true WO2015060340A1 (en) 2015-04-30

Family

ID=52992930

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/JP2014/078080 WO2015060340A1 (en) 2013-10-23 2014-10-22 Singing voice synthesis

Country Status (2)

Country Link
JP (1) JP2015082028A (en)
WO (1) WO2015060340A1 (en)


Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP6801766B2 (en) * 2019-10-30 2020-12-16 カシオ計算機株式会社 Electronic musical instruments, control methods for electronic musical instruments, and programs

Citations (5)

Publication number Priority date Publication date Assignee Title
JPH10268895A (en) * 1997-03-28 1998-10-09 Yamaha Corp Voice signal processing device
JP2002202788A (en) * 2000-12-28 2002-07-19 Yamaha Corp Method for synthesizing singing, apparatus and recording medium
JP2006030609A (en) * 2004-07-16 2006-02-02 Yamaha Corp Voice synthesis data generating device, voice synthesizing device, voice synthesis data generating program, and voice synthesizing program
JP2006119674A (en) * 2006-01-30 2006-05-11 Yamaha Corp Singing composition method and system, and recording medium
JP2013195928A (en) * 2012-03-22 2013-09-30 Yamaha Corp Synthesis unit segmentation device


Cited By (10)

Publication number Priority date Publication date Assignee Title
JP2017167411A (en) * 2016-03-17 2017-09-21 ヤマハ株式会社 Voice synthesis method and voice synthesis control device
WO2017159083A1 (en) * 2016-03-17 2017-09-21 ヤマハ株式会社 Sound synthesis method and sound synthesis control device
CN107025902A (en) * 2017-05-08 2017-08-08 腾讯音乐娱乐(深圳)有限公司 Data processing method and device
CN107025902B (en) * 2017-05-08 2020-10-09 腾讯音乐娱乐(深圳)有限公司 Data processing method and device
CN110741430A (en) * 2017-06-14 2020-01-31 雅马哈株式会社 Singing synthesis method and singing synthesis system
CN110741430B (en) * 2017-06-14 2023-11-14 雅马哈株式会社 Singing synthesis method and singing synthesis system
CN110390922A (en) * 2018-04-16 2019-10-29 卡西欧计算机株式会社 Electronic musical instrument, the control method of electronic musical instrument and storage medium
CN110390922B (en) * 2018-04-16 2023-01-10 卡西欧计算机株式会社 Electronic musical instrument, control method for electronic musical instrument, and storage medium
JP2020086113A (en) * 2018-11-26 Daiichikosho Co., Ltd. Karaoke system and karaoke device
JP7117228B2 2018-11-26 Daiichikosho Co., Ltd. Karaoke system and karaoke machine

Also Published As

Publication number Publication date
JP2015082028A (en) 2015-04-27

Similar Documents

Publication Publication Date Title
WO2015060340A1 (en) Singing voice synthesis
JP2014501941A (en) Music content production system using client terminal
JP6784022B2 (en) Speech synthesis method, speech synthesis control method, speech synthesis device, speech synthesis control device and program
JP2011048335A (en) Singing voice synthesis system, singing voice synthesis method and singing voice synthesis device
JP2016177276A (en) Pronunciation device, pronunciation method, and pronunciation program
JP2013045082A (en) Musical piece generation device
JPH11184490A (en) Singing synthesizing method by rule voice synthesis
JP4844623B2 (en) Choral synthesis device, choral synthesis method, and program
JP4038836B2 (en) Karaoke equipment
JP4304934B2 (en) Choral synthesis device, choral synthesis method, and program
JP6044284B2 (en) Speech synthesizer
JP6171393B2 (en) Acoustic synthesis apparatus and acoustic synthesis method
JP2003015672A (en) Karaoke device having range of voice notifying function
JP4433734B2 (en) Speech analysis / synthesis apparatus, speech analysis apparatus, and program
JP5018422B2 (en) Harmony sound generator and program
JP4180548B2 (en) Karaoke device with vocal range notification function
JP2014098802A (en) Voice synthesizing apparatus
JP2022065554A (en) Method for synthesizing voice and program
JP5106437B2 (en) Karaoke apparatus, control method therefor, and control program therefor
WO2023233856A1 (en) Sound control device, method for controlling said device, program, and electronic musical instrument
JP7509127B2 (en) Information processing device, electronic musical instrument system, electronic musical instrument, syllable progression control method and program
JP6144593B2 (en) Singing scoring system
JP2002221978A (en) Vocal data forming device, vocal data forming method and singing tone synthesizer
JP7158331B2 (en) karaoke device
JP2009244790A (en) Karaoke system with singing teaching function

Legal Events

Date Code Title Description
121 Ep: The EPO has been informed by WIPO that EP was designated in this application

Ref document number: 14855205

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: PCT application non-entry in European phase

Ref document number: 14855205

Country of ref document: EP

Kind code of ref document: A1