WO2019003349A1 - Sound-producing device and method - Google Patents

Sound-producing device and method Download PDF

Info

Publication number
WO2019003349A1
Authority
WO
WIPO (PCT)
Prior art keywords
sound
singing
sound generation
phrase
section
Prior art date
Application number
PCT/JP2017/023783
Other languages
French (fr)
Japanese (ja)
Inventor
Kazuki Kashiwase (一輝 柏瀬)
Keizo Hamano (桂三 濱野)
Original Assignee
Yamaha Corporation (ヤマハ株式会社)
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Yamaha Corporation
Priority to JP2019526038A (JP6787491B2)
Priority to CN201780091661.1A (CN110720122B)
Priority to PCT/JP2017/023783 (WO2019003349A1)
Publication of WO2019003349A1

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L13/00 Speech synthesis; Text to speech systems
    • G10L13/02 Methods for producing synthetic speech; Speech synthesisers
    • G10L13/033 Voice editing, e.g. manipulating the voice of the synthesiser

Definitions

  • the present invention relates to a sound generating device and method for producing a singing sound based on singing data.
  • the apparatus of Patent Document 1 causes the user to input a plurality of types of synthesis information (phonological information and prosody information) for each note, and performs singing synthesis in real time.
  • Because a mismatch between the input timings of the phonological information and the prosody information would feel unnatural to the user, the apparatus of Patent Document 1 eases this discomfort by producing a dummy sound from the time the earliest synthesis information is input until output of the voice signal corresponding to that information begins. This alleviates the sense of incongruity when singing one syllable at a time in a fixed order.
  • In general, the lyrics of a song consist of a plurality of coherent units (sections) such as phrases. A player may therefore want to jump to the next phrase in the middle of singing the current one. If the device allows phrases to be switched, it must determine the destination phrase and move the sound generation position to a syllable within that phrase. If determining the destination phrase and performing the actual switching take time, the pronunciation of syllables based on the original singing instruction may be interrupted each time a phrase is switched, which can feel unnatural; the gap is especially noticeable when an accompaniment is also being played back.
  • An object of the present invention is to provide a sound generation device and method capable of alleviating discomfort when switching between sound generation sections.
  • In order to achieve the above object, the present invention provides a sound generation device comprising: a data acquisition unit for acquiring singing data that includes syllable information serving as a basis of pronunciation and that consists of a plurality of continuous sections; a detection unit for detecting a section designation operation that designates, within the singing data acquired by the data acquisition unit, the next section to be pronounced; and a sound generation control unit for generating, in response to detection of the section designation operation by the detection unit, a predetermined singing sound different from the singing sound based on a singing instruction.
  • FIG. 1 is a schematic view of a sound generation device according to a first embodiment of the present invention.
  • This sound generator is configured as an electronic musical instrument 100 which is a keyboard instrument as an example, and has a main body 30 and a neck 31.
  • the main body portion 30 has a first surface 30a, a second surface 30b, a third surface 30c, and a fourth surface 30d.
  • the first surface 30a is a keyboard mounting surface on which a keyboard section KB composed of a plurality of keys is disposed.
  • The second surface 30b is the back surface. Hooks 36 and 37 are provided on the second surface 30b.
  • A strap (not shown) can be attached between the hooks 36 and 37, and the player typically hangs the strap over the shoulder and performs by operating the keyboard section KB. Accordingly, when the instrument is worn on the shoulder, in particular with the scale direction (key arrangement direction) of the keyboard section KB oriented left to right, the first surface 30a and the keyboard section KB face the listener, while the third surface 30c and the fourth surface 30d face generally downward and upward, respectively.
  • the neck portion 31 is extended from the side of the main body 30.
  • the neck portion 31 is provided with various operators including the advance operator 34 and the return operator 35.
  • A display unit 33 composed of a liquid crystal display or the like is disposed on the fourth surface 30d of the main body 30.
  • the electronic musical instrument 100 is a musical instrument that simulates singing in response to an operation on a performance operator.
  • the song simulation is to output a voice simulating a human voice by song synthesis.
  • The white and black keys of the keyboard section KB are arranged in order of pitch, and each key is associated with a different pitch.
  • the user presses a desired key on the keyboard KB.
  • the electronic musical instrument 100 detects a key operated by the user, and produces a singing sound of a pitch corresponding to the operated key.
  • the order of the syllables of the singing voice to be pronounced is predetermined.
  • FIG. 2 is a block diagram of the electronic musical instrument 100.
  • The electronic musical instrument 100 includes a CPU (Central Processing Unit) 10, a timer 11, a ROM (Read Only Memory) 12, a RAM (Random Access Memory) 13, a data storage unit 14, a performance operator 15, an other operator 16, a parameter value setting operator 17, a display unit 33, a sound source 19, an effect circuit 20, a sound system 21, a communication I/F (Interface) 22, and a bus 23.
  • the CPU 10 is a central processing unit that controls the entire electronic musical instrument 100.
  • the timer 11 is a module that measures time.
  • the ROM 12 is a non-volatile memory that stores control programs and various data.
  • the RAM 13 is a volatile memory used as a work area of the CPU 10 and various buffers.
  • the display unit 33 is a display module such as a liquid crystal display panel or an organic EL (Electro-Luminescence) panel. The display unit 33 displays the operation state of the electronic musical instrument 100, various setting screens, a message for the user, and the like.
  • the performance operator 15 is a module that mainly accepts a performance operation that designates a pitch.
  • the keyboard portion KB, the advance operator 34, and the return operator 35 are included in the performance operator 15.
  • As an example, when the performance operator 15 is a keyboard, the performance operator 15 outputs performance information such as note-on/note-off based on the on/off state of a sensor corresponding to each key, and the key depression strength (speed, velocity).
  • This performance information may be in the form of MIDI (Musical Instrument Digital Interface) messages.
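For instance, a note-on arriving from the keyboard could be carried as an ordinary MIDI channel voice message. The following small Python sketch decodes such a message into the pitch and velocity later used for singing synthesis; it reflects standard MIDI conventions rather than anything specific to this instrument:

```python
def decode_midi_note_message(msg):
    """Decode a 3-byte MIDI channel voice message into (kind, note_number, velocity).

    0x9n is note-on and 0x8n is note-off; a note-on with velocity 0 is treated
    as note-off, as is conventional in MIDI.
    """
    status, note, velocity = msg
    if status & 0xF0 == 0x90 and velocity > 0:
        return ("note_on", note, velocity)
    if status & 0xF0 == 0x80 or (status & 0xF0 == 0x90 and velocity == 0):
        return ("note_off", note, velocity)
    return ("other", note, velocity)

assert decode_midi_note_message((0x90, 69, 100)) == ("note_on", 69, 100)  # A4 pressed
```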
  • the other operator 16 is, for example, an operation module such as an operation button or an operation knob for performing settings other than performance, such as settings relating to the electronic musical instrument 100.
  • the parameter value setting operation unit 17 is an operation module such as an operation button or an operation knob that is mainly used to set parameters for the attribute of the singing voice. Examples of this parameter include harmonics, brightness, resonance, and gender factor.
  • the harmony is a parameter for setting the balance of the harmonic component contained in the voice.
  • Brightness is a parameter for setting the tone of the voice and gives a tone change.
  • the resonance is a parameter for setting timbre and strength of singing voice and musical instrument sound.
  • the gender element is a parameter for setting formants, and changes the thickness and texture of the voice in a feminine or male manner.
  • The external storage device 3 is an external device connected to the electronic musical instrument 100, for example a device that stores audio data.
  • the communication I / F 22 is a communication module that communicates with an external device.
  • the bus 23 transfers data between the units in the electronic musical instrument 100.
  • the data storage unit 14 stores singing data 14a.
  • the song data 14a includes lyric text data, a phonological information database, and the like.
  • the lyrics text data is data describing the lyrics.
  • the lyrics of each song are described divided in syllable units. That is, the lyric text data has character information obtained by dividing the lyrics into syllables, and the character information is also information for display corresponding to the syllables.
  • the syllable is a group of sounds output in response to one performance operation.
  • the phoneme information database is a database storing speech segment data.
  • the voice segment data is data indicating a waveform of voice, and includes, for example, spectrum data of a sample string of the voice segment as waveform data.
  • the speech segment data includes segment pitch data indicating the pitch of the waveform of the speech segment.
  • the lyrics text data and the speech segment data may each be managed by a database.
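Purely as an illustration of how singing data of this kind could be organized in memory, the following minimal Python sketch models lyric text data divided into phrases and syllables, with each syllable carrying its display characters and the names of its speech segments. All class and field names are hypothetical, and the segment lists for syllables other than "ha" are assumptions:

```python
from dataclasses import dataclass
from typing import List

@dataclass
class Syllable:
    chars: str            # characters displayed for this syllable (e.g. "ha" or "sep")
    segments: List[str]   # speech segments used to pronounce it (e.g. ["#-h", "h-a", "a"])

@dataclass
class Phrase:
    syllables: List[Syllable]   # one or more syllables sung in order

@dataclass
class SingingData:
    phrases: List[Phrase]       # lyric text data divided in advance into continuous sections

# Example: the five syllables "ha ru yo ko i" of FIG. 5. The split into two
# phrases and the segment lists after "ha" are purely illustrative.
song = SingingData(phrases=[
    Phrase([Syllable("ha", ["#-h", "h-a", "a"]),
            Syllable("ru", ["a-r", "r-u", "u"]),
            Syllable("yo", ["u-y", "y-o", "o"])]),
    Phrase([Syllable("ko", ["#-k", "k-o", "o"]),
            Syllable("i",  ["o-i", "i"])]),
])
```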
  • the sound source 19 is a module having a plurality of tone generation channels. Under the control of the CPU 10, one sound generation channel is assigned to the sound source 19 in accordance with the user's performance. In the case of producing a singing voice, the sound source 19 reads voice segment data corresponding to a performance from the data storage unit 14 in the assigned tone generation channel to generate singing voice data.
  • the effect circuit 20 applies the acoustic effect designated by the parameter value setting operator 17 to the singing voice data generated by the sound source 19.
  • the sound system 21 converts the singing sound data processed by the effect circuit 20 into an analog signal by a digital / analog converter. Then, the sound system 21 amplifies the singing sound converted into the analog signal and outputs it from a speaker or the like.
  • FIG. 3 is a diagram showing the main part of the display unit 33.
  • the display unit 33 has a first main area 41, a second main area 42, a first sub area 43, and a second sub area 44 as display areas.
  • The entire display area has a two-line (two-tier) layout: the first main area 41 and the first sub area 43 form the first line (upper tier), and the second main area 42 and the second sub area 44 form the second line (lower tier).
  • In each of the main areas 41 and 42, a plurality of display frames 45 (45-1, 45-2, 45-3, ...) are arranged in series in the longitudinal direction of the display unit 33. Starting from the display frame 45-1 at the left end of FIG. 3, characters corresponding to syllables are displayed in the order in which they are scheduled to be pronounced.
  • the main areas 41 and 42 are mainly used for displaying lyrics.
  • the lyric text data included in the song data 14a at least includes character information associated with a plurality of syllables according to the selected song.
  • the lyrics text data is data to be sung by the singing part (the sound source 19, the effect circuit 20 and the sound system 21).
  • the lyrics text data is divided in advance into a plurality of continuous sections, and each divided section is referred to as a "phrase".
  • A phrase is a coherent unit, delimited at boundaries of meaning that the user can easily recognize, but the definition of a section is not limited to this.
  • When a song is selected, the CPU 10 acquires it in a state divided into a plurality of phrases.
  • a phrase includes one or more syllables and character information corresponding to the syllables.
  • the CPU 10 causes the first main area 41 (FIG. 3) of the display unit 33 to display character information corresponding to the first phrase of the plurality of phrases corresponding to the selected music.
  • At that time, the first character of the first phrase is displayed in the leftmost display frame 45-1, and as many characters as can fit in the first main area 41 are displayed.
  • For the second phrase, as many characters as can fit are displayed in the second main area 42.
  • the keyboard unit KB plays a role as an instruction acquisition unit for acquiring a singing instruction.
  • In response to acquisition of a singing instruction by operation of the keyboard section KB or the like, the CPU 10 causes the singing section to sing the next syllable and advances the character display in the first main area 41 in accordance with the progress of the syllables.
  • The characters advance toward the left in FIG. 3, and characters that could not be displayed initially appear from the rightmost display frame 45 as the singing progresses.
  • The cursor position indicates the syllable to be sung next, and designates the syllable corresponding to the character displayed in the display frame 45-1 of the first main area 41.
  • the lyrics displayed on the display unit 33 are updated according to the operation of the keyboard KB.
  • One character and one syllable do not necessarily correspond. For example, the Japanese syllable "da" (だ), which carries a voicing mark, is written with the two characters "ta" (た) and the voicing mark ("), and those two characters correspond to one syllable.
  • The lyrics may also be in English. For example, the word "september" consists of the three syllables "sep", "tem", and "ber"; "sep" is one syllable, but the three characters "s", "e", and "p" correspond to that one syllable. Since the character display always advances in syllable units, singing "da" advances the display by two characters. Thus, the lyrics are not limited to Japanese and may be in other languages.
  • the CPU 10 belongs to the next phrase of the phrase to be displayed in the first main area 41. Character information is displayed in the first main area 41, and character information belonging to the phrase next to the phrase to be displayed in the second main area 42 is displayed in the second main area 42. If there is no phrase following the phrase to be displayed in the second main area 42, the characters displayed in the second main area 42 will disappear (all display frames 45 are blank).
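As a rough illustration of the two-line lyric display behavior described above (a current-phrase line that advances one syllable per singing instruction, and a second line previewing the next phrase), here is a minimal Python sketch. It reuses the Phrase/Syllable structures from the earlier sketch; the frame count and function names are assumptions, not part of the patent:

```python
FRAMES_PER_AREA = 8   # assumed number of display frames 45 in each main area

def render_main_areas(phrases, phrase_idx, cursor):
    """Return the character strings for main areas 41 and 42.

    Line 1 starts at the cursor syllable, so display frame 45-1 always shows the
    character of the syllable to be sung next; line 2 previews the next phrase.
    """
    line1 = "".join(s.chars for s in phrases[phrase_idx].syllables[cursor:])
    nxt = phrases[phrase_idx + 1].syllables if phrase_idx + 1 < len(phrases) else []
    line2 = "".join(s.chars for s in nxt)
    return line1[:FRAMES_PER_AREA], line2[:FRAMES_PER_AREA]

def on_syllable_sung(phrases, phrase_idx, cursor):
    """Advance the display by one syllable; move to the next phrase once the
    current one has been sung completely (blank frames when no phrase follows)."""
    cursor += 1
    if cursor >= len(phrases[phrase_idx].syllables) and phrase_idx + 1 < len(phrases):
        phrase_idx, cursor = phrase_idx + 1, 0
    return phrase_idx, cursor
```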
  • the advance operator 34 shown in FIG. 1 is an operator for advancing the display in phrase units.
  • an operation of pressing and releasing the advance operator 34 is an example of the phrase advance operation.
  • The return operator 35 is an operator for moving the display back in phrase units.
  • An operation of pressing and releasing the return operation element 35 is an example of the phrase return operation.
  • the phrase advance operation by the advance operator 34 and the phrase return operation by the return operator 35 correspond to a phrase specification operation (section specification operation) for specifying the next phrase to be sounded (a section to be sounded).
  • When the CPU 10 detects a phrase designation operation, it determines the next pronunciation target phrase. For example, when the CPU 10 detects a release operation of the advance operator 34 after detecting its pressing operation, it determines the phrase immediately after the current phrase as the pronunciation target phrase. Likewise, when it detects a release operation of the return operator 35 after detecting its pressing operation, it determines the phrase immediately before the current phrase as the pronunciation target phrase.
  • the depression operation of the advance operation element 34 and the depression operation of the return operation element 35 are the designation start operation of the phrase designation operation.
  • the release operation of the advance operation element 34 and the release operation of the return operation element 35 are the designation end operation among the phrase specification operations.
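A minimal sketch of how pressing and releasing the advance and return operators might be mapped onto the designation start and designation end operations (illustrative Python only; the event tuples and constants are assumptions):

```python
ADVANCE, RETURN = "advance_operator_34", "return_operator_35"

def classify_phrase_designation(operator, edge):
    """Map a physical operator edge to a phrase designation event.

    edge is "press" or "release". Pressing either operator is the designation
    start operation; releasing it is the designation end operation, which also
    carries the direction of the phrase move (+1 for advance, -1 for return).
    """
    direction = +1 if operator == ADVANCE else -1
    if edge == "press":
        return ("designation_start", direction)
    return ("designation_end", direction)

# Pressing then releasing the advance operator designates the phrase one after
# the current phrase as the next pronunciation target phrase.
assert classify_phrase_designation(ADVANCE, "press") == ("designation_start", +1)
assert classify_phrase_designation(ADVANCE, "release") == ("designation_end", +1)
```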
  • the CPU 10 executes the lyric display process as follows in conjunction with the process of determining the pronunciation target phrase. This lyric display process is executed by a separate flowchart (not shown). First, when detecting the phrase advance operation, the CPU 10 executes the phrase display advancing process to display the determined pronunciation target phrase in the first main area 41. For example, the CPU 10 causes the first main area 41 to display the character string which has been displayed in the second main area 42 so far, and further causes the second main area 42 to display the character string of the next phrase. If there is no phrase following the phrase to be displayed in the second main area 42, the characters displayed in the second main area 42 will disappear (all display frames 45 are blank).
  • On the other hand, when detecting the phrase return operation, the CPU 10 executes a phrase display move-back process to display the determined pronunciation target phrase in the first main area 41.
  • For example, the CPU 10 causes the first main area 41 to display character information belonging to the phrase immediately before the phrase that had been displayed in the first main area 41, and causes the second main area 42 to display character information belonging to the phrase immediately before the phrase that had been displayed in the second main area 42.
  • In response to detection of the designation start operation (the pressing of the advance operator 34 or the return operator 35), the CPU 10 starts generation of a dummy sound (a predetermined singing sound) and continues the dummy sound at least until the next pronunciation target phrase is determined.
  • The dummy sound is a singing sound produced by singing synthesis, such as "ru"; whatever its type, the syllable information on which its pronunciation is based is stored in advance in the ROM 12.
  • the syllable information that is the basis of the pronunciation of the dummy sound may be attached to the singing data 14a. Moreover, in the singing data 14a, syllable information for dummy sound may be added for each phrase, and a dummy sound corresponding to the current pronunciation target phrase or the next pronunciation target phrase may be generated. Also, a plurality of syllable information that is the basis of the pronunciation of the dummy sound may be stored, and the dummy sound may be generated based on the singing voice that was pronounced immediately before.
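The syllable information for the dummy sound can therefore come from several sources. A hedged Python sketch of one possible selection order follows; the priority shown is an assumption, since the text only lists the options:

```python
def pick_dummy_syllable(rom_default, singing_data, phrase_idx=None, last_sung=None):
    """Choose the syllable information on which the dummy sound is based.

    Sources described in the text: syllable information attached to the singing
    data (possibly one per phrase), the syllable sung immediately before, or a
    default such as "ru" stored in advance in the ROM 12.
    """
    per_phrase = getattr(singing_data, "dummy_syllables", None)
    if per_phrase is not None and phrase_idx is not None:
        return per_phrase[phrase_idx]   # dummy syllable attached for each phrase
    if last_sung is not None:
        return last_sung                # base the dummy on the previous singing voice
    return rom_default                  # e.g. the syllable "ru" held in ROM 12
```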
  • FIG. 4 is a flowchart showing an example of the flow of processing when a performance is performed by the electronic musical instrument 100.
  • the processing in the case where the user performs the selection of the musical composition and the performance of the selected musical composition will be described. Further, in order to simplify the description, a case where only a single sound is output will be described even if a plurality of keys are simultaneously operated. In this case, only the highest pitch among the pitches of keys operated simultaneously may be processed, or only the lowest pitch may be processed.
  • the processing described below is realized, for example, by the CPU 10 executing a program stored in the ROM 12 or the RAM 13. In the process illustrated in FIG. 4, the CPU 10 plays a role as a data acquisition unit, a detection unit, a sound generation control unit, and a determination unit.
  • the CPU 10 waits until an operation of selecting a song to be played is received from the user (step S101). Note that if there is no song selection operation even after a certain time has elapsed, the CPU 10 may determine that a song set by default has been selected.
  • When the CPU 10 receives the selection of a song, it reads the lyric text data from the singing data 14a of the selected song. The CPU 10 then sets the cursor position at the first syllable described in the lyric text data (step S102).
  • the cursor is a virtual index indicating the position of the syllable to be pronounced next.
  • the CPU 10 determines whether note-on has been detected based on the operation of the keyboard section KB (step S103).
  • When note-on is not detected, the CPU 10 determines whether note-off has been detected (step S109). On the other hand, when note-on is detected, that is, when a new key depression is detected, the CPU 10 stops the output of any sound currently being output (step S104); the sound in this case may include a dummy sound. Next, the CPU 10 determines whether the next pronunciation target phrase is in the determined state (step S105). At the stage where singing syllables are being stepped through one by one in response to ordinary singing instructions (note-on), the pronunciation target phrase is in the determined state, so in this case the CPU 10 executes an output sound generation process for producing a singing sound according to the note-on (step S107).
  • In the output sound generation process, the CPU 10 reads the voice segment data (waveform data) of the syllable corresponding to the cursor position and outputs the sound of the waveform indicated by the read voice segment data at the pitch corresponding to the note-on. Specifically, the CPU 10 obtains the difference between the pitch indicated by the segment pitch data included in the voice segment data and the pitch corresponding to the operated key, and shifts the spectral distribution indicated by the waveform data along the frequency axis by a frequency corresponding to this difference. Thus, the electronic musical instrument 100 can output a singing sound at the pitch corresponding to the operated key. Next, the CPU 10 updates the cursor position (read position) (step S108) and advances the process to step S109.
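The pitch adjustment described here amounts to moving the segment's spectrum along the frequency axis by the difference between the pitch recorded in the segment pitch data and the pitch of the operated key. The following Python sketch illustrates the idea with simple time-domain resampling; it is only one way such a shift could be realized and is not the patent's implementation:

```python
import numpy as np

def shift_to_key_pitch(waveform, segment_pitch_hz, key_pitch_hz, sr=44100):
    """Shift a speech-segment waveform so it sounds at the pitch of the operated key.

    The ratio key_pitch / segment_pitch corresponds to the frequency difference
    obtained from the segment pitch data; resampling moves the spectral
    distribution along the frequency axis by that factor when played back at sr.
    """
    ratio = key_pitch_hz / segment_pitch_hz
    n_out = int(len(waveform) / ratio)
    t_out = np.linspace(0, len(waveform) - 1, n_out)
    return np.interp(t_out, np.arange(len(waveform)), waveform)

# Example: a segment recorded at A3 (220 Hz) replayed for a key press of A4 (440 Hz).
segment = np.sin(2 * np.pi * 220 * np.arange(0, 0.5, 1 / 44100))
shifted = shift_to_key_pitch(segment, 220.0, 440.0)
```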
  • FIG. 5 is a view showing an example of lyrics text data.
  • the lyrics of the five syllables c1 to c5 are described in the lyrics text data.
  • Each character "ha”, “ru”, “yo”, “ko”, "i” indicates one Japanese hiragana character and each character corresponds to one syllable.
  • the CPU 10 updates the cursor position in syllable units.
  • For example, when the syllable c3 at the cursor position has been pronounced, the CPU 10 moves the cursor position to the next syllable c4.
  • the CPU 10 sequentially moves the cursor position to the next syllable in response to the note-on.
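A small sketch of this cursor (read position) update in syllable units, using the five syllables c1 to c5 of FIG. 5 (illustrative Python; the behavior at the end of the lyrics is an assumption):

```python
lyrics = ["ha", "ru", "yo", "ko", "i"]   # syllables c1..c5 from FIG. 5

cursor = 2                               # cursor currently at c3 ("yo")

def on_note_on(cursor, lyrics):
    """Pronounce the syllable at the cursor, then step the cursor to the next syllable."""
    syllable = lyrics[cursor]            # syllable to sing for this key press
    next_cursor = min(cursor + 1, len(lyrics) - 1)  # assumed: stay on the last syllable at the end
    return syllable, next_cursor

syllable, cursor = on_note_on(cursor, lyrics)   # sings "yo", cursor now at c4 ("ko")
```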
  • FIG. 6 is a diagram showing an example of the type of speech segment data.
  • the CPU 10 extracts speech segment data corresponding to syllables from the phonological information database in order to pronounce syllables corresponding to the cursor position.
  • phoneme chain data is data indicating a speech segment when the pronunciation changes, such as "silence (#) to consonant", “consonant to vowel", "vowel to consonant or vowel (of the next syllable)" .
  • the steady part data is data indicating a speech segment when the pronunciation of the vowel continues.
  • For example, when the syllable "ha" is to be pronounced, the sound source 19 selects the phoneme chain data "#-h" corresponding to "silence → consonant h", the phoneme chain data "h-a" corresponding to "consonant h → vowel a", and the steady part data "a" corresponding to "vowel a". The CPU 10 then causes the singing sound based on the phoneme chain data "#-h", the phoneme chain data "h-a", and the steady part data "a" to be output at the pitch corresponding to the key operation and in accordance with the velocity of the operation.
  • In this manner, each time a key depression is detected, the syllable at the cursor position is determined and the corresponding singing sound is produced.
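The segment selection for one syllable, as in the "ha" example, might be expressed as follows (illustrative Python; the rule for vowel-only syllables is an assumption):

```python
def segments_for_syllable(consonant, vowel, prev="#"):
    """Pick the phoneme chain and steady-part segments needed to sing one syllable.

    For "ha" following silence this yields ["#-h", "h-a", "a"], matching the
    example in the text; "#" stands for silence, and the preceding sound can be
    a vowel when syllables are chained.
    """
    if consonant:
        return [f"{prev}-{consonant}", f"{consonant}-{vowel}", vowel]
    return [f"{prev}-{vowel}", vowel]

assert segments_for_syllable("h", "a") == ["#-h", "h-a", "a"]
```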
  • If it is determined in step S105 that the next pronunciation target phrase is in the undetermined state, the CPU 10 generates and outputs a dummy sound at the pitch of the note-on detected in step S103 (step S106).
  • In this situation, a dummy sound has already been output based on the designation start operation in step S115, described later. Therefore, when the pitch of the dummy sound being output differs from the pitch of the note-on detected in step S103, the CPU 10 regenerates the dummy sound so that its pitch is corrected to the pitch of the detected note-on. In this way, after output of the dummy sound has started, the player can correct the pitch of the dummy sound by pressing keys until the next phrase is determined. Thereafter, the process proceeds to step S109.
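A sketch of this branch of the flow, where a note-on received while the phrase is still undetermined only corrects the pitch of the already sounding dummy sound (illustrative Python; the DummyVoice class and its methods are stand-ins for the tone generation channel, not the patent's API):

```python
class DummyVoice:
    """Stand-in for a dummy sound playing on a tone generation channel."""
    def __init__(self, pitch):
        self.pitch = pitch
        self.playing = True

    def retune(self, pitch):
        # Regenerate the dummy sound at the corrected pitch (step S106).
        self.pitch = pitch

def handle_note_on(note_pitch, phrase_determined, dummy, sing_syllable):
    """Steps S104 to S107 in outline: sing normally once the phrase is determined,
    otherwise just correct the pitch of the dummy sound to the pressed key."""
    if phrase_determined:
        sing_syllable(note_pitch)          # output sound generation process (step S107)
    elif dummy is not None and dummy.pitch != note_pitch:
        dummy.retune(note_pitch)           # dummy follows the pressed key until the phrase is fixed
```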
  • When note-off is detected in step S109, the CPU 10 determines in step S110 whether the next pronunciation target phrase is in the determined state.
  • If it is in the determined state, the CPU 10 stops the output of the sound (step S111) and advances the process to step S112. If it is determined in step S110 that the next pronunciation target phrase is in the undetermined state, the CPU 10 advances the process to step S112 without stopping the sound.
  • In step S112, the CPU 10 determines whether a designation start operation (a pressing operation of the advance operator 34 or the return operator 35) has been detected.
  • When the designation start operation is not detected, the CPU 10 determines whether a designation end operation (a release operation of the advance operator 34 or the return operator 35) has been detected (step S116). When the designation end operation is not detected either, the CPU 10 advances the process to step S121.
  • When, as a result of the determination in step S112, the designation start operation is detected, the CPU 10 stops the output of the sound if a sound is being output (step S113), and sets the pronunciation target phrase to the undetermined state (step S114).
  • The CPU 10 manages the undetermined and determined states of the pronunciation target phrase, for example, by setting a predetermined flag to 0 or 1.
  • Next, the CPU 10 automatically generates a dummy sound and starts outputting it (step S115). Generation of the dummy sound is thus started in response to the designation start operation. Thereafter, the process proceeds to step S116.
  • When the designation end operation is detected in step S116, the CPU 10 determines the next pronunciation target phrase based on the designation start operation detected in step S112 and the detected designation end operation (step S117). For example, as described above, when the CPU 10 detects the pressing operation of the advance operator 34 in step S112 and then detects its release operation in step S116, it determines the phrase immediately after the current phrase as the pronunciation target phrase. Next, the CPU 10 updates the read position, that is, updates the cursor position to the first syllable of the determined pronunciation target phrase (step S118).
  • Accordingly, when a singing instruction is acquired in step S103 after the next pronunciation target phrase has been determined, the syllable at the beginning of that phrase is sung, so singing can shift immediately to the determined phrase.
  • the update destination of the cursor position in the determined pronunciation target phrase may be a predetermined position, and may not necessarily be the head position.
  • Next, the CPU 10 sets the pronunciation target phrase to the determined state (step S119) and stops the dummy sound being output (step S120). As a result, generation of the dummy sound ends in response to determination of the pronunciation target phrase. Thereafter, the process proceeds to step S121.
  • In step S121, the CPU 10 executes other processing. For example, when generation of the dummy sound has continued for a predetermined time or longer, the CPU 10 restarts generation and output of the same dummy sound; thus, when the dummy sound "ru" continues for a long time, the same syllable can be sounded repeatedly, as in "ru ru ru". Thereafter, the CPU 10 determines whether the performance has ended (step S122), and returns the process to step S103 if it has not. When the performance has ended, the CPU 10 stops the output of the sound if a sound is being output (step S123), and the processing shown in FIG. 4 ends. Whether the performance has ended can be determined based on, for example, whether the last syllable of the selected song has been pronounced, or whether an operation to end the performance has been made with the other operator 16 or the like.
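Pulling the steps of FIG. 4 together, the following condensed Python sketch walks through the same flow. The step numbers in the comments refer to the flowchart; the event representation and the `voice` helper object are assumptions made purely for illustration:

```python
def performance_loop(events, phrases, voice):
    """Condensed sketch of the FIG. 4 flow. `events` yields tuples such as
    ("note_on", pitch), ("note_off",), ("press", +1 or -1), ("release", +1 or -1),
    and ("end",). `voice` bundles hypothetical sound-generation helpers."""
    phrase_idx, cursor = 0, 0
    determined = True          # pronunciation-target-phrase flag (steps S114 / S119)
    pending_dir = 0

    for ev in events:
        if ev[0] == "note_on":                               # step S103
            voice.stop()                                     # S104: stop any sound, dummy included
            if determined:                                   # S105
                voice.sing(phrases[phrase_idx].syllables[cursor], ev[1])   # S107
                cursor = min(cursor + 1, len(phrases[phrase_idx].syllables) - 1)  # S108
            else:
                voice.start_dummy(ev[1])                     # S106: dummy follows the pressed key
        elif ev[0] == "note_off":                            # step S109
            if determined:                                   # S110
                voice.stop()                                 # S111
        elif ev[0] == "press":                               # designation start (S112)
            voice.stop()                                     # S113
            determined, pending_dir = False, ev[1]           # S114
            voice.start_dummy()                              # S115: dummy bridges the switch
        elif ev[0] == "release" and not determined:          # designation end (S116)
            phrase_idx = max(0, min(phrase_idx + pending_dir, len(phrases) - 1))  # S117
            cursor = 0                                       # S118: first syllable of the new phrase
            determined = True                                # S119
            voice.stop_dummy()                               # S120
        elif ev[0] == "end":                                 # S122
            voice.stop()                                     # S123
            break
        # S121 (other processing, e.g. re-triggering a long dummy sound) omitted
```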
  • As described above, in response to detection of the section designation operation, a dummy sound is pronounced, so that the sense of discomfort at the switching of the pronunciation section can be alleviated.
  • Since generation of the dummy sound is started in response to detection of the designation start operation and continues at least until the next pronunciation target section is determined, silence at the moment the pronunciation section changes is avoided.
  • Since the pronunciation target phrase is determined by the designation end operation, generation of the dummy sound can be continued while the user is performing the phrase designation operation.
  • Furthermore, when note-on is detected while the pronunciation target phrase is undetermined, the CPU 10 changes the sound generation pitch of the dummy sound to the designated pitch (step S106), so the pitch of the dummy sound is corrected and the discomfort can be further alleviated.
  • In the first embodiment, the dummy sound is stopped as soon as the pronunciation target phrase enters the determined state (step S120).
  • In a second embodiment of the present invention, by contrast, the dummy sound whose generation has been started is continued until the first note-on after determination of the pronunciation target phrase.
  • To achieve this, step S120 of FIG. 4 may be eliminated.
  • In that case, the dummy sound being output is stopped at step S104 when the next note-on is detected. This prevents the dummy sound whose generation has been started from being cut off before the note-on that follows determination of the pronunciation target phrase.
  • The present embodiment is effective, for example, with a specification in which the designation start operation and the designation end operation are completed by a single operation.
  • For example, the present invention may be applied to a specification in which both the designation start operation and the designation end operation are instructed simply by pressing the advance operator 34 or the return operator 35, and the release operation has no meaning.
  • In the above, the pitch of the dummy sound is corrected to the pitch of the note-on by changing the pitch and regenerating the dummy sound (step S106).
  • Alternatively, a configuration is also possible in which the dummy sound is not regenerated or re-sounded.
  • FIG. 7 is a part of a flowchart showing an example of the flow of processing when a performance is performed by the electronic musical instrument 100 according to the third embodiment of the present invention.
  • Portions of the flowchart identical to those in FIG. 4 are omitted from the illustration. Steps S105 and S106 are eliminated.
  • In the third embodiment, when note-on is detected in step S103, the CPU 10 determines whether a dummy sound is being generated (step S201). If no dummy sound is being generated, steps S104, S107, and S108 are executed and the process proceeds to step S109; the sound being produced for the previous note-on is therefore stopped, and the singing sound for the current note-on is produced. Note that the dummy sound having stopped means that the pronunciation target phrase has been determined. On the other hand, when a dummy sound is being generated, the process proceeds directly to step S109; even if a note-on occurs, no singing sound is generated for it, and generation of the dummy sound continues without pitch correction.
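For the third embodiment, only the note-on branch changes: key presses are ignored while the dummy sound is playing (step S201). A hedged sketch of that branch, using the same hypothetical helpers as the earlier sketch:

```python
def handle_note_on_third_embodiment(pitch, dummy_playing, voice, phrases, phrase_idx, cursor):
    """Step S201: while a dummy sound is being generated, a note-on produces no
    singing sound and no pitch correction; otherwise proceed as in steps S104,
    S107, and S108 (the dummy having stopped implies the target phrase is determined)."""
    if dummy_playing:
        return cursor                      # ignore the key press entirely
    voice.stop()                           # S104
    voice.sing(phrases[phrase_idx].syllables[cursor], pitch)   # S107
    return cursor + 1                      # S108
```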
  • As a variation, a single press of a designation operator such as the advance operator 34 or the return operator 35 may serve as both the designation start operation and the designation end operation, thereby determining the pronunciation target phrase.
  • the phrase to be moved by one set of operation is not limited to the adjacent phrase, and the CPU 10 may jump over a plurality of phrases to determine the pronunciation target phrase.
  • the designation start operation and the designation end operation may be completed by pressing the designation operator for a certain period of time. At that time, the CPU 10 may determine the phrase of the movement destination according to the length of time of the long press.
  • the CPU 10 may fix the pronunciation target phrase of the movement destination based on the number of repetitions of the pressing operation of the designated operator and the releasing operation within a predetermined time.
  • the configuration may be such that the pronunciation target phrase of the movement destination can be designated by a combination of operations of the designated operator and other operators. Further, by operating the designated operator in a predetermined mode, the leading phrase of the selected music may be determined as the pronunciation target phrase regardless of the current phrase.
  • The setting of the cursor position within the determined pronunciation target phrase, performed together with the determination of that phrase, may be done as follows. For example, when the current phrase is the final phrase of the selected song and a phrase designation operation is performed with the advance operator 34, the CPU 10 may determine the first phrase of the selected song as the pronunciation target phrase and set the cursor at its first syllable. Likewise, when the current phrase is the first phrase and a phrase designation operation is performed with the return operator 35, the CPU 10 may determine the first phrase of the selected song as the pronunciation target phrase and set the cursor at its first syllable.
  • It suffices that the singing data 14a of the selected song is acquired in a state divided into a plurality of phrases; acquisition is not limited to song units and may be performed in phrase units.
  • Likewise, the manner in which the singing data 14a is stored in the data storage unit 14 is not limited to song units.
  • The acquisition source of the singing data 14a is not limited to the data storage unit 14; an external device accessed through the communication I/F 22 may also serve as the acquisition source.
  • The CPU 10 may also acquire singing data that the user has edited or created on the electronic musical instrument 100.
  • The same effects may also be achieved by reading into the present instrument a storage medium storing a control program represented by software for achieving the present invention. In that case, the program code itself read from the storage medium realizes the novel functions of the present invention, and the non-transitory computer-readable recording medium storing the program code constitutes the present invention.
  • the program code may be supplied via a transmission medium or the like, in which case the program code itself constitutes the present invention.
  • ROMs, floppy disks, hard disks, optical disks, magneto-optical disks, CD-ROMs, CD-Rs, magnetic tapes, non-volatile memory cards, etc. can be used as storage media in these cases.
  • The non-transitory computer-readable recording medium also includes a medium that holds the program for a fixed period of time, such as a volatile memory (for example, a DRAM (Dynamic Random Access Memory) inside a server or client computer system) used when the program is transmitted via a network such as the Internet or a communication line such as a telephone line.

Abstract

Provided is a sound-producing device capable of lessening the incongruity when switching between sound production periods. When a designation start operation (operation of depressing advance operator 34 or reverse operator 35) is detected, a CPU 10 stops a sound being outputted, sets a sound production target phrase as undefined, and automatically generates and starts outputting a dummy sound. Thereafter, the CPU 10 defines the next sound production target phrase when the designation end operation (operation of releasing advance operator 34 or reverse operator 35) is detected. For example, when a release operation is detected after an advance operator 34 depressing operation has been detected, the CPU 10 defines the one phrase that follows the current phrase as the sound production target phrase, and stops the dummy sound.

Description

Sound generation device and method

The present invention relates to a sound generation device and method for producing a singing sound based on singing data.

Sound generation devices that produce singing sounds based on singing data using speech synthesis technology are known. For example, the apparatus of Patent Document 1 causes the user to input a plurality of types of synthesis information (phonological information and prosody information) for each note, and performs singing synthesis in real time. Because a mismatch between the input timings of the phonological information and the prosody information would feel unnatural to the user, the apparatus of Patent Document 1 eases this discomfort by producing a dummy sound from the input of the earliest synthesis information until output of the voice signal corresponding to that information begins. This alleviates the sense of incongruity when singing one syllable at a time in a fixed order.

Patent Document 1: Japanese Patent No. 6044284

In general, the lyrics of a song consist of a plurality of coherent units (sections) such as phrases. A player may therefore want to jump to the next phrase in the middle of singing the current one. If the device allows phrases to be switched, it must determine the destination phrase and move the sound generation position to a syllable within that phrase. If determining the destination phrase and performing the actual switching take time, the pronunciation of syllables based on the original singing instruction may be interrupted each time a phrase is switched, which can feel unnatural; the gap is especially noticeable when an accompaniment is also being played back.

An object of the present invention is to provide a sound generation device and method capable of alleviating discomfort when switching between sound generation sections.

To achieve the above object, the present invention provides a sound generation device comprising: a data acquisition unit for acquiring singing data that includes syllable information serving as a basis of pronunciation and that consists of a plurality of continuous sections; a detection unit for detecting a section designation operation that designates, within the singing data acquired by the data acquisition unit, the next section to be pronounced; and a sound generation control unit for generating, in response to detection of the section designation operation by the detection unit, a predetermined singing sound different from the singing sound based on a singing instruction.

The reference signs in the above parentheses are examples.

According to the present invention, it is possible to alleviate a sense of incongruity at the time of switching of a sound generation section.
FIG. 1 is a schematic view of the sound generation device. FIG. 2 is a block diagram of the electronic musical instrument. FIG. 3 is a diagram showing the main part of the display unit. FIG. 4 is a flowchart showing an example of the flow of processing when a performance is performed. FIG. 5 is a diagram showing an example of lyric text data. FIG. 6 is a diagram showing an example of the types of speech segment data. FIG. 7 is a part of a flowchart showing an example of the flow of processing when a performance is performed.
Hereinafter, embodiments of the present invention will be described with reference to the drawings.

(First Embodiment)

FIG. 1 is a schematic view of a sound generation device according to a first embodiment of the present invention. This sound generation device is configured, as an example, as an electronic musical instrument 100 that is a keyboard instrument, and has a main body 30 and a neck 31. The main body 30 has a first surface 30a, a second surface 30b, a third surface 30c, and a fourth surface 30d. The first surface 30a is a keyboard mounting surface on which a keyboard section KB composed of a plurality of keys is disposed. The second surface 30b is the back surface, and hooks 36 and 37 are provided on it. A strap (not shown) can be attached between the hooks 36 and 37, and the player typically hangs the strap over the shoulder and performs by operating the keyboard section KB. Accordingly, when the instrument is worn on the shoulder, in particular with the scale direction (key arrangement direction) of the keyboard section KB oriented left to right, the first surface 30a and the keyboard section KB face the listener, while the third surface 30c and the fourth surface 30d face generally downward and upward, respectively. The neck 31 extends from a side of the main body 30 and is provided with various operators including an advance operator 34 and a return operator 35. A display unit 33 composed of a liquid crystal display or the like is disposed on the fourth surface 30d of the main body 30.
The electronic musical instrument 100 is an instrument that simulates singing in response to operations on the performance operators. Here, simulating singing means outputting a voice that imitates a human voice by singing synthesis. The white and black keys of the keyboard section KB are arranged in order of pitch, and each key is associated with a different pitch. To play the electronic musical instrument 100, the user presses a desired key of the keyboard section KB. The electronic musical instrument 100 detects the key operated by the user and produces a singing sound at the pitch corresponding to the operated key. The order of the syllables of the singing sounds to be pronounced is predetermined.

FIG. 2 is a block diagram of the electronic musical instrument 100. The electronic musical instrument 100 includes a CPU (Central Processing Unit) 10, a timer 11, a ROM (Read Only Memory) 12, a RAM (Random Access Memory) 13, a data storage unit 14, a performance operator 15, an other operator 16, a parameter value setting operator 17, a display unit 33, a sound source 19, an effect circuit 20, a sound system 21, a communication I/F (Interface) 22, and a bus 23.

The CPU 10 is a central processing unit that controls the entire electronic musical instrument 100. The timer 11 is a module that measures time. The ROM 12 is a non-volatile memory that stores control programs and various data. The RAM 13 is a volatile memory used as a work area of the CPU 10 and as various buffers. The display unit 33 is a display module such as a liquid crystal display panel or an organic EL (Electro-Luminescence) panel, and displays the operation state of the electronic musical instrument 100, various setting screens, messages for the user, and the like.

The performance operator 15 is a module that mainly accepts performance operations designating pitches. In the present embodiment, the keyboard section KB, the advance operator 34, and the return operator 35 are included in the performance operator 15. As an example, when the performance operator 15 is a keyboard, it outputs performance information such as note-on/note-off based on the on/off state of a sensor corresponding to each key, and the key depression strength (speed, velocity). This performance information may be in the form of MIDI (Musical Instrument Digital Interface) messages.

The other operator 16 is an operation module, such as operation buttons or operation knobs, for making settings other than performance settings, for example settings relating to the electronic musical instrument 100. The parameter value setting operator 17 is an operation module, such as operation buttons or operation knobs, mainly used to set parameters concerning attributes of the singing sound. Examples of these parameters include Harmonics, Brightness, Resonance, and Gender Factor. Harmonics is a parameter for setting the balance of harmonic components contained in the voice. Brightness is a parameter for setting the brightness of the voice and gives a tonal change. Resonance is a parameter for setting the timbre and intensity of the singing voice or instrument sound. Gender Factor is a parameter for setting formants, changing the thickness and texture of the voice in a feminine or masculine direction. The external storage device 3 is an external device connected to the electronic musical instrument 100, for example a device that stores audio data. The communication I/F 22 is a communication module that communicates with external devices. The bus 23 transfers data between the units of the electronic musical instrument 100.

The data storage unit 14 stores singing data 14a. The singing data 14a includes lyric text data, a phonological information database, and the like. The lyric text data is data describing lyrics; in it, the lyrics of each song are described divided into syllable units. That is, the lyric text data has character information obtained by dividing the lyrics into syllables, and this character information also serves as display information corresponding to the syllables. Here, a syllable is a group of sounds output in response to one performance operation. The phonological information database is a database storing speech segment data. The speech segment data is data representing voice waveforms and includes, for example, spectrum data of sample sequences of speech segments as waveform data. The speech segment data also includes segment pitch data indicating the pitch of the waveform of each speech segment. The lyric text data and the speech segment data may each be managed in a database.

The sound source 19 is a module having a plurality of tone generation channels. Under the control of the CPU 10, one tone generation channel is assigned to the sound source 19 in accordance with the user's performance. When producing a singing sound, the sound source 19 reads the speech segment data corresponding to the performance from the data storage unit 14 in the assigned tone generation channel and generates singing sound data. The effect circuit 20 applies the acoustic effect designated with the parameter value setting operator 17 to the singing sound data generated by the sound source 19. The sound system 21 converts the singing sound data processed by the effect circuit 20 into an analog signal with a digital/analog converter, then amplifies the singing sound converted into the analog signal and outputs it from a speaker or the like.
 図3は、表示ユニット33の主要部を示す図である。表示ユニット33は、表示領域として、第1メインエリア41、第2メインエリア42、第1サブエリア43、第2サブエリア44を有する。全体の表示領域は2行(2段)構成となっており、第1メインエリア41及び第1サブエリア43が1行目(上段)、第2メインエリア42及び第2サブエリア44が2行目(下段)に配置される。メインエリア41、42のそれぞれにおいて、表示ユニット33の長手方向に複数の表示枠45(45-1、45-2、45-3・・・)が直列に配置されている。図3の左端の表示枠45-1を先頭として、音節に対応する文字が発音予定順に表示される。メインエリア41、42は主として歌詞表示に用いられる。 FIG. 3 is a diagram showing the main part of the display unit 33. As shown in FIG. The display unit 33 has a first main area 41, a second main area 42, a first sub area 43, and a second sub area 44 as display areas. The entire display area is configured in two lines (two stages), and the first main area 41 and the first sub area 43 are the first line (upper stage), and the second main area 42 and the second sub area 44 are two lines. It is placed on the eye (lower). In each of the main areas 41 and 42, a plurality of display frames 45 (45-1, 45-2, 45-3,...) Are arranged in series in the longitudinal direction of the display unit 33. Starting from the display frame 45-1 at the left end of FIG. 3, characters corresponding to syllables are displayed in the order of scheduled pronunciation. The main areas 41 and 42 are mainly used for displaying lyrics.
 次に、歌唱順序及び歌詞表示に着目した動作について説明する。まず、歌唱用データ14aに含まれる歌詞テキストデータは、選択曲に応じた複数の各音節に対応付けられた文字情報を少なくとも含む。歌詞テキストデータは、歌唱部(音源19、効果回路20及びサウンドシステム21)により歌唱されるためのデータである。歌詞テキストデータは予め、連続した複数の区間に分けられており、分割された各区間を「フレーズ」と称する。フレーズは、あるまとまりのある単位であり、ユーザが認識しやすい意味により区切られたものであるが、区間の定義はこれに限定されない。CPU10は、曲が選択されると、複数のフレーズに分けられた状態で取得する。フレーズには1以上の音節とその音節に対応する文字情報が含まれる。 Next, an operation focusing on the singing order and the lyrics display will be described. First, the lyric text data included in the song data 14a at least includes character information associated with a plurality of syllables according to the selected song. The lyrics text data is data to be sung by the singing part (the sound source 19, the effect circuit 20 and the sound system 21). The lyrics text data is divided in advance into a plurality of continuous sections, and each divided section is referred to as a "phrase". A phrase is a unit of unity and is separated by a meaning that can be easily recognized by the user, but the definition of the section is not limited to this. When the song is selected, the CPU 10 acquires the song while being divided into a plurality of phrases. A phrase includes one or more syllables and character information corresponding to the syllables.
 電子楽器100が起動されると、CPU10は、選択曲に対応する複数のフレーズのうち先頭のフレーズに対応する文字情報を、表示ユニット33の第1メインエリア41(図3)に表示させる。その際、1フレーズ目の先頭の文字が左端の表示枠45-1に表示され、第1メインエリア41に表示可能な数だけ文字が表示される。2フレーズ目については、第2メインエリア42に表示可能な数だけ文字が表示される。鍵盤部KBは、歌唱の指示を取得する指示取得部としての役割を果たす。CPU10は、鍵盤部KBの操作等によって歌唱の指示が取得されたことに応じて、次に歌唱する音節を歌唱部に歌唱させると共に、第1メインエリア41に表示された文字の表示を、音節の進行に従って進める。文字表示の歩進方向は図3の左方向であり、最初に表示しきれなかった文字は、歌唱の進行に応じて右端の表示枠45から表れる。カーソル位置は次に歌唱する音節を示すものであり、第1メインエリア41の表示枠45-1に表示された文字に対応する音節を指示する。鍵盤部KBの操作に応じて、表示ユニット33に表示される歌詞が更新される。 When the electronic musical instrument 100 is activated, the CPU 10 causes the first main area 41 (FIG. 3) of the display unit 33 to display character information corresponding to the first phrase of the plurality of phrases corresponding to the selected music. At that time, the first character of the first phrase is displayed in the display frame 45-1 at the left end, and a number of characters that can be displayed in the first main area 41 are displayed. For the second phrase, characters are displayed in the second main area 42 by the number that can be displayed. The keyboard unit KB plays a role as an instruction acquisition unit for acquiring a singing instruction. The CPU 10 causes the singing part to sing the next syllable to be sung in response to the acquisition of the singing instruction by the operation of the keyboard part KB, etc., and causes the display of the characters displayed in the first main area 41 to be performed. Follow the progress of The advancing direction of the character display is the left direction in FIG. 3, and the characters that can not be displayed at first appear from the display frame 45 at the right end according to the progress of singing. The cursor position indicates a syllable to be sung next, and designates a syllable corresponding to the character displayed in the display frame 45-1 of the first main area 41. The lyrics displayed on the display unit 33 are updated according to the operation of the keyboard KB.
 なお、1文字と1音節とは必ずしも対応しない。例えば、濁点を有する「だ」(da)は、「た」(ta)と「"」の2文字が1音節に対応する。また、歌詞は英語でもよく、例えば歌詞が「september」の場合、「sep」「tem」「ber」の3音節となる。「sep」は1音節であるが、「s」「e」「p」の3文字が1音節に対応する。文字表示の歩進はあくまで音節単位であるので、「だ」の場合は歌唱により2文字進むことになる。このように、歌詞は、日本語に限らず他言語であってもよい。 Note that one letter and one syllable do not necessarily correspond. For example, in the case of "da" (da) having a cloud point, two letters "ta" (ta) and "" "correspond to one syllable. Also, the lyrics may be in English, for example, if the lyrics are "september", it will be the three syllables of "sep" "tem" "ber". Although "sep" is one syllable, three characters "s" "e" "p" correspond to one syllable. Since the progression of the character display is a syllable unit to the last, in the case of "da", it will advance two characters by singing. Thus, the lyrics are not limited to Japanese and may be other languages.
 When all of the syllables of the phrase displayed in the first main area 41 have been sung, the CPU 10 displays in the first main area 41 the character information belonging to the phrase that follows it, and displays in the second main area 42 the character information belonging to the phrase that follows the one displayed there. If there is no phrase following the one displayed in the second main area 42, no characters are displayed in the second main area 42 (all display frames 45 are blank).
 The advance operator 34 shown in FIG. 1 is an operator for moving the display forward in phrase units, and pressing and releasing the advance operator 34 is an example of a phrase advance operation. The return operator 35 is an operator for moving the display backward in phrase units, and pressing and releasing the return operator 35 is an example of a phrase return operation. A phrase advance operation with the advance operator 34 and a phrase return operation with the return operator 35 each correspond to a phrase designation operation (section designation operation) that designates the next phrase to be sounded (the next sound generation target section).
 When the CPU 10 detects a phrase designation operation, it finalizes the next phrase to be sounded. For example, when the CPU 10 detects that the advance operator 34 has been pressed and then released, it finalizes the phrase immediately after the current phrase as the phrase to be sounded. Likewise, when it detects that the return operator 35 has been pressed and then released, it finalizes the phrase immediately before the current phrase as the phrase to be sounded. Pressing the advance operator 34 or the return operator 35 constitutes the designation start operation of the phrase designation operation, and releasing the advance operator 34 or the return operator 35 constitutes the designation end operation.
 In conjunction with the processing that finalizes the phrase to be sounded, the CPU 10 executes lyric display processing as follows (this processing follows a separate flowchart, not shown). When the CPU 10 detects a phrase advance operation, it executes a phrase display advance process so that the finalized phrase to be sounded is displayed in the first main area 41. For example, the CPU 10 displays in the first main area 41 the character string that had been displayed in the second main area 42, and displays the character string of the following phrase in the second main area 42. If there is no phrase following the one to be displayed in the second main area 42, no characters are displayed in the second main area 42 (all display frames 45 are blank). Conversely, when the CPU 10 detects a phrase return operation, it executes a phrase display return process so that the finalized phrase to be sounded is displayed in the first main area 41. For example, the CPU 10 displays in the first main area 41 the character information belonging to the phrase immediately before the one that had been displayed there, and displays in the second main area 42 the character information belonging to the phrase immediately before the one that had been displayed in the second main area 42.
 Incidentally, finalizing the phrase to be sounded can take an amount of time long enough for the user to notice. Because the next syllable cannot be sounded until the phrase to be sounded has been finalized, this can create a sense of discomfort. In the present embodiment, therefore, when a phrase advance or return operation (a section designation operation indicating the start of designation) is detected, the CPU 10 starts producing a dummy sound (a predetermined singing sound) and continues that dummy sound at least until the next phrase to be sounded is finalized. The dummy sound is a synthesized singing sound such as "ル" (ru); its type is arbitrary, and the syllable information on which its pronunciation is based is stored in the ROM 12 in advance. The syllable information for the dummy sound may instead accompany the singing data 14a. Alternatively, syllable information for the dummy sound may be attached to each phrase in the singing data 14a so that a dummy sound corresponding to the current or next phrase to be sounded is generated. It is also possible to store several pieces of syllable information for dummy sounds and to generate the dummy sound based on the singing sound that was produced immediately before.
 FIG. 4 is a flowchart showing an example of the flow of processing when a performance is played on the electronic musical instrument 100. Here, the processing for the case where the user selects a song and then plays the selected song will be described. To simplify the description, the case where only a single note is output even when a plurality of keys are operated simultaneously will be described; in that case, only the highest pitch among the pitches of the simultaneously operated keys may be processed, or only the lowest pitch may be processed. The processing described below is realized, for example, by the CPU 10 executing a program stored in the ROM 12 or the RAM 13. In the processing shown in FIG. 4, the CPU 10 serves as the data acquisition unit, the detection unit, the sound generation control unit, and the determination unit.
 When the power is turned on, the CPU 10 waits until an operation selecting the song to be played is received from the user (step S101). If no song selection operation is made within a certain time, the CPU 10 may determine that the default song has been selected. Upon accepting the song selection, the CPU 10 reads the lyric text data of the singing data 14a of the selected song and sets the cursor position to the first syllable described in the lyric text data (step S102). Here, the cursor is a virtual index indicating the position of the syllable to be sounded next. Next, the CPU 10 determines whether note-on based on operation of the keyboard unit KB has been detected (step S103). If note-on is not detected, the CPU 10 determines whether note-off has been detected (step S109). If note-on is detected, that is, if a new key depression is detected, the CPU 10 stops the output of any sound currently being output (step S104); the sound in this case may include a dummy sound. Next, the CPU 10 determines whether the next phrase to be sounded is in the finalized state (step S105). In the normal stage, in which the singing syllables are advanced one by one in response to the acquisition of singing instructions (note-on), the phrase to be sounded is in the finalized state. In that case, the CPU 10 executes output sound generation processing that produces the singing sound corresponding to the note-on (step S107).
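 Before the output sound generation processing is described in detail, the note-on branch for the finalized-phrase case (steps S103 to S108) can be sketched roughly as follows; the function names and the simplified lyric data are placeholders, not the patent's implementation:

```python
phrases = [["は", "る", "よ"], ["こ", "い"]]   # simplified lyric data (characters only)
current_phrase, cursor = 0, 0
phrase_confirmed = True

def sing(syllable, pitch):
    print(f"sing '{syllable}' at MIDI pitch {pitch}")   # stands in for the synthesis engine

def on_note_on(pitch):
    """Note-on handling while the phrase to be sounded is finalized (S103-S108)."""
    global cursor
    # S104: any sound still playing (including a dummy sound) would be stopped here
    if phrase_confirmed:                              # S105
        sing(phrases[current_phrase][cursor], pitch)  # S107: output sound generation
        cursor += 1                                   # S108: advance the cursor

on_note_on(64)   # e.g. key E4 pressed -> sings "は" at that pitch
```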
 The output sound generation processing is as follows. The CPU 10 first reads the speech segment data (waveform data) of the syllable corresponding to the cursor position and outputs the sound of the waveform indicated by the read speech segment data at the pitch corresponding to the note-on. Specifically, the CPU 10 obtains the difference between the pitch indicated by the segment pitch data included in the speech segment data and the pitch corresponding to the operated key, and shifts the spectral distribution indicated by the waveform data along the frequency axis by the frequency corresponding to this difference. In this way, the electronic musical instrument 100 can output the singing sound at the pitch corresponding to the operated key. The CPU 10 then updates the cursor position (read position) (step S108) and advances the processing to step S109.
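 The pitch adjustment can be illustrated as a spectral shift by the frequency difference between the segment's recorded pitch and the played key. The sketch below is a simplified single-FFT illustration using NumPy; a real engine would presumably operate frame by frame, and the sample rate and note numbers are arbitrary:

```python
import numpy as np

def midi_to_hz(note):
    return 440.0 * 2 ** ((note - 69) / 12)

def shift_to_key(waveform, sr, segment_pitch, key_pitch):
    """Shift the spectrum of `waveform` by the frequency difference between
    the segment's recorded pitch and the pitch of the operated key."""
    diff_hz = midi_to_hz(key_pitch) - midi_to_hz(segment_pitch)
    spectrum = np.fft.rfft(waveform)
    bins = int(round(diff_hz * len(waveform) / sr))   # frequency difference in FFT bins
    shifted = np.roll(spectrum, bins)
    if bins > 0:
        shifted[:bins] = 0        # avoid wrapping spectral content around the ends
    elif bins < 0:
        shifted[bins:] = 0
    return np.fft.irfft(shifted, n=len(waveform))

# Example: a segment recorded at A3 (MIDI 57) replayed for a key press at C4 (MIDI 60)
sr = 16000
t = np.arange(sr) / sr
segment = np.sin(2 * np.pi * midi_to_hz(57) * t)
out = shift_to_key(segment, sr, segment_pitch=57, key_pitch=60)
```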
 Here, the determination of the cursor position and the sounding of the singing sound in steps S107 and S108 will be described using a specific example. First, updating of the cursor position: FIG. 5 shows an example of lyric text data. In the example of FIG. 5, the lyric text data describes lyrics consisting of the five syllables c1 to c5. Each of the characters "は" (ha), "る" (ru), "よ" (yo), "こ" (ko), and "い" (i) is a single Japanese hiragana character, and each character corresponds to one syllable. The CPU 10 updates the cursor position in syllable units. For example, when the cursor is located at syllable c3, the CPU 10 reads the speech segment data corresponding to "よ" from the data storage unit 14 and sounds the singing sound for "よ". When the sounding of "よ" is completed, the CPU 10 moves the cursor position to the next syllable c4. In this way, the CPU 10 moves the cursor position to the next syllable in response to each note-on.
 Next, the sounding of the singing sound will be described. FIG. 6 shows an example of the types of speech segment data. In order to sound the syllable corresponding to the cursor position, the CPU 10 extracts the speech segment data corresponding to that syllable from the phonological information database. There are two types of speech segment data: phoneme chain data and stationary part data. Phoneme chain data represents speech segments in which the sound changes, such as "silence (#) to consonant", "consonant to vowel", and "vowel to the consonant or vowel of the next syllable". Stationary part data represents a speech segment in which the sound of a vowel continues. For example, when the cursor position is set to "は" (ha) of syllable c1, the sound source 19 selects the phoneme chain data "#-h" corresponding to "silence → consonant h", the phoneme chain data "h-a" corresponding to "consonant h → vowel a", and the stationary part data "a" corresponding to "vowel a". Then, when the performance starts and a key depression is detected, the CPU 10 outputs the singing sound based on the phoneme chain data "#-h", the phoneme chain data "h-a", and the stationary part data "a" at the pitch corresponding to the operated key and with the velocity corresponding to the operation. In this way, the determination of the cursor position and the sounding of the singing sound are carried out.
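 How segment data might be selected for a syllable such as "は" (ha) can be sketched as a simple lookup; the table layout and function below are illustrative assumptions rather than the phonological information database itself:

```python
# Simplified lookup: syllable -> (phoneme chain segments, stationary part segment)
SEGMENTS = {
    "は": (["#-h", "h-a"], "a"),   # silence->h, h->a, then the sustained vowel a
    "る": (["a-r", "r-u"], "u"),   # assumes the preceding syllable ended on vowel a
}

def select_segments(syllable):
    chain, stationary = SEGMENTS[syllable]
    return chain + [stationary]

print(select_segments("は"))   # ['#-h', 'h-a', 'a']
```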
 If, on the other hand, the determination in step S105 shows that the next phrase to be sounded is still in the undetermined state, the CPU 10 generates the output sound of a dummy sound at the note-on pitch detected in step S103 and outputs the dummy sound (step S106). At this point a dummy sound has already been output on the basis of the designation start operation in step S115, described later. Therefore, if the pitch of the dummy sound being output differs from the note-on pitch detected in step S103, the CPU 10 generates the output sound of the dummy sound so that the dummy sound being output is corrected to the note-on pitch detected in step S103. The player can thus correct the pitch of the dummy sound by pressing keys between the output of the dummy sound and the finalization of the next phrase. The processing then proceeds to step S109.
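 The undetermined-phrase branch (step S106) can be sketched like this; `retune_dummy` is a hypothetical stand-in for regenerating the dummy sound at the new pitch:

```python
dummy_pitch = 60        # pitch at which the dummy sound is currently sounding
phrase_confirmed = False

def retune_dummy(pitch):
    print(f"dummy 'ru' re-generated at MIDI pitch {pitch}")

def on_note_on_while_undetermined(key_pitch):
    """Step S106: while the next phrase is undetermined, note-on only retunes the dummy."""
    global dummy_pitch
    if not phrase_confirmed and key_pitch != dummy_pitch:
        retune_dummy(key_pitch)
        dummy_pitch = key_pitch

on_note_on_while_undetermined(67)   # player presses G4 -> dummy continues at the new pitch
```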
 If note-off is not detected in step S109 of FIG. 4, the CPU 10 advances the processing to step S112. If note-off is detected, the CPU 10 determines whether the next phrase to be sounded is in the finalized state (step S110). In the normal stage, in which the singing syllables are advanced one by one in response to the acquisition of singing instructions (note-on), the phrase to be sounded is in the finalized state; in that case, the CPU 10 stops the output of any sound currently being output (step S111) and advances the processing to step S112. If the determination in step S110 shows that the next phrase to be sounded is in the undetermined state, the CPU 10 advances the processing to step S112. In step S112, the CPU 10 determines whether a designation start operation (pressing of the advance operator 34 or the return operator 35) has been detected. If no designation start operation is detected, the CPU 10 determines whether a designation end operation (release of the advance operator 34 or the return operator 35) has been detected (step S116). If no designation end operation is detected, the CPU 10 advances the processing to step S121.
 If the determination in step S112 shows that a designation start operation has been detected, the CPU 10 stops the output of any sound currently being output (step S113) and places the phrase to be sounded in the undetermined state (step S114). The CPU 10 manages the undetermined and finalized states of the phrase to be sounded, for example by setting a predetermined flag to 0 or 1. Next, the CPU 10 automatically generates a dummy sound and starts outputting it (step S115). Generation of the dummy sound thus starts in response to the designation start operation. The processing then proceeds to step S116.
 If the determination in step S116 shows that a designation end operation has been detected, the CPU 10 finalizes the next phrase to be sounded on the basis of the designation start operation detected in step S112 and this designation end operation (step S117). For example, as described above, when the CPU 10 detects the pressing of the advance operator 34 in step S112 and then detects its release in step S116, it finalizes the phrase immediately after the current phrase as the phrase to be sounded. Next, the CPU 10 updates the read position, that is, moves the cursor position to the first syllable of the finalized phrase to be sounded (step S118). As a result, when a singing instruction is acquired in step S103 after the next phrase to be sounded has been finalized, the syllable at the beginning of that phrase is sung, so the performance can shift immediately to the singing of the finalized phrase. The cursor position in the finalized phrase may be updated to any predetermined position and need not necessarily be the beginning. The CPU 10 then places the phrase to be sounded in the finalized state (step S119) and stops the dummy sound being output (step S120). Generation of the dummy sound thus ends in response to the finalization of the phrase to be sounded. The processing then proceeds to step S121.
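 Taken together, steps S112 to S120 behave like a small state machine around the phrase designation operators. The sketch below is a hedged illustration of that flow; every helper function is a placeholder rather than one of the patent's routines:

```python
current_phrase = 0
phrase_confirmed = True
dummy_playing = False

def stop_output():        print("output stopped")
def start_dummy():        print("dummy 'ru' starts")
def stop_dummy():         print("dummy stops")
def move_cursor_to(i):    print(f"cursor -> first syllable of phrase {i}")

def on_designation_start():
    """Advance/return operator pressed (S112 -> S113-S115)."""
    global phrase_confirmed, dummy_playing
    stop_output()              # S113
    phrase_confirmed = False   # S114: managed e.g. by a flag
    start_dummy()              # S115
    dummy_playing = True

def on_designation_end(direction):
    """Operator released (S116 -> S117-S120)."""
    global current_phrase, phrase_confirmed, dummy_playing
    current_phrase += 1 if direction == "advance" else -1   # S117
    move_cursor_to(current_phrase)                          # S118
    phrase_confirmed = True                                 # S119
    stop_dummy()               # S120 (omitted in the second embodiment below)
    dummy_playing = False

on_designation_start()
on_designation_end("advance")
```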
 In step S121, the CPU 10 executes other processing. For example, if the dummy sound has been sounding for a certain time or longer, the CPU 10 generates and outputs the same dummy sound again. In this way, when a dummy sound such as "ル" (ru) has been sustained for a long time, the same syllable can be repeated, as in "ルールールー" (ru-ru-ru). The CPU 10 then determines whether the performance has ended (step S122), and if it has not, returns the processing to step S103. If the performance has ended, the CPU 10 stops the output of any sound currently being output (step S123) and ends the processing shown in FIG. 4. The CPU 10 can determine whether the performance has ended on the basis of, for example, whether the last syllable of the selected song has been sounded or whether an operation to end the performance has been made with the other operators 16.
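 The periodic re-trigger performed in step S121 could look roughly like the following; the one-second threshold is an assumption purely for illustration, since the text only says "a certain time or longer":

```python
import time

DUMMY_RETRIGGER_SEC = 1.0     # assumed threshold; the source just says "a certain time"
dummy_started_at = time.monotonic()

def retrigger_dummy_if_needed():
    """Step S121: restart the same dummy syllable if it has sustained too long."""
    global dummy_started_at
    if time.monotonic() - dummy_started_at >= DUMMY_RETRIGGER_SEC:
        print("re-sing dummy 'ru'")        # same syllable sounded again -> "ru-ru-ru"
        dummy_started_at = time.monotonic()
```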
 According to the present embodiment, a dummy sound (a predetermined singing sound) distinct from the singing sound based on the singing instruction is produced in response to the detection of a phrase designation operation. Consequently, even if the sounding of syllables based on the original singing instructions stops each time a phrase is switched, the dummy sound is produced instead, which alleviates the sense of discomfort at the switching of the sound generation section. In particular, generation of the dummy sound starts in response to the detection of the designation start operation and continues at least until the next sound generation target section is finalized, so silence at the switching of sound generation sections is avoided. Furthermore, because the phrase to be sounded is finalized by the designation end operation, the dummy sound can continue while the user is performing the phrase designation operation.
 In addition, when an instruction specifying a pitch is acquired while the dummy sound is being produced, the CPU 10 changes the pitch at which the dummy sound is produced to the specified pitch (step S106), so correcting the pitch of the dummy sound alleviates the sense of discomfort even further.
 (Second Embodiment)
 In the first embodiment, the dummy sound was stopped as soon as the phrase to be sounded reached the finalized state. In the second embodiment of the present invention, by contrast, the dummy sound whose generation has started is continued until the first note-on after the phrase to be sounded has been finalized. To achieve this, step S120 in FIG. 4 may simply be omitted. The first note-on after the phrase to be sounded is finalized then stops the dummy sound that was being output, in step S104. The dummy sound whose generation has started can therefore continue without interruption until the note-on that follows finalization of the phrase to be sounded.
 The present embodiment is effective, for example, in a specification in which the designation start operation and the designation end operation are completed by a single operation. For instance, the present invention may be applied to a specification in which simply pressing the advance operator 34 or the return operator 35 indicates both the designation start operation and the designation end operation, and the release operation has no meaning.
 (Third Embodiment)
 In the first embodiment, when a note-on occurred after the dummy sound had started, the dummy sound was re-sounded at a different pitch so that its pitch was corrected to the note-on pitch (step S106). In the third embodiment of the present invention, by contrast, the dummy sound is neither regenerated nor re-sounded even if a note-on occurs after the dummy sound has started.
 FIG. 7 is part of a flowchart showing an example of the flow of processing when a performance is played on the electronic musical instrument 100 according to the third embodiment of the present invention. In this flowchart, the processing before step S103 and after step S109 is the same as in the flowchart of FIG. 4 and is therefore not shown. Steps S105 and S106 have been eliminated.
 When note-on is detected in step S103, the CPU 10 determines whether a dummy sound is being produced (step S201). If no dummy sound is being produced, the CPU 10 executes steps S104, S107, and S108 and advances the processing to step S109; the sound being produced from the previous note-on is therefore stopped, and the singing sound based on the current note-on is produced. Note that the dummy sound having been stopped means that the phrase to be sounded has been finalized. If, on the other hand, a dummy sound is being produced, the CPU 10 advances the processing to step S109. Consequently, while a dummy sound is being produced, no sound is generated in response to a note-on, and the dummy sound continues without pitch correction.
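 The third embodiment's note-on branch can be sketched as follows (placeholder helpers again; contrast this with the step S106 retuning sketched earlier):

```python
dummy_playing = True

def sing_current_syllable(pitch):
    print(f"sing the cursor syllable at MIDI pitch {pitch}")

def on_note_on_third_embodiment(pitch):
    """Step S201: ignore note-on entirely while the dummy sound is playing."""
    if dummy_playing:
        return                     # dummy continues; its pitch is left untouched
    # S104/S107/S108: stop the previous note, sing the next syllable, advance the cursor
    sing_current_syllable(pitch)

on_note_on_third_embodiment(62)    # no output while the dummy is playing
```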
 The manner of the phrase designation operation is not limited to the examples given; various variations are conceivable. For example, as mentioned in the second embodiment, the configuration may be such that pressing a designation operator such as the advance operator 34 or the return operator 35 once indicates both the designation start operation and the designation end operation and finalizes the phrase to be sounded. The destination phrase reached by one set of operations is not limited to an adjacent phrase; the CPU 10 may skip over a plurality of phrases to finalize the phrase to be sounded. The designation start operation and the designation end operation may also be completed by holding down a designation operator for a certain time, in which case the CPU 10 may finalize the destination phrase according to the duration of the long press. The CPU 10 may also finalize the destination phrase according to the number of times a designation operator is pressed and released within a certain time. Alternatively, the destination phrase may be designated by a combination of operations on a designation operator and another operator. It is also possible to arrange that operating a designation operator in a predetermined manner finalizes the first phrase of the selected song as the phrase to be sounded, regardless of the current phrase.
 The finalization of the phrase to be sounded and the setting of the cursor position within the finalized phrase may also be handled as follows. For example, when a phrase designation operation with the advance operator 34 is made on the last phrase of the selected song, the CPU 10 may finalize the first phrase of the selected song as the phrase to be sounded and set the cursor at the first syllable of that phrase. Likewise, when a phrase designation operation with the return operator 35 is made on the first phrase, the CPU 10 may finalize the first phrase of the selected song as the phrase to be sounded and set the cursor at the first syllable of that phrase.
 The singing data 14a of the selected song only needs to be acquired in a state divided into a plurality of phrases; acquisition is not limited to song units and may instead be done phrase by phrase. The manner in which the singing data 14a is stored in the data storage unit 14 is likewise not limited to song units. The source from which the singing data 14a is acquired is not limited to the storage unit either; an external device accessed through the communication I/F 22 may serve as the source, or the data may be acquired by the CPU 10 as a result of the user editing or creating it on the electronic musical instrument 100.
 Although the present invention has been described in detail on the basis of its preferred embodiments, the present invention is not limited to these specific embodiments, and various forms within a scope not departing from the gist of the invention are also included in the present invention.
 The same effects may also be obtained by loading into the present instrument a storage medium storing a control program represented by software for achieving the present invention. In that case, the program code itself read from the storage medium realizes the novel functions of the present invention, and the non-transitory computer-readable recording medium storing the program code constitutes the present invention. The program code may also be supplied via a transmission medium or the like, in which case the program code itself constitutes the present invention. In these cases, the storage medium may be, in addition to a ROM, a floppy disk, a hard disk, an optical disk, a magneto-optical disk, a CD-ROM, a CD-R, a magnetic tape, a non-volatile memory card, or the like. The "non-transitory computer-readable recording medium" also includes media that hold the program for a fixed time, such as the volatile memory (for example, DRAM (Dynamic Random Access Memory)) inside a computer system serving as a server or client when the program is transmitted via a network such as the Internet or a communication line such as a telephone line.
 
 
 
 
 
10 CPU (data acquisition unit, detection unit, sound generation control unit, determination unit)
14a Singing data



Claims (8)

  1.  A sound generation device comprising:
     a data acquisition unit that acquires singing data made up of a plurality of continuous sections and including syllable information on which sound generation is based;
     a detection unit that detects a section designation operation designating the next sound generation target section in the singing data acquired by the data acquisition unit; and
     a sound generation control unit that produces, in response to the detection of the section designation operation by the detection unit, a predetermined singing sound distinct from the singing sound based on a singing instruction.

  2.  The sound generation device according to claim 1, further comprising a determination unit that finalizes the next sound generation target section on the basis of the section designation operation detected by the detection unit,
     wherein the sound generation control unit starts producing the predetermined singing sound in response to the detection by the detection unit of a section designation operation indicating the start of designation, and continues producing the predetermined singing sound at least until the determination unit finalizes the next sound generation target section.

  3.  The sound generation device according to claim 2, wherein the determination unit finalizes the next sound generation target section in response to the detection by the detection unit of a section designation operation indicating the end of designation.

  4.  The sound generation device according to any one of claims 1 to 3, further comprising an instruction acquisition unit that acquires the singing instruction,
     wherein the sound generation control unit sings, in response to the acquisition of a singing instruction by the instruction acquisition unit, the syllable information defined in a predetermined order among the plurality of pieces of syllable information in the singing data.

  5.  The sound generation device according to claim 4, wherein, after the next sound generation target section has been finalized, the sound generation control unit sings the syllable information corresponding to a predetermined position in the next sound generation target section in response to the acquisition of a singing instruction by the instruction acquisition unit.

  6.  The sound generation device according to claim 5, wherein, after the next sound generation target section has been finalized, the sound generation control unit continues producing the predetermined singing sound until the singing of the syllable information corresponding to the predetermined position is started in response to the acquisition of a singing instruction by the instruction acquisition unit.

  7.  The sound generation device according to any one of claims 1 to 6, wherein, when an instruction specifying a pitch is acquired while the predetermined singing sound is being produced, the sound generation control unit changes the pitch at which the predetermined singing sound is produced to the specified pitch.

  8.  A sound generation method comprising:
     a data acquisition step of acquiring singing data made up of a plurality of continuous sections and including syllable information on which sound generation is based;
     a detection step of detecting a section designation operation designating the next sound generation target section in the singing data acquired in the data acquisition step; and
     a sound generation step of producing, in response to the detection of the section designation operation in the detection step, a predetermined singing sound distinct from the singing sound based on a singing instruction.



PCT/JP2017/023783 2017-06-28 2017-06-28 Sound-producing device and method WO2019003349A1 (en)

Priority Applications (3)

Application Number Priority Date Filing Date Title
JP2019526038A JP6787491B2 (en) 2017-06-28 2017-06-28 Sound generator and method
CN201780091661.1A CN110720122B (en) 2017-06-28 2017-06-28 Sound generating device and method
PCT/JP2017/023783 WO2019003349A1 (en) 2017-06-28 2017-06-28 Sound-producing device and method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/JP2017/023783 WO2019003349A1 (en) 2017-06-28 2017-06-28 Sound-producing device and method

Publications (1)

Publication Number Publication Date
WO2019003349A1 true WO2019003349A1 (en) 2019-01-03

Family

ID=64742814

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/JP2017/023783 WO2019003349A1 (en) 2017-06-28 2017-06-28 Sound-producing device and method

Country Status (3)

Country Link
JP (1) JP6787491B2 (en)
CN (1) CN110720122B (en)
WO (1) WO2019003349A1 (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2003167594A (en) * 2001-12-03 2003-06-13 Oki Electric Ind Co Ltd Portable telephone and portable telephone system using singing voice synthesis
JP2008170592A (en) * 2007-01-10 2008-07-24 Yamaha Corp Device and program for synthesizing singing voice
JP2014098800A (en) * 2012-11-14 2014-05-29 Yamaha Corp Voice synthesizing apparatus

Family Cites Families (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP3233036B2 (en) * 1996-07-30 2001-11-26 ヤマハ株式会社 Singing sound synthesizer
JP2003140694A (en) * 2001-11-05 2003-05-16 Matsushita Electric Ind Co Ltd Audio decoder
JP2007504495A (en) * 2003-08-26 2007-03-01 クリアプレイ,インク. Method and apparatus for controlling the performance of an acoustic signal
JP3895766B2 (en) * 2004-07-21 2007-03-22 松下電器産業株式会社 Speech synthesizer
JP6683103B2 (en) * 2016-11-07 2020-04-15 ヤマハ株式会社 Speech synthesis method
JP6977818B2 (en) * 2017-11-29 2021-12-08 ヤマハ株式会社 Speech synthesis methods, speech synthesis systems and programs

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2003167594A (en) * 2001-12-03 2003-06-13 Oki Electric Ind Co Ltd Portable telephone and portable telephone system using singing voice synthesis
JP2008170592A (en) * 2007-01-10 2008-07-24 Yamaha Corp Device and program for synthesizing singing voice
JP2014098800A (en) * 2012-11-14 2014-05-29 Yamaha Corp Voice synthesizing apparatus

Also Published As

Publication number Publication date
JP6787491B2 (en) 2020-11-18
JPWO2019003349A1 (en) 2020-01-16
CN110720122B (en) 2023-06-27
CN110720122A (en) 2020-01-21

Similar Documents

Publication Publication Date Title
JP6728754B2 (en) Pronunciation device, pronunciation method and pronunciation program
JP6465136B2 (en) Electronic musical instrument, method, and program
JP6705272B2 (en) Sound control device, sound control method, and program
JP7259817B2 (en) Electronic musical instrument, method and program
US20220076658A1 (en) Electronic musical instrument, method, and storage medium
JP4929604B2 (en) Song data input program
JP6809608B2 (en) Singing sound generator and method, program
JP2016142967A (en) Accompaniment training apparatus and accompaniment training program
JP6787491B2 (en) Sound generator and method
JP6977741B2 (en) Information processing equipment, information processing methods, performance data display systems, and programs
JP2001042879A (en) Karaoke device
JP6732216B2 (en) Lyrics display device, lyrics display method in lyrics display device, and electronic musical instrument
WO2018198380A1 (en) Song lyric display device and method
WO2019003348A1 (en) Singing sound effect generation device, method and program
WO2023120121A1 (en) Consonant length changing device, electronic musical instrument, musical instrument system, method, and program
WO2023153033A1 (en) Information processing method, program, and information processing device
JP7456430B2 (en) Information processing device, electronic musical instrument system, electronic musical instrument, syllable progression control method and program
WO2016152708A1 (en) Sound control device, sound control method, and sound control program
JP2018151548A (en) Pronunciation device and loop section setting method
JP2022038903A (en) Electronic musical instrument, control method for electronic musical instrument, and program
JP2017003625A (en) Singing voice output control device
CN117877459A (en) Recording medium, sound processing method, and sound processing system
WO2019026233A1 (en) Effect control device
JP2017161721A (en) Lyrics generator and lyrics generating method
JP2000200084A (en) Device and method for extracting musical phoneme, and record medium

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 17916367

Country of ref document: EP

Kind code of ref document: A1

ENP Entry into the national phase

Ref document number: 2019526038

Country of ref document: JP

Kind code of ref document: A

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 17916367

Country of ref document: EP

Kind code of ref document: A1