WO2022190502A1 - Sound generation device and control method therefor, program, and electronic musical instrument - Google Patents

Sound generation device and control method therefor, program, and electronic musical instrument

Info

Publication number
WO2022190502A1
Authority
WO
WIPO (PCT)
Prior art keywords
utterance
instruction
character
time
audio signal
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Ceased
Application number
PCT/JP2021/046585
Other languages
English (en)
French (fr)
Japanese (ja)
Inventor
達也 入山
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Yamaha Corp
Original Assignee
Yamaha Corp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Yamaha Corp
Priority to JP2023505112A (JP7568055B2, ja)
Priority to CN202180095312.3A (CN117043853A, zh)
Publication of WO2022190502A1 (ja)
Priority to US18/463,470 (US20230419946A1, en)
Legal status: Ceased

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L13/00 Speech synthesis; Text to speech systems
    • G10L13/02 Methods for producing synthetic speech; Speech synthesisers
    • G10L13/033 Voice editing, e.g. manipulating the voice of the synthesiser
    • G10L13/0335 Pitch control
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10H ELECTROPHONIC MUSICAL INSTRUMENTS; INSTRUMENTS IN WHICH THE TONES ARE GENERATED BY ELECTROMECHANICAL MEANS OR ELECTRONIC GENERATORS, OR IN WHICH THE TONES ARE SYNTHESISED FROM A DATA STORE
    • G10H1/00 Details of electrophonic musical instruments
    • G10H1/0008 Associated control or indicating means
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10H ELECTROPHONIC MUSICAL INSTRUMENTS; INSTRUMENTS IN WHICH THE TONES ARE GENERATED BY ELECTROMECHANICAL MEANS OR ELECTRONIC GENERATORS, OR IN WHICH THE TONES ARE SYNTHESISED FROM A DATA STORE
    • G10H1/00 Details of electrophonic musical instruments
    • G10H1/36 Accompaniment arrangements
    • G10H1/361 Recording/reproducing of accompaniment for use with an external source, e.g. karaoke systems
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10K SOUND-PRODUCING DEVICES; METHODS OR DEVICES FOR PROTECTING AGAINST, OR FOR DAMPING, NOISE OR OTHER ACOUSTIC WAVES IN GENERAL; ACOUSTICS NOT OTHERWISE PROVIDED FOR
    • G10K15/00 Acoustics not otherwise provided for
    • G10K15/04 Sound-producing devices
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10H ELECTROPHONIC MUSICAL INSTRUMENTS; INSTRUMENTS IN WHICH THE TONES ARE GENERATED BY ELECTROMECHANICAL MEANS OR ELECTRONIC GENERATORS, OR IN WHICH THE TONES ARE SYNTHESISED FROM A DATA STORE
    • G10H2210/00 Aspects or methods of musical processing having intrinsic musical character, i.e. involving musical theory or musical parameters or relying on musical knowledge, as applied in electrophonic musical tools or instruments
    • G10H2210/005 Musical accompaniment, i.e. complete instrumental rhythm synthesis added to a performed melody, e.g. as output by drum machines
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10H ELECTROPHONIC MUSICAL INSTRUMENTS; INSTRUMENTS IN WHICH THE TONES ARE GENERATED BY ELECTROMECHANICAL MEANS OR ELECTRONIC GENERATORS, OR IN WHICH THE TONES ARE SYNTHESISED FROM A DATA STORE
    • G10H2210/00 Aspects or methods of musical processing having intrinsic musical character, i.e. involving musical theory or musical parameters or relying on musical knowledge, as applied in electrophonic musical tools or instruments
    • G10H2210/031 Musical analysis, i.e. isolation, extraction or identification of musical elements or musical parameters from a raw acoustic signal or from an encoded audio signal
    • G10H2210/051 Musical analysis, i.e. isolation, extraction or identification of musical elements or musical parameters from a raw acoustic signal or from an encoded audio signal for extraction or detection of onsets of musical sounds or notes, i.e. note attack timings
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10H ELECTROPHONIC MUSICAL INSTRUMENTS; INSTRUMENTS IN WHICH THE TONES ARE GENERATED BY ELECTROMECHANICAL MEANS OR ELECTRONIC GENERATORS, OR IN WHICH THE TONES ARE SYNTHESISED FROM A DATA STORE
    • G10H2210/00 Aspects or methods of musical processing having intrinsic musical character, i.e. involving musical theory or musical parameters or relying on musical knowledge, as applied in electrophonic musical tools or instruments
    • G10H2210/325 Musical pitch modification
    • G10H2210/331 Note pitch correction, i.e. modifying a note pitch or replacing it by the closest one in a given scale
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10H ELECTROPHONIC MUSICAL INSTRUMENTS; INSTRUMENTS IN WHICH THE TONES ARE GENERATED BY ELECTROMECHANICAL MEANS OR ELECTRONIC GENERATORS, OR IN WHICH THE TONES ARE SYNTHESISED FROM A DATA STORE
    • G10H2220/00 Input/output interfacing specifically adapted for electrophonic musical tools or instruments
    • G10H2220/155 User input interfaces for electrophonic musical instruments
    • G10H2220/361 Mouth control in general, i.e. breath, mouth, teeth, tongue or lip-controlled input devices or sensors detecting, e.g. lip position, lip vibration, air pressure, air velocity, air flow or air jet angle
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10H ELECTROPHONIC MUSICAL INSTRUMENTS; INSTRUMENTS IN WHICH THE TONES ARE GENERATED BY ELECTROMECHANICAL MEANS OR ELECTRONIC GENERATORS, OR IN WHICH THE TONES ARE SYNTHESISED FROM A DATA STORE
    • G10H2250/00 Aspects of algorithms or signal processing methods without intrinsic musical character, yet specifically adapted for or used in electrophonic musical processing
    • G10H2250/315 Sound category-dependent sound synthesis processes [Gensound] for musical use; Sound category-specific synthesis-controlling parameters or control means therefor
    • G10H2250/455 Gensound singing voices, i.e. generation of human voices for musical applications, vocal singing sounds or intelligible words at a desired pitch or with desired vocal effects, e.g. by phoneme synthesis

Definitions

  • the present invention relates to a sound generation device, its control method, program, and electronic musical instrument.
  • singing sounds are synthesized and generated.
  • Such singing sounds (hereinafter referred to as synthesized singing sounds, to distinguish them from actual singing) are generated, for example, by combining speech segments corresponding to characters such as lyrics while synthesizing a waveform at a specified pitch. The result is a synthesized sound that sounds as if the characters were pronounced.
  • a technique has been used in which a musical score (sequence data, etc.) prepared in advance and characters are combined to generate a synthesized singing voice. Technologies for generating synthesized singing sounds in real time have also been developed.
  • one of the objects of the present invention is to generate natural synthesized singing sounds when vocalizing singing sounds in real-time performance.
  • According to one embodiment, a sound generation device includes: a first acquisition unit that acquires first lyric data in which a plurality of characters to be pronounced are arranged in time series and which includes at least a first character and a second character after the first character; a second acquisition unit that acquires a vocalization start instruction; and a control unit that, when the vocalization start instruction is acquired by the second acquisition unit and satisfies a first condition, generates an audio signal based on a first utterance corresponding to the first character.
  • natural synthesized singing sounds can be generated when vocalizing singing sounds in real-time performance.
  • FIG. 1 is a block diagram showing the configuration of a karaoke system according to one embodiment of the present invention.
  • FIG. 2 is a block diagram showing the configuration of an electronic musical instrument according to one embodiment of the present invention.
  • FIG. 3 is a diagram explaining the first lyric data in one embodiment of the present invention.
  • FIG. 4 is a flowchart illustrating sound generation processing in one embodiment of the present invention.
  • FIG. 5 is a flowchart for explaining instruction processing.
  • FIG. 6 is a diagram showing the relationship between time and pitch in sound generation processing.
  • FIG. 7 is a diagram showing the relationship between time and pitch in sound generation processing.
  • FIG. 8 is a diagram showing the relationship between time and pitch in sound generation processing.
  • FIG. 9 is a functional block diagram showing a sound generation function in one embodiment of the present invention.
  • FIG. 10 is a flowchart for explaining instruction processing.
  • FIG. 11 is a diagram showing the relationship between time and pitch in sound generation processing.
  • A further figure is a diagram explaining the first lyric data in one embodiment of the present invention.
  • A further figure is a diagram showing the relationship between time and pitch in sound generation processing.
  • A further figure is a diagram explaining the second lyric data in one embodiment of the present invention.
  • A further figure is a diagram showing the relationship between time and pitch in sound generation processing.
  • A further figure is a block diagram showing the configuration of an electronic wind instrument according to an embodiment of the present invention.
  • a karaoke system according to an embodiment of the present invention is a karaoke system using an electronic musical instrument capable of generating synthesized singing sounds. In addition, it has the function of generating natural synthesized singing sounds.
  • FIG. 1 is a block diagram showing the configuration of a karaoke system according to one embodiment of the present invention.
  • the karaoke system 100 includes a karaoke device 1 , a control terminal 2 , an electronic musical instrument 3 (sound generation device), a karaoke server 1000 and a singing sound synthesis server 2000 .
  • the karaoke device 1, the karaoke server 1000, and the singing sound synthesis server 2000 are connected via a network NW such as the Internet.
  • the karaoke device 1 is connected to each of the control terminal 2 and the electronic musical instrument 3 by short-range wireless communication, but may be connected by communication via the network NW.
  • Short-range wireless communication is communication using, for example, Bluetooth (registered trademark), infrared communication, LAN (Local Area Network), and the like.
  • the karaoke server 1000 includes a storage device that stores song data necessary for providing karaoke in the karaoke device 1 in association with song IDs.
  • the music data includes data related to karaoke songs, such as lead vocal data, chorus data, accompaniment data, karaoke caption data, and the like.
  • Lead vocal data is data which shows the main melody part of singing music.
  • the chorus data is data indicating a side melody part such as harmonies for the main melody.
  • the accompaniment data is data indicating the accompaniment sound of the song.
  • the lead vocal data, chorus data, and accompaniment data may be data expressed in MIDI format.
  • the karaoke subtitle data is data for displaying lyrics on the display of the karaoke device 1 .
  • the singing sound synthesis server 2000 includes a storage device that stores setting data for setting the electronic musical instrument 3 in accordance with the song ID in association with the song ID.
  • the setting data includes lyric data corresponding to each part of the singing song corresponding to the song ID.
  • the lyric data corresponding to the lead vocal part is called first lyric data.
  • The first lyric data stored in the singing sound synthesis server 2000 may be the same as or different from the karaoke subtitle data stored in the karaoke server 1000. That is, the first lyric data stored in the singing sound synthesis server 2000 is the same as the subtitle data in that it defines the lyrics (characters) to be uttered, but it is adjusted to a format that is easy to use in the electronic musical instrument 3.
  • For example, the karaoke subtitle data stored in the karaoke server 1000 is a character string such as "ko", "n", "ni", "chi", and "ha".
  • In contrast, the first lyric data stored in the singing sound synthesis server 2000 may be a character string matching the actual pronunciation, such as "ko", "n", "ni", "chi", and "wa", so that the electronic musical instrument 3 can easily use it.
  • this format may include, for example, information for identifying the case where two characters are sung with one sound, information for identifying breaks in phrases, and the like.
  • the karaoke device 1 includes an input terminal to which an audio signal is supplied, and a speaker that outputs the audio signal as sound.
  • An audio signal input to the input terminal may be supplied from the electronic musical instrument 3 or may be supplied from a microphone.
  • the karaoke device 1 reproduces an audio signal from the accompaniment data of the music data received from the karaoke server 1000, and outputs the audio signal from the speaker as the accompaniment sound of the song.
  • a sound corresponding to the audio signal supplied to the input terminal may be synthesized with the accompaniment sound and output.
  • the control terminal 2 is a remote controller that transmits user instructions to the karaoke device 1 (for example, song designation, volume, transpose, etc.).
  • the control terminal 2 may transmit a user's instruction to the electronic musical instrument 3 (for example, setting lyrics, setting tone, etc.) via the karaoke apparatus 1 .
  • the control terminal 2 transmits to the karaoke device 1 an instruction to set the music set by the user.
  • the karaoke device 1 acquires the song data of the song from the karaoke server 1000 and the first lyric data from the singing sound synthesis server 2000 based on the instruction.
  • the karaoke device 1 transmits first lyric data to the electronic musical instrument 3 .
  • the electronic musical instrument 3 stores first lyric data.
  • the karaoke apparatus 1 reads the music data and outputs an accompaniment sound or the like according to the user's instruction to start playing the music, and the electronic musical instrument 3 reads the first lyric data and produces a synthesized singing sound according to the performance operation by the user.
  • the electronic musical instrument 3 is a device that generates an audio signal representing a synthesized singing voice in accordance with the contents of instructions in response to the operation of the performance operation section 321 (FIG. 2).
  • the electronic musical instrument 3 is an electronic keyboard device.
  • the performance operation section 321 includes a keyboard including a plurality of keys and a sensor that detects an operation on each key (hereinafter sometimes referred to as a performance operation).
  • The synthesized singing sound may be output from the speaker of the karaoke apparatus 1 by supplying an audio signal from the electronic musical instrument 3 to the input terminal of the karaoke apparatus 1, or may be output from a speaker connected to the electronic musical instrument 3.
  • FIG. 2 is a block diagram showing the configuration of the electronic musical instrument 3 according to one embodiment of the present invention.
  • the electronic musical instrument 3 includes a control section 301 , a storage section 303 , an operation section 305 , a display section 307 , a communication section 309 , an interface 317 and a performance operation section 321 . Each of these configurations is connected via a bus.
  • the control unit 301 includes an arithmetic processing circuit such as a CPU.
  • the control unit 301 causes the CPU to execute a program stored in the storage unit 303 to realize various functions in the electronic musical instrument 3 .
  • Functions implemented in the electronic musical instrument 3 include, for example, a sound generation function for executing sound generation processing.
  • the control unit 301 includes a DSP (Digital Signal Processor) for generating an audio signal using a sound generation function.
  • the storage unit 303 is a storage device such as a nonvolatile memory.
  • the storage unit 303 stores a program for realizing the sound generation function. The sound generation function will be described later.
  • the storage unit 303 also stores setting information used when generating an audio signal representing a synthesized singing voice, speech segments for generating the synthesized singing voice, and the like.
  • The setting information is, for example, the tone color and the first lyric data received from the singing sound synthesis server 2000.
  • the operation unit 305 is a device such as a switch and a volume knob, and outputs a signal to the control unit 301 according to the input operation.
  • a display unit 307 is a display device such as a liquid crystal display or an organic EL display, and displays a screen based on control by the control unit 301 . Note that the operation unit 305 and the display unit 307 may be integrated to form a touch panel.
  • the communication unit 309 connects with the control terminal 2 through short-range wireless communication under the control of the control unit 301 .
  • the performance operation section 321 outputs a performance signal corresponding to the performance operation to the control section 301 .
  • The performance signal includes information indicating the position of the operated key (note number), information indicating key depression (note-on), information indicating key release (note-off), the key depression speed (velocity), and the like. Specifically, when a key is pressed, a note-on associated with the velocity and the note number (also referred to as a pitch instruction) is output as a performance signal indicating an instruction to start vocalization. When the key is released, a note-off associated with the note number is output as a performance signal indicating an instruction to stop vocalization.
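  • As a minimal sketch of the performance signal described above, the fields can be modeled as a small record. The class name `PerformanceSignal` and the helpers `key_pressed` and `key_released` are hypothetical names chosen for illustration; the source does not specify any data layout.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class PerformanceSignal:
    """Hypothetical container for one performance signal."""
    note_number: int                # position of the operated key (pitch instruction)
    note_on: bool                   # True: key depression, False: key release
    velocity: Optional[int] = None  # key depression speed, only set for note-on

    @property
    def is_start_instruction(self) -> bool:
        # A note-on is treated as an instruction to start vocalization.
        return self.note_on

    @property
    def is_stop_instruction(self) -> bool:
        # A note-off is treated as an instruction to stop vocalization.
        return not self.note_on


def key_pressed(note_number: int, velocity: int) -> PerformanceSignal:
    """Note-on associated with velocity and note number."""
    return PerformanceSignal(note_number=note_number, note_on=True, velocity=velocity)


def key_released(note_number: int) -> PerformanceSignal:
    """Note-off associated with the note number only."""
    return PerformanceSignal(note_number=note_number, note_on=False)


# Example values are arbitrary and not taken from the source.
pressed = key_pressed(67, 100)
released = key_released(67)
```

  • Keeping note-on and note-off in one record type mirrors the way the text treats both as performance signals that differ only in which instruction they carry.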
  • the control section 301 uses this performance signal to generate an audio signal.
  • Interface 317 includes a terminal for outputting the generated audio signal.
  • the first lyric data is data that defines lyrics (characters) to be uttered.
  • the first lyric data has text data in which a plurality of characters to be pronounced are arranged in chronological order.
  • the first lyric data includes timing data defining start and stop times of vocalization for each character on a predetermined time axis. The start time and stop time are defined, for example, as the time relative to the beginning of the song. This timing data associates the progression position of the song with lyrics to be uttered at the progression position.
  • each of the lyrics (characters) to be pronounced that is, one unit of speech (a group of sound breaks) is sometimes expressed as a "syllable".
  • "characters" in lyrics data are used synonymously with “syllables.”
  • For example, the first lyric data includes text data indicating "ko", "n", "ni", "chi", "wa", "sa", "yo", "o", "na", and "ra".
  • Each of the characters "ko", "n", "ni", "chi", "wa", "sa", "yo", "o", "na", and "ra" is associated with M(i), where "i" (i = 1 to n) sets the order of the character in the lyrics.
  • For example, M(5) corresponds to the fifth character in the lyrics.
  • The first lyric data includes timing data in which an utterance start time ts(i) and an utterance stop time te(i) are set for each character M(i).
  • For example, for the character M(1), the utterance start time is time ts(1) and the utterance stop time is time te(1).
  • Similarly, for the character M(n), the utterance start time is time ts(n) and the utterance stop time is time te(n).
  • a period from time ts(i) to time te(i) corresponding to each character M(i) is referred to as a set period for uttering the character M(i).
  • the set period of vocalization indicates, for example, the period of ideal singing. As will be described below, the vocalization period of each character included in the synthesized singing sound is controlled based on the vocalization start instruction and the vocalization stop instruction by the performance signal.
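  • The first lyric data described above (characters M(i) with a start time ts(i) and a stop time te(i)) could be represented as follows. This is only an illustrative sketch: the names `LyricCharacter` and `build_first_lyric_data` are hypothetical, and the timing values are placeholders, since the source gives no concrete times.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass(frozen=True)
class LyricCharacter:
    index: int   # i in M(i), 1-based order within the lyrics
    text: str    # character (syllable) to be uttered
    ts: float    # utterance start time, relative to the beginning of the song
    te: float    # utterance stop time, relative to the beginning of the song


def build_first_lyric_data(
    chars: Sequence[str], timings: Sequence[Tuple[float, float]]
) -> List[LyricCharacter]:
    """Pair each character with its set period [ts(i), te(i)]."""
    if len(chars) != len(timings):
        raise ValueError("each character needs exactly one (ts, te) pair")
    data = []
    for i, (text, (ts, te)) in enumerate(zip(chars, timings), start=1):
        if not ts < te:
            raise ValueError(f"M({i}): start time must precede stop time")
        data.append(LyricCharacter(index=i, text=text, ts=ts, te=te))
    return data


CHARS = ["ko", "n", "ni", "chi", "wa", "sa", "yo", "o", "na", "ra"]
# Placeholder timing: one character per second, each lasting 0.8 s.
TIMINGS = [(i * 1.0, i * 1.0 + 0.8) for i in range(len(CHARS))]
lyrics = build_first_lyric_data(CHARS, TIMINGS)
```

  • With this layout, `lyrics[4]` is M(5), the fifth character of the lyrics, and the gap between `te` of one entry and `ts` of the next is the interval in which no set period applies.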
  • the sound generation process outputs an instruction to generate or stop an audio signal corresponding to the utterance of each character based on the performance operation to the performance operation unit 321 .
  • FIG. 4 is a flowchart describing sound generation processing in one embodiment of the present invention. This processing is realized by the CPU of the control unit 301 developing the program stored in the storage unit 303 in the RAM of the storage unit 303 or the like and executing the program. This processing is started, for example, when the user instructs reproduction of music.
  • In the standby state, the control unit 301 repeats the processing of steps S403 and S404 until the reading of the accompaniment data is completed (step S405), the user's instruction to stop playing the music is input (step S406), or a performance signal is received (step S407).
  • When the accompaniment data has been read to the end in the standby state (step S405; Yes), the control unit 301 ends the sound generation processing.
  • When the user inputs an instruction to stop playing music in the standby state (step S406; Yes), the control unit 301 ends the sound generation process.
  • When a performance signal is received from the performance operation unit 321 in the standby state (step S407; Yes), the control unit 301 executes instruction processing for generating an audio signal by the DSP (step S500). A detailed description of the instruction process for generating the audio signal will be given later.
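  • The standby loop above can be sketched roughly as follows. This extract does not describe steps S403 and S404, so the sketch assumes they advance a count value tc and read the next portion of accompaniment data; the function name and the callback parameters are hypothetical.

```python
from typing import Callable, Iterable, Optional


def sound_generation_loop(
    accompaniment: Iterable[str],
    stop_requested: Callable[[int], bool],
    poll_signal: Callable[[int], Optional[str]],
    handle_instruction: Callable[[int, str], None],
) -> str:
    """Run the standby loop until one of the end conditions is met."""
    tc = 0
    for _chunk in accompaniment:      # assumed S404: read accompaniment data
        tc += 1                       # assumed S403: advance the count value tc
        if stop_requested(tc):        # S406: user asked to stop playback
            return "stopped"
        signal = poll_signal(tc)      # S407: was a performance signal received?
        if signal is not None:
            handle_instruction(tc, signal)   # S500: instruction processing
    return "finished"                 # S405: accompaniment read to the end


# Usage with stub callbacks: signals arrive at tc = 2 and tc = 4.
handled = []
signals = {2: "note-on", 4: "note-off"}
result = sound_generation_loop(
    accompaniment=["a", "b", "c", "d", "e"],
    stop_requested=lambda tc: False,
    poll_signal=signals.get,
    handle_instruction=lambda tc, sig: handled.append((tc, sig)),
)
```

  • Passing the checks in as callbacks keeps the sketch independent of any real input device or audio back end.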
  • FIG. 5 is a flowchart showing the instruction process executed in step S500 of FIG. 4.
  • The control unit 301 sets the pitch based on the performance signal acquired from the performance operation unit 321 (step S501).
  • The control unit 301 determines whether or not the performance signal acquired from the performance operation unit 321 is an instruction to start vocalization (step S502).
  • When the control unit 301 determines that the performance signal is an instruction to start vocalization (step S502; Yes), it refers to the first lyric data and determines whether or not the count value tc at the time the instruction was acquired is within the set vocalization period corresponding to any character (step S503).
  • When the control unit 301 determines that the time at which the vocalization start instruction was acquired is within the set vocalization period corresponding to one of the characters M(i) (step S503; Yes), it sets the character M(p) belonging to that set period as the character to be pronounced (step S504).
  • Next, the control unit 301 outputs to the DSP an instruction to generate an audio signal based on the set pitch and the utterance of the character M(p) (step S509), terminates the instruction processing, and proceeds to step S403 shown in FIG. 4.
  • When the control unit 301 determines that the time at which the vocalization start instruction was acquired is not within the set vocalization period of any character (step S503; No), it calculates the central time tm(q) between the set periods immediately before and after that time (step S505). Taking the stop time te(q) as the "first time" and the start time ts(q+1) as the "second time", the central time between the stop time te(q) and the start time ts(q+1) is the "third time".
  • If the count value tc is before the central time tm(q) (step S506; Yes), the control unit 301 sets the character M(q), which corresponds to the set period before the central time tm(q), as the character to be pronounced (step S507). Next, the control unit 301 outputs to the DSP an instruction to generate an audio signal based on the set pitch and the utterance of the character M(q) (step S509), terminates the instruction processing, and proceeds to step S403 shown in FIG. 4.
  • If the count value tc is not before the central time tm(q) (step S506; No), the control unit 301 sets the character M(q+1), which corresponds to the set period after the central time tm(q), as the character to be pronounced (step S508). Next, the control unit 301 outputs to the DSP an instruction to generate an audio signal based on the set pitch and the utterance of the character M(q+1) (step S509), terminates the instruction processing, and proceeds to step S403 shown in FIG. 4.
  • If it is determined that the performance signal acquired from the performance operation unit 321 is not an instruction to start vocalization, that is, it is an instruction to stop vocalization (step S502; No), the control unit 301 outputs to the DSP an instruction to stop the generation of the audio signal that was generated based on the set pitch and the utterance of the character (step S510), terminates the instruction processing, and proceeds to step S403 shown in FIG. 4.
  • In other words, the control unit 301 determines whether or not the utterance start instruction satisfies the first condition. If the first condition is satisfied, the control unit 301 generates an audio signal based on the first utterance corresponding to the first character. If the first condition is not satisfied, the control unit 301 generates an audio signal based on the second utterance corresponding to the second character, which follows the first character.
  • The first condition is that the time at which the utterance start instruction is acquired is before the center time between the stop time of the first character and the start time of the second character.
  • Put differently, the control unit 301 specifies the set period to which the acquisition time of the utterance start instruction belongs, or the set period closest to the acquisition time, and generates an audio signal based on the utterance of the character corresponding to the specified set period.
  • As the accompaniment sound progresses through reproduction of the accompaniment data, the characters in the lyrics of the song that are specified in this way are uttered one after another at the pitch and timing corresponding to the performance operation, producing a synthesized singing sound. Then, an audio signal representing the synthesized singing voice is output to the karaoke device 1.
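  • The character selection in steps S503 to S508 can be sketched as below. This is an illustrative reading, not the applicant's implementation: the function name `select_character` is hypothetical, the central time is taken to be the arithmetic midpoint (te(q) + ts(q+1)) / 2, period boundaries are treated as inclusive, and the behavior before the first period or after the last one (which the text does not cover) is an assumption.

```python
from typing import List, Tuple

Period = Tuple[float, float]  # (ts(i), te(i)) for character M(i)


def select_character(periods: List[Period], tc: float) -> int:
    """Return the 1-based index i of the character M(i) to utter at time tc."""
    if not periods:
        raise ValueError("lyric data must contain at least one character")

    # S503/S504: tc falls inside a set period, so use that character M(p).
    for i, (ts, te) in enumerate(periods, start=1):
        if ts <= tc <= te:
            return i

    # S505 to S508: tc falls in the gap between M(q) and M(q+1).
    for q in range(1, len(periods)):
        te_q = periods[q - 1][1]     # "first time": stop time te(q)
        ts_next = periods[q][0]      # "second time": start time ts(q+1)
        if te_q < tc < ts_next:
            tm = (te_q + ts_next) / 2.0   # "third time": central time tm(q)
            return q if tc < tm else q + 1

    # Assumption: outside all periods, fall back to the first or last character.
    return 1 if tc < periods[0][0] else len(periods)


# Placeholder set periods for three characters; times are illustrative only.
example_periods = [(0.0, 1.0), (2.0, 3.0), (4.0, 5.0)]
early_in_gap = select_character(example_periods, 1.2)   # closer to M(1)
late_in_gap = select_character(example_periods, 1.8)    # closer to M(2)
```

  • Comparing tc with the midpoint is equivalent to picking the set period closest to the acquisition time, which matches the summary in the text.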
  • FIGS. 6-8 are diagrams showing the relationship between time and pitch in sound generation processing.
  • the control unit 301 receives from the performance operation unit 321 a performance signal including a vocalization start instruction associated with the pitch “G4” in the standby state of the sound generation process.
  • the control unit 301 executes instruction processing (step S500), and sets the pitch "G4" based on the performance signal (step S501).
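  • The control unit 301 determines that the performance signal is an instruction to start vocalization (step S502; Yes), refers to the first lyric data shown in FIG. 3, and determines whether or not the time at which the start instruction was acquired is included in (belongs to) a set vocalization period (step S503).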
  • the control unit 301 determines that the time at which the start instruction was acquired is within the utterance set period corresponding to character M(1). (step S503; Yes), and the character "ko" corresponding to the character M(1) is set as a character to be pronounced (step S504).
  • the control unit 301 outputs to the DSP an instruction to generate an audio signal based on the vocalization of the set pitch "G4" and the character "ko” (step S509).
  • time ton(1) indicates the time when an instruction to generate an audio signal based on the set pitch "G4" and the character "ko" is output to the DSP.
  • the DSP of the control unit 301 starts generating an audio signal based on the instruction.
  • Next, assume that the control unit 301 receives from the performance operation unit 321 a performance signal including a vocalization stop instruction associated with the pitch "G4". In this case, the control unit 301 executes instruction processing (step S500), and sets the pitch "G4" based on the performance signal (step S501).
  • The control unit 301 determines that the performance signal is an instruction to stop vocalization (step S502; No), and outputs to the DSP an instruction to stop generating the audio signal based on the set pitch "G4" and the utterance of the character "ko" (step S510).
  • The time at which the instruction to stop the generation of the audio signal based on the set pitch "G4" and the character "ko" is output is denoted as time toff(1).
  • the DSP of the control unit 301 stops generating the audio signal based on the instruction.
  • the vocalization period ton(1) to toff(1) is the period during which an audio signal is generated based on the vocalization of the pitch "G4" and the character "ko".
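  • To illustrate how the actual vocalization period ton(1) to toff(1) follows the generate and stop instructions rather than the set period, here is a small stand-in for the DSP. The class `SignalGeneratorStub` and its methods are hypothetical and produce no audio; they only record when each utterance started and stopped.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class SignalGeneratorStub:
    """Records vocalization periods instead of synthesizing sound."""
    current: Optional[Tuple[str, str, float]] = None   # (pitch, character, ton)
    periods: List[Tuple[str, str, float, float]] = field(default_factory=list)

    def generate(self, pitch: str, char: str, now: float) -> None:
        # Corresponds to the instruction output in step S509 at time ton.
        self.current = (pitch, char, now)

    def stop(self, now: float) -> None:
        # Corresponds to the instruction output in step S510 at time toff.
        if self.current is None:
            return  # nothing is sounding, so there is nothing to stop
        pitch, char, ton = self.current
        self.periods.append((pitch, char, ton, now))
        self.current = None


# Key pressed at 0.3 and released at 0.9 (placeholder times).
dsp = SignalGeneratorStub()
dsp.generate("G4", "ko", now=0.3)
dsp.stop(now=0.9)
```

  • After these two calls, `dsp.periods` holds a single entry for pitch "G4" and character "ko" spanning the press and release times, independent of ts(1) and te(1).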
  • the count value tc at which the vocalization start instruction is acquired is a period between the vocalization set period ts(1) to te(1) and the vocalization set period ts(2) to te(2), A case close to the set period ts(1) to te(1) will be described with reference to FIG. It is assumed that the control unit 301 receives from the performance operation unit 321 a performance signal including a vocalization start instruction associated with the pitch “G4” in the standby state of the sound generation process. In this case, the control unit 301 executes instruction processing (step S500), and sets the pitch "G4" based on the performance signal (step S501).
  • The control unit 301 determines that the performance signal is an instruction to start vocalization (step S502; Yes), refers to the first lyric data shown in FIG. 3, and determines whether or not the time at which the start instruction was acquired is included in a set vocalization period (step S503). Since the time at which the start instruction was acquired is not included in any of the set vocalization periods corresponding to the characters M(i), the control unit 301 determines that the start instruction is not included in a set vocalization period (step S503; No). Next, the control unit 301 calculates the central time tm(i) from the set periods immediately before and after the count value tc.
  • When the count value tc lies between the set vocalization period ts(1) to te(1) and the set vocalization period ts(2) to te(2), the control unit 301 calculates the central time tm(1) between the stop time te(1) and the start time ts(2) (step S505).
  • The control unit 301 determines that the count value tc at which the start instruction was acquired is before the central time tm(1) (step S506; Yes), and sets the character "ko" (character M(1)) of the set period before the central time tm(1) as the character to be pronounced (step S507).
  • the vocalization period ton(1) to toff(1) is the period during which an audio signal is generated based on the vocalization of the pitch "G4" and the character "ko".
  • Conversely, when the control unit 301 determines that the time at which the start instruction was acquired is not earlier than the central time tm(1) (step S506; No), the character "n" (character M(2)) of the set period after the central time tm(1) is set as the character to be pronounced (step S508).
  • FIG. 9 is a functional block diagram showing the sound generation function in one embodiment of the invention. Note that part or all of the configuration that implements each function described below may be implemented by hardware.
  • The electronic musical instrument 3 includes a lyric data acquisition unit 31 (first acquisition unit), an utterance control unit 32 (control unit), a signal generation unit 33, and an utterance start instruction acquisition unit 34 (second acquisition unit) as functional blocks for realizing a sound generation function for generating synthesized singing sounds. Functions of these functional units are realized by cooperation of the control unit 301, the storage unit 303, a timer (not shown), and the like. Note that it is not essential in the present invention for the functional blocks to include the signal generation unit 33.
  • the lyric data acquisition unit 31 acquires the first lyric data corresponding to the song ID from the singing sound synthesis server 2000 via the karaoke device 1 .
  • the utterance control unit 32 mainly executes the instruction processing shown in FIG.
  • the utterance start instruction acquisition unit 34 acquires an utterance start instruction.
  • the vocalization start instruction is acquired as a performance signal input from the user via the performance operation unit 321, for example.
  • the signal generation unit 33 corresponds to the DSP described above, and based on the instruction received from the utterance control unit 32, starts generating the audio signal or stops generating the audio signal.
  • the audio signal generated by the signal generator 33 is output to the outside via the interface 317 .
  • Sound generation processing that is partially different from the sound generation processing described in the first embodiment will be described with reference to FIGS. 4, 10, and 11.
  • This embodiment differs from the first embodiment in instruction processing for generating an audio signal. Therefore, portions different from the first embodiment will be described in detail, and the description of the first embodiment will be used for other portions. Also, in this embodiment, the velocity is treated as volume information.
  • the control unit 301 acquires the first lyric data from the storage unit 303 (step S401).
  • the control unit 301 executes initialization processing (step S402).
  • the "i" indicates the order of letters in the lyrics, as described above.
  • ts refers to the time when the immediately preceding utterance start instruction was acquired.
  • step S403 if a performance signal is received from the performance operation unit 321 (step S407; Yes), instruction processing for generating an audio signal is executed (step S500).
  • FIG. 10 is a flow chart explaining instruction processing for generating an audio signal. This process is executed in step S500 of FIG.
  • control section 301 sets the pitch based on the performance signal acquired from the performance operation section 321 (step S521).
  • the control unit 301 determines whether or not the performance signal acquired from the performance operation unit 321 is an instruction to start vocalization (step S522).
  • tc-ts is the elapsed time from the last acquisition of the vocalization start instruction to the present.
  • The control unit 301 sets the character “ko” as the character to be uttered, and when tc-ts < t th is satisfied, sets the same character as the character set in the previous utterance as the character to be uttered.
  • control unit 301 sets count value tc to time ts (step S527), terminates the instruction process, and proceeds to step S403 shown in FIG.
  • FIG. 11 is a diagram showing the relationship between time and pitch in sound generation processing.
  • In FIG. 11, the utterances of pitch "G4" and the character "ko", pitch "A5" and the character "n", and pitch "B5" and the long-vowel mark "-" are illustrated as syllable notes with pitch information.
  • the control unit 301 acquires the first lyric data (step S401) and executes the initialization process (step S402).
  • time ts at which an instruction to generate an audio signal based on the set pitch "G4" and the character “ko" is output to the DSP is denoted as time ton(1).
  • the DSP of the control unit 301 starts generating an audio signal based on the instruction.
  • control section 301 receives a performance signal associated with the pitch "G4" from the performance operation section 321 in the standby process in the audio processing.
  • the control unit 301 executes instruction processing (step S500), and sets the pitch "G4" based on the performance signal (step S521).
  • When the control unit 301 determines that the performance signal is an instruction to stop vocalization (step S522; No), it outputs an instruction to stop generating the audio signal based on the vocalization of the set pitch "G4" and the character "ko" (step S510), terminates the instruction process, and proceeds to step S403 shown in FIG.
  • Time toff(1) represents the time at which the instruction to stop generating the audio signal based on the set pitch "G4" and the character "ko" was output to the DSP.
  • the DSP of the control unit 301 stops generating the audio signal based on the instruction.
  • a period from ton(1) to toff(1) is a period during which an audio signal based on the utterance of the pitch "G4" and the character "ko" is generated.
  • control unit 301 receives a performance signal including a vocalization start instruction associated with the pitch “A5” from the performance operation unit 321 in the standby process in the audio processing.
  • the control unit 301 executes instruction processing (step S500), and sets the pitch "A5" based on the performance signal (step S521).
  • the predetermined period t th is, for example, in the range of 10 ms to 100 ms, and is assumed to be 100 ms in this embodiment.
  • The control unit 301 determines that the volume is equal to or higher than the predetermined volume (step S524).
  • The character M(2), which follows the character M(1), is set as the character to be uttered.
  • Since the character M(2) is "n", the control unit 301 outputs to the DSP an instruction to generate an audio signal based on the utterance of the pitch "A5" and the character "n" (step S526). The control unit 301 sets the count value tc as the time ts (step S527), ends the instruction process, and proceeds to step S403 shown in FIG. In FIG. 11, a period from ton(2) to toff(2) is a period during which an audio signal based on the utterance of the pitch "A5" and the character "n" is generated.
  • The control unit 301 executes instruction processing (step S500), and sets the pitch "B5" based on the performance signal (step S521).
  • Since tc-ts is shorter than the predetermined period t th, it is determined that tc-ts < t th is satisfied (step S523; Yes).
  • An instruction to generate an audio signal is output (step S526).
  • The control unit 301 outputs an instruction to generate an audio signal so as to continue the utterance of the immediately preceding character "n". Therefore, an audio signal is generated based on the utterance of the long vowel "-" at pitch "B5" in order to continue uttering the character "n".
  • The control unit 301 sets the count value tc as the time ts (step S527), ends the instruction process, and proceeds to step S403 shown in FIG. In FIG. 11, a period from ton(3) to toff(3) is a period during which an audio signal is generated based on the utterance of the pitch "B5" and the long vowel "-".
  • In the sound generation process, if the period from the immediately preceding utterance start instruction to the next utterance start instruction is shorter than the predetermined period, the position in the first lyric data can be prevented from advancing.
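The time-threshold behaviour described above (steps S523 and S527) can be sketched as follows. This is a simplified illustration under stated assumptions, not the actual control program: the class `LyricCursor` and its members are hypothetical names, times are integer milliseconds, the volume check of step S524 is omitted, and the sketch returns the held character itself rather than rendering it as the long vowel "-".

```python
from typing import List, Optional


class LyricCursor:
    """Tracks which lyric character to utter on each start instruction."""

    def __init__(self, chars: List[str], t_th_ms: int = 100):
        self.chars = chars              # characters M(1)..M(n), assumed non-empty
        self.t_th_ms = t_th_ms          # predetermined period t th (100 ms in the text)
        self.index = -1                 # index of the last uttered character
        self.ts: Optional[int] = None   # time of the previous start instruction

    def on_start(self, tc: int) -> str:
        """Handle a start instruction acquired at count value tc (ms)."""
        # Step S523: hold the previous character when tc - ts < t th.
        held = self.ts is not None and tc - self.ts < self.t_th_ms
        if not held and self.index < len(self.chars) - 1:
            self.index += 1             # advance to the next character
        self.ts = tc                    # step S527: remember this start time
        return self.chars[self.index]


cursor = LyricCursor(["ko", "n", "ni"])
sequence = [cursor.on_start(t) for t in (0, 500, 550, 1000)]
```

In this run the third instruction arrives 50 ms after the second, so the cursor stays on "n" and only the fourth instruction advances to "ni".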
  • the control unit 301 outputs an instruction to generate an audio signal so as to continue the first utterance corresponding to the instruction to start the first utterance.
  • syllable notes in the period from ton(3) to toff(3) are assigned a pitch of "B5" and a long note of "-".
  • the first lyric data stored in the storage unit 303 will be described with reference to FIG.
  • FIG. 12 shows the first lyric data used in one embodiment of the present invention.
  • The first lyric data shown in FIG. 12 includes a first phrase consisting of "ko", "n", "ni", "chi", and "wa", and a second phrase consisting of "sa", "yo", "o", "na", and "ra".
  • The start time of the first utterance corresponds to tfs(1), and its stop time corresponds to tfe(1).
  • The start time of the second utterance corresponds to tfs(2), and its stop time corresponds to tfe(2).
  • FIGS. 13 and 14 are diagrams showing the relationship between time and pitch in sound generation processing.
  • FIGS. 13 and 14 show utterance periods defined by phrases.
  • the utterance corresponding to the characters in the phrase may proceed at each key depression or according to the instruction processing shown in the second embodiment.
  • A central time tfm(1) between the stop time tfe(1) of the first phrase and the start time tfs(2) of the second phrase may be preset.
  • the control unit 301 determines whether or not the acquisition time of the utterance start instruction is earlier than the central time tfm(1).
  • If the control unit 301 determines that the utterance start instruction is before the central time tfm(1), it outputs to the DSP an instruction to generate an audio signal based on the utterance corresponding to the first character of the first phrase. After that, when the control unit 301 determines that the utterance start instruction is before the central time tfm(1), it may continue to output to the DSP an instruction to generate an audio signal based on the utterance corresponding to the characters starting from the first character of the second phrase.
  • If the control unit 301 determines that the vocalization start instruction is after the central time tfm(1), it further determines whether the vocalization start instruction is after the start time tfs(2) of the second phrase. If the control unit 301 determines that the vocalization start instruction is later than the start time tfs(2) of the second phrase, it outputs to the DSP an instruction to generate an audio signal based on the utterance corresponding to a character of the second phrase that has not yet been vocalized. Specifically, as shown in FIG.
  • If the control unit 301 determines that the vocalization start instruction is before the start time tfs(2) of the second phrase, it generates an audio signal based on the utterance corresponding to the first of the characters of the second phrase. Specifically, as shown in FIG. 14, assume that audio signals have been generated between the start time tfs(1) and the stop time tfe(1) of the first phrase based on the utterances corresponding to the characters "ko", "n", "ni", "chi", "wa", and "sa".
  • In this case, when an utterance start instruction is obtained before the start time tfs(2) of the second phrase (time tfon), an audio signal is generated based on the utterance corresponding to the first character "sa" of the second phrase.
  • the control unit 301 outputs an instruction to stop generating the audio signal to the DSP.
  • the first condition is that the time when the utterance start instruction is acquired is earlier than the center time between the stop time of the first phrase and the start time of the second phrase.
  • the second condition is that the time when the instruction to start vocalization is acquired is later than the second vocalization start time tfs(2). In other words, the second condition is satisfied when the acquisition time of the utterance start instruction is later than the second utterance start time defined in the first lyric data.
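The two conditions can be read as a three-way classification of the acquisition time of the utterance start instruction. The sketch below is one possible reading, not the patent's implementation: `Branch` and `classify_start` are hypothetical names, and the central time tfm(1) is assumed to be the midpoint of tfe(1) and tfs(2), although the text also allows it to be preset.

```python
from enum import Enum


class Branch(Enum):
    FIRST_CONDITION = "before central time tfm(1)"
    SECOND_CONDITION = "after second-phrase start tfs(2)"
    BETWEEN = "between tfm(1) and tfs(2)"


def classify_start(t_on: float, tfe1: float, tfs2: float) -> Branch:
    """Classify the acquisition time t_on of an utterance start instruction."""
    tfm1 = (tfe1 + tfs2) / 2.0  # central time tfm(1); midpoint is an assumption
    if t_on < tfm1:
        return Branch.FIRST_CONDITION   # first condition satisfied
    if t_on > tfs2:
        return Branch.SECOND_CONDITION  # continue with not-yet-uttered characters
    return Branch.BETWEEN               # restart from the first character of phrase 2


# Illustrative values only: tfe(1) = 10, tfs(2) = 14, so tfm(1) = 12.
examples = [classify_start(t, 10.0, 14.0) for t in (11.0, 13.0, 15.0)]
```

A caller would then pick the character according to the returned branch, for example the first character "sa" of the second phrase for `Branch.BETWEEN`.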
  • FIG. 15 shows the second lyric data corresponding to the chorus part.
  • the second lyric data also has text data in which a plurality of characters to be pronounced are arranged in chronological order.
  • the second lyric data includes timing data defining start times and stop times of utterance for each of a plurality of characters along a predetermined time axis.
  • The second lyric data includes text data indicating "a (a)", "a (a)", "a (a)", "a (a)", "a (a)", "a (a)", "o (o)", "o (o)", "o (o)", and "o (o)".
  • the second lyric data also includes timing data that defines the vocalization start time ts and the vocalization stop time te for each character.
  • N(3) corresponds to the third character in the lyrics.
  • the start time of utterance is time tcs(3) and the stop time is time tce(3).
  • The utterance period specified in the first lyric data overlaps with the utterance period specified in the second lyric data, as shown in FIG. That is, the start times and end times of N(1) to N(n) shown in FIG. 15 coincide with the start times and end times of M(1) to M(n) in the first lyric data.
  • the control unit 301 may output to the DSP an instruction to generate an audio signal based on the utterance corresponding to the characters of the chorus part instead of the lead vocal part.
  • the control unit 301 replaces the first condition in the first embodiment with another condition.
  • The control unit 301 identifies, in the first lyric data, the set period to which the acquisition time of the vocalization start instruction belongs or the set period closest to the acquisition time. Then, if the second lyric data has a set period that temporally coincides with the identified set period, the control unit 301, instead of generating an audio signal based on the first or second utterance in the first lyric data, generates an audio signal based on the utterance corresponding to the character of the temporally coinciding set period in the second lyric data.
  • the utterance of the second lyric data is prioritized.
  • Such processing can also be applied when the second lyric data corresponds to the first lyric data only in a partial time domain.
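The priority given to the second lyric data can be sketched as a lookup with a fallback. This is only an illustration: `pick_with_chorus` is a hypothetical name, each lyric entry is modelled as a `(character, start, stop)` tuple, "closest" is assumed to mean the smallest distance to either edge of a set period, and "temporally coincides" is assumed to mean identical start and stop times.

```python
from typing import List, Tuple

Entry = Tuple[str, float, float]  # (character, start time, stop time)


def pick_with_chorus(lead: List[Entry], chorus: List[Entry], tc: float) -> str:
    """Return the character to utter for a start instruction acquired at tc."""

    def distance(entry: Entry) -> float:
        _, ts, te = entry
        if ts <= tc <= te:
            return 0.0  # tc belongs to this set period
        return min(abs(tc - ts), abs(tc - te))

    # Set period of the first lyric data that contains tc or is closest to it.
    _, lead_ts, lead_te = target = min(lead, key=distance)
    # Prefer the second lyric data when it has a coinciding set period.
    for char, ts, te in chorus:
        if ts == lead_ts and te == lead_te:
            return char
    return target[0]


# Illustrative data: the chorus covers only part of the time range.
lead_part = [("ko", 0.0, 1.0), ("n", 2.0, 3.0), ("ni", 4.0, 5.0)]
chorus_part = [("a", 2.0, 3.0)]
```

Here a start instruction at tc = 2.5 yields the chorus character "a", while one at tc = 0.5 falls back to the lead character "ko" because the chorus has no coinciding period.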
  • the third time may be shifted forward or backward with respect to the central time between the stop time te(q) and the start time ts(q+1).
  • the electronic musical instrument 3 may be an electronic wind instrument. A case where an electronic wind instrument is applied as the electronic musical instrument 3 will be described below with reference to FIG.
  • FIG. 16 shows the hardware configuration when the electronic musical instrument 3A is an electronic wind instrument.
  • the performance operation section 321 includes operation keys 311 and a breath sensor 312 .
  • the electronic musical instrument 3A is provided with a plurality of sound holes provided in the musical instrument body, a plurality of operation keys 311 for changing the opening/closing state of the sound holes, and a breath sensor 312 .
  • When a performer operates the plurality of operation keys 311, the opening/closing state of the sound holes changes and sounds of a predetermined scale are output.
  • a mouthpiece is attached to the instrument body, and a breath sensor 312 is provided inside the instrument body and near the mouthpiece.
  • the breath sensor 312 is a blow pressure sensor that detects the blow pressure of the user's (performer's) breath through the mouthpiece.
  • the breath sensor 312 detects the presence or absence of blowing, and also detects the strength and speed (momentum) of the blowing pressure at least when the electronic musical instrument 3A is playing.
  • the volume of vocalization is determined according to the magnitude of the pressure detected by the breath sensor 312 .
  • the magnitude of pressure detected by the breath sensor 312 is treated as volume information.
  • In the electronic wind instrument, a case may arise in which the first period from the instruction to start the first utterance to the instruction to start the second utterance is less than the predetermined period because a passing sound peculiar to wind instruments is detected.
  • According to the sound generation process of the embodiment of the present invention, even if such a passing sound is generated in the middle of the performance, the position in the lyrics can be prevented from advancing ahead of the performance, so that the singing sound can be generated.
  • the performance signal may be acquired from the outside via communication. Therefore, it is not essential to provide the performance operation section 321, and it is not essential that the sound generating device has the function and form of a musical instrument.
  • the same effect as the present invention may be obtained by reading a storage medium storing a control program represented by software for achieving the present invention into the present apparatus.
  • the read program code itself implements the novel functions of the present invention, and a non-transitory computer-readable recording medium storing the program code constitutes the present invention.
  • the program code may be supplied via a transmission medium or the like, in which case the program code itself constitutes the present invention.
  • ROM, floppy disks, hard disks, optical disks, magneto-optical disks, CD-ROMs, CD-Rs, magnetic tapes, non-volatile memory cards, etc.
  • volatile memory, e.g., DRAM (Dynamic Random Access Memory)

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Electrophonic Musical Instruments (AREA)
  • Reverberation, Karaoke And Other Acoustics (AREA)
PCT/JP2021/046585 2021-03-09 2021-12-16 Sound generation device and control method thereof, program, and electronic musical instrument Ceased WO2022190502A1 (ja)

Priority Applications (3)

Application Number Priority Date Filing Date Title
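JP2023505112A JP7568055B2 (ja) 2021-03-09 2021-12-16 Sound generation device and control method thereof, program, and electronic musical instrument
CN202180095312.3A CN117043853A (zh) 2021-03-09 2021-12-16 Sound generation device and control method thereof, program, and electronic musical instrument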
US18/463,470 US20230419946A1 (en) 2021-03-09 2023-09-08 Sound generation device and control method thereof, program, and electronic musical instrument

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
JP2021-037651 2021-03-09
JP2021037651 2021-03-09

Related Child Applications (1)

Application Number Title Priority Date Filing Date
US18/463,470 Continuation US20230419946A1 (en) 2021-03-09 2023-09-08 Sound generation device and control method thereof, program, and electronic musical instrument

Publications (1)

Publication Number Publication Date
WO2022190502A1 true WO2022190502A1 (ja) 2022-09-15

Family

ID=83227880

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/JP2021/046585 Ceased WO2022190502A1 (ja) 2021-03-09 2021-12-16 Sound generation device and control method thereof, program, and electronic musical instrument

Country Status (4)

Country Link
US (1) US20230419946A1
JP (1) JP7568055B2
CN (1) CN117043853A
WO (1) WO2022190502A1

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN120015043B (zh) * 2023-11-16 2026-02-17 Tencent Technology (Shenzhen) Co., Ltd. Audio synthesis method and apparatus, computer device, and storage medium

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
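JP2004287099A (ja) * 2003-03-20 2004-10-14 Sony Corp Singing voice synthesis method, singing voice synthesis device, program, recording medium, and robot device
JP2014062969A (ja) * 2012-09-20 2014-04-10 Yamaha Corp Singing synthesis device and singing synthesis program
JP2014098801A (ja) * 2012-11-14 2014-05-29 Yamaha Corp Speech synthesis device
JP2016206496A (ja) * 2015-04-24 2016-12-08 Yamaha Corp Control device, synthesized singing sound generation device, and program
JP2019184936A (ja) * 2018-04-16 2019-10-24 Casio Computer Co., Ltd. Electronic musical instrument, control method for electronic musical instrument, and program
JP2019219570A (ja) * 2018-06-21 2019-12-26 Casio Computer Co., Ltd. Electronic musical instrument, control method for electronic musical instrument, and program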

Also Published As

Publication number Publication date
JP7568055B2 (ja) 2024-10-16
CN117043853A (zh) 2023-11-10
US20230419946A1 (en) 2023-12-28
JPWO2022190502A1 2022-09-15

Legal Events

Date Code Title Description
121 EP: the EPO has been informed by WIPO that EP was designated in this application (Ref document number: 21930347; Country of ref document: EP; Kind code of ref document: A1)
WWE WIPO information: entry into national phase (Ref document number: 202180095312.3; Country of ref document: CN)
WWE WIPO information: entry into national phase (Ref document number: 2023505112; Country of ref document: JP)
NENP Non-entry into the national phase (Ref country code: DE)
122 EP: PCT application non-entry in European phase (Ref document number: 21930347; Country of ref document: EP; Kind code of ref document: A1)