WO2022208627A1 - Singing sound output system and method (歌唱音出力システムおよび方法) - Google Patents
Singing sound output system and method
- Publication number
- WO2022208627A1 (application PCT/JP2021/013379)
- Authority
- WO
- WIPO (PCT)
Classifications
- G10H1/361: Recording/reproducing of accompaniment for use with an external source, e.g. karaoke systems
- G10H1/366: Recording/reproducing of accompaniment with means for modifying or correcting the external signal, e.g. pitch correction, reverberation, changing a singer's voice
- G10G1/00: Means for the representation of music
- G10H1/0066: Transmission between separate instruments or between individual components of a musical system using a MIDI interface
- G10H2220/011: Lyrics displays, e.g. for karaoke applications
- G10H2240/325: Synchronizing two or more audio tracks or files according to musical features or musical timings
- G10H2250/455: Gensound singing voices, i.e. generation of human voices for musical applications, vocal singing sounds or intelligible words at a desired pitch or with desired vocal effects, e.g. by phoneme synthesis
- G10L13/04: Details of speech synthesis systems, e.g. synthesiser structure or memory management
Definitions
- The present invention relates to a singing sound output system and method for outputting singing sounds.
- A technique for generating singing sounds in response to performance operations is known. The singing sound synthesizer disclosed in Patent Document 1 automatically advances the lyrics by one character or one syllable in accordance with a real-time performance to generate singing sounds.
- Patent Document 1, however, does not disclose outputting singing sounds in real time together with an accompaniment. When a singing sound is output in real time together with an accompaniment, it is difficult to generate the singing sound exactly at the desired timing. For example, even if a performance operation is started at the desired timing, the actual start of singing is delayed because synthesizing and pronouncing the singing sound takes time. There is therefore room for improvement in outputting the singing sound at the intended timing in accordance with the accompaniment.
- One object of the present invention is to provide a singing sound output system and method capable of outputting singing sounds, in synchronization with an accompaniment, at the timing at which sound information is input.
- According to one aspect, a singing sound output system comprises: a teaching unit that indicates to a user a progression position in singing data that is temporally associated with accompaniment data and includes a plurality of syllables; an acquisition unit that acquires at least one piece of sound information input by a performance; a syllable identification unit that identifies a syllable corresponding to the acquired sound information from the plurality of syllables; a timing identification unit that associates the sound information with relative information indicating a relative timing with respect to the identified syllable; a synthesizing unit that synthesizes a singing sound based on the identified syllable; and an output unit that, based on the relative information, synchronizes and outputs the synthesized singing sound and an accompaniment sound based on the accompaniment data.
- With this configuration, the singing sound can be output, in synchronization with the accompaniment, at the timing at which the sound information is input.
- FIG. 1 is a diagram showing the overall configuration of a singing sound output system according to the first embodiment of the present invention.
- This singing sound output system 1000 includes a PC (personal computer) 101, a cloud server 102, and a sound output device 103.
- The PC 101 and the sound output device 103 are communicably connected to the cloud server 102 via a communication network 104 such as the Internet.
- In the environment where the PC 101 is used, a keyboard 105, a wind instrument 106, and a drum 107 exist as items and devices for inputting sounds.
- The keyboard 105 and the drum 107 are electronic musical instruments used to input MIDI (Musical Instrument Digital Interface) signals.
- The wind instrument 106 is an acoustic instrument used for inputting monophonic analog sounds. The keyboard 105 and the wind instrument 106 can also input pitch information. Note that the wind instrument 106 may be an electronic instrument, and the keyboard 105 and the drum 107 may be acoustic instruments.
- These musical instruments are examples of devices for inputting sound information and are played by the user on the PC 101 side.
- The utterance of the user on the PC 101 side may also serve as a means for inputting analog sound, in which case the human voice is input as an analog sound. The concept of "performance" for inputting sound information in the present embodiment therefore also includes the input of a real voice.
- The device for inputting sound information need not take the form of a musical instrument.
- The user on the PC 101 side plays the musical instrument while listening to the accompaniment.
- The PC 101 transmits singing data 51, timing information 52, and accompaniment data 53 (all described later with reference to FIG. 3) to the cloud server 102.
- The cloud server 102 synthesizes the singing sound based on the sounds produced by the performance of the user on the PC 101 side.
- The cloud server 102 transmits the singing sound, the timing information 52, and the accompaniment data 53 to the sound output device 103.
- The sound output device 103 is a device having a speaker function.
- The sound output device 103 outputs the received singing sound and accompaniment data 53.
- The sound output device 103 outputs the singing sound and the accompaniment data 53 in synchronization with each other based on the timing information 52.
- The form of "output" here is not limited to reproduction; it includes transmission to an external device and recording to a recording medium.
- FIG. 2 is a block diagram of the singing sound output system 1000.
- The PC 101 has a CPU 11, a ROM 12, a RAM 13, a storage unit 14, a timer 15, an operation unit 16, a display unit 17, a sound generation unit 18, an input unit 8, and various I/Fs (interfaces) 19. These components are connected to one another by a bus 10.
- The CPU 11 controls the PC 101 as a whole.
- The ROM 12 stores programs executed by the CPU 11 as well as various data.
- The RAM 13 provides a work area for the CPU 11 to execute programs and temporarily stores various information.
- The storage unit 14 includes a non-volatile memory.
- The timer 15 measures time; it may be of a counter type.
- The operation unit 16 includes a plurality of operators for inputting various kinds of information and receives instructions from the user.
- The display unit 17 displays various information.
- The sound generation unit 18 includes a tone generator circuit, an effect circuit, and a sound system.
- The input unit 8 includes an interface for acquiring sound information from electronic sound-information input devices such as the keyboard 105 and the drum 107, and also includes a device such as a microphone for acquiring sound information from acoustic sound-information input devices such as the wind instrument 106.
- The various I/Fs 19 are connected to the communication network 104 (FIG. 1) wirelessly or by wire.
- The cloud server 102 has a CPU 21, a ROM 22, a RAM 23, a storage unit 24, a timer 25, an operation unit 26, a display unit 27, a sound generation unit 28, and various I/Fs 29, connected to one another by a bus 20. The configurations of these components are the same as those indicated by reference numerals 11 to 17 and 19 in the PC 101.
- The sound output device 103 has a CPU 31, a ROM 32, a RAM 33, a storage unit 34, a timer 35, an operation unit 36, a display unit 37, a sound generation unit 38, and various I/Fs 39, connected to one another by a bus 30. The configurations of these components are the same as those indicated by reference numerals 11 to 19 in the PC 101.
- FIG. 3 is a functional block diagram of the singing sound output system 1000.
- The singing sound output system 1000 has a functional block 110.
- The functional block 110 includes a teaching unit 41, an acquisition unit 42, a syllable identification unit 43, a timing identification unit 44, a synthesizing unit 45, an output unit 46, and a phrase generation unit 47 as individual functional units.
- The functions of the teaching unit 41 and the acquisition unit 42 are implemented by the PC 101, as an example. Each of these functions is implemented in software by a program stored in the ROM 12: the CPU 11 loads the necessary programs into the RAM 13, executes them, and controls the various calculations and hardware resources. These functions are thus realized mainly by the cooperation of the CPU 11, ROM 12, RAM 13, timer 15, display unit 17, sound generation unit 18, input unit 8, and various I/Fs 19. The programs executed here include sequence software.
- The functions of the syllable identification unit 43, the timing identification unit 44, the synthesizing unit 45, and the phrase generation unit 47 are realized by the cloud server 102. Each of these functions is implemented in software by a program stored in the ROM 22 and realized mainly by the cooperation of the CPU 21, ROM 22, RAM 23, timer 25, and various I/Fs 29.
- The function of the output unit 46 is realized by the sound output device 103. It is implemented in software by a program stored in the ROM 32 and realized mainly by the cooperation of the CPU 31, ROM 32, RAM 33, timer 35, sound generation unit 38, and various I/Fs 39.
- The singing sound output system 1000 refers to singing data 51, timing information 52, accompaniment data 53, and a phrase database 54.
- The phrase database 54 is pre-stored in the ROM 12, for example. Note that the phrase generation unit 47 and the phrase database 54 are not essential in this embodiment; they will be described later in the third embodiment.
- The singing data 51, the timing information 52, and the accompaniment data 53 are associated with one another and stored in the ROM 12 in advance.
- The accompaniment data 53 is information for reproducing the accompaniment of each piece of music, recorded as sequence data.
- The singing data 51 includes a plurality of syllables. It includes lyric text data and a phoneme information database; the lyric text data describes the lyrics of each song in units of syllables.
- The accompaniment positions of the accompaniment data 53 and the syllables of the singing data 51 are temporally associated by the timing information 52.
- The teaching unit 41 indicates (teaches) the progression position in the singing data 51 to the user.
- The acquisition unit 42 acquires at least one piece of sound information N (see FIG. 4) input by a performance.
- The syllable identification unit 43 identifies the syllable corresponding to the acquired sound information N from the plurality of syllables in the singing data 51.
- The timing identification unit 44 associates the sound information N with the difference ΔT (see FIG. 4) as relative information indicating a relative timing with respect to the identified syllable.
- The synthesizing unit 45 synthesizes singing sounds based on the identified syllables.
- The output unit 46 synchronizes and outputs, based on the relative information, the synthesized singing sound and the accompaniment sound based on the accompaniment data 53. The data exchanged between these units can be sketched with minimal data structures, as shown below.
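For illustration only, the entities exchanged between these units, the sound information N and the syllables with their timings, might be modeled as follows. This is a minimal sketch: the class and field names are assumptions made for the example, not identifiers from the disclosure.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class SoundInfo:
    """One piece of sound information N produced by a performance."""
    start: float            # input start timing s (note-on), relative to the accompaniment progression
    end: float              # input end timing e (note-off)
    pitch: Optional[int]    # MIDI note number; None for sources without pitch (e.g. a drum)
    velocity: int           # performance strength, 0 to 127

@dataclass
class Syllable:
    """One syllable of the singing data 51."""
    text: str                        # e.g. "sa", "ku", "ra"
    onset: float                     # pronunciation start timing t, from the timing information 52
    delta_t: Optional[float] = None  # relative information ΔT, set by the timing identification unit 44
```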
- FIG. 4 is a timing chart of the process of outputting singing sounds through a performance.
- The syllable corresponding to the progression position in the singing data 51 is displayed to the user on the PC 101, as shown in FIG. 4.
- Syllables are displayed in order, such as "sa", "ku", and "ra".
- The pronunciation start timing t (t1 to t3) is defined by the temporal correspondence with the accompaniment data 53 and is the original pronunciation start timing of each syllable defined in the singing data 51.
- Time t1 indicates the pronunciation start position of the syllable "sa" in the singing data 51.
- The accompaniment based on the accompaniment data 53 also progresses in parallel with the syllable progression indication.
- The user plays along with the progression of the indicated syllables.
- A MIDI signal is input by playing the keyboard 105, which can input pitch information.
- The user, who is the performer, sequentially presses the keys corresponding to the syllables in time with the start timings of the syllables "sa", "ku", and "ra".
- Sound information N (N1 to N3) is thus obtained sequentially.
- The sounding length of each piece of sound information N is the time from the input start timing s (s1 to s3) to the input end timing e (e1 to e3).
- The input start timing s corresponds to note-on, and the input end timing e corresponds to note-off.
- The sound information N includes pitch information and velocity.
- The user may deliberately shift the actual input start timing s with respect to the pronunciation start timing t.
- The lag of the input start timing s with respect to the pronunciation start timing t is calculated as a temporal difference ΔT (ΔT1 to ΔT3) (relative information).
- The difference ΔT is calculated for each syllable and associated with that syllable.
- The cloud server 102 synthesizes a singing sound based on the sound information N and sends it to the sound output device 103 together with the accompaniment data 53.
- The sound output device 103 synchronizes and outputs the singing sound and the accompaniment sound based on the accompaniment data 53. The sound output device 103 outputs the accompaniment sound at the set constant tempo. As for the singing sound, the sound output device 103 outputs each syllable while matching it with the accompaniment position based on the timing information 52. Because processing time is required from the input of the sound information N to the output of the singing sound, the sound output device 103 uses delay processing to delay the output of the accompaniment sound so that each syllable can be matched with the accompaniment position.
- The sound output device 103 adjusts the output timing by referring to the difference ΔT corresponding to each syllable.
- The singing sound starts to be output according to the input timing (at the input start timing s).
- The output (pronunciation) of the syllable "ku" is started earlier than the pronunciation start timing t2 by the difference ΔT2.
- The output (pronunciation) of the syllable "ra" is started later than the pronunciation start timing t3 by the difference ΔT3.
- The pronunciation of each syllable ends (is silenced) at the time corresponding to the input end timing e. Accompaniment sounds are thus output at a fixed tempo and singing sounds are output at timings corresponding to the performance timings, so the singing sound can be output, in synchronization with the accompaniment, at the timing at which the sound information N was input. A minimal scheduling sketch of this behavior follows.
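Continuing the sketch above, one way the first embodiment's synchronized output could be scheduled is shown below: the accompaniment keeps its fixed tempo, shifted by a uniform processing delay, and each played syllable is emitted at its nominal onset t shifted by its ΔT, which is exactly the performer's input start timing s. The function name and the uniform-delay parameter are assumptions of the sketch.

```python
def schedule_fixed_tempo_output(syllables, accompaniment_delay):
    """First embodiment: accompaniment at a fixed tempo, each sung syllable
    emitted at t + ΔT (= the input start timing s); both streams are shifted
    by the same delay so the syllables still line up with the accompaniment."""
    events = []
    for syl in syllables:
        if syl.delta_t is None:   # the syllable was never played, so it is not sung
            continue
        output_time = syl.onset + syl.delta_t + accompaniment_delay
        events.append((output_time, "sing", syl.text))
    return sorted(events)
```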
- FIG. 5 is a flow chart showing the system processing performed by the singing sound output system 1000 for outputting singing sounds through a performance.
- The PC processing executed by the PC 101, the cloud server processing executed by the cloud server 102, and the sound output device processing executed by the sound output device 103 are executed in parallel.
- The PC processing is realized by the CPU 11 loading a program stored in the ROM 12 into the RAM 13 and executing it.
- The cloud server processing is realized by the CPU 21 loading a program stored in the ROM 22 into the RAM 23 and executing it.
- The sound output device processing is realized by the CPU 31 loading a program stored in the ROM 32 into the RAM 33 and executing it. Each of these processes is started when the PC 101 is instructed to start the system processing.
- In step S101, the CPU 11 of the PC 101 selects the song to be played this time (hereinafter, the selected song) from among a plurality of prepared songs based on an instruction from the user.
- The performance tempo of a song is determined in advance by default for each song. The CPU 11 may change the set tempo based on instructions from the user when the song to be played is selected.
- In step S102, the CPU 11 transmits the related data (singing data 51, timing information 52, accompaniment data 53) corresponding to the selected song to the cloud server 102 through the various I/Fs 19.
- The CPU 11 then starts teaching the progression position and transmits to the cloud server 102 a notification that teaching of the progression position has started.
- The teaching process here is realized, as an example, by executing sequence software.
- The CPU 11 (teaching unit 41) uses the timing information 52 to indicate the current progression position.
- The display unit 17 displays the lyrics corresponding to the syllables in the singing data 51, and the CPU 11 indicates the progression position on the displayed lyrics.
- The teaching unit 41 indicates the progression position by changing the display mode, such as the color of the lyrics at the current position, or by moving the cursor position or the position of the lyrics themselves.
- The CPU 11 also indicates the progression position by reproducing the accompaniment data 53 at the set tempo.
- The method of indicating the progression position is not limited to these exemplified modes; various visually or aurally recognizable methods can be adopted. For example, the note at the current position may be indicated on a displayed musical score, or a metronome sound may be generated after the start timing is indicated. At least one technique may be adopted, and a plurality of techniques may be combined.
- In step S104, the CPU 11 (acquisition unit 42) executes sound information acquisition processing.
- The user plays along with the lyrics while confirming the taught progression position (for example, while listening to the accompaniment).
- The CPU 11 acquires, as the sound information N, the MIDI data or analog sound produced by the performance.
- The sound information N usually includes the input start timing s, the input end timing e, pitch information, and velocity information. Note that pitch information is not necessarily included, as when the drum 107 is played, and velocity information may be omitted.
- The input start timing s and the input end timing e are defined as times relative to the accompaniment progression. Note that when an analog sound such as a human voice is captured by a microphone, audio data is acquired as the sound information N. A sketch of how MIDI events could be paired into such records follows.
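As a hedged illustration of step S104, note-on/note-off messages could be paired into the SoundInfo records assumed earlier as follows; the mido-style message attributes (type, note, velocity) are an assumption of the sketch, not an interface the disclosure prescribes.

```python
def collect_sound_info(midi_events, song_start):
    """Pair note-on/note-off messages into SoundInfo records, keeping all
    times relative to the accompaniment progression (step S104)."""
    pending = {}   # note number -> (relative start time, velocity)
    notes = []
    for abs_time, msg in midi_events:    # (absolute time, mido-style message)
        rel = abs_time - song_start
        if msg.type == "note_on" and msg.velocity > 0:
            pending[msg.note] = (rel, msg.velocity)
        elif msg.type == "note_off" or (msg.type == "note_on" and msg.velocity == 0):
            # a note_on with velocity 0 conventionally also means note-off
            if msg.note in pending:
                start, vel = pending.pop(msg.note)
                notes.append(SoundInfo(start=start, end=rel, pitch=msg.note, velocity=vel))
    return notes
```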
- In step S105, the CPU 11 transmits the sound information N acquired in step S104 to the cloud server 102.
- In step S106, the CPU 11 determines whether the selected song has ended, that is, whether teaching of the progression position up to the last position in the selected song has been completed. If the selected song has not ended, the CPU 11 returns to step S104, so the sound information N acquired from the performance is transmitted to the cloud server 102 continually until the selected song ends. When the selected song ends, the CPU 11 sends a notification to that effect to the cloud server 102 and terminates the PC processing.
- In step S201, when the CPU 21 of the cloud server 102 receives the related data corresponding to the selected song through the various I/Fs 29, the process proceeds to step S202.
- In step S202, the CPU 21 transmits the received related data to the sound output device 103 through the various I/Fs 29. Note that the singing data 51 need not be transmitted to the sound output device 103.
- The CPU 21 then starts a series of processes (S204 to S209). At the start of this series, the CPU 21 executes the sequence software and, using the received related data, advances the time while waiting for reception of the next sound information N. In step S204, the CPU 21 receives the sound information N.
- In step S205, the CPU 21 (syllable identification unit 43) identifies the syllable corresponding to the received sound information N.
- The CPU 21 calculates, for each syllable, the difference ΔT between the input start timing s in the sound information N and the pronunciation start timing t of each of the plurality of syllables in the singing data 51 corresponding to the selected song, and identifies the syllable with the smallest difference ΔT as the syllable corresponding to the sound information N received this time.
- For example, the CPU 21 identifies the syllable "ku" as the syllable corresponding to the sound information N2. In this way, for each piece of sound information N, the syllable whose pronunciation start timing t is closest to the input start timing s is identified as the corresponding syllable; a minimal sketch of this step appears below.
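A minimal sketch of steps S205 and S206 under the data structures assumed earlier; the helper name is hypothetical, and a practical implementation would presumably search only the syllables near the current progression position rather than the whole song.

```python
def identify_syllable(sound, syllables):
    """Step S205: pick the syllable whose pronunciation start timing t is
    closest to the input start timing s of the sound information N.
    Step S206: associate the signed difference ΔT = s - t with that syllable."""
    best = min(syllables, key=lambda syl: abs(sound.start - syl.onset))
    best.delta_t = sound.start - best.onset   # negative: played early; positive: played late
    return best
```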
- The CPU 21 determines the sounding/silencing timings, pitch, and velocity of the sound information N by analysis.
- In step S206, the CPU 21 (timing identification unit 44) executes the timing identification processing: it associates the difference ΔT with the sound information N received this time and with the syllable identified as corresponding to it.
- In step S207, the CPU 21 (synthesizing unit 45) synthesizes a singing sound based on the identified syllable.
- The pitch of the singing sound is determined by the pitch information of the corresponding sound information N. If the sound information N is a drum sound, the pitch of the singing sound may be, for example, a constant pitch.
- The pronunciation timing and the silencing timing are determined by the pronunciation start timing t and the input end timing e (or sounding length) of the corresponding sound information N. A singing sound is therefore synthesized from the syllable corresponding to the sound information N at the pitch determined by the performance.
- The input end timing e may be modified so that the sound is forcibly silenced before the original pronunciation timing of the next syllable.
- In step S208, the CPU 21 executes data transmission: it transmits the synthesized singing sound, the difference ΔT corresponding to the syllable, and the velocity information at the time of performance to the sound output device 103 through the various I/Fs 29.
- In step S209, the CPU 21 determines whether the selected song has ended, that is, whether it has received a notification from the PC 101 that the selected song has ended. If not, the CPU 21 returns to step S204, so singing sounds based on the syllables corresponding to the sound information N are synthesized and transmitted continually until the selected song ends. The CPU 21 may alternatively determine that the selected song has ended when a predetermined period has elapsed after the last received sound information N was processed. When the selected song ends, the CPU 21 ends the cloud server processing.
- In step S301, when the CPU 31 of the sound output device 103 receives the related data corresponding to the selected song through the various I/Fs 39, the process proceeds to step S302.
- In step S302, the CPU 31 receives the data (singing sound, difference ΔT, velocity) transmitted from the cloud server 102 in step S208.
- In step S303, the CPU 31 (output unit 46) executes the synchronized output of the singing sound and the accompaniment based on the received singing sound and difference ΔT, the already received accompaniment data 53, and the timing information 52.
- The CPU 31 outputs accompaniment sounds based on the accompaniment data 53 and, concurrently, outputs singing sounds while adjusting their output timing based on the timing information and the difference ΔT.
- Reproduction is employed as a typical mode of the synchronized output of accompaniment sounds and singing sounds, so the performance of the user of the PC 101 can be heard at the sound output device 103 in synchronization with the accompaniment.
- The mode of synchronized output is not limited to reproduction; the output may be recorded as an audio file in the storage unit 34 or transmitted to an external device through the various I/Fs 39.
- In step S304, the CPU 31 determines whether the selected song has ended, that is, whether it has received a notification from the cloud server 102 indicating that the selected song has ended. If not, the CPU 31 returns to step S302, so the synchronized output of received singing sounds continues until the selected song ends. The CPU 31 may alternatively determine that the selected song has ended when a predetermined period has elapsed after the processing of the last received data ended. When the selected song ends, the CPU 31 terminates the sound output device processing.
- According to the present embodiment, the syllable corresponding to the sound information N acquired while the progression position in the singing data 51 is indicated to the user is identified from the plurality of syllables in the singing data 51.
- The relative information (difference ΔT) is associated with the sound information N, and a singing sound is synthesized based on the identified syllable.
- The singing sound and the accompaniment sound based on the accompaniment data 53 are output in synchronization. The singing sound can therefore be output, in synchronization with the accompaniment, at the timing at which the sound information N was input.
- Since the sound information N includes pitch information, the singing sound can be synthesized at the pitch determined by the performance. Since the sound information N also includes velocity information, the singing sound can be output at a volume corresponding to the strength of the performance.
- In the above description, the related data (singing data 51, timing information 52, accompaniment data 53) is transmitted to the cloud server 102 and the sound output device 103 after the selected song is determined, but this is not limiting.
- The cloud server 102 or the sound output device 103 may store the related data for a plurality of songs in advance; when the selected song is determined, information specifying it may then be transmitted to the cloud server 102 and further to the sound output device 103.
- FIG. 6 is a timing chart of the process of outputting singing sounds through a performance in the second embodiment. In the first embodiment the performance tempo was fixed; in the second embodiment it is variable and changes with the performer's playing.
- The order of the plurality of syllables in the singing data 51 is predetermined.
- The singing sound output system 1000 indicates the next syllable in the singing data to the user while waiting for the input of sound information N and, each time sound information N is input, advances the displayed progression.
- The progression of the accompaniment data likewise waits, together with the progression of the syllables, until there is a performance input.
- The cloud server 102 identifies, as the syllable corresponding to the input sound information N, the syllable that was next in the order of progression when the sound information N was input. The corresponding syllables are therefore identified in order each time a key-on occurs.
- The actual input start timing s may deviate from the pronunciation start timing t.
- The cloud server 102 calculates the shift of the input start timing s from the pronunciation start timing t as the temporal difference ΔT (ΔT1 to ΔT3) (relative information).
- The difference ΔT is calculated for each syllable and associated with that syllable.
- The cloud server 102 synthesizes a singing sound based on the sound information N and sends it to the sound output device 103 together with the accompaniment data 53.
- The syllable pronunciation start timing t' (t1' to t3') is the pronunciation start timing of each syllable at the time of output; it is determined by the input start timing s.
- The progression of the accompaniment sound at the time of output also changes according to the syllable pronunciation start timing t'.
- The sound output device 103 synchronizes and outputs the singing sound and the accompaniment sound based on the accompaniment data 53, adjusting the output timing based on the timing information and the difference ΔT. The sound output device 103 outputs the singing sound at the syllable pronunciation start timing t' and outputs the accompaniment sound while matching each syllable with the accompaniment position based on the difference ΔT, using delay processing to delay the accompaniment where needed. The singing sounds are therefore output at timings corresponding to the performance timings, and the tempo of the accompaniment sound changes in accordance with the performance timings; one way to realize such time-warping is sketched below.
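Purely as an illustrative sketch of this variable-tempo behavior: the nominal accompaniment timeline could be piecewise-linearly warped onto the played timeline, using each played syllable's pair (t, t' = t + ΔT) as an anchor. The function names and the linear interpolation between anchors are assumptions; the disclosure only requires that the accompaniment position follow the syllable pronunciation start timing t'.

```python
def warp_accompaniment(syllables, accomp_events):
    """Second-embodiment sketch: map each nominal accompaniment time t onto
    the played timeline t', stretching linearly between successive anchors
    (nominal syllable onset t, actual output timing t' = t + ΔT)."""
    anchors = sorted((s.onset, s.onset + s.delta_t)
                     for s in syllables if s.delta_t is not None)

    def to_played_time(t):
        prev_nom, prev_act = 0.0, 0.0
        for nom, act in anchors:
            if t <= nom:
                span = nom - prev_nom
                if span == 0:
                    return act
                return prev_act + (t - prev_nom) * (act - prev_act) / span
            prev_nom, prev_act = nom, act
        return prev_act + (t - prev_nom)   # after the last anchor, resume the nominal tempo

    return sorted(((to_played_time(t), ev) for t, ev in accomp_events),
                  key=lambda pair: pair[0])
```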
- The CPU 11 uses the timing information 52 to teach the current progression position.
- The CPU 11 (acquisition unit 42) executes the sound information acquisition processing.
- The user plays and inputs the sound corresponding to the next syllable while confirming the progression position.
- The CPU 11 holds the syllable-teaching progression and the accompaniment progression until the next sound information N is input: it indicates the next syllable while waiting for input of sound information N and, each time sound information N is input, advances the syllable indicating the progression position to the next syllable.
- The CPU 11 matches the progression of the accompaniment to the teaching progression of the syllables.
- The CPU 21 receives the sound information N as needed and advances the time when sound information N is received, so the progression of time waits until the next sound information N is received.
- Upon receiving the sound information N, the CPU 21 (syllable identification unit 43) identifies the syllable corresponding to it in step S205. Here, the CPU 21 identifies the syllable that was next in the order of progression when the sound information N was input as the syllable corresponding to the sound information N received this time; the corresponding syllables are therefore identified in order each time there is a key-on due to the performance.
- In step S206, the CPU 21 calculates the difference ΔT and associates it with the identified syllable. That is, as shown in FIG. 6, the CPU 21 obtains, as the difference ΔT, the shift between the input start timing s and the pronunciation start timing t corresponding to the identified syllable, and associates the obtained difference ΔT with that syllable.
- The CPU 21 transmits the synthesized singing sound, the difference ΔT corresponding to the syllable, and the velocity at the time of performance to the sound output device 103 through the various I/Fs 29.
- The CPU 31 executes the synchronized output of the singing sound and the accompaniment based on the received singing sound and difference ΔT, together with the already received accompaniment data 53 and the timing information 52.
- The CPU 31 refers to the difference ΔT and adjusts the output timings of the accompaniment sound and the singing sound, thereby performing the output processing while matching each syllable with the accompaniment position.
- The singing sound starts to be output according to the input timing (at the input start timing s).
- The output (pronunciation) of the syllable "ku" is started earlier than the pronunciation start timing t2 by the difference ΔT2, and the output of the syllable "ra" is started later than the pronunciation start timing t3 by the difference ΔT3.
- The pronunciation of each syllable ends at the time corresponding to the input end timing e.
- The performance tempo of the accompaniment sound changes according to the performance timing: for example, the CPU 31 corrects the position of the pronunciation start timing t2 to the position of the pronunciation start timing t2' and outputs the accompaniment sound.
- Accompaniment sounds are thus output at a variable tempo, and singing sounds are output at timings corresponding to the performance timings, so the singing sound can be output, in synchronization with the accompaniment, at the timing at which the sound information N was input.
- As described above, the teaching unit 41 indicates the next syllable while waiting for input of the sound information N and, each time sound information N is input, advances the syllable indicating the progression position by one to the next syllable. The syllable identification unit 43 then identifies the syllable that was next in the order of progression when the sound information N was input as the syllable corresponding to the input sound information N.
- The same effects as in the first embodiment are therefore obtained in that the singing sound is output, in synchronization with the accompaniment, at the timing at which the sound information N was input. Moreover, even when the user performs at a free tempo, the singing sound can be output in synchronization with the accompaniment according to the user's performance tempo.
- Note that the relative information associated with the sound information N is not limited to the difference ΔT.
- The relative information indicating the relative timing with respect to the identified syllable may instead be the relative time of the sound information N and the relative time of each syllable with respect to a certain time defined by the timing information 52.
- A third embodiment of the present invention will be described with reference to FIGS. 1 to 3 and 7.
- If a device that cannot input pitch information, such as a drum, can be used to produce singing sounds, the enjoyment broadens. In this embodiment, therefore, the drum 107 is used for performance input.
- When the user freely strikes the drum 107, a singing phrase is generated for each unit series of sound information N obtained thereby.
- The basic configuration of the singing sound output system 1000 is the same as in the first embodiment. Since performance input on the drum 107 is assumed and it is premised that there is no pitch information, control different from that of the first embodiment is applied.
- The phrase generation unit 47 analyzes the accent of a series of sound information N from the velocity of each piece of sound information N in the series and, based on the accent, generates a phrase made up of a plurality of syllables corresponding to the series of sound information N.
- The phrase generation unit 47 generates the phrase corresponding to the series of sound information N by extracting, from a phrase database 54 containing a plurality of phrases prepared in advance, a phrase that matches the accent; phrases whose number of syllables equals the number of pieces of sound information in the series are extracted.
- The accent of the series of sound information N refers to the accent formed by the relative strength of the sounds.
- The accent of a phrase refers to the pitch accent formed by the relative pitch of each syllable. The strength of the sounds in the sound information N thus corresponds to the pitch contour of the phrase; a sketch of this matching appears below.
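As a non-authoritative sketch of this matching, assume each phrase database entry stores its syllables together with a strong/weak accent pattern; the velocity threshold and the dictionary layout are assumptions made for the example.

```python
def generate_phrase(series, phrase_db, velocity_threshold=80):
    """Third-embodiment sketch (step S503): turn the hit velocities of a series
    of sound information N into a strong/weak accent pattern, then extract a
    phrase whose accent pattern and syllable count match it."""
    accent = tuple(info.velocity >= velocity_threshold for info in series)  # True = accented hit
    for phrase in phrase_db:
        # assumed entry layout, e.g. {"syllables": ["sa", "ku", "ra"], "accents": (False, True, False)}
        if len(phrase["syllables"]) == len(series) and tuple(phrase["accents"]) == accent:
            return phrase
    return None   # no phrase in the database matches this accent pattern
```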
- FIG. 7 is a flow chart showing the system processing performed by the singing sound output system 1000 for outputting singing sounds through a performance.
- The execution subjects, execution conditions, and start conditions of the PC processing, the cloud server processing, and the sound output device processing in this system processing are the same as those of the system processing shown in FIG. 5.
- In step S401, the CPU 11 of the PC 101 transitions to a performance start state based on an instruction from the user and transmits a notification to that effect to the cloud server 102 through the various I/Fs 19.
- In step S402, when the user strikes the drum 107, the CPU 11 (acquisition unit 42) acquires the corresponding sound information N.
- The sound information N is MIDI data or analog sound and includes at least information indicating the input start timing (the hit) and information indicating the velocity.
- In step S403, the CPU 11 (acquisition unit 42) determines whether the current series of sound information N has been finalized; for example, it makes this determination when the first sound information N has been input within a first predetermined time after the transition to the performance start state.
- A series of sound information N is assumed to be a collection of a plurality of pieces of sound information N, but it may be a single piece of sound information N.
- In step S404, the CPU 11 transmits the acquired series of sound information N to the cloud server 102.
- In step S405, the CPU 11 determines whether the user has instructed the end of the performance state. If not, the CPU 11 returns to step S402; if so, it transmits a notification to that effect to the cloud server 102 and terminates the PC processing. Each time a series of sound information N is finalized, that series is therefore transmitted.
- In step S501, upon receiving the notification that the performance start state has been entered, the CPU 21 starts a series of processes (S502 to S506).
- In step S502, the CPU 21 receives the series of sound information N transmitted from the PC 101 in step S404.
- In step S503, the CPU 21 (phrase generation unit 47) generates one phrase for the current series of sound information N. One method is exemplified below.
- The CPU 21 analyzes the accent of the series of sound information N from the velocity of each piece of sound information N and extracts, from the phrase database 54, a phrase that matches the accent and the number of syllables corresponding to the series of sound information N.
- The extraction range may be narrowed down according to conditions: the phrase database 54 may be classified by conditions, and the user may set at least one condition such as "noun", "fruit", "stationery", "color", or "size".
- In step S504, the CPU 21 (synthesizing unit 45) synthesizes a singing sound from the generated phrase.
- The pitch of the singing sound may conform to the pitch set for each syllable in the phrase.
- In step S505, the CPU 21 transmits the singing sound to the sound output device 103 through the various I/Fs 29.
- In step S506, the CPU 21 determines whether it has received a notification from the PC 101 that the end of the performance has been instructed. If not, the CPU 21 returns to step S502; if so, it forwards the notification to the sound output device 103 and terminates the cloud server processing.
- In step S601, when the CPU 31 of the sound output device 103 receives a singing sound through the various I/Fs 39, the process proceeds to step S602.
- In step S602, the CPU 31 (output unit 46) outputs the received singing sound.
- The output timing of each syllable depends on the input timing of the corresponding sound information N. As in the first embodiment, the mode of output here is not limited to reproduction.
- In step S603, the CPU 31 determines whether it has received a notification from the cloud server 102 that the end of the performance has been instructed. If not, the CPU 31 returns to step S601; if so, it terminates the sound output device processing. The CPU 31 therefore outputs a phrase whenever it receives the singing sound of a phrase.
- On a drum, the timbre differs between hitting the head and hitting the rim (a rim shot), so this difference in timbre may also be used as a parameter for phrase generation.
- The conditions for extracting phrases may differ between head hits and rim shots.
- The sound generated by striking is not limited to drums; it may be hand clapping.
- The hitting position on the head may be detected, and the difference in hitting position may also be used as a parameter for phrase generation.
- When the obtainable sound information N includes pitch information, the pitch may be converted into an accent and the same processing as for drum hits may be performed. For example, the same phrase may be extracted for "do-mi-do" played on the piano as for "weak-strong-weak" on the drums.
- The singing voice used may be switched according to the sound information N: when the sound information N is audio data, the singing voice may be switched according to the timbre; when the sound information N is MIDI data, the singing voice may be switched according to the tone color set in the PC 101 and other parameters.
- Each functional unit shown in FIG. 3 may be realized by any of the devices, or by a single device. If each of the functional units described above is realized by a single integrated device, the device may be called a singing sound output device rather than a singing sound output system.
- Each functional unit shown in FIG. 3 may also be realized by AI (artificial intelligence).
- The same effects as those of the present invention may be obtained by reading a storage medium storing a control program, represented by software for achieving the present invention, into the present system.
- In that case, the program code itself read from the storage medium implements the novel functions of the present invention, and a non-transitory computer-readable recording medium storing the program code constitutes the present invention.
- The program code may also be supplied via a transmission medium or the like, in which case the program code itself constitutes the present invention.
- The storage medium in these cases may be, besides a ROM, a floppy disk, a hard disk, an optical disk, a magneto-optical disk, a CD-ROM, a CD-R, a magnetic tape, or a non-volatile memory card. A volatile memory (e.g., a DRAM (Dynamic Random Access Memory)) may also be used.
Description
FIG. 1 is a diagram showing the overall configuration of a singing sound output system according to the first embodiment of the present invention. This singing sound output system 1000 includes a PC (personal computer) 101, a cloud server 102, and a sound output device 103. The PC 101 and the sound output device 103 are communicably connected to the cloud server 102 via a communication network 104 such as the Internet. In the environment where the PC 101 is used, a keyboard 105, a wind instrument 106, and a drum 107 exist as items and devices for inputting sounds.
The second embodiment of the present invention differs from the first embodiment in part of the system processing. Accordingly, the differences from the first embodiment will be mainly described with reference to FIGS. 5 and 6. In the first embodiment the performance tempo was fixed, whereas in the present embodiment the performance tempo is variable and changes with the performer's playing.
A third embodiment of the present invention will be described with reference to FIGS. 1 to 3 and 7. If a device that cannot input pitch information, such as a drum, can be used to produce singing sounds, the enjoyment broadens. In this embodiment, therefore, the drum 107 is used for performance input. In the present embodiment, when the user freely strikes the drum 107 without any teaching of the accompaniment or syllable progression, a singing phrase is generated for each unit series of sound information N obtained thereby. The basic configuration of the singing sound output system 1000 is the same as in the first embodiment. Since performance input on the drum 107 is assumed and it is premised that there is no pitch information, control different from that of the first embodiment is applied.
42 Acquisition unit
43 Syllable identification unit
44 Timing identification unit
46 Output unit
1000 Singing sound output system
Claims (20)
1. A singing sound output system comprising: a teaching unit that indicates to a user a progression position in singing data that is temporally associated with accompaniment data and includes a plurality of syllables; an acquisition unit that acquires at least one piece of sound information input by a performance; a syllable identification unit that identifies a syllable corresponding to the sound information acquired by the acquisition unit from the plurality of syllables in the singing data; a timing identification unit that associates the sound information with relative information indicating a relative timing with respect to the identified syllable; a synthesizing unit that synthesizes a singing sound based on the identified syllable; and an output unit that, based on the relative information, synchronizes and outputs the singing sound synthesized by the synthesizing unit and an accompaniment sound based on the accompaniment data.
2. The singing sound output system according to claim 1, wherein the sound information includes at least pitch information, and the synthesizing unit synthesizes the singing sound based on the identified syllable and the pitch information.
3. The singing sound output system according to claim 1 or 2, wherein the teaching unit indicates the progression position on lyrics displayed in correspondence with the syllables in the singing data.
4. The singing sound output system according to claim 1 or 2, wherein the teaching unit indicates the progression position in the singing data by reproducing the accompaniment data at a preset tempo.
5. The singing sound output system according to any one of claims 1 to 4, wherein the syllable identification unit identifies, as the syllable corresponding to the sound information, the syllable, among the plurality of syllables in the singing data, for which the difference between a pronunciation start timing defined by a temporal correspondence with the accompaniment data and an input start timing of the sound information is smallest.
6. The singing sound output system according to claim 5, wherein the relative information is the difference.
7. The singing sound output system according to any one of claims 1 to 6, wherein the accompaniment data and the plurality of syllables in the singing data are temporally associated by timing information, and the output unit synchronizes and outputs the singing sound and the accompaniment sound by outputting the singing sound, in parallel with the output of the accompaniment sound, while adjusting its output timing based on the timing information and the relative information.
8. The singing sound output system according to any one of claims 1 to 3, wherein an order of the syllables in the singing data is predetermined, the teaching unit indicates a next syllable in the singing data while waiting for input of sound information and, each time sound information is input, advances the syllable indicating the progression position by one to the next syllable, and the syllable identification unit identifies, as the syllable corresponding to the input sound information, the syllable that was the next syllable in the order of progression at the time the sound information was input.
9. The singing sound output system according to claim 8, wherein the accompaniment data and the plurality of syllables in the singing data are temporally associated by timing information, the timing identification unit obtains, as the relative information, the difference between a pronunciation start timing of the identified syllable defined by a temporal correspondence with the accompaniment data and an input start timing of the sound information, and the output unit synchronizes and outputs the singing sound and the accompaniment sound by outputting them while adjusting their output timings based on the timing information and the difference.
10. A singing sound output system comprising: an acquisition unit that acquires a series of sound information including at least information indicating timing and information indicating velocity; a phrase generation unit that analyzes an accent of the series of sound information from the velocity of each piece of sound information in the series acquired by the acquisition unit and, based on the accent, generates a phrase made up of a plurality of syllables corresponding to the series of sound information; a synthesizing unit that synthesizes a singing sound based on the syllables of the phrase generated by the phrase generation unit; and an output unit that outputs the singing sound synthesized by the synthesizing unit.
11. The singing sound output system according to claim 10, wherein the phrase generation unit generates the phrase corresponding to the series of sound information by extracting a phrase that matches the accent from a database of phrases prepared in advance.
12. A singing sound output method comprising: indicating to a user a progression position in singing data that is temporally associated with accompaniment data and includes a plurality of syllables; acquiring at least one piece of sound information input by a performance; identifying a syllable corresponding to the acquired sound information from the plurality of syllables in the singing data; associating the sound information with relative information indicating a relative timing with respect to the identified syllable; synthesizing a singing sound based on the identified syllable; and, based on the relative information, synchronizing and outputting the synthesized singing sound and an accompaniment sound based on the accompaniment data.
13. The singing sound output method according to claim 12, wherein the sound information includes at least pitch information, and the singing sound is synthesized based on the identified syllable and the pitch information.
14. The singing sound output method according to claim 12 or 13, wherein the progression position is indicated on lyrics displayed in correspondence with the syllables in the singing data.
15. The singing sound output method according to claim 12 or 13, wherein the progression position in the singing data is indicated by reproducing the accompaniment data at a preset tempo.
16. The singing sound output method according to any one of claims 12 to 15, wherein the syllable, among the plurality of syllables in the singing data, for which the difference between a pronunciation start timing defined by a temporal correspondence with the accompaniment data and an input start timing of the sound information is smallest is identified as the syllable corresponding to the sound information.
17. The singing sound output method according to claim 16, wherein the relative information is the difference.
18. The singing sound output method according to any one of claims 12 to 17, wherein the accompaniment data and the plurality of syllables in the singing data are temporally associated by timing information, and the singing sound and the accompaniment sound are synchronized and output by outputting the singing sound, in parallel with the output of the accompaniment sound, while adjusting its output timing based on the timing information and the relative information.
19. The singing sound output method according to any one of claims 12 to 14, wherein an order of the syllables in the singing data is predetermined, a next syllable in the singing data is indicated while waiting for input of sound information and, each time sound information is input, the syllable indicating the progression position is advanced by one to the next syllable, and the syllable that was the next syllable in the order of progression at the time the sound information was input is identified as the syllable corresponding to the input sound information.
20. The singing sound output method according to claim 19, wherein the accompaniment data and the plurality of syllables in the singing data are temporally associated by timing information, the difference between a pronunciation start timing of the identified syllable defined by a temporal correspondence with the accompaniment data and an input start timing of the sound information is obtained as the relative information, and the singing sound and the accompaniment sound are synchronized and output by outputting them while adjusting their output timings based on the timing information and the difference.
Priority Applications (4)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
PCT/JP2021/013379 WO2022208627A1 (ja) | 2021-03-29 | 2021-03-29 | Singing sound output system and method |
JP2023509935A JPWO2022208627A1 (ja) | 2021-03-29 | 2021-03-29 | |
CN202180096124.2A CN117043846A (zh) | 2021-03-29 | 2021-03-29 | Singing sound output system and method |
US18/475,309 US20240021183A1 (en) | 2021-03-29 | 2023-09-27 | Singing sound output system and method |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
PCT/JP2021/013379 WO2022208627A1 (ja) | 2021-03-29 | 2021-03-29 | Singing sound output system and method |
Related Child Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US18/475,309 Continuation US20240021183A1 (en) | 2021-03-29 | 2023-09-27 | Singing sound output system and method |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2022208627A1 (ja) | 2022-10-06 |
Family
ID=83455800
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/JP2021/013379 WO2022208627A1 (ja) | 2021-03-29 | 2021-03-29 | Singing sound output system and method |
Country Status (4)
Country | Link |
---|---|
US (1) | US20240021183A1 (ja) |
JP (1) | JPWO2022208627A1 (ja) |
CN (1) | CN117043846A (ja) |
WO (1) | WO2022208627A1 (ja) |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JPH09281970A (ja) * | 1996-04-16 | 1997-10-31 | Roland Corp | Electronic musical instrument |
JP2014062969A (ja) * | 2012-09-20 | 2014-04-10 | Yamaha Corp | Singing synthesis apparatus and singing synthesis program |
JP2016080827A (ja) * | 2014-10-15 | 2016-05-16 | Yamaha Corporation | Phoneme information synthesis apparatus and speech synthesis apparatus |
JP2020013145A (ja) * | 2019-09-10 | 2020-01-23 | Casio Computer Co., Ltd. | Electronic musical instrument, control method for electronic musical instrument, and program |
Also Published As
Publication number | Publication date |
---|---|
JPWO2022208627A1 (ja) | 2022-10-06 |
CN117043846A (zh) | 2023-11-10 |
US20240021183A1 (en) | 2024-01-18 |
Legal Events
- 121: The EPO has been informed by WIPO that EP was designated in this application (ref document number 21934800; country of ref document: EP; kind code: A1).
- ENP: Entry into the national phase (ref document number 2023509935; country of ref document: JP; kind code: A).
- WWE: WIPO information: entry into national phase (ref document number 202180096124.2; country of ref document: CN).
- NENP: Non-entry into the national phase (ref country code: DE).
- 122: PCT application non-entry in European phase (ref document number 21934800; country of ref document: EP; kind code: A1).