WO2022054496A1 - Electronic musical instrument, control method for electronic musical instrument, and program - Google Patents
Electronic musical instrument, control method for electronic musical instrument, and program
- Publication number
- WO2022054496A1 (application PCT/JP2021/029833)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- performance
- data
- time
- playing
- pitch
- Prior art date
Links
- 238000000034 method Methods 0.000 title claims abstract description 101
- 230000008859 change Effects 0.000 claims abstract description 23
- 230000001755 vocal effect Effects 0.000 claims description 13
- 239000011295 pitch Substances 0.000 description 91
- 230000008569 process Effects 0.000 description 86
- 230000015572 biosynthetic process Effects 0.000 description 60
- 238000003786 synthesis reaction Methods 0.000 description 60
- 238000010586 diagram Methods 0.000 description 15
- 230000006870 function Effects 0.000 description 13
- 238000001308 synthesis method Methods 0.000 description 7
- 238000010801 machine learning Methods 0.000 description 6
- 230000003595 spectral effect Effects 0.000 description 6
- 238000000605 extraction Methods 0.000 description 5
- 230000007786 learning performance Effects 0.000 description 5
- 230000004044 response Effects 0.000 description 4
- 238000001228 spectrum Methods 0.000 description 3
- 238000013528 artificial neural network Methods 0.000 description 2
- 238000007664 blowing Methods 0.000 description 2
- 238000013135 deep learning Methods 0.000 description 2
- 230000007246 mechanism Effects 0.000 description 2
- 230000000630 rising effect Effects 0.000 description 2
- 238000013179 statistical model Methods 0.000 description 2
- 230000008901 benefit Effects 0.000 description 1
- 238000004364 calculation method Methods 0.000 description 1
- 230000003111 delayed effect Effects 0.000 description 1
- 230000005284 excitation Effects 0.000 description 1
- 239000000284 extract Substances 0.000 description 1
- 238000012804 iterative process Methods 0.000 description 1
- 230000000877 morphologic effect Effects 0.000 description 1
- 230000029058 respiratory gaseous exchange Effects 0.000 description 1
- 238000005070 sampling Methods 0.000 description 1
- 239000013589 supplement Substances 0.000 description 1
- 230000001360 synchronised effect Effects 0.000 description 1
- 230000002194 synthesizing effect Effects 0.000 description 1
- 210000001260 vocal cord Anatomy 0.000 description 1
Classifications
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10H—ELECTROPHONIC MUSICAL INSTRUMENTS; INSTRUMENTS IN WHICH THE TONES ARE GENERATED BY ELECTROMECHANICAL MEANS OR ELECTRONIC GENERATORS, OR IN WHICH THE TONES ARE SYNTHESISED FROM A DATA STORE
- G10H1/00—Details of electrophonic musical instruments
- G10H1/02—Means for controlling the tone frequencies, e.g. attack or decay; Means for producing special musical effects, e.g. vibratos or glissandos
- G10H1/04—Means for controlling the tone frequencies, e.g. attack or decay; Means for producing special musical effects, e.g. vibratos or glissandos by additional modulation
- G10H1/053—Means for controlling the tone frequencies, e.g. attack or decay; Means for producing special musical effects, e.g. vibratos or glissandos by additional modulation during execution only
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L13/00—Speech synthesis; Text to speech systems
- G10L13/02—Methods for producing synthetic speech; Speech synthesisers
- G10L13/033—Voice editing, e.g. manipulating the voice of the synthesiser
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10H—ELECTROPHONIC MUSICAL INSTRUMENTS; INSTRUMENTS IN WHICH THE TONES ARE GENERATED BY ELECTROMECHANICAL MEANS OR ELECTRONIC GENERATORS, OR IN WHICH THE TONES ARE SYNTHESISED FROM A DATA STORE
- G10H1/00—Details of electrophonic musical instruments
- G10H1/36—Accompaniment arrangements
- G10H1/361—Recording/reproducing of accompaniment for use with an external source, e.g. karaoke systems
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10H—ELECTROPHONIC MUSICAL INSTRUMENTS; INSTRUMENTS IN WHICH THE TONES ARE GENERATED BY ELECTROMECHANICAL MEANS OR ELECTRONIC GENERATORS, OR IN WHICH THE TONES ARE SYNTHESISED FROM A DATA STORE
- G10H1/00—Details of electrophonic musical instruments
- G10H1/36—Accompaniment arrangements
- G10H1/40—Rhythm
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L13/00—Speech synthesis; Text to speech systems
- G10L13/02—Methods for producing synthetic speech; Speech synthesisers
- G10L13/033—Voice editing, e.g. manipulating the voice of the synthesiser
- G10L13/0335—Pitch control
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10H—ELECTROPHONIC MUSICAL INSTRUMENTS; INSTRUMENTS IN WHICH THE TONES ARE GENERATED BY ELECTROMECHANICAL MEANS OR ELECTRONIC GENERATORS, OR IN WHICH THE TONES ARE SYNTHESISED FROM A DATA STORE
- G10H2210/00—Aspects or methods of musical processing having intrinsic musical character, i.e. involving musical theory or musical parameters or relying on musical knowledge, as applied in electrophonic musical tools or instruments
- G10H2210/375—Tempo or beat alterations; Music timing control
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10H—ELECTROPHONIC MUSICAL INSTRUMENTS; INSTRUMENTS IN WHICH THE TONES ARE GENERATED BY ELECTROMECHANICAL MEANS OR ELECTRONIC GENERATORS, OR IN WHICH THE TONES ARE SYNTHESISED FROM A DATA STORE
- G10H2250/00—Aspects of algorithms or signal processing methods without intrinsic musical character, yet specifically adapted for or used in electrophonic musical processing
- G10H2250/311—Neural networks for electrophonic musical instruments or musical processing, e.g. for musical recognition or control, automatic composition or improvisation
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10H—ELECTROPHONIC MUSICAL INSTRUMENTS; INSTRUMENTS IN WHICH THE TONES ARE GENERATED BY ELECTROMECHANICAL MEANS OR ELECTRONIC GENERATORS, OR IN WHICH THE TONES ARE SYNTHESISED FROM A DATA STORE
- G10H2250/00—Aspects of algorithms or signal processing methods without intrinsic musical character, yet specifically adapted for or used in electrophonic musical processing
- G10H2250/315—Sound category-dependent sound synthesis processes [Gensound] for musical use; Sound category-specific synthesis-controlling parameters or control means therefor
- G10H2250/455—Gensound singing voices, i.e. generation of human voices for musical applications, vocal singing sounds or intelligible words at a desired pitch or with desired vocal effects, e.g. by phoneme synthesis
Definitions
- the present invention relates to an electronic musical instrument that drives a learned acoustic model in response to an operation of an operator such as a keyboard and outputs a sound, a control method for the electronic musical instrument, and a program.
- In such instruments, an acoustic model that models the human vocal mechanism or the sounding mechanism of an acoustic instrument by digital signal processing is trained by machine learning on singing and playing behavior, and the trained acoustic model is driven based on actual performance operations to infer and output voice waveform data of singing voices or musical tones.
- Patent Document 1 Japanese Patent No. 6610714
- In actual singing and playing, the generated waveform often changes depending on the playing tempo, the singing style of the phrase, and the playing style.
- For example, the consonant part of a singing voice, the breath noise of a wind instrument, and the noise component produced when a bow first engages the strings of a bowed string instrument all become longer in a slow performance with few notes, giving the sound an expressive, vivid quality, whereas in a fast performance with many notes they are produced in a short time with a crisp sound.
- However, when a user plays in real time on a keyboard or the like, there has been no means of conveying to the sound source device the performance speed between notes, which changes with the note values of the score or with the playing phrase. The acoustic model therefore cannot infer a voice waveform appropriate to the change in playing speed between notes: the sound lacks expressiveness when the user plays slowly, or conversely, the rise of a voice waveform generated for fast-tempo playing is delayed, making the instrument difficult to play.
- An object of the present invention is to make it possible to infer an appropriate voice waveform that matches the change in playing speed between notes as it varies in real time.
- An electronic musical instrument according to one example of the embodiments includes: a pitch designation unit that outputs performance pitch data designated at the time of performance; a performance mode output unit that outputs performance mode data indicating the performance mode at the time of performance; and a sounding model unit that synthesizes and outputs a waveform based on the acoustic model parameters inferred by inputting the performance pitch data and the performance mode data into a trained acoustic model.
- The electronic musical instrument of another example of the embodiments includes a lyrics output unit that outputs performance lyrics data indicating the lyrics at the time of performance, and a pitch designation unit that outputs performance pitch data designated in accordance with the output of the lyrics at the time of performance.
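- As an illustration only (not code from the patent), the three kinds of data handed to the sounding model by these units can be pictured as a single record per key press; all names below are hypothetical, and the tempo formula is a simplified stand-in for the measurement described later.

```python
from dataclasses import dataclass

@dataclass
class PerformanceSingingData:
    """One key press worth of data passed to the trained acoustic model.

    Field names are illustrative, not taken from the patent.
    """
    lyric: str          # performance lyrics data, e.g. the syllable "ga"
    pitch: int          # performance pitch data, e.g. a MIDI note number
    play_tempo: float   # performance mode data: tempo derived from the
                        # interval between successive pitch designations

def on_key_press(lyric: str, midi_note: int, seconds_since_prev_key: float) -> PerformanceSingingData:
    # A shorter key-to-key interval yields a larger performance tempo value.
    play_tempo = 60.0 / max(seconds_since_prev_key, 1e-3)
    return PerformanceSingingData(lyric, midi_note, play_tempo)
```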
- FIG. 1 is a diagram showing an example of the external appearance of an embodiment of an electronic keyboard instrument.
- FIG. 2 is a block diagram showing a hardware configuration example of an embodiment of a control system for an electronic keyboard instrument.
- FIG. 3 is a block diagram showing a configuration example of a voice learning unit and a voice synthesis unit.
- FIG. 4A is an explanatory diagram showing an example of staff division that is the basis of singing.
- FIG. 4B is an explanatory diagram showing an example of staff division that is the basis of singing.
- FIG. 5A is a diagram showing a waveform change of a singing voice caused by a difference in playing tempo.
- FIG. 5B is a diagram showing a waveform change of a singing voice caused by a difference in playing tempo.
- FIG. 6 is a block diagram showing a configuration example of a lyrics output unit, a pitch designation unit, and a performance form output unit.
- FIG. 7 is a diagram showing an example of data configuration of the present embodiment.
- FIG. 8 is a main flowchart showing an example of control processing of an electronic musical instrument according to the present embodiment.
- FIG. 9A is a flowchart showing a detailed example of the initialization process.
- FIG. 9B is a flowchart showing a detailed example of the tempo change process.
- FIG. 9C is a flowchart showing a detailed example of the song start process.
- FIG. 10 is a flowchart showing a detailed example of switch processing.
- FIG. 11 is a flowchart showing a detailed example of keyboard processing.
- FIG. 12 is a flowchart showing a detailed example of the automatic performance interrupt processing.
- FIG. 13 is a flowchart showing a detailed example of the song reproduction process.
- FIG. 1 is a diagram showing an example of the external appearance of an embodiment 100 of an electronic keyboard instrument.
- The electronic keyboard instrument 100 includes: a keyboard 101 composed of a plurality of keys serving as operators; a first switch panel 102 for instructing various settings such as volume, the song playback tempo setting, the performance tempo mode setting, the performance tempo adjust setting, song playback start, and accompaniment playback described later; a second switch panel 103 for selecting a song or accompaniment, a tone color, and the like; and an LCD 104 (Liquid Crystal Display) that displays the lyrics during song playback described later, musical scores, and various setting information.
- The electronic keyboard instrument 100 is also provided with speakers that emit the musical sound generated by the performance, on its underside, side surfaces, back, or the like.
- FIG. 2 is a diagram showing a hardware configuration example of an embodiment of the control system 200 of the electronic keyboard instrument 100 of FIG. 1.
- The control system 200 includes a CPU (central processing unit) 201, a ROM (read-only memory) 202, a RAM (random access memory) 203, a sound source LSI (large-scale integrated circuit) 204, a voice synthesis LSI 205, a key scanner 206 to which the keyboard 101, the first switch panel 102, and the second switch panel 103 of FIG. 1 are connected, an LCD controller 208 to which the LCD 104 of FIG. 1 is connected, and a network interface 219 for exchanging MIDI data with an external network, each connected to a system bus 209.
- A timer 210 for controlling the sequence of automatic performance is connected to the CPU 201. The musical sound data 218 and the singing voice data 217 output from the sound source LSI 204 and the voice synthesis LSI 205, respectively, are converted into an analog musical sound output signal and an analog singing voice output signal by D/A converters 211 and 212. The two analog output signals are mixed by a mixer 213, and after the mixed signal is amplified by an amplifier 214, it is output from a speaker or an output terminal (not shown).
- the CPU 201 executes the control operation of the electronic keyboard instrument 100 of FIG. 1 by executing the control program loaded from the ROM 202 to the RAM 203 while using the RAM 203 as the work memory. Further, the ROM 202 (non-temporary recording medium) stores song data including lyrics data and accompaniment data in addition to the control program and various fixed data.
- The timer 210 used in this embodiment is built into the CPU 201 and counts, for example, the progress of the automatic performance in the electronic keyboard instrument 100.
- The sound source LSI 204 reads musical sound waveform data from, for example, a waveform ROM (not shown) according to the sound control data 216 from the CPU 201, and outputs musical sound data 218 to the D/A converter 211.
- the sound source LSI 204 has the ability to produce up to 256 voices at the same time.
- When text data of the lyrics (performance lyrics data), data designating each pitch corresponding to the lyrics (performance pitch data), and data on how to sing (performance mode data) are given to the voice synthesis LSI 205 as the performance singing voice data 215, the voice synthesis LSI 205 synthesizes the corresponding singing voice data 217 and outputs it to the D/A converter 212.
- The key scanner 206 constantly scans the key press/release state of the keyboard 101 of FIG. 1 and the switch operation states of the first switch panel 102 and the second switch panel 103, and notifies the CPU 201 of state changes by interrupts.
- the LCD controller 208 is an IC (integrated circuit) that controls the display state of the LCD 104.
- FIG. 3 is a block diagram showing a configuration example of the voice synthesis unit and the voice learning unit in the present embodiment.
- the voice synthesis unit 302 is built in the electronic keyboard instrument 100 as one function executed by the voice synthesis LSI 205 of FIG.
- The voice synthesis unit 302 synthesizes and outputs singing voice data 217 corresponding to the lyrics and pitch instructed by the CPU 201 via the key scanner 206 of FIG. 2, based on the keys pressed on the keyboard 101 of FIG. 1 during the automatic reproduction of lyrics (hereinafter referred to as "song reproduction") described later.
- The processor of the voice synthesis unit 302 receives the performance singing voice data 215, which includes the lyrics information generated by the CPU 201 in response to an operation on any of the plurality of keys (operators) of the keyboard 101, the pitch information associated with that key, and information about the singing style. This data is input to the performance singing voice analysis unit 307, and the performance language feature series 316 produced from it is input to the trained acoustic model held in the acoustic model unit 306. The voice synthesis unit 302 then outputs singing voice data 217 that infers the singing voice of a singer, based on the spectral information 318 and the sound source information 319 output by the acoustic model unit 306.
- The voice learning unit 301 may be implemented as a function executed by a server computer 300 located outside the electronic keyboard instrument 100 of FIG. 1.
- Alternatively, if the processing capacity of the voice synthesis LSI 205 of FIG. 2 is sufficient, the voice learning unit 301 may be built into the electronic keyboard instrument 100 as a function executed by the voice synthesis LSI 205.
- The voice learning unit 301 and the voice synthesis unit 302 of FIG. 3 are implemented based on, for example, the technique of "statistical speech synthesis based on deep learning" described in Non-Patent Document 1 below.
- Non-Patent Document 1: Kei Hashimoto and Shinji Takaki, "Statistical Speech Synthesis Based on Deep Learning," Journal of the Acoustical Society of Japan, Vol. 73, No. 1 (2017), pp. 55-62
- The voice learning unit 301 of FIG. 3, implemented for example as a function executed by the external server computer 300, includes a learning singing voice analysis unit 303, a learning acoustic feature extraction unit 304, and a model learning unit 305.
- As the learning singing voice audio data 312, for example, recordings of a singer singing a plurality of songs of an appropriate genre are used. In addition, learning singing voice data 311 is prepared, containing text data of the lyrics of each song (learning lyrics data), data designating each pitch corresponding to the lyrics (learning pitch data), and data indicating how the learning singing voice audio data 312 is sung (learning performance mode data). As the learning performance mode data, the time intervals at which the learning pitch data are successively designated are measured, and data indicating each measured time interval are designated.
- The learning singing voice data 311, including the learning lyrics data, the learning pitch data, and the learning performance mode data, is input to the learning singing voice analysis unit 303, which analyzes it. As a result, the learning singing voice analysis unit 303 estimates and outputs the learning language feature series 313, a discrete numerical series expressing the phonemes, pitches, and singing style corresponding to the learning singing voice data 311.
- The learning acoustic feature extraction unit 304 receives the learning singing voice audio data 312, which is recorded via a microphone or the like when a singer sings the lyrics corresponding to the learning singing voice data 311, and analyzes it.
- The learning acoustic feature extraction unit 304 then extracts the learning acoustic feature series 314 representing the voice features of the learning singing voice audio data 312, and outputs it as teacher data.
- In the following, the learning language feature series 313 is denoted l, the acoustic model is denoted λ, the learning acoustic feature series 314 is denoted o, the probability that the learning acoustic feature series 314 is generated is denoted P(o|l, λ), and the acoustic model that maximizes this probability is denoted λ̂.
- The model learning unit 305 estimates, by machine learning according to equation (1) below, the acoustic model λ̂ that maximizes the probability that the learning acoustic feature series 314 is generated from the learning language feature series 313 and the acoustic model. That is, the relationship between the language feature series, which represents text, and the acoustic feature series, which represents speech, is expressed by a statistical model called an acoustic model.
- The model learning unit 305 outputs learning result data 315 representing the acoustic model λ̂ calculated as a result of the machine learning shown in equation (1).
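- The image of equation (1) is not reproduced in this text. Based on the symbol definitions above and the standard formulation of statistical parametric speech synthesis (cf. Non-Patent Document 1), equation (1) presumably takes the following form:

```latex
\hat{\lambda} = \operatorname*{arg\,max}_{\lambda} \; P(o \mid l, \lambda) \tag{1}
```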
- The learning result data 315 may be stored in the ROM 202 of the control system of FIG. 2 when the electronic keyboard instrument 100 of FIG. 1 is shipped from the factory, and loaded from the ROM 202 of FIG. 2 into the acoustic model unit 306 (described later) in the voice synthesis LSI 205 when the power of the electronic keyboard instrument 100 is turned on.
- Alternatively, the learning result data 315 may be downloaded from a network such as the Internet, or via a USB (Universal Serial Bus) cable (not shown), by operating the second switch panel 103 of the electronic keyboard instrument 100.
- the trained acoustic model may be made into hardware by FPGA (Field-Programmable Gate Array) or the like, and this may be used as the acoustic model unit.
- the voice synthesis unit 302 which is a function executed by the voice synthesis LSI 205, includes a performance singing voice analysis unit 307, an acoustic model unit 306, and a vocalization model unit 308.
- The voice synthesis unit 302 performs statistical speech synthesis processing in which it sequentially synthesizes and outputs singing voice data 217 corresponding to the performance singing voice data 215 that is sequentially input during performance, by prediction using the acoustic model (a statistical model) set in the acoustic model unit 306.
- The performance singing voice data 215, containing the performance lyrics data (the phonemes corresponding to the lyrics text), the performance pitch data, and the performance mode data (singing-style data) designated by the CPU 201 of FIG. 2 as a result of the performer playing along with the automatic performance, is input to the performance singing voice analysis unit 307, which analyzes it. As a result, the performance singing voice analysis unit 307 outputs the performance language feature series 316 expressing the phonemes, parts of speech, words, pitches, and singing style corresponding to the performance singing voice data 215.
- the acoustic model unit 306 estimates and outputs the playing acoustic feature sequence 317, which is the corresponding acoustic model parameter.
- In the following, the performance language feature series 316 input from the performance singing voice analysis unit 307 is denoted l, the acoustic model set as the learning result data 315 by the machine learning in the model learning unit 305 is denoted λ̂, the performance acoustic feature series 317 is denoted o, the probability that the performance acoustic feature series 317 is generated is denoted P(o|l, λ̂), and the estimate of the performance acoustic feature series 317 (the acoustic model parameters) that maximizes this probability is denoted ô.
- According to equation (2) below, the acoustic model unit 306 estimates, from the performance language feature series 316 input from the performance singing voice analysis unit 307 and the acoustic model set as the learning result data 315 by the machine learning in the model learning unit 305, the estimate ô of the performance acoustic feature series 317 that maximizes the probability that the performance acoustic feature series 317 is generated.
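- As above, the equation image is missing from this text; from the symbol definitions just given, equation (2) is presumably the maximum-probability parameter generation step:

```latex
\hat{o} = \operatorname*{arg\,max}_{o} \; P(o \mid l, \hat{\lambda}) \tag{2}
```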
- the vocalization model unit 308 synthesizes and outputs the singing voice data 217 corresponding to the singing voice data 215 during performance specified by the CPU 201 by inputting the acoustic feature amount series 317 during performance.
- The singing voice data 217 is output from the D/A converter 212 of FIG. 2 via the mixer 213 and the amplifier 214, and is emitted from a speaker (not shown).
- the acoustic features represented by the learning acoustic feature series 314 and the playing acoustic feature series 317 include spectral information modeling the human vocal tract and sound source information modeling the human vocal cords.
- As the spectral information, for example, the mel-cepstrum or line spectral pairs (LSP) can be adopted.
- As the sound source information, the fundamental frequency (F0), which indicates the pitch of the human voice, and a power value can be adopted.
- the vocalization model unit 308 includes a sound source generation unit 309 and a synthetic filter unit 310.
- The sound source generation unit 309 models the human vocal cords. A series of sound source information 319 is sequentially input from the acoustic model unit 306, and from it the sound source generation unit 309 generates sound source signal data consisting of, for example, a pulse train that repeats at the fundamental frequency (F0) and power value contained in the sound source information 319 (for voiced phonemes), white noise with the power value contained in the sound source information 319 (for unvoiced phonemes), or a mixture of the two.
- The synthesis filter unit 310 models the human vocal tract. It forms a digital filter modeling the vocal tract based on the series of spectral information 318 sequentially input from the acoustic model unit 306, uses the sound source signal data input from the sound source generation unit 309 as the excitation source signal, and generates and outputs singing voice audio data 321 as digital signal data.
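- As a rough illustration of the source-filter structure just described (a sketch only, not the patent's implementation: the patent forms its filter from mel-cepstral or LSP spectral information, whereas an LPC-style all-pole filter stands in here for simplicity, and all parameter values are assumptions):

```python
import numpy as np
from scipy.signal import lfilter

def generate_excitation(f0: float, power: float, voiced: bool,
                        frame_len: int, sr: int = 16000) -> np.ndarray:
    """Sound source generation: pulse train for voiced frames, white noise otherwise."""
    if voiced and f0 > 0:
        exc = np.zeros(frame_len)
        period = int(sr / f0)                       # samples per pitch period
        exc[::period] = np.sqrt(power) * period ** 0.5
        return exc
    return np.sqrt(power) * np.random.randn(frame_len)

def synthesis_filter(excitation: np.ndarray, lpc_coeffs: np.ndarray) -> np.ndarray:
    """Vocal-tract model: an all-pole digital filter driven by the excitation signal."""
    a = np.concatenate(([1.0], lpc_coeffs))         # denominator of the all-pole filter
    return lfilter([1.0], a, excitation)

# Example: one 5 ms frame (80 samples at 16 kHz) of a voiced, roughly /a/-like sound.
frame = synthesis_filter(
    generate_excitation(f0=200.0, power=0.1, voiced=True, frame_len=80),
    lpc_coeffs=np.array([-1.2, 0.5]))
```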
- The sampling frequency of the learning singing voice audio data 312 and the singing voice data 217 is, for example, 16 kHz (kilohertz).
- The update frame period is, for example, 5 msec (milliseconds), the analysis window length is 25 msec, the window function is a Blackman window, and the analysis order is 24.
- As the acoustic model, a method using an HMM (Hidden Markov Model) or a method using a DNN (Deep Neural Network) can be adopted. Since specific embodiments of these are disclosed in Patent Document 1 described above, their detailed description is omitted in this application.
- In this way, an electronic keyboard instrument 100 is realized that outputs singing voice data 217 sounding as though a certain singer were actually singing.
- FIG. 4A and 4B are explanatory views showing an example of staff division which is the basis of singing.
- FIG. 4A shows an example of the musical score of a lyric melody in a fast passage, and FIG. 4B shows an example of the musical score of a lyric melody in a slow passage.
- In FIGS. 4A and 4B, the pattern of pitch change is similar, but FIG. 4A is a continuous sequence of sixteenth notes (each one quarter the length of a quarter note), whereas FIG. 4B is a continuous sequence of quarter notes. Therefore, in terms of the speed at which the pitch changes, the staff division in FIG. 4A is four times as fast as that in FIG. 4B.
- In a song with fast passages, the consonant parts of the singing voice must be shortened in order to sing (play) well. Conversely, in a song with slow passages, it is possible to sing (play) with high expressiveness by lengthening the consonant parts of the singing voice.
- the singing (performance) speed will differ due to the difference in the length of each note in the singing melody (quarter note, eighth note, sixteenth note, etc.).
- the playing speed will differ if the tempo at the time of playing changes.
- the time interval (pronunciation speed) between notes caused by the above two factors will be described as "performance tempo" in order to distinguish it from the tempo of a normal musical piece.
- FIGS. 5A and 5B are diagrams showing changes in the singing voice waveform caused by differences in performance tempo, as illustrated in FIGS. 4A and 4B.
- FIGS. 5A and 5B show examples of the singing voice waveform when the syllable /ga/ is pronounced.
- the voice of / ga / is a voice in which the consonant / g / and the vowel / a / are combined.
- the sound length (time length) of the consonant part is usually about several tens of milliseconds to 200 milliseconds.
- FIG. 5A shows an example of a singing voice waveform when sung in a fast passage
- FIG. 5B shows an example of a singing voice waveform when sung in a slow passage.
- the difference between the waveforms of FIGS. 5A and 5B is that the length of the consonant / g / portion is different.
- When sung in a fast passage, the pronunciation time of the consonant is short, as shown in FIG. 5A; conversely, when sung in a slow passage, the pronunciation time of the consonant part is long, as shown in FIG. 5B.
- In a fast passage, the consonant is not articulated distinctly and the speed at which pronunciation starts is prioritized, whereas in a slow passage the consonant is often pronounced clearly and at length to increase its intelligibility as a word.
- In the present embodiment, in the statistical voice synthesis processing comprising the voice learning unit 301 and the voice synthesis unit 302 illustrated in FIG. 3, learning performance mode data indicating how to sing is added to the learning lyrics data indicating the lyrics and the learning pitch data indicating the pitches in the learning singing voice data 311 input to the voice learning unit 301, and performance tempo information is included in this learning performance mode data.
- the learning singing voice analysis unit 303 in the voice learning unit 301 generates a learning language feature quantity series 313 by analyzing such learning singing voice data 311. Then, the model learning unit 305 in the voice learning unit 301 performs machine learning using the learning language feature quantity series 313.
- the model learning unit 305 can output the learned acoustic model including the performance tempo information as the learning result data 315 and store it in the acoustic model unit 306 in the speech synthesis unit 302 of the speech synthesis LSI 205.
- the learning performance mode data the time interval in which the learning pitch data is sequentially designated is sequentially measured, and each performance tempo data indicating the sequentially measured time interval is designated.
- the model learning unit 305 in the present embodiment can perform learning so as to derive a learned acoustic model in which the difference in performance tempo depending on the singing method is added.
- The performance singing voice data 215 includes performance lyrics data indicating the lyrics and performance pitch data indicating the pitch; performance mode data indicating how to sing is added to these, and performance tempo information can be included in this performance mode data.
- the playing singing voice analysis unit 307 in the speech synthesis unit 302 analyzes the playing singing voice data 215 to generate the playing language feature sequence 316.
- The acoustic model unit 306 in the voice synthesis unit 302 inputs the performance language feature series 316 into the trained acoustic model and outputs the corresponding spectral information 318 and sound source information 319, which are supplied to the synthesis filter unit 310 and the sound source generation unit 309 in the vocalization model unit 308, respectively.
- As a result, the vocalization model unit 308 can output singing voice data 217 that reflects changes in consonant length, as illustrated in FIGS. 5A and 5B, due to differences in performance tempo depending on the singing style. That is, it is possible to infer appropriate singing voice data 217 that matches the change in playing speed between notes as it varies in real time.
- FIG. 6 is a block diagram showing a configuration example of the lyrics output unit, the pitch designation unit, and the performance mode output unit realized by the CPU 201 of FIG. 2 as the control processing functions, illustrated in the flowcharts of FIGS. 8 to 11 described later, for generating the above-mentioned performance singing voice data 215.
- The lyrics output unit 601 includes, in each performance singing voice data 215 to be output to the voice synthesis LSI 205 of FIG. 2, performance lyrics data 609 indicating the lyrics at the time of performance, and outputs it. Specifically, the lyrics output unit 601 sequentially reads each timing data 605 in the song data 604 for song reproduction loaded in advance from the ROM 202 into the RAM 203 by the CPU 201 of FIG. 2, and, in accordance with the timing indicated by each timing data 605, sequentially reads each lyrics data (lyric text) 608 in the event data 606 stored in the song data 604 in combination with that timing data 605, using each as the performance lyrics data 609.
- The pitch designation unit 602 includes, in each performance singing voice data 215 output to the voice synthesis LSI 205 of FIG. 2, performance pitch data 610 indicating the pitch designated in accordance with the output of the lyrics at the time of performance, and outputs it. Specifically, the pitch designation unit 602 sequentially reads each timing data 605 in the song data 604 for song reproduction loaded in the RAM 203; if the performer presses a key on the keyboard 101 at the timing indicated by a timing data 605 and the pitch information of the pressed key is input via the key scanner 206, that pitch information is used as the performance pitch data 610. If the performer does not press any key on the keyboard 101 of FIG. 1 at the timing indicated by a timing data 605, the pitch designation unit 602 uses the pitch data 607 in the event data 606 stored in the song data 604 in combination with that timing data 605 as the performance pitch data 610.
- the performance form output unit 603 includes the performance form data 611 indicating the singing method, which is the performance form at the time of performance, in each performance singing voice data 215 output to the voice synthesis LSI 205 of FIG. 2 and outputs the data.
- Specifically, the performance mode output unit 603 sequentially measures the time intervals at which the performer designates pitches by pressing keys during performance, and uses each performance tempo data indicating those measured intervals as the performance mode data 611.
- When the performer has not set the performance tempo mode to the free mode (described later) on the first switch panel 102 of FIG. 1, the performance mode output unit 603 instead uses, as the performance mode data 611, performance tempo data corresponding to each time interval indicated by the timing data 605 sequentially read from the song data 604 for song reproduction loaded in the RAM 203.
- In addition, based on the value of the performance tempo adjust setting, the performance mode output unit 603 intentionally modifies the value of each performance tempo data sequentially obtained as described above, and uses each modified performance tempo data as the performance mode data 611.
- In this way, each time a key press event is generated by the performer's key press operation or by song reproduction, performance singing voice data 215 containing the performance lyrics data 609, the performance pitch data 610, and the performance mode data 611 is generated and can be issued to the voice synthesis unit 302 of the voice synthesis LSI 205 having the configuration of FIG. 2 or FIG. 3.
- FIG. 7 is a diagram showing a detailed data configuration example of song data read from ROM 202 to RAM 203 in FIG. 2 in the present embodiment.
- This data structure example conforms to the standard MIDI file format, which is one of the MIDI (Musical Instrument Digital Interface) file formats.
- This song data is composed of data blocks called chunks. Specifically, it consists of a header chunk at the beginning of the file, a track chunk 1 in which the lyrics data for the lyrics part is stored, and a track chunk 2 in which the performance data for the accompaniment part is stored.
- The header chunk consists of the values ChunkID, ChunkSize, FormatType, NumberOfTrack, and TimeDivision.
- ChunkID is a 4-byte ASCII code "4D 54 68 64" (numbers are hexadecimal) corresponding to four single-byte characters "MThd” indicating that it is a header chunk.
- ChunkSize is 4-byte data indicating the data length of the FormatType, NumberOfTrack, and TimeDivision parts of the header chunk, excluding the ChunkID and ChunkSize themselves; this data length is fixed at 6 bytes, "00000006" (numbers are hexadecimal).
- the Format Type is 2-byte data “00 01” (numbers are hexadecimal numbers), which means format 1 using a plurality of tracks.
- the NumberOfTrack is 2-byte data "00 02” (number is a hexadecimal number) indicating that two tracks corresponding to the lyrics part and the accompaniment part are used.
- the Time Division is data indicating a time base value indicating a resolution per quarter note, and in the case of the present embodiment, it is 2-byte data "01E0" (number is a hexadecimal number) indicating 480 in decimal notation.
- Track chunk 1 corresponds to the lyrics part and to the song data 604 of FIG. 6; it consists of a ChunkID, a ChunkSize, and performance data sets (0 ≤ i ≤ L-1) each consisting of DeltaTime_1[i], corresponding to the timing data 605 of FIG. 6, and Event_1[i], corresponding to the event data 606 of FIG. 6.
- Track chunk 2 corresponds to the accompaniment part; it consists of a ChunkID, a ChunkSize, and performance data sets (0 ≤ j ≤ M-1) each consisting of DeltaTime_2[j], the timing data of the accompaniment part, and Event_2[j], the event data of the accompaniment part.
- Each Chunk ID in track chunks 1 and 2 is a 4-byte ASCII code "4D 54 72 6B" (numbers are hexadecimal numbers) corresponding to four half-width characters "MTrk” indicating that they are track chunks.
- Each ChunkSize in the track chunks 1 and 2 is 4-byte data indicating the data length of the portion of each track chunk excluding the ChunkID and the ChunkSize.
- DeltaTime_1[i], the timing data 605 of FIG. 6, is variable-length data of 1 to 4 bytes indicating the waiting time (relative time) from the execution time of the immediately preceding event data 606, Event_1[i-1].
- DeltaTime_2[i], the timing data of the accompaniment part, is variable-length data of 1 to 4 bytes indicating the waiting time (relative time) from the execution time of the immediately preceding accompaniment event data, Event_2[i-1].
- Event_1 [i] which is the event data 606 of FIG. 6, is a meta-event having two pieces of information, the vocalized text of the lyrics and the pitch, in the track chunk 1 / lyrics part of this embodiment.
- the event_2 [i] which is the event data of the accompaniment part, is a MIDI event instructing the note-on or note-off of the accompaniment sound in the track chunk 2 / accompaniment part, or a meta event instructing the beat of the accompaniment sound.
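- For illustration, the 1-to-4-byte variable-length DeltaTime values described above follow the usual Standard MIDI File convention (7 data bits per byte, with the high bit set on every byte except the last); a minimal decoder sketch, not code from the patent:

```python
def read_variable_length(data: bytes, offset: int) -> tuple[int, int]:
    """Decode an SMF variable-length quantity starting at `offset`.

    Returns (value, new_offset). Each byte contributes 7 bits; a set
    high bit means another byte follows (at most 4 bytes in total).
    """
    value = 0
    for i in range(4):
        byte = data[offset + i]
        value = (value << 7) | (byte & 0x7F)
        if byte & 0x80 == 0:              # high bit clear: this is the last byte
            return value, offset + i + 1
    raise ValueError("variable-length quantity longer than 4 bytes")

# Example: the bytes 0x81 0x68 encode a waiting time of 232 ticks.
assert read_variable_length(bytes([0x81, 0x68]), 0) == (232, 2)
```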
- FIG. 8 is a main flowchart showing an example of control processing of an electronic musical instrument according to the present embodiment.
- This control process is, for example, an operation in which the CPU 201 of FIG. 2 executes a control process program loaded from the ROM 202 into the RAM 203.
- the CPU 201 first executes the initialization process (step S801), and then repeatedly executes a series of processes from steps S802 to S808.
- the CPU 201 first executes the switch process (step S802).
- In the switch process, the CPU 201 executes a process corresponding to a switch operation on the first switch panel 102 or the second switch panel 103 of FIG. 1, based on an interrupt from the key scanner 206 of FIG. 2. Details of the switch process will be described later with reference to the flowchart of FIG. 10.
- the CPU 201 executes a keyboard process for determining whether or not any key of the key 101 of FIG. 1 has been operated based on an interrupt from the key scanner 206 of FIG. 2 (step S803).
- In the keyboard process, the CPU 201 outputs musical sound control data 216 instructing the sound source LSI 204 of FIG. 2 to start or stop sounding in response to the performer pressing or releasing a key.
- In addition, in the keyboard process, the CPU 201 calculates, as performance tempo data, the time interval from the immediately preceding key press to the current key press. Details of the keyboard process will be described later with reference to the flowchart of FIG. 11.
- the CPU 201 processes the data to be displayed on the LCD 104 of FIG. 1, and executes the display process of displaying the data on the LCD 104 via the LCD controller 208 of FIG. 2 (step S804).
- the data displayed on the LCD 104 includes, for example, lyrics corresponding to the singing voice data 217 to be played, melody and accompaniment score corresponding to the lyrics, and various setting information.
- the CPU 201 executes the song playback process (step S805).
- In the song reproduction process, the CPU 201 generates performance singing voice data 215, containing lyrics, vocal pitch, and performance tempo, for operating the voice synthesis LSI 205 based on the song reproduction, and issues it to the voice synthesis LSI 205.
- Details of the song reproduction process will be described later with reference to the flowchart of FIG. 13.
- the CPU 201 executes sound source processing (step S806).
- the CPU 201 executes control processing such as envelope control of the musical tone being sounded in the sound source LSI 204.
- the CPU 201 executes the speech synthesis process (step S807).
- the CPU 201 controls the execution of speech synthesis by the speech synthesis LSI 205.
- the CPU 201 determines whether or not the performer has pressed a power-off switch (not shown) to power off (step S808). If the determination in step S808 is NO, the CPU 201 returns to the process of step S802. If the determination in step S808 is YES, the CPU 201 ends the control process shown in the flowchart of FIG. 8 and turns off the power of the electronic keyboard instrument 100.
- FIGS. 9A, 9B, and 9C are flowcharts showing detailed examples of the initialization process of step S801 of FIG. 8, the tempo change process of step S1002 of FIG. 10 (described later within the switch process of step S802 of FIG. 8), and the song start process of step S1006 of FIG. 10, respectively.
- the CPU 201 executes the TickTime initialization process.
- the progress of the lyrics and the automatic accompaniment proceed in units of time called TickTime.
- The timebase value specified as the TimeDivision value in the header chunk of the song data exemplified in FIG. 7 indicates the resolution of a quarter note; if this value is, for example, 480, a quarter note has a duration of 480 TickTime.
- the value of the waiting time DeltaTime_1 [i] and the value of DeltaTime_2 [i] in each track chunk of the song data exemplified in FIG. 7 are also counted by the time unit of TickTime.
- How many seconds one TickTime actually corresponds to depends on the tempo specified for the song data.
- If the tempo value is Tempo [beats/minute] and the timebase value is TimeDivision, the number of seconds per TickTime is calculated by the following equation (3).
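- The image of equation (3) is not reproduced here; from the definitions above (a quarter note lasts 60/Tempo seconds and corresponds to TimeDivision TickTime), equation (3) is presumably:

```latex
\text{TickTime}\,[\text{sec}] = \frac{60}{\text{Tempo} \times \text{TimeDivision}} \tag{3}
```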
- In the initialization process, the CPU 201 first calculates TickTime [seconds] by the arithmetic processing corresponding to the above equation (3) (step S901).
- As the tempo value Tempo, a predetermined value, for example 60 [beats/minute], is stored in the ROM 202 of FIG. 2 in the initial state.
- Alternatively, the tempo value in effect when the instrument was last powered off may be stored in non-volatile memory.
- the CPU 201 sets a timer interrupt by the TickTime [seconds] calculated in step S901 for the timer 210 in FIG. 2 (step S902).
- This timer interrupt generates an interrupt for song reproduction and automatic accompaniment (hereinafter referred to as the "automatic performance interrupt"). Therefore, in the automatic performance interrupt processing (FIG. 12, described later) executed by the CPU 201 based on this automatic performance interrupt, the control processing that advances the song reproduction and the automatic accompaniment is executed every TickTime.
- the CPU 201 executes other initialization processing such as initialization of the RAM 203 of FIG. 2 (step S903). After that, the CPU 201 ends the initialization process of step S801 of FIG. 8 exemplified by the flowchart of FIG. 9A.
- FIG. 10 is a flowchart showing a detailed example of the switch processing in step S802 of FIG.
- the CPU 201 determines whether or not the tempo of the lyrics progression and the automatic accompaniment has been changed by the tempo change switch in the first switch panel 102 of FIG. 1 (step S1001). If the determination is YES, the CPU 201 executes the tempo change process (step S1002). Details of this process will be described later with reference to FIG. 9B. If the determination in step S1001 is NO, the CPU 201 skips the process in step S1002.
- Next, the CPU 201 determines whether or not a song has been selected on the second switch panel 103 of FIG. 1 (step S1003). If the determination is YES, the CPU 201 executes the song reading process (step S1004). This process reads the song data having the data structure described with reference to FIG. 7 from the ROM 202 of FIG. 2 into the RAM 203. The song reading process may be performed while no performance is in progress, or before the start of a performance. From then on, data access to track chunk 1 or 2 of the data structure illustrated in FIG. 7 is performed on the song data read into the RAM 203. If the determination in step S1003 is NO, the CPU 201 skips the process of step S1004.
- Next, the CPU 201 determines whether or not the song start switch has been operated on the first switch panel 102 of FIG. 1 (step S1005). If the determination is YES, the CPU 201 executes the song start process (step S1006). Details of this process will be described later with reference to FIG. 9C. If the determination in step S1005 is NO, the CPU 201 skips the process of step S1006.
- the CPU 201 determines whether or not the free mode switch has been operated in the first switch panel 102 of FIG. 1 (step S1007). If the determination is YES, the CPU 201 executes a free mode set process for changing the value of the variable FreeMode on the RAM 203 (step S1008).
- The free mode switch operates as a toggle, and the value of the variable FreeMode is initially set to, for example, 1 in step S903 of FIG. 9A.
- If the free mode switch is pressed in that state, the value of the variable FreeMode becomes 0; if it is pressed again, the value becomes 1, and so on, alternating between 0 and 1 each time the switch is pressed. When the value of the variable FreeMode is 1, the free mode is set; when the value is 0, the free mode setting is canceled. If the determination in step S1007 is NO, the CPU 201 skips the process of step S1008.
- Next, the CPU 201 determines whether or not the performance tempo adjust switch has been operated on the first switch panel 102 of FIG. 1 (step S1009). If the determination is YES, the CPU 201 executes a setting process that changes the value of the variable ShinAdjust on the RAM 203 to the value specified with the numeric keys on the first switch panel 102 following the operation of the performance tempo adjust switch (step S1010). The value of the variable ShinAdjust is initially set to 0, for example, in step S903 of FIG. 9A. If the determination in step S1009 is NO, the CPU 201 skips the process of step S1010.
- Finally, the CPU 201 determines whether or not any other switch has been operated on the first switch panel 102 or the second switch panel 103 of FIG. 1, and executes a process corresponding to each such switch operation (step S1011). After that, the CPU 201 ends the switch process of step S802 of FIG. 8 exemplified by the flowchart of FIG. 10.
- FIG. 9B is a flowchart showing a detailed example of the tempo change process in step S1002 of FIG. As mentioned above, when the tempo value is changed, the TickTime [seconds] is also changed. In the flowchart of FIG. 9B, the CPU 201 executes a control process relating to the change of the TickTime [seconds].
- the CPU 201 calculates TickTime [seconds] by the arithmetic processing corresponding to the above-mentioned equation (3) in the same manner as in the case of step S901 of FIG. 9A executed in the initialization process of step S801 of FIG. (Step S911). It is assumed that the tempo value Tempo is stored in the RAM 203 or the like after being changed by the tempo change switch in the first switch panel 102 of FIG.
- Next, in the same manner as in step S902 of FIG. 9A executed in the initialization process of step S801 of FIG. 8, the CPU 201 sets a timer interrupt on the timer 210 of FIG. 2 based on the TickTime [seconds] calculated in step S911 (step S912). After that, the CPU 201 ends the tempo change process of step S1002 of FIG. 10 exemplified by the flowchart of FIG. 9B.
- FIG. 9C is a flowchart showing a detailed example of the song start process of step S1006 of FIG.
- First, the CPU 201 initializes to 0 both of the timing data variables DeltaT_1 (track chunk 1) and DeltaT_2 (track chunk 2) on the RAM 203, which count, in units of TickTime, the relative time from the occurrence time of the immediately preceding event.
- Next, the CPU 201 initializes the index variable AutoIndex_1 on the RAM 203, which specifies each of the performance data sets DeltaTime_1[i] and Event_1[i] (0 ≤ i ≤ L-1) in track chunk 1 of the song data exemplified in FIG. 7.
- the CPU 201 initially sets the value of the variable SongIndex on the RAM 203 that indicates the current song position to the Null value (step S922).
- the Null value is usually defined as 0, but since the index number may be 0, the Null value is defined as -1 in this embodiment.
- the CPU 201 determines whether or not the performer is set to reproduce the accompaniment in accordance with the reproduction of the lyrics by the first switch panel 102 of FIG. 1 (step S924).
- If the determination in step S924 is YES, the CPU 201 sets the value of the variable Bansou to 1 (with accompaniment) (step S925); if the determination in step S924 is NO, the CPU 201 sets the value of the variable Bansou to 0 (no accompaniment) (step S926). After the process of step S925 or S926, the CPU 201 ends the song start process of step S1006 of FIG. 10 exemplified by the flowchart of FIG. 9C.
- FIG. 11 is a flowchart showing a detailed example of the keyboard processing in step S803 of FIG.
- the CPU 201 determines whether or not any key on the keyboard 101 of FIG. 1 has been operated via the key scanner 206 of FIG. 2 (step S1101).
- If the determination in step S1101 is NO, the CPU 201 directly ends the keyboard process of step S803 of FIG. 8 exemplified by the flowchart of FIG. 11.
- If the determination in step S1101 is YES, the CPU 201 determines whether the key was pressed or released (step S1102).
- If the key was released, the CPU 201 instructs the voice synthesis LSI 205 to mute the utterance of the singing voice data 217 corresponding to the released pitch (or key number) (step S1113). In accordance with this instruction, the voice synthesis unit 302 of FIG. 3 within the voice synthesis LSI 205 stops uttering the corresponding singing voice data 217. After that, the CPU 201 ends the keyboard process of step S803 of FIG. 8 exemplified by the flowchart of FIG. 11.
- If the key was pressed, the CPU 201 determines the value of the variable FreeMode on the RAM 203 (step S1103).
- the value of this variable FreeMode is set in step S1008 of FIG. 10 described above.
- When the value of the variable FreeMode is 1, the free mode is set; when the value is 0, the free mode setting is canceled.
- If the value of the variable FreeMode is 0 (the free mode is not set), the CPU 201 sets, in the variable PlayTempo on the RAM 203, a value calculated from the timing data 605 DeltaTime_1[AutoIndex_1] on the RAM 203 and a predetermined coefficient, as described above for the performance mode output unit 603 of FIG. 6 and as exemplified by equation (4) below.
- The predetermined coefficient is, in this embodiment, the TimeDivision value of the song data × 60. That is, if the TimeDivision value is 480, PlayTempo is 60 (corresponding to a normal tempo of 60) when DeltaTime_1[AutoIndex_1] is 480, and PlayTempo is 120 (corresponding to a normal tempo of 120) when DeltaTime_1[AutoIndex_1] is 240.
- In this case, the performance tempo is therefore set in synchronization with the timing information of the song reproduction.
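- The image of equation (4) is not reproduced here; from the worked example just above (DeltaTime_1[AutoIndex_1] = 480 with TimeDivision = 480 gives PlayTempo = 60, and 240 gives 120), equation (4) is presumably:

```latex
\text{PlayTempo} = \frac{1}{\text{DeltaTime\_1}[\text{AutoIndex\_1}]} \times \text{TimeDivision} \times 60 \tag{4}
```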
- If the value of the variable FreeMode is 1 (the free mode is set), the CPU 201 further determines whether or not the value of the variable NoteOnTime on the RAM 203 is the Null value (step S1104).
- The value of the variable NoteOnTime is initially set to the Null value; after song reproduction starts, it is successively updated with the current time of the timer 210 of FIG. 2 in step S1110 described later.
- If the determination in step S1104 is YES (i.e., at the start of song reproduction), the performance tempo cannot yet be determined from the performer's key press operations. The CPU 201 therefore sets, in the variable PlayTempo on the RAM 203, the value calculated by the arithmetic processing exemplified by the above-mentioned equation (4) using the timing data 605 DeltaTime_1[AutoIndex_1] on the RAM 203 (step S1109). In this way, at the start of song reproduction, the performance tempo is provisionally set in synchronization with the timing information of the song reproduction.
- If the determination in step S1104 is NO (i.e., after song reproduction has started), the CPU 201 first sets, in the variable DeltaTime on the RAM 203, the difference time obtained by subtracting the value of the variable NoteOnTime on the RAM 203, which indicates the previous key press time, from the current time indicated by the timer 210 of FIG. 2 (step S1105).
- Next, the CPU 201 determines whether or not the value of the variable DeltaTime, which indicates the difference time from the previous key press to the current key press, is smaller than a predetermined maximum time within which key presses are regarded as simultaneous key presses of a chord (step S1106).
- If the determination in step S1106 is YES and the current key press is judged to be a simultaneous key press of a chord, the CPU 201 does not execute the process for determining the performance tempo and proceeds to step S1110, described later.
- step S1106 determines whether or not the current key press is a simultaneous key press by chord performance (chord). If the determination in step S1106 is NO and it is determined that the current key press is not a simultaneous key press by chord performance (chord), the CPU 201 further determines the difference time from the previous key press to the current key press. It is determined whether or not the value of the indicated variable DeltaTime is larger than the minimum time for which the performance is considered to be interrupted (step S1107).
- step S1107 If the determination in step S1107 is YES and it is determined that the key is pressed (beginning of the performance phrase) after the performance is interrupted for a while, the performance tempo of the performance phrase cannot be determined. Therefore, the CPU 201 uses the RAM 203.
- the value calculated by the arithmetic processing exemplified by the above-mentioned equation (4) using the above timing data 605 DeltaTime_1 [AutoIndex_1] is set in the variable PlayTempo on the RAM 203 (step S1109). In this way, when the key is pressed (at the beginning of the performance phrase) after the performance is interrupted for a while, the performance tempo is provisionally set in synchronization with the timing information of the song reproduction.
- If the determination in step S1107 is NO, that is, the current key press is neither a simultaneous key press of a chord nor a key press at the beginning of a performance phrase, the CPU 201 sets, in the variable PlayTempo on the RAM 203 indicating the performance tempo (corresponding to the performance form data 611 during performance in FIG. 6), the value obtained by multiplying the reciprocal of the variable DeltaTime, which indicates the elapsed time from the previous key press to the current key press, by a predetermined coefficient, as illustrated by equation (5) (step S1108).
- As a result of step S1108, when the value of the variable DeltaTime, which indicates the time difference between the previous key press and the current key press, is small, the value of PlayTempo becomes large (the performance tempo becomes fast), the performance phrase is regarded as a fast passage, and the voice synthesis unit 302 in the voice synthesis LSI 205 infers a voice waveform of the singing voice data 217 in which the time length of the consonant portion is short, as illustrated in FIG. 5A. Conversely, when the value of DeltaTime is large, the value of PlayTempo becomes small (the performance tempo becomes slow), the performance phrase is regarded as a slow passage, and a voice waveform of the singing voice data 217 in which the time length of the consonant portion is long is inferred, as illustrated in FIG. 5B.
- After the process of step S1108, after the process of step S1109, or when the determination in step S1106 is YES, the CPU 201 sets the current time indicated by the timer 210 of FIG. 2 in the variable NoteOnTime on the RAM 203, which holds the time of the previous key press (step S1110).
- Next, the CPU 201 sets, as the new value of the variable PlayTempo, the value obtained by adding, to the value of the variable PlayTempo indicating the performance tempo determined in step S1108 or S1109, the value of the variable ShinAdjust on the RAM 203, in which the performance tempo adjust value intentionally set by the performer was stored in step S1010 of FIG. 10 (step S1111).
- After that, the CPU 201 ends the keyboard processing of step S803 of FIG. 8 illustrated by the flowchart of FIG. 11. The tempo-determination flow of steps S1103 to S1111 is sketched below.
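- The following is a minimal sketch of the tempo-determination flow of steps S1103 to S1111, assuming plausible values for the chord and phrase-break thresholds and for the coefficient of equation (5), none of which are given numerically in the text; the class name and any identifiers other than those quoted from the description are illustrative. In this sketch the chord case simply keeps the previous tempo, which is one reasonable reading of "does not execute the process for determining the performance tempo".

```python
import time
from typing import Optional

# Illustrative thresholds: the text only names a "predetermined maximum time"
# for chord presses and a "minimum time" for a phrase break, without values.
CHORD_MAX_GAP = 0.05        # seconds: closer presses are treated as one chord
PHRASE_BREAK_MIN_GAP = 2.0  # seconds: longer gaps start a new performance phrase
TEMPO_COEFFICIENT = 60.0    # assumed coefficient for equation (5): a 1 s gap -> tempo 60


class TempoTracker:
    """Sketch of steps S1103 to S1111: derive PlayTempo from key-press intervals,
    falling back to the song's timing data (equation (4)) when no usable
    interval is available."""

    def __init__(self, time_division: int = 480, shiin_adjust: float = 0.0):
        self.note_on_time: Optional[float] = None  # NoteOnTime, initially Null
        self.time_division = time_division
        self.shiin_adjust = shiin_adjust           # performer's adjust value (ShinAdjust)
        self.play_tempo: Optional[float] = None    # PlayTempo

    def _tempo_from_song(self, delta_time_ticks: int) -> float:
        # Equation (4): provisional tempo from the song's timing data
        return (self.time_division * 60) / delta_time_ticks

    def on_key_press(self, delta_time_ticks: int, now: Optional[float] = None) -> float:
        now = time.monotonic() if now is None else now
        if self.note_on_time is None:                        # S1104: first key press
            tempo = self._tempo_from_song(delta_time_ticks)  # S1109
        else:
            gap = now - self.note_on_time                    # S1105: DeltaTime
            if gap < CHORD_MAX_GAP:                          # S1106: chord key press
                self.note_on_time = now                      # S1110; tempo is left unchanged
                return self.play_tempo
            if gap > PHRASE_BREAK_MIN_GAP:                   # S1107: start of a new phrase
                tempo = self._tempo_from_song(delta_time_ticks)  # S1109
            else:                                            # S1108: equation (5)
                tempo = TEMPO_COEFFICIENT / gap              # reciprocal of the gap x coefficient
        self.note_on_time = now                              # S1110
        self.play_tempo = tempo + self.shiin_adjust          # S1111: add the adjust value
        return self.play_tempo
```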
- By means of step S1111, the performer can intentionally adjust the time length of the consonant portions in the singing voice data 217 synthesized by the voice synthesis unit 302.
- The performer may want to adjust the singing style according to the music and personal taste. For example, in one song the performer may want to cut the sounds short and play crisply, shortening the consonants so that the voice sounds as if sung quickly; in another case the performer may want a voice in which the breath of the consonants can be clearly heard, as if sung slowly. In the present embodiment, the performer can therefore change the value of the variable ShinAdjust by, for example, operating the performance tempo adjust switch on the first switch panel 102 of FIG. 1, and the changed value is reflected in the variable PlayTempo in step S1111.
- Alternatively, the value of ShinAdjust may be controlled finely at any point in the music by operating, with the foot, a pedal that uses a variable resistor connected to the electronic keyboard instrument 100; one possible mapping is sketched below.
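- As one possible realization of such a pedal, the variable-resistor reading could be mapped linearly onto the ShinAdjust value. The ADC resolution, the centring of the pedal travel, and the adjustment range in the sketch below are assumptions for illustration only and are not specified in the text.

```python
def shiin_adjust_from_pedal(adc_value: int, adc_max: int = 1023,
                            adjust_range: float = 60.0) -> float:
    """Hypothetical mapping from a variable-resistor pedal reading to the
    ShinAdjust value: the centre of the pedal travel gives no adjustment, and
    the two extremes add or subtract up to adjust_range from PlayTempo.
    The ADC resolution and the range are assumptions, not taken from the text."""
    normalized = max(0, min(adc_max, adc_value)) / adc_max  # clamp to 0.0 .. 1.0
    return (normalized - 0.5) * 2.0 * adjust_range
```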
- The performance tempo value set in the variable PlayTempo by the above keyboard processing is set as a part of the performance singing voice data 215 in the song playback processing described later (see step S1305 of FIG. 13) and is issued to the voice synthesis LSI 205.
- The processing of steps S1103 to S1109 and step S1111 corresponds to the function of the performance form output unit 603 of FIG. 6.
- FIG. 12 is a flowchart showing a detailed example of the automatic performance interrupt process executed on the basis of the interrupt generated every TickTime [seconds] by the timer 210 of FIG. 2 (see step S902 of FIG. 9A or step S912 of FIG. 9B). The following processing is executed for the performance data sets of track chunks 1 and 2 of the song data illustrated in FIG. 7.
- First, the CPU 201 executes a series of processes corresponding to track chunk 1 (steps S1201 to S1206). The CPU 201 first determines whether or not the value of SongStart is 1 (see step S1006 of FIG. 10 and step S923 of FIG. 9C), that is, whether or not progression of the lyrics and accompaniment has been instructed (step S1201).
- When it is determined that progression of the lyrics and accompaniment has not been instructed (the determination in step S1201 is NO), the CPU 201 ends the automatic performance interrupt process illustrated by the flowchart of FIG. 12 without advancing the lyrics or the accompaniment.
- When it is determined that progression of the lyrics and accompaniment has been instructed (the determination in step S1201 is YES), the CPU 201 determines whether or not the value of the variable DeltaT_1 on the RAM 203, which indicates the relative time elapsed since the occurrence time of the previous event in track chunk 1, matches DeltaTime_1[AutoIndex_1] on the RAM 203, which is the timing data 605 (FIG. 6) indicating the waiting time of the performance data set to be executed next, indicated by the value of the variable AutoIndex_1 on the RAM 203 (step S1202).
- If the determination in step S1202 is NO, the CPU 201 increments the value of the variable DeltaT_1, which indicates the relative time since the previous event in track chunk 1, by 1, thereby advancing the time by one TickTime unit corresponding to the current interrupt (step S1203). After that, the CPU 201 proceeds to step S1207, described later.
- When the determination in step S1202 becomes YES, the CPU 201 stores the value of the variable AutoIndex_1, which indicates the position of the song event to be executed next in track chunk 1, in the variable SongIndex on the RAM 203 (step S1204).
- Next, the CPU 201 increments the value of the variable AutoIndex_1, which is used to refer to the performance data sets in track chunk 1, by 1 (step S1205).
- The CPU 201 then resets the value of the variable DeltaT_1, which indicates the relative time since the occurrence time of the song event referenced this time in track chunk 1, to 0 (step S1206). After that, the CPU 201 proceeds to the processing of step S1207.
- Next, the CPU 201 executes a series of processes corresponding to track chunk 2 (steps S1207 to S1213).
- First, the CPU 201 determines whether or not the value of the variable DeltaT_2 on the RAM 203, which indicates the relative time elapsed since the occurrence time of the previous event in track chunk 2, matches the timing data DeltaTime_2[AutoIndex_2] on the RAM 203 of the performance data set to be executed next, indicated by the value of the variable AutoIndex_2 on the RAM 203 (step S1207).
- If the determination in step S1207 is NO, the CPU 201 increments the value of the variable DeltaT_2, which indicates the relative time since the previous event in track chunk 2, by 1, thereby advancing the time by one TickTime unit corresponding to the current interrupt (step S1208). After that, the CPU 201 ends the automatic performance interrupt process illustrated by the flowchart of FIG. 12.
- If the determination in step S1207 is YES, the CPU 201 determines whether or not the value of the variable Bansou on the RAM 203, which instructs accompaniment playback, is 1 (with accompaniment) or 0 (without accompaniment) (step S1209) (see steps S924 to S926 of FIG. 9C).
- If the determination in step S1209 is YES, the CPU 201 executes the process indicated by the event data Event_2[AutoIndex_2] on the RAM 203 relating to the accompaniment of track chunk 2, indicated by the value of the variable AutoIndex_2 (step S1210). If the process indicated by the event data Event_2[AutoIndex_2] is, for example, a note-on event, an instruction to sound an accompaniment tone with the key number and velocity specified by that note-on event is issued to the sound source LSI 204 of FIG. 2.
- If the event data Event_2[AutoIndex_2] is, for example, a note-off event, an instruction to mute the accompaniment tone currently sounding, identified by the key number specified by that note-off event, is issued to the sound source LSI 204 of FIG. 2.
- If the determination in step S1209 is NO, the CPU 201 skips step S1210, so that the process indicated by the event data Event_2[AutoIndex_2] relating to the accompaniment is not executed; the CPU 201 proceeds to the next step S1211 and executes only the control processing for advancing the events in synchronization with the lyrics.
- After step S1210, or when the determination in step S1209 is NO, the CPU 201 increments the value of the variable AutoIndex_2, which is used to refer to the performance data sets of the accompaniment data in track chunk 2, by 1 (step S1211).
- Next, the CPU 201 resets the value of the variable DeltaT_2, which indicates the relative time since the occurrence time of the event data executed this time in track chunk 2, to 0 (step S1212).
- Then, the CPU 201 determines whether or not the value of the timing data DeltaTime_2[AutoIndex_2] on the RAM 203 of the performance data set to be executed next in track chunk 2, indicated by the value of the variable AutoIndex_2, is 0, that is, whether or not that event is to be executed at the same time as the current event (step S1213).
- If the determination in step S1213 is NO, the CPU 201 ends the current automatic performance interrupt process illustrated by the flowchart of FIG. 12.
- If the determination in step S1213 is YES, the CPU 201 returns to step S1209 and repeats the control processing for the event data Event_2[AutoIndex_2] on the RAM 203 of the performance data set to be executed next in track chunk 2, indicated by the value of the variable AutoIndex_2.
- The CPU 201 repeatedly executes the processes of steps S1209 to S1213 as many times as there are events to be executed simultaneously. This processing sequence is executed when a plurality of note-on events are to be sounded at the same timing, as in a chord; a simplified sketch of the per-tick processing for track chunk 2 follows.
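- The per-tick handling of track chunk 2 described in steps S1207 to S1213 can be sketched as follows. The event representation (a plain callable) and the container class are illustrative assumptions, since the description only specifies the DeltaTime_2/Event_2 pairs and the counter variables.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

# Each performance data set is a (wait_ticks, event) pair, mirroring the
# DeltaTime_2[i] / Event_2[i] pairs of track chunk 2. The event is modelled
# here as an opaque callable (e.g. "send this note-on to the sound source").
Event = Callable[[], None]


@dataclass
class TrackChunk:
    data: List[Tuple[int, Event]]  # (DeltaTime, Event) pairs
    auto_index: int = 0            # AutoIndex_2
    delta_t: int = 0               # DeltaT_2, ticks since the previous event


def tick_track2(track: TrackChunk, bansou: bool) -> None:
    """Sketch of steps S1207 to S1213 for one TickTime interrupt: either advance
    the wait counter by one tick, or fire the due event together with any
    events whose wait time is 0 (e.g. the remaining notes of a chord)."""
    if track.auto_index >= len(track.data):
        return
    wait_ticks, _ = track.data[track.auto_index]
    if track.delta_t != wait_ticks:        # S1207 is NO
        track.delta_t += 1                 # S1208: advance by one TickTime
        return
    while True:                            # S1209 to S1213
        _, event = track.data[track.auto_index]
        if bansou:                         # S1209: accompaniment enabled?
            event()                        # S1210: note-on, note-off, etc.
        track.auto_index += 1              # S1211
        track.delta_t = 0                  # S1212
        if track.auto_index >= len(track.data):
            break
        next_wait, _ = track.data[track.auto_index]
        if next_wait != 0:                 # S1213: no further simultaneous event
            break
```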
- FIG. 13 is a flowchart showing a detailed example of the song reproduction process of step S805 of FIG. 8.
- First, the CPU 201 determines whether or not a new value other than the Null value has been set in the variable SongIndex on the RAM 203 in step S1204 of the automatic performance interrupt process of FIG. 12, that is, whether or not the current time is a song playback timing (step S1301).
- The Null value is initially set in the variable SongIndex in step S922 of FIG. 9C described above. Each time the playback timing of the singing voice arrives, the determination in step S1202 of the automatic performance interrupt process of FIG. 12 becomes YES, and in step S1204 the valid value of the variable AutoIndex_1, which indicates the position of the song event to be executed next in track chunk 1, is set in the variable SongIndex; the song reproduction process illustrated by the flowchart of FIG. 13 then refers to this value.
- When the determination in step S1301 becomes YES, that is, when the current time is a song playback timing, the CPU 201 determines whether or not a new key press on the keyboard 101 of FIG. 1 by the performer has been detected by the keyboard processing of step S803 of FIG. 8 (step S1302).
- If the determination in step S1302 is YES, the CPU 201 sets the pitch specified by the performer's key press, as the vocal pitch, in a register or a variable on the RAM 203 (neither specifically shown) (step S1303).
- If the determination in step S1301 is YES, that is, the current time is a song playback timing, but the determination in step S1302 is NO, that is, no new key press is detected at the present time, the CPU 201 reads the pitch data (corresponding to the pitch data 607 in the event data 606 of FIG. 6) from the song event data Event_1[SongIndex] in track chunk 1 of the song data on the RAM 203, indicated by the variable SongIndex on the RAM 203, and sets this pitch data, as the vocal pitch, in a register or a variable on the RAM 203 (neither specifically shown) (step S1304).
- Next, the CPU 201 reads the lyrics character string (corresponding to the lyrics data 608 in the event data 606 of FIG. 6) from the song event Event_1[SongIndex] in track chunk 1 of the song data on the RAM 203, indicated by the variable SongIndex on the RAM 203. The CPU 201 then creates the performance singing voice data 215 in which the read lyrics character string (corresponding to the performance lyrics data 609 in FIG. 6), the vocal pitch acquired in step S1303 or S1304 (corresponding to the performance pitch data 610 in FIG. 6), and the performance tempo obtained in the variable PlayTempo on the RAM 203 in step S1111 of FIG. 11, corresponding to step S803 of FIG. 8 described above (corresponding to the performance form data 611 during performance in FIG. 6), are set, and stores it in a register or a variable on the RAM 203 (not specifically shown) (step S1305).
- Next, the CPU 201 issues the performance singing voice data 215 created in step S1305 to the voice synthesis unit 302 of FIG. 3 in the voice synthesis LSI 205 of FIG. 2 (step S1306).
- As a result, the voice synthesis LSI 205 infers, synthesizes, and outputs, in real time, singing voice data 217 in which the lyrics specified by the performance singing voice data 215 are sung appropriately, at the pitch specified by the performer's key press on the keyboard 101 or at the pitch automatically specified as the pitch data 607 (see FIG. 6) by the song playback, and at the performance tempo (singing style) specified by the performance singing voice data 215.
- Finally, the CPU 201 clears the value of the variable SongIndex to the Null value so that subsequent timings are not treated as song playback timings (step S1307). After that, the CPU 201 ends the song reproduction process of step S805 of FIG. 8 illustrated by the flowchart of FIG. 13.
- The processing of steps S1302 to S1304 described above corresponds to the function of the pitch designation unit 602 of FIG. 6.
- The process of step S1305 corresponds to the function of the lyrics output unit 601 of FIG. 6. A simplified sketch of how the performance singing voice data 215 is assembled follows.
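- The following is a minimal sketch of how the performance singing voice data 215 could be assembled in steps S1302 to S1305. The data-class layout and the function signature are illustrative, and the actual data 215 issued to the voice synthesis LSI 205 may carry additional fields.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PerformanceSingingVoiceData:
    """Illustrative container for the performance singing voice data 215:
    lyrics (609), vocal pitch (610), and performance tempo (611)."""
    lyrics: str
    pitch: int          # MIDI-style key number
    play_tempo: float


def build_singing_voice_data(song_event_lyrics: str,
                             song_event_pitch: int,
                             pressed_pitch: Optional[int],
                             play_tempo: float) -> PerformanceSingingVoiceData:
    """Sketch of steps S1302 to S1305: if the performer pressed a new key, its
    pitch becomes the vocal pitch; otherwise the pitch stored in the song event
    data is used. The result would then be issued to the voice synthesis LSI
    (step S1306)."""
    pitch = pressed_pitch if pressed_pitch is not None else song_event_pitch
    return PerformanceSingingVoiceData(song_event_lyrics, pitch, play_tempo)
```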
- As described above, according to the present embodiment, the pronunciation time of the consonant portions of the vocal voice can be made long, expressive, and vivid in a slow passage with few notes, while in a fast passage the sounds can be made short and crisp, so that a tone color change matching the performance phrase is obtained.
- The above-described embodiment is an embodiment of an electronic musical instrument that generates singing voice data, but as another embodiment, an electronic musical instrument that generates a wind instrument sound or a stringed instrument sound is also possible.
- In that case, an acoustic model unit corresponding to the acoustic model unit 306 of FIG. 3 stores a trained acoustic model whose acoustic model parameters have been machine-learned using, as teacher data, learning pitch data specifying pitches, learning acoustic data representing the sound of a given wind or string instrument sound source corresponding to those pitches, and learning performance form data indicating the performance form (for example, the performance tempo) of the learning acoustic data; the trained acoustic model infers acoustic model parameters corresponding to input pitch data and performance form data.
- The pitch designation unit (corresponding to the pitch designation unit 602 of FIG. 6) outputs performance pitch data indicating the pitch designated by the performer's performance operation during the performance.
- The performance form output unit (corresponding to the performance form output unit 603 of FIG. 6) outputs performance form data indicating the performance form during the performance, for example, the performance tempo.
- The sounding model unit (corresponding to the vocalization model unit 308 of FIG. 3) inputs the above performance pitch data and performance form data into the trained acoustic model stored in the acoustic model unit at the time of the performance, and synthesizes and outputs musical sound data in which the sound of the given sound source is inferred.
- As a result, depending on the performance form during the performance, musical sound data can be inferred and synthesized such that, for example, the blow sound at the start of blowing a wind instrument or the attack at the moment the bow begins to rub the string of a stringed instrument is shortened, making a crisp performance possible, or such that these sounds can be clearly heard, making a highly expressive performance possible.
- At the first key press of the song or the first key press of a performance phrase, the speed of the performance phrase cannot be estimated. In such cases, the strength (velocity) with which the key is struck may instead be used as a basis for the calculation, taking advantage of the tendency that when one sings or plays strongly the consonant or the rising portion of the sound is short, and when one sings or plays weakly the consonant or the rising portion of the sound tends to be long; a hypothetical mapping of this kind is sketched below.
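- A velocity-based fallback of this kind could look as follows. The linear mapping and the tempo range are assumptions chosen only to illustrate the tendency described above (strong playing treated like a fast passage, weak playing like a slow one); they are not taken from the text.

```python
def tempo_hint_from_velocity(velocity: int, min_tempo: float = 40.0,
                             max_tempo: float = 200.0) -> float:
    """Hypothetical fallback for the first key press of a phrase: a strong key
    strike (high MIDI velocity, 0-127) is treated like a fast passage (short
    consonants), a weak strike like a slow passage (long consonants). The
    linear mapping and the tempo range are illustrative assumptions only."""
    velocity = max(0, min(127, velocity))
    return min_tempo + (max_tempo - min_tempo) * (velocity / 127.0)
```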
- The speech synthesis method that can be adopted for the vocalization model unit 308 of FIG. 3 is not limited to the cepstrum speech synthesis method; various speech synthesis methods, including the LSP speech synthesis method, can be adopted.
- As the speech synthesis method, any method based on statistical speech synthesis processing using machine learning may be adopted, such as statistical speech synthesis processing using an HMM acoustic model, speech synthesis based on statistical speech synthesis processing using a DNN acoustic model, or an acoustic model combining an HMM and a DNN.
- In the above embodiment, the performance lyrics data 609 is given as part of the song data 604 stored in advance, but text data obtained by recognizing, in real time, the content sung by the performer may instead be given in real time as the lyrics information.
- (Appendix 1) An electronic musical instrument comprising: a pitch designation unit that outputs performance pitch data designated during a performance; a performance form output unit that outputs performance form data indicating a performance form during the performance; and a sounding model unit that, during the performance, synthesizes and outputs musical sound data corresponding to the performance pitch data and the performance form data, based on acoustic model parameters inferred by inputting the performance pitch data and the performance form data into a trained acoustic model.
- (Appendix 2) An electronic musical instrument comprising: a lyrics output unit that outputs performance lyrics data indicating lyrics during a performance; a pitch designation unit that outputs performance pitch data designated in accordance with the output of the lyrics during the performance; a performance form output unit that outputs performance form data indicating a performance form during the performance; and a vocalization model unit that, during the performance, synthesizes and outputs singing voice data corresponding to the performance lyrics data, the performance pitch data, and the performance form data, based on acoustic model parameters inferred by inputting the performance lyrics data, the performance pitch data, and the performance form data into a trained acoustic model.
- (Appendix 3) The electronic musical instrument according to Appendix 1 or 2, wherein the performance form output unit sequentially measures the time intervals at which the pitch is designated during the performance and sequentially outputs performance tempo data indicating the sequentially measured time intervals as the performance form data.
- (Appendix 4) The electronic musical instrument according to Appendix 3, wherein the performance form output unit includes changing means for allowing the performer to intentionally change the sequentially obtained performance tempo data.
- (Appendix 5) A control method for an electronic musical instrument, the method causing a processor of the electronic musical instrument to execute processing of: outputting performance pitch data designated during a performance; outputting performance form data indicating a performance form during the performance; and, during the performance, synthesizing and outputting musical sound data corresponding to the performance pitch data and the performance form data, based on acoustic model parameters inferred by inputting the performance pitch data and the performance form data into a trained acoustic model.
- (Appendix 6) A control method for an electronic musical instrument, the method causing a processor of the electronic musical instrument to execute processing of: outputting performance lyrics data indicating lyrics during a performance; outputting performance pitch data designated in accordance with the output of the lyrics during the performance; outputting performance form data indicating a performance form during the performance; and, during the performance, synthesizing and outputting singing voice data corresponding to the performance lyrics data, the performance pitch data, and the performance form data, based on acoustic model parameters inferred by inputting the performance lyrics data, the performance pitch data, and the performance form data into a trained acoustic model.
- (Appendix 7) A program for causing a processor of an electronic musical instrument to execute processing of: outputting performance pitch data designated during a performance; outputting performance form data indicating a performance form during the performance; and, during the performance, synthesizing and outputting musical sound data corresponding to the performance pitch data and the performance form data, based on acoustic model parameters inferred by inputting the performance pitch data and the performance form data into a trained acoustic model.
- (Appendix 8) A program for causing a processor of an electronic musical instrument to execute processing of: outputting performance lyrics data indicating lyrics during a performance; outputting performance pitch data designated in accordance with the output of the lyrics during the performance; outputting performance form data indicating a performance form during the performance; and, during the performance, synthesizing and outputting singing voice data corresponding to the performance lyrics data, the performance pitch data, and the performance form data, based on acoustic model parameters inferred by inputting the performance lyrics data, the performance pitch data, and the performance form data into a trained acoustic model.
Abstract
Description
Kei Hashimoto and Shinji Takaki, "Statistical speech synthesis based on deep learning," Journal of the Acoustical Society of Japan, Vol. 73, No. 1 (2017), pp. 55-62
(Appendix 1)
An electronic musical instrument comprising:
a pitch designation unit that outputs performance pitch data designated during a performance;
a performance form output unit that outputs performance form data indicating a performance form during the performance; and
a sounding model unit that, during the performance, synthesizes and outputs musical sound data corresponding to the performance pitch data and the performance form data, based on acoustic model parameters inferred by inputting the performance pitch data and the performance form data into a trained acoustic model.
(Appendix 2)
An electronic musical instrument comprising:
a lyrics output unit that outputs performance lyrics data indicating lyrics during a performance;
a pitch designation unit that outputs performance pitch data designated in accordance with the output of the lyrics during the performance;
a performance form output unit that outputs performance form data indicating a performance form during the performance; and
a vocalization model unit that, during the performance, synthesizes and outputs singing voice data corresponding to the performance lyrics data, the performance pitch data, and the performance form data, based on acoustic model parameters inferred by inputting the performance lyrics data, the performance pitch data, and the performance form data into a trained acoustic model.
(Appendix 3)
The electronic musical instrument according to Appendix 1 or 2, wherein the performance form output unit sequentially measures the time intervals at which the pitch is designated during the performance and sequentially outputs performance tempo data indicating the sequentially measured time intervals as the performance form data.
(Appendix 4)
The electronic musical instrument according to Appendix 3, wherein the performance form output unit includes changing means for allowing the performer to intentionally change the sequentially obtained performance tempo data.
(Appendix 5)
A control method for an electronic musical instrument, the method causing a processor of the electronic musical instrument to execute processing of:
outputting performance pitch data designated during a performance;
outputting performance form data indicating a performance form during the performance; and
during the performance, synthesizing and outputting musical sound data corresponding to the performance pitch data and the performance form data, based on acoustic model parameters inferred by inputting the performance pitch data and the performance form data into a trained acoustic model.
(Appendix 6)
A control method for an electronic musical instrument, the method causing a processor of the electronic musical instrument to execute processing of:
outputting performance lyrics data indicating lyrics during a performance;
outputting performance pitch data designated in accordance with the output of the lyrics during the performance;
outputting performance form data indicating a performance form during the performance; and
during the performance, synthesizing and outputting singing voice data corresponding to the performance lyrics data, the performance pitch data, and the performance form data, based on acoustic model parameters inferred by inputting the performance lyrics data, the performance pitch data, and the performance form data into a trained acoustic model.
(Appendix 7)
A program for causing a processor of an electronic musical instrument to execute processing of:
outputting performance pitch data designated during a performance;
outputting performance form data indicating a performance form during the performance; and
during the performance, synthesizing and outputting musical sound data corresponding to the performance pitch data and the performance form data, based on acoustic model parameters inferred by inputting the performance pitch data and the performance form data into a trained acoustic model.
(Appendix 8)
A program for causing a processor of an electronic musical instrument to execute processing of:
outputting performance lyrics data indicating lyrics during a performance;
outputting performance pitch data designated in accordance with the output of the lyrics during the performance;
outputting performance form data indicating a performance form during the performance; and
during the performance, synthesizing and outputting singing voice data corresponding to the performance lyrics data, the performance pitch data, and the performance form data, based on acoustic model parameters inferred by inputting the performance lyrics data, the performance pitch data, and the performance form data into a trained acoustic model.
101 keyboard
102 first switch panel
103 second switch panel
104 LCD
200 control system
201 CPU
202 ROM
203 RAM
204 sound source LSI
205 voice synthesis LSI
206 key scanner
208 LCD controller
209 system bus
210 timer
211, 212 D/A converter
213 mixer
214 amplifier
215 singing voice data
216 sound generation control data
217 singing voice audio data
218 musical sound data
219 network interface
300 server computer
301 voice learning unit
302 voice synthesis unit
303 learning singing voice analysis unit
304 learning acoustic feature extraction unit
305 model learning unit
306 acoustic model unit
307 performance singing voice analysis unit
308 vocalization model unit
309 sound source generation unit
310 synthesis filter unit
311 learning singing voice data
312 learning singing voice audio data
313 learning linguistic feature sequence
314 learning acoustic feature sequence
315 learning result data
316 performance linguistic information sequence
317 performance acoustic feature sequence
318 spectral information
319 sound source information
601 lyrics output unit
602 pitch designation unit
603 performance form output unit
604 song data
605 timing data
606 event data
607 pitch data
608 lyrics data
609 performance lyrics data
610 performance pitch data
611 performance form data during performance
Claims (8)
- An electronic musical instrument comprising: a pitch designation unit that outputs performance pitch data designated during a performance; a performance form output unit that outputs performance form data indicating a performance form during the performance; and a sounding model unit that, during the performance, synthesizes and outputs musical sound data corresponding to the performance pitch data and the performance form data, based on acoustic model parameters inferred by inputting the performance pitch data and the performance form data into a trained acoustic model.
- An electronic musical instrument comprising: a lyrics output unit that outputs performance lyrics data indicating lyrics during a performance; a pitch designation unit that outputs performance pitch data designated in accordance with the output of the lyrics during the performance; a performance form output unit that outputs performance form data indicating a performance form during the performance; and a vocalization model unit that, during the performance, synthesizes and outputs singing voice data corresponding to the performance lyrics data, the performance pitch data, and the performance form data, based on acoustic model parameters inferred by inputting the performance lyrics data, the performance pitch data, and the performance form data into a trained acoustic model.
- The electronic musical instrument according to claim 1 or 2, wherein the performance form output unit sequentially measures the time intervals at which a pitch is designated during the performance and sequentially outputs performance tempo data indicating the sequentially measured time intervals as the performance form data.
- The electronic musical instrument according to claim 3, wherein the performance form output unit includes changing means for allowing the performer to change the sequentially obtained performance tempo data.
- A control method for an electronic musical instrument, the method causing a processor of the electronic musical instrument to execute processing of: outputting performance pitch data designated during a performance; outputting performance form data indicating a performance form during the performance; and, during the performance, synthesizing and outputting musical sound data corresponding to the performance pitch data and the performance form data, based on acoustic model parameters inferred by inputting the performance pitch data and the performance form data into a trained acoustic model.
- A control method for an electronic musical instrument, the method causing a processor of the electronic musical instrument to execute processing of: outputting performance lyrics data indicating lyrics during a performance; outputting performance pitch data designated in accordance with the output of the lyrics during the performance; outputting performance form data indicating a performance form during the performance; and, during the performance, synthesizing and outputting singing voice data corresponding to the performance lyrics data, the performance pitch data, and the performance form data, based on acoustic model parameters inferred by inputting the performance lyrics data, the performance pitch data, and the performance form data into a trained acoustic model.
- A program for causing a processor of an electronic musical instrument to execute processing of: outputting performance pitch data designated during a performance; outputting performance form data indicating a performance form during the performance; and, during the performance, synthesizing and outputting musical sound data corresponding to the performance pitch data and the performance form data, based on acoustic model parameters inferred by inputting the performance pitch data and the performance form data into a trained acoustic model.
- A program for causing a processor of an electronic musical instrument to execute processing of: outputting performance lyrics data indicating lyrics during a performance; outputting performance pitch data designated in accordance with the output of the lyrics during the performance; outputting performance form data indicating a performance form during the performance; and, during the performance, synthesizing and outputting singing voice data corresponding to the performance lyrics data, the performance pitch data, and the performance form data, based on acoustic model parameters inferred by inputting the performance lyrics data, the performance pitch data, and the performance form data into a trained acoustic model.
Priority Applications (3)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US18/044,922 US20240021180A1 (en) | 2020-09-11 | 2021-08-13 | Electronic musical instrument, electronic musical instrument control method, and program |
EP21866456.3A EP4213143A4 (en) | 2020-09-11 | 2021-08-13 | ELECTRONIC MUSICAL INSTRUMENT, ELECTRONIC MUSICAL INSTRUMENT CONTROL METHOD, AND PROGRAM |
CN202180062213.5A CN116057624A (zh) | 2020-09-11 | 2021-08-13 | 电子乐器、电子乐器控制方法和程序 |
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
JP2020-152926 | 2020-09-11 | ||
JP2020152926A JP7276292B2 (ja) | 2020-09-11 | 2020-09-11 | 電子楽器、電子楽器の制御方法、及びプログラム |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2022054496A1 true WO2022054496A1 (ja) | 2022-03-17 |
Family
ID=80632199
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/JP2021/029833 WO2022054496A1 (ja) | 2020-09-11 | 2021-08-13 | 電子楽器、電子楽器の制御方法、及びプログラム |
Country Status (5)
Country | Link |
---|---|
US (1) | US20240021180A1 (ja) |
EP (1) | EP4213143A4 (ja) |
JP (2) | JP7276292B2 (ja) |
CN (1) | CN116057624A (ja) |
WO (1) | WO2022054496A1 (ja) |
Families Citing this family (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP7143816B2 (ja) * | 2019-05-23 | 2022-09-29 | カシオ計算機株式会社 | 電子楽器、電子楽器の制御方法、及びプログラム |
Citations (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20050262989A1 (en) * | 2004-05-28 | 2005-12-01 | Electronic Learning Products, Inc. | Computer-aided learning system employing a pitch tracking line |
JP2015075574A (ja) * | 2013-10-08 | 2015-04-20 | ヤマハ株式会社 | 演奏データ生成装置および演奏データ生成方法を実現するためのプログラム |
JP2017107228A (ja) * | 2017-02-20 | 2017-06-15 | 株式会社テクノスピーチ | 歌声合成装置および歌声合成方法 |
WO2018016581A1 (ja) * | 2016-07-22 | 2018-01-25 | ヤマハ株式会社 | 楽曲データ処理方法およびプログラム |
JP2019184935A (ja) * | 2018-04-16 | 2019-10-24 | カシオ計算機株式会社 | 電子楽器、電子楽器の制御方法、及びプログラム |
JP6610714B1 (ja) | 2018-06-21 | 2019-11-27 | カシオ計算機株式会社 | 電子楽器、電子楽器の制御方法、及びプログラム |
JP2020152926A (ja) | 2020-06-29 | 2020-09-24 | 王子ホールディングス株式会社 | 繊維状セルロース及び繊維状セルロースの製造方法 |
Family Cites Families (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP3823930B2 (ja) * | 2003-03-03 | 2006-09-20 | ヤマハ株式会社 | 歌唱合成装置、歌唱合成プログラム |
JP6747489B2 (ja) * | 2018-11-06 | 2020-08-26 | ヤマハ株式会社 | 情報処理方法、情報処理システムおよびプログラム |
-
2020
- 2020-09-11 JP JP2020152926A patent/JP7276292B2/ja active Active
-
2021
- 2021-08-13 EP EP21866456.3A patent/EP4213143A4/en active Pending
- 2021-08-13 CN CN202180062213.5A patent/CN116057624A/zh active Pending
- 2021-08-13 US US18/044,922 patent/US20240021180A1/en active Pending
- 2021-08-13 WO PCT/JP2021/029833 patent/WO2022054496A1/ja active Application Filing
-
2023
- 2023-04-28 JP JP2023073896A patent/JP2023100776A/ja active Pending
Patent Citations (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20050262989A1 (en) * | 2004-05-28 | 2005-12-01 | Electronic Learning Products, Inc. | Computer-aided learning system employing a pitch tracking line |
JP2015075574A (ja) * | 2013-10-08 | 2015-04-20 | ヤマハ株式会社 | 演奏データ生成装置および演奏データ生成方法を実現するためのプログラム |
WO2018016581A1 (ja) * | 2016-07-22 | 2018-01-25 | ヤマハ株式会社 | 楽曲データ処理方法およびプログラム |
JP2017107228A (ja) * | 2017-02-20 | 2017-06-15 | 株式会社テクノスピーチ | 歌声合成装置および歌声合成方法 |
JP2019184935A (ja) * | 2018-04-16 | 2019-10-24 | カシオ計算機株式会社 | 電子楽器、電子楽器の制御方法、及びプログラム |
JP6610714B1 (ja) | 2018-06-21 | 2019-11-27 | カシオ計算機株式会社 | 電子楽器、電子楽器の制御方法、及びプログラム |
JP2019219568A (ja) * | 2018-06-21 | 2019-12-26 | カシオ計算機株式会社 | 電子楽器、電子楽器の制御方法、及びプログラム |
JP2020152926A (ja) | 2020-06-29 | 2020-09-24 | 王子ホールディングス株式会社 | 繊維状セルロース及び繊維状セルロースの製造方法 |
Non-Patent Citations (1)
Title |
---|
KEI HASHIMOTOSHINJI TAKAKI: "Statistical parametric speech synthesis based on deep learning", JOURNAL OF THE ACOUSTICAL SOCIETY OF JAPAN, vol. 73, no. 1, 2017, pages 55 - 62 |
Also Published As
Publication number | Publication date |
---|---|
JP2022047167A (ja) | 2022-03-24 |
EP4213143A1 (en) | 2023-07-19 |
CN116057624A (zh) | 2023-05-02 |
EP4213143A4 (en) | 2024-10-23 |
JP7276292B2 (ja) | 2023-05-18 |
JP2023100776A (ja) | 2023-07-19 |
US20240021180A1 (en) | 2024-01-18 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
JP6547878B1 (ja) | 電子楽器、電子楽器の制御方法、及びプログラム | |
JP6610714B1 (ja) | 電子楽器、電子楽器の制御方法、及びプログラム | |
JP6610715B1 (ja) | 電子楽器、電子楽器の制御方法、及びプログラム | |
JP6587008B1 (ja) | 電子楽器、電子楽器の制御方法、及びプログラム | |
US11417312B2 (en) | Keyboard instrument and method performed by computer of keyboard instrument | |
JP6835182B2 (ja) | 電子楽器、電子楽器の制御方法、及びプログラム | |
JP6766935B2 (ja) | 電子楽器、電子楽器の制御方法、及びプログラム | |
JP2023100776A (ja) | 電子楽器、電子楽器の制御方法、及びプログラム | |
JP6760457B2 (ja) | 電子楽器、電子楽器の制御方法、及びプログラム | |
JP6801766B2 (ja) | 電子楽器、電子楽器の制御方法、及びプログラム | |
JP6819732B2 (ja) | 電子楽器、電子楽器の制御方法、及びプログラム | |
JP2022038903A (ja) | 電子楽器、電子楽器の制御方法、及びプログラム |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
121 | Ep: the epo has been informed by wipo that ep was designated in this application |
Ref document number: 21866456 Country of ref document: EP Kind code of ref document: A1 |
|
WWE | Wipo information: entry into national phase |
Ref document number: 202317015268 Country of ref document: IN |
|
WWE | Wipo information: entry into national phase |
Ref document number: 18044922 Country of ref document: US |
|
NENP | Non-entry into the national phase |
Ref country code: DE |
|
ENP | Entry into the national phase |
Ref document number: 2021866456 Country of ref document: EP Effective date: 20230411 |