WO2022202415A1 - 機械学習モデルを用いた信号処理方法、信号処理装置および音生成方法 - Google Patents
機械学習モデルを用いた信号処理方法、信号処理装置および音生成方法 Download PDFInfo
- Publication number
- WO2022202415A1 (PCT/JP2022/011067)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- degree
- acoustic feature
- control value
- sequence
- signal processing
- Prior art date
Images
Classifications
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10H—ELECTROPHONIC MUSICAL INSTRUMENTS; INSTRUMENTS IN WHICH THE TONES ARE GENERATED BY ELECTROMECHANICAL MEANS OR ELECTRONIC GENERATORS, OR IN WHICH THE TONES ARE SYNTHESISED FROM A DATA STORE
- G10H1/00—Details of electrophonic musical instruments
- G10H1/02—Means for controlling the tone frequencies, e.g. attack or decay; Means for producing special musical effects, e.g. vibratos or glissandos
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10H—ELECTROPHONIC MUSICAL INSTRUMENTS; INSTRUMENTS IN WHICH THE TONES ARE GENERATED BY ELECTROMECHANICAL MEANS OR ELECTRONIC GENERATORS, OR IN WHICH THE TONES ARE SYNTHESISED FROM A DATA STORE
- G10H1/00—Details of electrophonic musical instruments
- G10H1/0008—Associated control or indicating means
- G10H1/0025—Automatic or semi-automatic music composition, e.g. producing random music, applying rules from music theory or modifying a musical piece
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10H—ELECTROPHONIC MUSICAL INSTRUMENTS; INSTRUMENTS IN WHICH THE TONES ARE GENERATED BY ELECTROMECHANICAL MEANS OR ELECTRONIC GENERATORS, OR IN WHICH THE TONES ARE SYNTHESISED FROM A DATA STORE
- G10H1/00—Details of electrophonic musical instruments
- G10H1/46—Volume control
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L13/00—Speech synthesis; Text to speech systems
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10H—ELECTROPHONIC MUSICAL INSTRUMENTS; INSTRUMENTS IN WHICH THE TONES ARE GENERATED BY ELECTROMECHANICAL MEANS OR ELECTRONIC GENERATORS, OR IN WHICH THE TONES ARE SYNTHESISED FROM A DATA STORE
- G10H2210/00—Aspects or methods of musical processing having intrinsic musical character, i.e. involving musical theory or musical parameters or relying on musical knowledge, as applied in electrophonic musical tools or instruments
- G10H2210/101—Music Composition or musical creation; Tools or processes therefor
- G10H2210/111—Automatic composing, i.e. using predefined musical rules
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10H—ELECTROPHONIC MUSICAL INSTRUMENTS; INSTRUMENTS IN WHICH THE TONES ARE GENERATED BY ELECTROMECHANICAL MEANS OR ELECTRONIC GENERATORS, OR IN WHICH THE TONES ARE SYNTHESISED FROM A DATA STORE
- G10H2220/00—Input/output interfacing specifically adapted for electrophonic musical tools or instruments
- G10H2220/155—User input interfaces for electrophonic musical instruments
- G10H2220/161—User input interfaces for electrophonic musical instruments with 2D or x/y surface coordinates sensing
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10H—ELECTROPHONIC MUSICAL INSTRUMENTS; INSTRUMENTS IN WHICH THE TONES ARE GENERATED BY ELECTROMECHANICAL MEANS OR ELECTRONIC GENERATORS, OR IN WHICH THE TONES ARE SYNTHESISED FROM A DATA STORE
- G10H2220/00—Input/output interfacing specifically adapted for electrophonic musical tools or instruments
- G10H2220/155—User input interfaces for electrophonic musical instruments
- G10H2220/315—User input interfaces for electrophonic musical instruments for joystick-like proportional control of musical input; Videogame input devices used for musical input or control, e.g. gamepad, joysticks
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10H—ELECTROPHONIC MUSICAL INSTRUMENTS; INSTRUMENTS IN WHICH THE TONES ARE GENERATED BY ELECTROMECHANICAL MEANS OR ELECTRONIC GENERATORS, OR IN WHICH THE TONES ARE SYNTHESISED FROM A DATA STORE
- G10H2250/00—Aspects of algorithms or signal processing methods without intrinsic musical character, yet specifically adapted for or used in electrophonic musical processing
- G10H2250/311—Neural networks for electrophonic musical instruments or musical processing, e.g. for musical recognition or control, automatic composition or improvisation
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10H—ELECTROPHONIC MUSICAL INSTRUMENTS; INSTRUMENTS IN WHICH THE TONES ARE GENERATED BY ELECTROMECHANICAL MEANS OR ELECTRONIC GENERATORS, OR IN WHICH THE TONES ARE SYNTHESISED FROM A DATA STORE
- G10H2250/00—Aspects of algorithms or signal processing methods without intrinsic musical character, yet specifically adapted for or used in electrophonic musical processing
- G10H2250/315—Sound category-dependent sound synthesis processes [Gensound] for musical use; Sound category-specific synthesis-controlling parameters or control means therefor
- G10H2250/455—Gensound singing voices, i.e. generation of human voices for musical applications, vocal singing sounds or intelligible words at a desired pitch or with desired vocal effects, e.g. by phoneme synthesis
Definitions
- the present invention relates to a signal processing method, a signal processing device, and a sound generation method capable of generating sound.
- An AI (artificial intelligence) singer is known as a sound source for singing in a specific singer's singing style. By learning the characteristics of a specific singer's singing, the AI singer imitates the singer and generates arbitrary sound signals. Here, it is preferable that the AI singer can generate a sound signal reflecting not only the singing characteristics of the learned singer, but also the user's instructions on how to sing. Jesse Engel, Lamtharn Hantrakul, Chenjie Gu and Adam Roberts, "DDSP: Differentiable Digital Signal Processing", arXiv:2001.04643v1 [cs.LG] 14 Jan 2020
- Non-Patent Document 1 describes a neural synthesis model that generates a sound signal based on a user's input sound.
- the synthesis model allows the user to indicate pitch or volume to the synthesis model during synthesis.
- the user needs to specify the pitch or volume in detail.
- giving detailed instructions is troublesome for the user.
- An object of the present invention is to provide a signal processing method, a signal processing device, and a sound generation method capable of generating a high-quality sound signal without requiring the user to do troublesome work.
- A signal processing method according to one aspect of the present invention receives a control value indicating a musical characteristic, receives a selection signal for selecting either a first degree of enforcement or a second degree of enforcement lower than the first degree of enforcement, and uses a trained model to generate, according to the selection signal, either an acoustic feature sequence reflecting the control value according to the first degree of enforcement or an acoustic feature sequence reflecting the control value according to the second degree of enforcement. The method is implemented by a computer.
- A signal processing apparatus according to another aspect includes a receiving unit that receives a control value indicating a musical characteristic and a selection signal for selecting either a first degree of enforcement or a second degree of enforcement lower than the first degree of enforcement, and a sound generation unit that uses a trained model to generate, according to the selection signal, either an acoustic feature sequence reflecting the control value according to the first degree of enforcement or an acoustic feature sequence reflecting the control value according to the second degree of enforcement.
- A sound generation method according to yet another aspect is performed in a system that generates the sound of a piece of music corresponding to a given note sequence. The method receives from a user an instruction of a control value indicating a musical characteristic. When the instruction is received at a first degree of enforcement, a trained model is used to generate a sound that reflects the instruction from the user according to the first degree of enforcement; when the instruction is received at a second degree of enforcement, the trained model is used to generate a sound that reflects the instruction from the user to a lesser degree than the first degree of enforcement.
- According to the present invention, a high-quality sound signal can be generated without troublesome work by the user.
- FIG. 1 is a block diagram showing the configuration of a processing system including a signal processing device according to one embodiment of the present invention.
- FIG. 2 is a block diagram showing the configuration of the signal processing device.
- FIG. 3 is a diagram showing an example of a GUI displayed on the display unit.
- FIG. 4 is a block diagram showing the configuration of the training device.
- FIG. 5 is a diagram for explaining the operation of the training device.
- FIG. 6 is a diagram for explaining the operation of the training device.
- FIG. 7 is a diagram for explaining the operation of the training device.
- FIG. 8 is a diagram for explaining the operation of the training device.
- FIG. 9 is a flowchart showing an example of signal processing by the signal processing device of FIG. 2.
- FIG. 10 is a flowchart showing an example of training processing by the training device of FIG. 4.
- FIG. 11 is a schematic diagram showing a processing system in the first modified example.
- FIG. 12 is a schematic diagram showing a processing system in the second modified example.
- FIG. 1 is a block diagram showing the configuration of a processing system including a signal processing device according to one embodiment of the present invention.
- As shown in FIG. 1, the processing system 100 includes a RAM (random access memory) 110, a ROM (read only memory) 120, a CPU (central processing unit) 130, a storage unit 140, an operation unit 150, and a display unit 160.
- the processing system 100 is implemented by a computer such as a PC, tablet terminal, or smart phone. Alternatively, the processing system 100 may be realized by cooperative operation of a plurality of computers connected by a communication channel such as Ethernet.
- The RAM 110, ROM 120, CPU 130, storage unit 140, operation unit 150, and display unit 160 are connected to a bus 170.
- The RAM 110, ROM 120, and CPU 130 constitute the signal processing device 10 and the training device 20.
- the signal processing device 10 and the training device 20 are configured by the common processing system 100, but may be configured by separate processing systems.
- the RAM 110 consists of, for example, a volatile memory, and is used as a work area for the CPU 130.
- the ROM 120 is, for example, a non-volatile memory and stores a signal processing program and a training program.
- The CPU 130 performs signal processing by executing, on the RAM 110, the signal processing program stored in the ROM 120. Likewise, the CPU 130 performs training processing by executing, on the RAM 110, the training program stored in the ROM 120. Details of the signal processing and the training processing will be described later.
- the signal processing program or training program may be stored in the storage unit 140 instead of the ROM 120.
- the signal processing program or training program may be provided in a form stored in a computer-readable storage medium and installed in ROM 120 or storage unit 140 .
- a signal processing program distributed from a server (including a cloud server) on the network may be installed in the ROM 120 or the storage unit 140.
- the storage unit 140 includes a storage medium such as a hard disk, an optical disk, a magnetic disk, or a memory card.
- the storage unit 140 stores an untrained generative model m, a trained model M, a plurality of musical score data D1, a plurality of reference musical score data D2, and a plurality of reference data D3.
- Each piece of musical score data D1 represents a musical score that includes a time series (note string) of a plurality of notes arranged on a time axis as a musical score feature amount string.
- the trained model M includes, for example, a DNN (deep neural network).
- the trained model M is a generative model that receives the musical score feature quantity string of the musical score data D1 and generates an acoustic feature quantity string that reflects the musical score feature quantity string.
- the acoustic feature quantity sequence is a time series of feature quantities representing acoustic features such as pitch, volume, frequency spectrum, and the like.
- When the trained model M further receives a control value indicating a musical feature, it generates an acoustic feature sequence reflecting both the musical score feature sequence and the control value.
- the control value is a feature quantity such as volume instructed by the user.
- In the present embodiment, the first acoustic feature sequence generated by the trained model M is a time series of frequency spectra, and the control value is generated from a second acoustic feature sequence representing a time series of volume.
- The trained model M may instead generate a first acoustic feature sequence representing other acoustic features, and the control value may be generated from a second acoustic feature sequence representing other acoustic features.
- the first acoustic feature amount and the second acoustic feature amount may be the same feature amount.
- the trained model M may be trained to generate a sequence of acoustic features representing detailed pitch changes from a sequence of control values representing rough pitch changes.
- The signal processing device 10 uses the trained model M to generate, in accordance with a selection signal that selects the degree to which the control value is reflected in the generated acoustic feature sequence, an acoustic feature sequence in which the control value is reflected at the selected one of a plurality of degrees of enforcement.
- the trained model M may include an autoregressive DNN. This trained model M generates an acoustic feature value sequence corresponding to real-time changes in the control value and the degree of coercion.
- Each piece of reference musical score data D2 indicates a musical score including a time series of multiple notes arranged on the time axis.
- a musical score feature value string input to the trained model M is generated from each piece of reference musical score data D2.
- Each reference data D3 is waveform data representing a time series of samples of a performance sound waveform obtained by playing the time series of the note.
- the plurality of reference musical score data D2 and the plurality of reference data D3 correspond to each other.
- the reference musical score data D2 and the corresponding reference data D3 are used for building the trained model M by the training device 20.
- From the waveform of each piece of reference data D3, a frequency spectrum time series is extracted as a first reference acoustic feature sequence, and a volume time series is extracted as a second reference acoustic feature sequence.
- a time series of control values indicating musical features is obtained as a reference control value sequence from the second reference acoustic feature value sequence.
- a plurality of reference control value sequences having different finenesses are generated from the second reference acoustic feature value sequence corresponding to a plurality of forcing degrees.
- the degree of definition indicates the frequency of changes in the feature amount over time, and the higher the degree of definition, the more frequently the value of the feature amount changes. Also, high definition corresponds to high enforcement, and low definition corresponds to low enforcement.
- The trained model M is constructed by having the generative model m learn the input/output relationship between, as inputs, the reference musical score feature sequence and the reference control value sequence at each degree of enforcement and, as output, the corresponding first reference acoustic feature sequence.
- The untrained generative model m, the trained model M, the musical score data D1, the reference musical score data D2, the reference data D3, and the like need not be stored in the storage unit 140; they may instead be stored in a computer-readable storage medium or on a server (including a cloud server) on a network.
- the operation unit 150 includes a pointing device such as a mouse or a keyboard, and is operated by the user to instruct control values.
- the display unit 160 includes, for example, a liquid crystal display, and displays a predetermined GUI (Graphical User Interface) or the like. Operation unit 150 and display unit 160 may be configured by a touch panel display.
- The display unit 160 may display an image of a simulated performer, such as an AI singer, performing the musical score data D1. Furthermore, the display mode of the performer and an emphasis effect indicating excitement displayed on the display unit 160 may be changed in accordance with changes in the performance based on the user's operation.
- FIG. 2 is a block diagram showing the configuration of the signal processing device 10.
- FIG. 3 is a diagram showing an example of a GUI displayed on the display unit 160.
- As shown in FIG. 2, the signal processing device 10 includes a reception unit 11, a signal generation unit 12, and a sound generation unit 13.
- Functions of the reception unit 11, the signal generation unit 12, and the sound generation unit 13 are realized by the CPU 130 in FIG. 1 executing a signal processing program.
- At least part of the reception unit 11, the signal generation unit 12, and the sound generation unit 13 may be realized by hardware such as an electronic circuit.
- the reception unit 11 causes the display unit 160 to display the GUI 30 operated by the user, as shown in FIG.
- the GUI 30 displays an instruction bar 31 extending in one direction and a slider 32 movable on the instruction bar 31 .
- the position of the slider 32 on the indication bar 31 corresponds to control values indicating musical characteristics.
- the user instructs a control value according to the position of the slider 32 by operating the operation unit 150 in FIG. 1 to move the slider 32 on the instruction bar 31 .
- the accepting unit 11 accepts the control value indicated through the GUI 30 from the operating unit 150 .
- the user can select any one of the first, second and third forcing degrees as the forcing degree for signal processing.
- Check boxes 33a, 33b, and 33c corresponding to the three forcing degrees are further displayed.
- the user can select the degree of enforcement by operating the operation unit 150 and checking the check boxes 33a to 33c corresponding to the desired degree of enforcement.
- the first degree of enforcement is higher than the second degree of enforcement
- the second degree of enforcement is higher than the third degree of enforcement.
- When the first degree of enforcement is selected, the acoustic feature sequence generated by the trained model M is relatively strongly constrained by the control value and changes over time following changes in the control value relatively tightly.
- When the second degree of enforcement is selected, the generated acoustic feature sequence is relatively weakly constrained by the control value and changes over time following changes in the control value relatively loosely. For example, if the third degree of enforcement is zero, the generated acoustic feature sequence changes regardless of the control value.
- a check box for selecting the degree of enforcement is displayed on the GUI 30, but the embodiment is not limited to this.
- the GUI 30 may display a pull-down menu or the like for selecting the degree of enforcement instead of the check box.
- the signal generation unit 12 generates a selection signal indicating the degree of forcing selected by the user on the operation unit 150 through the GUI 30 .
- the degree of enforcement may be automatically selected without being selected by the user.
- In that case, for example, the signal generation unit 12 analyzes the musical score data D1, detects portions where the dynamics change suddenly (portions marked with dynamic symbols such as forte or piano), selects a high degree of enforcement in those portions, and selects a low degree of enforcement in the other portions. The signal generation unit 12 then generates, at each time point t, a selection signal indicating the degree of enforcement automatically selected based on the musical score data D1 and supplies it to the sound generation unit 13. In this case, the check boxes 33a to 33c need not be displayed on the GUI 30.
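- As an illustration of this automatic selection, the following is a minimal sketch in Python. The event format, the 6 dB threshold, and the function name are assumptions for illustration; the patent does not specify how the dynamics analysis is implemented.

```python
def auto_select_degree(score_events, dyn_jump_db=6.0):
    """Sketch: select a high degree of enforcement around sudden dynamic
    changes in the score (e.g. a forte right after a piano) and a low
    degree elsewhere. Each event is assumed to be (time_sec, dynamic_db).
    """
    degrees = []
    prev_dyn = None
    for time_sec, dyn in score_events:
        if prev_dyn is not None and abs(dyn - prev_dyn) >= dyn_jump_db:
            degrees.append((time_sec, 1))   # first (high) degree of enforcement
        else:
            degrees.append((time_sec, 2))   # second (lower) degree of enforcement
        prev_dyn = dyn
    return degrees
```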
- the user operates the operation unit 150 to designate the musical score data D1 to be used for signal processing from among the plurality of musical score data D1 stored in the storage unit 140 or the like.
- the sound generation unit 13 acquires the trained model M stored in the storage unit 140 or the like and the musical score data D1 specified by the user.
- the sound generator 13 functions as a signal receiver that receives the selection signal from the signal generator 12 .
- the sound generation unit 13 also functions as a vector generation unit that generates a control vector composed of a plurality of elements according to the degree of forcing indicated by the selection signal from the control value. Details of the control vector will be described later.
- the sound generation unit 13 generates a musical score feature amount from the acquired musical score data D1 at each time point t, and processes the control value from the reception unit 11 according to the degree of forcing indicated by the received selection signal.
- the musical score features and the processed control values are supplied to the trained model M.
- the trained model M generates, at each point in time t, an acoustic feature value string that reflects the control value according to the degree of forcing indicated by the selection signal and that corresponds to the musical score data D1.
- a sound signal is generated by a known sound signal generating device such as a vocoder (not shown) based on the acoustic feature amount at each time point t.
- the generated sound signal is supplied to a playback device (not shown) such as a speaker and converted into sound.
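- As a stand-in for such a sound signal generating device, a magnitude-spectrogram sequence could be converted to a waveform with Griffin-Lim phase reconstruction; this is only an illustrative sketch, and the sample rate, hop length, and file name are assumptions rather than values from the embodiment.

```python
import numpy as np
import librosa
import soundfile as sf

def spectra_to_audio(mag_spec_frames, sr=24000, hop=120):
    """mag_spec_frames: magnitude spectrogram of shape (n_bins, n_frames),
    assumed to be the acoustic feature sequence produced by the trained
    model M. Griffin-Lim is used here only as a simple substitute for
    the vocoder mentioned in the text.
    """
    wav = librosa.griffinlim(np.asarray(mag_spec_frames),
                             n_iter=32, hop_length=hop)
    sf.write("output.wav", wav, sr)  # playback-device stand-in
    return wav
```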
- FIG. 4 is a block diagram showing the configuration of the training device 20. FIGS. 5 to 8 are diagrams for explaining the operation of the training device 20.
- As shown in FIG. 4, the training device 20 includes an extraction unit 21, an acquisition unit 22, and a construction unit 23.
- the functions of the extraction unit 21, the acquisition unit 22, and the construction unit 23 are realized when the CPU 130 in FIG. 1 executes a training program.
- At least part of the extraction unit 21, the acquisition unit 22, and the construction unit 23 may be realized by hardware such as an electronic circuit.
- The extraction unit 21 extracts the first reference acoustic feature sequence and the second reference acoustic feature sequence from the sound waveform of each piece of reference data D3 stored in the storage unit 140 or the like.
- An example of a sound waveform in the reference data D3 is shown in the upper part of FIG.
- the lower part of FIG. 5 shows the second reference acoustic feature quantity sequence extracted from the reference data D3 representing the sound waveform.
- the feature amount (volume in this example) in the second reference acoustic feature amount sequence changes temporally with high definition.
- The acquisition unit 22 generates a plurality of reference control value sequences corresponding to the plurality of degrees of enforcement by lowering the definition of each second reference acoustic feature sequence from the extraction unit 21 according to the respective degrees of enforcement.
- a high definition corresponds to a high degree of enforcement.
- As shown in FIG. 6, the acquisition unit 22 extracts, for each time point t, a representative value of the second reference acoustic feature sequence within a predetermined period T including that time point t. The interval between two adjacent time points t is, for example, 5 milliseconds, and each time point t is positioned at the center of the corresponding period T.
- the representative value at each time point t is the maximum value of the second reference acoustic feature quantity sequence within the corresponding period T, but the embodiment is not limited to this.
- the representative value at each time point t may be a statistical value such as the mean value, median value, mode value, variance or standard deviation of the second reference acoustic feature value sequence within the corresponding period T.
- a high degree of coercion therefore corresponds to a short period of time T. For example, let the length of the period T corresponding to the first higher degree of enforcement be 1 second, and the length of the period T corresponding to the second lower degree of enforcement be 3 seconds.
- The acquisition unit 22 arranges the representative values extracted at the plurality of time points t from the second reference acoustic feature sequence in chronological order for each degree of enforcement, thereby generating a reference control value sequence with the definition corresponding to that degree of enforcement.
- the upper part of FIG. 7 shows a reference control value string (first reference control value string) corresponding to the first degree of enforcement.
- the lower part of FIG. 7 shows a reference control value string (second reference control value string) corresponding to the second forcing degree.
- the feature amount in the reference control value sequence corresponding to the low forcing degree changes over time with low definition.
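- A minimal sketch of this windowed extraction is shown below, assuming the volume sequence is sampled every 5 milliseconds and the maximum is used as the representative value; the function name and parameter defaults are illustrative only.

```python
import numpy as np

def reference_control_sequence(volume_seq, period_sec, hop_sec=0.005):
    """Sketch: at every time point t, take the maximum of the volume
    sequence inside a window of length `period_sec` centred on t.
    A short window (e.g. 1 s) yields a high-definition sequence for a
    high degree of enforcement; a long window (e.g. 3 s) yields a
    low-definition sequence for a low degree of enforcement.
    """
    volume_seq = np.asarray(volume_seq)
    half = int(period_sec / hop_sec) // 2
    out = np.empty_like(volume_seq)
    for t in range(len(volume_seq)):
        lo, hi = max(0, t - half), min(len(volume_seq), t + half + 1)
        out[t] = volume_seq[lo:hi].max()
    return out

# e.g. ref_first = reference_control_sequence(volume, period_sec=1.0)
#      ref_second = reference_control_sequence(volume, period_sec=3.0)
```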
- Each reference control value sequence is converted into a reference control vector sequence (FIG. 8), and each vector in the reference control vector sequence contains five elements.
- the first and second of the five elements correspond to the first degree of enforcement
- the third and fourth elements correspond to the second degree of enforcement
- the fifth element corresponds to the third degree of enforcement.
- In the reference control vector sequence at the first degree of enforcement, shown in the upper part of FIG. 8, the control value (feature amount) is reflected in the first and second elements: the larger the feature amount, the smaller the first element and the larger the second element (upper right of FIG. 8).
- the sum of the first and second elements is 1, and the third to fifth elements that do not correspond to the first forcing degree are set to zero.
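- The encoding just described could be sketched as follows; the normalisation bounds (vmin, vmax) and the function name are assumptions, and only the pair of elements corresponding to the selected degree (or the fifth element) is populated, with the others left at zero.

```python
import numpy as np

def make_control_vector(control_value, degree, vmin=-60.0, vmax=0.0):
    """Sketch: encode a control value (e.g. volume in dB) into a
    five-element control vector for the selected degree of enforcement.
    Elements 0-1 -> first degree, 2-3 -> second degree, 4 -> third degree.
    Within the selected pair, a larger value gives a smaller first element
    and a larger second element, with the pair summing to 1 (cf. FIG. 8).
    """
    v = np.clip((control_value - vmin) / (vmax - vmin), 0.0, 1.0)
    vec = np.zeros(5)
    if degree == 1:        # first (highest) degree of enforcement
        vec[0], vec[1] = 1.0 - v, v
    elif degree == 2:      # second degree of enforcement
        vec[2], vec[3] = 1.0 - v, v
    else:                  # third degree: the control value is not reflected
        vec[4] = 1.0
    return vec

# e.g. make_control_vector(-10.0, degree=1) -> [0.167, 0.833, 0.0, 0.0, 0.0]
```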
- the construction unit 23 prepares a generative model m (untrained or pre-trained) composed of DNN.
- The construction unit 23 uses machine learning to train the generative model m based on the first reference acoustic feature sequence from the extraction unit 21 together with the corresponding reference control value sequences from the acquisition unit 22 and the corresponding reference musical score feature sequence. As a result, a trained model M is constructed that has learned the input/output relationship between, as inputs, the reference musical score feature sequence and the reference control value sequences corresponding to the plurality of degrees of enforcement and, as output, the first reference acoustic feature sequence.
- the input/output relationship includes a first input/output relationship, a second input/output relationship and a third input/output relationship.
- a first input/output relationship is a relationship between a first reference control vector including first and second elements representing musical features at a first forcing degree and a first reference acoustic feature quantity sequence.
- the second input/output relationship is the relationship between the second reference control vector including the third and fourth elements representing the musical features at the second forcing degree and the first reference acoustic feature quantity sequence.
- the third input/output relationship is the relationship between the third reference control vector including the fifth element representing the musical feature at the third forcing degree and the first reference acoustic feature quantity sequence.
- the construction unit 23 stores the constructed trained model M in the storage unit 140 or the like.
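- A minimal sketch of this construction step is given below, assuming a PyTorch model whose forward pass takes per-frame score features and control vectors; the loss function, optimiser, and hyperparameters are illustrative choices, not values specified in the embodiment.

```python
import torch
import torch.nn.functional as F

def train_generative_model(model, dataset, epochs=10, lr=1e-4):
    """Sketch: the generative model m learns the mapping
    (score feature sequence, reference control vector sequence) ->
    first reference acoustic feature sequence (frequency spectra).
    `dataset` is assumed to yield aligned tensors per training example.
    """
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    for _ in range(epochs):
        for score_feats, ctrl_vecs, target_spectra in dataset:
            pred_spectra = model(score_feats, ctrl_vecs)
            loss = F.l1_loss(pred_spectra, target_spectra)
            opt.zero_grad()
            loss.backward()
            opt.step()
    return model  # the trained model M
```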
- FIG. 9 is a flowchart showing an example of signal processing by the signal processing device 10 of FIG. 2.
- the signal processing in FIG. 9 is performed by CPU 130 in FIG. 1 executing a signal processing program stored in storage unit 140 or the like.
- the CPU 130 determines whether or not the musical score data D1 has been selected by the user (step S1). If the musical score data D1 is not selected, the CPU 130 waits until the musical score data D1 is selected.
- the CPU 130 sets the current time t to the top of the musical score data and causes the display unit 160 to display the GUI 30 of FIG. 3 (step S2).
- CPU 130 generates a selection signal indicating a degree of enforcement predetermined as an initial setting (for example, a third degree of enforcement) as the current selection signal (step S3).
- CPU 130 accepts a preset volume value (eg, -10 dB) as an initial setting as the current control value (step S4). Any of steps S2 to S4 may be performed first, or may be performed simultaneously.
- Next, the CPU 130 determines whether or not the user has selected a degree of enforcement on the GUI 30 displayed in step S2 (step S5). If no degree of enforcement has been selected, the CPU 130 proceeds to step S7. When a degree of enforcement has been selected, the CPU 130 receives a selection signal corresponding to the selected degree of enforcement, updates the current selection signal (step S6), and proceeds to step S7.
- In step S7, the CPU 130 determines whether or not the user has indicated a control value on the GUI 30 displayed in step S2. If no control value has been indicated, the CPU 130 proceeds to step S9. When a control value has been indicated, the CPU 130 accepts the control value according to the instruction, updates the current control value (step S8), and proceeds to step S9. Either the pair of steps S5 and S6 or the pair of steps S7 and S8 may be executed first.
- In step S9, the CPU 130 uses the trained model M to generate the acoustic feature (frequency spectrum) at the current time t in accordance with the musical score data D1 selected in step S1, the current selection signal generated in step S3 or S6, and the current control value received in step S4 or S8. Specifically, the CPU 130 first generates the current musical score feature from the musical score data D1 and also generates, from the current control value, the current control vector corresponding to the degree of enforcement indicated by the current selection signal. That is, when the current selection signal indicates the first degree of enforcement, the current control value is reflected in the first and second elements of the control vector (upper part of FIG. 8); when it indicates the second degree of enforcement, the current control value is reflected in the third and fourth elements (middle part of FIG. 8); and when it indicates the third degree of enforcement, the fifth element is used (lower part of FIG. 8).
- the CPU 130 uses the trained model M to process the current musical score feature amount of the musical score data D1 and the current control vector. Thereby, the CPU 130 generates the current acoustic feature amount reflecting the current control value according to the degree of forcing indicated by the current selection signal (step S9).
- a sound signal is generated from the current acoustic feature amount (frequency spectrum) by the sound signal generation device, and reproduced by the reproduction device.
- A selection signal sequence is received through the CPU 130's repeated execution of steps S5 and S6, and a control value sequence is received through the repeated execution of steps S7 and S8.
- Through the repeated execution of step S9, a musical score feature sequence is generated from the musical score data D1, and a control vector sequence corresponding to the received selection signal sequence is generated from the received control value sequence. Further, by repeatedly executing step S9, the CPU 130 uses the trained model M to generate an acoustic feature sequence according to the musical score feature sequence and the control vector sequence.
- When the first degree of enforcement is selected, the control vector sequence shown in the upper part of FIG. 8 is generated from the control value sequence and processed by the trained model M.
- the volume of the acoustic features (frequency spectrum) generated by the trained model M closely follows changes in the control value (volume) in the control value sequence.
- When the second degree of enforcement is selected, the control vector sequence shown in the middle part of FIG. 8 is generated from the control value sequence and processed by the trained model M.
- the volume of the acoustic features (frequency spectrum) generated by the trained model M loosely follows changes in the control value (volume) in the control value sequence.
- When the third degree of enforcement is selected, the control vector sequence shown in the lower part of FIG. 8 is generated from the control value sequence and processed by the trained model M.
- the volume of the acoustic features (frequency spectrum) generated by the trained model M changes regardless of changes in the control values (volume) in the control value sequence.
- Since the trained model M has learned to generate high-definition first acoustic features, it generates high-definition acoustic features in which the volume changes finely, whichever degree of enforcement is selected.
- When the end of the musical score data D1 is reached, the CPU 130 ends the signal processing.
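- The per-frame flow of steps S5 to S9 could be sketched as follows, reusing the make_control_vector sketch above; the model interface, the polling callbacks, and the score-data accessor are placeholders assumed for illustration.

```python
import numpy as np

HOP_SEC = 0.005  # 5 ms frame interval, as in the embodiment

def run_signal_processing(model, score_data, get_selection, get_control, n_frames):
    """Sketch of the loop of FIG. 9: at each time point t the current
    selection signal and control value are polled (steps S5-S8), a control
    vector is built, and the trained model M produces one acoustic
    feature frame (step S9).
    """
    degree = 3             # initial setting (e.g. the third degree of enforcement)
    control_value = -10.0  # initial setting (e.g. -10 dB)
    spectra = []
    for t in range(n_frames):
        sel = get_selection()            # steps S5-S6
        if sel is not None:
            degree = sel
        val = get_control()              # steps S7-S8
        if val is not None:
            control_value = val
        score_feat = score_data.feature_at(t * HOP_SEC)           # assumed accessor
        ctrl_vec = make_control_vector(control_value, degree)     # see sketch above
        spectra.append(model.generate_frame(score_feat, ctrl_vec))  # step S9
    return np.stack(spectra)
```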
- FIG. 10 is a flowchart showing an example of training processing by the training device 20 of FIG. 4.
- the training process in FIG. 10 is performed by CPU 130 in FIG. 1 executing a training program stored in storage unit 140 or the like.
- the CPU 130 acquires a plurality of reference data D3 used for training from the storage unit 140 or the like (step S11).
- Next, the CPU 130 extracts a first reference acoustic feature sequence (frequency spectrum time series) and a second reference acoustic feature sequence (volume time series) from each piece of reference data D3 acquired in step S11 (step S12).
- the CPU 130 generates a reference control value string at the first forcing degree from each of the extracted second reference acoustic feature value strings (step S13). Also, the CPU 130 generates a reference control value sequence at the second forcing degree from each second reference acoustic feature value sequence (step S14). Furthermore, the CPU 130 generates a reference control value string at the third forcing degree from each second reference acoustic feature value string (step S15). Any of steps S13 to S15 may be executed first. Also, if the third forcing degree is zero, generation of the corresponding reference control value sequence is unnecessary, and step S15 can be omitted.
- Next, the CPU 130 prepares the generative model m and trains it using, as inputs, the reference musical score feature sequence generated from the reference musical score data D2 corresponding to each piece of reference data D3 and the reference control value sequences generated in steps S13 to S15, and, as output, the first reference acoustic feature sequence extracted in step S12.
- That is, the CPU 130 has the generative model m machine-learn the input/output relationship between, as inputs, the reference musical score feature sequence and each of the plurality of reference control value sequences corresponding to the plurality of degrees of enforcement and, as output, the first reference acoustic feature sequence (step S16).
- the CPU 130 determines whether sufficient machine learning has been performed for the generative model m to learn the input/output relationship (step S17). If the quality of the generated acoustic feature quantity is low and it is determined that the machine learning is insufficient, the CPU 130 returns to step S16. Steps S16 to S17 are repeated while changing the parameters until sufficient machine learning is performed. The number of iterations of machine learning changes according to quality conditions that the trained model M to be constructed should satisfy.
- When the generative model m has sufficiently learned the input/output relationship between, as inputs, the reference musical score feature sequence and each of the plurality of reference control value sequences corresponding to the plurality of degrees of enforcement and, as output, the first reference acoustic feature sequence, the CPU 130 stores the generative model m that has learned the relationship as the trained model M (step S18) and ends the training processing.
- the selection of the degree of enforcement and the instruction of the control value are not limited to the operation of the operation unit 150 on the GUI 30 by the user. Selection of the degree of compulsion and indication of the control value may be performed by a physical knob operation by the user without the GUI 30 . In this case, step S2 of the signal processing in FIG. 9 is not executed.
- FIG. 11 is a schematic diagram showing the processing system 100 in the first modified example.
- the processing system 100 further includes a planar proximity sensor 180 .
- the front-back direction, the up-down direction, and the left-right direction of the proximity sensor 180 are defined as first, second, and third directions, respectively.
- the proximity sensor 180 is, for example, an electrostatic sensor, and detects first, second and third positions of a user's hand as a detection target in first, second and third directions.
- the first position corresponds to a larger control value (volume) toward the back.
- the second position corresponds to a higher degree of forcing downward.
- the third position (left and right) may correspond to a playing style that is louder to the right or a pitch that is higher to the right.
- The second position corresponds to the distance between the proximity sensor 180 and the hand; the lower the second position (the closer the hand), the higher the accuracy and speed with which the proximity sensor 180 detects the first and third positions. Therefore, assigning a higher degree of enforcement to a lower second position, as in this example, improves usability when the degree of enforcement is increased.
- the correspondence relationships between the first to third directions, control values, forcing levels, performance styles, and the like are not limited to the above examples.
- the accepting unit 11 accepts instructions for different control values based on the first position (forward and backward) detected by the proximity sensor 180 .
- the signal generator 12 accepts selection of a different forcing degree based on the detected second position (upper and lower), and generates a selection signal indicating the accepted forcing degree.
- the receiving unit 11 also receives instructions for different performance styles or pitches based on the detected third positions (left and right).
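- A minimal sketch of this mapping is shown below; the coordinate ranges, the dB scaling, and the three-way split of the vertical axis are assumptions for illustration, not values from the embodiment.

```python
def map_hand_position(x_front_back, y_up_down, z_left_right, sensor_height=0.30):
    """Sketch of the first modified example: the hand position reported by
    the planar proximity sensor 180 is mapped to a control value
    (front-back), a degree of enforcement (up-down), and a playing-style
    value (left-right). Coordinates are assumed normalised (0-1) or metres.
    """
    # Farther back -> larger control value (volume), here mapped to -60..0 dB.
    control_value = -60.0 + 60.0 * min(max(x_front_back, 0.0), 1.0)
    # Lower hand (closer to the sensor) -> higher degree of enforcement.
    if y_up_down < sensor_height / 3:
        degree = 1
    elif y_up_down < 2 * sensor_height / 3:
        degree = 2
    else:
        degree = 3
    # Further right -> e.g. a louder playing style or a higher pitch.
    style = min(max(z_left_right, 0.0), 1.0)
    return control_value, degree, style
```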
- FIG. 12 is a schematic diagram showing the processing system 100 in the second modified example.
- the operating section 150 includes a stick-shaped operating lever 151 and an operating trigger 152 provided at the upper end of the operating lever 151 .
- the operation lever 151 and the operation trigger 152 are examples of first and second operators, respectively.
- The tilt angle of the operation lever 151 in the front-rear direction corresponds to the control value: the further the lever is tilted, the larger the control value.
- The amount of depression of the operation trigger 152 corresponds to the degree of enforcement: the further the trigger is pressed, the higher the degree of enforcement.
- the user changes the control value and the degree of forcing by operating the operating lever 151 and the operating trigger 152 .
- the receiving unit 11 receives selection of different control values based on the tilt angle of the operating lever 151 .
- the signal generation unit 12 receives instructions for different degrees of enforcement based on the amount of depression of the operation trigger 152, and generates a selection signal indicating the received degrees of enforcement.
- As described above, the signal processing method according to the present embodiment is a method implemented by a computer: it receives a control value indicating a musical feature together with a selection signal for selecting one of the first to third degrees of enforcement, and uses the trained model to generate, from among first acoustic feature sequences each reflecting the control value according to the first to third degrees of enforcement, the one first acoustic feature sequence corresponding to the selection signal.
- The sound generation method according to the present embodiment receives an instruction of a control value from the user in a system that synthesizes the sound of a piece of music corresponding to a given note sequence.
- When the instruction is received at the first degree of enforcement, the trained model is used to generate a sound that reflects the instruction from the user according to the first degree of enforcement.
- When the instruction is received at the second degree of enforcement, the trained model is used to generate a sound that reflects the instruction from the user according to the second degree of enforcement.
- When the instruction is received at the third degree of enforcement, the trained model is used to generate a sound that does not reflect the instruction from the user.
- By selecting the first degree of enforcement, it is possible to generate a first acoustic feature sequence that follows the control value relatively closely. By selecting the second degree of enforcement, it is possible to generate a first acoustic feature sequence that follows the control value relatively loosely. Furthermore, by selecting the third degree of enforcement, it is possible to generate a first acoustic feature sequence that changes independently of the control value. Therefore, the user does not need to specify detailed control values over the entire piece of music; the desired sound can be synthesized by selecting the first degree of enforcement and specifying detailed control values only at key points in the piece. As a result, high-quality performance sounds can be generated without troublesome operations by the user.
- The trained model may have learned, by machine learning with respect to reference data representing a sound waveform, a first relationship between a first reference control value sequence indicating musical features at the first degree of enforcement as input and a first reference acoustic feature sequence of the reference data as output, and a second relationship between a second reference control value sequence indicating musical features at the second degree of enforcement as input and the first reference acoustic feature sequence as output.
- The trained model may further have learned, with respect to the reference data representing the sound waveform, a third relationship between a third reference control value sequence indicating musical features at the third degree of enforcement as input and the first reference acoustic feature sequence of the reference data as output.
- The first reference control value sequence may change over time at a first definition according to the second reference acoustic feature sequence, and the second reference control value sequence may change over time at a second definition according to the second reference acoustic feature sequence.
- the first reference acoustic feature amount and the second reference acoustic feature amount may be the same acoustic feature amount, or may be different acoustic feature amounts.
- The first reference control value at each point in time may be a representative value of the second reference acoustic feature sequence of the reference data within a first period including that point in time, and the second reference control value at each point in time may be a representative value of the second reference acoustic feature sequence within a second period that includes that point in time and is longer than the first period.
- the degree of enforcement is selected in three stages including zero, but the embodiment is not limited to this.
- the forcing degree may be selected in two stages, or may be selected in four or more stages.
- selection may be made in two stages, the first enforcement degree and the second enforcement degree.
- the first acoustic feature value sequence generated at the first forcing degree follows the control value relatively tightly and changes over time.
- the first acoustic feature quantity sequence generated at the second degree of enforcement changes over time in a relatively loose manner following the control value.
- the enforcement degree may be selected in two stages of the first enforcement degree and the third enforcement degree, or may be selected in two stages of the second enforcement degree and the third enforcement degree.
- the first acoustic feature quantity sequence generated at the first or second forcing degree changes temporally following the control value.
- the first acoustic feature quantity sequence generated at the third forcing degree changes independently of the control value.
- In the above embodiment, the user operates an operator to input the control value in real time; alternatively, a control value sequence prepared in advance may be given to the trained model M to generate the acoustic feature sequence.
- (Supplementary notes)
- (Aspect 1) A signal processing method implemented by a computer, comprising: receiving a control value indicating a musical feature; receiving a selection signal indicating the degree of enforcement of the control value in signal processing; generating, from the control value, a control vector consisting of a plurality of elements according to the degree of enforcement indicated by the selection signal; and generating an acoustic feature sequence corresponding to the control vector using a trained model.
- (Aspect 2) The signal processing method according to Aspect 1, wherein the control vector generated from the control value includes at least a first element corresponding to a first degree of enforcement and a second element corresponding to a second degree of enforcement lower than the first degree of enforcement.
- (Aspect 3) The signal processing method according to Aspect 2, wherein the trained model has learned, by machine learning, a first input/output relationship between a first reference control vector, which includes the first element indicating a musical feature of reference data representing a sound waveform at the first degree of enforcement, and a first reference acoustic feature sequence of the reference data, and a second input/output relationship between a second reference control vector, which includes the second element indicating a musical feature at the second degree of enforcement, and the first reference acoustic feature sequence.
- (Aspect 4) The signal processing method according to Aspect 3, wherein the control value can take an intermediate value between the first degree of enforcement and the second degree of enforcement.
- (Aspect 5) The signal processing method according to any one of Aspects 1 to 4, wherein the control value is reflected in at least the element, among the plurality of elements of the generated control vector, that corresponds to the degree of enforcement indicated by the selection signal.
- (Aspect 6) A signal processing device comprising: a signal receiving unit that receives a control value indicating a musical feature and a selection signal indicating the degree of enforcement of the control value in signal processing; a vector generation unit that generates, from the control value, a control vector consisting of a plurality of elements according to the degree of enforcement indicated by the selection signal; and a sound generation unit that generates, using a trained model, an acoustic feature sequence corresponding to the control vector.
Landscapes
- Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- Acoustics & Sound (AREA)
- Multimedia (AREA)
- Theoretical Computer Science (AREA)
- Computational Linguistics (AREA)
- Health & Medical Sciences (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Human Computer Interaction (AREA)
- Electrophonic Musical Instruments (AREA)
Abstract
Description
Jesse Engel, Lamtharn Hantrakul, Chenjie Gu and Adam Roberts, "DDSP: Differentiable Digital Signal Processing", arXiv:2001.04643v1 [cs.LG] 14 Jan 2020
A signal processing method, a signal processing device, and a sound generation method according to embodiments of the present invention will now be described in detail with reference to the drawings. FIG. 1 is a block diagram showing the configuration of a processing system including a signal processing device according to one embodiment of the present invention. As shown in FIG. 1, the processing system 100 includes a RAM (random access memory) 110, a ROM (read only memory) 120, a CPU (central processing unit) 130, a storage unit 140, an operation unit 150, and a display unit 160.
FIG. 2 is a block diagram showing the configuration of the signal processing device 10. FIG. 3 is a diagram showing an example of the GUI displayed on the display unit 160. As shown in FIG. 2, the signal processing device 10 includes a reception unit 11, a signal generation unit 12, and a sound generation unit 13. The functions of the reception unit 11, the signal generation unit 12, and the sound generation unit 13 are realized by the CPU 130 of FIG. 1 executing the signal processing program. At least part of the reception unit 11, the signal generation unit 12, and the sound generation unit 13 may be realized by hardware such as an electronic circuit.
FIG. 4 is a block diagram showing the configuration of the training device 20. FIGS. 5 to 8 are diagrams for explaining the operation of the training device 20. As shown in FIG. 4, the training device 20 includes an extraction unit 21, an acquisition unit 22, and a construction unit 23. The functions of the extraction unit 21, the acquisition unit 22, and the construction unit 23 are realized by the CPU 130 of FIG. 1 executing the training program. At least part of the extraction unit 21, the acquisition unit 22, and the construction unit 23 may be realized by hardware such as an electronic circuit.
FIG. 9 is a flowchart showing an example of signal processing by the signal processing device 10 of FIG. 2. The signal processing of FIG. 9 is performed by the CPU 130 of FIG. 1 executing the signal processing program stored in the storage unit 140 or the like. First, the CPU 130 determines whether or not musical score data D1 has been selected by the user (step S1). If no musical score data D1 has been selected, the CPU 130 waits until musical score data D1 is selected.
FIG. 10 is a flowchart showing an example of training processing by the training device 20 of FIG. 4. The training processing of FIG. 10 is performed by the CPU 130 of FIG. 1 executing the training program stored in the storage unit 140 or the like. First, the CPU 130 acquires, from the storage unit 140 or the like, a plurality of pieces of reference data D3 used for training (step S11). Next, the CPU 130 extracts a first reference acoustic feature sequence (frequency spectrum time series) and a second reference acoustic feature sequence (volume time series) from each piece of reference data D3 acquired in step S11 (step S12).
Selection of the degree of enforcement and instruction of the control value are not limited to the user's operation of the operation unit 150 on the GUI 30. Selection of the degree of enforcement and instruction of the control value may instead be performed by the user operating a physical knob, without the GUI 30. In this case, step S2 of the signal processing of FIG. 9 is not executed.
As described above, the signal processing method according to the present embodiment is a method implemented by a computer: it receives a control value indicating a musical feature together with a selection signal for selecting one of the first to third degrees of enforcement, and uses the trained model to generate, from among first acoustic feature sequences each reflecting the control value according to the first to third degrees of enforcement, the one first acoustic feature sequence corresponding to the selection signal.
In the above embodiment, the degree of enforcement is selected from three levels including zero, but the embodiment is not limited to this. The degree of enforcement may be selected from two levels, or from four or more levels. For example, in the above embodiment, the selection may be made from two levels, the first degree of enforcement and the second degree of enforcement. In this case, the first acoustic feature sequence generated at the first degree of enforcement changes over time following the control value relatively tightly, and the first acoustic feature sequence generated at the second degree of enforcement changes over time following the control value relatively loosely.
(Aspect 1)
A signal processing method implemented by a computer, comprising:
receiving a control value indicating a musical feature;
receiving a selection signal indicating the degree of enforcement of the control value in signal processing;
generating, from the control value, a control vector consisting of a plurality of elements according to the degree of enforcement indicated by the selection signal; and
generating an acoustic feature sequence corresponding to the control vector using a trained model.
(Aspect 2)
The signal processing method according to Aspect 1, wherein the control vector generated from the control value includes at least a first element corresponding to a first degree of enforcement and a second element corresponding to a second degree of enforcement lower than the first degree of enforcement.
(Aspect 3)
The signal processing method according to Aspect 2, wherein the trained model has learned, by machine learning, a first input/output relationship between a first reference control vector, which includes the first element indicating a musical feature of reference data representing a sound waveform at the first degree of enforcement, and a first reference acoustic feature sequence of the reference data, and a second input/output relationship between a second reference control vector, which includes the second element indicating a musical feature at the second degree of enforcement, and the first reference acoustic feature sequence.
(Aspect 4)
The signal processing method according to Aspect 3, wherein the control value can take an intermediate value between the first degree of enforcement and the second degree of enforcement.
(Aspect 5)
The signal processing method according to any one of Aspects 1 to 4, wherein the control value is reflected in at least the element, among the plurality of elements of the generated control vector, that corresponds to the degree of enforcement indicated by the selection signal.
(Aspect 6)
A signal processing device comprising: a signal receiving unit that receives a control value indicating a musical feature and a selection signal indicating the degree of enforcement of the control value in signal processing; a vector generation unit that generates, from the control value, a control vector consisting of a plurality of elements according to the degree of enforcement indicated by the selection signal; and a sound generation unit that generates, using a trained model, an acoustic feature sequence corresponding to the control vector.
Claims (15)
- 1. A signal processing method implemented by a computer, comprising: receiving a control value indicating a musical feature; receiving a selection signal for selecting either one of a first degree of enforcement and a second degree of enforcement lower than the first degree of enforcement; and generating, using a trained model and according to the selection signal, either one of an acoustic feature sequence reflecting the control value according to the first degree of enforcement and an acoustic feature sequence reflecting the control value according to the second degree of enforcement.
- 2. The signal processing method according to claim 1, wherein the trained model has learned, by machine learning, a relationship between reference control value sequences indicating musical features at the first degree of enforcement and the second degree of enforcement and a reference acoustic feature sequence.
- 3. The signal processing method according to claim 2, wherein the trained model has learned, by machine learning with respect to reference data representing a sound waveform, a first relationship between a first reference control value sequence indicating musical features at the first degree of enforcement as input and a first reference acoustic feature sequence of the reference data as output, and a second relationship between a second reference control value sequence indicating musical features at the second degree of enforcement as input and the first reference acoustic feature sequence as output.
- 4. The signal processing method according to claim 3, wherein the first reference control value sequence changes over time at a first definition according to a second reference acoustic feature sequence, and the second reference control value sequence changes over time at a second definition according to the second reference acoustic feature sequence.
- 5. The signal processing method according to claim 4, wherein the first reference acoustic feature and the second reference acoustic feature are the same acoustic feature or different acoustic features.
- 6. The signal processing method according to claim 4, wherein the first reference control value at each point in time is a representative value of the second reference acoustic feature sequence of the reference data within a first period including that point in time, and the second reference control value at each point in time is a representative value of the second reference acoustic feature sequence within a second period that includes that point in time and is longer than the first period.
- 7. The signal processing method according to claim 6, wherein the first reference acoustic feature and the second reference acoustic feature are the same acoustic feature or different acoustic features.
- 8. The signal processing method according to any one of claims 1 to 3, wherein the acoustic feature sequence generated at the first degree of enforcement changes over time following the control value, and the acoustic feature sequence generated at the second degree of enforcement changes independently of the control value.
- 9. The signal processing method according to any one of claims 1 to 4, wherein the acoustic feature sequence generated at the first degree of enforcement changes over time tightly following the control value, and the acoustic feature sequence generated at the second degree of enforcement changes over time loosely following the control value.
- 10. The signal processing method according to any one of claims 1 to 9, further comprising generating a sound signal from the generated acoustic feature sequence.
- 11. The signal processing method according to any one of claims 1 to 9, wherein positions of a detection target in a first direction and a second direction are detected by a sensor, the control value is accepted based on the detected position of the detection target in the first direction, and the selection signal is received based on the detected position of the detection target in the second direction.
- 12. The signal processing method according to any one of claims 1 to 9, wherein the control value is accepted through operation of a first operator, and the selection signal is received through operation of a second operator.
- 13. A signal processing device comprising: a receiving unit that receives a control value indicating a musical feature and a selection signal for selecting either one of a first degree of enforcement and a second degree of enforcement lower than the first degree of enforcement; and a sound generation unit that generates, using a trained model and according to the selection signal, either one of an acoustic feature sequence reflecting the control value according to the first degree of enforcement and an acoustic feature sequence reflecting the control value according to the second degree of enforcement.
- 14. A sound generation method in a system that generates the sound of a piece of music corresponding to a given note sequence, the method comprising: receiving, from a user, an instruction of a control value indicating a musical feature; when the instruction of the control value is received from the user at a first degree of enforcement, generating, using a trained model, a sound that reflects the instruction from the user according to the first degree of enforcement; and when the instruction of the control value is received from the user at a second degree of enforcement, generating, using the trained model, a sound that reflects the instruction from the user to a lesser degree than the first degree of enforcement.
- 15. The sound generation method according to claim 14, wherein generating a sound that reflects the instruction from the user to a lesser degree than the first degree of enforcement includes generating a sound that does not reflect the instruction from the user.
Priority Applications (3)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202280023027.5A CN117043854A (zh) | 2021-03-25 | 2022-03-11 | 使用了机器学习模型的信号处理方法、信号处理装置及音生成方法 |
JP2023509023A JPWO2022202415A1 (ja) | 2021-03-25 | 2022-03-11 | |
US18/472,119 US20240029695A1 (en) | 2021-03-25 | 2023-09-21 | Signal processing method, signal processing device, and sound generation method using machine learning model |
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
JP2021051091 | 2021-03-25 | ||
JP2021-051091 | 2021-03-25 |
Related Child Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US18/472,119 Continuation US20240029695A1 (en) | 2021-03-25 | 2023-09-21 | Signal processing method, signal processing device, and sound generation method using machine learning model |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2022202415A1 true WO2022202415A1 (ja) | 2022-09-29 |
Family
ID=83397141
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/JP2022/011067 WO2022202415A1 (ja) | 2021-03-25 | 2022-03-11 | 機械学習モデルを用いた信号処理方法、信号処理装置および音生成方法 |
Country Status (4)
Country | Link |
---|---|
US (1) | US20240029695A1 (ja) |
JP (1) | JPWO2022202415A1 (ja) |
CN (1) | CN117043854A (ja) |
WO (1) | WO2022202415A1 (ja) |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP2017107228A (ja) * | 2017-02-20 | 2017-06-15 | 株式会社テクノスピーチ | 歌声合成装置および歌声合成方法 |
JP2018077283A (ja) * | 2016-11-07 | 2018-05-17 | ヤマハ株式会社 | 音声合成方法 |
JP2019008206A (ja) * | 2017-06-27 | 2019-01-17 | 日本放送協会 | 音声帯域拡張装置、音声帯域拡張統計モデル学習装置およびそれらのプログラム |
-
2022
- 2022-03-11 CN CN202280023027.5A patent/CN117043854A/zh active Pending
- 2022-03-11 WO PCT/JP2022/011067 patent/WO2022202415A1/ja active Application Filing
- 2022-03-11 JP JP2023509023A patent/JPWO2022202415A1/ja active Pending
-
2023
- 2023-09-21 US US18/472,119 patent/US20240029695A1/en active Pending
Patent Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP2018077283A (ja) * | 2016-11-07 | 2018-05-17 | ヤマハ株式会社 | 音声合成方法 |
JP2017107228A (ja) * | 2017-02-20 | 2017-06-15 | 株式会社テクノスピーチ | 歌声合成装置および歌声合成方法 |
JP2019008206A (ja) * | 2017-06-27 | 2019-01-17 | 日本放送協会 | 音声帯域拡張装置、音声帯域拡張統計モデル学習装置およびそれらのプログラム |
Also Published As
Publication number | Publication date |
---|---|
JPWO2022202415A1 (ja) | 2022-09-29 |
CN117043854A (zh) | 2023-11-10 |
US20240029695A1 (en) | 2024-01-25 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
Gillick et al. | Learning to groove with inverse sequence transformations | |
US10789921B2 (en) | Audio extraction apparatus, machine learning apparatus and audio reproduction apparatus | |
EP2760014B1 (en) | Interactive score curve for adjusting audio parameters of a user's recording. | |
DE112013001343B4 (de) | Benutzerschnittstelle für ein virtuelles Musikinstrument und Verfahren zum Bestimmen einer Eigenschaft einer auf einem virtuellen Saiteninstrument gespielten Note | |
EP2680254B1 (en) | Sound synthesis method and sound synthesis apparatus | |
JP6004358B1 (ja) | 音声合成装置および音声合成方法 | |
US7608775B1 (en) | Methods and systems for providing musical interfaces | |
US9251773B2 (en) | System and method for determining an accent pattern for a musical performance | |
DE112014003260T5 (de) | System und Verfahren zum Erzeugen einer rhythmischen Begleitungfür eine musikalische Darbietung | |
US20150013528A1 (en) | System and method for modifying musical data | |
WO2010034063A1 (en) | Video and audio content system | |
CN109243416A (zh) | 用于产生鼓型式的装置配置和方法 | |
CN113874932A (zh) | 电子乐器、电子乐器的控制方法及存储介质 | |
WO2014058835A1 (en) | System and methods for simulating real-time multisensory output | |
US20080190270A1 (en) | System and method for online composition, and computer-readable recording medium therefor | |
Johansson | Empirical research on asymmetrical rhythms in scandinavian folk music: A critical review | |
WO2022202415A1 (ja) | 機械学習モデルを用いた信号処理方法、信号処理装置および音生成方法 | |
US10304434B2 (en) | Methods, devices and computer program products for interactive musical improvisation guidance | |
Simon et al. | Audio analogies: Creating new music from an existing performance by concatenative synthesis | |
Caetano et al. | Independent manipulation of high-level spectral envelope shape features for sound morphing by means of evolutionary computation | |
JP2022122706A (ja) | 機械学習モデルを用いた音生成方法、機械学習モデルの訓練方法、音生成装置、訓練装置、音生成プログラムおよび訓練プログラム | |
WO2019113954A1 (zh) | 话筒、声音处理系统和声音处理方法 | |
WO2018159063A1 (ja) | 電子音響装置および音色設定方法 | |
US20230395046A1 (en) | Sound generation method using machine learning model, training method for machine learning model, sound generation device, training device, non-transitory computer-readable medium storing sound generation program, and non-transitory computer-readable medium storing training program | |
KR102662975B1 (ko) | 인공지능 모델을 이용한 음악 vr게임 장치 및 방법 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
121 | Ep: the epo has been informed by wipo that ep was designated in this application |
Ref document number: 22775207 Country of ref document: EP Kind code of ref document: A1 |
|
WWE | Wipo information: entry into national phase |
Ref document number: 202280023027.5 Country of ref document: CN |
|
WWE | Wipo information: entry into national phase |
Ref document number: 2023509023 Country of ref document: JP |
|
NENP | Non-entry into the national phase |
Ref country code: DE |
|
122 | Ep: pct application non-entry in european phase |
Ref document number: 22775207 Country of ref document: EP Kind code of ref document: A1 |