CN108269579B - Voice data processing method and device, electronic equipment and readable storage medium - Google Patents


Info

Publication number
CN108269579B
CN108269579B (application CN201810049575.0A)
Authority
CN
China
Prior art keywords
target
voice data
frequency domain
domain parameters
midi audio
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201810049575.0A
Other languages
Chinese (zh)
Other versions
CN108269579A (en)
Inventor
卓鹏鹏
张康
方博伟
尤嘉华
张伟
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Xiamen Meitu Technology Co Ltd
Original Assignee
Xiamen Meitu Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Xiamen Meitu Technology Co Ltd filed Critical Xiamen Meitu Technology Co Ltd
Priority to CN201810049575.0A priority Critical patent/CN108269579B/en
Publication of CN108269579A publication Critical patent/CN108269579A/en
Application granted
Publication of CN108269579B publication Critical patent/CN108269579B/en
Legal status: Active

Classifications

    • G — PHYSICS
    • G10 — MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L — SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00 — Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/003 — Changing voice quality, e.g. pitch or formants
    • G10L21/007 — Changing voice quality, e.g. pitch or formants, characterised by the process used
    • G10L21/013 — Adapting to target pitch
    • G10L2021/0135 — Voice conversion or morphing

Landscapes

  • Engineering & Computer Science (AREA)
  • Quality & Reliability (AREA)
  • Computational Linguistics (AREA)
  • Signal Processing (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Auxiliary Devices For Music (AREA)
  • Electrophonic Musical Instruments (AREA)

Abstract

The invention provides a voice data processing method and apparatus, an electronic device, and a readable storage medium, in the technical field of data processing. The method obtains initial frequency domain parameters of voice data, then obtains target frequency domain parameters corresponding to a preset target MIDI audio, and modifies the initial frequency domain parameters according to the target frequency domain parameters to obtain tone-modified voice data. The voice in the voice data thus takes on the frequency domain parameters of the target MIDI audio, so that the tone-modified voice data has the pitch characteristics of the target MIDI audio; the tone of the voice data is modified without changing the speed or duration of the voice. Because the phase of the tone-modified voice data is continuous, no noise is introduced and mechanical-sounding artifacts are avoided, giving a better tone-modification result. The method can be applied to pitch correction in songs, conversion of speech to singing voice, and the like, and has broad application prospects in the field of speech processing.

Description

Voice data processing method and device, electronic equipment and readable storage medium
Technical Field
The invention relates to the technical field of data processing, in particular to a voice data processing method and device, electronic equipment and a readable storage medium.
Background
Voice tone modification changes the pitch of the voice in an audio file by some algorithm without changing its speed; it includes translating (shifting) the pitch and converting the voice to a specific pitch. Existing tone-modification processing suffers from phase discontinuity, which introduces noise.
Disclosure of Invention
In view of the above, the present invention provides a voice data processing method and apparatus, an electronic device, and a readable storage medium, which solve the above problems and keep the phase of the tone-modified voice continuous.
The technical scheme provided by the invention is as follows:
a method of speech data processing, comprising:
acquiring voice data and target MIDI audio, wherein the voice data comprises voice aligned with the target MIDI audio;
obtaining initial frequency domain parameters of the voice data;
obtaining target frequency domain parameters corresponding to preset target MIDI audio, wherein the initial frequency domain parameters comprise initial phases of the voice data, and the target frequency domain parameters comprise target phases corresponding to the target MIDI audio;
and modifying the initial frequency domain parameters according to the target frequency domain parameters, and transforming the pitch in the voice data to a target pitch in the target MIDI audio, to obtain the tone-modified voice data.
Further, the step of obtaining initial frequency domain parameters of the voice data comprises:
acquiring the voice data within the time corresponding to the target pitch;
performing zero-point drift removal and pre-emphasis processing on the voice data within the time corresponding to the target pitch;
and performing time-frequency conversion on the voice data subjected to zero point drift removal and pre-emphasis processing to obtain a frequency domain parameter of each frame of the voice data.
Further, the step of performing time-frequency conversion on the voice data subjected to zero-point drift removal and pre-emphasis processing comprises:
calculating the frame shift of each frame in the voice data;
framing and windowing the voice data according to the frame shift obtained by calculation and a preset window function;
and carrying out Fourier transform on each frame of voice data subjected to framing and windowing to obtain the frequency domain parameter of each frame in the voice data.
Further, the step of calculating the frame shift of each frame in the speech data comprises:
dividing a sampling rate by a target frequency to obtain a frame shift of each frame, wherein the target frequency is the frequency of the target MIDI audio, and the target frequency is calculated by adopting the following formula:
F = 110 × 2^((MIDINote − 45) / 12)
where F is the target frequency of the target MIDI audio and MIDINote is the pitch value included in the target MIDI audio.
Further, the target MIDI audio records a target frequency of a sound, and the step of obtaining a target frequency domain parameter corresponding to a preset target MIDI audio includes:
generating a target waveform whose pitch matches the target frequency and whose duration equals that of the voice data corresponding to the target frequency;
extracting a phase value of the target waveform as the target frequency domain parameter;
correspondingly, the step of modifying the frequency domain parameters of the voice data according to the frequency domain parameters of the target MIDI audio comprises:
replacing the phase value of the voice data at the position corresponding to the target waveform in the voice data with the phase value of the target waveform to obtain the frequency domain parameter of the voice data after tone modification;
and performing inverse Fourier transform on the frequency domain parameters of the tone-modified voice data, and processing the result with the overlap-add (OLA) algorithm to obtain the tone-modified voice data.
The present invention also provides a voice data processing apparatus, comprising:
the data acquisition module is used for acquiring voice data and target MIDI audio, wherein the voice data comprises voice aligned with the target MIDI audio;
the voice data processing module is used for obtaining initial frequency domain parameters of the voice data;
a target MIDI audio processing module for obtaining target frequency domain parameters corresponding to preset target MIDI audio, wherein the initial frequency domain parameters comprise initial phases of the voice data, and the target frequency domain parameters comprise target phases corresponding to the target MIDI audio;
and the tone modification module is used for modifying the initial frequency domain parameters according to the target frequency domain parameters, and transforming the pitch in the voice data to a target pitch in the target MIDI audio to obtain the tone-modified voice data.
Further, the method for obtaining the initial frequency domain parameters of the voice data by the voice data processing module includes:
performing zero point drift removal and pre-emphasis processing on the voice data;
and performing time-frequency conversion on the voice data subjected to zero point drift removal and pre-emphasis processing to obtain a frequency domain parameter of each frame of the voice data.
Further, the step of performing time-frequency conversion on the voice data subjected to zero-point drift removal and pre-emphasis processing by the voice data processing module includes:
calculating the frame shift of each frame in the voice data;
framing and windowing the voice data according to the frame shift obtained by calculation and a preset window function;
and carrying out Fourier transform on each frame of voice data subjected to framing and windowing to obtain the frequency domain parameter of each frame in the voice data.
Further, the step of calculating the frame shift of each frame in the voice data by the voice data processing module includes:
dividing a sampling rate by a target frequency to obtain a frame shift of each frame, wherein the target frequency is the frequency of the target MIDI audio, and the target frequency is calculated by adopting the following formula:
F = 110 × 2^((MIDINote − 45) / 12)
where F is the target frequency of the target MIDI audio, MIDINote is the pitch value included in the target MIDI audio.
Further, the target MIDI audio records a target frequency of a sound, and the method by which the target MIDI audio processing module obtains the target frequency domain parameter corresponding to the preset target MIDI audio includes:
generating a target waveform whose pitch matches the target frequency and whose duration equals that of the voice data corresponding to the target frequency;
extracting a phase value of the target waveform as the target frequency domain parameter;
correspondingly, the method for modifying the frequency domain parameters of the voice data according to the frequency domain parameters of the target MIDI audio by the transposition module comprises the following steps:
replacing the phase value of the voice data at the position corresponding to the target waveform in the voice data with the phase value of the target waveform to obtain the frequency domain parameter of the voice data after tone modification;
and performing inverse Fourier transform on the frequency domain parameters of the tone-modified voice data, and processing the result with the overlap-add (OLA) algorithm to obtain the tone-modified voice data.
The present invention also provides an electronic device, including: a processor and a memory coupled to the processor, the memory storing instructions that, when executed by the processor, cause the electronic device to:
acquiring voice data and target MIDI audio, wherein the voice data comprises voice aligned with the target MIDI audio;
obtaining initial frequency domain parameters of the voice data;
obtaining target frequency domain parameters corresponding to preset target MIDI audio, wherein the initial frequency domain parameters comprise initial phases of the voice data, and the target frequency domain parameters comprise target phases corresponding to the target MIDI audio;
and modifying the initial frequency domain parameters according to the target frequency domain parameters, and transforming the pitch in the voice data to a target pitch in the target MIDI audio, to obtain the tone-modified voice data.
The invention also provides a readable storage medium comprising a computer program which, when run, controls the electronic device on which the readable storage medium resides to execute the voice data processing method of any one of claims 1-5.
The embodiment of the application enables the voice in the voice data to take on the frequency domain parameters of the target MIDI audio, so that the tone-modified voice data has the pitch characteristics of the target MIDI audio; the tone-modification operation on the voice data is realized without changing the speed or duration of the voice. Because the phase of the tone-modified voice data is continuous, no noise is introduced and mechanical-sounding artifacts are avoided, giving a better tone-modification result. The method can be applied to pitch correction in songs, conversion of speech to singing voice, and the like, and has broad application prospects in the field of speech processing.
In order to make the aforementioned and other objects, features and advantages of the present invention comprehensible, preferred embodiments accompanied with figures are described in detail below.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings used in the embodiments are briefly described below. It should be understood that the following drawings illustrate only some embodiments of the present invention and therefore should not be considered as limiting its scope; those skilled in the art can derive other related drawings from these drawings without inventive effort.
Fig. 1 is a block diagram of an electronic device according to an embodiment of the present invention.
Fig. 2 is a flowchart illustrating a voice data processing method according to an embodiment of the present invention.
Fig. 3 is a flowchart illustrating the sub-step of step S102 in a speech data processing method according to an embodiment of the present invention.
Fig. 4 is a flowchart illustrating the sub-step of step S103 in the speech data processing method according to the embodiment of the present invention.
Fig. 5 is a functional block diagram of a voice data processing apparatus according to an embodiment of the present invention.
Icon: 100-an electronic device; 111-a memory; 112-a memory controller; 113-a processor; 300-a voice data processing device; 310-a data acquisition module; 320-a voice data processing module; 330-target MIDI audio processing module; 340-tone changing module.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. The components of embodiments of the present invention generally described and illustrated in the figures herein may be arranged and designed in a wide variety of different configurations. Thus, the following detailed description of the embodiments of the present invention, presented in the figures, is not intended to limit the scope of the invention, as claimed, but is merely representative of selected embodiments of the invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments of the present invention without making any creative effort, shall fall within the protection scope of the present invention.
It should be noted that: like reference numbers and letters refer to like items in the following figures, and thus, once an item is defined in one figure, it need not be further defined and explained in subsequent figures. Meanwhile, in the description of the present invention, the terms "first", "second", and the like are used only for distinguishing the description, and are not to be construed as indicating or implying relative importance.
Existing pitch-changing methods fall mainly into two categories. One is time domain interpolation and splicing, such as synchronized overlap-add, fixed synthesis (SOLA-FS); the other is frequency domain processing, often referred to as a phase vocoder. Time domain methods require little computation and produce natural-sounding results, but the splicing introduces phase discontinuities, which generate noise. Frequency domain methods require time-frequency conversion, phase estimation, and the like, which demand heavy computation, and the pitch-modified voice contains mechanical-sounding artifacts.
Fig. 1 is a block diagram of an electronic device 100 according to a preferred embodiment of the invention. The electronic device 100 may include a voice data processing apparatus 300, a memory 111, a storage controller 112, and a processor 113.
The memory 111, the memory controller 112 and the processor 113 are electrically connected to each other directly or indirectly to realize data transmission or interaction. For example, the components may be electrically connected to each other via one or more communication buses or signal lines. The voice data processing apparatus 300 may include at least one software functional module which may be stored in the memory 111 in the form of software or firmware (firmware) or solidified in an Operating System (OS) of the electronic device 100. The processor 113 is used for executing executable modules stored in the memory 111, such as software functional modules and computer programs included in the voice data processing apparatus 300.
The Memory 111 may be, but is not limited to, a Random Access Memory (RAM), a Read-Only Memory (ROM), a Programmable Read-Only Memory (PROM), an Erasable Programmable Read-Only Memory (EPROM), an Electrically Erasable Programmable Read-Only Memory (EEPROM), and the like. The memory 111 is used for storing a program, and the processor 113 executes the program after receiving an execution instruction. Access to the memory 111 by the processor 113, and possibly by other components, may be under the control of the memory controller 112.
The processor 113 may be an integrated circuit chip having signal processing capabilities. The processor 113 may be a general-purpose processor, including a Central Processing Unit (CPU), a Network Processor (NP), and the like; it may also be a Digital Signal Processor (DSP), an Application-Specific Integrated Circuit (ASIC), a Field-Programmable Gate Array (FPGA) or other programmable logic device, discrete gate or transistor logic, or discrete hardware components. The various methods, steps, and logic blocks disclosed in the embodiments of the present invention may be implemented or performed by it. A general-purpose processor may be a microprocessor, or the processor may be any conventional processor.
The embodiment of the present application provides a voice data processing method, which can implement tonal modification of voice data, and can be applied to the electronic device 100, as shown in fig. 2, where the method includes the following steps.
Step S101, acquiring voice data and a target MIDI audio, where the voice data includes a voice aligned with the target MIDI audio.
Step S102, obtaining initial frequency domain parameters of the voice data.
The voice data in the embodiment of the present application may be a piece of speech or a piece of song; the embodiment does not limit the duration or content of the voice data, which may be selected according to actual needs. In the embodiment of the present application, the tone of the voice in the voice data is modified by processing the voice data; the initial frequency domain parameters may be calculated for every frame of the voice data, or only for the frames that need tone modification.
Tone modification in the embodiment of the present application refers to changing the pitch of the sound in the voice data, i.e. shifting the pitch of a given frame of voice to a desired pitch.
As shown in fig. 3, the step of obtaining initial frequency domain parameters of the speech data may include the following sub-steps.
And a substep S1021, performing zero point drift elimination and pre-emphasis processing on the voice data.
And a substep S1022, performing time-frequency conversion on the voice data subjected to the zero point drift removal and pre-emphasis processing to obtain a frequency domain parameter of each frame of the voice data.
Voice data may contain zero-point (DC) drift, which is corrected by removing it. Voice data is also affected by lip radiation; pre-emphasis boosts the high-frequency part of the voice, removing the influence of lip radiation and increasing the high-frequency resolution of the voice. Zero-drift removal and pre-emphasis can be calculated using the following equations.
x(n)=x(n)-mean_x
Wherein x (n) is a sampling value corresponding to the nth point, and is an output value after zero point drift is removed, and mean _ x is a mean value of time domain amplitude of the speech segment obtained by calculation.
The pre-emphasis can be implemented by a first order FIR high pass filter. The specific calculation formula is as follows.
y(n)=x(n)-ax(n-1)
Wherein y (n) is the output after preprocessing, x (n) is the audio without preprocessing, a is the pre-emphasis coefficient, generally 0.9-1.0, and optionally, a is 0.98.
The time-frequency conversion of the voice data subjected to zero-point drift removal and pre-emphasis processing can be performed in the following three steps.
First, the frame shift of each frame in the voice data is calculated.
Then, the voice data is framed and windowed according to the calculated frame shift and a preset window function.
Finally, Fourier transform is performed on each framed and windowed frame of voice data to obtain the frequency domain parameter of each frame in the voice data.
The frame shift of each frame in the voice data may be obtained by dividing the sampling rate by the target frequency, where the target frequency is the frequency of the target MIDI audio, calculated with the following formula:
F = 110 × 2^((MIDINote − 45) / 12)
where F is the frequency corresponding to the pitch, and MIDINote is the pitch value included in the target MIDI audio file. The value 110 may be replaced by 220 to raise the result by one octave.
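As a sketch (assuming the standard MIDI tuning in which note 45, A2, corresponds to 110 Hz), the note-to-frequency formula and the frame-shift rule above can be written as:

```python
def midi_note_to_frequency(midi_note, base=110.0):
    """F = base * 2**((midi_note - 45) / 12); base=110 Hz places MIDI note 45
    (A2) at the reference; replacing base with 220 raises the result an octave."""
    return base * 2.0 ** ((midi_note - 45) / 12.0)

def frame_shift(sample_rate, midi_note):
    """Frame shift = sampling rate divided by the target frequency."""
    return sample_rate / midi_note_to_frequency(midi_note)
```

For example, at a 44.1 kHz sampling rate, MIDI note 45 gives a frame shift of 44100/110 samples.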
A speech signal changes over time, but the state of the articulatory organs changes much more slowly than the sound vibrates. A speech signal can therefore be considered stationary over a very short interval, i.e. short-time stationary, so the speech can be framed and then analyzed. The frame length is generally 10-30 milliseconds, with overlap between frames. Windowing serves two main purposes: first, it makes the signal more continuous globally, avoiding the Gibbs phenomenon; second, it gives the originally aperiodic speech signal some characteristics of a periodic function. Windowing is performed with a window function; several common window functions are listed below.
The rectangular window function is as follows:
w(n) = 1, 0 ≤ n ≤ N − 1; w(n) = 0 otherwise
the Hamming window function is as follows:
w(n) = 0.54 − 0.46 · cos(2πn / (N − 1)), 0 ≤ n ≤ N − 1
the Hanning Window function is as follows:
w(n) = 0.5 − 0.5 · cos(2πn / (N − 1)), 0 ≤ n ≤ N − 1
where N is the window length. Windowing of the voice data is performed with these window functions.
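The framing, windowing, and Fourier transform steps can be sketched as follows (an illustrative NumPy version, not the patent's code; the Hanning window is written out per the formula above, and the function names are hypothetical):

```python
import numpy as np

def hanning_window(length):
    """w(n) = 0.5 - 0.5*cos(2*pi*n/(N-1)), 0 <= n <= N-1."""
    n = np.arange(length)
    return 0.5 - 0.5 * np.cos(2 * np.pi * n / (length - 1))

def stft_frames(x, frame_len, hop):
    """Frame the signal with the given frame shift (hop), window each frame,
    and FFT it; returns one complex half-spectrum per frame."""
    window = hanning_window(frame_len)
    spectra = []
    for start in range(0, len(x) - frame_len + 1, hop):
        spectra.append(np.fft.rfft(x[start:start + frame_len] * window))
    return np.array(spectra)
```

With a 64-sample signal, a 16-sample frame, and a hop of 8, this yields 7 frames of 9 complex bins each.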
The initial frequency domain parameters of the voice data can be obtained through the method.
Step S103, obtaining target frequency domain parameters corresponding to the preset target MIDI audio.
The target MIDI audio in the embodiment of the present application may include the pitch information to which the voice data is to be shifted, and may have a duration equal to that of the voice data; it serves as the reference for the tone modification of the voice data. It will be appreciated that the voice data requiring tone modification may be determined first and the target MIDI audio serving as the basis for the modification determined afterward; alternatively, the target MIDI audio may be determined first and voice data of equal duration selected according to its duration.
In this embodiment, the target MIDI audio may be a file in MIDI (Musical Instrument Digital Interface) format, which records, along a time axis, pitch information at different time points, the durations of different pitches, and the start and stop times of different pitches. By reading the pitch information of the target MIDI audio, the pitch to which the voice data needs to be shifted can be determined, and the frequencies corresponding to the different pitches follow from the pitch-to-frequency conversion relationship.
It will be appreciated that in obtaining the speech data and the target MIDI audio, the start position of the transposition required and the corresponding target pitch to which the transposition is required may be determined first.
In detail, as shown in fig. 4, target frequency domain parameters of target MIDI audio may be determined by the following sub-steps.
And a substep S1031 of generating a target waveform having the same pitch as the target frequency and having the same duration as the voice data corresponding to the target frequency.
The frequency of the target waveform is the same as the frequency of a preset target frequency in the target MIDI audio, and the duration of the target waveform is equal to the duration of voice data corresponding to the preset target frequency.
As mentioned above, the target MIDI audio includes different pitch information, and the frequencies corresponding to different pitches can be determined according to the conversion relationship between the pitches and the frequencies, and these frequencies are the preset target frequencies included in the target MIDI audio. The frequency of the generated target waveform is the same as the frequency of the preset target frequency in the target MIDI audio, a plurality of preset target frequencies may be included in one target MIDI audio, target waveforms corresponding to the plurality of preset target frequencies may be generated, respectively, and the durations of the target waveforms are equal to the durations of voices at corresponding positions in the voice data, respectively.
The target waveform may be chosen according to actual needs; for example, a sine wave or a deformation of a sine wave may be generated as the target waveform, because the vibration of the human vocal cords directly produces nearly sinusoidal sound, and the vocal-cord vibration during speech resembles a sine-type waveform. When performing the tone-modification operation on all of the voice data, a targeted waveform can be selected for the voice at different time points: a sine wave can be chosen as the target waveform at every time point, or different target waveforms can be generated for the voice data at different time points. Different target waveforms correspond to different timbres, and therefore to different listening experiences.
In detail, the target waveform may be generated by the following method.
First, the number of sampling points in one period of the target waveform at the target pitch is obtained, calculated by the following formula.
Len=Fs/F
Where Len is the number of sampling points corresponding to one period of the target waveform, Fs is the sampling rate, and F is the target frequency.
Then, the sampling interval is calculated.
delta1=(4*π)/Len
delta2=(2*π)/Len
Then the sample values of the different target waveforms are calculated. Reference timbre 1 can be expressed as:
y[n]=(sin(-3*π+n*delta1))/(-3*π+n*delta1)
the reference timbre 2 can be expressed as:
y[n]=(sin(n*delta2)+abs(sin(n*delta2))*alpha)/(1+alpha)
where y holds the sample values of one period of the waveform, n is the sample index (0 ≤ n < Len), abs() takes the absolute value, and 0 < alpha < 1. Repeating the one-period data yields waveform sample data of the same length as the target voice.
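The period generation and tiling described above can be sketched as follows (illustrative Python; the guard against division by zero in the timbre-1 sinc-like expression is an addition of this sketch, since the formula is undefined where −3π + n·delta1 = 0):

```python
import numpy as np

def waveform_period(fs, f, timbre=2, alpha=0.5):
    """One period (Len = Fs/F samples) of a reference-timbre waveform."""
    length = int(round(fs / f))                        # Len = Fs / F
    n = np.arange(length)
    if timbre == 1:
        t = -3 * np.pi + n * (4 * np.pi / length)      # delta1 = 4*pi/Len
        # sin(t)/t, with the removable singularity at t == 0 set to 1
        y = np.where(t == 0, 1.0, np.sin(t) / np.where(t == 0, 1.0, t))
    else:
        s = np.sin(n * (2 * np.pi / length))           # delta2 = 2*pi/Len
        y = (s + np.abs(s) * alpha) / (1 + alpha)      # 0 < alpha < 1
    return y

def target_waveform(fs, f, num_samples, **kwargs):
    """Repeat the one-period data until it matches the target voice length."""
    period = waveform_period(fs, f, **kwargs)
    reps = -(-num_samples // len(period))              # ceiling division
    return np.tile(period, reps)[:num_samples]
```

For instance, at fs = 8000 Hz and f = 100 Hz, one period is 80 samples, tiled to the voice segment's length.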
And a substep S1032 of extracting a phase value of the target waveform.
After the corresponding target waveform is generated, it may first be framed and windowed so that its frame length matches that of the voice data, and then short-time Fourier transformed; the phase value of each transformed frame of the target waveform is extracted as the target frequency domain parameter of the target MIDI audio.
And step S104, modifying the initial frequency domain parameters according to the target frequency domain parameters, and transforming the pitch in the voice data to a target pitch in the target MIDI audio to obtain the tone-modified voice data.
After the target frequency domain parameters are obtained through the above steps, they can replace the initial frequency domain parameters of the voice data, thereby modifying the initial parameters. Specifically, the initial phase of the voice data is replaced with the phase value of the corresponding target waveform. Since voice data contains both unvoiced and voiced sound, and unvoiced sound has no periodicity, replacing the initial phase of unvoiced frames as well would degrade the tone-modified result. In the embodiment of the present application, the phase value may therefore be replaced only for frames corresponding to voiced sound; the phase of unvoiced sound is not replaced, and the voice data corresponding to unvoiced sound keeps its original phase value.
In detail, the phase value of the voice data at the position corresponding to the target waveform may be replaced with the phase value of the target waveform, so as to obtain the frequency domain parameters of the tone-modified voice data.
An inverse Fourier transform is then performed on the frequency domain parameters of the tone-modified voice data, and the result is processed with an overlap-add (OLA) algorithm to obtain the tone-modified voice data, which can then be output, stored, and so on.
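The voiced-only phase replacement followed by inverse FFT and OLA resynthesis might look like the sketch below (the voiced/unvoiced mask is assumed to come from a separate detector not detailed here, and the Hann window with squared-window normalization is an illustrative choice, not specified in the patent):

```python
import numpy as np

def resynthesize(mag_frames, phase_frames, target_phases, voiced,
                 frame_len, hop):
    """Replace the phase of voiced frames with the target-waveform
    phase, keep the original phase for unvoiced frames, then
    inverse-FFT each frame and overlap-add (OLA) back to a signal."""
    n = len(mag_frames)
    win = np.hanning(frame_len)
    out = np.zeros((n - 1) * hop + frame_len)
    norm = np.zeros_like(out)
    for i in range(n):
        # voiced frames take the target phase; unvoiced keep their own
        phase = target_phases[i] if voiced[i] else phase_frames[i]
        spec = mag_frames[i] * np.exp(1j * phase)
        frame = np.fft.irfft(spec, frame_len) * win
        out[i * hop : i * hop + frame_len] += frame
        norm[i * hop : i * hop + frame_len] += win ** 2
    return out / np.maximum(norm, 1e-8)  # undo window overlap gain
```

With unchanged phases this pipeline reconstructs the analysed signal, which is a useful sanity check before enabling the phase substitution.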
An embodiment of the present application further provides a voice data processing apparatus 300, as shown in fig. 5, including:
a data obtaining module 310, configured to obtain voice data and target MIDI audio, where the voice data includes voice aligned with the target MIDI audio;
a voice data processing module 320, configured to obtain initial frequency domain parameters of the voice data;
a target MIDI audio processing module 330, configured to obtain target frequency domain parameters corresponding to a preset target MIDI audio, where the initial frequency domain parameters include an initial phase of the voice data, and the target frequency domain parameters include a target phase corresponding to the target MIDI audio;
and the transposition module 340 is configured to modify the initial frequency domain parameter according to the target frequency domain parameter, and transform a pitch in the voice data to a target pitch in the target MIDI audio to obtain transposed voice data.
It is understood that the method for the voice data processing module 320 to obtain the initial frequency domain parameters of the voice data includes:
performing zero point drift removal and pre-emphasis processing on the voice data;
and performing time-frequency conversion on the voice data subjected to zero point drift removal and pre-emphasis processing to obtain a frequency domain parameter of each frame of the voice data.
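The two preprocessing steps above can be sketched minimally as follows (the pre-emphasis coefficient 0.97 is a conventional choice assumed here, not given in the text; removing the mean is one simple way to remove zero-point drift):

```python
import numpy as np

def preprocess(x, coeff=0.97):
    """Remove zero-point (DC) drift by subtracting the mean, then
    apply first-order pre-emphasis: y[n] = x[n] - coeff * x[n-1]."""
    x = x - np.mean(x)                 # de-drift
    y = np.empty_like(x)
    y[0] = x[0]
    y[1:] = x[1:] - coeff * x[:-1]     # boost high frequencies
    return y
```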
In this embodiment, the step of performing time-frequency conversion, by the voice data processing module 320, on the voice data subjected to the zero point drift removal and pre-emphasis processing includes:
calculating the frame shift of each frame in the voice data;
framing and windowing the voice data according to the frame shift obtained by calculation and a preset window function;
and carrying out Fourier transform on each frame of voice data subjected to framing and windowing to obtain the frequency domain parameter of each frame in the voice data.
In this embodiment, the step of the voice data processing module 320 calculating the frame shift of each frame in the voice data includes:
dividing a sampling rate by a target frequency to obtain a frame shift of each frame, wherein the target frequency is the frequency of the target MIDI audio, and the target frequency is calculated by adopting the following formula:
F = 440 x 2^((MIDINote - 69) / 12)
where F is the target frequency of the target MIDI audio, MIDINote is the pitch value included in the target MIDI audio.
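The role described for the formula, with F the target frequency and MIDINote the pitch value, corresponds to the standard MIDI-note-to-frequency conversion (A4 = note 69 = 440 Hz). A sketch of it together with the frame-shift computation described above (function names are illustrative):

```python
import math

def midi_to_freq(midi_note):
    """Standard MIDI-note-to-frequency conversion, A4 (note 69) = 440 Hz."""
    return 440.0 * 2.0 ** ((midi_note - 69) / 12.0)

def frame_shift(sample_rate, midi_note):
    """Frame shift = sampling rate divided by the target frequency,
    i.e. the number of samples in one pitch period."""
    return sample_rate / midi_to_freq(midi_note)
```

For instance, at a 44100 Hz sampling rate and MIDI note 69 (440 Hz), the frame shift is about 100 samples per pitch period.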
In this embodiment, a target frequency of the sound is recorded in the target MIDI audio, and the method by which the transposition module 340 modifies the frequency domain parameters of the voice data according to the frequency domain parameters of the target MIDI audio includes:
generating a target waveform having the same pitch as the target frequency and the same duration as the voice data corresponding to the target frequency;
extracting phase values of the target waveform;
replacing the phase value of the voice data at the position corresponding to the target waveform in the voice data with the phase value of the target waveform to obtain the frequency domain parameter of the voice data after tone modification;
and carrying out an inverse Fourier transform on the frequency domain parameters of the tone-modified voice data, and processing the result through an overlap-add (OLA) algorithm to obtain the tone-modified voice data.
In the embodiment of the application, a target waveform corresponding to the voice data is generated from the target MIDI audio based on the pitch information it contains, and the phase value of the target waveform is used to replace the phase value of the voice in the voice data. The frequency domain parameters of the voice data are thereby modified into the frequency domain parameters corresponding to the target MIDI audio, so that the voice data takes on the pitch characteristics of the target MIDI audio, realizing tone modification of the voice data. Because the phase value of the voice data is replaced rather than set to zero, phase discontinuity and mechanical sound can be avoided while the tone modification is achieved. Meanwhile, since the target waveform supplies the replacement phase values, the tone-modified voice data can take on the sound characteristics of the target waveform, giving the modified voice the timbre of the target waveform.
In summary, by modifying the frequency domain parameters of the voice data with the frequency domain parameters of the target MIDI audio, the voice in the voice data takes on the frequency domain parameters, and hence the pitch characteristics, of the target MIDI audio. Tone modification of the voice data is thus realized without changing the speed or duration of the voice. The phase of the tone-modified voice data is continuous, so no noise appears and mechanical sound is avoided, giving a better tone-modification effect. The method can be applied to correcting pitch in songs, converting speech into singing, and the like, and has good application prospects in the field of speech processing.
The method is obtained by improving the traditional zero-phase-based pitch-shifting algorithm: by adding the phase values of a waveform of the same frequency, the phase-discontinuity and mechanical-sound problems are alleviated. Meanwhile, the added waveform contributes some timbre information to the original voice, so adding different waveforms yields different tone-modification results, increasing the diversity of tone modification. In application, each user can obtain a personalized tone-modification result by selecting a waveform, which gives the method a good practical background. Compared with the traditional zero-phase-based method, the mechanical-sound problem is better mitigated; compared with the traditional time-domain methods, the phase continuity is obviously improved.
The method provided by the embodiment of the application can be combined with a voice speed-changing method and, together with mixing techniques, can automatically synthesize singing by combining the tone-modified dry voice with background music. Because the tone-modification algorithm supports personalization, personalized singing-voice synthesis can be realized. Different singing-voice synthesis outputs can be controlled through different added waveforms, and since the waveform is user-selectable, users can choose different effects according to their own preferences, improving the practicability of the method.
In the embodiments provided in the present application, it should be understood that the disclosed apparatus and method can be implemented in other ways. The apparatus embodiments described above are merely illustrative, and for example, the flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of apparatus, methods and computer program products according to various embodiments of the present invention. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
In addition, the functional modules in the embodiments of the present invention may be integrated together to form an independent part, or each module may exist separately, or two or more modules may be integrated to form an independent part.
The functions, if implemented in the form of software functional modules and sold or used as a stand-alone product, may be stored in a computer readable storage medium. Based on such understanding, the technical solution of the present invention may be embodied in the form of a software product, which is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present invention. And the aforementioned storage medium includes: a U-disk, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk or an optical disk, and other various media capable of storing program codes. It is noted that, herein, relational terms such as first and second, and the like may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising an … …" does not exclude the presence of other identical elements in a process, method, article, or apparatus that comprises the element.
The above description is only a preferred embodiment of the present invention and is not intended to limit the present invention, and various modifications and changes may be made by those skilled in the art. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention should be included in the protection scope of the present invention. It should be noted that: like reference numbers and letters refer to like items in the following figures, and thus, once an item is defined in one figure, it need not be further defined and explained in subsequent figures.
The above description is only for the specific embodiments of the present invention, but the scope of the present invention is not limited thereto, and any person skilled in the art can easily conceive of the changes or substitutions within the technical scope of the present invention, and all the changes or substitutions should be covered within the scope of the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.

Claims (12)

1. A method for processing voice data, comprising:
acquiring voice data and target MIDI audio, wherein the voice data comprises voice aligned with the target MIDI audio;
obtaining initial frequency domain parameters of the voice data;
obtaining target frequency domain parameters corresponding to preset target MIDI audio, wherein the initial frequency domain parameters comprise initial phases of the voice data, and the target frequency domain parameters comprise target phases corresponding to the target MIDI audio;
modifying the initial frequency domain parameters according to the target frequency domain parameters, and transforming the pitch in the voice data to a target pitch in the target MIDI audio to obtain the modulated voice data;
the step of modifying the initial frequency domain parameters according to the target frequency domain parameters comprises:
replacing an initial phase of voiced speech in the speech data with a phase value of a corresponding target waveform.
2. The speech data processing method of claim 1, wherein the step of obtaining initial frequency domain parameters of the speech data comprises:
acquiring voice data in the voice data at the time corresponding to the target pitch;
performing zero point drift removal and pre-emphasis processing on the voice data within the time corresponding to the target pitch;
and performing time-frequency conversion on the voice data subjected to zero point drift removal and pre-emphasis processing to obtain a frequency domain parameter of each frame of the voice data.
3. The speech data processing method according to claim 2, wherein the step of performing time-frequency conversion on the speech data subjected to the zero point drift elimination and pre-emphasis processing comprises:
calculating the frame shift of each frame in the voice data;
framing and windowing the voice data according to the frame shift obtained by calculation and a preset window function;
and carrying out Fourier transform on each frame of voice data subjected to framing and windowing to obtain the frequency domain parameter of each frame in the voice data.
4. The method of claim 3, wherein the step of calculating the frame shift for each frame in the speech data comprises:
dividing a sampling rate by a target frequency to obtain a frame shift of each frame, wherein the target frequency is the frequency of the target MIDI audio, and the target frequency is calculated by adopting the following formula:
F = 440 x 2^((MIDINote - 69) / 12)
where F is the target frequency of the target MIDI audio and MIDINote is the pitch value included in the target MIDI audio.
5. The method of claim 1, wherein the target MIDI audio is recorded with a target frequency of a sound, and the step of obtaining target frequency domain parameters corresponding to a preset target MIDI audio comprises:
generating a target waveform having the same pitch as the target frequency and the same duration as the voice data corresponding to the target frequency;
extracting a phase value of the target waveform as the target frequency domain parameter;
correspondingly, the step of modifying the frequency domain parameters of the voice data according to the frequency domain parameters of the target MIDI audio comprises:
replacing the phase value of the voice data at the position corresponding to the target waveform in the voice data with the phase value of the target waveform to obtain the frequency domain parameter of the voice data after tone modification;
and carrying out an inverse Fourier transform on the frequency domain parameters of the tone-modified voice data, and processing the result through an overlap-add (OLA) algorithm to obtain the tone-modified voice data.
6. A speech data processing apparatus, comprising:
the data acquisition module is used for acquiring voice data and target MIDI audio, wherein the voice data comprises voice aligned with the target MIDI audio;
the voice data processing module is used for obtaining initial frequency domain parameters of the voice data;
a target MIDI audio processing module for obtaining target frequency domain parameters corresponding to preset target MIDI audio, wherein the initial frequency domain parameters comprise initial phases of the voice data, and the target frequency domain parameters comprise target phases corresponding to the target MIDI audio;
a tone modification module, configured to modify the initial frequency domain parameter according to the target frequency domain parameter, and transform a pitch in the voice data to a target pitch in the target MIDI audio, to obtain tone-modified voice data;
the transposition module is further used for replacing the initial phase of the voiced sound in the voice data with the phase value of the corresponding target waveform.
7. The apparatus according to claim 6, wherein the means for obtaining the initial frequency-domain parameters of the speech data by the speech data processing module comprises:
performing zero point drift removal and pre-emphasis processing on the voice data;
and performing time-frequency conversion on the voice data subjected to zero point drift removal and pre-emphasis processing to obtain a frequency domain parameter of each frame of the voice data.
8. The apparatus as claimed in claim 7, wherein the step of performing time-frequency conversion, by the voice data processing module, on the voice data subjected to the zero point drift removal and pre-emphasis processing comprises:
calculating the frame shift of each frame in the voice data;
framing and windowing the voice data according to the frame shift obtained by calculation and a preset window function;
and carrying out Fourier transform on each frame of voice data subjected to framing and windowing to obtain the frequency domain parameter of each frame in the voice data.
9. The speech data processing device of claim 7, wherein the step of the speech data processing module calculating a frame shift for each frame of the speech data comprises:
dividing a sampling rate by a target frequency to obtain a frame shift of each frame, wherein the target frequency is the frequency of the target MIDI audio, and the target frequency is calculated by adopting the following formula:
F = 440 x 2^((MIDINote - 69) / 12)
where F is the target frequency of the target MIDI audio, MIDINote is the pitch value included in the target MIDI audio.
10. The apparatus of claim 6, wherein the target MIDI audio is recorded with a target frequency of a sound, and the method for obtaining the target frequency domain parameters corresponding to the preset target MIDI audio by the target MIDI audio processing module comprises:
generating a target waveform having the same pitch as the target frequency and the same duration as the voice data corresponding to the target frequency;
extracting a phase value of the target waveform as the target frequency domain parameter;
correspondingly, the method for modifying the frequency domain parameters of the voice data according to the frequency domain parameters of the target MIDI audio by the transposition module comprises the following steps:
replacing the phase value of the voice data at the position corresponding to the target waveform in the voice data with the phase value of the target waveform to obtain the frequency domain parameter of the voice data after tone modification;
and carrying out an inverse Fourier transform on the frequency domain parameters of the tone-modified voice data, and processing the result through an overlap-add (OLA) algorithm to obtain the tone-modified voice data.
11. An electronic device, characterized in that the electronic device comprises: a processor and a memory coupled to the processor, the memory storing instructions that, when executed by the processor, cause the electronic device to:
acquiring voice data and target MIDI audio, wherein the voice data comprises voice aligned with the target MIDI audio;
obtaining initial frequency domain parameters of the voice data;
obtaining target frequency domain parameters corresponding to preset target MIDI audio, wherein the initial frequency domain parameters comprise initial phases of the voice data, and the target frequency domain parameters comprise target phases corresponding to the target MIDI audio;
modifying the initial frequency domain parameters according to the target frequency domain parameters, and transforming the pitch in the voice data to a target pitch in the target MIDI audio to obtain the modulated voice data;
the step of modifying the initial frequency domain parameters according to the target frequency domain parameters comprises:
replacing an initial phase of voiced speech in the speech data with a phase value of a corresponding target waveform.
12. A readable storage medium comprising a computer program, wherein the computer program controls an electronic device where the readable storage medium is located to execute the voice data processing method according to any one of claims 1 to 5 when the computer program runs.
CN201810049575.0A 2018-01-18 2018-01-18 Voice data processing method and device, electronic equipment and readable storage medium Active CN108269579B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810049575.0A CN108269579B (en) 2018-01-18 2018-01-18 Voice data processing method and device, electronic equipment and readable storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201810049575.0A CN108269579B (en) 2018-01-18 2018-01-18 Voice data processing method and device, electronic equipment and readable storage medium

Publications (2)

Publication Number Publication Date
CN108269579A CN108269579A (en) 2018-07-10
CN108269579B true CN108269579B (en) 2020-11-10

Family

ID=62776086

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810049575.0A Active CN108269579B (en) 2018-01-18 2018-01-18 Voice data processing method and device, electronic equipment and readable storage medium

Country Status (1)

Country Link
CN (1) CN108269579B (en)

Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109697985B (en) * 2018-12-25 2021-06-29 广州市百果园信息技术有限公司 Voice signal processing method and device and terminal
CN111739544B (en) * 2019-03-25 2023-10-20 Oppo广东移动通信有限公司 Voice processing method, device, electronic equipment and storage medium
CN112309425A (en) * 2020-10-14 2021-02-02 浙江大华技术股份有限公司 Sound tone changing method, electronic equipment and computer readable storage medium
CN112420062A (en) * 2020-11-18 2021-02-26 腾讯音乐娱乐科技(深圳)有限公司 Audio signal processing method and device
CN114449339B (en) * 2022-02-16 2024-04-12 深圳万兴软件有限公司 Background sound effect conversion method and device, computer equipment and storage medium

Citations (19)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1164084A (en) * 1995-12-28 1997-11-05 日本胜利株式会社 Sound pitch converting apparatus
CN1283060A (en) * 1999-07-28 2001-02-07 雅马哈株式会社 Pronounciation control device and terminal device and system used on carried pronounciation control device
CN1473325A (en) * 2001-08-31 2004-02-04 Pitch waveform signal generation apparatus, pitch waveform signal generation method, and program
CN101015451A (en) * 2007-02-13 2007-08-15 电子科技大学 Music brain electricity analytical method
CN101267686A (en) * 2007-03-12 2008-09-17 雅马哈株式会社 Speaker array apparatus and signal processing method therefor
CN101354889A (en) * 2008-09-18 2009-01-28 北京中星微电子有限公司 Method and apparatus for tonal modification of voice
CN101652807A (en) * 2007-02-01 2010-02-17 缪斯亚米有限公司 Music transcription
CN1831940B (en) * 2006-04-07 2010-06-23 安凯(广州)微电子技术有限公司 Tune and rhythm quickly regulating method based on audio-frequency decoder
CN101894563A (en) * 2010-07-15 2010-11-24 瑞声声学科技(深圳)有限公司 Voice enhancing method
CN102870153A (en) * 2010-02-26 2013-01-09 弗兰霍菲尔运输应用研究公司 Apparatus and method for modifying an audio signal using harmonic locking
CN103514883A (en) * 2013-09-26 2014-01-15 华南理工大学 Method for achieving self-adaptive switching of male voice and female voice
CN104409073A (en) * 2014-11-04 2015-03-11 贵阳供电局 Substation equipment sound and voice identification method
CN104780091A (en) * 2014-01-13 2015-07-15 北京发现角科技有限公司 Instant messaging method and instant messaging system with speech and audio processing function
CN105654941A (en) * 2016-01-20 2016-06-08 华南理工大学 Voice change method and device based on specific target person voice change ratio parameter
CN105788589A (en) * 2016-05-04 2016-07-20 腾讯科技(深圳)有限公司 Audio data processing method and device
CN106228973A (en) * 2016-07-21 2016-12-14 福州大学 Stablize the music voice modified tone method of tone color
CN106297770A (en) * 2016-08-04 2017-01-04 杭州电子科技大学 The natural environment sound identification method extracted based on time-frequency domain statistical nature
CN106328111A (en) * 2016-08-22 2017-01-11 广州酷狗计算机科技有限公司 Audio processing method and audio processing device
CN107170464A (en) * 2017-05-25 2017-09-15 厦门美图之家科技有限公司 A kind of changing speed of sound method and computing device based on music rhythm

Family Cites Families (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP1199711A1 (en) * 2000-10-20 2002-04-24 Telefonaktiebolaget Lm Ericsson Encoding of audio signal using bandwidth expansion
US8600737B2 (en) * 2010-06-01 2013-12-03 Qualcomm Incorporated Systems, methods, apparatus, and computer program products for wideband speech coding
KR101466843B1 (en) * 2010-11-02 2014-11-28 에스케이텔레콤 주식회사 System and method for improving sound quality in data delivery communication by means of transform of audio signal, apparatus applied to the same
US8923829B2 (en) * 2012-12-28 2014-12-30 Verizon Patent And Licensing Inc. Filtering and enhancement of voice calls in a telecommunications network
CN104599677B (en) * 2014-12-29 2018-03-09 中国科学院上海高等研究院 Transient noise suppressing method based on speech reconstructing
EP3113175A1 (en) * 2015-07-02 2017-01-04 Thomson Licensing Method for converting text to individual speech, and apparatus for converting text to individual speech

Patent Citations (21)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1164084A (en) * 1995-12-28 1997-11-05 日本胜利株式会社 Sound pitch converting apparatus
CN1135531C (en) * 1995-12-28 2004-01-21 日本胜利株式会社 Sound pitch converting apparatus
CN1283060A (en) * 1999-07-28 2001-02-07 雅马哈株式会社 Pronounciation control device and terminal device and system used on carried pronounciation control device
CN1473325A (en) * 2001-08-31 2004-02-04 Pitch waveform signal generation apparatus, pitch waveform signal generation method, and program
CN1831940B (en) * 2006-04-07 2010-06-23 安凯(广州)微电子技术有限公司 Tune and rhythm quickly regulating method based on audio-frequency decoder
CN101652807A (en) * 2007-02-01 2010-02-17 缪斯亚米有限公司 Music transcription
CN101015451A (en) * 2007-02-13 2007-08-15 电子科技大学 Music brain electricity analytical method
CN101267686A (en) * 2007-03-12 2008-09-17 雅马哈株式会社 Speaker array apparatus and signal processing method therefor
CN101354889A (en) * 2008-09-18 2009-01-28 北京中星微电子有限公司 Method and apparatus for tonal modification of voice
CN102870153A (en) * 2010-02-26 2013-01-09 弗兰霍菲尔运输应用研究公司 Apparatus and method for modifying an audio signal using harmonic locking
CN101894563A (en) * 2010-07-15 2010-11-24 瑞声声学科技(深圳)有限公司 Voice enhancing method
CN103514883A (en) * 2013-09-26 2014-01-15 华南理工大学 Method for achieving self-adaptive switching of male voice and female voice
CN103514883B (en) * 2013-09-26 2015-12-02 华南理工大学 A kind of self-adaptation realizes men and women's sound changing method
CN104780091A (en) * 2014-01-13 2015-07-15 北京发现角科技有限公司 Instant messaging method and instant messaging system with speech and audio processing function
CN104409073A (en) * 2014-11-04 2015-03-11 贵阳供电局 Substation equipment sound and voice identification method
CN105654941A (en) * 2016-01-20 2016-06-08 华南理工大学 Voice change method and device based on specific target person voice change ratio parameter
CN105788589A (en) * 2016-05-04 2016-07-20 腾讯科技(深圳)有限公司 Audio data processing method and device
CN106228973A (en) * 2016-07-21 2016-12-14 福州大学 Stablize the music voice modified tone method of tone color
CN106297770A (en) * 2016-08-04 2017-01-04 杭州电子科技大学 The natural environment sound identification method extracted based on time-frequency domain statistical nature
CN106328111A (en) * 2016-08-22 2017-01-11 广州酷狗计算机科技有限公司 Audio processing method and audio processing device
CN107170464A (en) * 2017-05-25 2017-09-15 厦门美图之家科技有限公司 A kind of changing speed of sound method and computing device based on music rhythm

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
"Research on an Audio Time-Scale Modification Algorithm with an Improved Phase Vocoder"; Wang Shinong et al.; Computer Engineering and Applications; Dec. 31, 2012; pp. 155-159 *

Also Published As

Publication number Publication date
CN108269579A (en) 2018-07-10

Similar Documents

Publication Publication Date Title
CN108269579B (en) Voice data processing method and device, electronic equipment and readable storage medium
JP5425952B2 (en) Apparatus and method for operating audio signal having instantaneous event
KR101492702B1 (en) Apparatus and method for modifying an audio signal using harmonic locking
US8706496B2 (en) Audio signal transforming by utilizing a computational cost function
Amatriain et al. Spectral processing
JP6791258B2 (en) Speech synthesis method, speech synthesizer and program
EP2401740A1 (en) Apparatus and method for determining a plurality of local center of gravity frequencies of a spectrum of an audio signal
Välimäki et al. Creating endless sounds
Ottosen et al. A phase vocoder based on nonstationary Gabor frames
JP2018077283A (en) Speech synthesis method
JP7359164B2 (en) Sound signal synthesis method and neural network training method
Verfaille et al. Adaptive digital audio effects
CN113257211B (en) Audio adjusting method, medium, device and computing equipment
US10319353B2 (en) Method for audio sample playback using mapped impulse responses
Rai et al. Analysis of three pitch-shifting algorithms for different musical instruments
Royer Pitch-shifting algorithm design and applications in music
JP6834370B2 (en) Speech synthesis method
Zivanovic Harmonic bandwidth companding for separation of overlapping harmonics in pitched signals
JP4468506B2 (en) Voice data creation device and voice quality conversion method
JP2000010597A (en) Speech transforming device and method therefor
JP4419486B2 (en) Speech analysis generation apparatus and program
JP2018077280A (en) Speech synthesis method
Cheng Design of a pitch quantization and pitch correction system for real-time music effects signal processing
CA2821035A1 (en) Device and method for manipulating an audio signal having a transient event
Esquef et al. Spectral-based analysis and synthesis of audio signals

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant