CN108269579A - Voice data processing method and device, electronic device, and computer-readable storage medium - Google Patents
- Publication number
- CN108269579A CN108269579A CN201810049575.0A CN201810049575A CN108269579A CN 108269579 A CN108269579 A CN 108269579A CN 201810049575 A CN201810049575 A CN 201810049575A CN 108269579 A CN108269579 A CN 108269579A
- Authority
- CN
- China
- Prior art keywords
- voice data
- target
- frequency
- audios
- domain parameter
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L21/00—Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
- G10L21/003—Changing voice quality, e.g. pitch or formants
- G10L21/007—Changing voice quality, e.g. pitch or formants characterised by the process used
- G10L21/013—Adapting to target pitch
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L21/00—Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
- G10L21/003—Changing voice quality, e.g. pitch or formants
- G10L21/007—Changing voice quality, e.g. pitch or formants characterised by the process used
- G10L21/013—Adapting to target pitch
- G10L2021/0135—Voice conversion or morphing
Abstract
The present invention provides a voice data processing method and device, an electronic device, and a computer-readable storage medium, relating to the technical field of data processing. The method obtains the initial frequency-domain parameters of voice data, obtains target frequency-domain parameters corresponding to a preset target MIDI audio, and then modifies the initial frequency-domain parameters according to the target frequency-domain parameters to obtain pitch-shifted voice data. The speech in the voice data thereby takes on the frequency-domain parameters of the target MIDI audio, so that the pitch-shifted voice data has the pitch parameters of the target MIDI audio. This achieves a pitch-shifting operation on the voice data without changing its speech rate or duration. The phase of the pitch-shifted voice data is continuous, so no noise appears, mechanical-sounding artifacts are avoided, and the pitch-shifting effect is better. The method can be applied to pitch correction in songs, conversion of speech into song, and the like, and has a high application prospect in the field of acoustic processing.
Description
Technical field
The present invention relates to the technical field of data processing, and in particular to a voice data processing method and device, an electronic device, and a computer-readable storage medium.
Background technology
Pitch shifting of speech changes a speaker's intonation through an algorithm without changing the speech rate of the audio file, and includes shifting intonation as well as shifting speech to a specific pitch. Existing pitch-shifting processing suffers from phase discontinuities and introduces noise.
Invention content
In view of this, the present invention provides a voice data processing method and device, an electronic device, and a computer-readable storage medium, which can solve the above problems and keep the phase of the pitch-shifted voice continuous.
The technical solution provided by the invention is as follows:
A voice data processing method, including:
obtaining voice data and a target MIDI audio, the voice data including speech aligned with the target MIDI audio;
obtaining initial frequency-domain parameters of the voice data;
obtaining target frequency-domain parameters corresponding to the preset target MIDI audio, wherein the initial frequency-domain parameters include an initial phase of the voice data, and the target frequency-domain parameters include a target phase corresponding to the target MIDI audio;
modifying the initial frequency-domain parameters according to the target frequency-domain parameters, so as to shift the pitch in the voice data to a target pitch in the target MIDI audio and obtain pitch-shifted voice data.
Further, the step of obtaining the initial frequency-domain parameters of the voice data includes:
obtaining time-domain voice data corresponding to the target pitch in the voice data;
performing zero-offset removal and pre-emphasis processing on the time-domain voice data corresponding to the target pitch;
performing a time-frequency transform on the voice data after zero-offset removal and pre-emphasis processing, to obtain the frequency-domain parameters of each frame of the voice data.
Further, the step of performing a time-frequency transform on the voice data after zero-offset removal and pre-emphasis processing includes:
calculating a frame shift for each frame of the voice data;
performing framing and windowing on the voice data according to the calculated frame shifts and a preset window function;
performing a Fourier transform on each frame of the framed and windowed voice data, to obtain the frequency-domain parameters of each frame of the voice data.
Further, the step of calculating the frame shift of each frame of the voice data includes:
obtaining the frame shift of each frame by dividing the sample rate by the target frequency, wherein the target frequency is the frequency of the target MIDI audio and is calculated by the following formula:
F = 110 × 2^((MIDINote − 45) / 12)
where F is the target frequency of the target MIDI audio and MIDINote is the pitch value contained in the target MIDI audio.
Further, the target MIDI audio records the target frequency of a sound, and the step of obtaining the target frequency-domain parameters corresponding to the preset target MIDI audio includes:
generating a target waveform with the same pitch as the target frequency and the same duration as the voice data corresponding to the target frequency;
extracting the phase values of the target waveform as the target frequency-domain parameters.
Correspondingly, the step of modifying the frequency-domain parameters of the voice data according to the frequency-domain parameters of the target MIDI audio includes:
replacing the phase values of the voice data at the positions corresponding to the target waveform with the phase values of the target waveform, to obtain the frequency-domain parameters of the pitch-shifted voice data;
performing an inverse Fourier transform on the frequency-domain parameters of the pitch-shifted voice data, and obtaining the pitch-shifted voice data after overlap-add (OLA) processing.
The present invention also provides a voice data processing device, including:
a data acquisition module, configured to obtain voice data and a target MIDI audio, the voice data including speech aligned with the target MIDI audio;
a voice data processing module, configured to obtain the initial frequency-domain parameters of the voice data;
a target MIDI audio processing module, configured to obtain target frequency-domain parameters corresponding to the preset target MIDI audio, wherein the initial frequency-domain parameters include an initial phase of the voice data, and the target frequency-domain parameters include a target phase corresponding to the target MIDI audio;
a pitch-shifting module, configured to modify the initial frequency-domain parameters according to the target frequency-domain parameters, so as to shift the pitch in the voice data to a target pitch in the target MIDI audio and obtain pitch-shifted voice data.
Further, the method by which the voice data processing module obtains the initial frequency-domain parameters of the voice data includes:
performing zero-offset removal and pre-emphasis processing on the voice data;
performing a time-frequency transform on the voice data after zero-offset removal and pre-emphasis processing, to obtain the frequency-domain parameters of each frame of the voice data.
Further, the step by which the voice data processing module performs a time-frequency transform on the voice data after zero-offset removal and pre-emphasis processing includes:
calculating a frame shift for each frame of the voice data;
performing framing and windowing on the voice data according to the calculated frame shifts and a preset window function;
performing a Fourier transform on each frame of the framed and windowed voice data, to obtain the frequency-domain parameters of each frame of the voice data.
Further, the step by which the voice data processing module calculates the frame shift of each frame of the voice data includes:
obtaining the frame shift of each frame by dividing the sample rate by the target frequency, wherein the target frequency is the frequency of the target MIDI audio and is calculated by the following formula:
F = 110 × 2^((MIDINote − 45) / 12)
where F is the target frequency of the target MIDI audio and MIDINote is the pitch value contained in the target MIDI audio.
Further, the target MIDI audio records the target frequency of a sound, and the method by which the target MIDI audio processing module obtains the target frequency-domain parameters corresponding to the preset target MIDI audio includes:
generating a target waveform with the same pitch as the target frequency and the same duration as the voice data corresponding to the target frequency;
extracting the phase values of the target waveform as the target frequency-domain parameters.
Correspondingly, the method by which the pitch-shifting module modifies the frequency-domain parameters of the voice data according to the frequency-domain parameters of the target MIDI audio includes:
replacing the phase values of the voice data at the positions corresponding to the target waveform with the phase values of the target waveform, to obtain the frequency-domain parameters of the pitch-shifted voice data;
performing an inverse Fourier transform on the frequency-domain parameters of the pitch-shifted voice data, and obtaining the pitch-shifted voice data after overlap-add (OLA) processing.
The present invention also provides an electronic device, including a processor and a memory coupled to the processor, the memory storing instructions which, when executed by the processor, cause the electronic device to perform the following operations:
obtaining voice data and a target MIDI audio, the voice data including speech aligned with the target MIDI audio;
obtaining initial frequency-domain parameters of the voice data;
obtaining target frequency-domain parameters corresponding to the preset target MIDI audio, wherein the initial frequency-domain parameters include an initial phase of the voice data, and the target frequency-domain parameters include a target phase corresponding to the target MIDI audio;
modifying the initial frequency-domain parameters according to the target frequency-domain parameters, so as to shift the pitch in the voice data to a target pitch in the target MIDI audio and obtain pitch-shifted voice data.
The present invention also provides a readable storage medium including a computer program which, when run, controls the electronic device on which the readable storage medium resides to perform the voice data processing method of any one of claims 1 to 5.
The embodiments of the present application give the speech in the voice data the frequency-domain parameters of the target MIDI audio, so that the pitch-shifted voice data has the pitch parameters of the target MIDI audio, thereby achieving a pitch-shifting operation on the voice data without changing its speech rate or duration. The phase of the pitch-shifted voice data is continuous, so no noise appears, mechanical-sounding artifacts are avoided, and the pitch-shifting effect is better. The method can be applied to pitch correction in songs, conversion of speech into song, and the like, and has a high application prospect in the field of acoustic processing.
To make the above objects, features, and advantages of the present invention clearer and more comprehensible, preferred embodiments are described in detail below with reference to the accompanying drawings.
Description of the drawings
In order to illustrate the technical solutions of the embodiments of the present invention more clearly, the drawings required in the embodiments are briefly described below. It should be understood that the following drawings illustrate only certain embodiments of the present invention and are therefore not to be construed as limiting its scope. For those of ordinary skill in the art, other relevant drawings can be obtained from these drawings without creative effort.
Fig. 1 is a block diagram of an electronic device provided by an embodiment of the present invention.
Fig. 2 is a flow diagram of a voice data processing method provided by an embodiment of the present invention.
Fig. 3 is a flow diagram of the sub-steps of step S102 in a voice data processing method provided by an embodiment of the present invention.
Fig. 4 is a flow diagram of the sub-steps of step S103 in a voice data processing method provided by an embodiment of the present invention.
Fig. 5 is a functional block diagram of a voice data processing device provided by an embodiment of the present invention.
Reference numerals: 100 - electronic device; 111 - memory; 112 - storage controller; 113 - processor; 300 - voice data processing device; 310 - data acquisition module; 320 - voice data processing module; 330 - target MIDI audio processing module; 340 - pitch-shifting module.
Specific embodiment
The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the accompanying drawings. Obviously, the described embodiments are only some, not all, of the embodiments of the present invention. The components of the embodiments of the present invention described and illustrated in the drawings can generally be arranged and designed in a variety of different configurations. Therefore, the following detailed description of the embodiments of the present invention provided in the drawings is not intended to limit the scope of the claimed invention, but merely represents selected embodiments of the present invention. All other embodiments obtained by those skilled in the art based on the embodiments of the present invention without creative effort shall fall within the protection scope of the present invention.
It should be noted that similar reference numerals and letters denote similar items in the following drawings; therefore, once an item is defined in one drawing, it need not be further defined and explained in subsequent drawings. In addition, in the description of the present invention, the terms "first", "second", and the like are used only to distinguish descriptions and are not to be understood as indicating or implying relative importance.
Existing pitch-shifting methods fall mainly into two broad classes. One class consists of time-domain interpolation and splicing methods, such as synchronized overlap-add, fixed synthesis (SOLA-FS); the other is frequency-domain processing, commonly referred to as the phase vocoder. The advantage of time-domain processing is that the amount of computation is small and the naturalness of the pitch-shifted result is good, but splicing introduces phase discontinuities, which generate noise. Frequency-domain methods require time-frequency transforms, phase estimation, and the like, which demand a larger amount of computation, and the pitch-shifted voice may sound mechanical.
Referring to Fig. 1, a block diagram of an electronic device 100 provided by a preferred embodiment of the present invention is shown. The electronic device 100 may include a voice data processing device 300, a memory 111, a storage controller 112, and a processor 113.
The memory 111, the storage controller 112, and the processor 113 are electrically connected to one another, directly or indirectly, to realize data transmission or interaction. For example, these elements may be electrically connected to one another through one or more communication buses or signal lines. The voice data processing device 300 may include at least one software function module that can be stored in the memory 111 in the form of software or firmware, or solidified in the operating system (OS) of the electronic device 100. The processor 113 is configured to execute executable modules stored in the memory 111, such as the software function modules and computer programs included in the voice data processing device 300.
The memory 111 may be, but is not limited to, random access memory (RAM), read-only memory (ROM), programmable read-only memory (PROM), erasable programmable read-only memory (EPROM), electrically erasable programmable read-only memory (EEPROM), etc. The memory 111 is used to store programs, and the processor 113 executes the programs after receiving an execution instruction. Access to the memory 111 by the processor 113 and other possible components may be performed under the control of the storage controller 112.
The processor 113 may be an integrated circuit chip with signal processing capability. The processor 113 may be a general-purpose processor, including a central processing unit (CPU), a network processor (NP), and the like; it may also be a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA), or another programmable logic device, discrete gate or transistor logic device, or discrete hardware component, which can implement or perform the methods, steps, and logic diagrams disclosed in the embodiments of the present invention. The general-purpose processor may be a microprocessor, or the processor may be any conventional processor, etc.
An embodiment of the present application provides a voice data processing method that can pitch-shift voice data and can be applied to the above electronic device 100. As shown in Fig. 2, the method includes the following steps.
Step S101: obtain voice data and a target MIDI audio, the voice data including speech aligned with the target MIDI audio.
Step S102: obtain the initial frequency-domain parameters of the voice data.
The voice data in the embodiments of the present application may be a segment of speech or a segment of a song; the embodiments of the present application do not limit the duration or content of the voice data, which can be determined according to actual needs. The embodiments of the present application pitch-shift the speech in the voice data by processing the voice data. The calculated initial frequency-domain parameters may be those of every frame of the voice data, or only the frequency-domain parameters of the frames to be pitch-shifted may be calculated. In the embodiments of the present application, the initial frequency-domain parameters of the voice data may include the phase and amplitude of the sound in the voice data.
Pitch shifting in the embodiments of the present application refers to changing the pitch of the sound in the voice data, i.e., changing the pitch of a given frame of speech to the desired pitch.
As shown in Fig. 3, the step of obtaining the initial frequency-domain parameters of the voice data may include the following sub-steps.
Sub-step S1021: perform zero-offset removal and pre-emphasis processing on the voice data.
Sub-step S1022: perform a time-frequency transform on the voice data after zero-offset removal and pre-emphasis processing, to obtain the frequency-domain parameters of each frame of the voice data.
Since voice data may have a zero offset, removing the zero offset improves the signal. At the same time, voice data is affected by lip radiation; pre-emphasis boosts the high-frequency portion of the speech, removes the influence of lip radiation, and increases the high-frequency resolution of the speech. Zero-offset removal and pre-emphasis may be calculated by the following formulas.
x'(n) = x(n) − mean_x
where x(n) is the sampled value at the n-th point, x'(n) is the output value after zero-offset removal, and mean_x is the calculated mean of the time-domain amplitude of this segment of speech.
Pre-emphasis can be implemented by a first-order FIR high-pass filter. The formula is as follows.
y(n) = x(n) − a·x(n−1)
where y(n) is the pre-processed output, x(n) is the unprocessed audio, and a is the pre-emphasis factor, generally 0.9 to 1.0; optionally, a is 0.98.
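The two pre-processing formulas above can be sketched in Python as follows (a minimal illustration, not the patent's code; the test segment, its 8 kHz rate, and the 0.5 DC offset are made up):

```python
import math

def remove_zero_offset(x):
    # x'(n) = x(n) - mean_x: subtract the mean time-domain amplitude.
    mean_x = sum(x) / len(x)
    return [s - mean_x for s in x]

def pre_emphasis(x, a=0.98):
    # y(n) = x(n) - a * x(n-1): first-order FIR high-pass filter.
    # The first sample has no predecessor, so it is passed through unchanged.
    return [x[0]] + [x[n] - a * x[n - 1] for n in range(1, len(x))]

# A short 220 Hz tone at 8 kHz with an artificial DC offset of 0.5.
segment = [0.5 + 0.1 * math.sin(2 * math.pi * 220 * n / 8000) for n in range(64)]
centered = remove_zero_offset(segment)
emphasized = pre_emphasis(centered)
```

After `remove_zero_offset`, the mean of the segment is exactly zero regardless of the original offset, which is the point of the step.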
The method of performing a time-frequency transform on the voice data after zero-offset removal and pre-emphasis processing may include the following three steps.
First, calculate the frame shift of each frame of the voice data.
Next, perform framing and windowing on the voice data according to the calculated frame shifts and a preset window function.
Then, perform a Fourier transform on each frame of the framed and windowed voice data, to obtain the frequency-domain parameters of each frame of the voice data.
To calculate the frame shift of each frame of the voice data, the frame shift of each frame can be obtained by dividing the sample rate by the target frequency, where the target frequency is the frequency of the target MIDI audio, calculated by the following formula:
F = 110 × 2^((MIDINote − 45) / 12)
where F is the frequency corresponding to the pitch, and MIDINote is the pitch value contained in the target MIDI audio file. To raise the result by one octave, the value 110 can be replaced with 220.
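Under this reading of the formula (the equation image is not reproduced in this text; the base value 110 and the octave substitution are taken from the surrounding description), the note-to-frequency conversion and the frame-shift calculation can be sketched as:

```python
def midi_to_freq(midi_note, base=110.0):
    # F = base * 2**((MIDINote - 45) / 12); base 110 corresponds to A2,
    # and substituting 220 raises the result by one octave.
    return base * 2.0 ** ((midi_note - 45) / 12.0)

def frame_shift(sample_rate, midi_note):
    # Frame shift = sample rate / target frequency, i.e. one pitch period
    # of the target note, expressed in samples.
    return sample_rate / midi_to_freq(midi_note)

f = midi_to_freq(69)            # MIDI note 69 (A4) -> 440.0 Hz
shift = frame_shift(44100, 69)  # about 100.23 samples per period
```

A sanity check on A4 confirms the base: 110 × 2^((69−45)/12) = 110 × 4 = 440 Hz.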
A speech signal varies over time, but the state of the vocal organs changes much more slowly than the acoustic vibration. A speech signal can therefore be considered stationary over a short period of time, i.e., short-term stationary. This allows the speech to be framed and then analyzed. A typical frame length is 10 to 30 milliseconds, with overlap between adjacent frames. Windowing serves two main purposes: first, it makes the signal more continuous overall and avoids the Gibbs effect; second, it makes an otherwise aperiodic speech signal exhibit some of the features of a periodic function. A window function may be used for windowing; several window functions are given below.
The rectangular window function is as follows:
w(n) = 1, 0 ≤ n ≤ N − 1
The Hamming window function is as follows:
w(n) = 0.54 − 0.46·cos(2πn / (N − 1)), 0 ≤ n ≤ N − 1
The Hanning window function is as follows:
w(n) = 0.5·(1 − cos(2πn / (N − 1))), 0 ≤ n ≤ N − 1
where N is the window length. Windowing of the voice data is performed with one of the above window functions.
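The three window functions can be sketched as follows (standard textbook definitions, used here because the formula images are not reproduced in the text; the window length is made up):

```python
import math

def rectangular(N):
    # w(n) = 1 for 0 <= n <= N-1.
    return [1.0] * N

def hamming(N):
    # w(n) = 0.54 - 0.46*cos(2*pi*n/(N-1)).
    return [0.54 - 0.46 * math.cos(2 * math.pi * n / (N - 1)) for n in range(N)]

def hanning(N):
    # w(n) = 0.5*(1 - cos(2*pi*n/(N-1))).
    return [0.5 * (1 - math.cos(2 * math.pi * n / (N - 1))) for n in range(N)]

def window_frame(frame, window):
    # Pointwise multiplication of one speech frame by the window.
    return [s * w for s, w in zip(frame, window)]

w = hamming(8)  # endpoints taper to 0.08 rather than 0, unlike the Hanning window
```

The Hamming window's non-zero endpoints are the practical difference from the Hanning window; the rectangular window does no tapering at all.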
The initial frequency-domain parameters of the voice data can be obtained by the above method.
Step S103: obtain target frequency-domain parameters corresponding to the preset target MIDI audio.
The target MIDI audio in the embodiments of the present application may contain the pitch information to which the voice data is to be shifted. The target MIDI audio may be a segment of data of the same duration as the voice data, and serves as the reference for pitch-shifting the voice data. It is understood that the voice data to be pitch-shifted may be determined first, and the target MIDI audio serving as the pitch-shifting reference determined afterwards; alternatively, the target MIDI audio may be determined first, and voice data of the same duration selected according to the duration of the target MIDI audio.
In the embodiments of the present application, the target MIDI audio may be a file in MIDI (Musical Instrument Digital Interface) format, which contains pitch information at different time points: the durations of the different pitches on a time basis, and the start and end time points of the different pitches. By determining the pitch information of the target MIDI audio, the pitches to which the voice data is to be shifted can be determined; and according to the conversion relationship between pitch and frequency, the frequencies corresponding to the different pitches can be determined.
It is understood that in the process of obtaining the voice data and the target MIDI audio, the start position to be pitch-shifted and the corresponding target pitch to be shifted to may be determined first.
In detail, as shown in Fig. 4, the target frequency-domain parameters of the target MIDI audio can be determined by the following sub-steps.
Sub-step S1031: generate a target waveform with the same pitch as the target frequency and the same duration as the voice data corresponding to the target frequency.
The frequency of the target waveform is identical to the preset target frequency in the target MIDI audio, and the duration of the target waveform is equal to the duration of the voice data corresponding to the preset target frequency.
As mentioned above, the target MIDI audio contains different pitch information. Through the conversion relationship between pitch and frequency, the frequency corresponding to each pitch can be determined; these frequencies are the preset target frequencies contained in the target MIDI audio. The frequency of the generated target waveform is identical to the preset target frequency in the target MIDI audio. A target MIDI audio may contain multiple preset target frequencies, and target waveforms corresponding to the multiple preset target frequencies can be generated respectively, the duration of each target waveform being equal to the duration of the speech at the corresponding position in the voice data.
The target waveform can be determined according to actual needs; for example, a sine wave or a deformation of a sine wave can be generated as the target waveform, since the human vocal cords directly generate a sine-wave-like sound, and the vibration of the vocal cords when speaking is similar to the waveform of a stringed instrument. When pitch-shifting all of the voice data, a targeted waveform can be selected for the speech at different time points: a sine wave can be selected as the target waveform for pitch shifting at all time points, or different target waveforms can be generated for the voice data at different time points. Different target waveforms correspond to different timbres, so the human auditory impression also differs.
In detail, the target waveform can be generated by the following method.
First, the number of sampling points corresponding to one period of the target waveform at the target pitch is obtained, calculated by the following equation:
Len = Fs / F
where Len is the number of sampling points in one period of the target waveform, Fs is the sample rate, and F is the target frequency.
Then, the sampling intervals are calculated:
delta1 = (4*π) / Len
delta2 = (2*π) / Len
Next, the sampled values of the different target waveforms are calculated. Reference timbre 1 can be expressed as:
y[n] = (sin(-3*π + n*delta1)) / (-3*π + n*delta1)
Reference timbre 2 can be expressed as:
y[n] = (sin(n*delta2) + abs(sin(n*delta2)) * alpha) / (1 + alpha)
where y holds all the sampled values of one period of the waveform, n is the sample index with 0 ≤ n < Len, abs() takes the absolute value, and alpha is greater than 0 and less than 1. By repeating the data of one period multiple times, waveform sample values of the same length as the target voice are obtained.
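The period-length and timbre formulas above can be sketched in Python with NumPy. The sample rate, frequency, and length below are illustrative, and `np.sinc` is used to handle the sample where the timbre-1 denominator is zero:

```python
import numpy as np

def make_target_waveform(fs, f, total_len, timbre=1, alpha=0.5):
    """Generate one period of the target waveform at frequency f and
    tile it to total_len samples, following the formulas above."""
    period = int(fs / f)                # Len = Fs / F, samples per period
    n = np.arange(period)
    if timbre == 1:                     # reference timbre 1 (sinc-like)
        delta1 = 4 * np.pi / period
        x = -3 * np.pi + n * delta1
        # sin(x)/x with the x = 0 singularity handled:
        # np.sinc(t) = sin(pi*t)/(pi*t), so np.sinc(x/pi) = sin(x)/x
        y = np.sinc(x / np.pi)
    else:                               # reference timbre 2, 0 < alpha < 1
        delta2 = 2 * np.pi / period
        s = np.sin(n * delta2)
        y = (s + np.abs(s) * alpha) / (1 + alpha)
    reps = int(np.ceil(total_len / period))
    return np.tile(y, reps)[:total_len]  # repeat the period, trim to length

wave = make_target_waveform(fs=16000, f=440.0, total_len=8000)
print(wave.shape)   # (8000,)
```

Tiling one period, as the text describes, keeps the waveform exactly periodic at the target frequency for the full duration of the corresponding voice segment.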
Sub-step S1032: extract the phase values of the target waveform.
After the corresponding target waveform is generated, framing and windowing can first be applied to the target waveform so that its frame length is consistent with the frame length of the voice data; a short-time Fourier transform is then performed, and the phase values corresponding to each transformed frame of the target waveform are extracted as the target frequency-domain parameters of the target MIDI audio.
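Sub-step S1032 can be sketched as follows. The Hann window and the frame length and shift values are illustrative assumptions, since the text only requires the frame layout to match that of the voice data:

```python
import numpy as np

def frame_phases(signal, frame_len, hop):
    """Frame, window, FFT each frame, and return the per-frame phase values."""
    win = np.hanning(frame_len)                 # preset window function (assumed Hann)
    n_frames = 1 + (len(signal) - frame_len) // hop
    phases = np.empty((n_frames, frame_len // 2 + 1))
    for i in range(n_frames):
        frame = signal[i * hop : i * hop + frame_len] * win  # framing + windowing
        spec = np.fft.rfft(frame)               # short-time Fourier transform
        phases[i] = np.angle(spec)              # phase of each frequency bin
    return phases

t = np.arange(16000) / 16000.0
target_wave = np.sin(2 * np.pi * 440.0 * t)    # stand-in target waveform
ph = frame_phases(target_wave, frame_len=512, hop=128)
print(ph.shape)
```

The resulting array of per-frame, per-bin phases is what the later phase-replacement step consumes.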
Step S104: modify the initial frequency-domain parameters according to the target frequency-domain parameters, transforming the pitch in the voice data to the target pitch in the target MIDI audio to obtain the pitch-shifted voice data.
After the target frequency-domain parameters have been obtained through the steps above, the initial frequency-domain parameters of the voice data can be replaced with the target frequency-domain parameters, realizing the modification of the initial frequency-domain parameters. Specifically, the initial phase of the voice data is replaced with the phase values of the corresponding target waveform. Voice data contains both unvoiced and voiced sounds, and unvoiced sounds are not periodic; if the initial phase of the unvoiced sounds were also replaced, the pitch-shifted result would be degraded. The phase replacement in this embodiment therefore leaves the phase values of unvoiced sounds unreplaced and is applied only to frames corresponding to voiced sounds; the voice data corresponding to unvoiced sounds keeps its original phase values.
In detail, the phase values of the voice data at the positions corresponding to the target waveform can first be replaced with the phase values of the target waveform, yielding the frequency-domain parameters of the pitch-shifted voice data.
An inverse Fourier transform is then applied to the frequency-domain parameters of the pitch-shifted voice data, and the pitch-shifted voice data is obtained after processing with the OLA (Overlap-and-Add) algorithm. The pitch-shifted voice data can then be output, saved, or otherwise processed.
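The phase replacement and overlap-add resynthesis can be sketched as below. This is a simplified illustration that assumes the voice-frame magnitude spectra and the target-waveform phase spectra are already available; the voiced/unvoiced distinction is omitted, and the window-squared normalization is a common OLA detail not spelled out in the text:

```python
import numpy as np

def resynthesize(magnitudes, target_phases, frame_len, hop):
    """Combine voice-frame magnitudes with target-waveform phases,
    inverse-FFT each frame, and overlap-add (OLA) the results."""
    n_frames = magnitudes.shape[0]
    win = np.hanning(frame_len)
    out = np.zeros((n_frames - 1) * hop + frame_len)
    norm = np.zeros_like(out)
    for i in range(n_frames):
        spec = magnitudes[i] * np.exp(1j * target_phases[i])  # replace the phase
        frame = np.fft.irfft(spec, n=frame_len)               # inverse Fourier transform
        out[i*hop : i*hop + frame_len] += frame * win          # overlap-and-add
        norm[i*hop : i*hop + frame_len] += win ** 2
    return out / np.maximum(norm, 1e-8)                        # window normalization

mags = np.abs(np.random.randn(10, 129))        # toy magnitude spectra
phs = np.random.uniform(-np.pi, np.pi, (10, 129))
y = resynthesize(mags, phs, frame_len=256, hop=64)
print(y.shape)
```

Keeping the voice magnitudes while substituting the target phases is what preserves the spectral envelope of the speech while imposing the target waveform's periodicity.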
The embodiment of the present application further provides a voice data processing apparatus 300, as shown in FIG. 5, including:
a data acquisition module 310, configured to obtain voice data and target MIDI audio, the voice data including voice aligned with the target MIDI audio;
a voice data processing module 320, configured to obtain the initial frequency-domain parameters of the voice data;
a target MIDI audio processing module 330, configured to obtain target frequency-domain parameters corresponding to the preset target MIDI audio, wherein the initial frequency-domain parameters include the initial phase of the voice data, and the target frequency-domain parameters include the target phase corresponding to the target MIDI audio;
a pitch-shifting module 340, configured to modify the initial frequency-domain parameters according to the target frequency-domain parameters, transforming the pitch in the voice data to the target pitch in the target MIDI audio to obtain the pitch-shifted voice data.
It can be understood that the method by which the voice data processing module 320 obtains the initial frequency-domain parameters of the voice data includes:
removing the DC offset from the voice data and applying pre-emphasis;
performing a time-frequency transform on the voice data after DC-offset removal and pre-emphasis to obtain the frequency-domain parameters of each frame of the voice data.
In this embodiment, the step in which the voice data processing module 320 performs the time-frequency transform on the voice data after DC-offset removal and pre-emphasis includes:
calculating the frame shift of each frame in the voice data;
framing and windowing the voice data according to the calculated frame shift and a preset window function;
applying a Fourier transform to each framed and windowed frame of voice data to obtain the frequency-domain parameters of each frame in the voice data.
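The pre-processing and time-frequency steps above can be sketched as follows. The pre-emphasis coefficient 0.97 and the Hann window are conventional choices assumed here, not values specified by the source:

```python
import numpy as np

def preprocess(x, coeff=0.97):
    """Remove the DC (null) offset, then apply the pre-emphasis
    filter y[n] = x[n] - coeff * x[n-1]."""
    x = x - np.mean(x)                               # DC-offset removal
    return np.append(x[0], x[1:] - coeff * x[:-1])   # pre-emphasis

def stft_frames(x, frame_len, hop):
    """Frame, window, and FFT: the time-frequency transform."""
    win = np.hanning(frame_len)
    n_frames = 1 + (len(x) - frame_len) // hop
    return np.array([np.fft.rfft(x[i*hop : i*hop + frame_len] * win)
                     for i in range(n_frames)])

x = np.sin(2*np.pi*220*np.arange(4000)/16000) + 0.1  # toy signal with a DC offset
spec = stft_frames(preprocess(x), frame_len=400, hop=100)
print(spec.shape)
```

Each row of `spec` holds the complex frequency-domain parameters of one frame, from which the initial magnitudes and phases can be read.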
In this embodiment, the step in which the voice data processing module 320 calculates the frame shift of each frame in the voice data includes:
dividing the sample rate by the target frequency to obtain the frame shift of each frame, wherein the target frequency is the frequency of the target MIDI audio and is calculated using the following formula:
where F is the target frequency of the target MIDI audio and MIDINote is the pitch value contained in the target MIDI audio.
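The formula itself is not reproduced in this text, but the description matches the standard equal-temperament MIDI-note-to-frequency conversion, F = 440 · 2^((MIDINote − 69)/12); the sketch below assumes that formula:

```python
import math

def midi_to_freq(midi_note):
    """Standard equal-temperament conversion (an assumption here; the
    patent's own formula appears only as an image in the original)."""
    return 440.0 * 2.0 ** ((midi_note - 69) / 12.0)

def frame_shift(sample_rate, midi_note):
    """Frame shift = sample rate / target frequency, as described above."""
    return int(sample_rate / midi_to_freq(midi_note))

print(midi_to_freq(69))        # 440.0 (A4)
print(frame_shift(16000, 69))  # 36
```

Tying the frame shift to one period of the target frequency keeps each analysis hop aligned with the pitch period being synthesized.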
In this embodiment, the target MIDI audio records the target frequency of the sound, and the method by which the pitch-shifting module 340 modifies the frequency-domain parameters of the voice data according to the frequency-domain parameters of the target MIDI audio includes:
generating a target waveform whose pitch is identical to the target frequency and whose duration equals that of the voice data corresponding to the target frequency;
extracting the phase values of the target waveform;
replacing the phase values of the voice data at the positions corresponding to the target waveform with the phase values of the target waveform, obtaining the frequency-domain parameters of the pitch-shifted voice data;
applying an inverse Fourier transform to the frequency-domain parameters of the pitch-shifted voice data, and obtaining the pitch-shifted voice data after OLA (overlap-add) processing.
In the embodiment of the present application, a target waveform corresponding to the voice data is generated according to the target MIDI audio, the target waveform being generated from the pitch information contained in the target MIDI audio, and the phase values of the target waveform then replace the phase values of the voice in the voice data. The frequency-domain parameters of the voice data are thereby modified into frequency-domain parameters corresponding to the target MIDI audio, giving the voice data the pitch parameters of the target MIDI audio and realizing the pitch-shifting of the voice data. By replacing the phase values rather than setting the phase of the voice data to zero, the embodiment of the present application realizes pitch-shifting while avoiding phase discontinuities and a mechanical-sounding result. At the same time, because the target waveform is used to replace the phase values of the voice data, the pitch-shifted voice data carries the sound characteristics of the target waveform, giving the pitch-shifted voice the timbre properties of the target waveform.
In conclusion modify by using the frequency domain parameter of target MIDI audios to the frequency domain parameter of voice data,
It can make the voice in voice data that there is the frequency domain parameter of target MIDI audios, allow the voice data after modified tone that there is mesh
Mark the pitch parameters of MIDI audios, realize the modified tone operation to voice data, can realize do not change in voice data word speed and
In the case of voice duration, modify tone to voice data.The Phase Continuation of voice data after modified tone is not in noise,
Mechanical sound can be avoided the occurrence of simultaneously, and modified tone effect is more preferable.The amendment of pitch or voice be can be applied in song to song
Conversion etc., it is with high application prospect in acoustic processing field.
This method improves on the traditional zero-phase pitch-shifting algorithm: by adding the phase values of a waveform at the same frequency, it remedies the phase discontinuity and the mechanical sound. At the same time, the added waveform superimposes some timbre information on the original sound, so different pitch-shifted results can be obtained by adding different waveforms, increasing the diversity of the pitch-shifting. By letting the user freely choose the waveform, the application allows each user to obtain a personalized pitch-shifted result, which gives the method good practical prospects. Compared with the traditional zero-phase-based method, this method clearly improves the mechanical-sound problem; compared with traditional time-domain methods, it also offers a more noticeable improvement in phase continuity.
The method provided by the embodiments of the present application can also be combined with a speed-changing method, and can be combined with mixing technology to automatically synthesize a song from the pitch-shifted dry vocal and the backing music. Since the pitch-shifting algorithm in this method supports personalization, personalized song synthesis can be achieved. Different added waveforms can be used to control the synthesis of different song outputs, and since the waveform is user-selectable, users can choose different effects according to their own preferences, increasing the practicality of the method.
In the several embodiments provided in this application, it should be understood that the disclosed apparatus and method may also be implemented in other ways. The apparatus embodiments described above are merely illustrative. For example, the flowcharts and block diagrams in the accompanying drawings show the possible architectures, functions, and operations of the apparatuses, methods, and computer program products according to multiple embodiments of the present invention. In this regard, each block in a flowchart or block diagram may represent a module, program segment, or portion of code, which contains one or more executable instructions for implementing the specified logical function. It should also be noted that in some alternative implementations, the functions noted in the blocks may occur out of the order noted in the drawings. For example, two consecutive blocks may in fact be executed substantially in parallel, or sometimes in the reverse order, depending on the functions involved. It should also be noted that each block in the block diagrams and/or flowcharts, and combinations of blocks therein, can be implemented by a dedicated hardware-based system that performs the specified functions or actions, or by a combination of dedicated hardware and computer instructions.
In addition, the functional modules in the embodiments of the present invention may be integrated to form an independent part, the modules may exist separately, or two or more modules may be integrated to form an independent part.
If the functions are implemented in the form of software functional modules and sold or used as an independent product, they may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present invention, or the part that contributes to the prior art, or a part of the technical solution, may be embodied in the form of a software product. The computer software product is stored in a storage medium and includes a number of instructions that cause a computer device (which may be a personal computer, a server, a network device, or the like) to perform all or some of the steps of the methods described in the embodiments of the present invention. The aforementioned storage medium includes any medium that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disc.
It should be noted that, herein, relational terms such as "first" and "second" are used only to distinguish one entity or operation from another and do not necessarily require or imply any such actual relationship or order between these entities or operations. Moreover, the terms "comprise", "include", or any other variant thereof are intended to cover a non-exclusive inclusion, so that a process, method, article, or device that includes a series of elements includes not only those elements but also other elements not expressly listed, or elements inherent to such a process, method, article, or device. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of other identical elements in the process, method, article, or device that includes the element.
The foregoing is merely a preferred embodiment of the present invention and is not intended to limit the present invention; for those skilled in the art, the present invention may have various modifications and variations. Any modification, equivalent replacement, improvement, and the like made within the spirit and principles of the present invention shall be included within the protection scope of the present invention. It should be noted that similar reference numerals and letters denote similar items in the following drawings; therefore, once an item is defined in one drawing, it does not need to be further defined or explained in subsequent drawings.
The above is only a specific embodiment of the present invention, but the protection scope of the present invention is not limited thereto. Any change or replacement that can readily occur to those familiar with the technical field, within the technical scope disclosed by the present invention, shall be covered by the protection scope of the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.
Claims (12)
1. A voice data processing method, characterized by comprising:
obtaining voice data and target MIDI audio, the voice data comprising voice aligned with the target MIDI audio;
obtaining initial frequency-domain parameters of the voice data;
obtaining target frequency-domain parameters corresponding to the preset target MIDI audio, wherein the initial frequency-domain parameters comprise an initial phase of the voice data, and the target frequency-domain parameters comprise a target phase corresponding to the target MIDI audio;
modifying the initial frequency-domain parameters according to the target frequency-domain parameters, transforming the pitch in the voice data to the target pitch in the target MIDI audio, and obtaining pitch-shifted voice data.
2. The voice data processing method according to claim 1, wherein the step of obtaining the initial frequency-domain parameters of the voice data comprises:
obtaining, in the voice data, time-domain voice data corresponding to the target pitch;
performing DC-offset removal and pre-emphasis on the time-domain voice data corresponding to the target pitch;
performing a time-frequency transform on the voice data after DC-offset removal and pre-emphasis to obtain frequency-domain parameters of each frame of the voice data.
3. The voice data processing method according to claim 2, wherein the step of performing the time-frequency transform on the voice data after DC-offset removal and pre-emphasis comprises:
calculating a frame shift of each frame in the voice data;
framing and windowing the voice data according to the calculated frame shift and a preset window function;
applying a Fourier transform to each framed and windowed frame of voice data to obtain the frequency-domain parameters of each frame in the voice data.
4. The voice data processing method according to claim 3, wherein the step of calculating the frame shift of each frame in the voice data comprises:
dividing the sample rate by a target frequency to obtain the frame shift of each frame, wherein the target frequency is the frequency of the target MIDI audio and is calculated using the following formula:
where F is the target frequency of the target MIDI audio and MIDINote is the pitch value contained in the target MIDI audio.
5. The voice data processing method according to claim 1, wherein the target MIDI audio records a target frequency of the sound, and the step of obtaining the target frequency-domain parameters corresponding to the preset target MIDI audio comprises:
generating a target waveform whose pitch is identical to the target frequency and whose duration equals that of the voice data corresponding to the target frequency;
extracting phase values of the target waveform as the target frequency-domain parameters;
correspondingly, the step of modifying the frequency-domain parameters of the voice data according to the frequency-domain parameters of the target MIDI audio comprises:
replacing the phase values of the voice data at positions corresponding to the target waveform with the phase values of the target waveform, obtaining frequency-domain parameters of the pitch-shifted voice data;
applying an inverse Fourier transform to the frequency-domain parameters of the pitch-shifted voice data, and obtaining the pitch-shifted voice data after OLA (overlap-add) processing.
6. A voice data processing apparatus, characterized by comprising:
a data acquisition module, configured to obtain voice data and target MIDI audio, the voice data comprising voice aligned with the target MIDI audio;
a voice data processing module, configured to obtain initial frequency-domain parameters of the voice data;
a target MIDI audio processing module, configured to obtain target frequency-domain parameters corresponding to the preset target MIDI audio, wherein the initial frequency-domain parameters comprise an initial phase of the voice data, and the target frequency-domain parameters comprise a target phase corresponding to the target MIDI audio;
a pitch-shifting module, configured to modify the initial frequency-domain parameters according to the target frequency-domain parameters, transforming the pitch in the voice data to the target pitch in the target MIDI audio and obtaining pitch-shifted voice data.
7. The voice data processing apparatus according to claim 6, wherein the method by which the voice data processing module obtains the initial frequency-domain parameters of the voice data comprises:
performing DC-offset removal and pre-emphasis on the voice data;
performing a time-frequency transform on the voice data after DC-offset removal and pre-emphasis to obtain frequency-domain parameters of each frame of the voice data.
8. The voice data processing apparatus according to claim 7, wherein the step in which the voice data processing module performs the time-frequency transform on the voice data after DC-offset removal and pre-emphasis comprises:
calculating a frame shift of each frame in the voice data;
framing and windowing the voice data according to the calculated frame shift and a preset window function;
applying a Fourier transform to each framed and windowed frame of voice data to obtain the frequency-domain parameters of each frame in the voice data.
9. The voice data processing apparatus according to claim 7, wherein the step in which the voice data processing module calculates the frame shift of each frame in the voice data comprises:
dividing the sample rate by a target frequency to obtain the frame shift of each frame, wherein the target frequency is the frequency of the target MIDI audio and is calculated using the following formula:
where F is the target frequency of the target MIDI audio and MIDINote is the pitch value contained in the target MIDI audio.
10. The voice data processing apparatus according to claim 6, wherein the target MIDI audio records a target frequency of the sound, and the method by which the target MIDI audio processing module obtains the target frequency-domain parameters corresponding to the preset target MIDI audio comprises:
generating a target waveform whose pitch is identical to the target frequency and whose duration equals that of the voice data corresponding to the target frequency;
extracting phase values of the target waveform as the target frequency-domain parameters;
correspondingly, the method by which the pitch-shifting module modifies the frequency-domain parameters of the voice data according to the frequency-domain parameters of the target MIDI audio comprises:
replacing the phase values of the voice data at positions corresponding to the target waveform with the phase values of the target waveform, obtaining frequency-domain parameters of the pitch-shifted voice data;
applying an inverse Fourier transform to the frequency-domain parameters of the pitch-shifted voice data, and obtaining the pitch-shifted voice data after OLA (overlap-add) processing.
11. An electronic device, characterized in that the electronic device comprises a processor and a memory, the memory being coupled to the processor and storing instructions that, when executed by the processor, cause the electronic device to perform the following operations:
obtaining voice data and target MIDI audio, the voice data comprising voice aligned with the target MIDI audio;
obtaining initial frequency-domain parameters of the voice data;
obtaining target frequency-domain parameters corresponding to the preset target MIDI audio, wherein the initial frequency-domain parameters comprise an initial phase of the voice data, and the target frequency-domain parameters comprise a target phase corresponding to the target MIDI audio;
modifying the initial frequency-domain parameters according to the target frequency-domain parameters, transforming the pitch in the voice data to the target pitch in the target MIDI audio, and obtaining pitch-shifted voice data.
12. A readable storage medium comprising a computer program, characterized in that the computer program, when run, controls an electronic device on which the readable storage medium is located to perform the voice data processing method according to any one of claims 1 to 5.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810049575.0A CN108269579B (en) | 2018-01-18 | 2018-01-18 | Voice data processing method and device, electronic equipment and readable storage medium |
Publications (2)
Publication Number | Publication Date |
---|---|
CN108269579A true CN108269579A (en) | 2018-07-10 |
CN108269579B CN108269579B (en) | 2020-11-10 |
Family
ID=62776086
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201810049575.0A Active CN108269579B (en) | 2018-01-18 | 2018-01-18 | Voice data processing method and device, electronic equipment and readable storage medium |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN108269579B (en) |
Cited By (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109697985A (en) * | 2018-12-25 | 2019-04-30 | 广州市百果园信息技术有限公司 | Audio signal processing method, device and terminal |
CN111739544A (en) * | 2019-03-25 | 2020-10-02 | Oppo广东移动通信有限公司 | Voice processing method and device, electronic equipment and storage medium |
CN112309425A (en) * | 2020-10-14 | 2021-02-02 | 浙江大华技术股份有限公司 | Sound tone changing method, electronic equipment and computer readable storage medium |
CN112420062A (en) * | 2020-11-18 | 2021-02-26 | 腾讯音乐娱乐科技(深圳)有限公司 | Audio signal processing method and device |
CN114449339A (en) * | 2022-02-16 | 2022-05-06 | 深圳万兴软件有限公司 | Background sound effect conversion method and device, computer equipment and storage medium |
Citations (25)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN1164084A (en) * | 1995-12-28 | 1997-11-05 | 日本胜利株式会社 | Sound pitch converting apparatus |
CN1283060A (en) * | 1999-07-28 | 2001-02-07 | 雅马哈株式会社 | Pronounciation control device and terminal device and system used on carried pronounciation control device |
CN1470050A (en) * | 2000-10-20 | 2004-01-21 | 爱立信电话股份有限公司 | Perceptually improved enhancement of encoded acoustic signals |
CN1473325A (en) * | 2001-08-31 | 2004-02-04 | 株式会社建伍 | Pitch waveform signal generation apparatus, pitch waveform signal generation method, and program |
CN101015451A (en) * | 2007-02-13 | 2007-08-15 | 电子科技大学 | Music brain electricity analytical method |
CN101267686A (en) * | 2007-03-12 | 2008-09-17 | 雅马哈株式会社 | Speaker array apparatus and signal processing method therefor |
CN101354889A (en) * | 2008-09-18 | 2009-01-28 | 北京中星微电子有限公司 | Method and apparatus for tonal modification of voice |
CN101652807A (en) * | 2007-02-01 | 2010-02-17 | 缪斯亚米有限公司 | Music transcription |
CN1831940B (en) * | 2006-04-07 | 2010-06-23 | 安凯(广州)微电子技术有限公司 | Tune and rhythm quickly regulating method based on audio-frequency decoder |
CN101894563A (en) * | 2010-07-15 | 2010-11-24 | 瑞声声学科技(深圳)有限公司 | Voice enhancing method |
KR20120061008A (en) * | 2010-11-02 | 2012-06-12 | 에스케이 텔레콤주식회사 | System and method for improving sound quality in data delivery communication by means of transform of audio signal, apparatus applied to the same |
CN102870153A (en) * | 2010-02-26 | 2013-01-09 | 弗兰霍菲尔运输应用研究公司 | Apparatus and method for modifying an audio signal using harmonic locking |
CN102934163A (en) * | 2010-06-01 | 2013-02-13 | 高通股份有限公司 | Systems, methods, apparatus, and computer program products for wideband speech coding |
CN103514883A (en) * | 2013-09-26 | 2014-01-15 | 华南理工大学 | Method for achieving self-adaptive switching of male voice and female voice |
US20140187210A1 (en) * | 2012-12-28 | 2014-07-03 | Cellco Partnership D/B/A Verizon Wireless | Filtering and enhancement of voice calls in a telecommunications network |
CN104409073A (en) * | 2014-11-04 | 2015-03-11 | 贵阳供电局 | Substation equipment sound and voice identification method |
CN104599677A (en) * | 2014-12-29 | 2015-05-06 | 中国科学院上海高等研究院 | Speech reconstruction-based instantaneous noise suppressing method |
CN104780091A (en) * | 2014-01-13 | 2015-07-15 | 北京发现角科技有限公司 | Instant messaging method and instant messaging system with speech and audio processing function |
CN105654941A (en) * | 2016-01-20 | 2016-06-08 | 华南理工大学 | Voice change method and device based on specific target person voice change ratio parameter |
CN105788589A (en) * | 2016-05-04 | 2016-07-20 | 腾讯科技(深圳)有限公司 | Audio data processing method and device |
CN106228973A (en) * | 2016-07-21 | 2016-12-14 | 福州大学 | Stablize the music voice modified tone method of tone color |
EP3113175A1 (en) * | 2015-07-02 | 2017-01-04 | Thomson Licensing | Method for converting text to individual speech, and apparatus for converting text to individual speech |
CN106297770A (en) * | 2016-08-04 | 2017-01-04 | 杭州电子科技大学 | The natural environment sound identification method extracted based on time-frequency domain statistical nature |
CN106328111A (en) * | 2016-08-22 | 2017-01-11 | 广州酷狗计算机科技有限公司 | Audio processing method and audio processing device |
CN107170464A (en) * | 2017-05-25 | 2017-09-15 | 厦门美图之家科技有限公司 | A kind of changing speed of sound method and computing device based on music rhythm |
Patent Citations (28)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN1135531C (en) * | 1995-12-28 | 2004-01-21 | 日本胜利株式会社 | Sound pitch converting apparatus |
CN1164084A (en) * | 1995-12-28 | 1997-11-05 | 日本胜利株式会社 | Sound pitch converting apparatus |
CN1283060A (en) * | 1999-07-28 | 2001-02-07 | 雅马哈株式会社 | Pronounciation control device and terminal device and system used on carried pronounciation control device |
CN1470050A (en) * | 2000-10-20 | 2004-01-21 | 爱立信电话股份有限公司 | Perceptually improved enhancement of encoded acoustic signals |
CN1473325A (en) * | 2001-08-31 | 2004-02-04 | 株式会社建伍 | Pitch waveform signal generation apparatus, pitch waveform signal generation method, and program |
CN1831940B (en) * | 2006-04-07 | 2010-06-23 | 安凯(广州)微电子技术有限公司 | Tune and rhythm quickly regulating method based on audio-frequency decoder |
CN101652807A (en) * | 2007-02-01 | 2010-02-17 | 缪斯亚米有限公司 | Music transcription |
CN101015451A (en) * | 2007-02-13 | 2007-08-15 | 电子科技大学 | Music brain electricity analytical method |
CN101267686A (en) * | 2007-03-12 | 2008-09-17 | 雅马哈株式会社 | Speaker array apparatus and signal processing method therefor |
CN101354889A (en) * | 2008-09-18 | 2009-01-28 | 北京中星微电子有限公司 | Method and apparatus for tonal modification of voice |
CN102870153A (en) * | 2010-02-26 | 2013-01-09 | 弗兰霍菲尔运输应用研究公司 | Apparatus and method for modifying an audio signal using harmonic locking |
CN102934163A (en) * | 2010-06-01 | 2013-02-13 | 高通股份有限公司 | Systems, methods, apparatus, and computer program products for wideband speech coding |
CN101894563B (en) * | 2010-07-15 | 2013-03-20 | 瑞声声学科技(深圳)有限公司 | Voice enhancing method |
CN101894563A (en) * | 2010-07-15 | 2010-11-24 | 瑞声声学科技(深圳)有限公司 | Voice enhancing method |
KR20120061008A (en) * | 2010-11-02 | 2012-06-12 | 에스케이 텔레콤주식회사 | System and method for improving sound quality in data delivery communication by means of transform of audio signal, apparatus applied to the same |
US20140187210A1 (en) * | 2012-12-28 | 2014-07-03 | Cellco Partnership D/B/A Verizon Wireless | Filtering and enhancement of voice calls in a telecommunications network |
CN103514883A (en) * | 2013-09-26 | 2014-01-15 | 华南理工大学 | Method for achieving self-adaptive switching of male voice and female voice |
CN103514883B (en) * | 2013-09-26 | 2015-12-02 | 华南理工大学 | A kind of self-adaptation realizes men and women's sound changing method |
CN104780091A (en) * | 2014-01-13 | 2015-07-15 | 北京发现角科技有限公司 | Instant messaging method and instant messaging system with speech and audio processing function |
CN104409073A (en) * | 2014-11-04 | 2015-03-11 | 贵阳供电局 | Substation equipment sound and voice identification method |
CN104599677A (en) * | 2014-12-29 | 2015-05-06 | 中国科学院上海高等研究院 | Speech reconstruction-based instantaneous noise suppressing method |
EP3113175A1 (en) * | 2015-07-02 | 2017-01-04 | Thomson Licensing | Method for converting text to individual speech, and apparatus for converting text to individual speech |
CN105654941A (en) * | 2016-01-20 | 2016-06-08 | 华南理工大学 | Voice change method and device based on specific target person voice change ratio parameter |
CN105788589A (en) * | 2016-05-04 | 2016-07-20 | 腾讯科技(深圳)有限公司 | Audio data processing method and device |
CN106228973A (en) * | 2016-07-21 | 2016-12-14 | 福州大学 | Timbre-preserving pitch-shifting method for music vocals |
CN106297770A (en) * | 2016-08-04 | 2017-01-04 | 杭州电子科技大学 | Natural environment sound recognition method based on time-frequency domain statistical feature extraction |
CN106328111A (en) * | 2016-08-22 | 2017-01-11 | 广州酷狗计算机科技有限公司 | Audio processing method and audio processing device |
CN107170464A (en) * | 2017-05-25 | 2017-09-15 | 厦门美图之家科技有限公司 | Audio speed-changing method and computing device based on music rhythm |
Non-Patent Citations (2)
Title |
---|
Mei Tiemin: "Research on an Effective Speech Pitch-Shifting Algorithm", Journal of Shenyang Ligong University *
Wang Shinong et al.: "Research on an Improved Phase-Vocoder Algorithm for Audio Time-Scale Modification", Computer Engineering and Applications *
Cited By (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109697985A (en) * | 2018-12-25 | 2019-04-30 | 广州市百果园信息技术有限公司 | Audio signal processing method, device and terminal |
CN109697985B (en) * | 2018-12-25 | 2021-06-29 | 广州市百果园信息技术有限公司 | Audio signal processing method, device and terminal |
CN111739544A (en) * | 2019-03-25 | 2020-10-02 | Oppo广东移动通信有限公司 | Voice processing method and device, electronic equipment and storage medium |
CN111739544B (en) * | 2019-03-25 | 2023-10-20 | Oppo广东移动通信有限公司 | Voice processing method, device, electronic equipment and storage medium |
CN112309425A (en) * | 2020-10-14 | 2021-02-02 | 浙江大华技术股份有限公司 | Audio pitch-shifting method, electronic device and computer-readable storage medium |
CN112420062A (en) * | 2020-11-18 | 2021-02-26 | 腾讯音乐娱乐科技(深圳)有限公司 | Audio signal processing method and device |
CN114449339A (en) * | 2022-02-16 | 2022-05-06 | 深圳万兴软件有限公司 | Background sound effect conversion method and device, computer equipment and storage medium |
CN114449339B (en) * | 2022-02-16 | 2024-04-12 | 深圳万兴软件有限公司 | Background sound effect conversion method and device, computer equipment and storage medium |
Also Published As
Publication number | Publication date |
---|---|
CN108269579B (en) | 2020-11-10 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN108269579A (en) | Voice data processing method, device, electronic equipment and readable storage medium | |
Serra et al. | Spectral modeling synthesis: A sound analysis/synthesis system based on a deterministic plus stochastic decomposition | |
JP6791258B2 (en) | Speech synthesis method, speech synthesizer and program | |
Fitz et al. | On the use of time-frequency reassignment in additive sound modeling |
Quatieri et al. | Audio signal processing based on sinusoidal analysis/synthesis | |
Schwarz et al. | Spectral envelope estimation, representation, and morphing for sound analysis, transformation, and synthesis. | |
CN108766409A (en) | Opera synthesis method, device and computer-readable storage medium |
Serra | Introducing the phase vocoder | |
JP2018077283A (en) | Speech synthesis method | |
Caetano et al. | A source-filter model for musical instrument sound transformation | |
Cavaliere et al. | Granular synthesis of musical signals | |
WO2020162392A1 (en) | Sound signal synthesis method and training method for neural network | |
Every | Separation of musical sources and structure from single-channel polyphonic recordings | |
JP2017219595A (en) | Music producing method | |
Lee et al. | Excitation signal extraction for guitar tones | |
JP6834370B2 (en) | Speech synthesis method | |
Bonada et al. | Spectral approach to the modeling of the singing voice | |
US5911170A (en) | Synthesis of acoustic waveforms based on parametric modeling | |
JP6683103B2 (en) | Speech synthesis method | |
Liao | Analysis and trans-synthesis of acoustic bowed-string instrument recordings: a case study using Bach cello suites |
Mignot et al. | Extended subtractive synthesis of harmonic musical tones | |
Rajan et al. | A continuous time model for Karnatic flute music synthesis | |
JP6822075B2 (en) | Speech synthesis method | |
US20230154451A1 (en) | Differentiable wavetable synthesizer | |
JP2013041128A (en) | Discriminating device for plurality of sound sources and information processing device interlocking with plurality of sound sources |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||