CN103489443A - Method and device for imitating sound - Google Patents

Info

Publication number
CN103489443A
Authority
CN
China
Prior art keywords
speech frame
model
sound
voice
submodule
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201310423715.3A
Other languages
Chinese (zh)
Other versions
CN103489443B (en
Inventor
赵欢
郑睿
陈佐
张希翔
杨泽英
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hunan University
Original Assignee
Hunan University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hunan University
Priority to CN201310423715.3A
Publication of CN103489443A
Application granted
Publication of CN103489443B
Legal status: Active
Anticipated expiration

Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Electrophonic Musical Instruments (AREA)

Abstract

The invention discloses a method and device for imitating sound. The method comprises the steps of obtaining a voice signal and an imitation target, preprocessing the voice signal, performing pitch and timbre conversion on each voice frame of the voice signal according to the imitation target, synthesizing the results to obtain a synthesized voice frame, adding the synthesized voice frame to the imitated voice frames, and finally outputting the imitated voice formed by the imitated voice frames. The device comprises an input module, a pitch conversion module, a timbre conversion module, a voice frame synthesis module, an imitated voice frame adding module, a judgment processing module and an imitated voice output module. The method and device provide a good imitation effect and high imitation similarity, and allow the voice database to be expanded automatically.

Description

A sound imitation method and device
Technical field
The present invention relates to the field of speech signal processing, and in particular to a sound imitation method and device.
Background
With the rapid development of speech processing technology and the spread of social platforms, research on sound imitation has shown its practical value and is steadily gaining importance. Common prior-art sound imitation methods model the vocal tract and then modify the model; the imitation effect is unsatisfactory and the similarity of the imitated voice is low.
Chinese invention patent application CN102592590A discloses an arbitrarily adjustable method and device for natural voice changing. It models the vocal tract from the speech signal to obtain a system model of pronunciation (the vocal tract model), then modifies that model, and finally restores the speech signal through the new vocal tract model. This changes the timbre of the sound to some extent, but it does not fundamentally achieve voice imitation. Chinese invention patent application CN101567132A discloses a voice-changing device in which a pitch adjusting device and a volume adjusting device allow the pitch and volume of a newspaper-reading sound-producing device to be adjusted freely. The device changes the pitch and volume of the sound, but the timbre, which matters most in sound imitation, is left unchanged, so the imitation effect is unsatisfactory. In addition, current sound imitation systems generally rely on a sound material library in which the material is stored in advance. A pre-stored library simplifies the collection of sound material, but it greatly limits the choice of models, lacks flexibility, and makes applications based on sound imitation less engaging. In summary, current sound imitation methods generally suffer from an unsatisfactory imitation effect, low imitation similarity, and sound material libraries whose data is limited and can be neither extended nor changed.
Summary of the invention
In view of the above problems of the prior art, the technical problem to be solved by the present invention is to provide a sound imitation method and device with a good imitation effect and high imitation similarity that can expand the sound material library automatically.
To solve this technical problem, the present invention adopts the following technical solution:
A sound imitation method, implemented in the following steps:
1) Obtain the speaker's speech signal and the model (imitation target) specified in the sound material library, preprocess the speech signal by framing and windowing, select one speech frame from the preprocessed speech signal as the current speech frame, and go to the next step;
2) Perform pitch conversion and timbre conversion on the current speech frame according to the model specified in the sound material library;
3) Resynthesize the results of the pitch conversion and the timbre conversion to obtain a synthesized speech frame;
4) Add the synthesized speech frame to the imitation speech frames;
5) Determine whether all speech frames of the speech signal have been processed; if not, select an unprocessed speech frame as the current speech frame and go to step 2); otherwise go to the next step;
6) Output the imitated speech formed by the imitation speech frames.
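The control flow of steps 1) to 6) can be sketched in a few lines of Python. This is a minimal illustration, not the patented implementation: the function names `split_frames` and `imitate` are hypothetical, the Hamming window is an assumption (the text only says "framing and windowing"), and the per-frame conversion of steps 2) to 4) is passed in as a placeholder callable.

```python
import math

def split_frames(signal, frame_len, hop):
    """Step 1): cut the signal into windowed frames.

    frame_len is the window length, hop is the frame shift.
    The Hamming window is an assumption; the source names no window type.
    """
    window = [0.54 - 0.46 * math.cos(2 * math.pi * i / (frame_len - 1))
              for i in range(frame_len)]
    frames = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        frames.append([signal[start + i] * window[i] for i in range(frame_len)])
    return frames

def imitate(signal, frame_len, hop, convert_frame):
    """Steps 1) to 6): convert every frame and collect the imitation frames.

    convert_frame is a hypothetical callable standing in for the pitch
    conversion, timbre conversion and resynthesis of steps 2) and 3).
    """
    imitation_frames = []
    for frame in split_frames(signal, frame_len, hop):  # steps 1) and 5)
        imitation_frames.append(convert_frame(frame))   # steps 2) to 4)
    return imitation_frames                              # input to step 6)
```

With an identity conversion such as `imitate(signal, 4, 2, lambda f: f)`, the function simply returns the windowed frames, which makes the loop easy to check before a real conversion is plugged in.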
As a further improvement of the sound imitation method of the present invention:
In step 2), the detailed steps of performing pitch conversion on the current speech frame according to the model specified in the sound material library are as follows:
2.1.1) Perform linear prediction analysis on the current speech frame;
2.1.2) Obtain the LPC residual signal of the current speech frame from the result of the linear prediction analysis;
2.1.3) Apply the discrete Fourier transform (DFT) to the LPC residual signal and compute its magnitude spectrum;
2.1.4) Build pseudo-harmonic speech from the DFT of the LPC residual signal and the magnitude spectrum using the pseudo-harmonic speech model;
2.1.5) Extract the pitch period of the model specified in the sound material library;
2.1.6) Perform a fundamental-tone transformation on the pseudo-harmonic speech according to the pitch period of the model;
2.1.7) Apply the inverse discrete Fourier transform to the pseudo-harmonic speech after the fundamental-tone transformation, and output the resulting new synthesized residual signal as the result of the pitch conversion.
In step 2), the detailed steps of performing timbre conversion on the current speech frame according to the model specified in the sound material library are as follows:
2.2.1) Perform linear prediction analysis on the current speech frame;
2.2.2) Obtain the vocal tract filter of the current speech frame from the result of the linear prediction analysis;
2.2.3) Extract the vocal tract filter of the model specified in the sound material library;
2.2.4) Replace the vocal tract filter of the current speech frame with the vocal tract filter of the model, and take this as the result of the timbre conversion.
In step 1), after the speech signal is preprocessed by framing and windowing, the method further comprises a step of storing the speech signal in the sound material library as a model, the detailed steps of which are as follows:
1.1) Perform linear prediction analysis on the current speech frame;
1.2) Obtain the LPC residual signal and the vocal tract filter of the current speech frame from the result of the linear prediction analysis;
1.3) Process the LPC residual signal with the cyclic amplitude sum-of-squares function;
1.4) Extract the speaker's pitch period from the LPC residual signal processed by the cyclic amplitude sum-of-squares function;
1.5) Store the speaker's pitch period and vocal tract filter in the sound material library as the speech parameters of the model.
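Steps 1.3) to 1.5) can be approximated as follows. The source does not define the cyclic amplitude sum-of-squares function, so this sketch swaps in a plain autocorrelation peak search as the pitch estimator; that substitution, the lag bounds, the dictionary layout of the library, and the names `estimate_pitch_period` and `store_model` are all assumptions made for illustration.

```python
def estimate_pitch_period(residual, min_lag, max_lag):
    """Pick the lag (in samples) with the largest autocorrelation.

    Stand-in for steps 1.3) and 1.4); not the function used in the source.
    """
    best_lag, best_score = min_lag, float("-inf")
    for lag in range(min_lag, max_lag + 1):
        score = sum(residual[i] * residual[i + lag]
                    for i in range(len(residual) - lag))
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag

def store_model(library, name, pitch_period, vocal_tract_filter):
    """Step 1.5): save the speaker's parameters as a new model."""
    library[name] = {
        "pitch_period": pitch_period,
        "vocal_tract_filter": list(vocal_tract_filter),
    }

# An idealised residual: one pulse every 50 samples.
residual = [1.0 if i % 50 == 0 else 0.0 for i in range(400)]
library = {}
store_model(library, "speaker_1",
            estimate_pitch_period(residual, 20, 160), [1.0, -0.9])
print(library["speaker_1"]["pitch_period"])  # prints 50
```

Because every speaker who uses the system adds an entry this way, the library grows without a separate material collection step.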
In addition, the present invention provides a sound imitation device, comprising:
An input module, for obtaining the speaker's speech signal and the model specified in the sound material library, preprocessing the speech signal by framing and windowing, and selecting one speech frame from the preprocessed speech signal as the current speech frame;
A pitch conversion module, for performing pitch conversion on the current speech frame according to the model specified in the sound material library;
A timbre conversion module, for performing timbre conversion on the current speech frame according to the model specified in the sound material library;
A speech frame synthesis module, for resynthesizing the results of the pitch conversion and the timbre conversion to obtain a synthesized speech frame;
An imitation speech frame adding module, for adding the synthesized speech frame to the imitation speech frames;
A judgment processing module, for determining whether all speech frames of the speech signal have been processed; if not, an unprocessed speech frame is selected as the current speech frame and execution goes to step 2); otherwise execution goes to the next step;
An imitation speech output module, for outputting the imitated speech formed by the imitation speech frames.
As a further improvement of the sound imitation device of the present invention:
The pitch conversion module comprises:
A first linear prediction analysis submodule, for performing linear prediction analysis on the current speech frame;
A first LPC residual signal acquisition submodule, for obtaining the LPC residual signal of the current speech frame from the result of the linear prediction analysis;
A discrete Fourier transform submodule, for applying the discrete Fourier transform to the LPC residual signal;
A magnitude spectrum calculation submodule, for computing the magnitude spectrum of the LPC residual signal;
A pseudo-harmonic speech generation submodule, for building pseudo-harmonic speech from the DFT of the LPC residual signal and the magnitude spectrum using the pseudo-harmonic speech model;
A first pitch period extraction submodule, for extracting the pitch period of the model specified in the sound material library;
A fundamental-tone transformation submodule, for performing a fundamental-tone transformation on the pseudo-harmonic speech according to the pitch period of the model;
An inverse discrete Fourier transform submodule, for applying the inverse discrete Fourier transform to the pseudo-harmonic speech after the fundamental-tone transformation and outputting the resulting new synthesized residual signal as the result of the pitch conversion.
The timbre conversion module comprises:
A second linear prediction analysis submodule, for performing linear prediction analysis on the current speech frame;
A to-be-replaced vocal tract filter extraction submodule, for obtaining the vocal tract filter of the current speech frame from the result of the linear prediction analysis;
A target vocal tract filter extraction submodule, for extracting the vocal tract filter of the model specified in the sound material library;
A target vocal tract filter replacement submodule, for replacing the vocal tract filter of the current speech frame with the vocal tract filter of the model and taking this as the result of the timbre conversion.
The sound imitation device further comprises a sound material library expansion module, which comprises:
A third linear prediction analysis submodule, for performing linear prediction analysis on the current speech frame;
A vocal tract filter acquisition submodule, for obtaining the vocal tract filter of the current speech frame from the result of the linear prediction analysis;
A second LPC residual signal acquisition submodule, for obtaining the LPC residual signal of the current speech frame from the result of the linear prediction analysis;
A cyclic amplitude sum-of-squares processing submodule, for processing the LPC residual signal with the cyclic amplitude sum-of-squares function;
A second pitch period extraction submodule, for extracting the speaker's pitch period from the LPC residual signal processed by the cyclic amplitude sum-of-squares function;
A model storage submodule, for storing the speaker's pitch period and vocal tract filter in the sound material library as the speech parameters of the model.
The sound imitation method of the present invention has the following technical effects:
1. The present invention performs pitch conversion and timbre conversion on each speech frame of the speech signal according to the model specified in the sound material library, resynthesizes the results to obtain a synthesized speech frame, and adds the synthesized speech frame to the imitation speech frames. The speech signal is thus imitated as the voice of the specified model in terms of both timbre and pitch before being output, so the imitation effect is good and the imitation similarity is high.
2. After the speech signal is preprocessed by framing and windowing, the present invention further stores the speech signal in the sound material library as a model. This overcomes the shortcomings of prior-art sound imitation methods, namely that the library data is limited and can be neither extended nor changed, and that the imitation similarity is low. The voice of any speaker who uses the method is collected automatically, and its features are extracted and saved in the sound material library. The library therefore expands itself and holds a rich, growing set of models, which greatly improves the flexibility and appeal of the method.
Because the sound imitation device of the present invention corresponds to the sound imitation method of the present invention, the device has the same technical effects as the method, which are not repeated here.
Brief description of the drawings
Fig. 1 is a schematic flowchart of the implementation of the method embodiment of the present invention.
Fig. 2 is a detailed schematic flowchart of the method embodiment of the present invention.
Fig. 3 is a schematic flowchart of storing the speech signal in the sound material library as a model in the method embodiment of the present invention.
Fig. 4 is a schematic flowchart of the pitch conversion in the method embodiment of the present invention.
Fig. 5 is a schematic flowchart of the timbre conversion in the method embodiment of the present invention.
Fig. 6 is a schematic block diagram of the device embodiment of the present invention.
Fig. 7 is a schematic block diagram of the pitch conversion module in the device embodiment of the present invention.
Fig. 8 is a schematic block diagram of the timbre conversion module in the device embodiment of the present invention.
Fig. 9 is a schematic diagram of the outputs of the input module in the device embodiment of the present invention.
Fig. 10 is a schematic block diagram of the sound material library expansion module in the device embodiment of the present invention.
Fig. 11 is a schematic diagram of the working principle of the device embodiment of the present invention.
Detailed description of the embodiments
As shown in Fig. 1 and Fig. 2, the sound imitation method of the present embodiment is implemented in the following steps:
1) Obtain the speaker's speech signal and the model specified in the sound material library, preprocess the speech signal by framing and windowing, select one speech frame from the preprocessed speech signal as the current speech frame, and go to step 2).
In the present embodiment, the speech signal is collected by a hardware device with recording, playback and mobile network communication functions. After collection, the speech signal is preprocessed by framing and windowing: the window length is k, the frame shift is k', the number of frames is N, the n-th frame of speech data is s(n), and the corresponding synthesized frame obtained by imitation is S(n).
As shown in Fig. 3, in the present embodiment the framing and windowing preprocessing is followed by a step of storing the speech signal in the sound material library as a model, the detailed steps of which are as follows:
1.1) Perform linear prediction analysis (LPC) on the current speech frame;
1.2) Obtain the LPC residual signal R(n) and the vocal tract filter An of the current speech frame from the result of the linear prediction analysis;
1.3) Process the LPC residual signal R(n) with the cyclic amplitude sum-of-squares function (SCMDSF);
1.4) Extract the speaker's pitch period from the LPC residual signal processed by the cyclic amplitude sum-of-squares function;
1.5) Store the speaker's pitch period and vocal tract filter in the sound material library as the speech parameters of the model.
2) Perform pitch conversion and timbre conversion on the current speech frame according to the model specified in the sound material library.
As shown in Fig. 4, in step 2) of the present embodiment the detailed steps of performing pitch conversion on the current speech frame according to the model specified in the sound material library are as follows:
2.1.1) Perform linear prediction analysis (LPC) on the current speech frame;
2.1.2) Obtain the LPC residual signal R(n) of the current speech frame from the result of the linear prediction analysis;
2.1.3) Apply the discrete Fourier transform (DFT) to the LPC residual signal and compute the magnitude spectrum S(k);
2.1.4) Build pseudo-harmonic speech from the DFT of the LPC residual signal and the magnitude spectrum S(k) using the pseudo-harmonic speech model;
2.1.5) Extract the pitch period of the model specified in the sound material library;
2.1.6) Perform a fundamental-tone transformation on the pseudo-harmonic speech according to the pitch period of the model;
2.1.7) Apply the inverse discrete Fourier transform to the pseudo-harmonic speech after the fundamental-tone transformation, and output the resulting new synthesized residual signal as the result of the pitch conversion.
As shown in Fig. 5, in step 2) of the present embodiment the detailed steps of performing timbre conversion on the current speech frame according to the model specified in the sound material library are as follows:
2.2.1) Perform linear prediction analysis (LPC) on the current speech frame;
2.2.2) Obtain the vocal tract filter An of the current speech frame from the result of the linear prediction analysis;
2.2.3) Extract the vocal tract filter An' of the model specified in the sound material library;
2.2.4) Replace the vocal tract filter An of the current speech frame with the vocal tract filter An' of the model, and take this as the result of the timbre conversion.
3) Resynthesize the results of the pitch conversion and timbre conversion of speech frame s(n) (the new residual signal R(n)' and the vocal tract filter An') to obtain the synthesized speech frame S(n)'.
4) Add the synthesized speech frame S(n)' to the imitation speech frames.
5) Determine whether all speech frames of the speech signal have been processed; if not (n is not equal to N), select an unprocessed speech frame as the current speech frame and go to step 2); otherwise (n equals N) go to the next step.
6) Output the imitated speech formed by the imitation speech frames.
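Steps 4) to 6) leave open how the synthesized frames are joined into one output signal. One common way, assumed here and not stated in the source, is overlap-add: each frame is placed at a multiple of the frame shift and overlapping samples are summed. The function name `overlap_add` is hypothetical.

```python
def overlap_add(frames, hop):
    """Join equal-length frames into one signal by overlap-add.

    hop is the frame shift in samples. Overlap-add is an assumption;
    the text only says the frames are added to the imitation speech.
    """
    if not frames:
        return []
    frame_len = len(frames[0])
    out = [0.0] * (hop * (len(frames) - 1) + frame_len)
    for index, frame in enumerate(frames):
        start = index * hop
        for i, sample in enumerate(frame):
            out[start + i] += sample
    return out

# Two frames of four samples with a frame shift of two samples.
print(overlap_add([[1.0] * 4, [1.0] * 4], 2))
# prints [1.0, 1.0, 2.0, 2.0, 1.0, 1.0]
```

The doubled samples in the middle of the example show why the analysis window and frame shift are usually chosen so that overlapping windows sum to a constant.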
As shown in Fig. 6, corresponding to the sound imitation method of the present embodiment, the sound imitation device of the present embodiment comprises:
An input module, for obtaining the speaker's speech signal and the model specified in the sound material library, preprocessing the speech signal by framing and windowing, and selecting one speech frame from the preprocessed speech signal as the current speech frame;
A pitch conversion module, for performing pitch conversion on the current speech frame according to the model specified in the sound material library;
A timbre conversion module, for performing timbre conversion on the current speech frame according to the model specified in the sound material library;
A speech frame synthesis module, for resynthesizing the results of the pitch conversion and the timbre conversion to obtain a synthesized speech frame;
An imitation speech frame adding module, for adding the synthesized speech frame to the imitation speech frames;
A judgment processing module, for determining whether all speech frames of the speech signal have been processed; if not, an unprocessed speech frame is selected as the current speech frame and execution goes to step 2); otherwise execution goes to the next step;
An imitation speech output module, for outputting the imitated speech formed by the imitation speech frames.
As shown in Fig. 7, the pitch conversion module comprises:
A first linear prediction analysis submodule, for performing linear prediction analysis on the current speech frame;
A first LPC residual signal acquisition submodule, for obtaining the LPC residual signal of the current speech frame from the result of the linear prediction analysis;
A discrete Fourier transform submodule, for applying the discrete Fourier transform to the LPC residual signal;
A magnitude spectrum calculation submodule, for computing the magnitude spectrum of the LPC residual signal;
A pseudo-harmonic speech generation submodule, for building pseudo-harmonic speech from the DFT of the LPC residual signal and the magnitude spectrum using the pseudo-harmonic speech model;
A first pitch period extraction submodule, for extracting the pitch period of the model specified in the sound material library;
A fundamental-tone transformation submodule, for performing a fundamental-tone transformation on the pseudo-harmonic speech according to the pitch period of the model;
An inverse discrete Fourier transform submodule, for applying the inverse discrete Fourier transform to the pseudo-harmonic speech after the fundamental-tone transformation and outputting the resulting new synthesized residual signal as the result of the pitch conversion.
As shown in Fig. 8, the timbre conversion module comprises:
A second linear prediction analysis submodule, for performing linear prediction analysis on the current speech frame;
A to-be-replaced vocal tract filter extraction submodule, for obtaining the vocal tract filter of the current speech frame from the result of the linear prediction analysis;
A target vocal tract filter extraction submodule, for extracting the vocal tract filter of the model specified in the sound material library;
A target vocal tract filter replacement submodule, for replacing the vocal tract filter of the current speech frame with the vocal tract filter of the model and taking this as the result of the timbre conversion.
As shown in Fig. 9, the present embodiment also comprises a sound material library expansion module. The input module outputs the speech signal to the voice imitation unit, which consists of the pitch conversion module, the timbre conversion module, the speech frame synthesis module, the imitation speech frame adding module, the judgment processing module and the imitation speech output module. It also outputs the speech signal to the sound material library expansion module, which stores the speaker's speech signal in the sound material library as a model.
As shown in Fig. 10, the sound material library expansion module comprises:
A third linear prediction analysis submodule, for performing linear prediction analysis on the current speech frame;
A vocal tract filter acquisition submodule, for obtaining the vocal tract filter of the current speech frame from the result of the linear prediction analysis;
A second LPC residual signal acquisition submodule, for obtaining the LPC residual signal of the current speech frame from the result of the linear prediction analysis;
A cyclic amplitude sum-of-squares processing submodule, for processing the LPC residual signal with the cyclic amplitude sum-of-squares function;
A second pitch period extraction submodule, for extracting the speaker's pitch period from the LPC residual signal processed by the cyclic amplitude sum-of-squares function;
A model storage submodule, for storing the speaker's pitch period and vocal tract filter in the sound material library as the speech parameters of the model.
As shown in Fig. 11, the input module of the present embodiment uses a hardware device with recording, playback and mobile network communication functions to collect the speaker's voice and to receive the speaker's choice of a model in the sound material library. The voice imitation unit performs the voice change on the collected voice according to the selected target and outputs the result. At the same time, the sound material library expansion module extracts the original speaker's speech parameters and saves them in the sound material library, adding the original speaker as a new model.
The above is only a preferred embodiment of the present invention. The protection scope of the present invention is not limited to the above embodiment; all technical solutions that fall under the concept of the present invention belong to its protection scope. It should be noted that, for those of ordinary skill in the art, improvements and modifications made without departing from the principles of the present invention shall also be regarded as falling within the protection scope of the present invention.

Claims (8)

1. A sound imitation method, characterized in that it is implemented in the following steps:
1) Obtain the speaker's speech signal and the model specified in the sound material library, preprocess the speech signal by framing and windowing, select one speech frame from the preprocessed speech signal as the current speech frame, and go to the next step;
2) Perform pitch conversion and timbre conversion on the current speech frame according to the model specified in the sound material library;
3) Resynthesize the results of the pitch conversion and the timbre conversion to obtain a synthesized speech frame;
4) Add the synthesized speech frame to the imitation speech frames;
5) Determine whether all speech frames of the speech signal have been processed; if not, select an unprocessed speech frame as the current speech frame and go to step 2); otherwise go to the next step;
6) Output the imitated speech formed by the imitation speech frames.
2. The sound imitation method according to claim 1, characterized in that, in step 2), the detailed steps of performing pitch conversion on the current speech frame according to the model specified in the sound material library are as follows:
2.1.1) Perform linear prediction analysis on the current speech frame;
2.1.2) Obtain the LPC residual signal of the current speech frame from the result of the linear prediction analysis;
2.1.3) Apply the discrete Fourier transform (DFT) to the LPC residual signal and compute its magnitude spectrum;
2.1.4) Build pseudo-harmonic speech from the DFT of the LPC residual signal and the magnitude spectrum using the pseudo-harmonic speech model;
2.1.5) Extract the pitch period of the model specified in the sound material library;
2.1.6) Perform a fundamental-tone transformation on the pseudo-harmonic speech according to the pitch period of the model;
2.1.7) Apply the inverse discrete Fourier transform to the pseudo-harmonic speech after the fundamental-tone transformation, and output the resulting new synthesized residual signal as the result of the pitch conversion.
3. The sound imitation method according to claim 2, characterized in that, in step 2), the detailed steps of performing timbre conversion on the current speech frame according to the model specified in the sound material library are as follows:
2.2.1) Perform linear prediction analysis on the current speech frame;
2.2.2) Obtain the vocal tract filter of the current speech frame from the result of the linear prediction analysis;
2.2.3) Extract the vocal tract filter of the model specified in the sound material library;
2.2.4) Replace the vocal tract filter of the current speech frame with the vocal tract filter of the model, and take this as the result of the timbre conversion.
4. The sound imitation method according to claim 1, 2 or 3, characterized in that said step 1) further comprises, after the framing and windowing pre-processing of said voice signal, a step of depositing the voice signal in the sound material storehouse as a model, the detailed steps of which are as follows:
1.1) performing linear prediction analysis on the current speech frame;
1.2) obtaining the LPC residual signal and the vocal tract filter of the current speech frame from the result of the linear prediction analysis;
1.3) processing said LPC residual signal with the amplitude of cycles sum of squares function;
1.4) extracting the speaker's pitch period from the LPC residual signal processed with the amplitude of cycles sum of squares function;
1.5) depositing said speaker's pitch period and vocal tract filter in the sound material storehouse as the speech parameters of the model.
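The storage step can be sketched as a small routine that estimates a pitch period from the LPC residual and files it together with the vocal tract filter. The claims name an "amplitude of cycles sum of squares function" without defining it in this section, so the sketch substitutes a plain autocorrelation peak search, which is a different and simpler technique. The dictionary layout, the function names, and the lag bounds are likewise assumptions.

```python
import numpy as np

def estimate_pitch_period(residual, min_lag, max_lag):
    """Return the lag (in samples) with the largest autocorrelation.

    Autocorrelation is a stand-in here; it is not the function named in the claim.
    """
    centered = np.asarray(residual, dtype=float) - np.mean(residual)
    ac = np.correlate(centered, centered, mode="full")[len(centered) - 1:]
    lags = np.arange(min_lag, max_lag + 1)
    return int(lags[np.argmax(ac[min_lag:max_lag + 1])])

def store_model(library, name, residual, vocal_tract, min_lag=20, max_lag=200):
    """Deposit a speaker's parameters in the library (a plain dict in this sketch)."""
    library[name] = {
        "pitch_period": estimate_pitch_period(residual, min_lag, max_lag),
        "vocal_tract": np.asarray(vocal_tract, dtype=float),
    }
    return library[name]
```

Storing only these two parameters per model matches step 1.5) and keeps the library compact; how the parameters of several frames are aggregated is not stated in the claim and is left open here.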
5. A sound imitation device, characterized by comprising:
an input module, for obtaining the speaker's voice signal and the model specified in the sound material storehouse, applying framing and windowing pre-processing to said voice signal, and selecting one speech frame from said pre-processed voice signal as the current speech frame;
a pitch conversion module, for performing pitch conversion on the current speech frame according to the model specified in the sound material storehouse;
a timbre conversion module, for performing timbre conversion on the current speech frame according to the model specified in the sound material storehouse;
a speech frame synthesis module, for synthesizing the results of the pitch conversion and the timbre conversion to obtain a synthesized speech frame;
an imitation speech frame adding module, for adding said synthesized speech frame to the imitation speech frames;
a judgment processing module, for judging whether all speech frames of said voice signal have been processed; if not, selecting a speech frame that has not yet been processed as the current speech frame and jumping back to step 2); otherwise proceeding to the next step;
an imitation voice output module, for outputting the imitation voice formed by said imitation speech frames.
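The control flow shared by these modules (framing and windowing, per-frame conversion, accumulation of synthesized frames, and the check for remaining frames) can be sketched as a short loop. The claim does not fix the window type, the frame length, or the overlap, so the periodic Hann window, the 50% overlap, and the overlap-add accumulation are assumptions. `convert_frame` is a hypothetical callable standing in for the combined pitch conversion, timbre conversion, and frame synthesis modules.

```python
import numpy as np

def frame_and_window(signal, frame_len, hop):
    """Split the signal into overlapping frames and apply a periodic Hann window."""
    n = np.arange(frame_len)
    window = 0.5 - 0.5 * np.cos(2.0 * np.pi * n / frame_len)
    starts = range(0, len(signal) - frame_len + 1, hop)
    return [(start, signal[start:start + frame_len] * window) for start in starts]

def imitate(signal, convert_frame, frame_len=256, hop=128):
    """Process every frame and overlap-add the results into the imitation voice."""
    signal = np.asarray(signal, dtype=float)
    output = np.zeros(len(signal))
    pending = frame_and_window(signal, frame_len, hop)   # input module
    while pending:                                       # judgment: frames left to process?
        start, frame = pending.pop(0)                    # select the next current frame
        synthesized = convert_frame(frame)               # conversion + synthesis modules
        output[start:start + frame_len] += synthesized   # add to the imitation frames
    return output                                        # imitation voice output
```

With an identity `convert_frame` and a hop of half the frame length, the overlapping windows sum to one, so the interior of the signal is reproduced exactly. That property makes the loop easy to verify before real conversion modules are plugged in.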
6. The sound imitation device according to claim 5, characterized in that said pitch conversion module comprises:
a first linear prediction analysis submodule, for performing linear prediction analysis on the current speech frame;
a first LPC residual signal obtaining submodule, for obtaining the LPC residual signal of the current speech frame from the result of the linear prediction analysis;
a discrete Fourier transform submodule, for performing a discrete Fourier transform (DFT) on the LPC residual signal;
an amplitude spectrum calculation submodule, for calculating the amplitude spectrum of the LPC residual signal;
a pseudo-harmonic speech generation submodule, for constructing pseudo-harmonic speech, by means of a pseudo-harmonic speech model, from the LPC residual signal after the discrete Fourier transform and said amplitude spectrum;
a first pitch period extraction submodule, for extracting the pitch period of the model specified in said sound material storehouse;
a fundamental tone conversion submodule, for performing fundamental tone conversion on the pseudo-harmonic speech according to the pitch period of said model;
an inverse discrete Fourier transform submodule, for performing an inverse discrete Fourier transform on the pseudo-harmonic speech after the fundamental tone conversion and outputting the resulting new synthesized residual signal as the result of the pitch conversion.
7. The sound imitation device according to claim 6, characterized in that said timbre conversion module comprises:
a second linear prediction analysis submodule, for performing linear prediction analysis on the current speech frame;
a to-be-replaced vocal tract filter extraction submodule, for obtaining the vocal tract filter of the current speech frame from the result of the linear prediction analysis;
a target vocal tract filter extraction submodule, for extracting the vocal tract filter of the model specified in said sound material storehouse;
a target vocal tract filter replacement submodule, for replacing the vocal tract filter of the current speech frame with the vocal tract filter of said model and taking the result as the result of the timbre conversion.
8. The sound imitation device according to claim 5, 6 or 7, characterized in that said sound imitation device further comprises a sound material storehouse expansion module, and said sound material storehouse expansion module comprises:
a third linear prediction analysis submodule, for performing linear prediction analysis on the current speech frame;
a vocal tract filter obtaining submodule, for obtaining the vocal tract filter of the current speech frame from the result of the linear prediction analysis;
a second LPC residual signal obtaining submodule, for obtaining the LPC residual signal of the current speech frame from the result of the linear prediction analysis;
an amplitude of cycles sum of squares processing submodule, for processing said LPC residual signal with the amplitude of cycles sum of squares function;
a second pitch period extraction submodule, for extracting the speaker's pitch period from the LPC residual signal processed with the amplitude of cycles sum of squares function;
a model storage submodule, for depositing said speaker's pitch period and vocal tract filter in the sound material storehouse as the speech parameters of the model.
CN201310423715.3A 2013-09-17 2013-09-17 Method and device for imitating sound Active CN103489443B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201310423715.3A CN103489443B (en) 2013-09-17 2013-09-17 Method and device for imitating sound

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201310423715.3A CN103489443B (en) 2013-09-17 2013-09-17 Method and device for imitating sound

Publications (2)

Publication Number Publication Date
CN103489443A true CN103489443A (en) 2014-01-01
CN103489443B CN103489443B (en) 2016-06-15

Family

ID=49829623

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201310423715.3A Active CN103489443B (en) 2013-09-17 2013-09-17 Method and device for imitating sound

Country Status (1)

Country Link
CN (1) CN103489443B (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1719514A (en) * 2004-07-06 2006-01-11 中国科学院自动化研究所 Based on speech analysis and synthetic high-quality real-time change of voice method
CN102592590A (en) * 2012-02-21 2012-07-18 华南理工大学 Arbitrarily adjustable method and device for changing phoneme naturally
CN102664003A (en) * 2012-04-24 2012-09-12 南京邮电大学 Residual excitation signal synthesis and voice conversion method based on harmonic plus noise model (HNM)
US20130151256A1 (en) * 2010-07-20 2013-06-13 National Institute Of Advanced Industrial Science And Technology System and method for singing synthesis capable of reflecting timbre changes

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
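Li Bo (李波): "Research on Key Technologies of Voice Conversion" (语音转换的关键技术研究), China Doctoral and Master's Dissertations Full-text Database (Doctoral), Information Science and Technology Series *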

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104900226A (en) * 2014-03-03 2015-09-09 联想(北京)有限公司 Information processing method and device
CN106302134A (en) * 2016-09-29 2017-01-04 努比亚技术有限公司 A kind of message playing device and method
CN109616131A (en) * 2018-11-12 2019-04-12 南京南大电子智慧型服务机器人研究院有限公司 A kind of number real-time voice is changed voice method
CN109616131B (en) * 2018-11-12 2023-07-07 南京南大电子智慧型服务机器人研究院有限公司 Digital real-time voice sound changing method
CN111317316A (en) * 2018-12-13 2020-06-23 南京硅基智能科技有限公司 Photo frame for simulating appointed voice to carry out man-machine conversation
CN110444192A (en) * 2019-08-15 2019-11-12 广州科粤信息科技有限公司 A kind of intelligent sound robot based on voice technology
CN111223475A (en) * 2019-11-29 2020-06-02 北京达佳互联信息技术有限公司 Voice data generation method and device, electronic equipment and storage medium
CN111223475B (en) * 2019-11-29 2022-10-14 北京达佳互联信息技术有限公司 Voice data generation method and device, electronic equipment and storage medium

Also Published As

Publication number Publication date
CN103489443B (en) 2016-06-15

Similar Documents

Publication Publication Date Title
CN103489443A (en) Method and device for imitating sound
CN106328126B (en) Far field voice recognition processing method and device
CN110782878B (en) Attention mechanism-based multi-scale audio scene recognition method
Choi et al. Investigating U-Nets with various intermediate blocks for spectrogram-based singing voice separation
CN105845127A (en) Voice recognition method and system
CN104134444B (en) A kind of song based on MMSE removes method and apparatus of accompanying
CN106228973A (en) Stablize the music voice modified tone method of tone color
CN110797038B (en) Audio processing method and device, computer equipment and storage medium
CN104538011A (en) Tone adjusting method and device and terminal device
CN102436807A (en) Method and system for automatically generating voice with stressed syllables
CN104575487A (en) Voice signal processing method and device
CN105741835A (en) Audio information processing method and terminal
CN110047478B (en) Multi-channel speech recognition acoustic modeling method and device based on spatial feature compensation
CN113724712B (en) Bird sound identification method based on multi-feature fusion and combination model
CN105304092A (en) Real-time voice changing method based on intelligent terminal
CN101740034A (en) Method for realizing sound speed-variation without tone variation and system for realizing speed variation and tone variation
CN102568476A (en) Voice conversion method based on self-organizing feature map network cluster and radial basis network
CN106375780A (en) Method and apparatus for generating multimedia file
CN108053814A (en) A kind of speech synthesis system and method for analog subscriber song
CN114267372A (en) Voice noise reduction method, system, electronic device and storage medium
CN113539232A (en) Muslim class voice data set-based voice synthesis method
CN111724809A (en) Vocoder implementation method and device based on variational self-encoder
CN101178895A (en) Model self-adapting method based on generating parameter listen-feel error minimize
CN105719640B (en) Speech synthesizing device and speech synthesizing method
CN105654941A (en) Voice change method and device based on specific target person voice change ratio parameter

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant