CN103489443A - Method and device for imitating sound

Info

Publication number: CN103489443A
Application number: CN201310423715.3A
Authority: CN
Inventors: 赵欢; 郑睿; 陈佐; 张希翔; 杨泽英
Original assignee: Hunan University
Current assignee: Hunan University
Priority date: 2013-09-17
Filing date: 2013-09-17
Publication date: 2014-01-01
Anticipated expiration: 2033-09-17
Also published as: CN103489443B

Abstract

The invention discloses a method and device for imitating sound. The method comprises the steps of obtaining voice signals and an imitating object, preprocessing the voice signals, carrying out tone and timbre conversion on each voice frame of the voice signals according to the imitating object, synthesizing the results to obtain synthesized voice frames, adding the synthesized voice frames to imitated voice frames, and finally outputting the imitated voice formed by the imitated voice frames. The sound imitating device comprises an input module, a tone conversion module, a timbre conversion module, a voice frame synthesizing module, an imitated voice frame adding module, a judgment processing module and an imitated voice output module. The method and device for imitating sound have the advantages of a good voice imitating effect, high voice imitating similarity, and automatic expansion of the voice database.

Description

Sound imitation method and device

Technical field

The present invention relates to the field of speech signal processing, and in particular to a sound imitation method and device.

Background art

With the rapid development of speech processing technology and the spread of social platforms, research on sound imitation has shown its practical advantages and is steadily gaining importance. The common prior-art approach to sound imitation is to modify a vocal tract model; the resulting imitation is unsatisfactory, and the similarity of the imitated voice is low.

The Chinese invention patent application with publication number CN102592590A describes an arbitrarily adjustable natural voice-changing method and device. It proposes modeling the vocal tract of the speech signal to obtain a system model of pronunciation, namely the vocal tract model, then modifying the vocal tract model, and finally restoring the speech signal with the new vocal tract model. This changes the timbre of the sound to some extent but does not, in essence, achieve voice imitation. The Chinese invention patent application with publication number CN101567132A describes a sound changing device. It proposes equipping a newspaper-reading sound device with an audio adjustment device and a volume adjustment device, so that the audio frequency and volume of the newspaper-reading sound device can be adjusted freely. This device changes the pitch and volume of the sound, but the timbre, which is the main factor in sound imitation, is left unchanged, so the imitation achieved is unsatisfactory. Moreover, sound imitation systems currently in general use rely on a sound material library in which the sound materials are stored in advance. Although a pre-stored library simplifies the collection of sound materials, it greatly limits the choice of voice models, lacks flexibility and adaptability, and reduces the appeal of applications based on sound imitation. In summary, current sound imitation methods generally suffer from an unsatisfactory imitation effect, low imitation similarity, and sound material library data that is monotonous, cannot be extended, and cannot be changed.

Summary of the invention

In view of the above problems of the prior art, the technical problem to be solved by the present invention is to provide a sound imitation method and device that imitate voices well, achieve high imitation similarity, and can expand the sound material library automatically.

To solve the above technical problem, the present invention adopts the following technical solution:

A sound imitation method, implemented in the following steps:

1) obtain the speaker's speech signal and the model specified in the sound material library, apply framing and windowing preprocessing to the speech signal, select one speech frame from the preprocessed speech signal as the current speech frame, and jump to the next step;

2) perform pitch conversion and timbre conversion on the current speech frame according to the model specified in the sound material library;

3) resynthesize the results of the pitch conversion and the timbre conversion to obtain a synthesized speech frame;

4) add the synthesized speech frame to the imitated speech frames;

5) judge whether all speech frames of the speech signal have been processed; if not, select a not-yet-processed speech frame as the current speech frame and jump to step 2); otherwise jump to the next step;

6) output the imitated speech formed by the imitated speech frames.

As a further improvement of the sound imitation method of the present invention:

In step 2), the detailed steps of performing pitch conversion on the current speech frame according to the model specified in the sound material library are as follows:

2.1.1) perform linear prediction analysis on the current speech frame;

2.1.2) obtain the LPC residual signal of the current speech frame from the result of the linear prediction analysis;

2.1.3) apply the discrete Fourier transform (DFT) to the LPC residual signal and calculate the amplitude spectrum;

2.1.4) build pseudo-harmonic speech from the DFT of the LPC residual signal and the amplitude spectrum using a pseudo-harmonic speech model;

2.1.5) extract the pitch period of the model specified in the sound material library;

2.1.6) perform pitch transformation on the pseudo-harmonic speech according to the pitch period of the model;

2.1.7) apply the inverse discrete Fourier transform to the pitch-transformed pseudo-harmonic speech, and output the resulting new synthesized residual signal as the result of the pitch conversion.
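
Steps 2.1.1) and 2.1.2) can be sketched in Python as follows. This is a minimal illustration rather than the patent's implementation: it assumes the autocorrelation method with the Levinson-Durbin recursion (the text names only linear prediction analysis), and the function names `autocorr`, `lpc` and `lpc_residual` are hypothetical.

```python
def autocorr(x, max_lag):
    """Autocorrelation r[0..max_lag] of one frame."""
    return [sum(x[i] * x[i + lag] for i in range(len(x) - lag))
            for lag in range(max_lag + 1)]


def lpc(frame, order):
    """Levinson-Durbin recursion (assumed analysis method).

    Returns the inverse-filter coefficients [1, a1, ..., ap] of
    A(z) = 1 + a1*z^-1 + ... + ap*z^-p.
    """
    r = autocorr(frame, order)
    a = [1.0] + [0.0] * order
    err = r[0]
    for m in range(1, order + 1):
        if err <= 0.0:  # silent or degenerate frame: stop early
            break
        acc = r[m] + sum(a[j] * r[m - j] for j in range(1, m))
        k = -acc / err  # reflection coefficient
        prev = a[:]
        for j in range(1, m):
            a[j] = prev[j] + k * prev[m - j]
        a[m] = k
        err *= 1.0 - k * k
    return a


def lpc_residual(frame, a):
    """Residual e[n] = sum_j a[j] * x[n-j], with zero initial conditions."""
    return [sum(a[j] * frame[n - j] for j in range(len(a)) if n - j >= 0)
            for n in range(len(frame))]
```

Filtering the frame with A(z) removes most of the vocal tract contribution, so the residual approximates the excitation on which the later pitch steps operate.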

In step 2), the detailed steps of performing timbre conversion on the current speech frame according to the model specified in the sound material library are as follows:

2.2.1) perform linear prediction analysis on the current speech frame;

2.2.2) obtain the vocal tract filter of the current speech frame from the result of the linear prediction analysis;

2.2.3) extract the vocal tract filter of the model specified in the sound material library;

2.2.4) replace the vocal tract filter of the current speech frame with the vocal tract filter of the model, and take this as the result of the timbre conversion.

In step 1), after the framing and windowing preprocessing of the speech signal, the method further comprises a step of storing the speech signal in the sound material library as a model, with the following detailed steps:

1.1) perform linear prediction analysis on the current speech frame;

1.2) obtain the LPC residual signal and the vocal tract filter of the current speech frame from the result of the linear prediction analysis;

1.3) process the LPC residual signal with the cyclic amplitude sum-of-squares function;

1.4) extract the speaker's pitch period from the LPC residual signal processed by the cyclic amplitude sum-of-squares function;

1.5) store the speaker's pitch period and vocal tract filter in the sound material library as the speech parameters of the model.
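
Steps 1.3) and 1.4) might look like the sketch below. The text does not define the function applied to the residual, so the sketch assumes a circular squared-difference function, D(lag) = sum over i of (x[(i + lag) mod N] - x[i])^2, and takes the lag that minimises it as the pitch period. The function names and the lag bounds are hypothetical.

```python
def circular_squared_difference(x, lag):
    """Assumed pitch function: squared differences with wrap-around indexing."""
    n = len(x)
    return sum((x[(i + lag) % n] - x[i]) ** 2 for i in range(n))


def estimate_pitch_period(residual, min_lag, max_lag):
    """Return the lag in [min_lag, max_lag] with the smallest difference value.

    Ties are resolved in favour of the shortest lag, which avoids picking
    a multiple of the true period.
    """
    return min(range(min_lag, max_lag + 1),
               key=lambda lag: circular_squared_difference(residual, lag))
```

In practice the lag bounds would follow from the sampling rate and the expected pitch range of human speech; the patent gives no values for either.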

In addition, the present invention also provides a sound imitation device, comprising:

an input module, for obtaining the speaker's speech signal and the model specified in the sound material library, applying framing and windowing preprocessing to the speech signal, and selecting one speech frame from the preprocessed speech signal as the current speech frame;

a pitch conversion module, for performing pitch conversion on the current speech frame according to the model specified in the sound material library;

a timbre conversion module, for performing timbre conversion on the current speech frame according to the model specified in the sound material library;

a speech frame synthesis module, for resynthesizing the results of the pitch conversion and the timbre conversion to obtain a synthesized speech frame;

an imitated speech frame adding module, for adding the synthesized speech frame to the imitated speech frames;

a judgment processing module, for judging whether all speech frames of the speech signal have been processed; if not, selecting a not-yet-processed speech frame as the current speech frame and jumping to step 2); otherwise jumping to the next step;

an imitated speech output module, for outputting the imitated speech formed by the imitated speech frames.

As a further improvement of the sound imitation device of the present invention:

The pitch conversion module comprises:

a first linear prediction analysis submodule, for performing linear prediction analysis on the current speech frame;

a first LPC residual signal acquisition submodule, for obtaining the LPC residual signal of the current speech frame from the result of the linear prediction analysis;

a discrete Fourier transform submodule, for applying the discrete Fourier transform (DFT) to the LPC residual signal;

an amplitude spectrum calculation submodule, for calculating the amplitude spectrum of the LPC residual signal;

a pseudo-harmonic speech generation submodule, for building pseudo-harmonic speech from the DFT of the LPC residual signal and the amplitude spectrum using a pseudo-harmonic speech model;

a first pitch period extraction submodule, for extracting the pitch period of the model specified in the sound material library;

a pitch transformation submodule, for performing pitch transformation on the pseudo-harmonic speech according to the pitch period of the model;

an inverse discrete Fourier transform submodule, for applying the inverse discrete Fourier transform to the pitch-transformed pseudo-harmonic speech and outputting the resulting new synthesized residual signal as the result of the pitch conversion.

The timbre conversion module comprises:

a second linear prediction analysis submodule, for performing linear prediction analysis on the current speech frame;

a to-be-replaced vocal tract filter extraction submodule, for obtaining the vocal tract filter of the current speech frame from the result of the linear prediction analysis;

a target vocal tract filter extraction submodule, for extracting the vocal tract filter of the model specified in the sound material library;

a target vocal tract filter replacement submodule, for replacing the vocal tract filter of the current speech frame with the vocal tract filter of the model as the result of the timbre conversion.

The sound imitation device further comprises a sound material library expansion module, which comprises:

a third linear prediction analysis submodule, for performing linear prediction analysis on the current speech frame;

a vocal tract filter acquisition submodule, for obtaining the vocal tract filter of the current speech frame from the result of the linear prediction analysis;

a second LPC residual signal acquisition submodule, for obtaining the LPC residual signal of the current speech frame from the result of the linear prediction analysis;

a cyclic amplitude sum-of-squares processing submodule, for processing the LPC residual signal with the cyclic amplitude sum-of-squares function;

a second pitch period extraction submodule, for extracting the speaker's pitch period from the LPC residual signal processed by the cyclic amplitude sum-of-squares function;

a model storage submodule, for storing the speaker's pitch period and vocal tract filter in the sound material library as the speech parameters of the model.

The sound imitation method of the present invention has the following technical effects:

1. The present invention performs pitch conversion and timbre conversion on each speech frame of the speech signal according to the model specified in the sound material library, resynthesizes the results into a synthesized speech frame, and adds the synthesized speech frame to the imitated speech frames. The speech signal is thereby converted, in both timbre and pitch, into the voice of the specified model and then output, so the method imitates voices well and achieves high imitation similarity.

2. The present invention further comprises, after the framing and windowing preprocessing of the speech signal, a step of storing the speech signal in the sound material library as a model. This overcomes shortcomings of prior-art sound imitation methods such as sound material library data that is monotonous, cannot be extended and cannot be changed, and low imitation similarity. The speech of any speaker who uses the method to imitate a voice is collected automatically, and its features are extracted and saved in the sound material library, so the library expands itself. The library becomes rich and extensible and offers a wide choice of models, which greatly improves the flexibility and appeal of the method.

Because the sound imitation device of the present invention corresponds to the sound imitation method of the present invention, it has the same technical effects as the method, which are not repeated here.

Brief description of the drawings

Fig. 1 is a schematic flow diagram of the implementation of the method embodiment of the present invention.

Fig. 2 is a detailed schematic flow diagram of the method embodiment of the present invention.

Fig. 3 is a schematic flow diagram of storing the speech signal in the sound material library as a model in the method embodiment of the present invention.

Fig. 4 is a schematic flow diagram of the pitch conversion in the method embodiment of the present invention.

Fig. 5 is a schematic flow diagram of the timbre conversion in the method embodiment of the present invention.

Fig. 6 is a schematic diagram of the frame structure of the device embodiment of the present invention.

Fig. 7 is a schematic diagram of the frame structure of the pitch conversion module in the device embodiment of the present invention.

Fig. 8 is a schematic diagram of the frame structure of the timbre conversion module in the device embodiment of the present invention.

Fig. 9 is a schematic diagram of the outputs of the input module in the device embodiment of the present invention.

Fig. 10 is a schematic diagram of the frame structure of the sound material library expansion module in the device embodiment of the present invention.

Fig. 11 is a schematic diagram of the working principle of the device embodiment of the present invention.

Detailed description of the embodiments

As shown in Fig. 1 and Fig. 2, the sound imitation method of the present embodiment is implemented in the following steps:

1) obtain the speaker's speech signal and the model specified in the sound material library, apply framing and windowing preprocessing to the speech signal, select one speech frame from the preprocessed speech signal as the current speech frame, and jump to step 2).

In the present embodiment, the speech signal is collected by a hardware device with recording, playback and mobile network functions. After collection, the speech signal is first framed and windowed: the window length is set to k, the frame shift to k', and the number of frames is N; the n-th frame of speech data is s(n), and the corresponding synthesized frame obtained by imitation is S(n).
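
The framing and windowing described above can be sketched as follows, with `k` as the window length and `k_shift` standing in for the frame shift k'. The window type is not stated in the text, so the Hamming window used here is an assumption, as are the function name `frame_signal` and the choice to drop a trailing partial frame.

```python
import math


def frame_signal(signal, k, k_shift):
    """Split a signal into windowed frames of length k, advancing by k_shift."""
    if k <= 1 or k_shift <= 0:
        raise ValueError("k must exceed 1 and k_shift must be positive")
    # Hamming window (assumption: the source does not name the window type).
    window = [0.54 - 0.46 * math.cos(2 * math.pi * i / (k - 1))
              for i in range(k)]
    frames = []
    start = 0
    # A trailing partial frame is dropped (assumption).
    while start + k <= len(signal):
        frames.append([signal[start + i] * window[i] for i in range(k)])
        start += k_shift
    return frames
```

With this convention the number of frames N equals `len(frame_signal(signal, k, k_shift))`, and frame s(n) is the n-th list in the result.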

As shown in Fig. 3, in the present embodiment the framing and windowing preprocessing of the speech signal is followed by a step of storing the speech signal in the sound material library as a model, with the following detailed steps:

1.1) perform linear prediction analysis (LPC) on the current speech frame;

1.2) obtain the LPC residual signal R(n) and the vocal tract filter An of the current speech frame from the result of the linear prediction analysis;

1.3) process the LPC residual signal R(n) with the cyclic amplitude sum-of-squares function (SCMDSF);

1.5) store the speaker's pitch period and vocal tract filter in the sound material library as the speech parameters of the model.

2) perform pitch conversion and timbre conversion on the current speech frame according to the model specified in the sound material library.

As shown in Fig. 4, in step 2) of the present embodiment the detailed steps of performing pitch conversion on the current speech frame according to the model specified in the sound material library are as follows:

2.1.1) perform linear prediction analysis (LPC) on the current speech frame;

2.1.2) obtain the LPC residual signal R(n) of the current speech frame from the result of the linear prediction analysis;

2.1.3) apply the discrete Fourier transform (DFT) to the LPC residual signal and calculate the amplitude spectrum S(k);

2.1.4) build pseudo-harmonic speech from the DFT of the LPC residual signal and the amplitude spectrum S(k) using a pseudo-harmonic speech model;

2.1.5) extract the pitch period of the model specified in the sound material library;

2.1.6) perform pitch transformation on the pseudo-harmonic speech according to the pitch period of the model;
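
The transform in step 2.1.3) and its inverse can be sketched as below. The pseudo-harmonic speech model and the pitch transformation are not specified in the text and are deliberately left out; the function names are hypothetical, and a direct O(N^2) DFT is used only to keep the example dependency-free.

```python
import cmath


def dft(x):
    """Direct discrete Fourier transform of a real or complex sequence."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
            for k in range(n)]


def idft(spectrum):
    """Inverse DFT; returns the real part, suitable for real-valued signals."""
    n = len(spectrum)
    return [(sum(spectrum[k] * cmath.exp(2j * cmath.pi * k * t / n)
                 for k in range(n)) / n).real
            for t in range(n)]


def amplitude_spectrum(spectrum):
    """Magnitude of each DFT bin."""
    return [abs(c) for c in spectrum]
```

Any modification made between `dft` and `idft` must keep the spectrum conjugate-symmetric if the reconstructed residual is to remain real-valued.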

As shown in Fig. 5, in step 2) of the present embodiment the detailed steps of performing timbre conversion on the current speech frame according to the model specified in the sound material library are as follows:

2.2.1) perform linear prediction analysis (LPC) on the current speech frame;

2.2.2) obtain the vocal tract filter An of the current speech frame from the result of the linear prediction analysis;

2.2.3) extract the vocal tract filter An' of the model specified in the sound material library;

2.2.4) replace the vocal tract filter An of the current speech frame with the vocal tract filter An' of the model, and take this as the result of the timbre conversion.

3) resynthesize the results of the pitch conversion and the timbre conversion of speech frame s(n) (the new residual signal R(n)' and the vocal tract filter An') to obtain the synthesized speech frame S(n)'.
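
A minimal sketch of this resynthesis, assuming the vocal tract filter is the all-pole filter 1/A'(z) with coefficients [1, a1, ..., ap] and that each frame starts from zero filter state (both are assumptions; the function name is hypothetical):

```python
def synthesize_frame(residual, a_target):
    """Drive the target all-pole filter 1/A'(z) with the new residual.

    Implements y[n] = r[n] - sum_{j>=1} a_target[j] * y[n-j],
    with zero initial conditions (assumption).
    """
    out = []
    for n, r in enumerate(residual):
        acc = r
        for j in range(1, len(a_target)):
            if n - j >= 0:
                acc -= a_target[j] * out[n - j]
        out.append(acc)
    return out
```

Because the excitation carries the converted pitch and the filter carries the target's spectral envelope, this single filtering step combines both conversion results.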

4) add the synthesized speech frame S(n)' to the imitated speech frames.

5) judge whether all speech frames of the speech signal have been processed; if not (n is not equal to N), select a not-yet-processed speech frame as the current speech frame and jump to step 2); otherwise (n equals N), jump to the next step.

6) output the imitated speech formed by the imitated speech frames.
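
Steps 4) to 6) amount to accumulating the synthesized frames into one output signal. The text does not say how overlapping frames are combined; the sketch below assumes plain overlap-add at the frame shift, and the function name `overlap_add` is hypothetical.

```python
def overlap_add(frames, k_shift):
    """Sum equal-length frames into one signal, each offset by k_shift samples."""
    if not frames:
        return []
    k = len(frames[0])
    out = [0.0] * (k_shift * (len(frames) - 1) + k)
    for n, frame in enumerate(frames):
        start = n * k_shift
        for i, value in enumerate(frame):
            out[start + i] += value
    return out
```

With a tapered analysis window, overlapping regions sum to a gain that depends on the window and the shift, so a real implementation would normally normalise the output accordingly.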

As shown in Fig. 6, corresponding to the sound imitation method of the present embodiment, the sound imitation device of the present embodiment comprises:

an input module, for obtaining the speaker's speech signal and the model specified in the sound material library, applying framing and windowing preprocessing to the speech signal, and selecting one speech frame from the preprocessed speech signal as the current speech frame;

an imitated speech frame adding module, for adding the synthesized speech frame to the imitated speech frames;

a judgment processing module, for judging whether all speech frames of the speech signal have been processed; if not, selecting a not-yet-processed speech frame as the current speech frame and jumping to step 2); otherwise jumping to the next step;

an imitated speech output module, for outputting the imitated speech formed by the imitated speech frames.

As shown in Fig. 7, the pitch conversion module comprises:

a pseudo-harmonic speech generation submodule, for building pseudo-harmonic speech from the DFT of the LPC residual signal and the amplitude spectrum using a pseudo-harmonic speech model;

a first pitch period extraction submodule, for extracting the pitch period of the model specified in the sound material library;

a pitch transformation submodule, for performing pitch transformation on the pseudo-harmonic speech according to the pitch period of the model;

As shown in Fig. 8, the timbre conversion module comprises:

a target vocal tract filter extraction submodule, for extracting the vocal tract filter of the model specified in the sound material library;

a target vocal tract filter replacement submodule, for replacing the vocal tract filter of the current speech frame with the vocal tract filter of the model as the result of the timbre conversion.

As shown in Fig. 9, the present embodiment also comprises a sound material library expansion module. Besides outputting the speech signal to the voice imitation unit, which consists of the pitch conversion module, the timbre conversion module, the speech frame synthesis module, the imitated speech frame adding module, the judgment processing module and the imitated speech output module, the input module also outputs it to the sound material library expansion module, which stores the speaker's speech signal in the sound material library as a model.

As shown in Fig. 10, the sound material library expansion module comprises:

a cyclic amplitude sum-of-squares processing submodule, for processing the LPC residual signal with the cyclic amplitude sum-of-squares function;

a model storage submodule, for storing the speaker's pitch period and vocal tract filter in the sound material library as the speech parameters of the model.

As shown in Fig. 11, the input module of the present embodiment uses a hardware device with recording, playback and mobile network functions to collect the speaker's speech, and receives the speaker's selection of a model in the sound material library. The voice imitation unit performs voice conversion on the collected speech according to the selected target and outputs the result; at the same time, the sound material library expansion module extracts the original speaker's speech parameters and saves them in the sound material library, adding the original speaker's model to the library.
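
One way to picture the library and its expansion is a mapping from a model name to the two stored speech parameters. The sketch below is purely illustrative: the class names, the use of a string key, and the in-memory dictionary are all assumptions, since the patent does not describe the storage format.

```python
from dataclasses import dataclass, field


@dataclass
class VoiceModel:
    """Speech parameters stored per model: pitch period and vocal tract filter."""
    pitch_period: int
    vocal_tract: list


@dataclass
class MaterialLibrary:
    """Hypothetical in-memory sound material library keyed by model name."""
    models: dict = field(default_factory=dict)

    def add(self, name, pitch_period, vocal_tract):
        # Expansion step: store the speaker's own parameters as a new model.
        self.models[name] = VoiceModel(pitch_period, list(vocal_tract))

    def get(self, name):
        # Lookup step: fetch the model the speaker selected as the target.
        return self.models[name]
```

Each imitation run would then call `get` for the selected target and `add` for the current speaker, so the library grows as the system is used.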

The above is only a preferred embodiment of the present invention. The scope of protection of the present invention is not limited to the above embodiment; all technical solutions within the concept of the present invention fall within its scope of protection. It should be noted that, for those of ordinary skill in the art, improvements and modifications made without departing from the principle of the present invention shall also be regarded as falling within the scope of protection of the present invention.

Claims

1. A sound imitation method, characterized in that it is implemented in the following steps:

2. The sound imitation method according to claim 1, characterized in that, in step 2), the detailed steps of performing pitch conversion on the current speech frame according to the model specified in the sound material library are as follows:

2.1.1) perform linear prediction analysis on the current speech frame;

3. The sound imitation method according to claim 2, characterized in that, in step 2), the detailed steps of performing timbre conversion on the current speech frame according to the model specified in the sound material library are as follows:

2.2.1) perform linear prediction analysis on the current speech frame;

4. The sound imitation method according to claim 1, 2 or 3, characterized in that, in step 1), the framing and windowing preprocessing of the speech signal is followed by a step of storing the speech signal in the sound material library as a model, with the following detailed steps:

1.1) perform linear prediction analysis on the current speech frame;

5. A sound imitation device, characterized in that it comprises:

6. The sound imitation device according to claim 5, characterized in that the pitch conversion module comprises:

7. The sound imitation device according to claim 6, characterized in that the timbre conversion module comprises:

8. The sound imitation device according to claim 5, 6 or 7, characterized in that the sound imitation device further comprises a sound material library expansion module, which comprises: