CN109616131A - Digital real-time voice changing method - Google Patents
Digital real-time voice changing method
- Publication number
- CN109616131A (application CN201811342131.2A; granted as CN109616131B)
- Authority
- CN
- China
- Prior art keywords
- voice
- sound
- signal
- fundamental tone
- changed
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L21/00—Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
- G10L21/003—Changing voice quality, e.g. pitch or formants
- G10L21/007—Changing voice quality, e.g. pitch or formants characterised by the process used
Abstract
The invention discloses a digital real-time voice changing method. The non-unvoiced part of the original speech is adjusted and analyzed, and according to the comparison result, a signal extracted from a specific-speaker pitch library replaces the original pitch; the voice-changed signal is then obtained through synthesis and superposition. The voice-changing effect of the invention offers high naturalness and intelligibility, the converted voice is difficult to restore, giving strong confidentiality, and the method combines low delay with low computational complexity.
Description
Technical field
The present invention relates to a voice changing method and belongs to the field of audio technology.
Background art
Voice changing is an important speech processing technology, widely used in voice interaction, secure communication, and special sound effects in consumer electronic devices.
Traditional voice changing mainly uses frequency-shifting techniques, which suffer from the following technical deficiencies. First, the naturalness of the converted speech is low and its intelligibility is reduced. Second, the conversion method is simple, so the original voice is easily recovered, undermining secure communication. Finally, the computational complexity is high and the processing delay is large, limiting real-time operation.
Summary of the invention
Goal of the invention: to overcome the deficiencies in the prior art, the present invention provides a digital real-time voice changing method that overcomes the following three problems of current mainstream voice changing methods: 1. low naturalness and intelligibility of the converted voice; 2. the converted voice is easily restored; 3. high delay and high computational complexity in the conversion process.
Technical solution: to achieve the above object, the present invention adopts the following technical solution:
A digital real-time voice changing method, comprising the following steps:
Step 1: distinguish the unvoiced from the non-unvoiced parts of the speech by sound segmentation.
Step 2: decompose the non-unvoiced speech by linear prediction, splitting the original speech into two parts: the original pitch and a vocal-tract model.
Step 3: adjust the original pitch as required, e.g. by changing the fundamental frequency or its rate of change.
Step 4: compare the adjusted pitch with the pitch information in a specific-speaker pitch library and find the pitch signal that best meets the requirement.
Step 5: reconstruct and optimize the pitch information to obtain the corrected pitch signal.
Step 6: synthesize speech from the corrected pitch and the vocal-tract model, forming the converted non-unvoiced signal.
Step 7: integrate the original unvoiced signal with the converted non-unvoiced signal, forming the adjusted voice signal.
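The seven steps can be sketched as a data-flow pipeline. Everything below is a hypothetical stand-in — the patent does not prescribe concrete algorithms — with each helper reduced to a toy operation so the flow of signals (unvoiced part, pitch, vocal-tract model) stays visible:

```python
def split_unvoiced(speech):
    # Step 1 (toy rule): treat low-amplitude samples as the unvoiced part.
    unvoiced = [s for s in speech if abs(s) < 0.1]
    non_unvoiced = [s for s in speech if abs(s) >= 0.1]
    return unvoiced, non_unvoiced

def lpc_decompose(non_unvoiced):
    # Step 2 (stub): split into original pitch (excitation) and a
    # vocal-tract model; here the "model" is a single pass-through gain.
    return non_unvoiced, [1.0]

def adjust_pitch(pitch, factor=1.2):
    # Step 3 (toy rule): change the fundamental by plain scaling.
    return [p * factor for p in pitch]

def match_library(pitch, library):
    # Step 4 (toy rule): pick the library entry closest in mean level.
    mean = sum(pitch) / len(pitch)
    return min(library, key=lambda entry: abs(sum(entry) / len(entry) - mean))

def refine(pitch):
    # Step 5 (toy rule): 3-point moving average for pitch continuity.
    return [sum(pitch[max(0, i - 1):i + 2]) / len(pitch[max(0, i - 1):i + 2])
            for i in range(len(pitch))]

def synthesize(pitch, tract_model):
    # Step 6 (stub): drive the vocal-tract model with the corrected pitch.
    return [p * tract_model[0] for p in pitch]

def integrate(unvoiced, converted):
    # Step 7: recombine the untouched unvoiced part with the converted part.
    return unvoiced + converted

speech = [0.05, 0.5, -0.4, 0.02, 0.3]
library = [[0.2, 0.25], [0.6, 0.55]]
u, v = split_unvoiced(speech)
pitch, tract = lpc_decompose(v)
out = integrate(u, synthesize(refine(match_library(adjust_pitch(pitch), library)), tract))
```

Each stub would be replaced by a real algorithm in an implementation; the point is only the order in which the seven steps hand data to each other.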
Preferably, the specific-speaker pitch library mainly comes from content analyzed and extracted from that speaker's voice, including the pitch signals of the common syllables and words the speaker produces.
Preferably, in step 2 the speech is decomposed by linear prediction into a vocal-tract model and the original pitch, wherein the vocal-tract model parameters are retained for later speech synthesis.
Preferably, the adjusted original pitch is compared with all pitch signals in the specific-speaker pitch library, and the most similar pitch segment is obtained by correlation comparison, pattern matching, or machine learning.
Preferably, the specific-speaker pitch library is stored in a cloud system and accessed through a dedicated real-time retrieval system.
Preferably, the method is implemented on a DSP and ARM system.
Preferably, the DSP performs sound segmentation and linear prediction, extracting the original pitch of the non-unvoiced signal.
Preferably, the DSP synthesizes the adjusted pitch and the vocal-tract model into the non-unvoiced signal, which is further superposed with the original unvoiced part to form the converted voice signal.
Compared with the prior art, the present invention has the following advantages:
1. All pitch information used during conversion is extracted from natural speech rather than produced by directly frequency-shifting the voice, so naturalness and intelligibility are preserved.
2. The pitch information of the converted voice comes entirely from the specific speaker's library and completely eliminates the characteristic information of the original voice signal, so it is difficult for other systems to restore.
3. The computational complexity is low and the processing delay is small; combined with cloud processing, this favors real-time implementation.
Brief description of the drawings
Fig. 1 is a schematic diagram of the voice changing system.
Fig. 2 is a block diagram of the implementation of the invention based on a floating-point DSP and an ARM system.
Specific embodiment
The present invention is further elucidated below with reference to the drawings and specific embodiments. It should be understood that these examples merely illustrate the invention and do not limit its scope; after reading the invention, modifications of various equivalent forms by those skilled in the art fall within the scope defined by the appended claims.
A digital real-time voice changing method, as shown in Figure 1, comprises the following 7 parts:
1. Distinguish the unvoiced from the non-unvoiced sounds (voiced sounds, voiced consonants, fricatives) in the speech by sound segmentation;
2. Decompose the non-unvoiced speech (voiced sounds, voiced consonants, fricatives) by linear prediction, splitting the original speech into the original pitch and a vocal-tract model;
3. Adjust the original pitch as required, e.g. by changing the fundamental frequency or its rate of change;
4. Compare the adjusted pitch with the pitch information in the specific-speaker pitch library and find the pitch signal that best meets the requirement;
5. Reconstruct and optimize the pitch information to obtain the corrected pitch signal;
6. Synthesize speech from the corrected pitch and the vocal-tract model, forming the converted non-unvoiced signal;
7. Integrate the unvoiced signal with the converted non-unvoiced signal, forming the adjusted voice signal.
Sound segmentation distinguishes the unvoiced from the non-unvoiced parts of the speech, where the non-unvoiced parts include voiced sounds, voiced consonants, and fricatives. During synthesis, the system superposes the adjusted non-unvoiced part with the original unvoiced part, forming the new converted voice signal.
The specific-speaker pitch library mainly comes from content analyzed and extracted from that speaker's voice, including the pitch signals of common syllables and words as pronounced. Establishing the pitch library of a specific speaker requires a dedicated training process.
Linear prediction decomposes the speech into a vocal-tract model and the original pitch; the vocal-tract model parameters are retained for later speech synthesis.
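A standard way to realize this decomposition — one common choice, not necessarily the patent's exact procedure — is the autocorrelation method with the Levinson-Durbin recursion: the prediction coefficients play the role of the vocal-tract model, and the prediction error plays the role of the original pitch (excitation):

```python
def autocorr(x, order):
    """Autocorrelation r[0..order] of one analysis frame."""
    return [sum(x[i] * x[i + k] for i in range(len(x) - k))
            for k in range(order + 1)]

def levinson_durbin(r, order):
    """Solve the Yule-Walker equations for LPC coefficients a[1..p],
    where the predictor is x_hat[n] = sum_k a[k] * x[n-k]."""
    a = [0.0] * (order + 1)
    err = r[0]
    for i in range(1, order + 1):
        k = (r[i] - sum(a[j] * r[i - j] for j in range(1, i))) / err
        new_a = a[:]
        new_a[i] = k
        for j in range(1, i):
            new_a[j] = a[j] - k * a[i - j]
        a, err = new_a, err * (1.0 - k * k)
    return a[1:]

def inverse_filter(x, coeffs):
    """Prediction error e[n] = x[n] - x_hat[n]: the excitation left over
    once the vocal-tract resonances have been removed."""
    return [x[n] - sum(c * x[n - 1 - k] for k, c in enumerate(coeffs)
                       if n - 1 - k >= 0)
            for n in range(len(x))]

# Synthetic voiced-like frame: a unit impulse through one damped two-pole
# resonance (a crude stand-in for a single formant).
frame = [1.0, 1.8]
for _ in range(198):
    frame.append(1.8 * frame[-1] - 0.9 * frame[-2])

coeffs = levinson_durbin(autocorr(frame, 2), 2)   # recovers ~[1.8, -0.9]
excitation = inverse_filter(frame, coeffs)        # near-impulse residual
```

The recovered coefficients are what the method retains as the vocal-tract model; the residual is the material the later steps adjust and replace.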
The original pitch is adjusted according to the user's requirements, including adjusting the fundamental frequency and its rate of change.
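A minimal sketch of this adjustment, assuming the pitch is carried as a frame-by-frame F0 contour in Hz (a representation the patent does not mandate): scaling changes the fundamental frequency, while a step limit changes its speed of variation.

```python
def adjust_contour(f0, scale=1.0, max_step=10.0):
    """Scale an F0 contour (Hz) by `scale` and cap the frame-to-frame
    change at `max_step` Hz, adjusting both the fundamental frequency
    and its rate of change."""
    out = []
    for f in f0:
        target = f * scale
        if out and abs(target - out[-1]) > max_step:
            target = out[-1] + max_step * (1.0 if target > out[-1] else -1.0)
        out.append(target)
    return out

# Raise the pitch 10% but keep the jump at frame 2 from exceeding 20 Hz.
adjusted = adjust_contour([200.0, 205.0, 240.0, 210.0], scale=1.1, max_step=20.0)
```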
The adjusted original pitch is compared with all pitch signals in the specific-speaker pitch library by methods such as correlation comparison, pattern matching, or machine learning to obtain the most similar pitch segment, which is then optimized. The main purpose of the optimization is to guarantee the continuity of the pitch and improve the naturalness of the speech, ultimately forming the corrected pitch.
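Of the listed methods, correlation comparison is the simplest to sketch. The contours and syllable names below are invented; mean-removed (Pearson) correlation is used so that contour shape, rather than absolute pitch level, drives the match:

```python
import math

def pearson(a, b):
    """Mean-removed (Pearson) correlation of two equal-length contours."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    da, db = [x - ma for x in a], [y - mb for y in b]
    den = math.sqrt(sum(x * x for x in da) * sum(y * y for y in db))
    return sum(x * y for x, y in zip(da, db)) / den if den else 0.0

def best_match(segment, library):
    """Return the library key whose contour shape correlates best."""
    return max(library, key=lambda k: pearson(segment, library[k][:len(segment)]))

library = {
    "ma":  [210.0, 215.0, 212.0],   # rise then fall
    "ni":  [230.0, 228.0, 225.0],   # falling
    "hao": [195.0, 200.0, 204.0],   # rising
}
query = [208.0, 214.0, 213.0]       # rise then slight fall, shaped like "ma"
```

A production system would search many candidate offsets and durations; a single aligned comparison is shown for clarity.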
The specific-speaker pitch library can be stored in a cloud system and accessed through a dedicated retrieval system, improving the efficiency and utilization of the system.
The corrected pitch and the vocal-tract model are synthesized into the corrected non-unvoiced speech segment.
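When the vocal-tract model is a set of LPC coefficients, this synthesis is the standard all-pole filter — the inverse of the analysis (prediction-error) filter. This is one common realization, not necessarily the patent's:

```python
def synthesize(excitation, coeffs):
    """All-pole LPC synthesis y[n] = e[n] + sum_k a[k] * y[n-k],
    the inverse of the prediction-error (analysis) filter."""
    y = []
    for n, e in enumerate(excitation):
        acc = e
        for k, a in enumerate(coeffs, start=1):
            if n - k >= 0:
                acc += a * y[n - k]
        y.append(acc)
    return y

# Driving the two-pole vocal-tract model [1.8, -0.9] with a unit impulse
# regenerates the damped oscillation that analysis would have removed.
out = synthesize([1.0] + [0.0] * 9, [1.8, -0.9])
```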
The voice changing system adjusts and analyzes the non-unvoiced part of the original speech and, according to the comparison result, replaces the original pitch with the signal extracted from the specific-speaker pitch library; the converted signal is then obtained through synthesis and overlap-add. The specific-speaker pitch library comes from the analysis and extraction of that speaker's voice.
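The final superposition can be sketched as plain overlap-add; hop size and windowing are implementation choices the patent leaves open:

```python
def overlap_add(segments, hop):
    """Sum segments placed at hop-spaced offsets (plain overlap-add)."""
    total = hop * (len(segments) - 1) + len(segments[-1])
    out = [0.0] * total
    for i, seg in enumerate(segments):
        for j, s in enumerate(seg):
            out[i * hop + j] += s
    return out

# Two 4-sample segments at 50% overlap: the middle region sums both.
mixed = overlap_add([[1.0] * 4, [2.0] * 4], hop=2)
```

In practice each segment would first be tapered (e.g. with a Hann window) so the overlapped sums stay artifact-free.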
As shown in Fig. 2, the whole system is implemented on a floating-point DSP and an ARM system:
1. The ARM passes the system's adjustment requirements to the floating-point DSP;
2. Data captured by the microphone is passed through an ADC (analog-to-digital converter) to the floating-point DSP as the system input;
3. The floating-point DSP feeds the signal through a DAC (digital-to-analog converter) to the loudspeaker for playback as the system output;
4. The floating-point DSP performs sound segmentation, linear prediction, and related functions, extracting the original pitch of the non-unvoiced signal;
5. The floating-point DSP adjusts the original pitch and passes the adjusted pitch through the ARM to the cloud;
6. The cloud compares the adjusted pitch with the specific-speaker pitch library, finds the most similar pitch signal, and returns it to the floating-point DSP;
7. The floating-point DSP synthesizes the adjusted pitch and the vocal-tract model into the non-unvoiced signal and further superposes it with the original unvoiced signal, forming the converted voice signal.
The above is only a preferred embodiment of the present invention. It should be pointed out that those of ordinary skill in the art may make various improvements and modifications without departing from the principle of the present invention, and these improvements and modifications should also be regarded as falling within the protection scope of the present invention.
Claims (9)
- 1. A digital real-time voice changing method, characterized by comprising the following steps: Step 1, distinguish the unvoiced from the non-unvoiced parts of the speech by sound segmentation; Step 2, decompose the non-unvoiced speech by linear prediction, splitting the original speech into the original pitch and a vocal-tract model; Step 3, adjust the original pitch as required; Step 4, compare the adjusted pitch with the pitch information in a specific-speaker pitch library and find the pitch signal that best meets the requirement; Step 5, reconstruct and optimize the pitch information to obtain the corrected pitch signal; Step 6, synthesize speech from the corrected pitch and the vocal-tract model, forming the converted non-unvoiced signal; Step 7, integrate the original unvoiced signal with the converted non-unvoiced signal, forming the adjusted voice signal.
- 2. The digital real-time voice changing method according to claim 1, characterized in that: the specific-speaker pitch library mainly comes from content analyzed and extracted from that speaker's voice, including the pitch signals of the common syllables and words the speaker produces.
- 3. The digital real-time voice changing method according to claim 1, characterized in that: in step 2 the speech is decomposed by linear prediction into a vocal-tract model and the original pitch, wherein the vocal-tract model parameters are retained for later speech synthesis.
- 4. The digital real-time voice changing method according to claim 1, characterized in that: the adjusted original pitch is compared with all pitch signals in the specific-speaker pitch library, and the most similar pitch segment is obtained by correlation comparison, pattern matching, or machine learning.
- 5. The digital real-time voice changing method according to claim 1, characterized in that: the specific-speaker pitch library is stored in a cloud system and accessed through a dedicated real-time retrieval system.
- 6. The digital real-time voice changing method according to claim 1, characterized in that: the method is implemented on a DSP and ARM system.
- 7. The digital real-time voice changing method according to claim 6, characterized in that: the DSP performs sound segmentation and linear prediction, extracting the original pitch of the non-unvoiced signal.
- 8. The digital real-time voice changing method according to claim 7, characterized in that: the DSP synthesizes the adjusted pitch and the vocal-tract model into the non-unvoiced signal, which is further superposed with the original unvoiced signal, forming the converted voice signal.
- 9. The digital real-time voice changing method according to claim 1, characterized in that: adjusting the original pitch according to actual needs in step 3 comprises changing the fundamental frequency and/or its rate of change.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201811342131.2A CN109616131B (en) | 2018-11-12 | 2018-11-12 | Digital real-time voice sound changing method |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201811342131.2A CN109616131B (en) | 2018-11-12 | 2018-11-12 | Digital real-time voice sound changing method |
Publications (2)
Publication Number | Publication Date |
---|---|
CN109616131A true CN109616131A (en) | 2019-04-12 |
CN109616131B CN109616131B (en) | 2023-07-07 |
Family
ID=66003036
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201811342131.2A Active CN109616131B (en) | 2018-11-12 | 2018-11-12 | Digital real-time voice sound changing method |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN109616131B (en) |
Citations (13)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN1152776A (en) * | 1995-10-26 | 1997-06-25 | Sony Corp | Method and arrangement for phoneme signal duplicating, decoding and synthesizing |
US20030055646A1 (en) * | 1998-06-15 | 2003-03-20 | Yamaha Corporation | Voice converter with extraction and modification of attribute data |
CN1567428A (en) * | 2003-06-19 | 2005-01-19 | Beijing Zhongke Xinli Technology Co., Ltd. | Phoneme changing method based on digital signal processing |
CN101354889A (en) * | 2008-09-18 | 2009-01-28 | Beijing Vimicro Electronics Co., Ltd. | Method and apparatus for tonal modification of voice |
CN101399044A (en) * | 2007-09-29 | 2009-04-01 | International Business Machines Corp | Voice conversion method and system |
CN101510424A (en) * | 2009-03-12 | 2009-08-19 | Meng Zhiping | Method and system for encoding and synthesizing speech based on speech primitive |
CN102592590A (en) * | 2012-02-21 | 2012-07-18 | South China University of Technology | Arbitrarily adjustable method and device for changing phoneme naturally |
CN102982809A (en) * | 2012-12-11 | 2013-03-20 | University of Science and Technology of China | Conversion method for sound of speaker |
CN103489443A (en) * | 2013-09-17 | 2014-01-01 | Hunan University | Method and device for imitating sound |
CN203386472U (en) * | 2013-04-26 | 2014-01-08 | Tianjin University of Science and Technology | Character voice changer |
CN105023570A (en) * | 2014-04-30 | 2015-11-04 | Anhui USTC iFlytek Information Technology Co., Ltd. | Method and system of transforming speech |
CN107924678A (en) * | 2015-09-16 | 2018-04-17 | Toshiba Corp | Speech synthesis device, speech synthesis method, speech synthesis program, speech synthesis model learning device, speech synthesis model learning method, and speech synthesis model learning program |
CN108682413A (en) * | 2018-04-24 | 2018-10-19 | Shanghai Normal University | An emotion conversion system based on voice conversion |
- 2018-11-12 CN CN201811342131.2A patent/CN109616131B/en active Active
Cited By (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110364177A (en) * | 2019-07-11 | 2019-10-22 | Nubia Technology Co., Ltd. | Method of speech processing, mobile terminal and computer readable storage medium |
CN110364177B (en) * | 2019-07-11 | 2024-07-23 | Nubia Technology Co., Ltd. | Voice processing method, mobile terminal and computer readable storage medium |
CN110942765A (en) * | 2019-11-11 | 2020-03-31 | Gree Electric Appliances, Inc. of Zhuhai | Method, device, server and storage medium for constructing corpus |
CN110942765B (en) * | 2019-11-11 | 2022-05-27 | Gree Electric Appliances, Inc. of Zhuhai | Method, device, server and storage medium for constructing corpus |
CN111739547A (en) * | 2020-07-24 | 2020-10-02 | Shenzhen Shengyang Technology Co., Ltd. | Voice matching method and device, computer equipment and storage medium |
CN113486964A (en) * | 2021-07-13 | 2021-10-08 | Shengjing Intelligent Technology (Jiaxing) Co., Ltd. | Voice activity detection method and device, electronic equipment and storage medium |
Also Published As
Publication number | Publication date |
---|---|
CN109616131B (en) | 2023-07-07 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
Childers et al. | Voice conversion | |
CN109616131A (en) | Digital real-time voice changing method | |
Erro et al. | Harmonics plus noise model based vocoder for statistical parametric speech synthesis | |
Charpentier et al. | Diphone synthesis using an overlap-add technique for speech waveforms concatenation | |
Wali et al. | Generative adversarial networks for speech processing: A review | |
CN102568476B (en) | Voice conversion method based on self-organizing feature map network cluster and radial basis network | |
Qian et al. | A unified trajectory tiling approach to high quality speech rendering | |
CN1815552B (en) | Frequency spectrum modelling and voice reinforcing method based on line spectrum frequency and its interorder differential parameter | |
CN110648684B (en) | Bone conduction voice enhancement waveform generation method based on WaveNet | |
CN111210803A (en) | System and method for training clone timbre and rhythm based on Bottleneck characteristics | |
CN116364096B (en) | Electroencephalogram signal voice decoding method based on generation countermeasure network | |
CN113436606A (en) | Original sound speech translation method | |
Zhang et al. | Susing: Su-net for singing voice synthesis | |
CN116092471A (en) | Multi-style personalized Tibetan language speech synthesis model oriented to low-resource condition | |
Rao | Real time prosody modification | |
Jalin et al. | Text to speech synthesis system for tamil using HMM | |
Gao et al. | Polyscriber: Integrated fine-tuning of extractor and lyrics transcriber for polyphonic music | |
CN102231275B (en) | Embedded speech synthesis method based on weighted mixed excitation | |
Mizuno et al. | Waveform-based speech synthesis approach with a formant frequency modification | |
CN116913244A (en) | Speech synthesis method, equipment and medium | |
Chandra et al. | Towards the development of accent conversion model for (l1) bengali speaker using cycle consistent adversarial network (cyclegan) | |
CN114913844A (en) | Broadcast language identification method for pitch normalization reconstruction | |
Wen et al. | Amplitude Spectrum based Excitation Model for HMM-based Speech Synthesis. | |
Ding | A Systematic Review on the Development of Speech Synthesis | |
CN113744715A (en) | Vocoder speech synthesis method, device, computer equipment and storage medium |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||