CN109616131A - Digital real-time voice changing method - Google Patents
Digital real-time voice changing method
- Publication number
- CN109616131A (application CN201811342131.2A; granted as CN109616131B)
- Authority
- CN
- China
- Prior art keywords
- voice
- sound
- signal
- fundamental tone
- changed
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L21/00—Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
- G10L21/003—Changing voice quality, e.g. pitch or formants
- G10L21/007—Changing voice quality, e.g. pitch or formants characterised by the process used
Abstract
The invention discloses a digital real-time voice changing method. The non-unvoiced part of the original speech is adjusted and analyzed, and according to the comparison result, a signal extracted from a specific-speaker pitch library replaces the original pitch; the voice-changed signal is then obtained through synthesis and superposition. The voice-changing effect of the invention offers high naturalness and intelligibility, the converted voice is difficult to restore, giving strong confidentiality, and the method combines low delay with low computational complexity.
Description
Technical field
The present invention relates to a voice changing method and belongs to the field of audio technology.
Background art
Voice changing is an important speech processing technology, widely used in voice interaction, secure communication, and special sound effects in consumer electronic devices.
Traditional voice changing mainly uses frequency-shifting techniques, which suffer from the following technical deficiencies. First, the naturalness of the converted speech is low and its intelligibility is reduced. Second, the conversion method is simple, so the original voice is easily recovered, undermining secure communication. Finally, the computational complexity is high and the processing delay is large, limiting real-time operation.
Summary of the invention
Goal of the invention: to overcome the deficiencies in the prior art, the present invention provides a digital real-time voice changing method that overcomes the following three problems of current mainstream voice changing methods: 1. low naturalness and intelligibility of the converted voice; 2. the converted voice is easily restored; 3. high delay and high computational complexity in the conversion process.
Technical solution: to achieve the above object, the present invention adopts the following technical solution:
A digital real-time voice changing method, comprising the following steps:
Step 1: distinguish the unvoiced from the non-unvoiced parts of the speech by sound segmentation.
Step 2: decompose the non-unvoiced speech by linear prediction, splitting the original speech into two parts: the original pitch and a vocal-tract model.
Step 3: adjust the original pitch as required, e.g. by changing the fundamental frequency or its rate of change.
Step 4: compare the adjusted pitch with the pitch information in a specific-speaker pitch library and find the pitch signal that best meets the requirement.
Step 5: reconstruct and optimize the pitch information to obtain the corrected pitch signal.
Step 6: synthesize speech from the corrected pitch and the vocal-tract model, forming the converted non-unvoiced signal.
Step 7: integrate the original unvoiced signal with the converted non-unvoiced signal, forming the adjusted voice signal.
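The seven steps can be sketched as a data-flow pipeline. Everything below is a hypothetical stand-in — the patent does not prescribe concrete algorithms — with each helper reduced to a toy operation so the flow of signals (unvoiced part, pitch, vocal-tract model) stays visible:

```python
def split_unvoiced(speech):
    # Step 1 (toy rule): treat low-amplitude samples as the unvoiced part.
    unvoiced = [s for s in speech if abs(s) < 0.1]
    non_unvoiced = [s for s in speech if abs(s) >= 0.1]
    return unvoiced, non_unvoiced

def lpc_decompose(non_unvoiced):
    # Step 2 (stub): split into original pitch (excitation) and a
    # vocal-tract model; here the "model" is a single pass-through gain.
    return non_unvoiced, [1.0]

def adjust_pitch(pitch, factor=1.2):
    # Step 3 (toy rule): change the fundamental by plain scaling.
    return [p * factor for p in pitch]

def match_library(pitch, library):
    # Step 4 (toy rule): pick the library entry closest in mean level.
    mean = sum(pitch) / len(pitch)
    return min(library, key=lambda entry: abs(sum(entry) / len(entry) - mean))

def refine(pitch):
    # Step 5 (toy rule): 3-point moving average for pitch continuity.
    return [sum(pitch[max(0, i - 1):i + 2]) / len(pitch[max(0, i - 1):i + 2])
            for i in range(len(pitch))]

def synthesize(pitch, tract_model):
    # Step 6 (stub): drive the vocal-tract model with the corrected pitch.
    return [p * tract_model[0] for p in pitch]

def integrate(unvoiced, converted):
    # Step 7: recombine the untouched unvoiced part with the converted part.
    return unvoiced + converted

speech = [0.05, 0.5, -0.4, 0.02, 0.3]
library = [[0.2, 0.25], [0.6, 0.55]]
u, v = split_unvoiced(speech)
pitch, tract = lpc_decompose(v)
out = integrate(u, synthesize(refine(match_library(adjust_pitch(pitch), library)), tract))
```

Each stub would be replaced by a real algorithm in an implementation; the point is only the order in which the seven steps hand data to each other.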
Preferably, the specific-speaker pitch library mainly comes from content analyzed and extracted from that speaker's voice, including the pitch signals of the common syllables and words the speaker produces.
Preferably, in step 2 the speech is decomposed by linear prediction into a vocal-tract model and the original pitch, wherein the vocal-tract model parameters are retained for later speech synthesis.
Preferably, the adjusted original pitch is compared with all pitch signals in the specific-speaker pitch library, and the most similar pitch segment is obtained by correlation comparison, pattern matching, or machine learning.
Preferably, the specific-speaker pitch library is stored in a cloud system and accessed through a dedicated real-time retrieval system.
Preferably, the method is implemented on a DSP and ARM system.
Preferably, the DSP performs sound segmentation and linear prediction, extracting the original pitch of the non-unvoiced signal.
Preferably, the DSP synthesizes the adjusted pitch and the vocal-tract model into the non-unvoiced signal, which is further superposed with the original unvoiced part to form the converted voice signal.
Compared with the prior art, the present invention has the following advantages:
1. All pitch information used during conversion is extracted from natural speech rather than produced by directly frequency-shifting the voice, so naturalness and intelligibility are preserved.
2. The pitch information of the converted voice comes entirely from the specific speaker's library and completely eliminates the characteristic information of the original voice signal, so it is difficult for other systems to restore.
3. The computational complexity is low and the processing delay is small; combined with cloud processing, this favors real-time implementation.
Brief description of the drawings
Fig. 1 is a schematic diagram of the voice changing system.
Fig. 2 is a block diagram of the implementation of the invention based on a floating-point DSP and an ARM system.
Specific embodiment
The present invention is further elucidated below with reference to the drawings and specific embodiments. It should be understood that these examples merely illustrate the invention and do not limit its scope; after reading the invention, modifications of various equivalent forms by those skilled in the art fall within the scope defined by the appended claims.
A digital real-time voice changing method, as shown in Figure 1, comprises the following 7 parts:
1. Distinguish the unvoiced from the non-unvoiced sounds (voiced sounds, voiced consonants, fricatives) in the speech by sound segmentation;
2. Decompose the non-unvoiced speech (voiced sounds, voiced consonants, fricatives) by linear prediction, splitting the original speech into the original pitch and a vocal-tract model;
3. Adjust the original pitch as required, e.g. by changing the fundamental frequency or its rate of change;
4. Compare the adjusted pitch with the pitch information in the specific-speaker pitch library and find the pitch signal that best meets the requirement;
5. Reconstruct and optimize the pitch information to obtain the corrected pitch signal;
6. Synthesize speech from the corrected pitch and the vocal-tract model, forming the converted non-unvoiced signal;
7. Integrate the unvoiced signal with the converted non-unvoiced signal, forming the adjusted voice signal.
Sound segmentation distinguishes the unvoiced from the non-unvoiced parts of the speech, where the non-unvoiced parts include voiced sounds, voiced consonants, and fricatives. During synthesis, the system superposes the adjusted non-unvoiced part with the original unvoiced part, forming the new converted voice signal.
The specific-speaker pitch library mainly comes from content analyzed and extracted from that speaker's voice, including the pitch signals of common syllables and words as pronounced. Establishing the pitch library of a specific speaker requires a dedicated training process.
Linear prediction decomposes the speech into a vocal-tract model and the original pitch; the vocal-tract model parameters are retained for later speech synthesis.
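A standard way to realize this decomposition — one common choice, not necessarily the patent's exact procedure — is the autocorrelation method with the Levinson-Durbin recursion: the prediction coefficients play the role of the vocal-tract model, and the prediction error plays the role of the original pitch (excitation):

```python
def autocorr(x, order):
    """Autocorrelation r[0..order] of one analysis frame."""
    return [sum(x[i] * x[i + k] for i in range(len(x) - k))
            for k in range(order + 1)]

def levinson_durbin(r, order):
    """Solve the Yule-Walker equations for LPC coefficients a[1..p],
    where the predictor is x_hat[n] = sum_k a[k] * x[n-k]."""
    a = [0.0] * (order + 1)
    err = r[0]
    for i in range(1, order + 1):
        k = (r[i] - sum(a[j] * r[i - j] for j in range(1, i))) / err
        new_a = a[:]
        new_a[i] = k
        for j in range(1, i):
            new_a[j] = a[j] - k * a[i - j]
        a, err = new_a, err * (1.0 - k * k)
    return a[1:]

def inverse_filter(x, coeffs):
    """Prediction error e[n] = x[n] - x_hat[n]: the excitation left over
    once the vocal-tract resonances have been removed."""
    return [x[n] - sum(c * x[n - 1 - k] for k, c in enumerate(coeffs)
                       if n - 1 - k >= 0)
            for n in range(len(x))]

# Synthetic voiced-like frame: a unit impulse through one damped two-pole
# resonance (a crude stand-in for a single formant).
frame = [1.0, 1.8]
for _ in range(198):
    frame.append(1.8 * frame[-1] - 0.9 * frame[-2])

coeffs = levinson_durbin(autocorr(frame, 2), 2)   # recovers ~[1.8, -0.9]
excitation = inverse_filter(frame, coeffs)        # near-impulse residual
```

The recovered coefficients are what the method retains as the vocal-tract model; the residual is the material the later steps adjust and replace.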
The original pitch is adjusted according to the user's requirements, including adjusting the fundamental frequency and its rate of change.
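A minimal sketch of this adjustment, assuming the pitch is carried as a frame-by-frame F0 contour in Hz (a representation the patent does not mandate): scaling changes the fundamental frequency, while a step limit changes its speed of variation.

```python
def adjust_contour(f0, scale=1.0, max_step=10.0):
    """Scale an F0 contour (Hz) by `scale` and cap the frame-to-frame
    change at `max_step` Hz, adjusting both the fundamental frequency
    and its rate of change."""
    out = []
    for f in f0:
        target = f * scale
        if out and abs(target - out[-1]) > max_step:
            target = out[-1] + max_step * (1.0 if target > out[-1] else -1.0)
        out.append(target)
    return out

# Raise the pitch 10% but keep the jump at frame 2 from exceeding 20 Hz.
adjusted = adjust_contour([200.0, 205.0, 240.0, 210.0], scale=1.1, max_step=20.0)
```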
The adjusted original pitch is compared with all pitch signals in the specific-speaker pitch library by methods such as correlation comparison, pattern matching, or machine learning to obtain the most similar pitch segment, which is then optimized. The main purpose of the optimization is to guarantee the continuity of the pitch and improve the naturalness of the speech, ultimately forming the corrected pitch.
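Of the listed methods, correlation comparison is the simplest to sketch. The contours and syllable names below are invented; mean-removed (Pearson) correlation is used so that contour shape, rather than absolute pitch level, drives the match:

```python
import math

def pearson(a, b):
    """Mean-removed (Pearson) correlation of two equal-length contours."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    da, db = [x - ma for x in a], [y - mb for y in b]
    den = math.sqrt(sum(x * x for x in da) * sum(y * y for y in db))
    return sum(x * y for x, y in zip(da, db)) / den if den else 0.0

def best_match(segment, library):
    """Return the library key whose contour shape correlates best."""
    return max(library, key=lambda k: pearson(segment, library[k][:len(segment)]))

library = {
    "ma":  [210.0, 215.0, 212.0],   # rise then fall
    "ni":  [230.0, 228.0, 225.0],   # falling
    "hao": [195.0, 200.0, 204.0],   # rising
}
query = [208.0, 214.0, 213.0]       # rise then slight fall, shaped like "ma"
```

A production system would search many candidate offsets and durations; a single aligned comparison is shown for clarity.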
The specific-speaker pitch library can be stored in a cloud system and accessed through a dedicated retrieval system, improving the efficiency and utilization of the system.
The corrected pitch and the vocal-tract model are synthesized into the corrected non-unvoiced speech segment.
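When the vocal-tract model is a set of LPC coefficients, this synthesis is the standard all-pole filter — the inverse of the analysis (prediction-error) filter. This is one common realization, not necessarily the patent's:

```python
def synthesize(excitation, coeffs):
    """All-pole LPC synthesis y[n] = e[n] + sum_k a[k] * y[n-k],
    the inverse of the prediction-error (analysis) filter."""
    y = []
    for n, e in enumerate(excitation):
        acc = e
        for k, a in enumerate(coeffs, start=1):
            if n - k >= 0:
                acc += a * y[n - k]
        y.append(acc)
    return y

# Driving the two-pole vocal-tract model [1.8, -0.9] with a unit impulse
# regenerates the damped oscillation that analysis would have removed.
out = synthesize([1.0] + [0.0] * 9, [1.8, -0.9])
```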
The voice changing system adjusts and analyzes the non-unvoiced part of the original speech and, according to the comparison result, replaces the original pitch with the signal extracted from the specific-speaker pitch library; the converted signal is then obtained through synthesis and overlap-add. The specific-speaker pitch library comes from the analysis and extraction of that speaker's voice.
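The final superposition can be sketched as plain overlap-add; hop size and windowing are implementation choices the patent leaves open:

```python
def overlap_add(segments, hop):
    """Sum segments placed at hop-spaced offsets (plain overlap-add)."""
    total = hop * (len(segments) - 1) + len(segments[-1])
    out = [0.0] * total
    for i, seg in enumerate(segments):
        for j, s in enumerate(seg):
            out[i * hop + j] += s
    return out

# Two 4-sample segments at 50% overlap: the middle region sums both.
mixed = overlap_add([[1.0] * 4, [2.0] * 4], hop=2)
```

In practice each segment would first be tapered (e.g. with a Hann window) so the overlapped sums stay artifact-free.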
As shown in Fig. 2, the whole system is implemented on a floating-point DSP and an ARM system:
1. The ARM passes the system's adjustment requirements to the floating-point DSP;
2. Data captured by the microphone is passed through an ADC (analog-to-digital converter) to the floating-point DSP as the system input;
3. The floating-point DSP feeds the signal through a DAC (digital-to-analog converter) to the loudspeaker for playback as the system output;
4. The floating-point DSP performs sound segmentation, linear prediction, and related functions, extracting the original pitch of the non-unvoiced signal;
5. The floating-point DSP adjusts the original pitch and passes the adjusted pitch through the ARM to the cloud;
6. The cloud compares the adjusted pitch with the specific-speaker pitch library, finds the most similar pitch signal, and returns it to the floating-point DSP;
7. The floating-point DSP synthesizes the adjusted pitch and the vocal-tract model into the non-unvoiced signal and further superposes it with the original unvoiced signal, forming the converted voice signal.
The above is only a preferred embodiment of the present invention. It should be pointed out that those of ordinary skill in the art may make various improvements and modifications without departing from the principle of the present invention, and these improvements and modifications should also be regarded as falling within the protection scope of the present invention.
Claims (9)
- 1. A digital real-time voice changing method, characterized by comprising the following steps: Step 1, distinguish the unvoiced from the non-unvoiced parts of the speech by sound segmentation; Step 2, decompose the non-unvoiced speech by linear prediction, splitting the original speech into the original pitch and a vocal-tract model; Step 3, adjust the original pitch as required; Step 4, compare the adjusted pitch with the pitch information in a specific-speaker pitch library and find the pitch signal that best meets the requirement; Step 5, reconstruct and optimize the pitch information to obtain the corrected pitch signal; Step 6, synthesize speech from the corrected pitch and the vocal-tract model, forming the converted non-unvoiced signal; Step 7, integrate the original unvoiced signal with the converted non-unvoiced signal, forming the adjusted voice signal.
- 2. The digital real-time voice changing method according to claim 1, characterized in that: the specific-speaker pitch library mainly comes from content analyzed and extracted from that speaker's voice, including the pitch signals of the common syllables and words the speaker produces.
- 3. The digital real-time voice changing method according to claim 1, characterized in that: in step 2 the speech is decomposed by linear prediction into a vocal-tract model and the original pitch, wherein the vocal-tract model parameters are retained for later speech synthesis.
- 4. The digital real-time voice changing method according to claim 1, characterized in that: the adjusted original pitch is compared with all pitch signals in the specific-speaker pitch library, and the most similar pitch segment is obtained by correlation comparison, pattern matching, or machine learning.
- 5. The digital real-time voice changing method according to claim 1, characterized in that: the specific-speaker pitch library is stored in a cloud system and accessed through a dedicated real-time retrieval system.
- 6. The digital real-time voice changing method according to claim 1, characterized in that: the method is implemented on a DSP and ARM system.
- 7. The digital real-time voice changing method according to claim 6, characterized in that: the DSP performs sound segmentation and linear prediction, extracting the original pitch of the non-unvoiced signal.
- 8. The digital real-time voice changing method according to claim 7, characterized in that: the DSP synthesizes the adjusted pitch and the vocal-tract model into the non-unvoiced signal, which is further superposed with the original unvoiced signal, forming the converted voice signal.
- 9. The digital real-time voice changing method according to claim 1, characterized in that: adjusting the original pitch according to actual needs in step 3 comprises changing the fundamental frequency and/or its rate of change.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201811342131.2A CN109616131B (en) | 2018-11-12 | 2018-11-12 | Digital real-time voice sound changing method |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201811342131.2A CN109616131B (en) | 2018-11-12 | 2018-11-12 | Digital real-time voice sound changing method |
Publications (2)
Publication Number | Publication Date |
---|---|
CN109616131A true CN109616131A (en) | 2019-04-12 |
CN109616131B CN109616131B (en) | 2023-07-07 |
Family
ID=66003036
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201811342131.2A Active CN109616131B (en) | 2018-11-12 | 2018-11-12 | Digital real-time voice sound changing method |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN109616131B (en) |
Citations (13)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN1152776A (en) * | 1995-10-26 | 1997-06-25 | Sony Corp | Method and arrangement for phoneme signal duplicating, decoding and synthesizing |
US20030055646A1 (en) * | 1998-06-15 | 2003-03-20 | Yamaha Corporation | Voice converter with extraction and modification of attribute data |
CN1567428A (en) * | 2003-06-19 | 2005-01-19 | Beijing Zhongke Xinli Technology Co., Ltd. | Phoneme changing method based on digital signal processing |
CN101354889A (en) * | 2008-09-18 | 2009-01-28 | Beijing Vimicro Electronics Co., Ltd. | Method and apparatus for tonal modification of voice |
CN101399044A (en) * | 2007-09-29 | 2009-04-01 | International Business Machines Corp | Voice conversion method and system |
CN101510424A (en) * | 2009-03-12 | 2009-08-19 | Meng Zhiping | Method and system for encoding and synthesizing speech based on speech primitive |
CN102592590A (en) * | 2012-02-21 | 2012-07-18 | South China University of Technology | Arbitrarily adjustable method and device for changing phoneme naturally |
CN102982809A (en) * | 2012-12-11 | 2013-03-20 | University of Science and Technology of China | Conversion method for sound of speaker |
CN103489443A (en) * | 2013-09-17 | 2014-01-01 | Hunan University | Method and device for imitating sound |
CN203386472U (en) * | 2013-04-26 | 2014-01-08 | Tianjin University of Science and Technology | Character voice changer |
CN105023570A (en) * | 2014-04-30 | 2015-11-04 | Anhui USTC iFlytek Information Technology Co., Ltd. | Method and system of transforming speech |
CN107924678A (en) * | 2015-09-16 | 2018-04-17 | Toshiba Corp | Speech synthesis device, speech synthesis method, speech synthesis program, speech synthesis model learning device, speech synthesis model learning method, and speech synthesis model learning program |
CN108682413A (en) * | 2018-04-24 | 2018-10-19 | Shanghai Normal University | An emotion conversion system based on voice conversion |
- 2018-11-12 CN CN201811342131.2A patent/CN109616131B/en active Active
Cited By (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110364177A (en) * | 2019-07-11 | 2019-10-22 | Nubia Technology Co., Ltd. | Method of speech processing, mobile terminal and computer readable storage medium |
CN110364177B (en) * | 2019-07-11 | 2024-07-23 | Nubia Technology Co., Ltd. | Voice processing method, mobile terminal and computer readable storage medium |
CN110942765A (en) * | 2019-11-11 | 2020-03-31 | Gree Electric Appliances, Inc. of Zhuhai | Method, device, server and storage medium for constructing corpus |
CN110942765B (en) * | 2019-11-11 | 2022-05-27 | Gree Electric Appliances, Inc. of Zhuhai | Method, device, server and storage medium for constructing corpus |
CN111739547A (en) * | 2020-07-24 | 2020-10-02 | Shenzhen Shengyang Technology Co., Ltd. | Voice matching method and device, computer equipment and storage medium |
CN113486964A (en) * | 2021-07-13 | 2021-10-08 | Shengjing Intelligent Technology (Jiaxing) Co., Ltd. | Voice activity detection method and device, electronic equipment and storage medium |
Also Published As
Publication number | Publication date |
---|---|
CN109616131B (en) | 2023-07-07 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
Childers et al. | Voice conversion | |
CN109616131A (en) | Digital real-time voice changing method | |
Erro et al. | Harmonics plus noise model based vocoder for statistical parametric speech synthesis | |
Charpentier et al. | Diphone synthesis using an overlap-add technique for speech waveforms concatenation | |
Wali et al. | Generative adversarial networks for speech processing: A review | |
CN102568476B (en) | Voice conversion method based on self-organizing feature map network cluster and radial basis network | |
Qian et al. | A unified trajectory tiling approach to high quality speech rendering | |
CN1815552B (en) | Frequency spectrum modelling and voice reinforcing method based on line spectrum frequency and its interorder differential parameter | |
CN110648684B (en) | Bone conduction voice enhancement waveform generation method based on WaveNet | |
CN111210803A (en) | System and method for training clone timbre and rhythm based on Bottleneck characteristics | |
CN116364096B (en) | Electroencephalogram signal voice decoding method based on generation countermeasure network | |
CN113436606A (en) | Original sound speech translation method | |
Zhang et al. | Susing: Su-net for singing voice synthesis | |
CN116092471A (en) | Multi-style personalized Tibetan language speech synthesis model oriented to low-resource condition | |
Rao | Real time prosody modification | |
Jalin et al. | Text to speech synthesis system for tamil using HMM | |
Gao et al. | Polyscriber: Integrated fine-tuning of extractor and lyrics transcriber for polyphonic music | |
CN102231275B (en) | Embedded speech synthesis method based on weighted mixed excitation | |
Mizuno et al. | Waveform-based speech synthesis approach with a formant frequency modification | |
CN116913244A (en) | Speech synthesis method, equipment and medium | |
Chandra et al. | Towards the development of accent conversion model for (l1) bengali speaker using cycle consistent adversarial network (cyclegan) | |
CN114913844A (en) | Broadcast language identification method for pitch normalization reconstruction | |
Wen et al. | Amplitude Spectrum based Excitation Model for HMM-based Speech Synthesis. | |
Ding | A Systematic Review on the Development of Speech Synthesis | |
CN113744715A (en) | Vocoder speech synthesis method, device, computer equipment and storage medium |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||