CN109616131A - A digital real-time voice changing method - Google Patents

A digital real-time voice changing method

Info

Publication number
CN109616131A
CN109616131A
Authority
CN
China
Prior art keywords
voice
sound
signal
fundamental tone
changed
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201811342131.2A
Other languages
Chinese (zh)
Other versions
CN109616131B (en)
Inventor
陈锴
刘晓峻
狄敏
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Jiangsu Nanda Electronic Information Technology Co Ltd
Nanjing Nanda Electronics Intelligent Service Robot Research Institute Co Ltd
Nanjing University
Original Assignee
Jiangsu Nanda Electronic Information Technology Co Ltd
Nanjing Nanda Electronics Intelligent Service Robot Research Institute Co Ltd
Nanjing University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Jiangsu Nanda Electronic Information Technology Co Ltd, Nanjing Nanda Electronics Intelligent Service Robot Research Institute Co Ltd, Nanjing University filed Critical Jiangsu Nanda Electronic Information Technology Co Ltd
Priority to CN201811342131.2A priority Critical patent/CN109616131B/en
Publication of CN109616131A publication Critical patent/CN109616131A/en
Application granted granted Critical
Publication of CN109616131B publication Critical patent/CN109616131B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Classifications

    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L: SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00: Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/003: Changing voice quality, e.g. pitch or formants
    • G10L21/007: Changing voice quality, e.g. pitch or formants, characterised by the process used

Landscapes

  • Engineering & Computer Science (AREA)
  • Quality & Reliability (AREA)
  • Computational Linguistics (AREA)
  • Signal Processing (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Auxiliary Devices For Music (AREA)
  • Electrically Operated Instructional Devices (AREA)

Abstract

The invention discloses a digital real-time voice changing method. The non-unvoiced part of the original speech is adjusted and analyzed and, according to the comparison result, a signal extracted from a particular speaker's pitch library replaces the original pitch; the voice-changed signal is then obtained by synthesis and superposition processing. The voice-changing effect has high naturalness and intelligibility, the changed voice is difficult to restore, giving strong confidentiality, and the invention combines low latency with low computational complexity.

Description

A digital real-time voice changing method
Technical field
The present invention relates to a voice changing method and belongs to the field of audio technology.
Background technique
Voice changing is an important speech processing technology, widely used in voice interaction, secure communication, special sound effects in consumer electronic devices, and so on.
Traditional voice changing mainly uses frequency modulation techniques, which suffer from the following technical deficiencies. First, the naturalness of the changed speech is low and its intelligibility is reduced. Second, the voice changing method is simple, so the original voice can easily be recovered, undermining secure communication. Finally, the computational complexity is high and the processing delay is large, limiting real-time use.
Summary of the invention
Purpose of the invention: to overcome the deficiencies of the prior art, the present invention provides a real-time digital voice changing method that overcomes three problems of current mainstream voice changing methods: (1) low naturalness and intelligibility of the voice-changing effect; (2) the changed voice is easily restored; (3) high latency and high computational complexity in the voice changing process.
Technical solution: to achieve the above object, the technical solution adopted by the present invention is as follows.
A digital real-time voice changing method, comprising the following steps:
Step 1, distinguish the unvoiced parts of the speech from the non-unvoiced parts by sound segmentation.
Step 2, decompose the non-unvoiced speech by linear prediction, dividing the original speech into two parts: the original pitch and the vocal-tract model.
Step 3, adjust the original pitch according to actual needs, which may include changing the fundamental frequency, changing the rate of change of the fundamental frequency, and so on.
Step 4, compare the adjusted pitch with the pitch information in a particular speaker's pitch library and find the pitch signal that best meets the requirement.
Step 5, reconstruct and optimize the pitch information to obtain the revised pitch signal.
Step 6, synthesize speech from the revised pitch and the vocal-tract model, forming the voice-changed non-unvoiced signal.
Step 7, integrate the original unvoiced signal with the non-unvoiced signal to form the adjusted voice signal.
Preferably, the particular speaker's pitch library mainly comes from content analyzed and extracted from the speaker's voice, including the pitch signals of the common syllables and words the speaker pronounces.
Preferably, in step 2 the speech is decomposed by linear prediction into two parts, the vocal-tract model and the original pitch, and the vocal-tract model parameters are retained for later speech synthesis.
Preferably, the adjusted original pitch is compared with all pitch signals in the particular speaker's pitch library, and the most similar pitch segment is obtained by correlation comparison, pattern matching, or machine learning.
Preferably, the particular speaker's pitch library is stored in a cloud system and accessed through a dedicated real-time retrieval system.
Preferably, the method is implemented on a DSP and ARM system.
Preferably, the DSP performs sound segmentation and linear prediction and extracts the original pitch of the non-unvoiced signal.
Preferably, the DSP synthesizes the adjusted pitch and the vocal-tract model into a non-unvoiced signal, which is further superposed with the original unvoiced signal to form the voice-changed voice signal.
Compared with the prior art, the present invention has the following beneficial effects:
1. All pitch information used during voice changing is pitch extracted from natural speech rather than produced by directly frequency-shifting the voice, so naturalness and intelligibility are guaranteed.
2. The pitch information of the changed voice comes entirely from the particular speaker's library and completely removes the characteristic information of the original voice signal, so the original voice is difficult for other systems to restore.
3. The computational complexity of the voice changing is low and the processing delay is small; combined with cloud processing, this facilitates a real-time implementation.
Detailed description of the invention
Fig. 1 is a schematic diagram of the voice changing system.
Fig. 2 is a block diagram of an implementation of the present invention based on a floating-point DSP and an ARM system.
Specific embodiment
The present invention is further elucidated below with reference to the drawings and specific embodiments. It should be understood that these examples merely illustrate the invention and do not limit its scope; after reading the present disclosure, modifications of various equivalent forms made by those skilled in the art fall within the scope defined by the appended claims.
A digital real-time voice changing method, as shown in Fig. 1, includes the following 7 parts:
1. Distinguish the unvoiced parts of the speech from the non-unvoiced parts (voiced sounds, voiced consonants, voiced fricatives) by sound segmentation;
2. Decompose the non-unvoiced speech (voiced sounds, voiced consonants, voiced fricatives) by linear prediction, dividing the original speech into two parts: the original pitch and the vocal-tract model;
3. Adjust the original pitch according to actual needs, e.g. changing the fundamental frequency or the rate of change of the fundamental frequency;
4. Compare the adjusted pitch with the pitch information in a particular speaker's pitch library and find the pitch signal that best meets the requirement;
5. Reconstruct and optimize the pitch information to obtain the revised pitch signal;
6. Synthesize speech from the revised pitch and the vocal-tract model, forming the voice-changed non-unvoiced signal;
7. Integrate the unvoiced signal with the voice-changed non-unvoiced signal to form the adjusted voice signal.
The sound segmentation distinguishes the unvoiced from the non-unvoiced parts of the speech, where the non-unvoiced parts include voiced sounds, voiced consonants, and voiced fricatives. During synthesis, the system superposes the adjusted non-unvoiced speech with the original unvoiced speech to form the new voice-changed voice signal.
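The patent does not specify how the segmentation decides which frames are unvoiced. A minimal sketch of one common heuristic, frame-wise short-time energy plus zero-crossing rate (the function name and thresholds are illustrative assumptions, not taken from the patent):

```python
import numpy as np

def classify_frames(x, frame_len=320, zcr_thresh=0.25, energy_thresh=0.01):
    """Label each frame True if unvoiced, False if non-unvoiced.

    Heuristic: unvoiced speech tends to have a high zero-crossing
    rate and low energy; voiced material (the patent's 'non-unvoiced'
    parts) is the opposite. Thresholds here are illustrative only.
    """
    n_frames = len(x) // frame_len
    labels = []
    for i in range(n_frames):
        frame = x[i * frame_len:(i + 1) * frame_len]
        energy = np.mean(frame ** 2)
        # fraction of consecutive samples whose sign differs
        zcr = np.mean(np.abs(np.diff(np.signbit(frame).astype(int))))
        labels.append(bool(zcr > zcr_thresh and energy < energy_thresh))
    return labels
```

A 100 Hz tone (strongly voiced) and low-level noise (unvoiced-like) separate cleanly under this rule; a production system would also need hangover smoothing across frame boundaries.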
The particular speaker's pitch library mainly comes from content analyzed and extracted from the speaker's voice, including the pitch signals of common syllables and words as pronounced. Establishing the library for a particular speaker requires a specific training process.
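The patent requires a training pass to populate the speaker's pitch library but names no estimator. One standard choice is autocorrelation-based F0 estimation per voiced frame; a sketch under that assumption (function name and search band are illustrative):

```python
import numpy as np

def estimate_f0(frame, fs, fmin=60.0, fmax=400.0):
    """Estimate the fundamental frequency of one voiced frame by
    picking the autocorrelation peak within a plausible pitch-lag
    range. Suitable for labeling library entries per syllable/word."""
    frame = frame - frame.mean()
    ac = np.correlate(frame, frame, mode="full")[len(frame) - 1:]
    lo, hi = int(fs / fmax), int(fs / fmin)   # lag bounds from F0 bounds
    lag = lo + int(np.argmax(ac[lo:hi]))
    return fs / lag
```

Entries in the library could then be stored as (syllable, F0 contour) pairs; the sketch returns a single per-frame value.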
The speech is decomposed by linear prediction into two parts, the vocal-tract model and the original pitch; the vocal-tract model parameters are retained for later speech synthesis.
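This decomposition is the classical source-filter split: the LPC coefficients model the vocal tract, and the inverse-filtered residual carries the excitation (the "original pitch" in the patent's terms). A minimal sketch using the autocorrelation method (function names are mine, not the patent's):

```python
import numpy as np
from scipy.linalg import solve_toeplitz
from scipy.signal import lfilter

def lpc_decompose(frame, order=12):
    """Split a frame into LPC coefficients (vocal-tract model) and an
    excitation residual, via the autocorrelation normal equations."""
    r = np.correlate(frame, frame, mode="full")[len(frame) - 1:]
    a = solve_toeplitz(r[:order], r[1:order + 1])  # R a = r
    coeffs = np.concatenate(([1.0], -a))           # A(z) = 1 - sum a_k z^-k
    residual = lfilter(coeffs, [1.0], frame)       # inverse filtering
    return coeffs, residual

def lpc_synthesize(coeffs, excitation):
    """Recombine a (possibly modified) excitation with the retained
    vocal-tract model: all-pole filtering by 1/A(z)."""
    return lfilter([1.0], coeffs, excitation)
```

Passing the unmodified residual back through `lpc_synthesize` reconstructs the frame exactly, which is what lets the pitch part be swapped while the vocal-tract parameters are retained.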
According to the user's requirements, the original pitch is adjusted, including adjusting the fundamental frequency, the rate of change of the fundamental frequency, and so on.
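The patent names the adjustment but not an algorithm. One simple stand-in is to scale the fundamental frequency of the excitation by resampling it (this choice, and the padding policy, are assumptions for illustration only):

```python
import numpy as np
from scipy.signal import resample

def shift_pitch(excitation, factor):
    """Scale the fundamental frequency of an excitation by `factor`
    (>1 raises pitch) via FFT-based resampling, then pad/trim back to
    the original length so later superposition stays aligned."""
    n_out = int(round(len(excitation) / factor))
    shifted = resample(excitation, n_out)
    if len(shifted) < len(excitation):
        shifted = np.pad(shifted, (0, len(excitation) - len(shifted)))
    return shifted[:len(excitation)]
```

Note this also time-compresses the excitation; a full system would restore duration (e.g. with overlap-add), which is omitted here for brevity.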
The adjusted original pitch is compared with all pitch signals in the particular speaker's pitch library; by methods such as correlation comparison, pattern matching, and machine learning, the most similar pitch segment is obtained and then optimized. The main purpose of the optimization is to guarantee the continuity of the pitch and improve the naturalness of the speech, finally forming the revised pitch.
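Of the comparison methods the patent lists, correlation comparison is the simplest to sketch: score each library segment by normalized cross-correlation at zero lag and keep the best (equal-length segments and the function names are assumptions of this sketch):

```python
import numpy as np

def best_library_match(query, library):
    """Return (index, score) of the library pitch segment most similar
    to the adjusted pitch, by normalized cross-correlation."""
    def ncc(a, b):
        a = a - a.mean()
        b = b - b.mean()
        denom = np.linalg.norm(a) * np.linalg.norm(b)
        return float(np.dot(a, b) / denom) if denom else 0.0
    scores = [ncc(query, seg) for seg in library]
    return int(np.argmax(scores)), max(scores)
```

The continuity optimization the patent describes would then smooth the junctions between consecutive matched segments before synthesis.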
The particular speaker's pitch library can be stored in a cloud system and used together with a dedicated retrieval system, improving the efficiency and utilization of the system.
The revised pitch and the vocal-tract model are synthesized to form the revised non-unvoiced speech segment.
The voice changing system adjusts and analyzes the non-unvoiced part of the original speech and, according to the comparison result, replaces the original pitch with a signal extracted from the particular speaker's pitch library; the voice-changed signal is then obtained by synthesis and superposition. The particular speaker's pitch library comes from the analysis and extraction of that speaker's speech.
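Step 7's final superposition can be sketched as a frame-wise splice: copy the synthesized non-unvoiced frames into place and leave the unvoiced frames untouched (a hard splice for illustration; a real system would cross-fade at frame boundaries, and the function name is an assumption):

```python
import numpy as np

def integrate(original, processed, unvoiced_labels, frame_len=320):
    """Combine the untouched unvoiced frames of `original` with the
    voice-changed non-unvoiced frames of `processed`, per the labels
    produced by the segmentation step."""
    out = original.copy()
    for i, is_unvoiced in enumerate(unvoiced_labels):
        if not is_unvoiced:
            s = i * frame_len
            out[s:s + frame_len] = processed[s:s + frame_len]
    return out
```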
As shown in Fig. 2, the whole system is implemented on a floating-point DSP and an ARM system:
1. The ARM passes the system's adjustment requirements to the floating-point DSP;
2. Microphone data are passed to the floating-point DSP through an ADC (analog-to-digital converter) as the system input;
3. The floating-point DSP feeds the signal to a loudspeaker for playback through a DAC (digital-to-analog converter) as the system output;
4. The floating-point DSP performs functions such as sound segmentation and linear prediction, extracting the original pitch of the non-unvoiced signal;
5. The floating-point DSP adjusts the original pitch, and the adjusted pitch is passed to the cloud through the ARM;
6. The cloud compares the adjusted original pitch with the particular speaker's pitch library, finds the most similar pitch signal, and returns it to the floating-point DSP;
7. The floating-point DSP synthesizes the adjusted pitch and the vocal-tract model into a non-unvoiced signal, which is further superposed with the original unvoiced signal to form the voice-changed voice signal.
The above is only a preferred embodiment of the present invention. It should be pointed out that, for those of ordinary skill in the art, various improvements and modifications can be made without departing from the principle of the present invention, and these improvements and modifications should also be regarded as falling within the protection scope of the present invention.

Claims (9)

  1. A digital real-time voice changing method, characterized by comprising the following steps:
    Step 1, distinguish the unvoiced parts of the speech from the non-unvoiced parts by sound segmentation;
    Step 2, decompose the non-unvoiced speech by linear prediction, dividing the original speech into two parts: the original pitch and the vocal-tract model;
    Step 3, adjust the original pitch according to actual needs;
    Step 4, compare the adjusted pitch with the pitch information in a particular speaker's pitch library and find the pitch signal that best meets the requirement;
    Step 5, reconstruct and optimize the pitch information to obtain the revised pitch signal;
    Step 6, synthesize speech from the revised pitch and the vocal-tract model, forming the voice-changed non-unvoiced signal;
    Step 7, integrate the original unvoiced signal with the non-unvoiced signal to form the adjusted voice signal.
  2. The digital real-time voice changing method according to claim 1, characterized in that the particular speaker's pitch library mainly comes from content analyzed and extracted from the speaker's voice, including the pitch signals of the common syllables and words the speaker pronounces.
  3. The digital real-time voice changing method according to claim 1, characterized in that in step 2 the speech is decomposed by linear prediction into two parts, the vocal-tract model and the original pitch, and the vocal-tract model parameters are retained for later speech synthesis.
  4. The digital real-time voice changing method according to claim 1, characterized in that the adjusted original pitch is compared with all pitch signals in the particular speaker's pitch library, and the most similar pitch segment is obtained by correlation comparison, pattern matching, or machine learning.
  5. The digital real-time voice changing method according to claim 1, characterized in that the particular speaker's pitch library is stored in a cloud system and accessed through a dedicated real-time retrieval system.
  6. The digital real-time voice changing method according to claim 1, characterized in that the method is implemented on a DSP and ARM system.
  7. The digital real-time voice changing method according to claim 6, characterized in that the DSP performs sound segmentation and linear prediction and extracts the original pitch of the non-unvoiced signal.
  8. The digital real-time voice changing method according to claim 7, characterized in that the DSP synthesizes the adjusted pitch and the vocal-tract model into a non-unvoiced signal, which is further superposed with the original unvoiced signal to form the voice-changed voice signal.
  9. The digital real-time voice changing method according to claim 1, characterized in that adjusting the original pitch according to actual needs in step 3 comprises changing the fundamental frequency and/or the rate of change of the fundamental frequency.
CN201811342131.2A 2018-11-12 2018-11-12 Digital real-time voice sound changing method Active CN109616131B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811342131.2A CN109616131B (en) 2018-11-12 2018-11-12 Digital real-time voice sound changing method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201811342131.2A CN109616131B (en) 2018-11-12 2018-11-12 Digital real-time voice sound changing method

Publications (2)

Publication Number Publication Date
CN109616131A true CN109616131A (en) 2019-04-12
CN109616131B CN109616131B (en) 2023-07-07

Family

ID=66003036

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811342131.2A Active CN109616131B (en) 2018-11-12 2018-11-12 Digital real-time voice sound changing method

Country Status (1)

Country Link
CN (1) CN109616131B (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110364177A (en) * 2019-07-11 2019-10-22 努比亚技术有限公司 Method of speech processing, mobile terminal and computer readable storage medium
CN110942765A (en) * 2019-11-11 2020-03-31 珠海格力电器股份有限公司 Method, device, server and storage medium for constructing corpus
CN111739547A (en) * 2020-07-24 2020-10-02 深圳市声扬科技有限公司 Voice matching method and device, computer equipment and storage medium
CN113486964A (en) * 2021-07-13 2021-10-08 盛景智能科技(嘉兴)有限公司 Voice activity detection method and device, electronic equipment and storage medium

Citations (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1152776A (en) * 1995-10-26 1997-06-25 索尼公司 Method and arrangement for phoneme signal duplicating, decoding and synthesizing
US20030055646A1 (en) * 1998-06-15 2003-03-20 Yamaha Corporation Voice converter with extraction and modification of attribute data
CN1567428A (en) * 2003-06-19 2005-01-19 北京中科信利技术有限公司 Phoneme changing method based on digital signal processing
CN101354889A (en) * 2008-09-18 2009-01-28 北京中星微电子有限公司 Method and apparatus for tonal modification of voice
CN101399044A (en) * 2007-09-29 2009-04-01 国际商业机器公司 Voice conversion method and system
CN101510424A (en) * 2009-03-12 2009-08-19 孟智平 Method and system for encoding and synthesizing speech based on speech primitive
CN102592590A (en) * 2012-02-21 2012-07-18 华南理工大学 Arbitrarily adjustable method and device for changing phoneme naturally
CN102982809A (en) * 2012-12-11 2013-03-20 中国科学技术大学 Conversion method for sound of speaker
CN103489443A (en) * 2013-09-17 2014-01-01 湖南大学 Method and device for imitating sound
CN203386472U (en) * 2013-04-26 2014-01-08 天津科技大学 Character voice changer
CN105023570A (en) * 2014-04-30 2015-11-04 安徽科大讯飞信息科技股份有限公司 method and system of transforming speech
CN107924678A (en) * 2015-09-16 2018-04-17 株式会社东芝 Speech synthetic device, phoneme synthesizing method, voice operation program, phonetic synthesis model learning device, phonetic synthesis model learning method and phonetic synthesis model learning program
CN108682413A (en) * 2018-04-24 2018-10-19 上海师范大学 A kind of emotion direct system based on voice conversion

Patent Citations (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1152776A (en) * 1995-10-26 1997-06-25 索尼公司 Method and arrangement for phoneme signal duplicating, decoding and synthesizing
US20030055646A1 (en) * 1998-06-15 2003-03-20 Yamaha Corporation Voice converter with extraction and modification of attribute data
CN1567428A (en) * 2003-06-19 2005-01-19 北京中科信利技术有限公司 Phoneme changing method based on digital signal processing
CN101399044A (en) * 2007-09-29 2009-04-01 国际商业机器公司 Voice conversion method and system
CN101354889A (en) * 2008-09-18 2009-01-28 北京中星微电子有限公司 Method and apparatus for tonal modification of voice
CN101510424A (en) * 2009-03-12 2009-08-19 孟智平 Method and system for encoding and synthesizing speech based on speech primitive
CN102592590A (en) * 2012-02-21 2012-07-18 华南理工大学 Arbitrarily adjustable method and device for changing phoneme naturally
CN102982809A (en) * 2012-12-11 2013-03-20 中国科学技术大学 Conversion method for sound of speaker
CN203386472U (en) * 2013-04-26 2014-01-08 天津科技大学 Character voice changer
CN103489443A (en) * 2013-09-17 2014-01-01 湖南大学 Method and device for imitating sound
CN105023570A (en) * 2014-04-30 2015-11-04 安徽科大讯飞信息科技股份有限公司 method and system of transforming speech
CN107924678A (en) * 2015-09-16 2018-04-17 株式会社东芝 Speech synthetic device, phoneme synthesizing method, voice operation program, phonetic synthesis model learning device, phonetic synthesis model learning method and phonetic synthesis model learning program
CN108682413A (en) * 2018-04-24 2018-10-19 上海师范大学 A kind of emotion direct system based on voice conversion

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110364177A (en) * 2019-07-11 2019-10-22 努比亚技术有限公司 Method of speech processing, mobile terminal and computer readable storage medium
CN110364177B (en) * 2019-07-11 2024-07-23 努比亚技术有限公司 Voice processing method, mobile terminal and computer readable storage medium
CN110942765A (en) * 2019-11-11 2020-03-31 珠海格力电器股份有限公司 Method, device, server and storage medium for constructing corpus
CN110942765B (en) * 2019-11-11 2022-05-27 珠海格力电器股份有限公司 Method, device, server and storage medium for constructing corpus
CN111739547A (en) * 2020-07-24 2020-10-02 深圳市声扬科技有限公司 Voice matching method and device, computer equipment and storage medium
CN113486964A (en) * 2021-07-13 2021-10-08 盛景智能科技(嘉兴)有限公司 Voice activity detection method and device, electronic equipment and storage medium

Also Published As

Publication number Publication date
CN109616131B (en) 2023-07-07

Similar Documents

Publication Publication Date Title
Childers et al. Voice conversion
CN109616131A (en) A digital real-time voice changing method
Erro et al. Harmonics plus noise model based vocoder for statistical parametric speech synthesis
Charpentier et al. Diphone synthesis using an overlap-add technique for speech waveforms concatenation
Wali et al. Generative adversarial networks for speech processing: A review
CN102568476B (en) Voice conversion method based on self-organizing feature map network cluster and radial basis network
Qian et al. A unified trajectory tiling approach to high quality speech rendering
CN1815552B (en) Frequency spectrum modelling and voice reinforcing method based on line spectrum frequency and its interorder differential parameter
CN110648684B (en) Bone conduction voice enhancement waveform generation method based on WaveNet
CN111210803A (en) System and method for training clone timbre and rhythm based on Bottleneck characteristics
CN116364096B (en) Electroencephalogram signal voice decoding method based on generation countermeasure network
CN113436606A (en) Original sound speech translation method
Zhang et al. Susing: Su-net for singing voice synthesis
CN116092471A (en) Multi-style personalized Tibetan language speech synthesis model oriented to low-resource condition
Rao Real time prosody modification
Jalin et al. Text to speech synthesis system for tamil using HMM
Gao et al. Polyscriber: Integrated fine-tuning of extractor and lyrics transcriber for polyphonic music
CN102231275B (en) Embedded speech synthesis method based on weighted mixed excitation
Mizuno et al. Waveform-based speech synthesis approach with a formant frequency modification
CN116913244A (en) Speech synthesis method, equipment and medium
Chandra et al. Towards the development of accent conversion model for (l1) bengali speaker using cycle consistent adversarial network (cyclegan)
CN114913844A (en) Broadcast language identification method for pitch normalization reconstruction
Wen et al. Amplitude Spectrum based Excitation Model for HMM-based Speech Synthesis.
Ding A Systematic Review on the Development of Speech Synthesis
CN113744715A (en) Vocoder speech synthesis method, device, computer equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant