CN109616131B - Digital real-time voice sound changing method - Google Patents


Info

Publication number: CN109616131B (application number CN201811342131.2A)
Authority: CN (China)
Prior art keywords: fundamental tone, voice, original, pitch, sound
Legal status: Active (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Other languages: Chinese (zh)
Other versions: CN109616131A
Inventors: 陈锴, 刘晓峻, 狄敏
Assignees (current and original; the listed assignees may be inaccurate, as Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list): Jiangsu Province Nanjing University Of Science And Technology Electronic Information Technology Co ltd; Nanjing Nanda Electronic Wisdom Service Robot Research Institute Co ltd; Nanjing University
Application filed by Jiangsu Province Nanjing University Of Science And Technology Electronic Information Technology Co ltd, Nanjing Nanda Electronic Wisdom Service Robot Research Institute Co ltd, and Nanjing University
Priority: CN201811342131.2A (the priority date is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed)
Published as CN109616131A (application) and, upon grant, as CN109616131B


Classifications

    • G — PHYSICS
    • G10 — MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L — SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00 — Processing of the speech or voice signal to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/003 — Changing voice quality, e.g. pitch or formants
    • G10L21/007 — Changing voice quality, e.g. pitch or formants, characterised by the process used

Abstract

The invention discloses a digital real-time voice changing method in which the non-unvoiced part of the original speech is adjusted and analyzed, signals from a specific-speaker pitch library are retrieved according to the comparison result to replace the original pitch, and the voice-changed signal is then obtained through synthesis and superposition. The method offers high naturalness and intelligibility; the changed voice is difficult to restore to the original, giving strong confidentiality; and it operates with low delay and low computational complexity.

Description

Digital real-time voice sound changing method
Technical Field
The invention relates to a voice changing method and belongs to the technical field of audio processing.
Background
Voice changing is an important speech processing technology, widely applied to voice interaction, secure communication, and special sound effects in consumer electronics.
Traditional voice changing mainly relies on frequency-modification techniques, which suffer from three defects: first, the naturalness of the changed voice is low and its intelligibility is reduced; second, the modification is simple and the original voice can easily be recovered, compromising secure communication; finally, the computational complexity is high and the processing delay large, limiting real-time operation.
Disclosure of Invention
Purpose of the invention: to overcome the defects of the prior art, the invention provides a real-time digital voice changing method that addresses three problems of current mainstream approaches: (1) low naturalness and intelligibility of the changed voice; (2) easy restoration of the changed voice to the original; (3) high processing delay and high computational complexity.
Technical scheme: to achieve the above purpose, the invention adopts the following technical scheme.
A digital real-time voice changing method comprises the following steps:
Step 1: distinguish unvoiced from non-unvoiced sound in the speech through initial/final segmentation.
Step 2: decompose the non-unvoiced sound through linear prediction, dividing the original speech into an original pitch (fundamental tone) model and an original vocal-tract model.
Step 3: adjust the original pitch according to the actual requirement, for example by changing the fundamental frequency or the speed at which it changes.
Step 4: compare the adjusted pitch with the pitch information in a specific-speaker pitch library to find the best-matching pitch signal.
Step 5: reconstruct and optimize the pitch information to obtain the corrected pitch signal.
Step 6: synthesize the corrected pitch with the vocal-tract model to form the voice-changed non-unvoiced signal.
Step 7: combine the original unvoiced signal with the voice-changed non-unvoiced signal to form the adjusted speech signal.
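For illustration, the two adjustments named in step 3 — changing the fundamental frequency and changing how fast it moves — can be sketched on a frame-wise F0 contour. The function name and the reading of "change speed" as scaling excursions around the mean F0 are assumptions made for this sketch, not the patented implementation:

```python
import numpy as np

def adjust_f0(f0_contour, freq_scale=1.0, rate_scale=1.0):
    """Adjust a frame-wise F0 contour (Hz); illustrative only.

    freq_scale multiplies every voiced F0 value (shifts the pitch);
    rate_scale scales deviations around the mean F0 (exaggerates or
    flattens the pitch movement without moving the average).
    """
    f0 = np.asarray(f0_contour, dtype=float)
    voiced = f0 > 0                      # convention: 0 marks unvoiced frames
    mean_f0 = f0[voiced].mean() if voiced.any() else 0.0
    adjusted = f0.copy()
    # scale the excursion around the mean, then shift the whole contour
    adjusted[voiced] = (mean_f0 + (f0[voiced] - mean_f0) * rate_scale) * freq_scale
    return adjusted

contour = [0.0, 100.0, 110.0, 120.0, 0.0]   # Hz per frame, 0 = unvoiced
up = adjust_f0(contour, freq_scale=2.0)      # raise pitch one octave
flat = adjust_f0(contour, rate_scale=0.0)    # remove pitch movement
```

Unvoiced frames (F0 = 0) pass through untouched, matching the method's rule that only non-unvoiced sound is modified.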
Preferably, the specific-speaker pitch library is derived mainly from analyzing and extracting that speaker's voice, and contains the pitch signals of the common syllables and words the speaker pronounces.
Preferably, in step 2 the speech is decomposed by linear prediction into a vocal-tract model and the original pitch, and the vocal-tract model parameters are retained for later speech synthesis.
Preferably, the adjusted original pitch is compared with all pitch signals in the specific-speaker pitch library, and the most similar pitch segments are obtained by correlation comparison, pattern matching, or machine learning.
Preferably, the specific-speaker pitch library is stored in a cloud system and served by a dedicated real-time retrieval system.
Preferably, the method is implemented on a DSP-plus-ARM system.
Preferably, the DSP performs the initial/final segmentation and linear prediction and extracts the original pitch of the non-unvoiced signal.
Preferably, the DSP synthesizes the adjusted pitch and the vocal-tract model into the non-unvoiced signal, then superimposes the original unvoiced signal to form the voice-changed speech signal.
Compared with the prior art, the invention has the following beneficial effects:
1. All pitch information used during voice changing comes from pitch extracted from natural speech, and no frequency conversion is applied directly to the voice, so naturalness and intelligibility are preserved.
2. The pitch information of the changed voice comes entirely from a specific speaker's pitch library, and the speaker-characteristic information of the original signal is completely removed, so other systems cannot easily restore the original voice.
3. The voice changing has low computational complexity and small processing delay, and, combined with cloud processing, lends itself to real-time implementation.
Drawings
FIG. 1 is a schematic diagram of the voice changing system.
FIG. 2 is a block diagram of an implementation of the present invention based on a floating point DSP and ARM system.
Detailed Description
The present invention is further illustrated by the accompanying drawings and the following detailed description, which are to be understood as merely illustrative of the invention and not limiting of its scope; after reading this disclosure, various equivalent modifications made by those skilled in the art fall within the scope of the appended claims.
A digital real-time voice changing method, as shown in FIG. 1, comprises the following 7 parts:
1. distinguish unvoiced from non-unvoiced sound (voiced sounds, voiced consonants, fricatives) in the speech through initial/final segmentation;
2. decompose the non-unvoiced sound (voiced sounds, voiced consonants, fricatives) through linear prediction, dividing the original speech into an original pitch model and an original vocal-tract model;
3. adjust the original pitch according to the actual requirement, e.g. changing the fundamental frequency or its rate of change;
4. compare the adjusted pitch with the pitch information in the specific-speaker pitch library to find the best-matching pitch signal;
5. reconstruct and optimize the pitch information to obtain the corrected pitch signal;
6. synthesize the corrected pitch with the vocal-tract model to form the voice-changed non-unvoiced signal;
7. combine the unvoiced signal with the voice-changed non-unvoiced signal to form the adjusted speech signal.
The initial/final segmentation distinguishes the unvoiced and non-unvoiced parts of the speech, where the non-unvoiced parts comprise voiced sounds, voiced consonants, and fricatives; in the synthesis stage, the system superimposes the adjusted non-unvoiced sound and the original unvoiced sound to form the new, voice-changed speech signal.
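The patent does not spell out the segmentation algorithm. As a stand-in, each frame can be labeled with a short-time-energy and zero-crossing-rate heuristic: unvoiced speech is typically low-energy and noise-like (many zero crossings). The thresholds, frame length, and function name below are illustrative assumptions:

```python
import numpy as np

def classify_frames(signal, frame_len=160, energy_thr=0.01, zcr_thr=0.3):
    """Label each frame 'unvoiced' or 'non-unvoiced' (voiced sound,
    voiced consonant, or fricative) — a simple proxy for the
    initial/final segmentation described above."""
    labels = []
    for start in range(0, len(signal) - frame_len + 1, frame_len):
        frame = np.asarray(signal[start:start + frame_len], dtype=float)
        energy = float(np.mean(frame ** 2))                       # short-time energy
        zcr = float(np.mean(np.abs(np.diff(np.sign(frame)))) / 2.0)  # crossings/sample
        labels.append('unvoiced' if energy < energy_thr or zcr > zcr_thr
                      else 'non-unvoiced')
    return labels

fs = 8000
t = np.arange(fs // 10) / fs                        # 100 ms of samples
voiced = 0.5 * np.sin(2 * np.pi * 150 * t)          # strong 150 Hz tone ≈ voiced
rng = np.random.default_rng(0)
noise = 0.05 * rng.standard_normal(len(t))          # weak noise ≈ unvoiced
labels = classify_frames(np.concatenate([voiced, noise]))
```

The tone frames are high-energy with few crossings and come out non-unvoiced; the noise frames are low-energy and come out unvoiced.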
The specific-speaker pitch library is derived mainly from analyzing and extracting that speaker's speech, and includes the pitch signals produced during the pronunciation of common syllables and words. Building the pitch library for a specific speaker requires a dedicated training procedure.
The speech is decomposed by linear prediction into a vocal-tract model and the original pitch, and the vocal-tract model parameters are retained for later speech synthesis.
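A compact sketch of that decomposition using autocorrelation-method LPC: the predictor coefficients model the vocal tract, and inverse filtering leaves a residual that carries the pitch. This is the generic textbook formulation, not necessarily the patent's exact implementation:

```python
import numpy as np
from scipy.signal import lfilter
from scipy.linalg import solve_toeplitz

def lpc_decompose(frame, order=10):
    """Split a speech frame into a vocal-tract model (prediction-error
    filter A(z)) and an excitation residual that carries the pitch."""
    x = np.asarray(frame, dtype=float)
    # autocorrelation at lags 0..order
    r = np.correlate(x, x, mode='full')[len(x) - 1:len(x) + order]
    # Yule-Walker: symmetric Toeplitz system for the predictor coefficients
    a = solve_toeplitz(r[:order], r[1:order + 1])
    A = np.concatenate(([1.0], -a))          # analysis filter A(z)
    residual = lfilter(A, [1.0], x)          # inverse-filtered excitation
    return A, residual

# crude voiced frame: a 100 Hz impulse train through a 2-pole "vocal tract"
fs = 8000
n = np.arange(400)
excitation = (n % 80 == 0).astype(float)
frame = lfilter([1.0], [1.0, -1.3, 0.8], excitation)
A, residual = lpc_decompose(frame, order=2)
# A should approximately recover the synthesis denominator [1, -1.3, 0.8]
```

Because the test signal really is an order-2 all-pole process, the estimated A(z) comes out very close to the true filter, and the residual approximates the pulse train, i.e. the pitch.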
The original pitch is adjusted according to the user's requirements, including adjusting the fundamental frequency, adjusting its rate of change, and so on.
The adjusted original pitch is compared with all pitch signals in the specific-speaker pitch library; the most similar pitch segments are obtained by correlation comparison, pattern matching, machine learning, or similar methods, and are then optimized. The main purpose of the optimization is to keep the pitch continuous and improve the naturalness of the voice, finally yielding the corrected pitch.
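One way to realize that continuity optimization is to cross-fade a few frames at each joint between retrieved pitch segments; the patent leaves the method open, so the blending scheme and names below are purely illustrative:

```python
import numpy as np

def smooth_pitch_joints(segments, blend=3):
    """Concatenate retrieved pitch segments, cross-fading `blend` frames
    at each joint so the F0 track stays continuous."""
    out = np.asarray(segments[0], dtype=float)
    for seg in segments[1:]:
        seg = np.asarray(seg, dtype=float)
        k = min(blend, len(out), len(seg))
        w = np.linspace(0.0, 1.0, k)                 # linear cross-fade weights
        out[-k:] = (1 - w) * out[-k:] + w * seg[:k]  # blend overlapping frames
        out = np.concatenate([out, seg[k:]])
    return out

a = [100.0, 100.0, 100.0, 100.0]   # frames at 100 Hz
b = [120.0, 120.0, 120.0, 120.0]   # next retrieved segment at 120 Hz
track = smooth_pitch_joints([a, b], blend=3)
```

Instead of a 20 Hz step at the joint, the track ramps 100 → 110 → 120 Hz, which is the kind of continuity the optimization step is after.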
The specific-speaker pitch library can be stored in a cloud system, with a dedicated retrieval system used to improve the efficiency and utilization of the system.
The corrected pitch and the vocal-tract model are synthesized to form the voice-changed non-unvoiced speech segment.
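This synthesis is the mirror image of the LPC analysis: drive the retained all-pole vocal-tract filter 1/A(z) with an excitation built at the corrected pitch. A minimal sketch, with the filter coefficients and pitch period chosen only for illustration:

```python
import numpy as np
from scipy.signal import lfilter

def synthesize_voiced(pitch_excitation, vocal_tract_A):
    """Form the modified non-unvoiced segment by filtering the corrected
    pitch excitation through the retained vocal-tract model 1/A(z)."""
    return lfilter([1.0], np.asarray(vocal_tract_A, dtype=float),
                   np.asarray(pitch_excitation, dtype=float))

# pulse train at the corrected pitch period through the stored model
period = 64                          # e.g. 125 Hz at an 8 kHz sample rate
excitation = np.zeros(256)
excitation[::period] = 1.0
A = [1.0, -1.3, 0.8]                 # vocal-tract model kept from analysis
voiced_out = synthesize_voiced(excitation, A)
```

The first output samples follow the recursion y[n] = x[n] + 1.3·y[n-1] - 0.8·y[n-2], confirming the all-pole filter is applied as intended.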
The voice changing system adjusts and analyzes the non-unvoiced part of the original voice, retrieves signals from the specific-speaker pitch library to replace the original pitch according to the comparison result, and then obtains the voice-changed signal through synthesis and superposition. The specific-speaker pitch library results from analyzing and extracting that speaker's speech.
As shown in FIG. 2, the whole system is realized on a floating-point DSP plus an ARM system:
1. the ARM transmits the system's adjustment requirements to the floating-point DSP;
2. microphone data is passed through an ADC (analog-to-digital converter) to the floating-point DSP as the system input;
3. the floating-point DSP feeds signals through a DAC (digital-to-analog converter) to a loudspeaker for playback as the system output;
4. the floating-point DSP performs initial/final segmentation, linear prediction, and related functions, and extracts the original pitch of the non-unvoiced signal;
5. the floating-point DSP adjusts the original pitch and transmits it via the ARM to the cloud;
6. the cloud compares the adjusted original pitch with the specific-speaker pitch library, finds the most similar pitch signal, and transmits it back to the floating-point DSP;
7. the floating-point DSP synthesizes the adjusted pitch and the vocal-tract model into the non-unvoiced signal, and then superimposes the original unvoiced signal to form the voice-changed speech signal.
The foregoing is only a preferred embodiment of the invention. It should be noted that those skilled in the art can make various modifications and adaptations without departing from the principles of the invention, and such modifications and adaptations are also to be regarded as within the scope of the invention.

Claims (6)

1. A digital real-time voice changing method, comprising the following steps:
step 1, distinguishing unvoiced from non-unvoiced sound in the speech through initial/final segmentation;
step 2, decomposing the non-unvoiced sound through linear prediction, dividing the original speech into an original pitch model and an original vocal-tract model;
step 3, adjusting the original pitch according to the actual requirement;
step 4, comparing the adjusted pitch with the pitch information in a specific-speaker pitch library to find the best-matching pitch signal; the specific-speaker pitch library is derived mainly from analyzing and extracting that speaker's voice, and comprises the pitch signals of the common syllables and words the speaker pronounces; the adjusted original pitch is compared with all pitch signals in the specific-speaker pitch library, and the most similar pitch segments are obtained by correlation comparison, pattern matching, or machine learning;
step 5, reconstructing and optimizing the pitch information to obtain a corrected pitch signal;
step 6, synthesizing the corrected pitch with the vocal-tract model to form the voice-changed non-unvoiced signal;
step 7, combining the original unvoiced signal with the voice-changed non-unvoiced signal to form the adjusted speech signal; a DSP synthesizes the adjusted pitch and the vocal-tract model into the non-unvoiced signal and then superimposes the original unvoiced signal to form the voice-changed speech signal.
2. The digital real-time voice changing method according to claim 1, wherein: in step 2, the speech is decomposed by linear prediction into a vocal-tract model and the original pitch, and the vocal-tract model parameters are retained for later speech synthesis.
3. The digital real-time voice changing method according to claim 2, wherein: the specific-speaker pitch library is stored in a cloud system and served by a dedicated real-time retrieval system.
4. The digital real-time voice changing method according to claim 3, wherein: the method is implemented on a DSP-plus-ARM system.
5. The digital real-time voice changing method according to claim 4, wherein: the DSP performs the initial/final segmentation and linear prediction and extracts the original pitch of the non-unvoiced signal.
6. The digital real-time voice changing method according to claim 5, wherein: in step 3, adjusting the original pitch according to the actual requirement comprises changing the fundamental frequency and/or changing the rate at which the fundamental frequency changes.
CN201811342131.2A · Priority date 2018-11-12 · Filing date 2018-11-12 · Digital real-time voice sound changing method · Active · CN109616131B (en)

Priority Applications (1)

CN201811342131.2A (CN109616131B) · Priority date 2018-11-12 · Filing date 2018-11-12 · Digital real-time voice sound changing method


Publications (2)

CN109616131A (en) — published 2019-04-12
CN109616131B (en) — granted 2023-07-07

Family

ID=66003036

Family Applications (1)

CN201811342131.2A — Active — CN109616131B (en)

Country Status (1)

CN — CN109616131B (en)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110364177A (en) * 2019-07-11 2019-10-22 努比亚技术有限公司 Method of speech processing, mobile terminal and computer readable storage medium
CN110942765B (en) * 2019-11-11 2022-05-27 珠海格力电器股份有限公司 Method, device, server and storage medium for constructing corpus
CN111739547B (en) * 2020-07-24 2020-11-24 深圳市声扬科技有限公司 Voice matching method and device, computer equipment and storage medium
CN113486964A (en) * 2021-07-13 2021-10-08 盛景智能科技(嘉兴)有限公司 Voice activity detection method and device, electronic equipment and storage medium

Citations (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1152776A (en) * 1995-10-26 1997-06-25 索尼公司 Method and arrangement for phoneme signal duplicating, decoding and synthesizing
CN1567428A (en) * 2003-06-19 2005-01-19 北京中科信利技术有限公司 Phoneme changing method based on digital signal processing
CN101354889A (en) * 2008-09-18 2009-01-28 北京中星微电子有限公司 Method and apparatus for tonal modification of voice
CN101399044A (en) * 2007-09-29 2009-04-01 国际商业机器公司 Voice conversion method and system
CN101510424A (en) * 2009-03-12 2009-08-19 孟智平 Method and system for encoding and synthesizing speech based on speech primitive
CN102592590A (en) * 2012-02-21 2012-07-18 华南理工大学 Arbitrarily adjustable method and device for changing phoneme naturally
CN102982809A (en) * 2012-12-11 2013-03-20 中国科学技术大学 Conversion method for sound of speaker
CN103489443A (en) * 2013-09-17 2014-01-01 湖南大学 Method and device for imitating sound
CN203386472U (en) * 2013-04-26 2014-01-08 天津科技大学 Character voice changer
CN105023570A (en) * 2014-04-30 2015-11-04 安徽科大讯飞信息科技股份有限公司 method and system of transforming speech
CN107924678A * 2015-09-16 2018-04-17 株式会社东芝 Speech synthesis device, speech synthesis method, speech synthesis program, speech synthesis model training device, speech synthesis model training method, and speech synthesis model training program
CN108682413A (en) * 2018-04-24 2018-10-19 上海师范大学 A kind of emotion direct system based on voice conversion

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
TW430778B (en) * 1998-06-15 2001-04-21 Yamaha Corp Voice converter with extraction and modification of attribute data


Also Published As

Publication number Publication date
CN109616131A (en) 2019-04-12


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant