KR101706123B1

KR101706123B1 - User-customizable voice revision method of converting voice by parameter modification and voice revision device implementing the same

Info

Publication number: KR101706123B1
Application number: KR1020150060946A
Authority: KR
Inventors: 김남수; 권기수; 배수현; 강우현
Original assignee: 서울대학교산학협력단
Priority date: 2015-04-29
Filing date: 2015-04-29
Publication date: 2017-02-13
Also published as: KR20160128871A

Abstract

The present invention relates to a user-customized voice correction method that converts tone color by modifying parameters, and to a voice correction device implementing the method. More specifically, the method is characterized in that a user-customized voice correction device performs the steps of: (1) receiving raw voice data; (2) extracting, from the received raw voice data, voice parameters including pitch, characteristic waveform (CW), power, and line spectral frequency (LSF); (3) changing, in the extracted voice parameters, first features possessed by a plurality of preset atypical voices into second features possessed by a preset normal voice; and (4) synthesizing corrected voice data based on the voice parameters having the changed second features.
According to the proposed method and device, the user-customized voice correction device receives raw voice data, extracts voice parameters including pitch, characteristic waveform (CW), power, and line spectral frequency (LSF) from it, changes the first features possessed by a plurality of preset atypical voices into the second features possessed by a preset normal voice, and synthesizes corrected voice data from the changed parameters. As a result, atypical voices such as hoarse, rough, breathy, and nasal voices can be analyzed and improved, and the tone color can be changed to feel colder or warmer.
In addition, a voice that is hard to make out over a voice call, or a voice its speaker is dissatisfied with, can be changed into a clear voice or a voice with a desired tone color; physical problems with the speaker's vocal organs can be compensated for; and the user can generate a voice with the tone color they want to project. The voice can thus be corrected in various ways to suit the user's needs.

Description

The present invention relates to a voice correction method and a device implementing it, and more specifically to a user-customized voice correction method that converts tone color by modifying parameters and to a voice correction device implementing the method.

The term voice codec combines coder/encoder, which converts a voice signal into a digital signal, and decoder, which performs the reverse conversion. Known codec technologies include MP3, AC3, AAC, OGG, and WMA.

Waveform interpolation (WI), one type of voice codec, extracts voice parameters and then synthesizes speech, as shown in FIG. 1. FIG. 1 conceptually illustrates the execution of a prior-art waveform interpolation method among voice codecs. As shown in FIG. 1, because the bandwidth available for voice communication is limited, the prior-art waveform interpolation method reduces the amount of data by quantizing the original parameters. More specifically, when digitized speech is input to the analyzer on the encoder side (analyzer-synthesizer layer), the parameters of the input speech are quantized (quantization layer). After quantization, the parameters are dequantized (quantization layer) and passed to the synthesizer, which outputs digitized speech.

In this regard, Korean Registered Patent Publication No. 10-0768090 (October 17, 2007) discloses a waveform interpolation method and apparatus for reducing the computation required for the realignment parameter in decoding, and Korean Laid-open Patent Publication No. 10-2001-0087391 (September 15, 2001) discloses a time-synchronous waveform interpolation method for speech segments that minimizes the bits required for encoding.

However, these prior-art techniques mainly concern research on quantifying voice quality using measures of irregular components such as jitter, shimmer, and harmonic-to-noise ratio (HNR); they do not disclose a technique for appropriately correcting a user's voice. Nor do they disclose a technique for correcting a user's ordinary voice from multiple angles.

The present invention has been proposed to solve the above problems of previously proposed methods. Its object is to provide a user-customized voice correction method that converts tone color by modifying parameters, and a voice correction device implementing the method, in which the device receives raw voice data; extracts voice parameters including pitch, characteristic waveform (CW), power, and line spectral frequency (LSF) from the received data; changes, in the extracted parameters, the first features possessed by a plurality of preset atypical voices into the second features possessed by a preset normal voice; and synthesizes corrected voice data based on the changed parameters. Atypical voices such as hoarse, rough, breathy, and nasal voices can thereby be analyzed and improved, and the tone color can be changed to feel colder or warmer.

Another object of the present invention is to provide such a method and device that can change a voice that is hard to make out over a voice call, or a voice its speaker is dissatisfied with, into a clear voice or a voice with a desired tone color; that can compensate for physical problems with the speaker's vocal organs; and that lets the user generate a voice with the tone color they want to project, so that the voice can be corrected in various ways to suit the user's needs.

To achieve the above objects, a user-customized voice correction method for converting tone color by parameter modification according to a feature of the present invention is characterized in that a user-customized voice correction device performs the steps of:

(1) receiving raw voice data;

(2) extracting, from the received raw voice data, voice parameters including pitch, characteristic waveform (CW), power, and line spectral frequency (LSF);

(3) changing, in the extracted voice parameters, first features possessed by a plurality of preset atypical voices into second features possessed by a preset normal voice; and

(4) synthesizing corrected voice data based on the voice parameters having the changed second features.

Preferably, in step (3), the atypical voice may include at least one selected from the group including a hoarse voice, a rough voice, a breathy voice, and a nasal voice.

More preferably, in step (3), the change of the first features into the second features may include stabilizing the pitch of the atypical voice within a preset range.

The change may also include reconstructing the spectral envelope by correcting the line spectrum pairs (LSP) of the atypical voice.

The change may also include separating the characteristic waveform of the atypical voice into a smoothly evolving waveform (SEW) component and a rapidly evolving waveform (REW) component, and adjusting the separated SEW and REW components within a preset range.

Preferably, the method may be configured so that no step of quantizing the raw voice data is included between steps (1) through (4).

Preferably, step (3) includes (3-1) changing the extracted voice parameters from a preset first tone color range to a preset second tone color range, where one of the first and second tone color ranges is a cold-feeling tone color range and the other is a warm-feeling tone color range.

According to the user-customized voice correction method for converting tone color by parameter modification proposed in the present invention, and the voice correction device implementing it, the device receives raw voice data, extracts voice parameters including pitch, characteristic waveform (CW), power, and line spectral frequency (LSF) from it, changes the first features possessed by a plurality of preset atypical voices into the second features possessed by a preset normal voice, and synthesizes corrected voice data from the changed parameters. As a result, atypical voices such as hoarse, rough, breathy, and nasal voices can be analyzed and improved, and the tone color can be changed to feel colder or warmer.

In addition, a voice that is hard to make out over a voice call, or a voice its speaker is dissatisfied with, can be changed into a clear voice or a voice with a desired tone color; physical problems with the speaker's vocal organs can be compensated for; and the user can generate a voice with the tone color they want to project. The voice can thus be corrected in various ways to suit the user's needs.

FIG. 1 conceptually illustrates the execution of a prior-art waveform interpolation method among voice codecs.
FIG. 2 illustrates the flow of a user-customized voice correction method for converting tone color by parameter modification according to an embodiment of the present invention.
FIG. 3 conceptually illustrates a user-customized voice correction method for converting tone color by parameter modification according to an embodiment of the present invention.
FIG. 4 is a block diagram of the process performed in the WI analyzer in a user-customized voice correction method for converting tone color by parameter modification according to an embodiment of the present invention.
FIG. 5 is a block diagram of the process performed in the WI synthesizer in a user-customized voice correction method for converting tone color by parameter modification according to an embodiment of the present invention.

Hereinafter, preferred embodiments are described in detail with reference to the accompanying drawings so that a person of ordinary skill in the art to which the present invention pertains can readily carry it out. In describing the preferred embodiments, detailed descriptions of related known functions or configurations are omitted where they would unnecessarily obscure the gist of the invention. The same or similar reference signs are used throughout the drawings for parts with similar functions and operations.

In addition, throughout the specification, when a part is said to be ‘connected’ to another part, this includes not only the case where it is ‘directly connected’ but also the case where it is ‘indirectly connected’ with another element in between. Also, ‘including’ a component means that other components may be further included, rather than excluded, unless specifically stated otherwise.

The present invention relates to a user-customized voice correction method that converts tone color by modifying parameters and to a voice correction device implementing the method. A user-customized voice correction device according to a feature of the present invention may include a memory in which voice data is stored and a microprocessor that processes the stored voice data. For example, such a device may be a portable terminal, communication terminal, personal computer, notebook, PDA, smartphone, tablet PC, or MP3 player that is electrically connected to a microphone. The processing of voice data performed in the device is described in detail below through the user-customized voice correction method.

FIG. 2 illustrates the flow of a user-customized voice correction method for converting tone color by parameter modification according to an embodiment of the present invention. As shown in FIG. 2, the method may include: receiving raw voice data (S510); extracting, from the received raw voice data, voice parameters including pitch, characteristic waveform (CW), power, and line spectral frequency (LSF) (S530); changing, in the extracted voice parameters, the first features possessed by a plurality of preset atypical voices into the second features possessed by a preset normal voice (S550); and synthesizing corrected voice data based on the voice parameters having the changed second features (S570). Unlike FIG. 1, the method shown in FIG. 2 includes no step of quantizing the raw voice data and then dequantizing it. Each step of the method is described in more detail below with reference to the accompanying drawings.

FIG. 3 conceptually illustrates a user-customized voice correction method for converting tone color by parameter modification according to an embodiment of the present invention. As shown in FIGS. 2 and 3, in step S510 the user-customized voice correction device can receive raw voice data. The received raw voice data may contain features by which it can be classified as an atypical voice, for example hoarseness, breathiness, roughness, or nasality. Next, in step S530, the device can extract voice parameters including pitch, characteristic waveform (CW), power, and line spectral frequency (LSF) from the received raw voice data. The process of extracting voice parameters from the raw voice data in steps S510 and S530 is described in more detail with reference to FIG. 4.

FIG. 4 is a block diagram of the process performed in the WI analyzer in a user-customized voice correction method for converting tone color by parameter modification according to an embodiment of the present invention. As shown in FIG. 4, the WI analyzer (100) performs linear predictive coding (LPC) analysis on the input raw voice data (130) and interpolates the line spectral frequencies (LSF) (120), which may be provided to the linear prediction (LP) analysis filter (110) at, for example, 8 sets per frame. In this process, the LSF may be extracted at 1 set per frame.
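The step from 1 LSF set per frame to 8 sets per frame can be sketched as follows. This is only an illustration: the patent does not say which interpolation rule block 120 uses, so linear interpolation between the previous and current frame is an assumption, and `interpolate_lsf` and `n_sub` are hypothetical names.

```python
def interpolate_lsf(prev_lsf, curr_lsf, n_sub=8):
    """Linearly interpolate between two per-frame LSF vectors.

    Returns n_sub LSF vectors; the last one equals curr_lsf.
    Linear interpolation is an assumed rule, not taken from the patent.
    """
    if len(prev_lsf) != len(curr_lsf):
        raise ValueError("LSF vectors must have the same order")
    sets = []
    for k in range(1, n_sub + 1):
        w = k / n_sub  # weight of the current frame grows across the frame
        sets.append([(1.0 - w) * p + w * c for p, c in zip(prev_lsf, curr_lsf)])
    return sets


# Example: two 2nd-order LSF vectors (radians), 8 interpolated sets.
sets = interpolate_lsf([0.1, 0.2], [0.3, 0.6])
print(len(sets))
```

Because a weighted average of two increasing LSF vectors is itself increasing, each interpolated set keeps the ordering that LP filters rely on.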

Meanwhile, the pitch is measured (140) from the residual component that has passed through the LP analysis filter (110), the pitch is interpolated (150), and the characteristic waveform (CW) may be extracted (160) at, for example, 8 pitch values per frame. In this process, the pitch may be extracted at a rate of 1 per frame.
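One common way to measure pitch from an LP residual is to search for the lag that maximizes the normalized autocorrelation. The sketch below assumes that approach; the patent does not name the estimator used in block 140, and `estimate_pitch_period`, `min_lag`, and `max_lag` are hypothetical names.

```python
import math


def estimate_pitch_period(residual, min_lag, max_lag):
    """Return the lag (in samples) with the highest normalized autocorrelation.

    Assumed estimator for illustration; not specified by the patent.
    """
    best_lag, best_score = min_lag, float("-inf")
    n_total = len(residual)
    for lag in range(min_lag, max_lag + 1):
        num = sum(residual[n] * residual[n - lag] for n in range(lag, n_total))
        e1 = sum(residual[n] ** 2 for n in range(lag, n_total))
        e2 = sum(residual[n - lag] ** 2 for n in range(lag, n_total))
        denom = math.sqrt(e1 * e2)
        score = num / denom if denom > 0 else 0.0
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag


# Example: an impulse train with a 40-sample period stands in for a voiced residual.
residual = [1.0 if n % 40 == 0 else 0.0 for n in range(320)]
print(estimate_pitch_period(residual, 20, 70))
```

Restricting the search to a lag range avoids picking multiples of the true period; the range values here are arbitrary.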

Next, the extracted characteristic waveforms (CW) are aligned (170), and the power is computed (180) and used for normalization (190), so that the power is extracted at a rate of, for example, 8 per frame and the normalized characteristic waveform is extracted at the same rate.
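The power computation and normalization can be sketched as below, together with the inverse operation that the synthesizer would apply. Defining power as the root-mean-square value of one characteristic waveform is an assumption, since the patent does not give a formula; `normalize_cw` and `denormalize_cw` are hypothetical names.

```python
import math


def normalize_cw(cw):
    """Return (power, normalized_cw), with power taken as the RMS of cw.

    The RMS definition is an assumption made for this sketch.
    """
    power = math.sqrt(sum(x * x for x in cw) / len(cw))
    if power == 0.0:
        return 0.0, list(cw)  # silent waveform: nothing to scale
    return power, [x / power for x in cw]


def denormalize_cw(power, normalized_cw):
    """Restore the original scale of a normalized characteristic waveform."""
    return [power * x for x in normalized_cw]


power, shape = normalize_cw([3.0, -3.0, 3.0, -3.0])
print(power, shape)
```

Keeping power separate from waveform shape lets loudness and timbre be modified independently before synthesis.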

Through this process, voice parameters including the pitch, characteristic waveform (CW), power, and line spectral frequency (LSF) of the speech can be extracted.

In step S550, the user-customized voice correction device can change, in the extracted voice parameters, the first features possessed by a plurality of preset atypical voices into the second features possessed by a preset normal voice. As shown in FIGS. 2 and 3, in this step the voice parameters extracted in step S530 can be modified (300). For example, the pitch of the atypical voice can be stabilized within a preset range. In this case, the voice is corrected by stabilizing the unstable fundamental frequency found in hoarse or rough voices.
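A minimal sketch of pitch stabilization within a preset range follows. It assumes one possible rule: each voiced frame is limited to a relative deviation around the median of its neighbours. The patent does not describe the actual rule, and `stabilize_pitch`, `window`, and `max_rel_dev` are hypothetical.

```python
import statistics


def stabilize_pitch(pitch_track, window=5, max_rel_dev=0.05):
    """Clamp each voiced pitch value to within max_rel_dev of a local median.

    Frames with pitch <= 0 are treated as unvoiced and left unchanged.
    This rule is an illustrative assumption, not the patented procedure.
    """
    half = window // 2
    out = []
    for i, p in enumerate(pitch_track):
        if p <= 0:
            out.append(p)
            continue
        neighbours = [q for q in pitch_track[max(0, i - half): i + half + 1] if q > 0]
        center = statistics.median(neighbours)
        lo, hi = center * (1 - max_rel_dev), center * (1 + max_rel_dev)
        out.append(min(max(p, lo), hi))
    return out


# Example: a 130 Hz outlier in a 100 Hz contour is pulled to roughly 105 Hz.
print(stabilize_pitch([100, 100, 130, 100, 100, 0, 100]))
```

Using a local rather than a global median keeps the overall intonation contour while limiting frame-to-frame jumps.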

Also, for example, the spectral envelope can be reconstructed by correcting the line spectrum pairs (LSP) of the atypical voice. This makes it possible to correct a nasal voice. Nasality affects the spectral formants of speech: the nasal cavity may add a new formant, the nasal formant, or an anti-formant may arise as energy in a particular frequency band is absorbed. Such formants can be corrected by adjusting the LSP and reconstructing the spectral envelope.
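One simple way to picture an LSP correction is to move the measured values part of the way toward a reference set taken from a normal voice. The sketch below assumes that approach and treats the LSP values as line spectral frequencies in radians; the patent does not specify the correction rule, and `blend_lsf`, `reference_lsf`, and `amount` are hypothetical.

```python
import math


def _check_lsf(lsf):
    """Valid LSF sets are strictly increasing and lie inside (0, pi)."""
    in_range = all(0.0 < w < math.pi for w in lsf)
    ordered = all(a < b for a, b in zip(lsf, lsf[1:]))
    if not (in_range and ordered):
        raise ValueError("LSFs must be strictly increasing within (0, pi)")


def blend_lsf(lsf, reference_lsf, amount=0.5):
    """Move an LSF set toward a reference set by a fraction `amount` in [0, 1].

    Blending toward a reference is an assumed correction rule for illustration.
    """
    if not 0.0 <= amount <= 1.0:
        raise ValueError("amount must be in [0, 1]")
    if len(lsf) != len(reference_lsf):
        raise ValueError("LSF sets must have the same order")
    _check_lsf(lsf)
    _check_lsf(reference_lsf)
    return [(1.0 - amount) * a + amount * b for a, b in zip(lsf, reference_lsf)]


print(blend_lsf([0.3, 1.0, 2.0], [0.5, 1.2, 2.4], amount=0.5))
```

Restricting `amount` to [0, 1] means the result is a weighted average of two valid sets, so it stays ordered and inside (0, pi).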

Also, for example, the characteristic waveform of the atypical voice can be separated into a smoothly evolving waveform (SEW) component and a rapidly evolving waveform (REW) component, and the separated SEW and REW components can be adjusted within a preset range. The characteristic waveform (CW), which represents the residual signal of the speech, is divided into an SEW component with voiced characteristics and an REW component with unvoiced characteristics; by modifying each component, the voiced and unvoiced characteristics of the atypical voice can be adjusted and the tone color changed accordingly. The CW is separated into SEW and REW components because this allows the speech to be analyzed as separate voiced and unvoiced parts. In general, voiced speech has a periodic waveform, whereas unvoiced speech has a noise-like shape with no particular structure. Passing the CW through a low-pass filter therefore yields the SEW component, which represents the voiced part of the speech, and removing that SEW component from the CW leaves the REW component, which represents the unvoiced part. In this way, the CW can be separated so that the voiced and unvoiced components are adjusted individually. Considering this together with pitch: because pitch represents the period of the speech signal, it appears only in voiced segments and not in unvoiced segments. The SEW and REW components can therefore serve as a criterion for voiced/unvoiced decisions and can be used as a preliminary discrimination step for pitch stabilization.
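The SEW/REW split described above can be sketched as a low-pass filter applied along the time axis of successive characteristic waveforms, with the REW taken as the remainder. The moving-average filter, the assumption that all CW vectors are aligned and of equal length, and the names `split_sew_rew`, `remix`, `sew_gain`, and `rew_gain` are illustrative choices, not details from the patent.

```python
def split_sew_rew(cw_frames, taps=5):
    """Split time-ordered CW vectors into SEW (smoothed) and REW (remainder).

    A moving average over `taps` neighbouring frames stands in for the
    low-pass filter; this filter choice is an assumption.
    """
    half = taps // 2
    n = len(cw_frames)
    sew, rew = [], []
    for t in range(n):
        window = cw_frames[max(0, t - half): min(n, t + half + 1)]
        smooth = [sum(col) / len(window) for col in zip(*window)]
        sew.append(smooth)
        rew.append([c - s for c, s in zip(cw_frames[t], smooth)])
    return sew, rew


def remix(sew, rew, sew_gain=1.0, rew_gain=1.0):
    """Recombine the components with adjustable gains."""
    return [[sew_gain * s + rew_gain * r for s, r in zip(sv, rv)]
            for sv, rv in zip(sew, rew)]


frames = [[1.0], [-1.0], [1.0], [-1.0], [1.0]]
sew, rew = split_sew_rew(frames)
print(sew[2], rew[2])
```

With both gains at 1.0 the original waveforms are recovered; lowering `rew_gain` would reduce the noise-like part, which is one way the voiced and unvoiced balance could be adjusted.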

Meanwhile, step S550 may include a step in which the user-customized voice correction device changes the extracted voice parameters from a preset first tone color range to a preset second tone color range. Here, one of the first and second tone color ranges may be a cold-feeling tone color range and the other a warm-feeling tone color range. The parameters of a voice are analyzed against a pre-stored voice database, and the parameter characteristics of voices classified as cold-feeling or warm-feeling through listening evaluation are used to change the voice parameters of cold-feeling raw voice data into warm-feeling parameters, or those of warm-feeling raw voice data into cold-feeling parameters.
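As a rough illustration of moving parameters from one tone color range to another, the sketch below shifts a parameter vector by a fraction of the difference between two class averages. The patent only says that parameter characteristics from a voice database are used; the mean-shift rule and the names `shift_toward_tone`, `source_mean`, `target_mean`, and `strength` are hypothetical.

```python
def shift_toward_tone(params, source_mean, target_mean, strength=1.0):
    """Shift a parameter vector by strength * (target_mean - source_mean).

    source_mean and target_mean stand for per-parameter averages of, e.g.,
    cold-feeling and warm-feeling voices in a database (assumed inputs).
    """
    if not (len(params) == len(source_mean) == len(target_mean)):
        raise ValueError("all vectors must have the same length")
    return [p + strength * (t - s)
            for p, s, t in zip(params, source_mean, target_mean)]


# Example with made-up numbers: move halfway from the "cold" to the "warm" average.
print(shift_toward_tone([1.0, 2.0], [1.0, 1.0], [2.0, 0.0], strength=0.5))
```

Swapping `source_mean` and `target_mean` gives the opposite conversion, matching the two directions described in the text.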

In step S570, corrected voice data can be synthesized based on the voice parameters having the changed second features. As shown in FIG. 3, the voice parameters modified according to the user's settings can be combined by the WI synthesizer and output as corrected speech. The process of synthesizing corrected voice data from the modified voice parameters is described in more detail with reference to FIG. 5.

FIG. 5 is a block diagram of the process performed in the WI synthesizer in a user-customized voice correction method for converting tone color by parameter modification according to an embodiment of the present invention. As shown in FIG. 5, using the normalized characteristic waveform (CW) and power provided at a rate of, for example, 8 per frame, the power is denormalized (210); the pitch provided at a rate of 1 per frame is interpolated (240); the waveforms are then realigned (220) at a rate of 1 per subframe; and the instantaneous pitch and CW may be generated (230). Next, the phase track is computed (250) at a rate of 1 per sample, and a 2D-to-1D conversion (260) may be performed based on the phase track values and the instantaneous CW. The residual signal obtained in this way is provided to the LP synthesis filter (280) at a rate of, for example, 160 per frame, and the LP synthesis filter uses it together with the interpolated values of the modified LSF (270) to reconstruct the corrected speech. Because the voice parameters are synthesized without being quantized between the WI analyzer and the WI synthesizer, degradation of the speech can also be reduced.
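The phase track and 2D-to-1D conversion can be pictured with the simplified sketch below. It assumes the phase advances by 1/period cycles per sample and that a single fixed characteristic waveform is read out at that phase with linear interpolation; in the scheme described above the CW also evolves over time, which this sketch leaves out. All function names are hypothetical.

```python
def phase_track(pitch_periods):
    """Cumulative phase in cycles; each sample advances by 1 / period."""
    phase, track = 0.0, []
    for period in pitch_periods:
        track.append(phase)
        phase += 1.0 / period
    return track


def sample_cw(cw, phase):
    """Read one pitch-cycle waveform at a fractional phase (linear interpolation)."""
    pos = (phase % 1.0) * len(cw)
    i = int(pos)
    frac = pos - i
    return (1.0 - frac) * cw[i % len(cw)] + frac * cw[(i + 1) % len(cw)]


def synthesize_residual(cw, pitch_periods):
    """Turn a (fixed) CW and a per-sample pitch period into a 1D residual."""
    return [sample_cw(cw, ph) for ph in phase_track(pitch_periods)]


# Example: a 4-sample cycle repeated with a constant 4-sample pitch period.
print(synthesize_residual([0.0, 1.0, 0.0, -1.0], [4] * 8))
```

A residual built this way would then be passed through the LP synthesis filter; that filter is not shown here.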

The present invention described above may be modified or applied in various ways by those of ordinary skill in the art to which it pertains, and the scope of its technical idea should be determined by the claims below.

100: WI analyzer 110: LP analysis filter
120: LSF interpolation 130: LPC analysis
140: Pitch measurement 150: Pitch interpolation
160: CW extraction 170: Waveform alignment
180: Power calculation 190: Power normalization
200: WI synthesizer 210: Power denormalization
220: Waveform realignment 230: Instantaneous pitch and CW generation
240: Pitch interpolation 250: Phase track measurement
260: 2D-to-1D conversion 270: LSF interpolation
280: LP synthesis filter 300: Parameter modification
S510: Receiving raw voice data
S530: Extracting, from the received raw voice data, voice parameters including pitch, characteristic waveform (CW), power, and line spectral frequency (LSF)
S550: Changing, in the extracted voice parameters, the first features possessed by a plurality of preset atypical voices into the second features possessed by a preset normal voice
S570: Synthesizing corrected voice data based on the voice parameters having the changed second features

Claims

A user-customized voice correction method for converting tone color by parameter modification, in which a user-customized voice correction device performs the steps of:
(1) receiving raw voice data;
(2) extracting, from the received raw voice data, voice parameters including pitch, characteristic waveform (CW), power, and line spectral frequency (LSF);
(3) changing, in the extracted voice parameters, first features possessed by a plurality of preset atypical voices into second features possessed by a preset normal voice; and
(4) synthesizing corrected voice data based on the voice parameters having the changed second features,
wherein, in step (3), the atypical voice includes at least one selected from the group including a hoarse voice, a rough voice, a breathy voice, and a nasal voice, and
wherein, in step (3), the change of the first features into the second features includes reconstructing the spectral envelope by correcting the line spectrum pairs (LSP) of the atypical voice, or separating the characteristic waveform of the atypical voice into a smoothly evolving waveform (SEW) component and a rapidly evolving waveform (REW) component and adjusting the separated SEW and REW components within a preset range.

delete

The method according to claim 1, characterized in that no step of quantizing the raw voice data is included between steps (1) through (4).

The method according to claim 1, wherein step (3) includes (3-1) changing the extracted voice parameters from a preset first tone color range to a preset second tone color range.

A user-customized voice correction device implementing the user-customized voice correction method for converting tone color by parameter modification according to any one of claims 1, 6, and 7.