KR100241345B1

KR100241345B1 - Simplified intonation stylization for KToBI DB construction

Info

Publication number: KR100241345B1
Application number: KR1019970037243A
Authority: KR
Inventors: 이정철; 김상훈
Original assignee: 정선종; 한국전자통신연구원
Priority date: 1997-08-04
Filing date: 1997-08-04
Publication date: 2000-02-01
Also published as: KR19990015260A

Abstract

The present invention relates to signal processing for simplifying the complex and varied intonation patterns that appear in speech: a method of extracting from the intonation signal only the major changes in intonation that affect the naturalness of a synthesizer, and a method of simplifying them into a stepped intonation model. It comprises a step of defining, from an analysis of the major intonation changes, the intonation for each symbol as a stepped intonation model and intonation style according to the intonation pattern notation system, and a signal processing step of simplifying the intonation pattern into a stepped form according to this model. This makes it possible to check the consistency of manually labeled intonation information and makes it easier to classify intonation patterns by hand, greatly reducing the time needed to build a database for intonation modeling, and it makes it easy to extract the intonation pattern classification features that are essential for automating database construction.

Description

Method of simplifying an intonation curve for constructing a KToBI database

The present invention relates to a method of simplifying an intonation curve for constructing a KToBI database.

The intonation pattern and phrase-break information of an uttered sentence are important factors in the naturalness of an unlimited-vocabulary Korean speech synthesizer.

In recent years in particular, many studies have attempted to derive rules for this information from large prosodic databases, which has raised the need to represent intonation pattern information as symbols.

The Korean Tone and Break Indices (hereinafter KToBI) intonation pattern notation system was proposed for this purpose. Prosodic rules for intonation modeling, duration modeling, phrase breaks, and the like are extracted statistically from a prosodic database annotated according to this system.

Conventionally, building such a large prosodic database requires experts who can add prosodic information to a speech database, and even when such experts are available, construction takes a long time.

In particular, manual work frequently introduces errors, and the consistency of the database can vary from expert to expert, so automation of the annotation is badly needed.

To solve the above problems, the present invention provides a method of simplifying the complex and varied intonation patterns found in speech so that they are easier to classify. Its object is to simplify the intonation of a sentence in order to check the consistency between manually labeled symbols and intonation patterns, with the simplified intonation sufficiently reflecting the accentual phrase and the intonational phrase, which are the major intonation changes important to naturalness.

FIG. 1 is a flowchart of the process of simplification into a stepped model according to the present invention.

FIGS. 2a to 2e are waveform diagrams of the intonation curve at each step and of the intonation curve finally modeled in stepped form, as applied to the present invention.

FIG. 3 is a waveform diagram of the original intonation, the stepped intonation, and the labeled intonation symbols according to the present invention.

To achieve the above object, the present invention comprises a step of defining, from an analysis of the major intonation changes, the intonation for each symbol as a stepped intonation model and intonation style according to the intonation pattern notation system, and a signal processing step of simplifying the intonation pattern into a stepped form according to the defined model.

The present invention is described in detail below with reference to the accompanying drawings.

Because the simplified intonation must sufficiently reflect the accentual phrase and the intonational phrase, which are the major intonation changes under study, the intonation undergoes the following signal processing to minimize the pattern complexity caused by excessive intonation variation due to the phonological environment, intonation detection errors, and the like.

FIG. 1 is a flowchart of the process of simplification into a stepped model according to the present invention, showing the signal processing that simplifies the intonation pattern.

The original intonation curve is input (S1), the unvoiced sections of the original curve and the sections whose pitch value is below 70 Hz are replaced with zero (S2), and 3-point median filtering is performed (S3).

The filtering (S3) removes pitch pulses produced by the phonological environment, for example affricate, fricative, and plosive environments, which could otherwise cause large errors when the intonation is simplified.
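Steps S2 and S3 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the implementation of the invention: the function names, the per-frame `voiced` flags, and the choice to leave the two end frames unfiltered are hypothetical. Only the 70 Hz floor and the 3-point window come from the description.

```python
from statistics import median

def zero_unreliable(pitch, voiced, floor_hz=70.0):
    """S2: replace unvoiced frames and frames below floor_hz with zero."""
    return [p if v and p >= floor_hz else 0.0 for p, v in zip(pitch, voiced)]

def median3(values):
    """S3: 3-point median filter; the two end frames are copied unchanged (assumption)."""
    out = list(values)
    for i in range(1, len(values) - 1):
        # Always read from the unfiltered input so results do not feed back.
        out[i] = median(values[i - 1:i + 2])
    return out

# Hypothetical pitch track in Hz: one spurious 260 Hz spike, one 60 Hz frame,
# and one unvoiced frame.
pitch = [120.0, 122.0, 260.0, 124.0, 60.0, 0.0, 118.0, 119.0]
voiced = [True, True, True, True, True, False, True, True]
cleaned = median3(zero_unreliable(pitch, voiced))
print(cleaned)
```

In this toy input the isolated 260 Hz spike is removed by the median filter, while the zeroed frames stay at zero for the next step to bridge.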

After this, a first-order linear equation is formed whose boundary values are the pitch values of the voiced sounds at both ends of an unvoiced section, and pitch is assigned linearly across the unvoiced section using this equation.

This step bridges the pitch discontinuities caused by unvoiced sounds so that the curve becomes continuous (S4).
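A minimal Python sketch of the interpolation in S4 follows. It assumes that unvoiced frames are marked by zeros (as after S2) and that a run of zeros touching either end of the curve is left at zero, because it has a voiced neighbor on one side only. The function name `fill_unvoiced` is hypothetical.

```python
def fill_unvoiced(pitch):
    """S4: linearly interpolate pitch across each interior run of zero frames."""
    out = list(pitch)
    n = len(out)
    i = 0
    while i < n:
        if out[i] != 0.0:
            i += 1
            continue
        start = i                      # first zero frame of the run
        while i < n and out[i] == 0.0:
            i += 1
        end = i                        # first voiced frame after the run, or n
        if start == 0 or end == n:
            continue                   # no voiced frame on one side: leave as zero
        left, right = out[start - 1], out[end]
        span = end - (start - 1)
        for k in range(start, end):
            # Straight line through the two voiced boundary values.
            out[k] = left + (right - left) * (k - (start - 1)) / span
    return out

filled = fill_unvoiced([0.0, 120.0, 0.0, 0.0, 0.0, 160.0, 0.0])
print(filled)
```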

After the pitch is assigned, low-pass filtering is performed so that only the major intonation changes under study, that is, the accentual phrase intonation or the intonational phrase intonation, remain (S5).

The cutoff frequency of the filter was varied stepwise through 200 Hz, 500 Hz, 1,000 Hz, 1,200 Hz, 1,500 Hz, and 2,000 Hz, and 1,200 Hz, at which the major intonation changes appear clearly after filtering, was chosen experimentally.
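The description gives neither the filter design nor the sampling rate of the intonation curve, so the following Python sketch only illustrates the idea of S5. It substitutes a first-order recursive low-pass filter, and the 8,000 Hz sampling rate is a placeholder chosen purely so that the listed cutoff frequencies are meaningful. Both are assumptions and should not be read as the filter actually used.

```python
import math

def lowpass(values, cutoff_hz, sample_rate_hz):
    """First-order recursive low-pass filter (illustrative stand-in for S5)."""
    if not values:
        return []
    rc = 1.0 / (2.0 * math.pi * cutoff_hz)   # filter time constant
    dt = 1.0 / sample_rate_hz                # sample period
    alpha = dt / (rc + dt)                   # smoothing factor, between 0 and 1
    out = []
    prev = values[0]
    for x in values:
        prev = prev + alpha * (x - prev)
        out.append(prev)
    return out

# Candidate cutoffs listed in the description; 8000.0 is a placeholder rate.
candidates_hz = [200.0, 500.0, 1000.0, 1200.0, 1500.0, 2000.0]
step_input = [0.0, 0.0, 10.0, 10.0, 10.0]
for fc in candidates_hz:
    print(fc, lowpass(step_input, fc, 8000.0))
```

A lower cutoff smooths the step more heavily. In practice one would inspect the filtered curves for each candidate, as the description does, and keep the cutoff at which the phrase-level movements remain visible.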

In the low-pass-filtered intonation curve, the peaks and valleys of the intonation are found, and the midpoint between each peak and valley is simplified into a step (S6).
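One possible reading of S6 is sketched below in Python: each local peak or valley keeps its pitch level, and the level switches at the frame midway between neighboring extrema. The function name `stylize`, the treatment of the first and last frames as extrema, and the handling of plateaus (ignored) are assumptions made for illustration.

```python
def stylize(curve):
    """S6 sketch: hold each extremum's level, switching at the midpoint between extrema."""
    n = len(curve)
    if n < 2:
        return list(curve)
    # Interior frames where the slope changes sign are peaks or valleys.
    interior = [i for i in range(1, n - 1)
                if (curve[i] - curve[i - 1]) * (curve[i + 1] - curve[i]) < 0]
    extrema = [0] + interior + [n - 1]       # end frames treated as extrema (assumption)
    out = [0.0] * n
    for a, b in zip(extrema, extrema[1:]):
        mid = (a + b) // 2
        for k in range(a, b + 1):
            # Up to the midpoint keep the left extremum's level, after it the right one's.
            out[k] = curve[a] if k <= mid else curve[b]
    return out

smooth = [100.0, 110.0, 130.0, 150.0, 140.0, 120.0, 100.0, 105.0, 115.0]
stepped = stylize(smooth)
print(stepped)
```

For this toy curve the peak at 150 Hz and the valley at 100 Hz each become a flat step, with the transitions placed halfway between them.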

Then the silent sections are again replaced with zero to produce the final stepped intonation curve (S7).

FIGS. 2a to 2e are waveform diagrams of the intonation curve at each step and of the intonation curve finally modeled in stepped form, as applied to the present invention. They show the output of each step and the final simplification into a stepped intonation.

FIG. 2a shows the intonation curve, 2b the median filtering output, 2c the linearization step, 2d the low-pass filtering result, and 2e the final stepped intonation.

FIG. 3 is a waveform diagram of the original intonation, the stepped intonation, and the labeled intonation symbols according to the present invention, showing the stepped intonation together with the intonation symbols.

By comparing the intonation symbols described above and the stepped intonation patterns given in [Table 1] with the stepped intonation, one can tell whether each symbol has been labeled consistently.

For example, for LH%, the stepped intonation can be used to check whether the intonation pattern at the position of the symbol is High and the preceding intonation pattern is Low.
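The LH% check described above could be sketched in Python as follows. The description does not say how High and Low are decided, so this sketch assumes a purely relative test: the step at the symbol position must be higher than the nearest preceding step with a different non-zero level. The function name `matches_lh` and the sample values are hypothetical.

```python
def matches_lh(stepped, pos):
    """True if the step at pos is higher than the nearest preceding different step."""
    current = stepped[pos]
    if current == 0.0:
        return False                 # zeroed frame: no level to compare
    for k in range(pos - 1, -1, -1):
        previous = stepped[k]
        if previous != 0.0 and previous != current:
            return previous < current
    return False                     # no earlier step to compare against

# Hypothetical stepped intonation curve in Hz.
stepped_curve = [100.0, 100.0, 150.0, 150.0, 150.0, 100.0, 100.0, 100.0, 115.0]
print(matches_lh(stepped_curve, 3))  # symbol placed on the 150 Hz step
print(matches_lh(stepped_curve, 6))  # symbol placed on the 100 Hz step after it
```

A label for which this test fails would be flagged for manual review rather than corrected automatically.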

To evaluate how well the stepped intonation models the intonation patterns in the original intonation curve, test data amounting to about 10% of the total number of KToBI symbols were selected at random and examined.

The evaluation criterion was whether the intonation pattern in the original intonation curve matched the stepped intonation pattern. The evaluation covered the frequently occurring symbols, namely H-, LHa, HL%, LH%, H%, and L%, which accounted for 98.9% of all intonation patterns for the female speaker and 97.2% for the male speaker.

As shown in [Table 2] below, the overall performance of the stepped model is 93.2% for the male speaker and 94.1% for the female speaker, a level reliable enough for analyzing the consistency of manually labeled KToBI symbols across a large amount of data.

In addition, the stepped model intonation shows intonation patterns clearly during automatic KToBI labeling or analysis, so even non-experts can label KToBI easily.

[Table 2] Stepped intonation modeling performance evaluation (correct count / total count (%))
Symbol: H-, LHa, HL%, LH%, H% (+lev), L% (+lev), score
Female: 162/173 (93.6), 114/123 (92.7), 51/55 (92.7), 20/22 (90.9), 33/33 (100.0), 37/33 (100.0), 94.1
Male: 166/176 (94.3), 100/116 (86.2), 77/81 (95.1), 16/16 (100.0), 35/35 (100.0), 93.2
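The per-symbol percentages in the table are simply correct count divided by total count. The short Python sketch below recomputes them for the entries whose column is unambiguous in the text above (H-, LHa, and HL% for both speakers, plus LH% for the female speaker). The helper name `accuracy_percent` and the dictionary layout are assumptions for illustration.

```python
def accuracy_percent(correct, total):
    """Percentage of correctly modeled symbols, rounded to one decimal place."""
    return round(100.0 * correct / total, 1)

# (correct, total) pairs copied from the table.
female = {"H-": (162, 173), "LHa": (114, 123), "HL%": (51, 55), "LH%": (20, 22)}
male = {"H-": (166, 176), "LHa": (100, 116), "HL%": (77, 81)}

female_pct = {sym: accuracy_percent(c, t) for sym, (c, t) in female.items()}
male_pct = {sym: accuracy_percent(c, t) for sym, (c, t) in male.items()}
print(female_pct)
print(male_pct)
```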

As described above, the present invention provides a method of extracting from the intonation signal only the major changes in intonation that affect the naturalness of a synthesizer, and a method of simplifying them into a step function. This makes it possible to check the consistency of manually labeled intonation information and makes it easier to classify intonation patterns by hand, greatly reducing the time needed to build a database for intonation modeling, and it makes it easy to extract the intonation pattern classification features that are essential for automating database construction.

Claims

A method of simplifying an intonation curve for constructing a KToBI database, as signal processing for simplifying the complex and varied intonation patterns of speech, the method comprising:

a first process of defining, from an analysis of the major intonation changes, the intonation for each symbol as a stepped intonation model and intonation style according to the intonation pattern notation system; and

a second process of signal processing that simplifies the intonation pattern into a stepped form according to the defined model.

The method of claim 1, wherein the first process comprises:

a first step of inputting an original intonation curve;

a second step of replacing with zero the unvoiced sections of the input original intonation curve and the sections whose pitch value is below a predetermined frequency, and performing three-point median filtering;

a third step of forming, after the filtering, a first-order linear equation whose boundary values are the pitch values of the voiced sounds at both ends of an unvoiced section; and

a fourth step of linearly assigning pitch to the unvoiced section using the linear equation.

The method of claim 1, wherein the second process comprises:

a first step of performing low-pass filtering after the pitch is assigned, so that only the accentual phrase intonation or intonational phrase intonation under study remains (S5);

a second step of finding the peaks and valleys of the intonation in the low-pass-filtered intonation curve and simplifying the midpoint between each peak and valley into a step (S6); and

a third step of, after the simplification, again replacing the silent sections with zero to produce the final stepped intonation curve (S7).