KR920005509B1

KR920005509B1 - Natural sound synthesizer by adding noise

Info

Publication number: KR920005509B1
Application number: KR1019890015829A
Authority: KR
Inventors: 이윤근
Original assignee: 주식회사 금성사; 이헌조
Priority date: 1989-10-31
Filing date: 1989-10-31
Publication date: 1992-07-06
Also published as: KR910008647A

Abstract

The natural sound synthesizer adds noise to an impulse excitation to produce a more natural voice. The synthesizer includes an interface unit (9) that passes Hangul character data entered from a keyboard to a digital signal processor (DSP) (4), a voice data ROM (7), a program ROM (8), an address decoder (5) that selects the voice data ROM (7) or the program ROM (8) by decoding a data selection signal (DS) and a program selection signal (PS), a buffer (6) that buffers the address signals (A0-A15) sent from the DSP to designate addresses in the ROMs (7, 8), and a digital-to-analog converter (3) that converts the digital sound signal from the DSP into an analog signal.

Description

Natural Sound Synthesizer by Noise Addition

FIG. 1 is a circuit diagram of the natural sound synthesizer of the present invention.

FIG. 2 is a signal flow diagram of the present invention.

* Explanation of reference numerals for the main parts of the drawings

1: speaker 2: amplifier

3: digital/analog converter 4: digital signal processor

5: address decoder 6: buffer

7: voice data ROM 8: program ROM

9: interface unit 10: personal computer

11: keyboard 12: monitor

The present invention relates to speech synthesis, and in particular to a natural sound synthesizer by noise addition. For voiced sounds, a noise component analyzed from the residual signal, which consists of an impulse component and a noise component, is added to the impulse so that the excitation resembles the sound source of real speech and the generated speech sounds more natural.

In a conventional natural sound synthesizer that synthesizes speech by linear predictive coding (LPC), unvoiced sounds are generated from white noise and voiced sounds from impulses. That is, the voiced excitation generated by impulses and the unvoiced excitation generated by a random noise generator are selected by a voiced/unvoiced selection switch, and, in the voiced case, the excitation is filtered according to the linear predictive coding coefficients, so that a synthesized sound resembling the human voice is output through the speaker. However, because the voiced sound is generated from impulses alone, the output sounds mechanical.
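The conventional voiced/unvoiced switch described above can be sketched as follows. This is a minimal illustration, not the circuit of any particular prior-art device: the function name `conventional_excitation`, the default pitch period of 40 samples, and the use of Gaussian samples as the white noise source are all assumptions made for the example.

```python
import random

def conventional_excitation(n_samples, voiced, pitch_period=40, seed=0):
    """Return one frame of excitation for a conventional LPC synthesizer.

    Voiced frames get a unit impulse every `pitch_period` samples and
    nothing in between; unvoiced frames get white (Gaussian) noise.
    All parameter values here are illustrative assumptions.
    """
    if voiced:
        # Pure impulse train: this is the source of the "mechanical" sound.
        return [1.0 if n % pitch_period == 0 else 0.0 for n in range(n_samples)]
    rng = random.Random(seed)
    return [rng.gauss(0.0, 1.0) for _ in range(n_samples)]
```

The voiced branch contains no noise at all between pulses, which is the limitation the invention addresses.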

To remedy this problem, the present invention places a voice data ROM, a program ROM, an address decoder, an interface unit, and a digital/analog converter around a digital signal processor, synthesizes speech for Hangul character input from pre-stored voice data, and outputs it through the digital/analog converter and an amplifier. The invention is described in detail below with reference to the accompanying drawings.

FIG. 1 is a circuit diagram of the natural sound synthesizer of the present invention. As shown, Hangul characters selected on the keyboard (11) are input to the personal computer (10), and the synthesizer comprises: an interface unit (9) that interfaces the key signals of the personal computer (10); a digital signal processor (4) that, according to the data input from the interface unit (9), executes the program stored in the program ROM (8), reads the corresponding data from the voice data ROM (7) over the data lines (D₀-D₁₅), analyzes and synthesizes it, and outputs the result back onto the data lines (D₀-D₁₅); a digital/analog converter (3) that receives the data synthesized by the digital signal processor (4) over the data lines (D₀-D₁₅), converts it into an analog signal, and outputs it to the speaker (1) through the amplifier (2); an address decoder (5) that decodes the data selection signal (DS) and the program selection signal (PS) of the digital signal processor (4) to select the voice data ROM (7) or the program ROM (8); and a buffer (6) that buffers and amplifies the address signals (A₀-A₁₅) of the digital signal processor (4) to designate addresses in the voice data ROM (7) and the program ROM (8). Reference numeral 12, not otherwise described, denotes a monitor.

The operation and effects of the present invention configured as described above are as follows.

A speech signal is a sound source signal that has been modified on its way through the vocal tract. The modification is determined by the shape of the vocal tract, and the coefficients that represent it are the linear predictive coding coefficients. The sound source signal itself consists of an impulse component and a noise component, and corresponds to the residual signal.

That is, passing the sound source signal through a filter built from the linear prediction coefficients that represent the vocal tract characteristics produces the speech signal; conversely, inverse filtering the speech signal yields the residual signal.
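The filter and inverse-filter relationship can be illustrated with a short round trip. This is a sketch under an assumed sign convention, s[n] = e[n] + Σ a[k]·s[n-1-k]; the patent does not state its convention, and the function names and coefficient values are hypothetical.

```python
def synthesize(excitation, lpc):
    """All-pole synthesis filter: s[n] = e[n] + sum_k lpc[k] * s[n-1-k]."""
    out = []
    for n, e in enumerate(excitation):
        acc = e
        for k, a in enumerate(lpc):
            if n - 1 - k >= 0:
                acc += a * out[n - 1 - k]
        out.append(acc)
    return out

def inverse_filter(speech, lpc):
    """Inverse (FIR) filter: e[n] = s[n] - sum_k lpc[k] * s[n-1-k]."""
    residual = []
    for n, s in enumerate(speech):
        predicted = 0.0
        for k, a in enumerate(lpc):
            if n - 1 - k >= 0:
                predicted += a * speech[n - 1 - k]
        residual.append(s - predicted)
    return residual

# Round trip: inverse filtering the synthesized signal recovers the excitation.
excitation = [1.0, 0.0, 0.0, 0.5, 0.0, 0.0, -0.25, 0.0]
lpc = [0.5, -0.25]  # illustrative, stable coefficients
speech = synthesize(excitation, lpc)
recovered = inverse_filter(speech, lpc)
```

With real speech the coefficients are estimated rather than known exactly, so the residual is only an approximation of the true sound source.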

Generating sound from impulses alone, ignoring the noise, therefore produces a mechanical sound of poor quality. This system instead analyzes the residual signal to obtain the energy of the noise component and, during synthesis, mixes noise of the appropriate energy into the impulse train, producing a synthesized sound that is closer to real speech.

On this basis, the invention is described below with reference to its circuit diagram and to FIG. 2, the signal flow diagram.

The hardware of this system is built around the digital signal processor (4). The program shown in FIG. 2 is stored in the program ROM (8), and the digital signal processor (4) executes it sequentially.

The voice data ROM (7) stores data on the linear predictive coding (LPC) coefficients, pitch, and energy of each phoneme, and the program ROM (8) stores the synthesis algorithm. When Hangul character data entered on the keyboard (11) arrives on the data lines (D₀-D₁₅) of the digital signal processor (4) via the personal computer (10) and the interface unit (9), the digital signal processor (4) converts the spelling into its pronounced form according to phonological conversion rules and divides it into voiced and unvoiced sounds.

The digital signal processor (4) then outputs the program selection signal (PS). The address decoder (5) decodes PS and selects the program ROM (8), and at the same time the address signals (A₀-A₁₅) designate the address (A) of the program ROM (8) through the buffer (6), so that the corresponding data is read over the data lines (D₀-D₁₅). Next, the digital signal processor (4) outputs the data selection signal (DS). The address decoder (5) decodes DS and selects the voice data ROM (7), and at the same time the address signals (A₀-A₁₅) designate the address (A) of the voice data ROM (7) through the buffer (6).

The voice data stored at that address is fetched, the desired speech is synthesized, and the result is output over the data lines (D₀-D₁₅). The synthesized speech signal is converted into an analog signal by the digital/analog converter (3), amplified by the amplifier (2), and output to the speaker (1).
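The ROM selection described above can be modeled in a few lines. This is only a behavioral sketch: the class name `RomBank`, the boolean form of PS and DS, and the rule that exactly one of them is asserted per read are assumptions, since the text does not give signal polarities or timing.

```python
class RomBank:
    """Toy model of the program ROM (8) and voice data ROM (7) behind
    the address decoder (5). ROM contents are plain lists of 16-bit words."""

    def __init__(self, program_rom, voice_rom):
        self.program_rom = program_rom
        self.voice_rom = voice_rom

    def read(self, address, ps=False, ds=False):
        """Return the word at `address` from the ROM selected by PS or DS."""
        if ps == ds:
            # Assumed rule: exactly one select signal is active per access.
            raise ValueError("exactly one of PS or DS must be asserted")
        if not 0 <= address <= 0xFFFF:
            raise ValueError("address must fit in A0-A15")
        rom = self.program_rom if ps else self.voice_rom
        return rom[address] & 0xFFFF  # data lines D0-D15 carry 16 bits

bank = RomBank(program_rom=[0x1234, 0x5678], voice_rom=[0x00AA, 0x00BB])
instruction = bank.read(1, ps=True)   # fetch from the program ROM
voice_word = bank.read(0, ds=True)    # fetch from the voice data ROM
```

Both ROMs share the same address and data lines, which is why a separate select signal is needed for each access.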

Here, the digital signal processor (4) outputs a frequency pulse signal (XF), which is applied to the digital/analog converter (3); the digital/analog converter (3) generates an interrupt signal (INT), which is applied to the digital signal processor (4); and the digital signal processor (4) generates an interrupt acknowledge signal (IACK), which is applied to the digital/analog converter (3).

Consider the sound source of the voice data. Let AV be the proportion of impulse in the sound source and AN the proportion of noise, so that AV + AN = 1. Adding noise corresponding to AN even for voiced sounds gives a more natural sound than conventional impulse-only synthesis. AV and AN are determined as follows. Inverse filtering the speech signal with the linear predictive coding coefficients yields the residual signal. For voiced sounds, the sound source is not a pure impulse train but one with noise added. Once the residual signal is obtained, the energy ratio of impulse to noise within one frame can therefore be calculated.

(Here, EV is the energy of the impulses, and En is the energy of the noise between one impulse and the next.)
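The equation that this parenthetical refers to is not reproduced in the text. One reading that is consistent with AV + AN = 1 and with the stated per-frame energy ratio, offered here only as an assumption, normalizes each energy by the frame total (E_V and E_N stand for EV and En):

```latex
\[
AV = \frac{E_V}{E_V + E_N}, \qquad
AN = \frac{E_N}{E_V + E_N}, \qquad
AV + AN = \frac{E_V + E_N}{E_V + E_N} = 1 .
\]
```

The text does not settle whether the proportions are meant in terms of energy, as assumed here, or amplitude.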

This value is stored in the voice data ROM (7). Mixing noise in this ratio to build the sound source during synthesis gives a more natural synthesized sound.
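A minimal sketch of building such a mixed sound source for one voiced frame follows. It assumes that AV and AN are energy fractions, that the noise occupies only the samples between pulses, and that Gaussian samples stand in for the noise source; the function names, the frame length of 160 samples, and the pitch period of 40 samples are illustrative and do not come from the patent.

```python
import math
import random

def mixed_excitation(n_samples, pitch_period, av, energy=1.0, seed=0):
    """Build one voiced frame: an impulse train carrying fraction `av` of
    the frame energy, plus noise between the pulses carrying an = 1 - av."""
    an = 1.0 - av
    rng = random.Random(seed)
    pulse_positions = range(0, n_samples, pitch_period)
    # Gaussian noise everywhere except at the pulse positions.
    frame = [0.0 if n % pitch_period == 0 else rng.gauss(0.0, 1.0)
             for n in range(n_samples)]
    noise_energy = sum(x * x for x in frame)
    # Rescale the noise so that it carries exactly an * energy.
    scale = math.sqrt(an * energy / noise_energy) if noise_energy > 0 else 0.0
    frame = [x * scale for x in frame]
    # Equal-amplitude pulses that together carry av * energy.
    pulse_amp = math.sqrt(av * energy / len(pulse_positions))
    for n in pulse_positions:
        frame[n] = pulse_amp
    return frame

def energy_split(frame, pitch_period):
    """Estimate (AV, AN) from a frame whose pulses sit on the pitch grid."""
    ev = sum(x * x for n, x in enumerate(frame) if n % pitch_period == 0)
    en = sum(x * x for n, x in enumerate(frame) if n % pitch_period != 0)
    total = ev + en
    return ev / total, en / total

frame = mixed_excitation(n_samples=160, pitch_period=40, av=0.8)
av_est, an_est = energy_split(frame, pitch_period=40)
```

Feeding `frame` to the LPC synthesis filter in place of a pure impulse train is the step that the text credits with the more natural sound. A real residual would need pitch-pulse detection before its energy could be split this way.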

In this way, for voiced sounds, the invention adds to the impulse the amount of noise determined by analyzing the residual signal, so that the excitation resembles the sound source of real speech. This gives the advantage of a more natural synthesized sound.

Claims

A natural sound synthesizer by noise addition, comprising: an interface unit (9) that interfaces Hangul character data, produced by the personal computer according to the key signals of the keyboard (11), to a digital signal processor (4); a voice data ROM (7) in which voice data is stored and a program ROM (8) in which a program is stored; an address decoder (5) that decodes the data selection signal (DS) and the program selection signal (PS) of the digital signal processor (4) to select the voice data ROM (7) or the program ROM (8); a buffer (6) that buffers and amplifies the address signals (A₀-A₁₅) of the digital signal processor (4) to designate addresses in the voice data ROM (7) and the program ROM (8); and a digital/analog converter (3) that converts the output data of the digital signal processor (4) into an analog signal and outputs it to the speaker (1) through the amplifier (2); wherein the voice data ROM (7) stores energy, pitch, and linear predictive coding coefficients, and wherein, when Hangul character data is output from the interface unit (9) and input to the digital signal processor (4), the digital signal processor (4) reads the data stored in the program ROM (8), reads the corresponding data stored in the voice data ROM (7) according to that data, adds noise to the impulse according to the impulse-to-noise proportion obtained by residual signal analysis during speech analysis, synthesizes the speech, and outputs the synthesized data to the digital/analog converter (3).