KR100994340B1

KR100994340B1 - Music contents production device using tts

Info

Publication number: KR100994340B1
Application number: KR1020100029544A
Authority: KR
Inventors: 염종학; 강원모
Original assignee: (주)티젠스
Priority date: 2010-03-31
Filing date: 2010-03-31
Publication date: 2010-11-12

Abstract

PURPOSE: A music content production device using text-to-speech (TTS) is provided so that anyone can easily create music content for personal use or for sale. CONSTITUTION: The device comprises a music reproduction information acquisition unit (110), a syntax analysis unit (120), a pronunciation conversion unit (130), an optimal phoneme selection unit (140), a sound source selection unit (150), a prosody control unit (160), a voice conversion unit (170), a tone conversion unit (180), and a song and background music synthesis unit (190). The music reproduction information acquisition unit acquires the lyrics, singer, track, scale, note length, beat, tempo, and music effects for music reproduction. The syntax analysis unit analyzes the sentences of the acquired lyrics and converts them into a form defined according to linguistic characteristics. The pronunciation conversion unit converts the analyzed data into phonemes. The optimal phoneme selection unit selects the optimal phonemes for the lyrics processed by the syntax analysis unit and the pronunciation conversion unit. The sound source selection unit selects the sound sources corresponding to the selected phonemes, based on the singer information acquired by the music reproduction information acquisition unit. The prosody control unit controls the length and pitch of the optimal phonemes when they are joined, according to the sentence characteristics of the lyrics. The voice conversion unit matches the lyric sentences so that the voice is output according to the acquired scale, note length, beat, and tempo. The tone conversion unit matches a tone to the voice according to the acquired music effects. The song and background music synthesis unit combines the background music information acquired by the music reproduction information acquisition unit with the tone finally converted by the tone conversion unit.

Description

Music contents production device using TTS.

The present invention relates to an apparatus for producing music content using text-to-speech (TTS) synthesis. More particularly, it relates to an apparatus that applies TTS to song lyrics in order to output a song synthesized according to arbitrary lyrics, scale, and note length, or to combine the song corresponding to the lyrics with background music.

Conventional speech synthesis technology simply outputs input text as conversational speech, and its use has been limited to simple information delivery such as ARS (automatic response service), voice guidance, and navigation voice guidance.

Accordingly, there is demand for a text-to-speech synthesis technology that goes beyond simple information delivery and, by reproducing the full range of the human voice, can be applied to various services such as singing, composition, voice acting for dramas, and intelligent robots.

The present invention has therefore been proposed in view of the above problems of the prior art. An object of the invention is to provide an apparatus that, when arbitrary lyrics and a scale are input, can utter a voice whose prosody follows the scale for the corresponding note length, so that a quality comparable to an ordinary person singing can be achieved.

Another object of the present invention is to provide a speech synthesis method for music that can process the elements needed for music, such as arbitrary lyrics, scale, note length, music effects, background music settings, and beat/tempo, into digital content, and that analyzes the lyric text according to the characteristics of each language in order to synthesize the lyrics into a voice and render various musical effects.

To achieve the object of the present invention,

an apparatus for producing music content using text-to-speech synthesis according to an embodiment of the present invention comprises:

a music reproduction information acquisition unit that acquires the lyrics, singer, track, scale, note length, beat, tempo, and music effects input for music reproduction;

a syntax analysis unit that analyzes the sentences of the lyrics acquired by the music reproduction information acquisition unit and converts them into a form defined according to linguistic characteristics;

a pronunciation conversion unit that converts the data analyzed by the syntax analysis unit into phoneme-based form;

an optimal phoneme selection unit that selects, according to predefined rules, the optimal phonemes for the lyrics analyzed by the syntax analysis unit and the pronunciation conversion unit;

a sound source selection unit that takes the singer information acquired by the music reproduction information acquisition unit and selects, from a sound source database, that singer's sound sources corresponding to the phonemes selected by the optimal phoneme selection unit;

a prosody control unit that takes the optimal phonemes selected by the optimal phoneme selection unit and, according to the sentence characteristics of the lyrics, controls length and pitch when the phonemes are concatenated and synthesized;

a voice conversion unit that takes the lyric sentences synthesized by the prosody control unit and matches them so that they are reproduced according to the scale, note length, beat, and tempo acquired by the music reproduction information acquisition unit; and

a tone conversion unit that takes the voice converted by the voice conversion unit and matches a tone to it so that it is reproduced according to the music effects acquired by the music reproduction information acquisition unit,

whereby the object of the present invention is achieved.

With the configuration and operation described above, the apparatus for producing music content using text-to-speech synthesis according to the present invention allows anyone to produce music content easily, so that content created by individuals can be distributed online and offline. The content can be used for value-added music services on mobile phones, such as ringtones and ring-back tones (RBT), and for music playback and voice guidance on various kinds of portable devices. ARS (automatic response systems) and navigation (map guidance) devices can provide voice guidance with human-like intonation, and artificial-intelligence robots can be made to speak and sing with human-like intonation.

In addition, in the production of drama or animation content, the apparatus can express natural human intonation well enough to stand in for a voice actor.

FIG. 1 is a block diagram of an apparatus for producing music content using text-to-speech synthesis according to an embodiment of the present invention.
FIG. 2 is a block diagram of an apparatus for producing music content using text-to-speech synthesis according to another embodiment of the present invention.
FIG. 3 is a block diagram of the music reproduction information acquisition unit of the apparatus according to an embodiment of the present invention.
FIG. 4 is a screen of a music content production program using text-to-speech synthesis according to an embodiment of the present invention.

To achieve the above object, the apparatus for producing music content using text-to-speech synthesis according to the present invention is,

in an apparatus for producing music content using text-to-speech synthesis,

characterized by comprising a tone conversion unit that takes the voice converted by the voice conversion unit and matches a tone to it so that it is reproduced according to the music effects acquired by the music reproduction information acquisition unit.

Here, the music reproduction information acquisition unit is characterized by comprising:

a lyrics information acquisition unit that acquires lyrics information;

a vocal effect acquisition unit that acquires the vocal effect information adjusted by the user; and

a singer information acquisition unit that acquires singer information.

Meanwhile, an apparatus for producing music content using text-to-speech synthesis according to another embodiment of the present invention is characterized by comprising:

a tone conversion unit that takes the voice converted by the voice conversion unit and matches a tone to it so that it is reproduced according to the music effects acquired by the music reproduction information acquisition unit; and

a song and background music synthesis unit that combines the background music information acquired by the music reproduction information acquisition unit with the tone finally converted by the tone conversion unit;

a background music information acquisition unit that acquires information on the background music sound source selected from among the background music sound sources stored in the sound source database.

Further, according to an additional aspect of either embodiment, the apparatus is characterized by further comprising a piano key position acquisition unit that acquires the position of the piano key selected by the user on a virtual piano instrument displayed on the screen.

Hereinafter, the apparatus for producing music content using text-to-speech synthesis according to the present invention is described in detail by way of embodiments.

FIG. 1 is a block diagram of an apparatus for producing music content using text-to-speech synthesis according to an embodiment of the present invention.

As shown in FIG. 1, the apparatus for producing music content using text-to-speech synthesis according to an embodiment of the present invention is characterized by comprising:

a music reproduction information acquisition unit (110) that acquires the lyrics, singer, track, scale, note length, beat, tempo, and music effects input for music reproduction;

a syntax analysis unit (120) that analyzes the sentences of the lyrics acquired by the music reproduction information acquisition unit and converts them into a form defined according to linguistic characteristics;

a pronunciation conversion unit (130) that converts the data analyzed by the syntax analysis unit into phoneme-based form;

an optimal phoneme selection unit (140) that selects, according to predefined rules, the optimal phonemes for the lyrics analyzed by the syntax analysis unit and the pronunciation conversion unit;

a sound source selection unit (150) that takes the singer information acquired by the music reproduction information acquisition unit and selects, from a sound source database, that singer's sound sources corresponding to the phonemes selected by the optimal phoneme selection unit;

a prosody control unit (160) that takes the optimal phonemes selected by the optimal phoneme selection unit and, according to the sentence characteristics of the lyrics, controls length and pitch when the phonemes are concatenated and synthesized;

a voice conversion unit (170) that takes the lyric sentences synthesized by the prosody control unit and matches them so that they are reproduced according to the scale, note length, beat, and tempo acquired by the music reproduction information acquisition unit; and

a tone conversion unit (180) that takes the voice converted by the voice conversion unit and matches a tone to it so that it is reproduced according to the music effects acquired by the music reproduction information acquisition unit.

The music reproduction information acquisition unit (110) acquires the lyrics, singer, track, scale, note length, beat, tempo, and music effects input for music reproduction.

That is, a music content production program such as the one shown in FIG. 4 is installed on the apparatus of the present invention and displayed on the screen, so that an operator can carry out music content work using text-to-speech synthesis.

Information such as lyrics, singer, track, scale, note length, beat, tempo, and music effects is stored and managed in a music information database (200). The music reproduction information acquisition unit refers to the information the user has selected as needed for playback and retrieves the corresponding entries from the music information database.

The production program is displayed on the user's monitor so that the user can choose among the operation modes needed to produce music content. When the user, looking at this screen, selects the lyrics, singer, track, scale, note length, beat, tempo, music effects, singing style, and so on, the music reproduction information acquisition unit (110) acquires the selected information.

The syntax analysis unit (120) then analyzes the lyric sentences acquired by the music reproduction information acquisition unit and converts them into a form defined according to linguistic characteristics.

Here, "linguistic characteristics" means that, in Korean for example, a phrase is made up of elements such as subject, object, verb, particle, and adverb, arranged in a particular order. Other languages, such as English and Japanese, have characteristics of this kind as well.

The "defined form" means division into the morphemes of the language, a morpheme being the smallest unit of a language that carries meaning.

For example, the phrase '동해물과 백두산이' is divided into the morphemes '동해물' + '과' + '백두산' + '이'.

After the division into morphemes, the sentence components are analyzed; that is, each morpheme is classified as a noun, particle, adverb, adjective, verb, and so on. For example: '동해물' = noun, '과' = particle, '백두산' = noun, '이' = particle.

In other words, if the selected lyrics are in Korean, they are converted into the form defined for the characteristics of Korean.
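As a rough illustration of the morpheme division and sentence-component analysis described above, the following sketch uses a greedy longest-match lookup. The lexicon, the tag names, and the function `segment` are assumptions made for this example only; the source does not state how the syntax analysis unit is implemented.

```python
# Hypothetical toy lexicon: morpheme -> sentence component.
# A real analyzer would need a full dictionary and grammar rules.
LEXICON = {
    "동해물": "noun",
    "과": "particle",
    "백두산": "noun",
    "이": "particle",
}


def segment(phrase):
    """Split each space-separated word into known morphemes, longest match first."""
    result = []
    for word in phrase.split():
        start = 0
        while start < len(word):
            # Try the longest remaining substring first, then shorter ones.
            for end in range(len(word), start, -1):
                piece = word[start:end]
                if piece in LEXICON:
                    result.append((piece, LEXICON[piece]))
                    start = end
                    break
            else:
                raise ValueError(f"no lexicon entry matches {word[start:]!r}")
    return result


for morpheme, component in segment("동해물과 백두산이"):
    print(morpheme, "=", component)
```

Greedy matching is chosen here only because it is short; it can mis-segment ambiguous words, which is one reason the patent speaks of rules defined per language.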

The pronunciation conversion unit (130) receives the data analyzed by the syntax analysis unit and converts it into phoneme-based form. The optimal phoneme selection unit (140) then selects, according to predefined rules, the optimal phonemes for the lyrics analyzed by the syntax analysis unit and the pronunciation conversion unit.

The pronunciation conversion unit performs the phoneme-based conversion by turning the parsed sentence into its pronounced form according to the reading rules of Hangul.

For example, '동해물과 백두산이' is rendered as '동해물가 백뚜사니', and when this is split on a phoneme basis, '동해물과' is converted into '도 + 옹 + O해 + 우무 + 물 + 울가'.

When the analyzed lyric is '동해물', for instance, the optimal phonemes are '도', '옹', 'O해', '애무', '물', '울가', and so on, and the optimal phoneme selection unit (140) selects these.

The sound source selection unit (150) takes the singer information acquired by the music reproduction information acquisition unit and selects, from the sound source database, that singer's sound sources corresponding to the phonemes selected by the optimal phoneme selection unit.

That is, if Girls' Generation is chosen as the singer, the sound sources corresponding to Girls' Generation are selected from the sound source DB.

Track information can be provided in addition to singer information, so if the user has selected a track as well as a singer, the corresponding track information can also be supplied.

The prosody control unit (160) takes the optimal phonemes selected by the optimal phoneme selection unit and, according to the sentence characteristics of the lyrics, controls length and pitch when the phonemes are concatenated and synthesized.

The sentence characteristics are the rules applied when a sentence is converted to its pronunciation, such as liaison (연음법칙) and palatalization; that is, language rules under which the written symbols and the pronounced symbols differ.
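The liaison rule mentioned above can be sketched with the standard Unicode arithmetic for precomposed Hangul syllables: when a syllable ends in a simple final consonant and the next syllable begins with the silent initial 'ㅇ', the final consonant moves over to become the next syllable's initial. The function names and the restriction to simple finals are assumptions for this example; tensification (for instance '두' becoming '뚜'), palatalization, and compound finals are not handled.

```python
HANGUL_BASE = 0xAC00   # first precomposed syllable, '가'
HANGUL_LAST = 0xD7A3   # last precomposed syllable, '힣'
SILENT_LEAD = 11       # index of the placeholder initial 'ㅇ'

# Final-consonant index -> initial-consonant index (simple consonants only).
TAIL_TO_LEAD = {1: 0, 2: 1, 4: 2, 7: 3, 8: 5, 16: 6, 17: 7,
                19: 9, 20: 10, 22: 12, 23: 14, 24: 15, 25: 16, 26: 17}


def is_syllable(ch):
    return HANGUL_BASE <= ord(ch) <= HANGUL_LAST


def split(ch):
    """Return (initial, vowel, final) indices of a precomposed syllable."""
    index = ord(ch) - HANGUL_BASE
    return index // 588, (index % 588) // 28, index % 28


def join(lead, vowel, tail):
    return chr(HANGUL_BASE + (lead * 21 + vowel) * 28 + tail)


def apply_liaison(text):
    """Move a simple final consonant onto a following syllable that starts with 'ㅇ'."""
    chars = list(text)
    for i in range(len(chars) - 1):
        cur, nxt = chars[i], chars[i + 1]
        if not (is_syllable(cur) and is_syllable(nxt)):
            continue
        lead, vowel, tail = split(cur)
        next_lead, next_vowel, next_tail = split(nxt)
        if next_lead == SILENT_LEAD and tail in TAIL_TO_LEAD:
            chars[i] = join(lead, vowel, 0)
            chars[i + 1] = join(TAIL_TO_LEAD[tail], next_vowel, next_tail)
    return "".join(chars)


print(apply_liaison("백두산이"))
```

Only the '산이' to '사니' step of the patent's example is covered by this rule; the remaining changes in '백뚜사니' would need further rules.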

Length means the note length of a lyric, for example 1, 2, or 3 beats, and pitch means the scale of the lyric, that is, the note height defined in music as 'do re mi fa sol la si do'.

In other words, the unit controls length and pitch when phonemes are concatenated and synthesized, so that the result sounds natural for the characteristics of the sentence.

The voice conversion unit (170) takes the lyric sentences synthesized by the prosody control unit and matches them so that they are reproduced according to the scale, note length, beat, and tempo acquired by the music reproduction information acquisition unit.

That is, it converts the sound source for a lyric into a voice according to scale, note length, beat, and tempo. For example, the sound source for '동' is played at the scale (pitch) 'sol', with a note length of one beat, in 4/4 time, at a tempo of 120.
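To make the '동' example concrete, the sketch below turns "sol, one beat, tempo 120" into a frequency and a run of samples. Several things here are assumptions that the source does not state: 'do' is taken to be C in octave 4, tuning is twelve-tone equal temperament with A4 = 440 Hz, the tempo counts beats per minute, and a plain sine wave stands in for the singer's recorded sound source. The 4/4 beat affects how notes group into bars, not the length of a single note, so it does not appear in the code.

```python
import math

# Semitone offsets of the solfege names above C (fixed-do assumption).
SOLFEGE_SEMITONES = {"do": 0, "re": 2, "mi": 4, "fa": 5, "sol": 7, "la": 9, "si": 11}


def solfege_to_hz(name, octave=4, a4_hz=440.0):
    """Equal-tempered frequency of a solfege name in the given octave."""
    midi_note = 12 * (octave + 1) + SOLFEGE_SEMITONES[name]
    return a4_hz * 2.0 ** ((midi_note - 69) / 12.0)


def render_note(name, beats, tempo, sample_rate=8000):
    """Return sine samples for one note; a stand-in for pitch-shifting a recorded source."""
    seconds = beats * 60.0 / tempo
    count = int(round(seconds * sample_rate))
    hz = solfege_to_hz(name)
    return [math.sin(2.0 * math.pi * hz * n / sample_rate) for n in range(count)]


samples = render_note("sol", beats=1, tempo=120)
print(round(solfege_to_hz("sol"), 2), "Hz,", len(samples), "samples")
```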

The scale (pitch) is the height of a note. So that the user can specify it easily, the present invention provides a virtual piano instrument function.

The note length is the duration of a note. Notes are presented as in a musical score so that note lengths are easy to edit.

The notes provided by default are the whole note (1), half note (1/2), quarter note (1/4), eighth note (1/8), sixteenth note (1/16), thirty-second note (1/32), and sixty-fourth note (1/64).

The beat is the unit of meter in music; examples are 1/2, 1/4, and 1/8.

The denominator can be 1, 2, 4, 8, 16, 32, or 64, and the numerator can be any number from 1 to 256.

The tempo is the speed at which a piece of music proceeds. It is normally given as a number from 20 to 300; a smaller number means a slower speed and a larger number a faster one.

The tempo for the length of one beat is normally set to 120.
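The note values and ranges listed above can be collected into a small table with range checks, as in the following sketch. The names are hypothetical, and the conversion to seconds assumes that the tempo counts quarter notes per minute, which the source does not specify.

```python
from fractions import Fraction

# Note values offered by default, as fractions of a whole note.
NOTE_VALUES = {
    "whole": Fraction(1, 1),
    "half": Fraction(1, 2),
    "quarter": Fraction(1, 4),
    "eighth": Fraction(1, 8),
    "sixteenth": Fraction(1, 16),
    "thirty-second": Fraction(1, 32),
    "sixty-fourth": Fraction(1, 64),
}
BEAT_DENOMINATORS = (1, 2, 4, 8, 16, 32, 64)


def is_valid_beat(numerator, denominator):
    """Numerator 1..256, denominator one of the listed powers of two."""
    return 1 <= numerator <= 256 and denominator in BEAT_DENOMINATORS


def is_valid_tempo(tempo):
    return 20 <= tempo <= 300


def note_seconds(note, tempo=120, beat_unit=Fraction(1, 4)):
    """Duration of a note in seconds, assuming `tempo` counts `beat_unit` notes per minute."""
    if not is_valid_tempo(tempo):
        raise ValueError("tempo must be between 20 and 300")
    beats = NOTE_VALUES[note] / beat_unit
    return float(beats) * 60.0 / tempo


for name in NOTE_VALUES:
    print(name, note_seconds(name))
```

Using `Fraction` keeps the note values exact, so sums of note lengths within a bar can be compared against the beat without rounding error.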

The tone conversion unit (180) takes the voice converted by the voice conversion unit and matches a tone to it so that it is reproduced according to the music effects (vocal effects) or singing style acquired by the music reproduction information acquisition unit.

For example, music effects such as vibrato and attack are applied to the sound source '동' to change its tone.

The music effects and singing styles are provided to maximize the musical effect. The music effects support natural human vocalization and convert the tone as follows.

As shown in FIG. 4, the production program offers the user VEL (Velocity), DYN (Dynamics), BRE (Breathiness), BRI (Brightness), CLE (Clearness), OPE (Opening), GEN (Gender Factor), POR (Portamento Timing), PIT (Pitch Bend), PBS (Pitch Bend Sensitivity), VIB (Vibration), and so on.

VEL (Velocity) is the attack: raising the value shortens the consonants, which gives a stronger sense of attack. DYN (Dynamics) is the loudness: it controls the singer's dynamics (the volume and softness of the sound).

With BRE (Breathiness), a higher value adds more breath. BRI (Brightness) raises or lowers the high-frequency components of the sound: a high value gives a bright sound, and a low value a somber, mellow one.

CLE (Clearness) is similar to BRI but works on a different principle. A high value gives a sharp, clear sound, and a low value a low, heavy one.

OPE (Opening) simulates how the tone changes with how far the mouth is opened: a high value gives a clear sound, and a low value a muffled one.

GEN (Gender Factor) changes the singer's character over a wide range: a high value gives a masculine feel, and a low value a feminine one.

POR (Portamento Timing) adjusts the point at which the pitch changes, PIT (Pitch Bend) adjusts the EQ band for the pitch, PBS (Pitch Bend Sensitivity) adjusts the sensitivity of the pitch adjustment, and VIB (Vibration) adjusts the trembling of the note.
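As one possible reading of the VIB parameter, the trembling of a note can be modeled as a slow sinusoidal deviation of the pitch around its base frequency. This is a common textbook model chosen for illustration, not the method disclosed in the source; the function name and the default rate and depth are assumptions.

```python
import math


def vibrato_hz(base_hz, t, rate_hz=5.5, depth_cents=50.0):
    """Instantaneous frequency at time t (seconds) with sinusoidal vibrato.

    depth_cents is the peak deviation in cents (1200 cents = one octave).
    """
    cents = depth_cents * math.sin(2.0 * math.pi * rate_hz * t)
    return base_hz * 2.0 ** (cents / 1200.0)


# Sample the pitch contour of a 440 Hz note over the first 0.2 seconds.
for step in range(5):
    t = step * 0.05
    print(f"t={t:.2f}s  f={vibrato_hz(440.0, t):.2f} Hz")
```

Expressing the depth in cents keeps the deviation musically symmetric above and below the base pitch, whatever the base frequency is.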

Singing style (창법) means the way a person sings; various singing styles are realized by processing the vocal sound source with techniques such as the vocal music effects.

For example, singing techniques such as female voice, male voice, child voice, robot voice, pop, classical, and bending ('꺽기') are provided.

FIG. 2 is a block diagram of an apparatus for producing music content using text-to-speech synthesis according to another embodiment of the present invention.

As shown in FIG. 2, this embodiment further comprises a song and background music synthesis unit (190) that combines the background music information acquired by the music reproduction information acquisition unit with the tone finally converted by the tone conversion unit.

For example, when the sound source for '동해물과 백두산이' is played, the background music of the song (usually music played on instruments) is combined with it.

That is, background music is combined with the final converted tone, and music in its finished form is output.
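A minimal sketch of combining the final voice with background music is a sample-by-sample weighted sum with clipping. It assumes both signals are mono, share one sample rate, and use floating-point samples in the range -1.0 to 1.0; the function name and the gain defaults are hypothetical, and the source does not describe how the synthesis unit actually mixes.

```python
def mix(vocal, background, vocal_gain=1.0, background_gain=0.5):
    """Sum two sample lists with gains, padding the shorter one with silence."""
    length = max(len(vocal), len(background))
    mixed = []
    for i in range(length):
        v = vocal[i] if i < len(vocal) else 0.0
        b = background[i] if i < len(background) else 0.0
        sample = vocal_gain * v + background_gain * b
        # Clip to the valid range so the sum cannot overflow on output.
        mixed.append(max(-1.0, min(1.0, sample)))
    return mixed


print(mix([0.5, 0.5], [0.5, 1.0, 1.0]))
```

Hard clipping is the simplest safeguard; a production mixer would more likely normalize or limit the signal to avoid audible distortion.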

FIG. 3 is a block diagram of the music reproduction information acquisition unit of the apparatus for producing music content using text-to-speech synthesis according to an embodiment of the present invention.

As shown in FIG. 3, the music reproduction information acquisition unit (110) comprises:

a lyrics information acquisition unit (111) that acquires lyrics information;

a background music information acquisition unit (112) that acquires information on the background music sound source selected from among the background music sound sources stored in the sound source database;

a vocal effect acquisition unit (113) that acquires the vocal effect information adjusted by the user; and

a singer information acquisition unit (114) that acquires singer information.

According to an additional aspect, it may further comprise a piano key position acquisition unit (not shown) that acquires the position of the piano key selected by the user on a virtual piano instrument displayed on the screen.

For the piano key position information, the frequency corresponding to the pitch of each key of the piano is defined in advance and provided.
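One way to predefine a frequency for each key is the usual equal-temperament formula for an 88-key piano, in which key 49 is A4. The sketch below assumes that numbering and A4 = 440 Hz; the source only says that the frequencies are defined in advance, not how.

```python
def key_to_hz(key_number, a4_hz=440.0):
    """Frequency of key 1..88 on a standard piano (key 49 = A4)."""
    if not 1 <= key_number <= 88:
        raise ValueError("key_number must be between 1 and 88")
    return a4_hz * 2.0 ** ((key_number - 49) / 12.0)


# Precompute the lookup table once, as the description suggests.
KEY_FREQUENCIES = {key: key_to_hz(key) for key in range(1, 89)}

print(round(KEY_FREQUENCIES[40], 2))  # key 40 is middle C
print(KEY_FREQUENCIES[49])            # key 49 is A4
```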

FIG. 4 is a screen of a music content production program using text-to-speech synthesis according to an embodiment of the present invention.

As shown in FIG. 4, the screen displays a lyrics editing area (410) in which the user can edit lyrics; a background music editing area (420) in which background music can be edited; a virtual piano instrument area (430) in which the user operates piano keys; a vocal effect editing area (440) in which the user can edit vocal effects; a singer setting area (450) in which singers or tracks can be edited; and a setting area (460) in which the user can select File, Edit, Audio, View, Job, Track, Lyrics, Settings, Singing Style, Help, and so on. The user then carries out the desired editing in these areas.

The lyrics editing area 410 accepts input in the smallest unit of the language (the syllable) and displays the sound and the phonetic symbol of each syllable.

Each syllable has two attributes: its scale note (Pitch) and its note length (Length).
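One possible way to represent a lyric syllable with these two attributes is sketched below. The description names only the Pitch and Length attributes; the class name, the field types, the use of MIDI note numbers for pitch, and the use of beats for length are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class LyricSyllable:
    """Hypothetical record for one syllable in the lyrics editing area."""
    text: str        # the syllable as entered by the user
    phonetic: str    # phonetic symbol displayed for the syllable
    pitch: int       # Pitch attribute; assumed here to be a MIDI note number
    length: float    # Length attribute; assumed here to be measured in beats


def duration_seconds(syllables: List[LyricSyllable], tempo_bpm: float) -> float:
    """Total sung duration in seconds, assuming length is given in beats."""
    total_beats = sum(s.length for s in syllables)
    return total_beats * 60.0 / tempo_bpm


# Example lyric line: three syllables with individually set pitch and length.
line = [
    LyricSyllable(text="la", phonetic="la", pitch=60, length=1.0),
    LyricSyllable(text="la", phonetic="la", pitch=62, length=0.5),
    LyricSyllable(text="la", phonetic="la", pitch=64, length=0.5),
]
```

With these example values the line spans two beats, so `duration_seconds(line, tempo_bpm=120)` evaluates to one second.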

The background music editing area 420 allows existing sound sources such as WAV and MP3 files to be imported and edited.

The virtual piano instrument area 430 provides the functions of a piano, so that the note corresponding to each piano key position can be played.

The singer setting area 450 allows the singer sound source used for the vocal to be selected and provides a function for editing multiple tracks, so that a song can be sung by several singers.

The setting area 460 allows the user to configure the singing style setting, which selects among various singing techniques, the basic note unit used for editing, editing screen options, and the like.

With the configuration and operation described above, anyone can easily produce music content, so that individually created content can be distributed online and offline. The content can be used in value-added music content services on mobile phones, such as ringtones and Coloring (RBT, Ring Back Tone), and for music playback and voice guidance on various types of portable devices. ARS (automatic response system) and navigation (map guidance) devices can provide voice guidance with human-like intonation, and artificial intelligence robot devices can be made to speak and sing with human-like intonation.

Those skilled in the art to which the present invention pertains will understand that the present invention may be embodied in other specific forms without changing its technical spirit or essential features. The embodiments described above should therefore be understood as illustrative in all respects and not restrictive.

The scope of the present invention is indicated by the claims below rather than by the detailed description above, and all changes or modifications derived from the meaning and scope of the claims and their equivalents should be construed as falling within the scope of the present invention.

110: music reproduction information acquisition unit
120: syntax analysis unit
130: pronunciation conversion unit
140: optimum phoneme selection unit
150: sound source selection unit
160: rhyme control unit
170: voice conversion unit
180: tone conversion unit

Claims

1. (Deleted)

2. A music content production device using text-to-speech synthesis, comprising:
a music reproduction information acquisition unit that acquires the lyrics, singer, track, scale, note length, beat, tempo, and music effect input for music reproduction;
a syntax analysis unit that analyzes the sentences of the lyrics acquired by the music reproduction information acquisition unit and converts them into a form defined according to linguistic characteristics;
a pronunciation conversion unit that converts the data analyzed by the syntax analysis unit into phoneme-based data;
an optimum phoneme selection unit that selects, according to predefined rules, the optimum phonemes corresponding to the lyrics analyzed through the syntax analysis unit and the pronunciation conversion unit;
a sound source selection unit that acquires the singer information acquired by the music reproduction information acquisition unit and selects, from a sound source database, the sound source corresponding to the phonemes selected by the optimum phoneme selection unit;
a rhyme control unit that acquires the optimum phonemes selected by the optimum phoneme selection unit and controls length and pitch when combining the optimum phonemes according to the sentence characteristics of the lyrics;
a voice conversion unit that acquires the sentences of the lyrics synthesized by the rhyme control unit and matches them so that they are reproduced according to the scale, note length, beat, and tempo acquired by the music reproduction information acquisition unit;
a tone conversion unit that acquires the voice converted by the voice conversion unit and matches a tone to the converted voice so that it is reproduced according to the music effect acquired by the music reproduction information acquisition unit; and
a song and background music synthesis unit that synthesizes the background music information acquired by the music reproduction information acquisition unit with the voice as finally converted by the tone conversion unit.

3. (Deleted)

4. The device of claim 2, wherein the music reproduction information acquisition unit comprises:
a lyrics information acquisition unit that acquires lyrics information;
a background music information acquisition unit that acquires information on the background music sound source selected from among the background music sound sources stored in the sound source database;
a vocal effect acquisition unit that acquires vocal effect information adjusted by the user; and
a singer information acquisition unit that acquires singer information.

5. The device of claim 4, further comprising a piano key position acquisition unit that acquires position information of the piano key selected by the user on a virtual piano instrument displayed on the screen.