KR20200065248A

KR20200065248A - Voice timbre conversion system and method from the professional singer to user in music recording

Info

Publication number: KR20200065248A
Application number: KR1020180151531A
Authority: KR
Inventors: 남주한 (Juhan Nam); 용상언 (Sangeon Yong)
Original assignee: 한국과학기술원 (Korea Advanced Institute of Science and Technology, KAIST)
Priority date: 2018-11-30
Filing date: 2018-11-30
Publication date: 2020-06-09

Abstract

Proposed are a system and a method for converting the singer's voice in a music track into the user's timbre. According to one embodiment of the present invention, the system may comprise: a source separation unit that separates a received track into a vocal track and an accompaniment track; a conversion unit that extracts at least one of beat, pitch, and volume features from the vocal track and uses them to convert a user track, recorded in the user's voice while singing along with the original track; and an accompaniment mixing unit that mixes the accompaniment into the converted user track to obtain a track converted into the user's timbre.

Description

VOICE TIMBRE CONVERSION SYSTEM AND METHOD FROM THE PROFESSIONAL SINGER TO USER IN MUSIC RECORDING

The following embodiments relate to a system and method for converting the singer's voice in a music track into the user's timbre, and more particularly to a system and method for converting the timbre of the singer's voice in a popular-music track into the timbre of an ordinary user's voice.

A music signal mixes the signal of a human singing voice with the signals produced by various instruments, and it can be classified as a mono or a stereo music signal. Such signals can be separated into a vocal signal and an accompaniment signal for services such as karaoke or melody transcription. More recently, techniques have also been proposed for separating stereo music signals, which are produced to give the listener a sense of spatial depth, back into vocal and accompaniment signals.

Korean Registered Patent No. 10-1840015 relates to a method and apparatus for extracting the accompaniment signal from such a stereo music signal, and describes extracting the accompaniment using panning processing and a median filter.

Korean Registered Patent No. 10-1840015

The embodiments describe a system and method for converting the singer's voice in a music track into the user's timbre, and more specifically provide a technique for converting the timbre of the singer's voice in a popular-music track into the timbre of an ordinary user's voice.

The embodiments aim to provide a system and method that keep the timbre of an ordinary user's singing while converting musical expression such as beat, pitch, and volume so that it sounds like the singing style of the singer on a commercial track. The user can thus produce a personalized track in which they sing as well as the singer while keeping their own original timbre.

According to an embodiment, a system for converting the singer's voice in a track into the user's timbre may include: a source separation unit that separates an input track into a vocal track and an accompaniment track; a conversion unit that extracts at least one of beat, pitch, and volume features from the vocal track and uses them to convert a user track, recorded in the user's voice while singing along with the original track; and an accompaniment mixing unit that mixes the accompaniment into the converted user track to obtain a track converted into the user's timbre.

The system may further include, for use before the input track is separated into vocal and accompaniment tracks, a track input unit that receives the user's selection of a track, and a user track acquisition unit that records the user singing along with the selected track to obtain the user track.

The conversion unit may include a beat conversion unit that computes a temporal alignment between the separated vocal track and the user track and converts the beat (timing) of the user track accordingly.

The beat conversion unit may include: a feature extraction unit comprising a pitch feature extractor that extracts a pitch feature for each time frame of the user track and a lyrics feature extractor that extracts a phoneme feature for each time frame; an alignment algorithm that aligns the feature sequences produced by the feature extraction unit and extracts a time-stretching ratio; and a time-scale modification algorithm that applies that time-stretching ratio to the user track.

The conversion unit may further include a pitch conversion unit that converts the pitch of the user track using the pitch ratio between the separated vocal track and the user track whose beat has been converted by the beat conversion unit.

The pitch conversion unit may include: a pitch detection unit that extracts pitch features from the separated vocal track and the user track, using a harmonic-percussive source separation (HPSS) algorithm so that pitch is detected from the harmonic component only; and a pitch shift unit that converts the pitch of the user track using the pitch ratio between the separated vocal track and the user track.

The conversion unit may further include a volume conversion unit that converts the volume of the user track using the volume ratio between the separated vocal track and the user track whose beat and pitch have been converted by the beat conversion unit and the pitch conversion unit.

The volume conversion unit may include: a volume detection unit that uses the root mean square (RMS) to extract, for each time segment, the volume ratio between the separated vocal track and the beat- and pitch-converted user track; and a volume adjustment unit that uses that RMS volume ratio to adjust the level of the beat- and pitch-converted user track over time.

Using audio effects for music production, the accompaniment mixing unit may mix the user's singing, in which the user's timbre is preserved and only the vocal expression (beat, pitch, and volume) has been converted to follow the singer on the original track, with the accompaniment separated from the original track, thereby obtaining a track converted into the user's timbre.

According to another embodiment, a method for converting the singer's voice in a track into the user's timbre may include: separating an input track into a vocal track and an accompaniment track; extracting at least one of beat, pitch, and volume features from the vocal track and using them to convert a user track, recorded in the user's voice while singing along with the original track; and mixing the accompaniment into the converted user track to obtain a track converted into the user's timbre.

The method may further include, before the input track is separated into vocal and accompaniment tracks, receiving the user's selection of a track and recording the user singing along with the selected track to obtain the user track.

Converting the user track may include computing a temporal alignment between the separated vocal track and the user track and converting the beat (timing) of the user track accordingly.

Converting the user track may further include converting the pitch of the user track using the pitch ratio between the separated vocal track and the user track whose beat has already been converted.

Converting the user track may further include converting the volume of the user track using the volume ratio between the separated vocal track and the user track whose beat and pitch have already been converted.

In the step of mixing the accompaniment into the user track to obtain a track converted into the user's timbre, audio effects for music production may be used to mix the user's singing, in which the user's timbre is preserved and only the vocal expression (beat, pitch, and volume) has been converted to follow the singer, with the accompaniment separated from the original track.

According to the embodiments, a system and method can be provided that keep the timbre of an ordinary user's singing while converting musical expression such as beat, pitch, and volume to sound like the singing style of the singer on a commercial track, so that the user can produce a personalized track in which they sing as well as the singer while keeping their own original timbre.

FIG. 1 is a diagram for explaining a system for converting the singer's voice in a track into the user's timbre according to an embodiment.
FIG. 2 is a diagram illustrating a system for converting the singer's voice in a track into the user's timbre according to an embodiment.
FIG. 3 is a flowchart illustrating a method for converting the singer's voice in a track into the user's timbre according to an embodiment.
FIG. 4 is a flowchart illustrating a method for converting the user track using the vocal track according to an embodiment.

Hereinafter, embodiments are described with reference to the accompanying drawings. The described embodiments may, however, be modified in various other forms, and the scope of the present invention is not limited to the embodiments described below. The embodiments are provided to describe the present invention more fully to those of ordinary skill in the art. The shapes and sizes of elements in the drawings may be exaggerated for clarity.

The following embodiments describe a system and method for converting the singer's voice in a music track into the user's timbre, and more specifically provide a technique for converting the timbre of the singer's voice in a popular-music track into the timbre of an ordinary user's voice.

The desire to sing well is common to every country and culture. The embodiments keep the timbre of an ordinary user's singing while converting musical expression such as beat, pitch, and volume so that it sounds like the singing style of the singer on a commercial track. An ordinary user can therefore produce a personalized track in which they sing as well as the singer while keeping their own timbre, and as long as the user has the song they want to imitate, the whole process (recording, singing-style conversion, and mixing) can be automated.

FIG. 1 is a diagram for explaining a system for converting the singer's voice in a track into the user's timbre according to an embodiment.

As shown in FIG. 1, in the system 100 for converting the singer's voice in a track into the user's timbre according to an embodiment, the user U can select a singer's track 110. When the user U selects the track 110, it may be separated into an accompaniment track 111 and the singer's vocal track 112.

Meanwhile, the user U can record a user track 120 by singing along with the singer's track 110. The user track 120 may be a vocal-only track without accompaniment. By converting the user track 120 to reflect the vocal track 112 separated from the singer's track 110, a converted user track (vocal track) 130 that follows the singer's singing style can be obtained.

By mixing the converted user track 130 with the accompaniment track 111 separated from the singer's track 110, a track 140 converted into the user's timbre can be obtained.

By incorporating singing voice separation, the embodiments can create a new, personalized track even when only the favorite singer's released track is available: singing-style characteristics are extracted from the separated singer's voice and transplanted onto the user's voice, and the result is mixed with the separated accompaniment.

From the track's point of view, only the timbre changes, from the singer's voice to the user's voice, and as long as the user has the song they want to imitate, the entire process from recording to voice transplantation can be automated. Moreover, by combining the user's timbre with the singer's singing style into a new track, the system can serve as a medium that connects fans and artists directly through music.

FIG. 2 is a diagram illustrating a system for converting the singer's voice in a track into the user's timbre according to an embodiment.

Referring to FIG. 2, the system 200 for converting the singer's voice in a track into the user's timbre according to an embodiment may include a source separation unit 210, a conversion unit, and an accompaniment mixing unit 250. The conversion unit may include a beat conversion unit 220, a pitch conversion unit 230, and a volume conversion unit 240. Depending on the embodiment, the system 200 may further include a track input unit and a user track acquisition unit.
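The wiring of FIG. 2 can be pictured as a chain of pluggable stages. The sketch below is a hypothetical illustration, not the patent's implementation: the class name `TimbreTransferPipeline` and the stage signatures are assumptions, and each stage is any callable operating on NumPy arrays. It only fixes the order the description gives: separation, then beat, pitch, and volume conversion of the user track against the separated vocal, then mixing with the separated accompaniment.

```python
from dataclasses import dataclass
from typing import Callable, Tuple

import numpy as np

Signal = np.ndarray


@dataclass
class TimbreTransferPipeline:
    """Hypothetical wiring of units 210-250; every stage is a pluggable callable."""

    separate: Callable[[Signal], Tuple[Signal, Signal]]  # 210: mix -> (vocal, accompaniment)
    convert_beat: Callable[[Signal, Signal], Signal]     # 220: (user, vocal) -> time-aligned user
    convert_pitch: Callable[[Signal, Signal], Signal]    # 230: (user, vocal) -> pitch-corrected user
    convert_volume: Callable[[Signal, Signal], Signal]   # 240: (user, vocal) -> level-matched user
    mix: Callable[[Signal, Signal], Signal]              # 250: (user, accompaniment) -> output

    def run(self, original: Signal, user: Signal) -> Signal:
        vocal, accompaniment = self.separate(original)
        # Beat first, so the pitch and volume stages compare time-aligned frames.
        user = self.convert_beat(user, vocal)
        user = self.convert_pitch(user, vocal)
        user = self.convert_volume(user, vocal)
        return self.mix(user, accompaniment)


# Usage with trivial placeholder stages (real stages would do the DSP):
pipeline = TimbreTransferPipeline(
    separate=lambda x: (0.5 * x, 0.5 * x),
    convert_beat=lambda u, v: u,
    convert_pitch=lambda u, v: u,
    convert_volume=lambda u, v: 2.0 * u,
    mix=lambda u, a: u + a,
)
out = pipeline.run(np.ones(4), np.ones(4))
```

Keeping each stage as a callable lets a deep-learning or NMF separator, or a different pitch detector, be swapped in without changing the order of operations.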

The source separation unit 210 may separate the input track 201 into a vocal track and an accompaniment track.

The system may further include a track input unit that receives the user's selection of a track before the input track 201 is separated into vocal and accompaniment tracks.

The conversion unit may extract at least one of beat, pitch, and volume features from the vocal track and use them to convert the user track 202, recorded in the user's voice while singing along with the track 201.

The system may further include a user track acquisition unit that records the user singing along with the selected track 201 to obtain the user track 202.

The conversion unit may include a beat conversion unit 220, a pitch conversion unit 230, and a volume conversion unit 240.

The beat conversion unit 220 may compute a temporal alignment between the separated vocal track and the user track 202 and convert the beat (timing) of the user track 202 accordingly.

More specifically, the beat conversion unit 220 may include a feature extraction unit, an alignment algorithm, and a time-scale modification algorithm. The feature extraction unit may include a pitch feature extractor that extracts a pitch feature for each time frame of the user track 202 and a lyrics feature extractor that extracts a phoneme feature for each time frame. The alignment algorithm may align the feature sequences produced by the feature extraction unit and extract a time-stretching ratio, and the time-scale modification algorithm may apply that ratio to the user track 202.
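The time-scale modification step can be illustrated with plain overlap-add (OLA) driven by a piecewise-linear time map, where the map encodes the time-stretching ratio found by the alignment. This is a minimal sketch under stated assumptions: the function name `ola_time_warp`, the frame and hop sizes, and representing the alignment as anchor points are all illustrative choices, and the patent does not say which time-scale modification method it uses. Plain OLA causes phasing artifacts on tonal material, so WSOLA or a phase vocoder would be the more usual choice in practice.

```python
import numpy as np


def ola_time_warp(x, in_anchor, out_anchor, frame=1024, hop=256):
    """Overlap-add time-scale modification along a piecewise-linear time map.

    in_anchor / out_anchor are increasing sample positions: output sample
    out_anchor[k] should correspond to input sample in_anchor[k]. The local
    time-stretching ratio is the slope d(out)/d(in) between anchors.
    """
    x = np.asarray(x, dtype=float)
    window = np.hanning(frame)
    out_len = int(out_anchor[-1])
    y = np.zeros(out_len + frame)
    norm = np.zeros(out_len + frame)
    for out_pos in range(0, out_len, hop):
        # Input position this output frame should read from.
        in_pos = int(round(np.interp(out_pos, out_anchor, in_anchor)))
        seg = x[in_pos:in_pos + frame]
        if len(seg) < frame:
            seg = np.pad(seg, (0, frame - len(seg)))
        y[out_pos:out_pos + frame] += window * seg
        norm[out_pos:out_pos + frame] += window
    # Normalise by the summed window; avoid dividing by ~0 at the very edges.
    norm[norm < 1e-8] = 1.0
    return (y / norm)[:out_len]


rng = np.random.default_rng(0)
x = rng.standard_normal(8000)
stretched = ola_time_warp(x, [0, 8000], [0, 12000])   # uniform ratio 1.5
identity = ola_time_warp(x, [0, 8000], [0, 8000])     # ratio 1: reproduces x
```

With more than two anchors, each segment between anchors gets its own ratio, which is how a time-varying ratio from the alignment would be applied.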

The pitch conversion unit 230 may convert the pitch of the user track 202 using the pitch ratio between the separated vocal track and the user track 202 whose beat has been converted by the beat conversion unit 220.
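Once both tracks are time-aligned, the pitch ratio can be computed frame by frame as r(t) = f0_vocal(t) / f0_user(t), which corresponds to a shift of 12·log2 r(t) semitones. The sketch below assumes both f0 tracks are already on a common frame grid with 0 marking unvoiced frames. The function name `pitch_shift_curve`, leaving unvoiced frames unshifted, and clamping to one octave are illustrative assumptions, not details from the patent. The shifting itself (for example by a phase vocoder or PSOLA) is left out.

```python
import numpy as np


def pitch_shift_curve(f0_vocal, f0_user, max_semitones=12.0):
    """Per-frame pitch ratio between the separated vocal and the aligned user track.

    Both inputs are f0 tracks in Hz on the same frame grid (0 = unvoiced).
    Returns (ratio, semitones); frames unvoiced in either track get ratio 1.
    """
    f0_vocal = np.asarray(f0_vocal, dtype=float)
    f0_user = np.asarray(f0_user, dtype=float)
    voiced = (f0_vocal > 0) & (f0_user > 0)
    ratio = np.ones_like(f0_user)
    ratio[voiced] = f0_vocal[voiced] / f0_user[voiced]
    semitones = 12.0 * np.log2(ratio)
    # Clamp implausible jumps (e.g. octave errors of the pitch detector).
    semitones = np.clip(semitones, -max_semitones, max_semitones)
    return 2.0 ** (semitones / 12.0), semitones


ratio, semis = pitch_shift_curve([440.0, 0.0, 220.0], [220.0, 200.0, 220.0])
```

Here the first frame needs a one-octave shift up, the second is unvoiced in the reference and stays unchanged, and the third is already in tune.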

More specifically, the pitch conversion unit 230 may include a pitch detection unit and a pitch shift unit. The pitch detection unit extracts pitch features from the separated vocal track and the user track 202, using a harmonic-percussive source separation (HPSS) algorithm so that pitch is detected from the harmonic component only. The pitch shift unit converts the pitch of the user track 202 using the pitch ratio between the separated vocal track and the user track 202.
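The patent names HPSS without specifying a variant. One common approach is median filtering of the magnitude spectrogram: harmonic energy is smooth along time, percussive energy is smooth along frequency. The sketch below assumes that variant; the function names, the kernel sizes of 17, and the power-2 soft mask are illustrative choices. Multiplying the spectrogram by `mask_h` gives the harmonic-only input that the pitch detector would then analyse.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def _median_filter_1d(S, size, axis):
    """Centred median filter of odd length `size` along one axis (edge-padded)."""
    half = size // 2
    pad = [(0, 0), (0, 0)]
    pad[axis] = (half, half)
    padded = np.pad(S, pad, mode="edge")
    windows = sliding_window_view(padded, size, axis=axis)  # window axis appended last
    return np.median(windows, axis=-1)


def hpss_masks(S, kernel_time=17, kernel_freq=17, power=2.0):
    """Median-filtering harmonic/percussive soft masks for a magnitude
    spectrogram S of shape (freq_bins, frames)."""
    H = _median_filter_1d(S, kernel_time, axis=1)  # smooth across time -> harmonic
    P = _median_filter_1d(S, kernel_freq, axis=0)  # smooth across frequency -> percussive
    eps = 1e-10
    mask_h = H ** power / (H ** power + P ** power + eps)
    return mask_h, 1.0 - mask_h


# Toy spectrogram: a steady partial (row 20) and a drum hit (column 40).
S = np.zeros((64, 64))
S[20, :] = 1.0
S[:, 40] = 1.0
mask_h, mask_p = hpss_masks(S)
harmonic_only = mask_h * S
```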

The volume conversion unit 240 may convert the volume of the user track 202 using the volume ratio between the separated vocal track and the user track 202 whose beat and pitch have been converted by the beat conversion unit 220 and the pitch conversion unit 230.

More specifically, the volume conversion unit 240 may include a volume detection unit and a volume adjustment unit. The volume detection unit may use the root mean square (RMS) to extract, for each time segment, the volume ratio between the separated vocal track and the beat- and pitch-converted user track 202. The volume adjustment unit may use that RMS volume ratio to adjust the level of the beat- and pitch-converted user track 202 over time.
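The RMS-based volume matching can be sketched as a frame-wise gain g = RMS_vocal / RMS_user that is interpolated into a per-sample envelope. This is a minimal sketch, assuming both tracks are already time-aligned. The function names, frame and hop sizes, and the ±24 dB gain limit (which keeps silent or near-silent frames from producing huge gains) are illustrative assumptions.

```python
import numpy as np


def frame_rms(x, frame=2048, hop=512):
    """RMS of each analysis frame (frames start every `hop` samples)."""
    x = np.asarray(x, dtype=float)
    if len(x) < frame:
        x = np.pad(x, (0, frame - len(x)))
    n_frames = 1 + (len(x) - frame) // hop
    idx = hop * np.arange(n_frames)[:, None] + np.arange(frame)[None, :]
    return np.sqrt(np.mean(x[idx] ** 2, axis=1))


def match_loudness(user, vocal, frame=2048, hop=512, max_gain_db=24.0, eps=1e-8):
    """Scale the beat- and pitch-converted user track so that its frame RMS
    follows the frame RMS of the separated vocal."""
    n = min(len(user), len(vocal))
    user = np.asarray(user[:n], dtype=float)
    vocal = np.asarray(vocal[:n], dtype=float)
    gain = frame_rms(vocal, frame, hop) / (frame_rms(user, frame, hop) + eps)
    limit = 10 ** (max_gain_db / 20)
    gain = np.clip(gain, 1 / limit, limit)
    # Place each frame gain at its frame centre and interpolate per sample.
    centres = hop * np.arange(len(gain)) + frame // 2
    envelope = np.interp(np.arange(n), centres, gain)
    return user * envelope


t = np.arange(16000) / 16000
user = 0.1 * np.sin(2 * np.pi * 220 * t)
vocal = 0.5 * np.sin(2 * np.pi * 220 * t)
out = match_loudness(user, vocal)
```

Interpolating the gains rather than applying them frame by frame avoids audible steps at frame boundaries.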

The accompaniment mixing unit 250 may mix the separated accompaniment into the converted user track 202 to obtain a track 203 converted into the user's timbre. Using audio effects for music production, the accompaniment mixing unit 250 mixes the user's singing, in which the user's timbre from the user track 202 is preserved and only the vocal expression (beat, pitch, and volume) has been converted to follow the singer on the track 201, with the accompaniment separated from the track 201.
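The patent does not list the production effects it uses. The sketch below therefore covers only the final summing step: per-track gain in dB, summation, and a peak normalisation that scales the mix down only if it would clip. The function name, parameters, and the 0.99 peak target are assumptions; an actual chain would typically add EQ, compression, and reverb on the vocal before this point.

```python
import numpy as np


def mix_tracks(vocal, accompaniment, vocal_gain_db=0.0, accomp_gain_db=0.0, peak=0.99):
    """Sum the converted user vocal with the separated accompaniment."""
    vocal = np.asarray(vocal, dtype=float)
    accompaniment = np.asarray(accompaniment, dtype=float)
    n = max(len(vocal), len(accompaniment))
    # Zero-pad the shorter track and apply per-track gain.
    v = np.pad(vocal, (0, n - len(vocal))) * 10 ** (vocal_gain_db / 20)
    a = np.pad(accompaniment, (0, n - len(accompaniment))) * 10 ** (accomp_gain_db / 20)
    mix = v + a
    max_abs = np.max(np.abs(mix)) if n else 0.0
    if max_abs > peak:
        mix = mix * (peak / max_abs)  # scale down only if the sum would clip
    return mix


quiet = mix_tracks(np.array([0.5, 0.5]), np.array([0.2]))
loud = mix_tracks(np.ones(3), np.ones(3))
```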

The system 200 according to an embodiment is described in more detail below.

FIG. 3 is a flowchart illustrating a method for converting the singer's voice in a track into the user's timbre according to an embodiment.

Referring to FIG. 3, the method for converting the singer's voice in a track into the user's timbre according to an embodiment may include: separating the input track into a vocal track and an accompaniment track (S110); extracting at least one of beat, pitch, and volume features from the vocal track and using them to convert the user track, recorded in the user's voice while singing along with the track (S120); and mixing the accompaniment into the converted user track to obtain a track converted into the user's timbre (S130).

The method may further include, before the input track is separated into vocal and accompaniment tracks, receiving the user's selection of a track and recording the user singing along with the selected track to obtain the user track.

Step S120, in which at least one of beat, pitch, and volume features is extracted from the vocal track and used to convert the user track, may further include the following steps.

FIG. 4 is a flowchart illustrating a method for converting the user track using the vocal track according to an embodiment.

Referring to FIG. 4, converting the user track (S120) may include: computing a temporal alignment between the separated vocal track and the user track and converting the beat of the user track (S121); converting the pitch of the user track using the pitch ratio between the separated vocal track and the user track whose beat has been converted by the beat conversion unit (S122); and converting the volume of the user track using the volume ratio between the separated vocal track and the user track whose beat and pitch have been converted by the beat conversion unit and the pitch conversion unit (S123).

The embodiments can keep the singing style of the singer of a song the user chooses while replacing only the timbre with the user's own, creating a new personalized track. This can be implemented as an application on a terminal such as a mobile phone, so it can be deployed directly as a social entertainment application. Moreover, because the conversion is performed entirely by algorithms, its cost is minimal.

In addition, because the user's singing is converted to the singer's style even when the user's beat or pitch is off, the embodiments can grow into a service with a very strong "Wow Effect". And because the embodiments create music that combines the user's own voice with the singer's singing style, they can serve as a link between fans and singers. According to the embodiments, users can not only produce a track in which they sing like the singer but also experience a direct connection with the singer.

The method according to the embodiment described with reference to FIGS. 3 and 4 is explained below using an example.

The method according to an embodiment can be described in more detail using the system described with reference to FIG. 2. That system may include a source separation unit, a conversion unit, and an accompaniment mixing unit, and the conversion unit may include a beat conversion unit, a pitch conversion unit, and a volume conversion unit.

In step S110, the source separation unit may separate the input track into a vocal track and an accompaniment track. More specifically, the source separation unit may include a singing voice separation algorithm that separates the vocals from the accompaniment of an ordinary popular-music track.

The source separation unit can achieve good performance with a deep-learning algorithm such as U-Net, or it may instead be built on non-negative matrix factorization (NMF). With NMF, the user's own singing can also be used to improve the separation (user-guided source separation).
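One way to read "user-guided" NMF separation: learn vocal spectral bases from the user's a cappella recording, which follows the same melody and lyrics, then factorise the mixture with those bases held fixed alongside freely learned accompaniment bases. The sketch below is a hypothetical illustration of that idea, not the patent's procedure. It assumes magnitude spectrograms as input and uses Euclidean NMF with Lee-Seung multiplicative updates; the function names, component counts, and the soft mask are all assumptions.

```python
import numpy as np


def nmf(V, W=None, n_components=8, n_iter=200, fixed_cols=0, seed=0, eps=1e-10):
    """Euclidean NMF, V ≈ W @ H, with multiplicative updates.

    If W is given it is used as the initialisation (n_components is ignored),
    and its first `fixed_cols` columns are kept fixed.
    """
    rng = np.random.default_rng(seed)
    F, T = V.shape
    if W is None:
        W = rng.random((F, n_components)) + eps
    W = np.array(W, dtype=float)  # copy so the caller's array is not modified
    H = rng.random((W.shape[1], T)) + eps
    for _ in range(n_iter):
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W_new = W * (V @ H.T) / (W @ H @ H.T + eps)
        W[:, fixed_cols:] = W_new[:, fixed_cols:]
    return W, H


def user_guided_separation(V_mix, V_user, k_vocal=8, k_accomp=16, n_iter=200):
    """Hypothetical user-guided separation of a mixture magnitude spectrogram."""
    # 1) Vocal bases learned from the user's recording.
    W_v, _ = nmf(V_user, n_components=k_vocal, n_iter=n_iter)
    # 2) Mixture factorised with vocal bases fixed and accompaniment bases free.
    rng = np.random.default_rng(1)
    W_init = np.hstack([W_v, rng.random((V_mix.shape[0], k_accomp)) + 1e-10])
    W, H = nmf(V_mix, W=W_init, n_iter=n_iter, fixed_cols=k_vocal)
    vocal_est = W[:, :k_vocal] @ H[:k_vocal]
    accomp_est = W[:, k_vocal:] @ H[k_vocal:]
    mask = vocal_est / (vocal_est + accomp_est + 1e-10)  # Wiener-like soft mask
    return mask * V_mix, (1.0 - mask) * V_mix
```

Masking the mixture, rather than using the NMF reconstructions directly, guarantees that the two estimates add back up to the original spectrogram.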

In addition, the separation unit may improve vocal separation by using a separate singing voice detector to find the sung sections. The singing voice detector can likewise achieve good performance with a deep-learning algorithm.
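A singing voice detector typically emits a per-frame probability of singing. A minimal sketch of how that output could gate the separated vocal, assuming the probabilities are already available and that zeroing non-singing frames after a median filter is an acceptable post-processing choice (the threshold and kernel size are hypothetical defaults):

```python
def median_smooth(flags, k=5):
    """Median-filter a 0/1 sequence to remove isolated detections (k odd)."""
    half = k // 2
    out = []
    for i in range(len(flags)):
        window = sorted(flags[max(0, i - half): i + half + 1])
        out.append(window[len(window) // 2])
    return out


def gate_vocal_frames(vocal_frames, singing_prob, threshold=0.5, k=5):
    """Zero out separated-vocal frames the detector marks as non-singing.

    vocal_frames: list of frames (each a list of samples or bins).
    singing_prob: per-frame singing probability from the detector.
    Returns (gated_frames, smoothed_flags).
    """
    flags = [1 if p >= threshold else 0 for p in singing_prob]
    flags = median_smooth(flags, k)
    gated = [frame if f else [0.0] * len(frame)
             for frame, f in zip(vocal_frames, flags)]
    return gated, flags
```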

In step S120, the sound source conversion unit may extract at least one feature among beat, pitch, and volume from the vocal sound source and use it to convert the user sound source, that is, the recording of the user singing along with the sound source.

More specifically, in step S121, the beat conversion unit may compute the temporal alignment between the separated vocal sound source and the user sound source and convert the beat (timing) of the user sound source accordingly.

The beat conversion unit may include a feature extraction unit, an alignment algorithm, and a time-scale modification algorithm.

The feature extraction unit may include a pitch feature extraction unit and a lyrics feature extraction unit. The pitch feature extraction unit may extract frame-wise pitch features of the user sound source using a pitch detector or a pitch classifier, and the lyrics feature extraction unit may extract frame-wise phoneme features using a phoneme classifier.
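The text leaves the pitch detector unspecified. As a stand-in, the sketch below uses a plain autocorrelation estimator that returns one f0 value per frame, with 0.0 marking unvoiced frames. The frame size, hop, search range, and voicing threshold are assumptions; a more robust estimator would likely be preferred in practice, and the phoneme classifier is not sketched.

```python
import math


def frame_pitch(frame, sr, fmin=80.0, fmax=1000.0, voicing_threshold=0.3):
    """Estimate f0 (Hz) of one frame from its highest autocorrelation peak.

    Returns 0.0 if the normalized peak is below voicing_threshold (unvoiced)
    or the frame is silent.
    """
    n = len(frame)
    mean = sum(frame) / n
    x = [s - mean for s in frame]
    energy = sum(s * s for s in x)
    if energy == 0.0:
        return 0.0
    min_lag = int(sr / fmax)
    max_lag = min(int(sr / fmin), n - 1)
    best_lag, best_r = 0, 0.0
    for lag in range(min_lag, max_lag + 1):
        # Biased autocorrelation: shrinks with lag, which discourages octave-down errors.
        r = sum(x[i] * x[i + lag] for i in range(n - lag)) / energy
        if r > best_r:
            best_lag, best_r = lag, r
    if best_r < voicing_threshold:
        return 0.0
    return sr / best_lag


def pitch_track(signal, sr, frame_len=1024, hop=512):
    """Frame-wise f0 track over complete frames of the signal."""
    return [frame_pitch(signal[s:s + frame_len], sr)
            for s in range(0, len(signal) - frame_len + 1, hop)]
```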

The alignment algorithm may align the feature-vector sequences extracted from the two recordings using dynamic time warping (DTW), a hidden Markov model (HMM), or a similar method. From the alignment result it may extract a time-stretching ratio that varies over time, and it may then smooth this ratio curve with a smoothing filter.
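Of the two alignment methods named, DTW is the simpler to show. The sketch assumes scalar per-frame features (for example, a frame-wise f0 track) and derives a per-user-frame stretch ratio from the warping path by mapping each user frame to the mean reference frame it aligns with, then smooths the result with a moving average. The ratio definition and the moving-average filter are assumptions; the text says only that a smoothing filter is used.

```python
def dtw_path(ref, usr, dist=lambda a, b: abs(a - b)):
    """Classic DTW between two feature sequences.

    Returns the optimal warping path as (ref_index, user_index) pairs.
    """
    n, m = len(ref), len(usr)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = dist(ref[i - 1], usr[j - 1])
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    # Backtrack from the end; ties prefer the diagonal step.
    i, j, path = n, m, []
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        step = min((cost[i - 1][j - 1], 0), (cost[i - 1][j], 1), (cost[i][j - 1], 2))[1]
        if step == 0:
            i, j = i - 1, j - 1
        elif step == 1:
            i -= 1
        else:
            j -= 1
    return path[::-1]


def stretch_ratios(path, n_user):
    """Per-user-frame stretch ratio (> 1 means lengthen that user frame)."""
    sums, counts = [0.0] * n_user, [0] * n_user
    for r, u in path:
        sums[u] += r
        counts[u] += 1
    # A DTW path visits every user frame, so counts are never zero.
    ref_pos = [s / c for s, c in zip(sums, counts)]
    ratios = [ref_pos[j + 1] - ref_pos[j] for j in range(n_user - 1)]
    ratios.append(ratios[-1] if ratios else 1.0)
    return ratios


def smooth(values, k=5):
    """Centered moving average, truncated at the edges."""
    half = k // 2
    out = []
    for i in range(len(values)):
        w = values[max(0, i - half): i + half + 1]
        out.append(sum(w) / len(w))
    return out
```

For vector features such as phoneme posteriors, a Euclidean `dist` can be passed in place of the scalar default.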

The time-scale modification algorithm may apply the time-stretching ratio extracted by the alignment algorithm to the song signal recorded by the user. As a result, the timing of each of the user's notes is converted to match the way the singer sang it.
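To apply a time-varying ratio, the sketch below uses plain overlap-add (OLA): analysis frames are read at a hop of `synth_hop / ratio` and written at a fixed synthesis hop, with window-sum normalization. This is an assumed, deliberately simple choice; plain OLA smears tonal material, and WSOLA or a phase vocoder would be the usual higher-quality options. `ratio_at` is a hypothetical callback mapping an input sample position to a stretch ratio.

```python
import math


def ola_time_stretch(x, ratio_at, frame_len=512, synth_hop=128):
    """Overlap-add (OLA) time-scale modification with a time-varying ratio.

    ratio_at(pos) returns the stretch ratio at input sample `pos`
    (> 1 lengthens, < 1 shortens). No waveform-similarity search is done,
    so tonal input will show phasing artifacts.
    """
    window = [0.5 - 0.5 * math.cos(2 * math.pi * i / frame_len) for i in range(frame_len)]
    # Analysis positions advance by synth_hop / ratio; the synthesis hop stays fixed.
    positions, pos = [], 0.0
    while pos + frame_len <= len(x):
        positions.append(int(pos))
        ratio = max(ratio_at(pos), 0.05)  # guard against zero or negative ratios
        pos += synth_hop / ratio
    if not positions:
        return []
    out_len = (len(positions) - 1) * synth_hop + frame_len
    y = [0.0] * out_len
    norm = [0.0] * out_len
    for k, a in enumerate(positions):
        s = k * synth_hop
        for i in range(frame_len):
            y[s + i] += x[a + i] * window[i]
            norm[s + i] += window[i]
    # Divide by the accumulated window so overlapping frames do not change the level.
    return [v / n if n > 1e-8 else 0.0 for v, n in zip(y, norm)]
```

Given per-frame ratios from an alignment step with a feature hop of `hop` samples, `ratio_at` could be `lambda p: ratios[min(int(p) // hop, len(ratios) - 1)]`.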

In step S122, the pitch conversion unit may convert the pitch of the user sound source using the pitch ratio between the separated vocal sound source and the user sound source whose beat has already been converted by the beat conversion unit.

The pitch conversion unit may include a pitch detection unit and a pitch shifting unit.

The pitch detection unit may apply the same pitch detector used by the pitch feature extraction unit of the beat conversion unit to the singer's vocal as well. For more accurate pitch detection, a harmonic-percussive source separation (HPSS) algorithm can first split the signal into harmonic and non-harmonic components, so that pitch is detected from the harmonic components only.
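Harmonic-percussive separation is often done by median filtering a magnitude spectrogram: sustained, pitched content forms horizontal ridges, while transients form vertical ones. The sketch below assumes that median-filtering variant (the text does not name one) and returns soft masks; multiplying the mixture spectrogram by the harmonic mask would give the harmonic part used for pitch detection. Kernel sizes are hypothetical.

```python
def _median(vals):
    s = sorted(vals)
    n = len(s)
    return s[n // 2] if n % 2 else 0.5 * (s[n // 2 - 1] + s[n // 2])


def hpss_masks(spec, k_time=17, k_freq=17, eps=1e-10):
    """Median-filtering HPSS on a magnitude spectrogram spec[frame][bin].

    Harmonic estimate: median along time within each bin.
    Percussive estimate: median along frequency within each frame.
    Returns (harmonic_mask, percussive_mask) as soft Wiener-style masks.
    """
    n_t, n_f = len(spec), len(spec[0])
    ht, hf = k_time // 2, k_freq // 2
    harm = [[_median([spec[u][f] for u in range(max(0, t - ht), min(n_t, t + ht + 1))])
             for f in range(n_f)] for t in range(n_t)]
    perc = [[_median(spec[t][max(0, f - hf): min(n_f, f + hf + 1)])
             for f in range(n_f)] for t in range(n_t)]
    h_mask, p_mask = [], []
    for t in range(n_t):
        hm, pm = [], []
        for f in range(n_f):
            h2, p2 = harm[t][f] ** 2, perc[t][f] ** 2
            hm.append(h2 / (h2 + p2 + eps))
            pm.append(p2 / (h2 + p2 + eps))
        h_mask.append(hm)
        p_mask.append(pm)
    return h_mask, p_mask
```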

The pitch shifting unit may convert the pitch of the user sound source using the pitch ratio between the two songs. To preserve the timbre of the user's voice, a formant-preserving pitch-shifting algorithm such as PSOLA (pitch-synchronous overlap-add) can be applied.
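The pitch ratio can be taken frame by frame as the singer's f0 divided by the user's f0, leaving unvoiced frames untouched. The PSOLA sketch that follows is heavily simplified: it assumes a constant, known pitch period so that pitch marks fall at multiples of that period, whereas a real implementation needs per-period pitch-mark detection and a ratio that varies over time. Both function names are hypothetical.

```python
import math


def pitch_ratios(f0_singer, f0_user):
    """Frame-wise pitch ratio singer/user; 1.0 where either frame is unvoiced (f0 == 0)."""
    return [s / u if s > 0 and u > 0 else 1.0 for s, u in zip(f0_singer, f0_user)]


def td_psola_constant(x, period, ratio):
    """Minimal TD-PSOLA for a signal with a constant pitch period (in samples).

    Two-period Hann-windowed grains are cut at analysis pitch marks (multiples
    of `period`) and overlap-added at the new spacing period / ratio. Each grain
    keeps its own waveform shape, so the spectral envelope (formants) is roughly
    preserved while the fundamental changes; the output length equals the input's.
    """
    n = len(x)
    win_len = 2 * period
    window = [0.5 - 0.5 * math.cos(2 * math.pi * i / win_len) for i in range(win_len)]
    y = [0.0] * n
    new_period = period / ratio
    t = float(period)  # current synthesis pitch mark
    while t < n - period:
        center = round(t / period) * period  # nearest analysis pitch mark
        dst0 = int(round(t)) - period
        for i in range(win_len):
            src, dst = center - period + i, dst0 + i
            if 0 <= src < n and 0 <= dst < n:
                y[dst] += x[src] * window[i]
        t += new_period
    return y
```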

In step S123, the volume conversion unit may convert the volume of the user sound source using the volume ratio between the separated vocal sound source and the user sound source whose beat and pitch have been converted by the beat conversion unit and the pitch conversion unit.

The volume conversion unit may include a volume detection unit (envelope detector) and a volume control unit (gain control).

The volume detection unit may extract the time-varying volume ratio between the two sound sources using, for example, the root mean square (RMS) level.

The volume control unit may then adjust the level of the user sound source over time using this RMS ratio.
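The volume step can be sketched as frame-wise RMS matching. The text specifies only an RMS ratio; interpolating gains between frame centers, capping the gain so near-silent user frames are not amplified into noise, and the frame and hop sizes are assumptions added here.

```python
import math


def frame_rms(x, frame_len=1024, hop=512):
    """RMS level of each complete analysis frame."""
    return [math.sqrt(sum(s * s for s in x[i:i + frame_len]) / frame_len)
            for i in range(0, len(x) - frame_len + 1, hop)]


def match_loudness(user, ref, frame_len=1024, hop=512, max_gain=10.0, eps=1e-8):
    """Scale the user signal so its frame RMS follows the reference vocal.

    Assumes both signals are already time-aligned and of equal length. Gains
    are linearly interpolated between frame centers and capped at max_gain.
    """
    ru = frame_rms(user, frame_len, hop)
    rr = frame_rms(ref, frame_len, hop)
    gains = [min(r / (u + eps), max_gain) for r, u in zip(rr, ru)]
    if not gains:
        return list(user)  # too short for a single frame
    centers = [i * hop + frame_len // 2 for i in range(len(gains))]
    out = []
    for n, s in enumerate(user):
        if n <= centers[0]:
            g = gains[0]
        elif n >= centers[-1]:
            g = gains[-1]
        else:
            k = (n - centers[0]) // hop
            frac = (n - centers[k]) / hop
            g = gains[k] + frac * (gains[k + 1] - gains[k])
        out.append(s * g)
    return out
```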

Finally, in step S130, the accompaniment mixing unit may mix the accompaniment into the converted user sound source to obtain a sound source converted into the user's timbre.

The accompaniment mixing unit may further improve the sound of the user's song with audio effects used in music production (e.g., a compressor, an equalizer, and reverberation). Preset combinations of these effects can also be prepared in advance for the user to choose from. Besides the accompaniment separated from the original song, the mixing unit may also mix with an MR (Music Recorded) accompaniment track that has the same tempo and key as the original.
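The effects chain itself (compressor, equalizer, reverberation) is beyond a short example, but the final gain staging can be sketched. The preset names and dB values below are hypothetical placeholders for the selectable presets the text mentions, and peak normalization is an assumed safeguard against clipping.

```python
def db_to_gain(db):
    """Convert decibels to a linear amplitude gain."""
    return 10.0 ** (db / 20.0)


# Hypothetical presets: names and values are illustrative only.
PRESETS = {
    "natural": {"vocal_db": 0.0, "accomp_db": -3.0},
    "vocal_forward": {"vocal_db": 2.0, "accomp_db": -6.0},
}


def mix(vocal, accomp, preset="natural", ceiling=0.99):
    """Sum the converted vocal and the accompaniment with preset gains.

    The shorter signal is zero-padded; if the mixed peak exceeds `ceiling`,
    the whole mix is scaled down so the peak equals `ceiling`.
    """
    p = PRESETS[preset]
    gv, ga = db_to_gain(p["vocal_db"]), db_to_gain(p["accomp_db"])
    n = max(len(vocal), len(accomp))
    v = list(vocal) + [0.0] * (n - len(vocal))
    a = list(accomp) + [0.0] * (n - len(accomp))
    out = [gv * x + ga * y for x, y in zip(v, a)]
    peak = max((abs(s) for s in out), default=0.0)
    if peak > ceiling:
        out = [s * ceiling / peak for s in out]
    return out
```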

The embodiments are applicable to music-centered entertainment, for example to app-based music content on mobile phones. They can also be applied to social-network-based media services where videos and audio are actively shared, such as Facebook and YouTube.

The devices described above may be implemented as hardware components, software components, and/or combinations of hardware and software components. For example, the devices and components described in the embodiments may be implemented using one or more general-purpose or special-purpose computers, such as a processor, a controller, an arithmetic logic unit (ALU), a digital signal processor, a microcomputer, a field programmable array (FPA), a programmable logic unit (PLU), a microprocessor, or any other device capable of executing and responding to instructions. A processing device may run an operating system (OS) and one or more software applications running on that operating system, and may access, store, manipulate, process, and generate data in response to the execution of the software. For ease of understanding, a single processing device is sometimes described, but a person of ordinary skill in the art will recognize that a processing device may include multiple processing elements and/or multiple types of processing elements. For example, a processing device may include multiple processors, or one processor and one controller. Other processing configurations, such as parallel processors, are also possible.

The software may include a computer program, code, instructions, or a combination of one or more of these, and may configure the processing device to operate as desired or command the processing device independently or collectively. The software and/or data may be embodied in any type of machine, component, physical device, virtual equipment, or computer storage medium or device, so as to be interpreted by the processing device or to provide instructions or data to it. The software may also be distributed over networked computer systems and stored or executed in a distributed manner. The software and data may be stored on one or more computer-readable recording media.

The method according to the embodiments may be implemented as program instructions executable by various computer means and recorded on a computer-readable medium. The medium may contain program instructions, data files, data structures, and the like, alone or in combination. The program instructions recorded on the medium may be specially designed and configured for the embodiments, or may be known to and usable by those skilled in computer software. Examples of computer-readable recording media include magnetic media such as hard disks, floppy disks, and magnetic tape; optical media such as CD-ROMs and DVDs; magneto-optical media such as floptical disks; and hardware devices specially configured to store and execute program instructions, such as ROM, RAM, and flash memory. Examples of program instructions include not only machine code produced by a compiler but also high-level language code that a computer can execute using an interpreter or the like.

Although the embodiments have been described with reference to a limited set of embodiments and drawings, those of ordinary skill in the art can make various modifications and variations from the above description. For example, appropriate results may be achieved even if the described techniques are performed in an order different from the described method, and/or the described components, such as systems, structures, devices, and circuits, are combined in a form different from the described method or are replaced or substituted by other components or equivalents.

Therefore, other implementations, other embodiments, and equivalents of the claims also fall within the scope of the following claims.

Claims

A system for converting a singer's voice in a sound source into a user's timbre, the system comprising:
a sound source separation unit that separates an input sound source into a vocal sound source and an accompaniment sound source;
a sound source conversion unit that extracts at least one feature among beat, pitch, and volume from the vocal sound source and, according to the extracted features, converts a user sound source recorded in the user's voice while singing along with the sound source; and
an accompaniment mixing unit that mixes the accompaniment with the converted user sound source to obtain a sound source converted into the user's timbre.

The system of claim 1, further comprising:
a sound source input unit that receives the user's selection of the sound source before the input sound source is separated into the vocal sound source and the accompaniment sound source; and
a user sound source acquisition unit that acquires the user sound source by recording the user singing along with the selected sound source.

The system of claim 1, wherein the sound source conversion unit comprises:
a beat conversion unit that computes the temporal alignment between the separated vocal sound source and the user sound source and converts the beat of the user sound source accordingly.

The system of claim 3, wherein the beat conversion unit comprises:
a feature extraction unit including a pitch feature extraction unit that extracts frame-wise pitch features of the user sound source and a lyrics feature extraction unit that extracts frame-wise phoneme features;
an alignment algorithm that aligns the sequences of features extracted by the feature extraction unit and extracts a time-stretching ratio; and
a time-scale modification algorithm that applies the time-stretching ratio extracted by the alignment algorithm to the user sound source.

The system of claim 3, wherein the sound source conversion unit further comprises:
a pitch conversion unit that converts the pitch of the user sound source using the pitch ratio between the separated vocal sound source and the user sound source whose beat has been converted by the beat conversion unit.

The system of claim 5, wherein the pitch conversion unit comprises:
a pitch detection unit that extracts pitch features of the separated vocal sound source and the user sound source, detecting pitch from the harmonic components only by means of a harmonic-percussive source separation algorithm; and
a pitch shifting unit that converts the pitch of the user sound source using the pitch ratio between the separated vocal sound source and the user sound source.

The system of claim 5, wherein the sound source conversion unit further comprises:
a volume conversion unit that converts the volume of the user sound source using the volume ratio between the separated vocal sound source and the user sound source whose beat and pitch have been converted by the beat conversion unit and the pitch conversion unit.

The system of claim 7, wherein the volume conversion unit comprises:
a volume detection unit that uses the root mean square (RMS) level to extract the time-varying volume ratio between the separated vocal sound source and the user sound source whose beat and pitch have been converted; and
a volume control unit that adjusts the level of the beat- and pitch-converted user sound source over time using the RMS volume ratio.

The system of claim 1, wherein the accompaniment mixing unit uses audio effects for music production to mix the user's song, whose beat, pitch, and volume have been converted to follow the singer while the user's timbre in the user sound source is preserved, with the accompaniment separated from the sound source, thereby obtaining a sound source converted into the user's timbre.

A method of converting a singer's voice in a sound source into a user's timbre, the method comprising:
separating an input sound source into a vocal sound source and an accompaniment sound source;
extracting at least one feature among beat, pitch, and volume from the vocal sound source and, according to the extracted features, converting a user sound source recorded in the user's voice while singing along with the sound source; and
mixing the accompaniment with the converted user sound source to obtain a sound source converted into the user's timbre.

The method of claim 10, further comprising:
receiving the user's selection of the sound source before separating the input sound source into the vocal sound source and the accompaniment sound source; and
acquiring the user sound source by recording the user singing along with the selected sound source.

The method of claim 10, wherein converting the user sound source comprises:
computing the temporal alignment between the separated vocal sound source and the user sound source and converting the beat of the user sound source accordingly.

The method of claim 12, wherein converting the user sound source further comprises:
converting the pitch of the user sound source using the pitch ratio between the separated vocal sound source and the user sound source whose beat has been converted.

The method of claim 13, wherein converting the user sound source further comprises:
converting the volume of the user sound source using the volume ratio between the separated vocal sound source and the user sound source whose beat and pitch have been converted.

The method of claim 10, wherein mixing the accompaniment with the user sound source to obtain a sound source converted into the user's timbre comprises:
using audio effects for music production to mix the user's song, whose beat, pitch, and volume have been converted to follow the singer while the user's timbre in the user sound source is preserved, with the accompaniment separated from the sound source.