KR100989867B1

KR100989867B1 - An automatic song transcription method

Info

Publication number: KR100989867B1
Application number: KR1020090003118A
Authority: KR
Inventors: 이준환; 형아영
Original assignee: 전북대학교산학협력단
Priority date: 2009-01-14
Filing date: 2009-01-14
Publication date: 2010-10-29
Also published as: KR20100083637A

Abstract

Disclosed is an automatic song transcription method. A pitch is extracted from a voice signal input through a voice signal input unit such as a microphone. Based on the extracted pitch data, the continuous voice signal is segmented into syllables while noise is removed and a representative pitch value is obtained for each syllable. The segmented syllables are preprocessed into effective unit intervals suitable for note-duration recognition and are then clustered into note units to recognize note durations. A relative pitch frequency table is reconstructed using the representative pitch value of the first segmented syllable as the reference, and the representative pitch value of each syllable is mapped onto this table to recognize pitch. A threshold is applied to the extracted pitch data to extract pauses, i.e., non-voiced intervals below the threshold, and measures are detected from the extracted pause information. Score data are then generated from the note-duration, pitch, and measure information.

Accordingly, the present invention removes noise and segments regions effectively during syllable segmentation, recognizes note durations regardless of each singer's tempo, recognizes pitch in a way suited to each singer's register through relative pitch, and generates an accurate score through measure detection.

Song, transcription, pitch extraction, syllable segmentation, note-duration recognition, pitch recognition, measure detection

Description

An automatic song transcription method

The present invention relates to an automatic song transcription method.

More specifically, it relates to an automatic song transcription method that automatically generates score data from a person's singing.

Singing is a cultural phenomenon that has existed among humans since ancient times. It has served individuals and social groups as a means of expressing emotion and as a form of entertainment. Singing belongs to the category of voice and resembles ordinary speech because it is produced by the vocal organs and has linguistic form. It differs from ordinary speech because it carries musical attributes such as pitch, loudness, note value, and timbre. Music (tune) recognition is a branch of speech recognition. It either identifies the pitch, rhythm, and lyrics of music sung by people or played on instruments, or identifies a song's title by comparing the input with existing song data. In both cases it uses the features of the song to express the result in the desired data form.

Music recognition can be used in sight-singing education and in entertainment. It can also serve as a plagiarism-checking tool for protecting song copyrights, an application that has recently become important. Despite advances in computing performance, however, research on music recognition remains limited.

An automatic transcription system, one area of music recognition in speech processing, recognizes the pitch (interval), length (duration), and lyrics of a song sung by a person or played on an instrument, and presents the result as a score. Traditionally, a trained musician listens to a song and transcribes it by hand. An automatic system instead recognizes the musical features of the singer's performance and turns them into a score itself, so non-experts can also use it easily.

Several automatic transcription methods are known. To split features extracted from a continuous voice signal into individual notes, one method merges phoneme-level segments and uses them as syllable boundary information. Another forms syllable segments from energy information obtained by connecting the voice maxima that occur at every pitch period.

These conventional systems have three problems, and their results are unsatisfactory as a consequence:
- Because a voice signal is continuous, they lose much of their accuracy where syllable boundaries are ambiguous.
- An extra step is needed to find a representative pitch value for pitch recognition in each predicted syllable segment.
- They cannot detect measures, so they cannot present the recognized song as a complete score.

Singing tempo also varies widely from person to person. Conventional note-duration recognition, however, maps durations to standard notes using generalized standard data. It therefore cannot adapt to each person's singing speed.

An object of the present invention is to solve these problems. It provides an automatic song transcription method that recognizes note durations and pitches from a voice signal such as a person's singing, detects measures automatically, and finally generates score data.

Another object of the present invention is to provide an automatic song transcription method that generates accurate score data as follows. A person's singing is first converted into a voice signal. On the basis of efficient pitch extraction and syllable segmentation, the method then recognizes note durations regardless of each singer's tempo and recognizes pitch from relative pitch. It also detects measures from the pause information of the voice signal.

To achieve these objects, the automatic song transcription method according to the present invention comprises:
(1) extracting a pitch, which represents how high or low the voice is, from a voice signal input through a voice signal input unit such as a microphone;
(2) segmenting the continuous voice signal into syllables based on the pitch data extracted in step (1), while removing noise from the voice signal and providing a representative pitch value for each syllable;
(3) preprocessing the syllables segmented in step (2) into effective unit intervals suitable for note-duration recognition, and clustering the preprocessed duration data into individual note units to recognize note durations;
(4) reconstructing a relative pitch frequency table using the representative pitch value of the first syllable segmented in step (2) as the reference, and mapping the representative pitch value of each syllable onto the reconstructed table to recognize pitch;
(5) applying a threshold to the pitch data extracted in step (1) to extract pauses, i.e., non-voiced intervals below the threshold, and detecting measures from the extracted pause information; and
(6) generating score data based on the note-duration, pitch, and measure information obtained in steps (3), (4), and (5).

[Deleted]

As described above, the automatic song transcription method of the present invention has the following effects:
- Syllable segmentation based on the stabilized inverse diffusion equation removes noise and segments regions effectively. It also yields a representative pitch value for each segmented syllable.
- Note-duration recognition based on a genetic algorithm clusters notes with similar sung durations. Note durations can therefore be recognized regardless of the singer's tempo.
- Conventional pitch recognition relies on standardized data. Here, the basic pitch information obtained during syllable segmentation is instead mapped through relative pitch, which suits each singer's register.
- Conventional transcription builds the score from pitch and beat information alone. This method also detects measures from the pause information of the voice signal, so the score it produces contains accurate measure information.

The invention can also be applied to personal communication terminals such as mobile phones, for example in a voice-driven automatic composition system. It can further be applied to an emotion-based search system that analyzes features of the user's voice.

The automatic song transcription method of the present invention is described in detail below with reference to the accompanying drawings.

FIG. 1 is a block diagram schematically showing a configuration to which the automatic song transcription method according to the present invention is applied.

As shown, an apparatus to which the method is applied generates score data automatically from a person's singing. It comprises the following units:
- a voice signal input unit 10;
- a pitch extraction unit 20;
- a syllable segmentation unit 30;
- a preprocessing unit 40;
- a note-duration recognition unit 50;
- a pitch recognition unit 60;
- a pause extraction unit 70;
- a measure detection unit 80;
- a score generation unit 90.

The voice signal input unit 10 includes a microphone or similar device. It converts an analog voice signal input from outside into a digital voice signal and outputs the digital signal to the pitch extraction unit 20.

The pitch extraction unit 20 extracts features of the voice signal input through the voice signal input unit 10. It extracts the pitch, which represents how high or low the voice is, and outputs the pitch data to the syllable segmentation unit 30 and the pause extraction unit 70.

Pitch is extracted with the autocorrelation function, which is commonly used to detect periodicity in periodic signals. The autocorrelation function measures how strongly the signal value at one time is correlated with the value at another time. Its output emphasizes the periodic part of the signal, and this makes the pitch measurable.

The autocorrelation function is defined by Equation 1, $R(d) = \sum_{n} x(n)\,x(n+d)$.

In other words, the autocorrelation coefficient at lag d is the sum over all n of the product of the sample x(n) and the sample located d positions after it.
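A minimal Python sketch of frame-level autocorrelation pitch estimation using Equation 1 follows. The search range of 80 to 1000 Hz, the voicing threshold of 0.3 on the normalized peak, and the function names are illustrative assumptions, not parameters given in this document. The lag with the largest R(d) inside the search range is taken as the pitch period.

```python
import math


def autocorr(frame, d):
    """R(d) = sum over n of x(n) * x(n + d) for one analysis frame (Equation 1)."""
    return sum(frame[n] * frame[n + d] for n in range(len(frame) - d))


def estimate_pitch(frame, sample_rate, fmin=80.0, fmax=1000.0, voicing=0.3):
    """Pitch in Hz of one frame, or 0.0 when no clear periodicity is found.

    fmin, fmax and voicing are hypothetical defaults chosen for illustration.
    """
    r0 = autocorr(frame, 0)                    # frame energy
    min_lag = max(1, int(sample_rate / fmax))  # shortest period considered
    max_lag = min(int(sample_rate / fmin), len(frame) - 1)
    if r0 == 0 or max_lag < min_lag:
        return 0.0
    # The lag with the strongest self-similarity is taken as the pitch period.
    best = max(range(min_lag, max_lag + 1), key=lambda d: autocorr(frame, d))
    if autocorr(frame, best) / r0 < voicing:   # weak peak: treat as unvoiced
        return 0.0
    return sample_rate / best
```

Applying the function to successive frames of the recording produces a pitch track of the kind shown in FIG. 2. Frames that return 0.0 can then be treated as non-voiced.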

FIG. 2 shows an example of pitch extracted by applying the autocorrelation function of Equation 1 to an input voice signal. The detected pitch is the basic information for the method of the present invention. Syllable segmentation, note-duration recognition, pitch recognition, and measure detection are all carried out on it.

The syllable segmentation unit 30 works on the pitch data extracted by the pitch extraction unit 20. It removes noise from the continuous voice signal and, at the same time, segments the signal into syllables. Each syllable represents one note and is the basic unit for recognizing that note's duration and pitch. The unit also determines the representative pitch value of each syllable and outputs the segmented syllable data to the preprocessing unit 40 and the pitch recognition unit 60.

To split the pitch data from the pitch extraction unit 20 into units from which the pitch and duration of each note can be recognized, the syllable segmentation unit 30 uses the stabilized inverse diffusion equation (SIDE).

The stabilized inverse diffusion equation is an algorithm generally used for object segmentation in image processing. It is highly robust to noise and gives stable segmentation results. The present invention therefore uses it both to remove noise from the voice signal and to segment the continuous signal into syllables.

The F function used in the stabilized inverse diffusion equation to segment syllables can be defined by Equation 2.

The present invention uses Equation 3, which satisfies Equation 2, as the F function.

The multiscale filtering method based on the stabilized inverse diffusion equation is intended for region segmentation. It computes the change in region values with Equation 4.

Syllable segmentation based on the stabilized inverse diffusion equation performs the following steps in order. The number of iterations I, which is the algorithm's initial condition, can be set freely, but it is preferably tuned experimentally. In a sensory (listening) test, I is increased step by step, a person listens to the syllable segmentation obtained with each value, and the most natural result is selected.

Step 1: Each pitch sample extracted by the pitch extraction unit 20 is set as an independent region. The area of each region (m_i), the value of each region (u_i), and the number of regions (N) are then computed.

Step 2: Each region is filtered until the values of two or more adjacent regions become equal. During filtering, the scale-dependent rate of change is updated with Equation 4.

Step 3: Adjacent regions with the same value are merged, and m_i, u_i, and N are updated.

Step 4: Repeat from Step 2. The algorithm ends when the number of iterations reaches the preset value I.

The segmentation results show that region segmentation based on the stabilized inverse diffusion equation depends on the σ used in the F function. As Equations 3 and 4 indicate, a large σ gives the following behavior:
- fewer iterations;
- noise is removed;
- edges are lost.

A small σ gives the opposite:
- more iterations;
- noise is preserved;
- edges are preserved.

For efficient syllable segmentation, σ is therefore set large at first, which removes noise and speeds up convergence. It is then reduced as segmentation proceeds to improve accuracy.
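Steps 1 to 4 and the σ schedule can be illustrated with a simplified region-merging sketch in Python. Equations 2 to 4 are not reproduced here, so several parts of the sketch are assumptions:
- The force function `force` is odd, jumps at zero, and decays for large differences.
- Regions evolve as du_i/dt = [F(u_{i+1} - u_i) - F(u_i - u_{i-1})] / m_i, an assumed form of the SIDE region dynamics.
- Time advances in explicit Euler steps of size `dt`.
- σ decays geometrically from `sigma_start` to `sigma_end`.
- Two neighbors merge when their values meet or cross.

This is an illustrative approximation, not the patented filter itself.

```python
import math


def force(v, sigma):
    """Assumed SIDE-style force: odd, jumps at v = 0, decays for |v| >> sigma."""
    if v == 0:
        return 0.0
    return math.copysign(1.0, v) / (1.0 + (v / sigma) ** 2)


def side_segment(pitch, iterations=200, dt=0.05,
                 sigma_start=50.0, sigma_end=5.0, tol=1e-9):
    """Segment a 1-D pitch track; returns [(area m_i, value u_i), ...]."""
    # Step 1: every pitch sample is its own region (area 1, value = pitch).
    values = [float(p) for p in pitch]
    areas = [1] * len(values)
    for k in range(iterations):            # Step 4: repeat up to I iterations
        n = len(values)
        if n < 2:
            break
        # Large sigma early (denoise, fast convergence), small sigma later (keep edges).
        t = k / max(iterations - 1, 1)
        sigma = sigma_start * (sigma_end / sigma_start) ** t
        # Step 2: explicit Euler step of du_i/dt = (F(right gap) - F(left gap)) / m_i.
        new = []
        for i in range(n):
            right = force(values[i + 1] - values[i], sigma) if i + 1 < n else 0.0
            left = force(values[i] - values[i - 1], sigma) if i > 0 else 0.0
            new.append(values[i] + dt * (right - left) / areas[i])
        # Step 3: merge neighbours whose values met or crossed; update m_i, u_i, N.
        merged_vals, merged_areas = [new[0]], [areas[0]]
        for i in range(1, n):
            old_gap = values[i] - values[i - 1]
            new_gap = new[i] - new[i - 1]
            if abs(new_gap) < tol or old_gap * new_gap <= 0:
                total = merged_areas[-1] + areas[i]
                merged_vals[-1] = (merged_vals[-1] * merged_areas[-1]
                                   + new[i] * areas[i]) / total
                merged_areas[-1] = total
            else:
                merged_vals.append(new[i])
                merged_areas.append(areas[i])
        values, areas = merged_vals, merged_areas
    return list(zip(areas, values))
```

Consider a track of two noisy plateaus near 200 Hz and 300 Hz. Neighboring samples within a plateau pull strongly toward each other and merge, while the pull across the large gap between plateaus is weak. The plateaus therefore stay separate, and each merged region's value can serve as the representative pitch of a syllable.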

FIG. 3 shows an example of syllable segmentation performed on the pitch data of FIG. 2 using the stabilized inverse diffusion equation. Ambiguous syllable boundaries in the voice signal are split and merged effectively. Pitch noise is removed, and the representative pitch value needed for pitch recognition is found in the same pass. Fewer iterations are needed, which shortens processing time and improves performance. After segmentation, any region that does not reach a certain threshold is regarded as invalid as a note and is merged into the preceding syllable.

The preprocessing unit 40 converts the syllables from the syllable segmentation unit 30 into the basic units used for note-duration recognition and outputs them to the note-duration recognition unit 50. Note duration, together with pitch, is expressed within a measure according to fixed rules and is key information for music recognition.

FIG. 4 illustrates the concepts of duration, pause, and IOI used in preprocessing. A voice signal consists of durations, during which sound is produced, and pauses, which are caused by breathing or articulation. Singing also contains such breaks. The sung duration plus the following pause is defined as the inter-onset interval (IOI). It spans from the start of one note's duration to the end of the following pause, i.e., the time from the onset of one note to the onset of the next. Note-duration recognition performs better with IOI data than with raw duration data, so preprocessing converts the syllable data into IOIs. FIG. 5 shows an example of syllable length information converted into IOIs from the syllable segmentation data of FIG. 3.
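The IOI conversion is a difference of successive onsets, as the following sketch shows. It assumes each segmented syllable is given as an (onset, offset) pair in frames, sorted by onset. The last note has no following onset, so its own duration is used as its IOI; that choice is an assumption, since the text does not cover it.

```python
def to_ioi(segments):
    """Convert [(onset, offset), ...] syllables into inter-onset intervals.

    IOI_i = onset_{i+1} - onset_i, i.e. the note's duration plus the pause after it.
    The last note uses its own duration (an assumption; the text leaves it open).
    """
    iois = []
    for i, (onset, offset) in enumerate(segments):
        if i + 1 < len(segments):
            iois.append(segments[i + 1][0] - onset)
        else:
            iois.append(offset - onset)
    return iois
```

For example, notes at frames (0, 8), (10, 18), and (20, 36) give IOIs of 10, 10, and 16. The first two notes get the same IOI even though each is followed by a 2-frame pause, so the pause is absorbed into the note's length class.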

The note-duration recognition unit 50 uses a genetic algorithm to cluster the preprocessed duration data from the preprocessing unit 40 into note units. This recognizes the note durations, and the unit outputs the result to the score generation unit 90.

A genetic algorithm is a general class of search methods that balances exploring the search space against exploiting promising regions of the solution space. Other search methods risk falling into a local minimum before they find the optimum. A genetic algorithm instead maintains a population of candidate solutions that search for the optimum simultaneously, which reduces this risk. This property is why the genetic algorithm is applied in the present invention.

The clustering procedure that the note-duration recognition unit 50 performs with the genetic algorithm is as follows.

Step 1: Construct chromosomes from the extracted syllable data and initialize the population.

Step 2: Starting from the initial cluster centers, assign the objects to clusters so as to minimize the distance d_min between each object and its cluster center, according to Equation 5.

Step 3: To evaluate the fitness of the Step 2 result, the inter-cluster distance D_inter and the intra-cluster distance D_intra are defined by Equations 6 and 7. In these equations, one pair of symbols denotes the current cluster and its center, and the other denotes the remaining clusters and their respective centers. The fitness function is defined by Equation 8.

Step 4: Apply the following operations to the fitness values computed in Step 3:
- select chromosomes by the roulette-wheel method;
- apply one-point crossover to the selected chromosomes;
- create a new generation using the given mutation rate.

Then repeat from Step 2.
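Equations 5 to 8 are not reproduced here, so the following Python sketch of Steps 1 to 4 rests on these assumptions:
- The data are one-dimensional IOI values.
- A chromosome is a list of k candidate centers.
- Step 2 assigns each value to its nearest center.
- D_intra is the mean absolute distance from each value to its assigned center.
- D_inter is the smallest gap between any two centers.
- Fitness is D_inter / D_intra.

Elitism, the mutation size, and all function names are further illustrative choices.

```python
import random


def assign(data, centres):
    """Step 2: label each value with the index of its nearest centre."""
    return [min(range(len(centres)), key=lambda j: abs(x - centres[j])) for x in data]


def fitness(data, centres):
    """Assumed fitness: inter-cluster separation over mean intra-cluster distance."""
    labels = assign(data, centres)
    if len(set(labels)) < len(centres):        # an empty cluster is not a valid solution
        return 0.0
    d_intra = sum(abs(x - centres[lab]) for x, lab in zip(data, labels)) / len(data)
    d_inter = min(abs(a - b) for i, a in enumerate(centres) for b in centres[i + 1:])
    return d_inter / (d_intra + 1e-9)


def ga_cluster(data, k, pop_size=30, generations=60, mutation_rate=0.1, seed=0):
    """Cluster 1-D IOI values into k duration classes; returns (centres, labels)."""
    if k < 2 or k > len(data):
        raise ValueError("need 2 <= k <= len(data)")
    rng = random.Random(seed)
    spread = (max(data) - min(data)) or 1.0
    # Step 1: each chromosome holds k candidate centres drawn from the data.
    pop = [rng.sample(data, k) for _ in range(pop_size)]
    best = max(pop, key=lambda c: fitness(data, c))
    for _ in range(generations):
        # Step 3: evaluate; the tiny offset keeps roulette weights positive.
        weights = [fitness(data, c) + 1e-12 for c in pop]
        children = [list(best)]                # elitism (an added assumption)
        while len(children) < pop_size:
            a, b = rng.choices(pop, weights=weights, k=2)   # Step 4: roulette wheel
            cut = rng.randint(1, k - 1)                      # one-point crossover
            child = a[:cut] + b[cut:]
            child = [g + rng.gauss(0.0, 0.05 * spread) if rng.random() < mutation_rate
                     else g for g in child]                  # mutation
            children.append(child)
        pop = children
        best = max(pop, key=lambda c: fitness(data, c))
    centres = sorted(best)
    return centres, assign(data, centres)
```

Sorting the final centers makes label 0 the shortest duration class, label 1 the next, and so on. The clusters are found from the singer's own IOIs rather than from fixed standard durations, which is what lets the recognition follow each singer's tempo.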

The pitch recognition unit 60 reconstructs a relative pitch frequency table, using as the reference the representative pitch value of the first syllable from the syllable segmentation unit 30. It recognizes each syllable's pitch by mapping the syllable's representative pitch value onto the reconstructed table. The resulting pitch information is output to the score generation unit 90.

Conventional pitch recognition identifies the human voice with a standard frequency table based on absolute pitch, but this approach does not adapt to differences in people's registers. The present invention therefore uses the concept of relative pitch (relative interval). Relative pitch determines a note from how far it has moved relative to the preceding note: the amount of change is measured and then approximated onto the 12-tone scale.

For relative pitch recognition, the representative pitch value of the first syllable is taken as the reference tone. A relative pitch frequency table is then reconstructed for each song using the ratios of the equal-tempered scale. As shown in FIG. 6, the equal-tempered scale is based on the 12 tones used to determine pitch.

The 12-tone system is closely related to the limits of human hearing. The frequencies within it are related multiplicatively, not linearly. The frequency ratio between two tones one semitone apart is given by Equation 9, $a = 2^{1/12} \approx 1.0595$.

Therefore, if an octave consists of 12 semitones and every semitone is exactly the same size, the interval from C to C# equals the interval from C# to D. Call this interval ratio a and set the frequency of C to 1. The relative frequency of C# is then a, and that of D is a multiplied by a again, i.e., a².

Based on these frequency ratios, Equation 9 expresses the equal-tempered scale as 12 values, one for each tone, as shown in FIG. 7.

The pitch recognition unit 60 performs pitch recognition with the equal-tempered scale of FIG. 7 as follows.

First, the first syllable from the syllable segmentation unit 30 is taken as the reference tone, and its note is identified from its representative pitch value. The frequency of that note is divided by the note's ratio in the equal-tempered scale of FIG. 7, which gives the C of the relative pitch frequency table. Multiplying this C frequency by each equal-tempered ratio gives the 12 tones of the table, from which the relative pitch frequency table is reconstructed as shown in FIG. 8. FIG. 8 shows an example table for a first note of 200 Hz, which is 'sol' (G) in standard pitch. Mapping each syllable's representative pitch value from the syllable segmentation unit 30 onto the reconstructed table identifies that syllable's pitch. Because pitch is recognized relative to the first note, differences in vocal range between individuals need not be taken into account.
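A minimal Python sketch of Equation 9 and the table reconstruction follows. It assumes the first note is named by its nearest standard pitch with A4 = 440 Hz, and that each syllable is mapped to the nearest semitone on a logarithmic (log2) scale. `relative_table` and `map_pitch` are hypothetical names.

```python
import math

NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]
SEMITONE = 2 ** (1 / 12)   # Equation 9: frequency ratio of one semitone


def relative_table(ref_hz, a4=440.0):
    """Rebuild the 12-tone table so that the first syllable lies exactly on a note.

    1. Name the first note by its nearest standard pitch (A4 = 440 Hz assumed).
    2. Divide by that note's equal-tempered ratio to get the relative C.
    3. Multiply the relative C by each ratio to get the 12 table entries.
    """
    k = (round(12 * math.log2(ref_hz / a4)) + 9) % 12   # 0 = C ... 11 = B
    c_rel = ref_hz / SEMITONE ** k
    return c_rel, {NAMES[i]: c_rel * SEMITONE ** i for i in range(12)}


def map_pitch(f_hz, c_rel):
    """Map a representative pitch to (note name, octave offset from the relative C)."""
    steps = round(12 * math.log2(f_hz / c_rel))
    return NAMES[steps % 12], steps // 12
```

In the FIG. 8 example, a first note of 200 Hz is named G. The relative C then lies near 133.5 Hz, and the G entry of the rebuilt table is 200 Hz. Every later syllable is therefore measured against the singer's own starting pitch rather than against A4 = 440 Hz.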

The pause extraction unit 70 applies a suitable threshold to the pitch data from the pitch extraction unit 20 to extract pauses, i.e., non-voiced intervals below the threshold. It sorts the extracted pause information and outputs it to the measure detection unit 80.
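A sketch of pause extraction, under the following assumptions:
- The pitch extractor reports unvoiced frames as 0 Hz.
- The threshold is a hypothetical 50 Hz.
- Silence before the first voiced frame and after the last one does not count as a pause.
- Pauses are ordered longest-first; the text says only that pause information is sorted.

```python
def extract_pauses(pitch, threshold=50.0):
    """Return [(start_frame, length)] for runs of frames with pitch <= threshold.

    Leading and trailing silence are ignored (an assumption), and the result is
    sorted longest-first (also an assumption).
    """
    voiced = [p > threshold for p in pitch]
    if True not in voiced:
        return []
    first = voiced.index(True)
    last = len(voiced) - 1 - voiced[::-1].index(True)
    pauses, start = [], None
    for i in range(first, last + 1):
        if not voiced[i] and start is None:
            start = i                          # a non-voiced run begins
        elif voiced[i] and start is not None:
            pauses.append((start, i - start))  # the run ends at the next voiced frame
            start = None
    return sorted(pauses, key=lambda p: p[1], reverse=True)
```

With longest-first ordering, the first entry is the longest pause, which the measure-detection procedure uses as its reference.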

The measure detection unit 80 detects the measures of the input voice signal from the pause information produced by the pause extraction unit 70. It outputs the measure information to the score generation unit 90.

In a piece of music, a measure is the span of valid beats between two of the vertical bar lines drawn across the staff. The time signature divides the temporal flow of music and is determined by the type of the base note and the number of such notes in one measure. Detecting measures is therefore important for constructing an accurate score.

The measure detection unit 80 detects measures as follows.

First, the pause extractor 70 applies a threshold to the pitch data input from the pitch extractor 20 to separate voiced from non-voiced intervals. A non-voiced interval is called a pause and is assumed to be a candidate position for a bar line in the score. When a person sings a song written as a score, the longest pause usually occurs in the middle of the song. This pause is called the reference measure: the longest pause in the pause information received from the pause extractor 70 is defined as the reference measure (N=1). Each measure of a song contains the notes that fill its meter, so the sung durations of different measures fall within a similar range. The reference measure is therefore multiplied by fixed multiples (0.25, 0.5, 0.75, …) to generate predicted measures, as shown in FIG. 9. A predicted measure is valid only if it is shorter than the total duration of the voice signal, so comparing the predicted measure with the total duration decides whether measure detection continues or ends. While detection continues, a candidate pause close to the predicted measure is searched for. Finding such a candidate pause means that detection of the N-th measure is complete; N=N+1 is then computed and the next measure is detected. The resulting measure detection information is output to the score generator 90.
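One reading of the measure-prediction loop is sketched below. It rests on four assumptions not stated in this excerpt:

- Each pause is represented by its center time and length in seconds.
- The N-th prediction is N × step × (center time of the longest pause).
- A candidate pause must lie within a hypothetical `tolerance` of the prediction.
- The loop simply moves on when no candidate is found.

```python
def detect_measures(pauses, total_time, step=0.25, tolerance=0.5):
    """Predict bar-line times from the longest pause and snap each one to a nearby pause.

    pauses: list of (center_time_s, length_s) candidate bar positions.
    Returns the center times of the pauses accepted as bar lines.
    """
    if not pauses or step <= 0:
        return []
    ref_time = max(pauses, key=lambda p: p[1])[0]   # longest pause = reference measure (N = 1)
    if ref_time <= 0:
        return []
    bars, n = [], 1
    while True:
        predicted = n * step * ref_time            # predicted position of the next measure
        if predicted >= total_time:                # only predictions inside the signal are valid
            break
        center, _ = min(pauses, key=lambda p: abs(p[0] - predicted))
        if abs(center - predicted) <= tolerance and center not in bars:
            bars.append(center)                    # candidate pause found: measure detected
        n += 1                                     # N = N + 1
    return bars
```

Take a 16 s recording whose pauses are centered every 2 s, with the longest pause at 8 s. The predictions 2, 4, ..., 14 s each find a matching pause, and the prediction of 16 s ends the loop.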

The score generator 90 generates score data from three inputs: the note-length recognition information from the note-length recognizer 50, the pitch recognition information from the pitch recognizer 60, and the measure detection information from the measure detector 80.
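This excerpt does not detail how the three kinds of information are combined. The sketch below is a hypothetical illustration. Each recognized note is stored as a small record, and the `NoteEvent` type and its fields are assumptions. Notes are assigned to measures by comparing their onsets with the detected bar-line times.

```python
import bisect
from dataclasses import dataclass


@dataclass
class NoteEvent:
    onset_s: float   # syllable onset time in seconds
    pitch: str       # pitch name from the pitch-recognition step, e.g. "G"
    length: str      # note-value label from the note-length clustering step, e.g. "quarter"


def group_into_measures(notes, bar_times):
    """Assign each note to the measure containing its onset; empty measures are kept."""
    bars = sorted(bar_times)
    measures = [[] for _ in range(len(bars) + 1)]
    for note in sorted(notes, key=lambda n: n.onset_s):
        # bisect_right counts the bar lines at or before the onset = measure index
        measures[bisect.bisect_right(bars, note.onset_s)].append(note)
    return measures
```

Keeping empty measures leaves room to render them as full-measure rests.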

Next, an embodiment of the automatic song transcription method configured as above will be described in detail with reference to FIGS. 10 to 14.

FIGS. 10 to 14 are flowcharts showing the operation of an embodiment of the automatic song transcription method according to the present invention.

First, the voice signal input unit 10 converts an analog voice signal received from outside (a person singing) into a digital voice signal and outputs it to the pitch extractor 20 (S100). The pitch extractor 20 then uses the autocorrelation function of Equation 1 to extract, from this voice signal, the pitch that represents how high or low the voice is. It outputs the pitch data to the syllable segmenter 30 and the pause extractor 70 (S200).
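Equation 1 is not included in this excerpt, so the following is a generic short-time autocorrelation pitch estimator rather than the patent's exact formula. It picks the lag with the largest autocorrelation within an assumed 80 to 500 Hz search range. It returns 0.0 (unvoiced) when the normalized peak falls below a hypothetical voicing threshold. These zeros are what the pause extractor later thresholds.

```python
import math


def acf_pitch(frame, fs, fmin=80.0, fmax=500.0, voicing=0.3):
    """Estimate the F0 of one frame from its autocorrelation peak; 0.0 if unvoiced."""
    n = len(frame)
    r0 = sum(x * x for x in frame)                 # zero-lag energy, used for normalization
    if r0 == 0.0:
        return 0.0                                 # silent frame
    min_lag = max(1, int(fs / fmax))               # shortest period considered
    max_lag = min(int(fs / fmin), n - 1)           # longest period that fits in the frame
    best_lag, best_r = 0, -math.inf
    for lag in range(min_lag, max_lag + 1):
        r = sum(frame[i] * frame[i + lag] for i in range(n - lag))
        if r > best_r:
            best_lag, best_r = lag, r
    if best_lag == 0 or best_r / r0 < voicing:     # no lag searched, or weak periodicity
        return 0.0
    return fs / best_lag
```

In practice the estimator would run frame by frame to build the pitch track used by the later steps. The frame length and hop size are not given in this excerpt; values of a few tens of milliseconds are a common, illustrative choice.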

The syllable segmenter 30 receives the pitch data from the pitch extractor 20 and segments it into syllables using the stabilized inverse diffusion equation. It outputs the segmented syllable data to the preprocessor 40 and the pitch recognizer 60 (S300). The segmented syllable data is denoised in the process, and the representative pitch value of each syllable can be read directly from it.

The syllable segmentation of step S300 is described in more detail with reference to FIG. 11. The syllable segmenter 30 sets each pitch data point extracted in step S200 as an independent region and computes the area of each region, the value of each region, and the number of regions (S310). It then updates the value of each region using Equation 4 until two or more adjacent regions have the same value (S320).

Adjacent regions having the same value are then merged, and the area of each region, the region values, and the number of regions are updated (S330). It is then determined whether the number of iterations has reached the preset value (S340).

If the number of iterations has reached the preset value, the syllable segmenter 30 ends the syllable segmentation and outputs the segmented syllable data to both the preprocessor 40 and the pitch recognizer 60 (S350). Otherwise, the process is repeated from step S320.
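Equation 4 is not reproduced in this excerpt, so the sketch below rests on an assumption. It uses a one-dimensional stabilized inverse diffusion equation of the form du_i/dt = (F(u_{i+1} - u_i) - F(u_i - u_{i-1})) / m_i, with the sign function F(v) = sgn(v) as a limiting choice of force function. Here u_i is a region's value and m_i its area, measured as length in frames. Between merges the velocities are constant, so the sketch jumps directly to the next moment two neighbors meet and merges them (S330). Each merge counts as one iteration against a hypothetical `max_iters` limit (S340).

```python
import math


def _sign(v):
    return (v > 0) - (v < 0)


def _merge_equal(values, masses, tol):
    """Fuse adjacent regions whose values coincide (within tol), adding their areas."""
    out_v, out_m = [values[0]], [masses[0]]
    for v, m in zip(values[1:], masses[1:]):
        if abs(v - out_v[-1]) <= tol:
            out_m[-1] += m
        else:
            out_v.append(v)
            out_m.append(m)
    return out_v, out_m


def side_segment(pitch, max_iters, tol=1e-6):
    """Event-driven 1-D SIDE sketch with F = sgn; returns (region values, region areas)."""
    if not pitch:
        return [], []
    values, masses = _merge_equal([float(p) for p in pitch], [1.0] * len(pitch), tol)  # S310
    for _ in range(max_iters):
        n = len(values)
        if n < 2:
            break
        # Velocity of each region; a missing neighbor contributes F(0) = 0.
        vel = []
        for i in range(n):
            right = _sign(values[i + 1] - values[i]) if i + 1 < n else 0
            left = _sign(values[i] - values[i - 1]) if i > 0 else 0
            vel.append((right - left) / masses[i])
        # Earliest time at which some pair of neighbors reaches the same value (S320).
        t_hit = math.inf
        for i in range(n - 1):
            d = values[i + 1] - values[i]
            rate = vel[i + 1] - vel[i]
            if d * rate < 0:                   # the gap is closing
                t_hit = min(t_hit, -d / rate)
        if t_hit == math.inf:
            break
        values = [v + t_hit * dv for v, dv in zip(values, vel)]
        values, masses = _merge_equal(values, masses, tol)     # S330
    return values, masses


values, masses = side_segment([100, 100, 100, 105, 100, 100, 200, 200, 200, 200], max_iters=2)
print(masses)  # [6.0, 4.0]
```

Each region's velocity is divided by its area, so small regions move fastest. Brief pitch glitches such as the single 105 Hz frame are therefore absorbed first, while long, steady segments survive as syllables.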

After the syllable segmentation of step S300, the preprocessor 40 converts the syllables segmented by the syllable segmenter 30 into effective unit intervals suitable for note-length recognition. It outputs the result to the note-length recognizer 50 (S400).
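FIG. 4 and FIG. 5 describe converting syllable durations into IOIs (inter-onset intervals). A minimal sketch follows, assuming each segmented syllable is given as an (onset, offset) pair in seconds. The IOI is taken as the gap from one onset to the next, that is, the syllable's duration plus the pause after it. The rule for the final syllable is a hypothetical choice, since the excerpt does not specify it.

```python
def to_ioi(segments):
    """Convert (onset_s, offset_s) syllable segments into inter-onset intervals in seconds."""
    segs = sorted(segments)
    iois = [nxt_on - on for (on, _), (nxt_on, _) in zip(segs, segs[1:])]
    if segs:
        last_on, last_off = segs[-1]
        iois.append(last_off - last_on)  # assumption: the last syllable has no successor, so use its duration
    return iois
```

The resulting IOI values are the one-dimensional data that the note-length clustering step groups into note classes.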

The note-length recognizer 50 uses a genetic algorithm to cluster the note-length data preprocessed by the preprocessor 40 into individual note values, which performs the note-length recognition. It outputs the note-length recognition information to the score generator 90 (S500).

The note-length recognition of step S500 is described in more detail with reference to FIG. 12. Chromosomes are constructed and a population is initialized from the note-length data, that is, the output of the preprocessing that converts the segmented syllables into effective unit intervals (S510). Starting from the center values of the initial clusters of this data, clustering is performed that minimizes the distance between each object and its cluster center, as in Equation 5 (S520).

The inter-cluster distance and the intra-cluster distance are then defined by Equations 6 and 7 (S530). The clustering result of step S520 is evaluated with the fitness function of Equation 8 (S540).

After this evaluation, chromosomes are selected according to the fitness values from step S540 using the well-known roulette wheel method. One-point crossover is applied to the selected chromosomes, and a new generation is finally created using the given mutation rate (S550). The process is repeated from step S520 until the clustering of the note-length data ends.
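Equations 5 to 8 are not reproduced in this excerpt, so the following GA clustering sketch makes its own assumptions:

- A chromosome is a sorted list of k cluster centers.
- Objects are assigned to the nearest center.
- Fitness is the smallest gap between centers divided by the mean distance of objects to their centers.

Roulette-wheel selection, one-point crossover, and Gaussian mutation follow step S550. Elitism and the population, generation, and mutation parameters are illustrative choices, not values from the patent.

```python
import random


def assign(data, centers):
    """Nearest-center assignment (stands in for Equation 5)."""
    return [min(range(len(centers)), key=lambda j: abs(x - centers[j])) for x in data]


def fitness(data, centers):
    """Assumed fitness: smallest inter-center gap / mean intra-cluster distance."""
    labels = assign(data, centers)
    intra = sum(abs(x - centers[j]) for x, j in zip(data, labels)) / len(data)
    inter = min(abs(a - b) for i, a in enumerate(centers) for b in centers[i + 1:])
    return inter / (intra + 1e-9)


def ga_cluster(data, k, pop_size=20, generations=60, p_mut=0.1, seed=0):
    """Cluster 1-D note-length values (e.g. IOIs) into k note classes with a simple GA."""
    distinct = sorted(set(data))
    if k < 2 or len(distinct) < k:
        raise ValueError("need k >= 2 and at least k distinct values")
    rng = random.Random(seed)
    spread = distinct[-1] - distinct[0]
    # Initial population: k distinct data values per chromosome, so every fitness is > 0.
    pop = [sorted(rng.sample(distinct, k)) for _ in range(pop_size)]
    for _ in range(generations):
        scores = [fitness(data, c) for c in pop]
        children = [pop[scores.index(max(scores))][:]]       # elitism keeps the best
        while len(children) < pop_size:
            a, b = rng.choices(pop, weights=scores, k=2)      # roulette-wheel selection
            cut = rng.randint(1, k - 1)                        # one-point crossover
            child = a[:cut] + b[cut:]
            child = [g + rng.gauss(0.0, 0.05 * spread) if rng.random() < p_mut else g
                     for g in child]                           # Gaussian mutation
            children.append(sorted(child))
        pop = children
    scores = [fitness(data, c) for c in pop]
    best = pop[scores.index(max(scores))]
    return best, assign(data, best)
```

When applied to IOI values, the returned centers play the role of note classes (for example, eighth, quarter, and half notes), and each label gives the class of one syllable.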

After the note-length recognition of step S500, the pitch recognizer 60 reconstructs the relative pitch frequency table based on the representative pitch value of the first syllable segmented in step S300. It performs pitch recognition by mapping the representative pitch value of each syllable onto the reconstructed table, and outputs the result to the score generator 90 (S600).

The pitch recognition of step S600 is described in more detail with reference to FIG. 13. The pitch recognizer 60 takes the first syllable segmented in step S300 as the reference tone and identifies its pitch from its representative pitch value (S610). It then divides the identified pitch by the equal-tempered scale to find the C tone of the relative pitch frequency table (S620).

The frequency of the C tone found in step S620 is then multiplied by the equal-tempered scale to obtain the 12 tones of the relative pitch frequency table (S630). The relative pitch frequency table is reconstructed from these tones (S640).
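The following worked example covers steps S610 to S640 and FIG. 8. It assumes the equal-tempered scale of FIG. 7 is the set of ratios $2^{i/12}$, and that the first syllable, with frequency $f_1$, lies $k$ semitones above C:

$$
f_C = \frac{f_1}{2^{k/12}}, \qquad f_i = f_C \cdot 2^{i/12}, \quad i = 0, 1, \dots, 11
$$

Take $f_1 = 200$ Hz heard as 'sol' (G, $k = 7$). Then $f_C = 200 / 2^{7/12} \approx 133.48$ Hz, $f_2$ (D) $\approx 149.83$ Hz, $f_7$ (G) $= 200$ Hz, and $f_9$ (A) $\approx 224.49$ Hz.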

The pitch recognizer 60 maps the representative pitch value of each syllable segmented in step S300 onto the relative pitch frequency table reconstructed in step S640 (S650). It performs pitch recognition based on the mapping result for each syllable and provides the result to the score generator 90 (S660).

After note-length and pitch recognition in steps S500 and S600, the pause extractor 70 applies a threshold to the pitch data extracted in step S200. This extracts the pauses, which are non-voiced intervals at or below the threshold. The measure detector 80 then detects measures from this pause information and outputs the measure detection information to the score generator 90 (S700).

The measure detection of step S700 is described in more detail with reference to FIG. 14. The pause extractor 70 applies an appropriate threshold to the pitch data extracted in step S200 to identify voiced and non-voiced intervals (S710). It then extracts the pauses, which are the non-voiced intervals at or below the threshold (S720).

The measure detector 80 then defines the longest of the pauses extracted in step S720 as the reference measure, N=1 (S730). It generates a predicted measure by multiplying the reference measure by a fixed multiple (0.25, 0.5, 0.75, …) (S740).

The predicted measure generated in step S740 is compared with the total duration of the voice signal to determine whether it is shorter (S750). If the predicted measure is longer than the total duration, measure detection ends and the measure detection information is output to the score generator 90.

If step S750 determines that the predicted measure is shorter than the total duration of the voice signal, the measure detector 80 searches for a candidate pause close to the predicted measure (S770). When such a candidate pause is found, detection of the N-th measure is complete. N=N+1 is computed to detect the next measure, and the process is repeated from step S740 (S780).

Finally, the score generator 90 generates score data from the note-length recognition information, pitch recognition information, and measure detection information obtained in steps S500, S600, and S700 (S800).

Although the present invention has been described with reference to preferred embodiments, those skilled in the art will understand that it can be modified and changed in various ways without departing from the spirit and scope of the invention as set forth in the claims below.

FIG. 1 is a block diagram schematically showing a configuration to which the automatic song transcription method according to the present invention is applied;

FIG. 2 is a diagram showing an example of pitch extraction by applying an autocorrelation function to an input voice signal;

FIG. 3 is a diagram showing an example of the result of syllable segmentation using the stabilized inverse diffusion equation;

FIG. 4 is a diagram explaining the concepts of duration, pause, and IOI (inter-onset interval) used in the preprocessing of the present invention;

FIG. 5 is a diagram showing an example of syllable length information converted into IOIs;

FIG. 6 is a diagram showing the 12 tones used to determine pitch;

FIG. 7 is a diagram showing the equal-tempered scale;

FIG. 8 is a diagram showing an example of the relative pitch frequency table used for pitch recognition;

FIG. 9 is a diagram showing an example of predicted measures found using the reference measure;

FIG. 10 is a flowchart showing the operation of an embodiment of the automatic song transcription method according to the present invention; and

FIGS. 11 to 14 are flowcharts showing the operation of each subroutine of FIG. 10 in more detail.

* Explanation of reference numerals for the main parts of the drawings *

10: voice signal input unit  20: pitch extractor

30: syllable segmenter  40: preprocessor

50: note-length recognizer  60: pitch recognizer

70: pause extractor  80: measure detector

90: score generator

Claims

(1) extracting, from a voice signal input through a voice signal input unit such as a microphone, a pitch indicating how high or low the voice is;

(2) segmenting the continuous voice signal into syllables based on the pitch data extracted in step (1), removing noise from the voice signal, and providing a representative pitch value for each syllable;

(3) performing preprocessing that converts the syllables segmented in step (2) into effective unit intervals suitable for note-length recognition, and performing note-length recognition by clustering the preprocessed note-length data into individual note units;

(4) reconstructing a relative pitch frequency table based on the representative pitch value of the first syllable segmented in step (2), and performing pitch recognition by mapping the representative pitch values of the syllables onto the reconstructed relative pitch frequency table;

(5) applying a threshold to the pitch data extracted in step (1) to extract pauses, which are non-voiced intervals at or below the threshold, and detecting measures using the extracted pause information; and

(6) generating score data based on the note-length recognition information, pitch recognition information, and measure detection information obtained in steps (3), (4), and (5);

wherein the syllable segmentation of step (2)

is performed based on a stabilized inverse diffusion equation and comprises:

(2-1) setting each pitch data point extracted in step (1) as an independent region, and calculating the area of each region, the value of each region, and the number of regions;

(2-2) updating the value of each region at a scale-dependent rate of change as the filtering of each region proceeds, until two or more adjacent regions have equal values;

(2-3) merging adjacent regions having the same value, and updating the area of each region, the value of each region, and the number of regions;

(2-4) determining whether the number of iterations has reached a preset value; and

(2-5) ending the syllable segmentation if step (2-4) determines that the number of iterations has reached the preset value, and otherwise repeating the process from step (2-2);

An automatic song transcription method comprising the above steps.

The method of claim 1,

wherein the pitch extraction of step (1)

is performed using an autocorrelation function.

delete

The method of claim 1,

wherein the note-length recognition of step (3)

is performed by clustering based on a genetic algorithm and comprises:

(3-1) constructing chromosomes and initializing a population based on the note-length data obtained by the preprocessing that converts the syllables segmented in step (2) into effective unit intervals suitable for note-length recognition;

(3-2) performing clustering that minimizes the distance between each object and its cluster center, starting from the center values of the initial clusters of the note-length data;

(3-3) defining an inter-cluster distance and an intra-cluster distance, and evaluating the clustering result of step (3-2) using a fitness function;

(3-4) applying one-point crossover to the chromosomes according to the fitness results of step (3-3), and finally creating a new generation using a given mutation rate; and

(3-5) repeating the process from step (3-2) until the clustering of the note-length data ends;

An automatic song transcription method comprising the above steps.

The method of claim 1,

wherein the pitch recognition of step (4) comprises:

(4-1) taking the first syllable segmented in step (2) as the reference tone and identifying its pitch from its representative pitch value;

(4-2) dividing the pitch identified in step (4-1) by the equal-tempered scale to identify the C tone of the relative pitch frequency table;

(4-3) multiplying the frequency of the C tone identified in step (4-2) by the equal-tempered scale to obtain the 12 tones of the relative pitch frequency table, and reconstructing the relative pitch frequency table from them;

(4-4) mapping the representative pitch values of the syllables segmented in step (2) onto the relative pitch frequency table reconstructed in step (4-3); and

(4-5) performing pitch recognition based on the mapping result for each syllable in step (4-4);

An automatic song transcription method comprising the above steps.

The method of claim 1,

wherein the measure detection of step (5) comprises:

(5-1) applying a threshold to the pitch data extracted in step (1) to identify voiced and non-voiced intervals, and extracting the non-voiced pauses;

(5-2) defining the longest of the pauses extracted in step (5-1) as the reference measure, N=1;

(5-3) generating a predicted measure by multiplying the reference measure defined in step (5-2) by a predetermined multiple;

(5-4) comparing the predicted measure generated in step (5-3) with the total duration of the voice signal to determine whether the predicted measure is shorter than the total duration;

(5-5) if step (5-4) determines that the predicted measure is shorter than the total duration of the voice signal, searching for a candidate pause close to the predicted measure; and

(5-6) if a candidate pause close to the predicted measure is found in step (5-5), completing detection of the N-th measure, computing N=N+1 to detect the next measure, and repeating the process from step (5-3);

An automatic song transcription method comprising the above steps.

delete