KR20230123043A

KR20230123043A - Apparatus for generating multi-track music based on generative adversarial neural network and method for generating multi-genre multi-track music using the same

Info

Publication number: KR20230123043A
Application number: KR1020220018941A
Authority: KR
Inventors: 성연식; 이서우
Original assignee: 동국대학교 산학협력단
Priority date: 2022-02-14
Filing date: 2022-02-14
Publication date: 2023-08-23
Also published as: KR102605724B1

Abstract

The present invention relates to an apparatus for generating multi-track music based on a generative adversarial network (GAN) and to a method for generating multi-track music of multiple genres using the apparatus. According to the present invention, the method for generating multi-track music of multiple genres using the multi-track music generating apparatus includes: upon receiving a MIDI file containing a multi-track, extracting first head information from the multi-track of the MIDI file using a preprocessor; training a GAN-based head information generator learning model by setting a noise vector and a track and genre label condition vector as input data of the generator of the model, setting second head information as output data of the generator, and determining through a discriminator whether the first head information and the second head information are real or fake; with information on n consecutive bars extracted for each track from the multi-track, training an encoder and a decoder by setting the first head information and the n consecutive bars of information as input data of the encoder, setting first head feature information and n-bar feature information as output data of the encoder, setting the first head feature information and the n-bar feature information as input data of the decoder, setting n-bar restoration information, a track type, and a genre type as output data of the decoder, and calculating the differences between the n-bar information extracted from the multi-track and the n-bar restoration information, and in the track type and genre type; training a GAN-based bar generator learning model by combining the n-bar feature information, the track and genre label condition vector, and the bar order as input data of the generator of the model, setting a first bar as output data of the generator, and, with a second bar extracted for each track from the multi-track, determining whether the first bar and the second bar are real or fake; generating head information by applying a noise vector and a track and genre label condition vector for the music to be generated to the pre-trained head information generator learning model; extracting head feature information and n-bar feature information by applying the head information to the pre-trained encoder; and generating music of a genre specified in advance by the user, bar by bar, by combining the n-bar feature information, the track and genre label condition vector, and the bar order and applying them to the generator of the pre-trained bar generator learning model.

Description

Apparatus for generating multi-track music based on generative adversarial neural network and method for generating multi-track music of multiple genres using the same

The present invention relates to an apparatus for generating multi-track music based on a generative adversarial network and a method for generating multi-track music of multiple genres using the same, and more particularly, to such an apparatus and method that generate music of a pre-specified genre by applying a noise vector and a track and genre label condition vector to a pre-trained head information generator learning model, a pre-trained encoder and decoder, and a pre-trained bar generator learning model.

In traditional music composition, a composer generally draws on specialized musical knowledge and combines inspiration with creative experience to create music.

Here, music is created using sound sources that include various tracks such as piano, guitar, bass, and drums, and using MIDI (Musical Instrument Digital Interface).

In addition, patterns that repeat for each instrument can be stored in a database so that music can be created according to predetermined patterns.

Meanwhile, as computer technology has advanced, various music-related technologies have been developed.

In particular, artificial intelligence such as the generative adversarial network (GAN) algorithm has made it possible to generate music of a similar style based on information input by a user.

Such music generation technology can be applied in various ways. For example, it can automatically generate music from a melody the user wants, using the user's voice or a song that feels similar to the music to be created; or a composer can automatically generate various forms of music based on a motif and then use the generated music as an aid in composing a complete piece.

However, because songs with a similar feel are generated from patterns that repeat for each instrument, the structure of the generated music lacks variety.

In addition, the deep learning models used for music generation have structural limitations that make it difficult to generate music longer than one minute.

The background art of the present invention is disclosed in Korean Patent Application Publication No. 10-2021-0093223 (published July 27, 2021).

The technical problem to be solved by the present invention is to provide an apparatus for generating multi-track music based on a generative adversarial network, which generates music of a pre-specified genre by applying a noise vector and a track and genre label condition vector to a pre-trained head information generator learning model, a pre-trained encoder and decoder, and a pre-trained bar generator learning model, and a method for generating multi-track music of multiple genres using the same.

According to an embodiment of the present invention for solving this technical problem, a method for generating multi-track music of multiple genres using a multi-track music generating apparatus includes: upon receiving a MIDI file containing a multi-track, extracting first head information from the multi-track of the MIDI file using a preprocessor; training a GAN-based head information generator learning model by setting a noise vector and a track and genre label condition vector as input data of the generator of the model, setting second head information as output data of the generator, and determining through a discriminator whether the first head information and the second head information are real or fake; with information on n consecutive bars extracted for each track from the multi-track, training an encoder and a decoder by setting the first head information and the n consecutive bars of information as input data of the encoder, setting first head feature information and n-bar feature information as output data of the encoder, setting the first head feature information and the n-bar feature information as input data of the decoder, setting n-bar restoration information, a track type, and a genre type as output data of the decoder, and calculating the differences between the n-bar information extracted from the multi-track and the n-bar restoration information, and in the track type and genre type; training a GAN-based bar generator learning model by combining the n-bar feature information, the track and genre label condition vector, and the bar order as input data of the generator of the model, setting a first bar as output data of the generator, and, with a second bar extracted for each track from the multi-track, determining whether the first bar and the second bar are real or fake; generating head information by applying a noise vector and a track and genre label condition vector for the music to be generated to the pre-trained head information generator learning model; extracting head feature information and n-bar feature information by applying the head information to the pre-trained encoder; and generating music of a genre specified in advance by the user, bar by bar, by combining the n-bar feature information, the track and genre label condition vector, and the bar order and applying them to the generator of the pre-trained bar generator learning model.

The extracting of the first head information from the multi-track of the MIDI file using the preprocessor may include: receiving a MIDI file whose music genre has been classified; extracting at least one of a drum track, a piano track, a bass track, and a guitar track from the multi-track; and applying each of the extracted tracks to the preprocessor to extract first head information corresponding to each track.

The training of the head information generator learning model may include: applying the noise vector and the track and genre label condition vector to the head information generator learning model to output second head information based on the noise vector and the track and genre label condition vector; setting the first head information and the second head information as input data of the discriminator of the head information generator learning model, and setting whether the first head information and the second head information are real or fake as output data of the discriminator; and training the head information generator learning model to determine whether the first head information and the second head information are real or fake.

The training of the encoder and the decoder may further include setting the ground-truth track and genre label condition vector of the MIDI file as ground-truth training data for the decoder, applying the first head feature information and the n-bar feature information to the decoder to generate a restored track and genre label condition vector, and further training the encoder and the decoder by calculating the difference between the ground-truth track and genre label condition vector and the restored track and genre label condition vector.

The training of the bar generator learning model may include: applying a noise vector and a track and genre label condition vector to the head information generator learning model to output second head information; inputting the second head information to the encoder to output second head feature information and n-bar feature information; combining the n-bar feature information, the track and genre label condition vector, and the bar order and applying them to the bar generator learning model to output a first bar; with a second bar corresponding to each track extracted from the multi-track of the MIDI file, applying the first bar and the second bar to the discriminator of the bar generator learning model to train the bar generator to determine whether the first bar and the second bar are real or fake; and joining consecutive first bars to generate a first track.

According to another embodiment of the present invention, an apparatus for generating multi-track music based on a generative adversarial network includes: a head information extraction unit that, upon receiving a MIDI file containing a multi-track, extracts first head information from the multi-track of the MIDI file using a preprocessor; a head information generator learning unit that trains a GAN-based head information generator learning model by setting a noise vector and a track and genre label condition vector as input data of the generator of the model, setting second head information as output data of the generator, and determining through a discriminator whether the first head information and the second head information are real or fake; an encoder and decoder learning unit that, with information on n consecutive bars extracted for each track from the multi-track, trains an encoder and a decoder by setting the first head information and the n consecutive bars of information as input data of the encoder, setting first head feature information and n-bar feature information as output data of the encoder, setting the first head feature information and the n-bar feature information as input data of the decoder, setting n-bar restoration information, a track type, and a genre type as output data of the decoder, and calculating the differences between the n-bar information extracted from the multi-track and the n-bar restoration information, and in the track type and genre type; a bar generator learning unit that trains a GAN-based bar generator learning model by combining the n-bar feature information, the track and genre label condition vector, and the bar order as input data of the generator of the model, setting a first bar as output data of the generator, and, with a second bar extracted for each track from the multi-track, determining whether the first bar and the second bar are real or fake; a generation unit that generates head information by applying a noise vector and a track and genre label condition vector for the music to be generated to the pre-trained head information generator learning model; an extraction unit that extracts head feature information and n-bar feature information by applying the head information to the pre-trained encoder; and a control unit that generates music of a genre specified in advance by the user, bar by bar, by combining the n-bar feature information, the track and genre label condition vector, and the bar order and applying them to the generator of the pre-trained bar generator learning model.

As described above, according to the present invention, because the learning models are trained on various genres and various tracks, a wide variety of music can be generated, and a complete piece of music can be generated through the GAN-based learning models.

FIG. 1 is a diagram illustrating the configuration of an apparatus for generating multi-track music based on a generative adversarial network according to an embodiment of the present invention.
FIG. 2 is a flowchart illustrating a process of training the multi-track music generating apparatus according to an embodiment of the present invention.
FIG. 3 is a diagram for explaining step S220 of FIG. 2.
FIG. 4 is a flowchart for explaining step S230 of FIG. 2.
FIG. 5 is a diagram for explaining step S230 of FIG. 2.
FIG. 6 is a flowchart for explaining step S240 of FIG. 2.
FIG. 7 is a diagram for explaining step S240 of FIG. 2.
FIG. 8 is a flowchart for explaining step S250 of FIG. 2.
FIG. 9 is a diagram for explaining step S250 of FIG. 2.
FIG. 10 is a flowchart illustrating a method for generating multi-track music of multiple genres using the multi-track music generating apparatus according to an embodiment of the present invention.
FIG. 11 is a diagram for explaining FIG. 10.

Hereinafter, embodiments of the present invention will be described in detail with reference to the accompanying drawings so that those of ordinary skill in the art to which the present invention pertains can readily practice them. However, the present invention may be embodied in many different forms and is not limited to the embodiments described herein. In the drawings, parts irrelevant to the description are omitted in order to describe the present invention clearly, and similar reference numerals denote similar parts throughout the specification.

Throughout the specification, when a part is said to "include" a component, this means that it may further include other components, rather than excluding them, unless specifically stated otherwise.

Embodiments of the present invention will now be described in detail with reference to the accompanying drawings so that those of ordinary skill in the art to which the present invention pertains can readily practice them.

Hereinafter, the configuration of the apparatus 100 for generating multi-track music based on a generative adversarial network according to an embodiment of the present invention will be described with reference to FIG. 1.

FIG. 1 is a diagram illustrating the configuration of an apparatus for generating multi-track music based on a generative adversarial network according to an embodiment of the present invention.

As shown in FIG. 1, the apparatus 100 for generating multi-track music based on a generative adversarial network according to an embodiment of the present invention includes a head information extraction unit 110, a head information generator learning unit 120, an encoder and decoder learning unit 130, a bar generator learning unit 140, a generation unit 150, an extraction unit 160, and a control unit 170.

First, upon receiving a MIDI file containing a multi-track, the head information extraction unit 110 extracts first head information from the multi-track of the MIDI file using a preprocessor.

Here, the multi-track includes at least one of a drum track, a piano track, a bass track, and a guitar track.

The head information extraction unit 110 inputs each of the drum track, piano track, bass track, and guitar track of the MIDI file to the preprocessor and extracts first head information corresponding to each track.

Next, the head information generator learning unit 120 sets a noise vector and a track and genre label condition vector as input data of the generator of a head information generator learning model based on a generative adversarial network (hereinafter referred to as "GAN"), and sets second head information as output data of the generator of the head information generator learning model.

The head information generator learning unit 120 then trains the head information generator learning model by determining, through a discriminator, whether the first head information and the second head information are real or fake.

That is, the head information generator learning unit 120 applies the noise vector and the track and genre label condition vector to the head information generator learning model and outputs second head information based on the noise vector and the track and genre label condition vector.
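
By way of a non-limiting illustration, the generator input described above may be sketched as follows. The sketch assumes that the track and genre label condition vector is formed by joining two one-hot codes and that the noise is Gaussian; the specification states neither. The genre names, the noise dimension, and the function names `one_hot`, `condition_vector`, and `generator_input` are hypothetical.

```python
import random

TRACKS = ["drum", "piano", "bass", "guitar"]  # track types named in the description
GENRES = ["genre_a", "genre_b", "genre_c"]     # hypothetical genre labels


def one_hot(label, vocabulary):
    """Return a one-hot list marking the position of `label` in `vocabulary`."""
    return [1.0 if item == label else 0.0 for item in vocabulary]


def condition_vector(track, genre):
    """Assumed encoding: track one-hot followed by genre one-hot."""
    return one_hot(track, TRACKS) + one_hot(genre, GENRES)


def generator_input(track, genre, noise_dim=8, rng=None):
    """Concatenate a Gaussian noise vector with the condition vector."""
    rng = rng or random.Random()
    noise = [rng.gauss(0.0, 1.0) for _ in range(noise_dim)]
    return noise + condition_vector(track, genre)


# 8 noise values + 4 track slots + 3 genre slots
z = generator_input("piano", "genre_b", noise_dim=8, rng=random.Random(0))
print(len(z))
```

The generator of the head information generator learning model would receive a vector such as `z` and output the second head information.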

The head information generator learning unit 120 then sets the first head information or the second head information as input data of the discriminator of the head information generator learning model, and sets whether the first head information and the second head information are real or fake as output data of the discriminator.

Next, the head information generator learning unit 120 trains the head information generator learning model to determine whether the first head information and the second head information are real or fake.
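
The real-or-fake determination may be illustrated, purely as an example, by the following sketch. The specification does not state a loss function; binary cross-entropy, a common choice for GAN training, is assumed here. The function names `bce`, `discriminator_loss`, and `generator_loss` are hypothetical, and `p_real` and `p_fake` stand for the discriminator's output probabilities for the first head information (extracted from the MIDI file) and the second head information (generated), respectively.

```python
import math


def bce(prob, target, eps=1e-7):
    """Binary cross-entropy for one probability and a 0/1 target."""
    prob = min(max(prob, eps), 1.0 - eps)  # clamp to avoid log(0)
    return -(target * math.log(prob) + (1.0 - target) * math.log(1.0 - prob))


def discriminator_loss(p_real, p_fake):
    """The discriminator should output 1 for real and 0 for generated data."""
    return bce(p_real, 1.0) + bce(p_fake, 0.0)


def generator_loss(p_fake):
    """Assumed non-saturating form: the generator wants its output judged real."""
    return bce(p_fake, 1.0)


# A discriminator that separates real from fake well has a lower loss
# than one that outputs 0.5 for everything.
print(discriminator_loss(0.9, 0.1) < discriminator_loss(0.5, 0.5))
```

Under this assumption, the discriminator and the generator would be updated alternately, each minimizing its own loss.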

Next, the encoder and decoder learning unit 130 sets the first head information and the n consecutive bars of information as input data of the encoder, and sets first head feature information and n-bar feature information as output data of the encoder.

The encoder and decoder learning unit 130 also sets the first head feature information and the n-bar feature information as input data of the decoder, and sets n-bar restoration information, a track type, and a genre type as output data of the decoder.

The encoder and decoder learning unit 130 then trains the encoder and the decoder by calculating the differences between the n-bar information extracted from the multi-track and the n-bar restoration information, and in the track type and genre type.

More specifically, the encoder and decoder learning unit 130 extracts information on n consecutive bars for each track from the multi-track, and applies the first head information and the n consecutive bars of information to the encoder to output the first head feature information and the n-bar feature information.

Here, the encoder and decoder learning unit 130 applies the first head feature information and the n-bar feature information to the decoder to output the n-bar restoration information, and trains the encoder and the decoder by comparing the actual n-bar information extracted from the multi-track with the n-bar restoration information and calculating the difference.

The encoder and decoder learning unit 130 also sets the ground-truth track and genre label condition vector of the MIDI file as ground-truth training data for the decoder, and applies the first head feature information and the n-bar feature information to the decoder to generate a restored track type and genre condition vector.

Next, the encoder and decoder learning unit 130 further trains the encoder and the decoder by calculating the difference between the ground-truth track and genre label condition vector and the restored track and genre label condition vector.
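
The differences used to train the encoder and the decoder may, for example, be combined into a single training objective as sketched below. The specification says only that the differences are calculated; the use of mean squared error for the bar restoration, cross-entropy for the track type and genre type, and equal weighting of the three terms are all assumptions. The function names are hypothetical.

```python
import math


def mean_squared_error(original, restored):
    """Average squared difference over all values in all bars."""
    total, count = 0.0, 0
    for bar, restored_bar in zip(original, restored):
        for x, y in zip(bar, restored_bar):
            total += (x - y) ** 2
            count += 1
    return total / count


def cross_entropy(probs, true_index, eps=1e-7):
    """Negative log-probability assigned to the ground-truth class."""
    return -math.log(max(probs[true_index], eps))


def autoencoder_loss(bars, restored_bars,
                     track_probs, track_index,
                     genre_probs, genre_index):
    """Restoration difference plus track-type and genre-type differences."""
    return (mean_squared_error(bars, restored_bars)
            + cross_entropy(track_probs, track_index)
            + cross_entropy(genre_probs, genre_index))


# Toy values: two bars of four time steps, four track types, three genres.
bars = [[0.0, 1.0, 1.0, 0.0], [1.0, 0.0, 0.0, 1.0]]
restored = [[0.1, 0.9, 0.8, 0.0], [1.0, 0.2, 0.0, 0.7]]
loss = autoencoder_loss(bars, restored,
                        [0.7, 0.1, 0.1, 0.1], 0,
                        [0.2, 0.6, 0.2], 1)
print(loss > 0.0)
```

A perfect restoration with fully confident, correct track and genre outputs would drive this objective to zero.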

Next, the bar generator learning unit 140 combines the n-bar feature information, the track and genre label condition vector, and the bar order, sets the result as input data of the generator of a GAN-based bar generator learning model, and sets a first bar as output data of the generator of the bar generator learning model.

With a second bar extracted for each track from the multi-track, the bar generator learning unit 140 then trains the bar generator learning model by determining whether the first bar and the second bar are real or fake.

That is, the bar generator learning unit 140 applies a noise vector and a track and genre label condition vector to the head information generator learning model to output second head information, and inputs the second head information to the encoder to output second head feature information and n-bar feature information.

Next, the bar generator learning unit 140 combines the n-bar feature information, the track and genre label condition vector, and the bar order, applies them to the bar generator learning model, and outputs a first bar.

With a second bar corresponding to each track extracted from the multi-track of the MIDI file, the bar generator learning unit 140 then applies the first bar and the second bar to the discriminator of the bar generator learning model to train the bar generator learning model to determine whether the first bar and the second bar are real or fake, and joins consecutive first bars to generate a first track.

Next, the generation unit 150 generates head information by applying the noise vector targeted for music generation, together with the track and genre label condition vectors, to the pre-trained head information generator learning model.

Here, the generation unit 150 may generate the head information, based on the noise vector targeted for music generation and the track and genre label condition vectors, through the head information generator learning model already trained by the head information generator learning unit 120.

Next, the extraction unit 160 extracts head feature information and n-bar feature information by applying the head information to the pre-trained encoder.

Here, the extraction unit 160 uses the encoder already trained by the encoder and decoder learning unit 130 to extract the head feature information and the n-bar feature information from the noise-vector-based head information.

Next, the control unit 170 combines the head feature information and the n-bar feature information with the track and genre label condition vectors and the bar order, and applies the result to the generator of the pre-trained bar generator learning model to generate music of the genre specified in advance by the user, bar by bar.

The genres may include hip-hop, ballad, dance, traditional music, classical, reggae, pop, folk, and the like.

Hereinafter, the process of training the multi-track music generating apparatus 100 according to an embodiment of the present invention is described with reference to FIGS. 2 to 9.

FIG. 2 is a flowchart illustrating the process of training the multi-track music generating apparatus according to an embodiment of the present invention.

First, the multi-track music generating apparatus 100 according to an embodiment of the present invention receives a MIDI file containing a multi-track (S210).

Here, the multi-track includes a drum track, a piano track, a bass track, and a guitar track.

Next, the multi-track music generating apparatus 100 according to an embodiment of the present invention uses a preprocessor to extract first head information from the multi-track of the MIDI file (S220).

FIG. 3 is a diagram illustrating step S220 of FIG. 2.

As shown in FIG. 3, the multi-track music generating apparatus 100 according to an embodiment of the present invention extracts at least one of a drum track (Track_i^D), a piano track (Track_i^P), a bass track (Track_i^B), and a guitar track (Track_i^G) from the multi-track.

Here, the MIDI file consists of multiple tracks, which are classified into drum tracks (Track_i^D), piano tracks (Track_i^P), bass tracks (Track_i^B), and guitar tracks (Track_i^G), and each track may consist of a plurality of bars.

The multi-track music generating apparatus 100 then extracts the first head information (H_i^D, H_i^P, H_i^B, H_i^G) corresponding to each track through the preprocessor.

Here, the head information includes statistical features such as the mean pitch, main scale, maximum pitch, minimum pitch, and mean duration.

That is, the multi-track music generating apparatus 100 applies each of the tracks, including the drum track (Track_i^D), piano track (Track_i^P), bass track (Track_i^B), and guitar track (Track_i^G), to the preprocessor to extract the first head information (H_i^D, H_i^P, H_i^B, H_i^G) corresponding to each track.
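The statistical head information described above can be illustrated with a minimal sketch. The note representation (a MIDI pitch paired with a duration in beats), the function name `extract_head_info`, and the use of the most frequent pitch class as a stand-in for the main scale are all assumptions made for illustration; the specification does not define how the preprocessor computes these values.

```python
from collections import Counter
from statistics import mean

# Hypothetical note representation: (MIDI pitch 0-127, duration in beats).
Note = tuple[int, float]


def extract_head_info(notes: list[Note]) -> dict[str, float]:
    """Summarise one track as head information (statistical features)."""
    if not notes:
        raise ValueError("track has no notes")
    pitches = [pitch for pitch, _ in notes]
    durations = [duration for _, duration in notes]
    # Assumption: the "main scale" is approximated by the most frequent
    # pitch class (0 = C, 1 = C#, ..., 11 = B).
    main_pitch_class = Counter(p % 12 for p in pitches).most_common(1)[0][0]
    return {
        "mean_pitch": mean(pitches),
        "max_pitch": max(pitches),
        "min_pitch": min(pitches),
        "mean_duration": mean(durations),
        "main_pitch_class": main_pitch_class,
    }


# Example: a short piano track with four notes.
piano_track = [(60, 1.0), (64, 0.5), (67, 0.5), (60, 2.0)]
head = extract_head_info(piano_track)
```

Applying the same function to each of the drum, piano, bass, and guitar tracks would yield one head information record per track.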

Next, the multi-track music generating apparatus 100 according to an embodiment of the present invention trains the head information generator learning model using the first head information (S230).

Here, the multi-track music generating apparatus 100 sets the noise vector and the track and genre label condition vectors as the input data of the generator of the GAN-based head information generator learning model, and sets second head information as the output data of that generator.

The multi-track music generating apparatus 100 then trains the head information generator learning model by using the discriminator to determine whether the first head information and the second head information are real or fake.

FIG. 4 is a flowchart illustrating step S230 of FIG. 2, and FIG. 5 is a diagram illustrating step S230 of FIG. 2.

As shown in FIG. 4, the multi-track music generating apparatus 100 according to an embodiment of the present invention applies the noise vector and the track and genre label condition vectors to the head information generator learning model and outputs second head information based on those vectors (S231).

More specifically, as shown in FIG. 5, the multi-track music generating apparatus 100 applies the noise vector (n), the track label condition vectors (t^D, t^P, t^B, t^G), and the genre label condition vector (g) to the generator (G^R) of the head information generator to generate the noise-vector-based second head information (H'^D, H'^P, H'^B, H'^G).

The multi-track music generating apparatus 100 according to an embodiment of the present invention then sets the first head information and the second head information as the input data of the discriminator of the head information generator learning model, and sets the authenticity of the first head information and the second head information as the output data of that discriminator (S232).

Here, as shown in FIG. 5, the multi-track music generating apparatus 100 applies the generated second head information (H'^D, H'^P, H'^B, H'^G) and the first head information (H_i^D, H_i^P, H_i^B, H_i^G) extracted in step S220 to the discriminator (D^R), which outputs whether the first head information and the second head information are real or fake.

The multi-track music generating apparatus 100 trains the head information generator learning model to discriminate the authenticity of the first head information and the second head information.

Here, as shown in FIG. 5, the multi-track music generating apparatus 100 may train the head information generator learning model to calculate an authenticity rate for the second head information and to judge its authenticity from that rate.

More specifically, the closer the authenticity rate converges to 1, the more similar the multi-track music generating apparatus 100 may judge the first head information (H_i^D, H_i^P, H_i^B, H_i^G) and the second head information (H'^D, H'^P, H'^B, H'^G) to be.

In other words, the head information generator learning model is trained as follows: the noise vector and the track and genre label condition vectors are applied to the head information generator learning model to output second head information based on those vectors; the first head information and the second head information are applied to the discriminator of the head information generator learning model, which outputs whether each is real or fake; and the model learns to discriminate the authenticity of the first head information and the second head information.
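The conditional input and the adversarial objective of step S230 can be sketched as follows. This is an illustration only: the one-hot encoding of the labels, the genre subset, the vector sizes, and the standard binary cross-entropy GAN losses are assumptions, since the specification states only that the discriminator outputs an authenticity rate that approaches 1 as generated head information resembles real head information.

```python
import math
import random

TRACKS = ["drum", "piano", "bass", "guitar"]
GENRES = ["hiphop", "ballad", "dance", "classical"]  # illustrative subset


def one_hot(index: int, size: int) -> list[float]:
    """Return a one-hot label condition vector."""
    return [1.0 if i == index else 0.0 for i in range(size)]


def generator_input(noise: list[float], track: str, genre: str) -> list[float]:
    """Concatenate the noise vector with the track and genre condition vectors."""
    return (
        noise
        + one_hot(TRACKS.index(track), len(TRACKS))
        + one_hot(GENRES.index(genre), len(GENRES))
    )


def discriminator_loss(score_real: float, score_fake: float) -> float:
    """Assumed BCE loss: real (first) head info should score near 1,
    generated (second) head info near 0."""
    eps = 1e-12
    return -(math.log(score_real + eps) + math.log(1.0 - score_fake + eps))


def generator_loss(score_fake: float) -> float:
    """Assumed BCE loss: lower when the discriminator scores the
    generated head info close to 1."""
    return -math.log(score_fake + 1e-12)


rng = random.Random(0)
noise = [rng.gauss(0.0, 1.0) for _ in range(8)]
z = generator_input(noise, "piano", "ballad")  # 8 + 4 + 4 = 16 values
```

Under these assumptions, the generator loss falls as the authenticity rate of the second head information rises toward 1, which matches the convergence behaviour described above.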

Next, with n consecutive bars of information extracted for each track from the multi-track, the multi-track music generating apparatus 100 according to an embodiment of the present invention trains the encoder and decoder using the first head information (S240).

Here, the multi-track music generating apparatus 100 sets the first head information and the n consecutive bars of information as the input data of the encoder, and sets the first head feature information and the n-bar feature information as the output data of the encoder.

The multi-track music generating apparatus 100 then sets the first head feature information and the n-bar feature information as the input data of the decoder, and sets the n-bar restoration information, the track type, and the genre type as the output data of the decoder.

Next, the multi-track music generating apparatus 100 trains the encoder and decoder by calculating the differences between the n-bar information extracted from the multi-track and the n-bar restoration information, and in the track type and genre type.

FIG. 6 is a flowchart illustrating step S240 of FIG. 2, and FIG. 7 is a diagram illustrating step S240 of FIG. 2.

As shown in FIG. 6, the multi-track music generating apparatus 100 according to an embodiment of the present invention extracts n consecutive bars of information for each track from the multi-track (S241).

Here, the multi-track may include n consecutive bars of information for each of the drum track, piano track, bass track, and guitar track.

Next, the multi-track music generating apparatus 100 according to an embodiment of the present invention applies the first head information and the n consecutive bars of information to the encoder to output first head feature information and n-bar feature information (S242).

More specifically, as shown in FIG. 7, the multi-track music generating apparatus 100 applies the first head information (H_i^D, H_i^P, H_i^B, H_i^G) extracted in step S220 to the encoder to extract compressed feature information (h_H^E, h_i^E).

Here, the feature information (h_H^E, h_i^E) may include the first head feature information and the n-bar feature information.

Next, the multi-track music generating apparatus 100 according to an embodiment of the present invention applies the first head feature information and the n-bar feature information to the decoder to output n-bar restoration information, the track type, and the genre type (S243).

More specifically, as shown in FIG. 7, the multi-track music generating apparatus 100 applies the feature information (h_H^E, h_i^E), which includes the first head feature information and the n-bar feature information, to the decoder to extract the n-bar restoration information, the track type, and the genre type.

Next, the multi-track music generating apparatus 100 according to an embodiment of the present invention trains the encoder and decoder by calculating the differences between the n-bar information extracted from the multi-track and the n-bar restoration information, and in the track type and genre type (S244).

More specifically, as shown in FIG. 7, the multi-track music generating apparatus 100 applies the first head restoration information and the n-bar restoration information to a dense layer to restore the n-th bar information (d_i,n, p_i,n, b_i,n, g_i,n).

Here, d_i,n is the n-th bar of the i-th drum track, p_i,n is the n-th bar of the i-th piano track, b_i,n is the n-th bar of the i-th bass track, and g_i,n is the n-th bar of the i-th guitar track.

It is assumed that the n-th bar ranges from the 1st bar to the 8th bar, that is, n = 1, …, 8.

For example, the multi-track music generating apparatus 100 may apply the first head information (H_i^D, H_i^P, H_i^B, H_i^G) to the dense layer to generate first bar restoration information containing the 1st bar information (d_i,1, p_i,1, b_i,1, g_i,1) corresponding to each of the i-th drum track, piano track, bass track, and guitar track.

The multi-track music generating apparatus 100 then passes the first bar restoration information (d_i,1, p_i,1, b_i,1, g_i,1) through the encoder to extract compressed feature information (h₁^E), and applies the feature information (h₁^E) to the decoder to extract the restored second bar restoration information (h₂^D).

Next, the multi-track music generating apparatus 100 applies the restored second bar restoration information to the dense layer to generate second bar information (d_i,2, p_i,2, b_i,2, g_i,2) containing the 2nd bar corresponding to each of the i-th drum track, piano track, bass track, and guitar track.

In the same way, the multi-track music generating apparatus 100 uses the encoder, the decoder, and the dense layer to generate eighth bar information (d_i,8, p_i,8, b_i,8, g_i,8) containing the 8th bar of the i-th drum track, i-th piano track, i-th bass track, and i-th guitar track.
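The bar-by-bar restoration described above (dense layer, then encoder, decoder, and dense layer again for each following bar) can be sketched as a simple loop. The function `reconstruct_bars` and the toy stand-ins for the encoder, decoder, and dense layer are hypothetical; in the apparatus these would be trained network components, and their internal structure is not given in this section.

```python
from typing import Callable

Vector = list[float]


def reconstruct_bars(
    head_info: Vector,
    encoder: Callable[[Vector], Vector],
    decoder: Callable[[Vector], Vector],
    dense: Callable[[Vector], Vector],
    n_bars: int = 8,
) -> list[Vector]:
    """Restore n_bars bars in sequence, each one derived from the previous bar."""
    bar = dense(head_info)  # 1st bar comes from the head information
    bars = [bar]
    for _ in range(n_bars - 1):
        feature = encoder(bar)       # compressed feature, e.g. h_1^E
        restored = decoder(feature)  # restoration information, e.g. h_2^D
        bar = dense(restored)        # next bar information
        bars.append(bar)
    return bars


# Toy stand-ins (assumptions) so the control flow can be executed.
toy_encoder = lambda v: [x * 0.5 for x in v]
toy_decoder = lambda h: [x * 2.0 for x in h]
toy_dense = lambda v: [x + 1.0 for x in v]

bars = reconstruct_bars([0.0, 1.0], toy_encoder, toy_decoder, toy_dense)
```

With these toy functions each bar simply increments the previous one, which makes it easy to confirm that eight bars are produced in order.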

The multi-track music generating apparatus 100 then compares each of the generated first to eighth bar information with the actual bar information of the MIDI file to judge the authenticity of the generated bar information.

For example, the multi-track music generating apparatus 100 compares the actual bar information, which contains the 1st bar of the i-th drum track, i-th piano track, i-th bass track, and i-th guitar track in the MIDI file, with the first bar information generated by the dense layer, which contains the 1st bar information (d_i,1, p_i,1, b_i,1, g_i,1) corresponding to each of the i-th drum track, piano track, bass track, and guitar track, and calculates the difference between the first bar information and the actual bar information.

Likewise, the multi-track music generating apparatus 100 compares the actual bar information, which contains the 2nd bar of the i-th drum track, i-th piano track, i-th bass track, and i-th guitar track in the MIDI file, with the second bar information restored by the dense layer, and calculates the difference between the second bar information and the actual bar information.

In the same way, the multi-track music generating apparatus 100 may compare the generated first to eighth bar information with the bar information in the actual MIDI file to determine authenticity.

In addition, the multi-track music generating apparatus 100 sets the ground-truth track and genre label condition vectors of the MIDI file as the target data for training the decoder, and applies the first head feature information and the n-bar feature information to the decoder to generate restored track and genre label condition vectors.

Here, the multi-track music generating apparatus 100 further trains the encoder and decoder by calculating the difference between the ground-truth track and genre label condition vectors and the restored track and genre label condition vectors.

That is, the multi-track music generating apparatus 100 trains the encoder and decoder by calculating the differences between the n-bar information extracted from the multi-track and the n-bar restoration information, and in the track type and genre type.
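The training signal for the encoder and decoder in step S244 combines a bar reconstruction difference with track-type and genre-type differences. The sketch below shows one plausible way to combine them; the choice of mean squared error for the bars, cross-entropy for the labels, and an unweighted sum are assumptions, as the specification says only that the differences are calculated.

```python
import math


def mean_squared_error(target: list[float], restored: list[float]) -> float:
    """Average squared difference between a real bar and its restoration."""
    return sum((t - r) ** 2 for t, r in zip(target, restored)) / len(target)


def cross_entropy(true_index: int, probabilities: list[float]) -> float:
    """Penalty for assigning low probability to the ground-truth label."""
    return -math.log(probabilities[true_index] + 1e-12)


def autoencoder_loss(
    real_bars: list[list[float]],
    restored_bars: list[list[float]],
    track_index: int,
    track_probs: list[float],
    genre_index: int,
    genre_probs: list[float],
) -> float:
    """Assumed total loss: bar reconstruction + track type + genre type."""
    reconstruction = sum(
        mean_squared_error(a, b) for a, b in zip(real_bars, restored_bars)
    ) / len(real_bars)
    return (
        reconstruction
        + cross_entropy(track_index, track_probs)
        + cross_entropy(genre_index, genre_probs)
    )


real = [[0.0, 1.0], [1.0, 0.0]]
# Perfect restoration with confident, correct labels.
good = autoencoder_loss(real, real, 1, [0.0, 1.0, 0.0, 0.0], 0, [1.0, 0.0])
# Inverted bars with a wrong track guess and an uncertain genre guess.
bad = autoencoder_loss(
    real, [[1.0, 0.0], [0.0, 1.0]], 1, [0.7, 0.1, 0.1, 0.1], 0, [0.5, 0.5]
)
```

A perfect restoration with correct labels gives a loss close to zero, while poor restoration or wrong labels increases it, so minimizing this quantity trains the encoder and decoder jointly.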

Next, the multi-track music generating apparatus 100 according to an embodiment of the present invention combines the n-bar feature information, the track and genre label condition vectors, and the bar order to train the GAN-based bar generator learning model (S250).

Here, the multi-track music generating apparatus 100 combines the n-bar feature information, the track and genre label condition vectors, and the bar order, sets the combined result as the input data of the generator of the GAN-based bar generator learning model, and sets a first bar as the output data of that generator.

Then, with a second bar extracted for each track from the multi-track, the multi-track music generating apparatus 100 trains the bar generator learning model by discriminating whether the first bar and the second bar are real or fake.

FIG. 8 is a flowchart illustrating step S250 of FIG. 2, and FIG. 9 is a diagram illustrating step S250 of FIG. 2.

More specifically, as shown in FIG. 8, the multi-track music generating apparatus 100 according to an embodiment of the present invention applies the noise vector and the track and genre label condition vectors to the head information generator learning model to output second head information (S251).

Here, as shown in FIG. 9, the multi-track music generating apparatus 100 applies the noise vector (n), the track label condition vectors (t^D, t^P, t^B, t^G), and the genre label condition vector (g) to the pre-trained head information generator learning model to generate the second head information (H_i^D, H_i^P, H_i^B, H_i^G).

That is, the multi-track music generating apparatus 100 may output the second head information (H_i^D, H_i^P, H_i^B, H_i^G) corresponding to each of the drum track, piano track, bass track, and guitar track.

Next, the multi-track music generating apparatus 100 according to an embodiment of the present invention applies the second head information (H_i^D, H_i^P, H_i^B, H_i^G) to the encoder to output second head feature information and n-bar feature information (S252).

Here, as shown in FIG. 9, the multi-track music generating apparatus 100 extracts the feature information [h₁^D, …, h₈^D] for the head information (H_i^D) and bar information (d_i,n) corresponding to the drum track, and extracts the feature information [h₁^P, …, h₈^P] for the head information (H_i^P) and bar information (p_i,n) corresponding to the piano track.

The multi-track music generating apparatus 100 also extracts the feature information [h₁^B, …, h₈^B] for the head information (H_i^B) and bar information (b_i,n) corresponding to the bass track, and extracts the feature information [h₁^G, …, h₈^G] for the head information (H_i^G) and bar information (g_i,n) corresponding to the guitar track.

Next, the multi-track music generating apparatus 100 according to an embodiment of the present invention combines the n-bar feature information, the track and genre label condition vectors, and the bar order, and applies the result to the bar generator learning model to output the first bar (S253).

Here, the multi-track music generating apparatus 100 combines the feature information for the second head information with the bar order (p1, p2, …, p3), the track label condition vectors (t^D, t^P, t^B, t^G), and the genre label condition vector (g).

The multi-track music generating apparatus 100 then applies the combined feature information to the generator (G_bar) of the bar generator learning model to output the first bar.
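The combination step of S253 can be sketched as a per-bar concatenation. The one-hot encoding of the bar order and track label, the helper name `bar_generator_inputs`, and the feature sizes are assumptions for illustration; the specification states only that the feature information, bar order, track label condition vector, and genre label condition vector are combined before being passed to the generator (G_bar).

```python
TRACKS = ("drum", "piano", "bass", "guitar")


def one_hot(index: int, size: int) -> list[float]:
    """Return a one-hot vector of the given size."""
    return [1.0 if i == index else 0.0 for i in range(size)]


def bar_generator_inputs(
    bar_features: list[list[float]],
    track: str,
    genre_vector: list[float],
) -> list[list[float]]:
    """Build one generator input per bar:
    [bar feature | bar order | track label | genre label]."""
    n_bars = len(bar_features)
    track_vector = one_hot(TRACKS.index(track), len(TRACKS))
    return [
        feature + one_hot(order, n_bars) + track_vector + genre_vector
        for order, feature in enumerate(bar_features)
    ]


# Eight toy 2-dimensional bar features for a bass track, 3 candidate genres.
features = [[0.1 * k, 0.2 * k] for k in range(8)]
inputs = bar_generator_inputs(features, "bass", [0.0, 1.0, 0.0])
```

Each resulting vector has 2 + 8 + 4 + 3 = 17 entries in this toy setting, so the generator sees which bar position, track, and genre it is expected to produce.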

Here, the first bars include the bars corresponding to the drum track (bar₁'^D, bar₂'^D, …, bar₈'^D), the bars corresponding to the piano track (bar₁'^P, bar₂'^P, …, bar₈'^P), the bars corresponding to the bass track (bar₁'^B, bar₂'^B, …, bar₈'^B), and the bars corresponding to the guitar track (bar₁'^G, bar₂'^G, …, bar₈'^G).

Next, with the second bar corresponding to each track extracted from the multi-track of the MIDI file, the multi-track music generating apparatus 100 according to an embodiment of the present invention applies the first bar and the second bar to the discriminator (D_bar) of the bar generator learning model and trains the bar generator to judge whether the first bar and the second bar are real or fake (S254).

Here, as shown in FIG. 9, the second bars, taken from the actual MIDI file, include the bars corresponding to the drum track (bar₁^D, bar₂^D, …, bar₈^D), the bars corresponding to the piano track (bar₁^P, bar₂^P, …, bar₈^P), the bars corresponding to the bass track (bar₁^B, bar₂^B, …, bar₈^B), and the bars corresponding to the guitar track (bar₁^G, bar₂^G, …, bar₈^G).

In other words, the bar generator learning model is trained as follows: the n-bar feature information, the track and genre label condition vectors, and the bar order are combined and input to the bar generator learning model to output the first bar; and the first bar and the second bar extracted from the MIDI file are applied to the discriminator of the bar generator learning model, which learns to judge whether the first bar and the second bar are real or fake.

Next, the multi-track music generating apparatus 100 according to an embodiment of the present invention generates a first track by joining consecutive first bars (S255).

That is, the multi-track music generating apparatus 100 may generate the first track by joining the n-th bars that are output in succession.
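Joining consecutively generated bars into a track, as in step S255, amounts to shifting each bar's note onsets by the bar's position. The sketch below assumes a hypothetical bar representation (onset in beats within the bar, MIDI pitch, duration) and a fixed bar length of four beats; neither is specified in this section.

```python
# Hypothetical note-in-bar representation:
# (onset in beats within the bar, MIDI pitch, duration in beats).
BarNote = tuple[float, int, float]


def join_bars(bars: list[list[BarNote]], beats_per_bar: float = 4.0) -> list[BarNote]:
    """Concatenate bars into one track by offsetting each bar's onsets."""
    track: list[BarNote] = []
    for index, bar in enumerate(bars):
        offset = index * beats_per_bar
        track.extend((onset + offset, pitch, duration) for onset, pitch, duration in bar)
    return track


# Two generated drum bars: kick and snare, then a single kick.
generated_bars = [
    [(0.0, 36, 1.0), (2.0, 38, 1.0)],
    [(0.0, 36, 1.0)],
]
track = join_bars(generated_bars)
```

The note in the second bar starts at beat 4.0 of the joined track, directly after the first bar ends.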

Hereinafter, a method of generating multi-track music of multiple genres using the multi-track music generating apparatus according to an embodiment of the present invention is described with reference to FIG. 10.

FIG. 10 is a flowchart illustrating the method of generating multi-track music of multiple genres using the multi-track music generating apparatus according to an embodiment of the present invention, and FIG. 11 is a diagram illustrating FIG. 10.

First, the multi-track music generating apparatus 100 according to an embodiment of the present invention generates head information by applying the noise vector targeted for music generation, together with the track and genre label condition vectors, to the pre-trained head information generator learning model (S1010).

As shown in FIG. 11, the multi-track music generating apparatus 100 uses the head information generator learning model trained in step S230 of FIG. 2 to output head information (H^D, H^P, H^B, H^G) from the noise vector (n) and the track and genre label condition vectors (t^D, t^P, t^B, t^G).

The details were already described for step S230 of FIG. 2, so a duplicate description is omitted.

다음으로, 본 발명의 실시예에 따른 멀티트랙 음악 생성 장치(100)는 헤드 정보를 기 학습된 인코더에 적용하여 헤드 특징 정보 및 n개 마디 특징 정보를 추출한다(S1020).Next, the apparatus 100 for generating multi-track music according to an embodiment of the present invention extracts head feature information and n-bar feature information by applying the head information to the pre-learned encoder (S1020).

도 11에서 도시한 바와 같이, 멀티트랙 음악 생성 장치(100)는 도 2의 S240 단계를 통해 기 학습된 인코더를 이용하여 트랙별 각각의 헤드 정보 및 마디 정보에 대응하는 헤드 특징 정보 및 마디 특징 정보를 추출한다.As shown in FIG. 11, the multi-track music generating apparatus 100 uses the encoder pre-learned through step S240 of FIG. 2 to head feature information and section feature information corresponding to head information and section information for each track extract

The details were already described for step S240 of FIG. 2, so a duplicate description is omitted.

Next, the multi-track music generating apparatus 100 according to an embodiment of the present invention combines the head feature information and the n-bar feature information with the track and genre label condition vectors and the bar order, and applies the combination to the generator of the pre-trained bar generator learning model, thereby generating music of the genre designated in advance by the user, bar by bar (S1030).

As shown in FIG. 11, the multi-track music generating apparatus 100 generates music bar by bar using the bar generator learning model trained in step S250 of FIG. 2.

Here, the genre of the music may be designated by the user.

The details were already described for step S250 of FIG. 2, so a duplicate description is omitted.
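The flow of steps S1010 to S1030 can be sketched as follows. This is only a structural illustration: `head_generator`, `encoder`, and `bar_generator` are hypothetical stand-ins for the trained models, the value of n and the data shapes are assumed, and no actual neural network or MIDI output is involved.

```python
import random

TRACKS = ["drum", "piano", "bass", "guitar"]
N_BARS = 4  # hypothetical value of n


def head_generator(noise, track, genre):
    """Stand-in for the trained head information generator (S1010)."""
    return {"track": track, "genre": genre, "value": sum(noise)}


def encoder(head):
    """Stand-in for the trained encoder (S1020): head feature plus n bar features."""
    head_feature = [head["value"]]
    bar_features = [[head["value"] + i] for i in range(N_BARS)]
    return head_feature, bar_features


def bar_generator(head_feature, bar_feature, track, genre, order):
    """Stand-in for the trained bar generator (S1030): emits one placeholder bar."""
    return {
        "track": track,
        "genre": genre,
        "order": order,
        "latent": head_feature + bar_feature,
    }


def generate_song(genre, noise):
    """Run S1010 -> S1020 -> S1030 for every track, producing bars in order."""
    song = {}
    for track in TRACKS:
        head = head_generator(noise, track, genre)  # S1010
        head_feature, bar_features = encoder(head)  # S1020
        song[track] = [  # S1030: one bar per bar-order index
            bar_generator(head_feature, feature, track, genre, order)
            for order, feature in enumerate(bar_features)
        ]
    return song


rng = random.Random(0)
noise = [rng.gauss(0.0, 1.0) for _ in range(16)]
song = generate_song("jazz", noise)
```

Passing the bar order explicitly to the generator, as the description requires, lets each bar be produced independently and then joined in sequence to form a track.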

As described above, according to an embodiment of the present invention, the learning models are trained on various genres and various tracks, so diverse music can be generated, and a complete piece of music can be generated through the GAN-based learning models.

Although the present invention has been described with reference to the embodiments shown in the drawings, these are merely exemplary, and those of ordinary skill in the art will understand that various modifications and other equivalent embodiments are possible. Therefore, the true technical scope of protection of the present invention should be determined by the technical spirit of the appended claims.

100: multi-track music generating apparatus
110: head information extraction unit
120: head information generator learning unit
130: encoder and decoder learning unit
140: bar generator learning unit
150: generation unit
160: extraction unit
170: control unit

Claims

A method for generating multi-track music of multiple genres using a multi-track music generating apparatus, the method comprising:
extracting, when a MIDI file including a multi-track is input, first head information from the multi-track of the MIDI file using a preprocessor;
training a GAN-based head information generator learning model by setting a noise vector and track and genre label condition vectors as input data of a generator of the head information generator learning model, setting second head information as output data of the generator, and determining, through a discriminator, whether the first head information and the second head information are real or fake;
training an encoder and a decoder by, with n consecutive bars of bar information extracted for each track from the multi-track, setting the first head information and the n consecutive bars of bar information as input data of the encoder, setting first head feature information and n-bar feature information as output data of the encoder, setting the first head feature information and the n-bar feature information as input data of the decoder, setting n-bar restoration information, a track type, and a genre type as output data of the decoder, and calculating the difference between the n-bar information extracted from the multi-track and the n-bar restoration information and the differences in the track type and the genre type;
training a GAN-based bar generator learning model by combining the n-bar feature information, the track and genre label condition vectors, and a bar order and setting the combination as input data of a generator of the bar generator learning model, setting a first bar as output data of the generator, and, with a second bar extracted for each track from the multi-track, determining whether the first bar and the second bar are real or fake;
generating head information by applying a noise vector and track and genre label condition vectors for the music to be generated to the pre-trained head information generator learning model;
extracting head feature information and n-bar feature information by applying the head information to the pre-trained encoder; and
generating music of a genre designated in advance by a user, bar by bar, by combining the head feature information and the n-bar feature information with the track and genre label condition vectors and the bar order and applying the combination to the generator of the pre-trained bar generator learning model.

The method of claim 1,
wherein the extracting of the first head information from the multi-track of the MIDI file using the preprocessor comprises:
receiving a MIDI file classified by music genre;
extracting at least one of a drum track, a piano track, a bass track, and a guitar track from the multi-track; and
extracting first head information corresponding to each track by applying each of the extracted tracks to the preprocessor.

The method of claim 1,
wherein the training of the head information generator learning model comprises:
outputting the second head information based on the noise vector and the track and genre label condition vectors by applying the noise vector and the track and genre label condition vectors to the head information generator learning model;
setting the first head information and the second head information as input data of the discriminator of the head information generator learning model, and setting whether the first head information and the second head information are real or fake as output data of the discriminator; and
training the head information generator learning model to determine whether the first head information and the second head information are real or fake.

The method of claim 1,
wherein the training of the encoder and the decoder further comprises
setting a ground-truth track and genre label condition vector of the MIDI file as ground-truth data for training the decoder, generating a restored track and genre label condition vector by applying the first head feature information and the n-bar feature information to the decoder, and training the encoder and the decoder by calculating a difference between the ground-truth track and genre label condition vector and the restored track and genre label condition vector.

The method of claim 1,
wherein the training of the bar generator learning model comprises:
outputting second head information by applying the noise vector and the track and genre label condition vectors to the head information generator learning model;
outputting second head feature information and n-bar feature information by inputting the second head information to the encoder;
outputting a first bar by combining the n-bar feature information, the track and genre label condition vectors, and the bar order and applying the combination to the bar generator learning model;
training the bar generator learning model, with the second bar corresponding to each track extracted from the multi-track of the MIDI file, by applying the first bar and the second bar to the discriminator of the bar generator learning model to determine whether the first bar and the second bar are real or fake; and
generating a first track by joining consecutive first bars.

A multi-track music generating apparatus based on a generative adversarial network, the apparatus comprising:
a head information extraction unit that, when a MIDI file including a multi-track is input, extracts first head information from the multi-track of the MIDI file using a preprocessor;
a head information generator learning unit that trains a GAN-based head information generator learning model by setting a noise vector and track and genre label condition vectors as input data of a generator of the head information generator learning model, setting second head information as output data of the generator, and determining, through a discriminator, whether the first head information and the second head information are real or fake;
an encoder and decoder learning unit that trains an encoder and a decoder by, with n consecutive bars of bar information extracted for each track from the multi-track, setting the first head information and the n consecutive bars of bar information as input data of the encoder, setting first head feature information and n-bar feature information as output data of the encoder, setting the first head feature information and the n-bar feature information as input data of the decoder, setting n-bar restoration information, a track type, and a genre type as output data of the decoder, and calculating the difference between the n-bar information extracted from the multi-track and the n-bar restoration information and the differences in the track type and the genre type;
a bar generator learning unit that trains a GAN-based bar generator learning model by combining the n-bar feature information, the track and genre label condition vectors, and a bar order and setting the combination as input data of a generator of the bar generator learning model, setting a first bar as output data of the generator, and, with a second bar extracted for each track from the multi-track, determining whether the first bar and the second bar are real or fake;
a generation unit that generates head information by applying a noise vector and track and genre label condition vectors for the music to be generated to the pre-trained head information generator learning model;
an extraction unit that extracts head feature information and n-bar feature information by applying the head information to the pre-trained encoder; and
a control unit that generates music of a genre designated in advance by a user, bar by bar, by combining the n-bar feature information, the track and genre label condition vectors, and the bar order and applying the combination to the generator of the pre-trained bar generator learning model.

The apparatus of claim 6,
wherein the head information extraction unit
receives a MIDI file classified by music genre, extracts at least one of a drum track, a piano track, a bass track, and a guitar track from the multi-track, and extracts first head information corresponding to each track by applying each of the extracted tracks to the preprocessor.

The apparatus of claim 6,
wherein the head information generator learning unit
outputs the second head information based on the noise vector and the track and genre label condition vectors by applying the noise vector and the track and genre label condition vectors to the head information generator learning model, sets the first head information and the second head information as input data of the discriminator of the head information generator learning model, sets whether the first head information and the second head information are real or fake as output data of the discriminator, and trains the head information generator learning model to determine whether the first head information and the second head information are real or fake.

The apparatus of claim 6,
wherein the encoder and decoder learning unit
sets a ground-truth track and genre label condition vector of the MIDI file as ground-truth data for training the decoder, generates a restored track and genre label condition vector by applying the first head feature information and the n-bar feature information to the decoder, and further trains the encoder and the decoder by calculating a difference between the ground-truth track and genre label condition vector and the restored track and genre label condition vector.

The apparatus of claim 6,
wherein the bar generator learning unit
outputs second head information by applying the noise vector and the track and genre label condition vectors to the head information generator learning model, outputs second head feature information and n-bar feature information by inputting the second head information to the encoder, outputs a first bar by combining the n-bar feature information, the track and genre label condition vectors, and the bar order and applying the combination to the bar generator learning model, trains the bar generator learning model, with the second bar corresponding to each track extracted from the multi-track of the MIDI file, by applying the first bar and the second bar to the discriminator of the bar generator learning model to determine whether the first bar and the second bar are real or fake, and generates a first track by joining consecutive first bars.