KR102472972B1

KR102472972B1 - Apparatus and method for mixing music sources based on artificial intelligence

Info

Publication number: KR102472972B1
Application number: KR1020210045899A
Authority: KR
Inventors: 전이배
Original assignee: 주식회사 폰에어
Priority date: 2021-04-08
Filing date: 2021-04-08
Publication date: 2022-12-02
Also published as: KR20220139665A

Abstract

According to an embodiment of the present invention, a method for generating a sound source based on AI, performed by a user terminal, comprises: (a) receiving voice information from a user; (b) selecting a comparison singer having a predetermined similarity to the voice information; and (c) generating a user sound source by mixing and mastering the voice information based on a pre-stored sound source adjustment value mapped to the comparison singer. The voice information is a recording of speech or singing vocalized by the user, and the sound source adjustment value includes a mixing adjustment value and a mastering adjustment value.

Description

AI-based sound source mixing device and method {APPARATUS AND METHOD FOR MIXING MUSIC SOURCES BASED ON ARTIFICIAL INTELLIGENCE}

The present invention relates to an apparatus and method that use AI to calculate setting values for mixing a sound source to suit a user's voice, and that mix the sound source based on those setting values.

Platforms for sharing videos created by individuals or companies, such as YouTube, are widely used. As these platforms have developed, content creators upload the videos they produce to share them with others.

In addition to videos, a growing number of content creators produce cover versions of songs, which were once the exclusive domain of singers, and upload them through video platforms.

Here, a cover song is a song originally released by one person that is performed or sung by another.

A typical content sharing platform requires an advertisement mapped to the content to be shown for a certain time before or during playback of uploaded content. Through this process, the content creator is paid revenue according to the content's view count and playback time and the number and duration of advertisement exposures.

Cover songs can raise copyright issues with the original author. For example, under copyright law, singing over the MR (instrumental track) of the original song may infringe the reproduction right, and arranging the song may infringe the right to create derivative works. Furthermore, uploading such a video to a content sharing platform may infringe the transmission right.

YouTube has used a content verification technology (Content ID, hereinafter CID) since 2008 to protect authors' copyrights. CID is a system that automatically checks whether content uploaded by a content creator matches an original-content reference file, using a database of reference files provided by copyright holders.

When CID finds uploaded content that matches a copyright holder's work, the copyright holder is notified. The copyright holder can then choose to block, track, or monetize the content. In other words, if uploaded content is found to infringe copyright, then depending on the copyright holder's settings a warning is issued, user access is automatically blocked, the content is removed, or the copyright holder receives the revenue generated by playback.

YouTube also uses a bot to manage copyright. Copyright holders register their works with the video sharing platform; when a content creator other than the copyright holder uploads content that matches or resembles a registered work, the management bot filters it, grants the copyright holder rights over that content, and allocates all of its advertising revenue to the copyright holder.

According to 2019 statistics, the music royalties that YouTube paid to authors and the music industry are estimated at 3.55 trillion won. A considerable part of this comes from advertising revenue on videos uploaded by well-known YouTubers performing cover dances or cover songs.

Where music or a sound source is simply used as-is, there is a strong need to protect the original copyright holder. For content that creators produce themselves, such as cover songs, however, a new business model is needed that shares the resulting revenue with the existing authors.

However, a content creator who wants to upload content such as a cover song to a video sharing platform can only go so far in contacting the copyright holder personally. Even if contact is made, it is difficult to obtain permission to use the work, and it is not easy to prepare everything properly, from royalty payments to license documents.

To solve the above problems, the present invention aims to implement a comprehensive platform on which users can easily create their own sound sources using AI and generate revenue from them.

Specifically, the voice recorded by the user is analyzed to derive AI mixing values suited to the user, and the user's voice and the MR accompaniment are mixed according to the derived values to create a unique original sound source.

A further aim is a comprehensive platform on which users can not only create original sound sources but also release them to the public and generate revenue.

To achieve the above technical object, a method for generating a sound source based on AI according to an embodiment of the present invention, performed by a user terminal, comprises: (a) receiving voice information from a user; (b) selecting a comparison singer having a predetermined similarity to the voice information; and (c) generating a user sound source by mixing and mastering the voice information based on a pre-stored sound source adjustment value mapped to the comparison singer. The voice information is a recording of speech or singing vocalized by the user, and the sound source adjustment value may include a mixing adjustment value and a mastering adjustment value.

In addition, before step (a), a preset MR sound source may be provided to the user, and the voice information may be received by prompting the user to sing along with the MR sound source.

In step (b), the voice information is compared with pre-stored singer voice information of at least one comparison singer, and the similarity to the singer voice information may be calculated based on the genre, timbre, and pitch of the voice information.

When preset genre information is assigned to each item of singer voice information, the genre information associated with the user is quantified based on preferred-genre information previously received from the user or on genre information corresponding to the voice information. The genre information assigned to each item of singer voice information is also quantified, and the two are compared to select the comparison singer.

In addition, the timbre of the voice information is quantified and compared with the timbre value included in each comparison singer's voice information, and the comparison singer with the smallest difference is selected automatically.

Likewise, the pitch of the voice information is quantified and compared with the pitch value included in each comparison singer's voice information, and the comparison singer with the smallest difference is selected automatically.
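
The selection of a comparison singer by smallest difference can be sketched as follows. This is a minimal illustration, not the implementation of the invention: the `VoiceProfile` fields, the 0 to 1 numeric scales, the equal default weights, and the function names are all assumptions introduced here, since the specification does not state how genre, timbre, and pitch are quantified or combined.

```python
from dataclasses import dataclass
from typing import Sequence, Tuple


@dataclass(frozen=True)
class VoiceProfile:
    """Quantified voice features (hypothetical 0..1 scales)."""
    name: str
    genre: float
    timbre: float
    pitch: float


def difference(user: VoiceProfile, singer: VoiceProfile,
               weights: Tuple[float, float, float] = (1.0, 1.0, 1.0)) -> float:
    """Weighted sum of absolute feature differences (an assumed metric)."""
    w_genre, w_timbre, w_pitch = weights
    return (w_genre * abs(user.genre - singer.genre)
            + w_timbre * abs(user.timbre - singer.timbre)
            + w_pitch * abs(user.pitch - singer.pitch))


def select_comparison_singer(user: VoiceProfile,
                             singers: Sequence[VoiceProfile]) -> VoiceProfile:
    """Return the stored singer whose profile differs least from the user's."""
    if not singers:
        raise ValueError("no comparison singers stored")
    return min(singers, key=lambda s: difference(user, s))
```

A single combined score is only one possible design; the text could equally be read as selecting per feature, in which case `weights` would zero out the features not under comparison.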

In addition, the voice information may be classified into a category based on the result of selecting the comparison singer, and the sound source adjustment value corresponding to that category may be set as the user's sound source adjustment value.

In addition, the sound source adjustment value consists of a mixing condition value and a mastering adjustment value, and may be mapped and stored differently for each original song of the comparison singer.

One or more sound source adjustment values may be assigned to a single original song of the comparison singer, and two or more kinds of sound source adjustment values may be set separately according to at least one criterion, namely the beat or the melody of the original song.
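
One way to picture this mapping is a table keyed by singer, original song, and a variant label. The sketch below is illustrative only: `AdjustmentValue`, the dictionary keys such as `eq_gain_db`, the `variant` label standing for a beat- or melody-based criterion, and the fallback to a `"default"` entry are assumptions, not details given in the specification.

```python
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass
class AdjustmentValue:
    """A sound source adjustment value: mixing part and mastering part."""
    mixing: Dict[str, float]      # hypothetical keys, e.g. "eq_gain_db"
    mastering: Dict[str, float]   # hypothetical keys, e.g. "target_peak"


# Key: (comparison singer, original song, variant).
AdjustmentTable = Dict[Tuple[str, str, str], AdjustmentValue]


def lookup_adjustment(table: AdjustmentTable, singer: str, song: str,
                      variant: str = "default") -> AdjustmentValue:
    """Return the variant-specific value, falling back to the song's default."""
    try:
        return table[(singer, song, variant)]
    except KeyError:
        return table[(singer, song, "default")]
```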

In addition, when a genre, timbre, and pitch for modifying the voice information are received from the user after step (d), a custom sound source adjustment value may be generated based on the received genre, timbre, and pitch, and the voice information may be modified accordingly.

In addition, the pre-stored sound source adjustment value linked to the comparison singer's voice information may be corrected based on the custom sound source adjustment value.

In addition, after step (d), a machine learning model may be built that takes as input the genre, pitch, and timbre data of the sound sources applied to a plurality of items of voice information, and outputs the mixing condition value and mastering condition value used to generate a user sound source. This improves the accuracy of the mixing and mastering condition values when a user sound source is generated from a new user's voice information.

In addition, after step (d), the user sound source may be uploaded to the central server and streamed to other user terminals.

An apparatus for generating a sound source based on AI may include a memory storing a program for generating a sound source based on AI, and a processor that executes the program stored in the memory. The processor receives voice information from a user, selects a comparison singer having a predetermined similarity to the voice information, and generates a user sound source by mixing and mastering the voice information based on a pre-stored sound source adjustment value mapped to the comparison singer. The voice information is a recording of speech or singing vocalized by the user, and the sound source adjustment value includes a mixing adjustment value and a mastering adjustment value.

The invention may also be a computer-readable storage medium on which a program for performing the AI-based sound source generation method of claim 1 is recorded.

According to an embodiment of the present invention, a comprehensive platform is implemented on which users can easily create their own sound sources using AI and generate revenue from them.

In addition, through the comprehensive platform, users gain the opportunity to offer the sound sources they create to many listeners, and distributors gain a basis for generating revenue from those sound sources or for developing talented performers through contests.

FIG. 1 is a diagram showing the configuration of a system for implementing a platform that generates sound sources and revenue, according to an embodiment of the present invention.
FIG. 2 is a diagram showing the configuration of a central server 100 that provides the platform, according to an embodiment of the present invention.
FIG. 3 is an operational flowchart showing the process of generating sound sources and holding a contest, according to an embodiment of the present invention.
FIG. 4 is an operational flowchart showing the process of producing a user sound source, according to an embodiment of the present invention.
FIG. 5 is an operational flowchart showing the process of mixing a sound source based on AI, according to an embodiment of the present invention.
FIG. 6 is an operational flowchart showing the process of mixing sound sources and holding a contest through the interface provided by the platform, according to an embodiment of the present invention.
FIG. 7 is an operational flowchart showing the process of generating revenue from a sound source and distributing it, according to an embodiment of the present invention.
FIGS. 8 to 13 are diagrams showing examples of the platform interface, according to an embodiment of the present invention.

Hereinafter, embodiments of the present invention are described in detail with reference to the accompanying drawings so that those of ordinary skill in the art to which the invention pertains can readily practice it. The present invention may, however, be embodied in many different forms and is not limited to the embodiments described herein. To describe the invention clearly, parts unrelated to the description are omitted from the drawings, and similar parts are given similar reference numerals throughout the specification.

Throughout the specification, when a part is said to be "connected" to another part, this includes not only being "directly connected" but also being "electrically connected" with another element in between. When a part is said to "include" a component, this means, unless specifically stated otherwise, that it may further include other components rather than excluding them, and it should be understood that this does not preclude the presence or addition of one or more other features, numbers, steps, operations, components, parts, or combinations thereof.

The following embodiments are detailed descriptions intended to aid understanding of the present invention and do not limit its scope. Accordingly, inventions of the same scope that perform the same functions as the present invention also fall within the scope of the present invention.

On the platform to be implemented by the present invention, when a user selects an MR sound source and sings along to it, the two are combined either through an AI algorithm or by applying a custom sound source adjustment value, producing a unique user sound source.

The platform not only generates user sound sources but also releases them publicly, so that they can compete with other user sound sources or generate revenue that is then distributed.

In this specification, a sound source means an electronic file containing at least one of sound and video.

In this specification, an MR sound source (music recorded, or instrumental track) means a sound file of a particular singer's song with only the vocals removed. It normally consists of sound only, but may also include visual information depending on the embodiment. The MR sound source may be an original composition or an existing song; in the case of an existing song, it is a sound source for which derivative creation is permitted by the original author.

In this specification, voice information is the audio that a user of the platform service generates through his or her own terminal. That is, it is a recording, made through the terminal, of the song (that is, the lyrics) corresponding to the MR sound source. Depending on the embodiment, the voice information is not limited to sound and may also be generated as video.

The technology disclosed in this specification combines an MR sound source with voice information. The MR sound source and the voice information are mixed to generate a user sound source, the mixing conditions are reset to adjust the user sound source, and mastering is then performed. In other words, "combining the MR sound source and voice information" in this specification covers both mixing the two and performing the mastering work.
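
As a rough illustration of what "mixing, then mastering" means at the sample level, the sketch below sums two mono signals with per-track gains and then normalizes the peak. It is a deliberately simplified stand-in under stated assumptions: the function names, the gain parameters, and the use of peak normalization as the mastering step are chosen here for illustration, and the specification does not disclose the actual signal processing.

```python
from typing import List, Sequence


def mix(mr: Sequence[float], voice: Sequence[float],
        mr_gain: float = 1.0, voice_gain: float = 1.0) -> List[float]:
    """Sum the MR track and the voice track sample by sample.

    The shorter track is treated as silence past its end.
    """
    length = max(len(mr), len(voice))
    mixed = []
    for i in range(length):
        m = mr[i] if i < len(mr) else 0.0
        v = voice[i] if i < len(voice) else 0.0
        mixed.append(mr_gain * m + voice_gain * v)
    return mixed


def master(samples: Sequence[float], target_peak: float = 0.9) -> List[float]:
    """Scale the signal so its largest absolute sample equals target_peak."""
    peak = max((abs(s) for s in samples), default=0.0)
    if peak == 0.0:
        return list(samples)  # silence: nothing to normalize
    scale = target_peak / peak
    return [s * scale for s in samples]


def combine(mr: Sequence[float], voice: Sequence[float],
            voice_gain: float = 1.0, target_peak: float = 0.9) -> List[float]:
    """'Combining' in the sense used above: mix first, then master."""
    return master(mix(mr, voice, voice_gain=voice_gain), target_peak)
```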

FIG. 1 is a diagram showing the configuration of a system for implementing a platform that generates sound sources and revenue, according to an embodiment of the present invention.

Referring to FIG. 1, the system may include a central server 100, a plurality of user terminals 200, and a distributor server 300. Although not shown in the drawing, the devices may be interconnected by wire or wirelessly through a communication network.

According to an embodiment of the present invention, the central server 100 stores user information received from the user terminal 200. When it later receives a user sound source that the user terminal 200 generated through the sound source production interface, it stores the sound source matched with the previously stored user information.

The central server 100 then holds a contest among the user sound sources received from the plurality of user terminals 200 and provides the winning user sound source to the distributor server 300 to generate revenue.

The user terminal 200 generates the user sound source by combining, through the sound source production interface, the MR sound source received from the central server 100 with the voice information recorded by the user.

Here, the sound source production interface is an interface that plays the MR sound source the user terminal 200 selected and received from the central server 100, receives the user's voice information, and then provides mixing and mastering functions for the user sound source in which the MR sound source and the voice information are combined.

The platform may also provide an AI mixing algorithm. Specifically, the AI mixing algorithm combines the MR sound source and the voice according to pre-stored sound source adjustment values (which may include mixing adjustment values and mastering adjustment values). The user can therefore skip the manual mixing and mastering work in the sound source production interface described above.

A contest is a process of determining a ranking from the preferences received, over a predetermined period, from listeners who have listened to the plurality of user sound sources. When a particular user's sound source wins the contest, the winning user sound source is provided to the distributor server 300 through the central server 100. The user sound source may also be provided to a platform server such as YouTube. A certain percentage of the revenue that the distributor server 300 earns from the user sound source then goes to the service operator of the central server 100 and to the creator using the user terminal 200.
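
The ranking and revenue split described above can be sketched as follows. The vote format, the tie-break by entry identifier, and the share parameters are assumptions made for illustration; the specification states only that a ranking is derived from listener preferences and that "a certain percentage" of revenue goes to the service operator and the creator.

```python
from collections import Counter
from typing import Dict, Iterable, List, Tuple


def rank_entries(votes: Iterable[str]) -> List[Tuple[str, int]]:
    """Rank user sound sources by number of listener preferences.

    `votes` holds one entry id per preference received. Ties are broken
    by entry id so that the ranking is deterministic (an assumed rule).
    """
    counts = Counter(votes)
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))


def split_revenue(revenue: float, operator_share: float,
                  creator_share: float) -> Dict[str, float]:
    """Split distributor revenue; the shares are hypothetical fractions."""
    if operator_share < 0 or creator_share < 0:
        raise ValueError("shares must be non-negative")
    if operator_share + creator_share > 1:
        raise ValueError("shares must not exceed the total revenue")
    return {
        "operator": revenue * operator_share,
        "creator": revenue * creator_share,
        "distributor": revenue * (1 - operator_share - creator_share),
    }
```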

The platform application through which the user terminal 200 uses the service may be an application built into the user terminal 200, or an application downloaded from an application distribution server and installed on the user terminal 200.

The user terminal 200 means a communication terminal that can use the terminal application in a wired or wireless communication environment. The user terminal 200 may be the user's portable terminal. Although FIG. 1 shows the user terminal 200 as a smartphone, which is one kind of portable terminal, the idea of the present invention is not limited to this and applies without restriction to any terminal on which the terminal application can be installed.

More specifically, the user terminal 200 may include a handheld computing device (for example, a PDA or an email client), any form of mobile phone, or any other form of computing or communication platform, but the present invention is not limited to these.

The communication network means a network that provides a connection path so that the user terminal 200 can connect to the central server 100 or the distributor server 300 (or vice versa) and then send and receive data. The communication network may encompass, for example, wired networks such as LANs (Local Area Networks), WANs (Wide Area Networks), MANs (Metropolitan Area Networks), and ISDNs (Integrated Service Digital Networks), and wireless networks such as wireless LANs, CDMA, Bluetooth, and satellite communication, but the scope of the present invention is not limited to these.

FIG. 2 is a diagram showing the configuration of the central server 100 that provides the platform, according to an embodiment of the present invention.

Referring to FIG. 2, the central server 100 according to an embodiment of the present invention includes a communication module 101, a memory 102, a processor 103, and a database 104.

In detail, the communication module 101 works with the communication network to provide the communication interface needed to exchange signals among the central server 100, the user terminal 200, and the distributor server 300 in the form of packet data. The communication module 101 may also receive a data request from the user terminal 200 and transmit data in response.

Here, the communication module 101 may be a device that includes the hardware and software needed to send and receive signals, such as control signals or data signals, to and from other network devices over a wired or wireless connection.

The memory 102 stores the program that operates the platform. It also temporarily or permanently stores the data processed by the processor 103. The memory 102 may include magnetic storage media or flash storage media, but the scope of the present invention is not limited to these.

The processor 103 is a kind of central processing unit and controls the overall operation of the platform. Each step performed by the processor 103 is described later with reference to FIGS. 3 to 7.

Here, the processor 103 may include any kind of device capable of processing data. A "processor" may mean, for example, a data processing device embedded in hardware that has physically structured circuitry for performing functions expressed as code or instructions contained in a program. Examples of such hardware-embedded data processing devices include a microprocessor, a central processing unit (CPU), a processor core, a multiprocessor, an ASIC (application-specific integrated circuit), and an FPGA (field programmable gate array), but the scope of the present invention is not limited to these.

데이터베이스(104)는 사용자 단말(200)로 제공될 MR음원(이는 경우에 따라 일반 음원도 포함될 수 있다.), 사용자 단말(200)로부터 수신된 보이스 정보 및 사용자 음원에 대한 데이터가 저장된다. 또한, 각각의 MR음원에 대응되는 음원 조정값과 AI알고리즘 등이 저장될 수 있다.The database 104 stores MR sound sources to be provided to the user terminal 200 (which may also include general sound sources according to circumstances), voice information received from the user terminal 200, and data on the user sound sources. In addition, sound source adjustment values and AI algorithms corresponding to each MR sound source may be stored.

이때, 음원 조정값은 상기 MR음원과 사용자의 음성을 결합하는 과정에서 필요한 믹싱 조정값에 대응될 수 있는데, 예를 들면 싱크값, EQ값 등이 해당될 수 있다. In this case, the sound source adjustment value may correspond to a mixing adjustment value required in the process of combining the MR sound source and the user's voice, for example, a sync value, an EQ value, and the like.

비록 도 2에는 도시하지 아니하였으나, MR음원, 보이스 정보, 사용자 음원 및 음원 조정값에 대한 데이터 중 일부는 데이터베이스(104)와 물리적 또는 개념적으로 분리된 데이터베이스(미도시)에 저장될 수 있다. 즉, 선택적 실시예로 데이터베이스(104)가 생략되어, 사용자 단말(200)에 구비되는 데이터 베이스로부터 필요한 데이터를 수신하여 이용할 수도 있다.Although not shown in FIG. 2, some of the MR sound source, voice information, user sound source, and data on sound source adjustment values may be stored in a database (not shown) physically or conceptually separated from the database 104. That is, as an optional embodiment, the database 104 may be omitted, and necessary data may be received and used from the database provided in the user terminal 200 .

FIG. 3 is an operational flowchart illustrating a process of generating a sound source and holding a contest according to an embodiment of the present invention.

도 3을 참조하면, 중앙 서버(100)는 사용자 단말(200)로부터 수신된 사용자 정보 및 사용자 음원을 저장한다(S110).Referring to FIG. 3 , the central server 100 stores user information and user sound sources received from the user terminal 200 (S110).

구체적으로 중앙 서버(100)는 사전에 사용자 정보를 저장하고, 추후 사용자 단말(200)로부터 음원제작 인터페이스에 의해 생성된 사용자 음원을 수신하게 되면, 사용자 정보와 매칭하여 저장하게 된다.Specifically, the central server 100 stores user information in advance, and when a user sound source generated by the sound source production interface is received from the user terminal 200 later, it is matched with the user information and stored.

이때, 중앙 서버(100)는 사용자 단말(200)로 프로필 인터페이스를 제공하게 된다. 프로필 인터페이스는 사용자 정보, 사용자 음원 및 팔로워/팔로잉의 숫자 중 적어도 하나가 표시될 수 있다.At this time, the central server 100 provides a profile interface to the user terminal 200 . The profile interface may display at least one of user information, a user sound source, and the number of followers/following.

In addition, if a plurality of user sound sources correspond to a specific user terminal 200, they may be displayed as a list. In this case, the number of views of each user sound source in the list and the revenue generated through the distributor server 300 are also displayed.

The process by which the user terminal 200 generates a user sound source through the sound source production interface, mentioned in step S110, is described in detail below with reference to FIG. 4.

다음으로 중앙 서버(100)는 서로 다른 사용자 단말(200)로부터 수신된 사용자 음원을 통해 경연을 수행한다(S120).Next, the central server 100 performs a contest through user sound sources received from different user terminals 200 (S120).

Prior to step S120, the central server 100 provides the listener terminal with a contest interface for delivering user sound sources and conducting the contest.

이때, 청취자 단말이란 사용자 음원을 청취하거나 평가하는 단말을 뜻하게 된다. 즉, 플랫폼의 서비스를 이용하게 되는 사용자에 대응될 수 있기에, 넓은 의미로 사용자 단말(200)도 청취자 단말에 포함될 수 있다.At this time, the listener terminal means a terminal that listens to or evaluates the user's sound source. That is, since it can correspond to a user who uses the service of the platform, the user terminal 200 can also be included in the listener terminal in a broad sense.

In addition, when a listener terminal that has received a user sound source through the contest interface assigns a preference to a specific user sound source, the central server 100 selects a winning sound source based on the preferences that the plurality of user sound sources have acquired during a predetermined period.
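
As a non-limiting illustration, the selection of a winning sound source from preferences collected during a predetermined period could be sketched as follows. The function name `select_winner`, the representation of a preference as an `(id, timestamp)` pair, and the tie-breaking rule are assumptions made for this sketch; the description does not specify them.

```python
from collections import Counter
from datetime import datetime


def select_winner(preferences, start, end):
    """Return the id of the user sound source with the most preferences.

    preferences: iterable of (sound_source_id, timestamp) pairs, one per
    preference given by a listener terminal (hypothetical representation).
    Only preferences with start <= timestamp <= end are counted.
    """
    tally = Counter(sid for sid, ts in preferences if start <= ts <= end)
    if not tally:
        return None  # no preferences were given during the period
    # Highest count wins; ties go to the smaller id (an assumption made
    # here only so that the result is deterministic).
    return min(tally, key=lambda sid: (-tally[sid], sid))
```

For example, with one preference for source "a" and two for source "b" inside the period, `select_winner` returns "b"; a preference given after `end` is ignored.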

마지막으로 중앙 서버(100)는 우승한 사용자 음원을 유통사 서버(300)로 제공하고, 유통사 서버(300)는 수익 창출 및 분배를 수행한다(S130).Finally, the central server 100 provides the winning user sound source to the distributor server 300, and the distributor server 300 generates and distributes profits (S130).

At this time, the distributor server 300 generates revenue from the winning sound source and provides preset proportions of that revenue to the service operator of the central server 100 and to the creator associated with the user terminal 200.
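
The distribution of revenue according to preset ratios might look like the following minimal sketch. The function name `split_revenue` and the ratio parameters are hypothetical, and the treatment of the remainder as the distributor's share is an assumption of this sketch rather than something the description states.

```python
def split_revenue(total, operator_ratio, creator_ratio):
    """Split revenue between the service operator and the creator.

    The ratios are fractions of the total (for example 0.25 for 25%).
    Whatever is left is attributed to the distributor, which is an
    assumption of this sketch.
    """
    if operator_ratio < 0 or creator_ratio < 0:
        raise ValueError("ratios must be non-negative")
    if operator_ratio + creator_ratio > 1:
        raise ValueError("ratios must not exceed 100% in total")
    operator = total * operator_ratio
    creator = total * creator_ratio
    return {
        "operator": operator,
        "creator": creator,
        "distributor": total - operator - creator,
    }
```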

FIG. 4 is an operational flowchart illustrating a process of producing a user sound source according to an embodiment of the present invention.

도 4를 참고하면, 중앙 서버(100)는 사용자 단말(200)의 요청에 따라 MR음원을 제공한다(S210).Referring to FIG. 4 , the central server 100 provides an MR sound source according to the request of the user terminal 200 (S210).

단계(S210) 이전에, 중앙 서버(100)는 보유한 MR음원에 대한 리스트를 사용자 단말(200)로 제공하고, 사용자 단말(200)이 특정 MR음원을 선택하면 특정 MR음원을 제공하게 된다.Before step S210, the central server 100 provides a list of MR sound sources possessed to the user terminal 200, and when the user terminal 200 selects a specific MR sound source, the specific MR sound source is provided.

다음으로 사용자 단말(200)이 음원제작 인터페이스를 통해 MR음원 및 사용자의 보이스 정보를 포함하는 음원을 결합하여 사용자 음원을 생성한다(S220).Next, the user terminal 200 generates a user sound source by combining the MR sound source and the sound source including the user's voice information through the sound source production interface (S220).

단계(S220) 이전에 중앙 서버(100)는 사용자 단말(200)로 보이스 수신 인터페이스를 제공하여, 사용자 단말(200)로부터 보이스 정보를 수신하게 된다. 이때, 보이스 수신 인터페이스는 단순히 음성 정보만을 수신할 수도 있고, 필요에 따라 카메라를 통한 동영상의 형태로 보이스 정보를 수신할 수도 있다. Prior to step S220, the central server 100 provides a voice reception interface to the user terminal 200 to receive voice information from the user terminal 200. At this time, the voice receiving interface may simply receive only voice information, or may receive voice information in the form of a video through a camera, if necessary.

또한, 단계(S220)에서 중앙 서버(100)는 음원제작 인터페이스를 통해 상기 사용자 음원에 포함될 보이스 정보에 대한 믹싱 및 마스터링을 위한 인터페이스를 제공하게 된다.In addition, in step S220, the central server 100 provides an interface for mixing and mastering voice information to be included in the user sound source through a sound source production interface.

또한, 사용자 단말(200)은 사용자로부터 커스텀 음원 조정값을 수신하면 커스텀 음원 조정값에 기초하여 사용자 음원의 보이스 정보에 대한 믹싱 및 마스터링을 수행하게 된다.In addition, when the user terminal 200 receives a custom sound source adjustment value from the user, it performs mixing and mastering of voice information of the user sound source based on the custom sound source adjustment value.

이때, 커스텀 음원 조정값은 믹싱 조정값과 마스터링 조정값을 포함할 수 있다. 구체적으로 믹싱 조정값은 보컬 싱크값, 마스터 볼륨값 및 편집값 중 적어도 하나가 포함될 수 있다. 또한, 마스터링 조정값은 더블링, 컴프레서, 이퀄라이져, 리버브, 딜레이 및 리미터값 중 적어도 하나가 포함될 수 있다.In this case, the custom sound source adjustment value may include a mixing adjustment value and a mastering adjustment value. Specifically, the mixing adjustment value may include at least one of a vocal sync value, a master volume value, and an editing value. Also, the mastering adjustment value may include at least one of doubling, compressor, equalizer, reverb, delay, and limiter values.

예를 들어, 보컬 싱크값은 상기 보이스 정보의 재생 시간을 변경하여 싱크를 수정하는 값에 대응될 수 있다. 또한, 마스터 볼륨값은 보이스 정보의 소정의 구간 별 볼륨을 각각 조절하고, 편집 옵션값은 상기 보이스 정보의 소정의 구간을 반복 녹음하거나 삭제하는 값에 대응될 수 있다.For example, the vocal sync value may correspond to a value for correcting sync by changing a reproduction time of the voice information. In addition, the master volume value may adjust the volume for each predetermined section of voice information, and the editing option value may correspond to a value for repeatedly recording or deleting a predetermined section of the voice information.
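
One possible way to hold a sound source adjustment value in memory is sketched below. The class and field names, the units (milliseconds, gain factors), and the use of a single float per mastering element are all assumptions for illustration; the description only names the elements.

```python
from dataclasses import dataclass, field


@dataclass
class MixingAdjustment:
    # Shift applied to the voice to correct sync (unit assumed: ms).
    vocal_sync_ms: float = 0.0
    # Per-section volume: (start_s, end_s) -> gain factor (assumed layout).
    section_volume: dict = field(default_factory=dict)
    # Edit operations such as re-recording or deleting a section.
    edits: list = field(default_factory=list)


@dataclass
class MasteringAdjustment:
    # One illustrative scalar per element named in the description.
    doubling: float = 0.0
    compressor: float = 0.0
    equalizer: float = 0.0
    reverb: float = 0.0
    delay: float = 0.0
    limiter: float = 0.0


@dataclass
class SoundSourceAdjustment:
    mixing: MixingAdjustment = field(default_factory=MixingAdjustment)
    mastering: MasteringAdjustment = field(default_factory=MasteringAdjustment)
```

Using `default_factory` gives every `SoundSourceAdjustment` its own nested objects, so editing one user's values does not affect another's.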

As an optional embodiment, the central server 100 may receive and store voice information generated on another terminal of the user (for example, a desktop or laptop computer). The central server 100 may then provide the voice information at the request of the user terminal 200, which is a different device from that other terminal. For example, the user may record a song on a desktop computer, generate voice information, and upload it to the central server 100, and may later download the uploaded voice information to a smartphone and edit it there.

마지막으로 중앙 서버(100)가 사용자 단말(200)로부터 사용자 음원을 수신한다(S230).Finally, the central server 100 receives the user sound source from the user terminal 200 (S230).

선택적 실시예로, 단계(S210)에서 사용자 단말(200)이 수신하게 되는 MR음원은 기 설정된 음원 조정값이 매칭되어 저장될 수 있다.As an optional embodiment, the MR sound source received by the user terminal 200 in step S210 may be matched with a preset sound source adjustment value and stored.

이러한 경우, 단계(S220)에서 진행되는 과정은 AI믹싱 알고리즘과 음원 조정값에 따라 MR음원 및 보이스 정보를 자동으로 결합될 수도 있다.In this case, the process in step S220 may automatically combine the MR sound source and voice information according to the AI mixing algorithm and the sound source adjustment value.

This embodiment is described in detail below with reference to FIG. 5.

FIG. 5 is an operational flowchart illustrating a process of mixing a sound source based on AI according to an embodiment of the present invention.

도 5를 참조하면, 사용자 단말(200)은 사용자로부터 보이스 정보를 수신한다(S310).Referring to FIG. 5 , the user terminal 200 receives voice information from the user (S310).

As described above, the voice information is a recording of speech or singing produced by the user's own voice.

In step S310, the user terminal 200 may collect the user's voice information and then extract low, middle, and high tones from it. Alternatively, as an additional embodiment, the user terminal 200 may provide a preset MR sound source to the user and prompt the user to sing along with it, thereby receiving voice information separately for the low, middle, and high tones.

Thereafter, the user terminal 200 may classify the voice information by gender.

Next, the user terminal 200 selects, from among singers of the corresponding gender, a comparison singer similar to the voice information (S320).

Specifically, the user terminal 200 compares the voice information with pre-stored singer voice information of at least one comparison singer. The similarity to the singer voice information is calculated based on at least one of the genre, timbre, and pitch of the voice information. Here, singer voice information refers to the raw vocal recording of a song sung by an actual singer to his or her own MR sound source.

첫 번째로 장르에 기초하여 비교 가수를 선정하는 방법은 아래와 같다.First, a method for selecting a comparison singer based on a genre is as follows.

When preset genre information is assigned to the singer voice information, the genre information associated with the user is quantified based on either the preferred genre information previously received from the user or the genre information corresponding to the voice information. The preferred genre information is, for example, the value the user entered as a preferred genre in the profile information input step.

Thereafter, the user terminal 200 quantifies the genre information assigned to each item of singer voice information and compares it with the user's genre information to select a comparison singer. For example, the genres of the songs in each singer's catalog may be identified, a score assigned to each genre, and the scores averaged to obtain a representative genre value for that singer. The user terminal 200 may then compare the user's genre value with each comparison singer's genre value and select the comparison singer with the smallest difference.
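
The genre-based selection can be illustrated with a small sketch. The score table `GENRE_SCORES` and its numeric scale are purely hypothetical, since the description does not say how a genre is mapped to a number; the averaging and smallest-difference steps follow the paragraph above.

```python
# Hypothetical numeric scale for genres; the description does not define one.
GENRE_SCORES = {"ballad": 1.0, "rnb": 2.0, "rock": 3.0, "hiphop": 4.0}


def representative_genre_value(genres):
    """Average the scores of the genres of a singer's songs."""
    return sum(GENRE_SCORES[g] for g in genres) / len(genres)


def closest_singer_by_genre(user_value, singer_catalogs):
    """Pick the singer whose representative genre value is nearest.

    singer_catalogs: dict mapping singer name -> list of song genres.
    """
    return min(
        singer_catalogs,
        key=lambda name: abs(
            representative_genre_value(singer_catalogs[name]) - user_value
        ),
    )
```

With catalogs `{"A": ["ballad", "ballad", "rnb"], "B": ["rock", "hiphop"]}`, a user genre value of 1.0 selects singer A and a value of 4.0 selects singer B.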

두 번째로 음색에 기초하여 비교 가수를 선정하는 방법은 아래와 같다.Second, a method for selecting a comparison singer based on tone is as follows.

The user terminal 200 quantifies the timbre of the voice information, compares it with the timbre value included in each comparison singer's singer voice information, and selects the comparison singer with the smallest difference. For example, timbres such as husky, soft, and rough may be defined and a numerical value computed for each. This quantification may be performed with an existing timbre quantification program.

Third, the method of selecting a comparison singer based on pitch is as follows.

The user terminal 200 quantifies the pitch of the voice information, compares it with the pitch value included in each comparison singer's singer voice information, and selects the comparison singer with the smallest difference. Here, the pitch value may be the average of the pitches of the plurality of notes expressed within one item of singer voice information.
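
A minimal sketch of the pitch comparison follows, assuming that the pitches of the individual notes have already been extracted as numbers (for instance MIDI note numbers or frequencies in Hz). Pitch extraction itself is outside the sketch, and the function names are hypothetical.

```python
from statistics import mean


def pitch_value(note_pitches):
    """Pitch value of a recording: the average of its note pitches."""
    return mean(note_pitches)


def closest_singer_by_pitch(user_pitches, singer_pitches):
    """Pick the singer whose average pitch is nearest to the user's.

    singer_pitches: dict mapping singer name -> list of note pitches.
    """
    user = pitch_value(user_pitches)
    return min(
        singer_pitches,
        key=lambda name: abs(pitch_value(singer_pitches[name]) - user),
    )
```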

A comparison singer may be selected based on at least one of the timbre, genre, and pitch values described above. Taking timbre as an example, at least three comparison singers with the smallest differences from the user's timbre value may be selected and sorted by difference. The same procedure may be applied to genre and pitch, and the comparison singer whose values differ least from the user's when timbre, genre, and pitch are all considered may then be selected. Alternatively, a weight may be applied to any one of the three values before the comparison singer is selected. For example, if singer A is the closest match for genre and pitch but singer B is the closest match for timbre, and timbre is weighted, singer B may be selected as the comparison singer.
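
The combined and weighted selection can be sketched as a weighted sum of absolute differences. This particular distance, the function names, and the assumption that the three values are on comparable scales are choices made for the sketch; the description only states that a weight may be applied.

```python
def weighted_distance(user, singer, weights):
    """Weighted sum of absolute differences over the weighted features."""
    return sum(w * abs(user[k] - singer[k]) for k, w in weights.items())


def select_comparison_singer(user, singers, weights):
    """Return the name of the singer with the smallest weighted distance.

    user: dict of feature name -> value (e.g. genre, timbre, pitch).
    singers: dict of singer name -> dict with the same feature names.
    """
    return min(
        singers, key=lambda name: weighted_distance(user, singers[name], weights)
    )


def top_candidates(user, singers, feature, n=3):
    """Singers sorted by difference on one feature, closest first."""
    ranked = sorted(
        singers, key=lambda name: abs(user[feature] - singers[name][feature])
    )
    return ranked[:n]
```

In a toy case where singer A matches the user on genre and pitch and singer B matches on timbre, equal weights select A, while a sufficiently large timbre weight selects B, which mirrors the example in the paragraph above.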

또한, 사용자 단말(200)은 비교 가수의 선정 결과에 기초하여 보이스 정보의 카테고리를 분류하게 된다. 이후, 사용자 단말(200)은 분류된 카테고리에 대응되는 음원 조정값을 사용자의 음원 조정값으로 설정하여, 후술할 단계(S330)에서 적용하게 된다.In addition, the user terminal 200 classifies the category of voice information based on the comparison singer selection result. Thereafter, the user terminal 200 sets the sound source adjustment value corresponding to the classified category as the user's sound source adjustment value, and applies it in step S330 to be described later.

Here, a category may be defined per comparison singer or according to a separate criterion. A separate criterion may be, for example, genre (such as a ballad category or a hip-hop category) or age group (such as teens, twenties, or thirties), among various others. For each category, a numerical range may be defined for at least one of the genre, timbre, and pitch values that fall within that category, and a user sound source that satisfies these ranges may be classified as belonging to that category.
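
Classification by per-category numerical ranges could be sketched as follows. The category names, the ranges in `CATEGORIES`, and the first-match rule are hypothetical values chosen only to make the example concrete.

```python
# Hypothetical categories: feature name -> (low, high) inclusive range.
CATEGORIES = {
    "ballad": {"genre": (0.5, 1.5), "pitch": (55.0, 70.0)},
    "hiphop": {"genre": (3.5, 4.5)},
}


def classify(features, categories=CATEGORIES):
    """Return the first category whose ranges all contain the features.

    features: dict of feature name -> numeric value.
    Returns None when no category matches.
    """
    for name, ranges in categories.items():
        if all(lo <= features[k] <= hi for k, (lo, hi) in ranges.items()):
            return name
    return None
```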

In addition, sound source adjustment values (mixing adjustment values and mastering adjustment values) may be preset for each category. For example, when categories are defined per comparison singer and the user sound source falls into the category of ballad-type singer A, values may be preset for the components of the mixing adjustment value: the master volume ratio (that is, the volume ratio between the singer voice information (or MR sound source) and the user's voice) and the vocal sync ratio (that is, a ratio for the alignment time difference between the playback position of the singer voice information (or MR sound source) on the time axis and the playback position of the voice information on the time axis). The vocal sync ratio needs adjusting because the user sings while listening to the sound played by the terminal: from the user's point of view the recording is simultaneous, but in practice the voice is recorded lagging behind the MR sound source, and the appropriate alignment time difference between the MR sound source and the voice may be defined differently depending on genre, user preference, and the like. The mastering adjustment value likewise consists of elements such as doubling, compressor, equalizer (EQ), reverb, delay, and limiter, and the value to be adjusted for each element may be preset.
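
To make the roles of the master volume ratio and the vocal sync value concrete, the sketch below mixes two sample sequences. Expressing the sync correction as a whole number of samples and the volume ratio as two gain factors are simplifying assumptions; the description speaks of ratios without fixing their units.

```python
def mix(mr, voice, sync_offset, voice_gain, mr_gain=1.0):
    """Mix an MR track with a voice track.

    mr, voice: lists of float samples at the same sample rate.
    sync_offset: number of samples by which the voice lags the MR track;
        the voice is read that many samples ahead to compensate.
    voice_gain, mr_gain: gain factors expressing the volume ratio.
    The output has the length of the MR track.
    """
    out = []
    for i in range(len(mr)):
        j = i + sync_offset
        v = voice[j] if 0 <= j < len(voice) else 0.0  # silence outside range
        out.append(mr_gain * mr[i] + voice_gain * v)
    return out
```

For instance, mixing `mr = [1.0, 1.0, 1.0, 1.0]` with `voice = [0.0, 0.0, 2.0, 4.0]`, a sync offset of 2 samples, and a voice gain of 0.5 gives `[2.0, 3.0, 1.0, 1.0]`: the late voice is pulled forward by two samples and halved in level.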

이러한 음원 조정값은 카테고리 별로 다르게 규정될 수 있다. 또는, 음원 조정값은 곡의 장르 별로 다르게 규정될 수 있다. 예를 들면, 사용자가 발라드 또는 힙합을 노래하였는지에 따라 다른 음원 조정값이 적용될 수 있다. 또한, 하나의 곡이라도 곡의 흐름(박자나 멜로디)에 따라, 중간에 음원 조정값이 다르게 설정될 수도 있다. 예를 들어, 2분 길이의 A라는 곡이 플레이 시간 1분을 기준으로 서로 다른 박자를 갖는 경우 음원 조정값도 각 구간에 맞게 서로 다르게 설정될 수 있다.Such a sound source adjustment value may be differently defined for each category. Alternatively, the sound source adjustment value may be differently defined for each genre of music. For example, a different sound source adjustment value may be applied depending on whether the user sings a ballad or a hip-hop song. In addition, even for one song, the sound source adjustment value may be set differently in the middle according to the flow (beat or melody) of the song. For example, when a song called A with a length of 2 minutes has different beats based on a play time of 1 minute, the sound source adjustment value may be set differently according to each section.
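
Section-dependent adjustment values within a single song can be represented as a list of boundaries with one value per section. The class below is a sketch with hypothetical names; it looks up the section containing a given playback time with a binary search.

```python
from bisect import bisect_right


class SectionedAdjustment:
    """Adjustment values that change at given playback times."""

    def __init__(self, boundaries, values):
        # boundaries: sorted times (in seconds) at which a new section
        # starts; values: one adjustment per section, so there is exactly
        # one more value than there are boundaries.
        if len(values) != len(boundaries) + 1:
            raise ValueError("need exactly one more value than boundaries")
        if list(boundaries) != sorted(boundaries):
            raise ValueError("boundaries must be sorted")
        self.boundaries = list(boundaries)
        self.values = list(values)

    def at(self, t):
        """Return the adjustment that applies at playback time t."""
        return self.values[bisect_right(self.boundaries, t)]
```

For the two-minute song in the example above, `SectionedAdjustment([60.0], [first_half, second_half])` returns `first_half` for times before the one-minute mark and `second_half` from that mark onward.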

As an additional embodiment, the user terminal 200 compares the user's voice information with a plurality of timbre templates pre-stored in the database 104 of the central server 100 and selects the timbre template with the highest similarity. The timbre templates may be divided into soprano, mezzo-soprano, alto, countertenor, tenor, baritone, bass, and so on, or into multiple levels, for example from level 0 (low-pitched timbre) to level 10 (high-pitched timbre), and each option has a corresponding sound source adjustment value. The sound source adjustment value of the timbre template that matches the voice information is then applied.

마지막으로 사용자 단말(200)은 비교 가수와 매핑되어 기 저장된 음원 조정값에 기초하여 보이스 정보를 믹싱 및 마스터링 하여 사용자 음원을 생성한다(S330).Finally, the user terminal 200 generates a user sound source by mixing and mastering voice information based on the previously stored sound source adjustment value mapped to a comparison singer (S330).

After step S330, the user terminal 200 may receive a new user sound source from the user. In this case, the user terminal 200 may repeat the process up to step S330. The sound source adjustment value resulting from this repetition may be the same as the previous one or may differ, for example because the genre of the MR sound source selected by the user is different, or because the user's voice information is received and analyzed slightly differently from that of the previous sound source. The value may also differ when the user changes the sound source adjustment value manually.

In this way, many items of voice information (timbre, genre, pitch) may be received for a single user, and a corresponding sound source adjustment value may be generated for each. Extended across many users, this yields training data consisting of a very large number of voice information items paired with sound source adjustment values. When a machine learning model is trained with the voice information (timbre, genre, pitch) as input and the sound source adjustment value as output, the result is an AI algorithm that recommends a sound source adjustment value suited to the user sound source. The machine learning model may be any one of an ANN, DNN, KNN, or CNN, or an ensemble model combining at least two of them. Various other machine learning models may also be used.
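
As one non-authoritative reading of the training setup, the sketch below uses a k-nearest-neighbors regressor, taking "KNN" in the list above to mean that method. Training pairs map a feature tuple (timbre, genre, pitch) to an adjustment tuple, and the recommendation is the average adjustment of the k closest training examples. The function name, the tuple layout, and the Euclidean distance are assumptions.

```python
import math


def knn_recommend(training, query, k=3):
    """Recommend adjustment values by averaging the k nearest examples.

    training: list of (features, adjustment) pairs, where both items are
        tuples of floats; features are (timbre, genre, pitch).
    query: feature tuple of the new voice information.
    """
    nearest = sorted(training, key=lambda ex: math.dist(ex[0], query))[:k]
    dims = len(nearest[0][1])
    return tuple(
        sum(ex[1][d] for ex in nearest) / len(nearest) for d in range(dims)
    )
```

A neural-network model would replace the averaging step with a learned mapping; the input and output layout would stay the same.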

Therefore, the more sound sources a user creates, the more the sound source adjustment value is refined, so that a value suited to that user can be derived more accurately. In addition, a user who first accesses the service after training is complete can receive an accurate sound source adjustment value recommendation even on first use.

If the above values are received, the user terminal 200 generates a custom sound source adjustment value based on the received genre, timbre, and pitch and modifies the user's voice accordingly.

At this time, the user terminal 200 may correct, based on the custom sound source adjustment value, the pre-stored sound source adjustment value associated with the comparison singer's singer voice information. This makes it possible to generate a user sound source that suits the user's taste.
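
The description does not say how the stored value is corrected. One simple possibility, shown here only as an assumption, is to move each stored element toward the user's custom value by a configurable strength; `correct_adjustment` and `strength` are hypothetical names.

```python
def correct_adjustment(stored, custom, strength=1.0):
    """Correct stored adjustment values using custom values.

    stored, custom: dicts of element name -> numeric value.
    strength: 0.0 keeps the stored value, 1.0 adopts the custom value,
        and values in between interpolate linearly (an assumed rule).
    Elements absent from `custom` are left unchanged.
    """
    if not 0.0 <= strength <= 1.0:
        raise ValueError("strength must be between 0 and 1")
    corrected = dict(stored)
    for key, value in custom.items():
        base = stored.get(key, value)
        corrected[key] = base + strength * (value - base)
    return corrected
```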

At this time, the mixing adjustment value and the mastering adjustment value used to generate the user sound source are set as output values, which increases the accuracy of the mixing and mastering adjustment values when a user sound source is generated from a new user's voice information.

이때, 상기의 머신러닝모델은 사용자 단말(200)에서 구축될 수 있으나, 필요한 경우 중앙 서버(100)에서도 구현될 수도 있다.In this case, the machine learning model may be built in the user terminal 200, but may also be implemented in the central server 100 if necessary.

한편, 다른 선택적 실시예로, 단계(S330) 이후 사용자 단말(200)은 사용자 음원을 중앙 서버로 업로드하여, 상기 사용자 음원을 다른 사용자 단말로 스트리밍하여 제공하게 된다.On the other hand, in another optional embodiment, after step S330, the user terminal 200 uploads the user sound source to the central server, and streams the user sound source to other user terminals.

FIG. 6 is an operational flowchart illustrating a process of mixing sound sources and holding a contest through an interface provided by the platform according to an embodiment of the present invention.

도 6을 참조하면, 중앙 서버(100)는 사용자 정보를 통해 사용자 프로필을 생성한다(S410).Referring to FIG. 6 , the central server 100 creates a user profile through user information (S410).

이때, 사용자 프로필은 프로필 인터페이스에 의해 상기 사용자 단말로 표시된다. At this time, the user profile is displayed on the user terminal through a profile interface.

프로필 인터페이스는 도 8에 도시된 바와 같이 프로필 인터페이스 제 1 영역(110)에 사용자의 사진 및 팔로워/팔로잉의 숫자 중 적어도 하나가 표시된다.As shown in FIG. 8 , the profile interface displays at least one of a user's picture and the number of followers/following in the first area 110 of the profile interface.

In addition, a list of the user sound sources uploaded by the user terminal is displayed in the second area 120 of the profile interface. By displaying the number of views of each user sound source and the revenue it has earned, the interface lets the user check which of his or her sound sources is the most popular or the most profitable.

In addition, the third area 140 of the profile interface offers options for selecting an interface.

한편, 선택적 실시예로 단계(S410) 이전에 중앙 서버(100)는 사용자 정보를 수신하기 위한 설문 인터페이스가 사용자 단말(200)로 제공한다.Meanwhile, as an optional embodiment, before step S410, the central server 100 provides a survey interface for receiving user information to the user terminal 200.

At this time, the survey interface presents options for preset questions and classifies the user's singing preferences according to the selected options. The classified preferences are used to improve the quality of services provided later.

다음으로 중앙 서버(100)는 사용자 단말(200)로부터 음원제작 인터페이스에 의해 생성된 사용자 음원을 수신하면 사용자 프로필과 매칭하여 저장한다(S420).Next, when the central server 100 receives the user sound source generated by the sound source production interface from the user terminal 200, it is matched with the user profile and stored (S420).

이때, 음원제작 인터페이스는 보이스 수신 인터페이스, 믹싱설정 인터페이스 및 마스터링설정 인터페이스로 구성될 수 있다.At this time, the sound source production interface may be composed of a voice reception interface, a mixing setting interface, and a mastering setting interface.

First, the voice reception interface is an interface for receiving at least one of the user's voice and video.

이때, 필요에 따라 도 9에 도시된 바와 같이 MR리스트(210)를 제공하고, 사용자는 자신이 부를 노래에 대응되는 특정 MR음원(220)을 선택하여 입력하게 된다.At this time, the MR list 210 is provided as shown in FIG. 9 as needed, and the user selects and inputs a specific MR sound source 220 corresponding to the song to be sung.

도 10을 참조하면, 보이스 수신 인터페이스의 제 1 영역(230)에는 실시간으로 녹화 중인 영상 정보가 제공될 수 있다.Referring to FIG. 10 , video information being recorded in real time may be provided to the first area 230 of the voice receiving interface.

보이스 수신 인터페이스의 제 2 영역(231)에는 보이스 정보에 대응되는 노래의 가사가 제공되어, 사용자가 가사를 보며 노래를 부를 수 있는 환경을 제공하게 된다.Lyrics of a song corresponding to the voice information are provided in the second area 231 of the voice receiving interface, providing an environment in which the user can sing while viewing the lyrics.

보이스 수신 인터페이스의 제 3 영역(240)에는 보이스 수신 인터페이스를 조작하기 위한 적어도 하나 이상의 버튼이 제공될 수 있다. 예를 들면, "녹음", "카메라 전환", "화면 효과 설정"등이 포함될 수 있고, 사용자가 손쉽게 알 수 있는 아이콘의 형태로 제공될 수 있다.At least one button for manipulating the voice reception interface may be provided in the third region 240 of the voice reception interface. For example, "recording", "camera switching", "screen effect setting", etc. may be included, and may be provided in the form of icons that users can easily recognize.

다음으로 믹싱설정 인터페이스는 상기 MR음원 및 보이스 정보를 믹싱하기 위한 믹싱 조정값을 수정할 수 있다.Next, the mixing setting interface may modify mixing adjustment values for mixing the MR sound source and voice information.

이를 도 11a 및 도 11b를 통해 믹싱설정 인터페이스를 설명하면 아래와 같다.The mixing setting interface will be described through FIGS. 11A and 11B as follows.

At this time, the mixing adjustment value handled by the mixing setting interface may include at least one of a vocal sync value, a master volume option value, and an edit value.

구체적으로 믹싱설정 인터페이스의 제 1 영역(250)에 보컬 싱크값, 마스터 볼륨 옵션값 및 편집값 중 어느 하나를 선택하여 편집하기 위한 선택지가 제공될 수 있다. 따라서, 사용자는 자신이 무엇을 수정할 것인지 여부에 따라 "보컬 싱크값", "마스터 볼륨값", "편집값" 중 어느 하나를 입력하여 선택하게 된다. Specifically, an option for selecting and editing any one of a vocal sync value, a master volume option value, and an edit value may be provided in the first region 250 of the mixing setting interface. Accordingly, the user inputs and selects one of "vocal sync value", "master volume value", and "edit value" according to what he/she intends to modify.

In the second area 260 of the mixing setting interface, the waveforms of the MR sound source and the voice information are arranged in correspondence with each other, and the time value for determining the synchronization between the MR sound source and the voice information is presented visually.

At least one button for adjusting any one of the vocal sync value, the master volume value, and the edit value is provided in the third area 251 of the mixing setting interface. The shape and number of the buttons may be whatever suits adjusting the value in question, and do not limit the scope of the present invention.

For example, FIG. 11A shows an interface for correcting the sync of a sound source. Accordingly, the third area 251 of the mixing setting interface is provided with input buttons for shifting or skipping the sync of the sound source in units of time. FIG. 11B, on the other hand, shows an example of an interface for adjusting the master volume of the voice information and the MR sound source. The third area 251 of the mixing setting interface may be provided with a pair of rotary buttons for adjusting the volume of the MR sound source or the voice sound source in a specific section of the mixed sound source. In this way, the shape and number of the input buttons provided for setting the mixing and mastering values of the present invention may take a form that the user can understand intuitively.
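As a rough illustration of how a vocal sync value and per-track volume values might act on the two tracks, the following minimal sketch shifts the voice relative to the MR sound source and sums the two with separate gains. The function name `mix_tracks`, the use of plain sample lists, and the choice of samples as the sync unit are assumptions made for illustration; the specification does not prescribe any of them.

```python
def mix_tracks(mr, voice, sync_offset, mr_gain=1.0, voice_gain=1.0):
    """Mix an MR track and a voice track given as lists of samples.

    sync_offset is a hypothetical vocal sync value in samples:
    positive delays the voice, negative advances it.
    mr_gain and voice_gain are hypothetical volume values.
    """
    if sync_offset >= 0:
        # Delay the voice by padding silence in front of it.
        shifted = [0.0] * sync_offset + list(voice)
    else:
        # Advance the voice by dropping its first samples.
        shifted = list(voice)[-sync_offset:]

    length = max(len(mr), len(shifted))
    mixed = []
    for i in range(length):
        m = mr[i] if i < len(mr) else 0.0
        v = shifted[i] if i < len(shifted) else 0.0
        mixed.append(mr_gain * m + voice_gain * v)
    return mixed
```

In this sketch, each press of a time-unit button in the third area 251 would correspond to changing `sync_offset` by a fixed step, and each rotary button to changing one of the two gains.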

Next, the mastering setting interface allows modification of the mastering adjustment values used to master at least one of the MR sound source and the voice information.

Here, the mastering adjustment value may include, for example, at least one of doubling, compressor, equalizer, reverb, delay, and limiter values.

The mastering setting interface is described below with reference to FIG. 11C.

The first area 252 of the mastering setting interface provides options for selecting which of the doubling, compressor, equalizer, reverb, delay, and limiter values to edit. The user may select the mastering setting to be modified by touch input.

In the second area 261 of the mastering setting interface, the waveform of the user sound source is displayed; this area may visually indicate the position of the section to be edited or allow a specific section to be selected.

At least one button for adjusting any one of the doubling, compressor, equalizer, reverb, delay, and limiter values may be provided in the third area 253 of the mastering interface.

The shape and number of the buttons provided through the third area 253 may vary according to the value to be modified, and do not limit the scope of the invention. For example, to adjust the doubling and compressor values, the rotary buttons shown in FIG. 11B may be provided in the third area 253 of the mastering interface.

Finally, a contest is held among a plurality of user sound sources received from other user terminals (S430).

Here, the central server 100 provides the listener terminal with the user sound source and with a contest interface for carrying out the contest.

Specifically, referring to FIG. 12, the contest interface displays a list 330 of at least one user sound source currently taking part in a contest.

In addition, when the listener terminal selects a specific user sound source from the list, song information and streaming of that user sound source are provided. The song information includes at least one of a user profile, the title of the original song, the name of the original singer, a description of the original song, a thumbnail, video information, and the numerical preference score given to the specific user sound source.

In addition, when a winning sound source is selected based on the preference that each user sound source obtained from the listener terminals during a predetermined period, the selected winning sound source is displayed (310, 320) on the contest interface for a predetermined period.
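The selection of a winning sound source from listener preferences can be sketched as follows. The specification does not state how preferences are aggregated, so summing the scores per sound source, breaking ties by identifier, and the name `select_winner` are all assumptions made for illustration.

```python
def select_winner(preferences):
    """Return the id of the winning user sound source, or None if there are no entries.

    preferences maps a (hypothetical) sound source id to the list of
    preference scores it received from listener terminals during the
    contest period. The winner is the entry with the highest total;
    ties go to the id that sorts first (an arbitrary, assumed rule).
    """
    if not preferences:
        return None
    totals = {sid: sum(scores) for sid, scores in preferences.items()}
    # max() keeps the first maximal element, so sorting the ids first
    # makes the tie-break deterministic.
    return max(sorted(totals), key=lambda sid: totals[sid])
```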

Here, as shown in FIG. 13, the user's profile information 340 and comment 350 may be received from the user terminal 200 and displayed (360) together with the other user sound sources created by that user and the user's number of wins.

FIG. 7 is an operational flowchart illustrating the process by which revenue is generated from a sound source and distributed, according to an embodiment of the present invention.

What follows is an example for explaining how revenue is generated and how it flows. Conventionally, of the revenue generated by a sound source, the music seller takes 30% and the copyright association or the like takes 10%; the remaining 60% of the original revenue is then split again, with the music distributor taking a 20% commission and the agency taking 80%, so the revenue left for the creator of the sound source is greatly reduced. Likewise, for video content such as YouTube, YouTube, copyright associations, sound source (video) distributors, agencies, and others each take a share of the revenue, so the share that the actual content creator receives is limited.

Referring to FIG. 7, the distributor server 300 generates revenue of 1,000,000 won by releasing the user sound source to listeners (S510).

As described above, the user sound source delivered to the distributor server 300 in step S510 is a sound source that won a contest, run by the central server 100 for a preset period, against a plurality of other user sound sources.

If the user sound source is audio-only media, the central server 100 delivers it to the distributor server 300 of a music seller. If the user sound source is media embedded in a video, the central server 100 delivers it to the distributor server 300 of a video streaming company.

Next, the distributor server 300 takes 30% (300,000 won) of the generated revenue as a commission (or distribution margin) (S520).

Here, as an optional embodiment, if the MR sound source or other material used to create the user sound source is already copyrighted, the copyright association or the like takes 10% (100,000 won) of the revenue (S5210). However, when a user sound source that uses no other copyright holder's material generates revenue through YouTube or the like, this step may be omitted.

Next, the service operator of the central server 100, having received from the distributor server 300 the 600,000 won that remains after the commission and the copyright share are deducted, pays 20% (120,000 won) to the distributor of the original song and takes 50% (240,000 won) of the remaining proceeds (S530).

As an optional embodiment, when at least one user sound source created by a specific user terminal 200 is provided to the distributor server 300 in step S520 or step S530, the revenue distribution ratio is adjusted in step S540 according to the number of user sound sources of that specific user terminal 200.

As another optional embodiment, if a specific user sound source generates revenue at or above a preset amount in step S510, the revenue distribution ratio is adjusted in step S540 according to the views of that specific user sound source.

Through these two functions, the central server 100 can serve as a means of supporting outstanding creators while ensuring them high revenue.

Finally, the creator using the user terminal 200 receives the other 50% (240,000 won) of the remaining proceeds, which completes the distribution of revenue (S540).
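The arithmetic of steps S510 to S540 can be checked with a short sketch. The percentages are those given in the example above; the function name `distribute_revenue`, the dictionary keys, and the handling of the case where the copyright step is omitted (the 10% simply stays in the remaining proceeds) are assumptions made for illustration.

```python
def distribute_revenue(total, copyrighted=True):
    """Split revenue (in won, as an integer) following the example of FIG. 7.

    copyrighted=False models the optional case where the copyright
    step is omitted; how the freed share is then used is an assumption.
    """
    distributor_fee = total * 30 // 100                      # S520: 30% commission
    copyright_fee = total * 10 // 100 if copyrighted else 0  # optional 10%
    remaining = total - distributor_fee - copyright_fee
    original_distributor = remaining * 20 // 100             # S530: 20% to original song distributor
    pool = remaining - original_distributor
    operator = pool * 50 // 100                              # S530: service operator
    creator = pool - operator                                # S540: creator
    return {
        "distributor": distributor_fee,
        "copyright": copyright_fee,
        "original_distributor": original_distributor,
        "operator": operator,
        "creator": creator,
    }

shares = distribute_revenue(1_000_000)
```

For a total of 1,000,000 won, `shares` holds 300,000 won for the distributor, 100,000 won for the copyright association, 120,000 won for the original song distributor, and 240,000 won each for the service operator and the creator, which together account for the full amount.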

Meanwhile, in step S510, a specific user terminal 200 may request that a specific user sound source delivered to the distributor server 300 be made private. Thereafter, in step S520, if a specific listener terminal accounts for at least a preset proportion of the revenue generated by the user sound sources of the specific user terminal 200, the private user sound source may be provided to that listener terminal. That is, when listener B listens enthusiastically to the music of creator A, creator A's private sound source can be provided to listener B only.
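The access condition for a private sound source reduces to a ratio check, sketched below. The function name `may_access_private`, the parameter names, and the example threshold are hypothetical; the specification only says that the proportion is preset.

```python
def may_access_private(listener_revenue, creator_total_revenue, threshold_ratio):
    """Decide whether a listener may receive a creator's private sound source.

    listener_revenue: revenue attributable to this listener across the
        creator's user sound sources.
    creator_total_revenue: total revenue of those sound sources.
    threshold_ratio: the preset proportion (a hypothetical value such as 0.25).
    """
    if creator_total_revenue <= 0:
        # No revenue yet, so no listener can meet the proportion.
        return False
    return listener_revenue / creator_total_revenue >= threshold_ratio
```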

The foregoing description of the present invention is illustrative, and those of ordinary skill in the art to which the present invention pertains will understand that it can readily be modified into other specific forms without changing the technical spirit or essential features of the present invention. The embodiments described above should therefore be understood as illustrative in all respects and not restrictive. For example, each component described as a single unit may be implemented in a distributed manner, and likewise components described as distributed may be implemented in combined form.

The scope of the present invention is defined by the claims below rather than by the detailed description above, and all changes or modifications derived from the meaning and scope of the claims and their equivalents should be construed as falling within the scope of the present invention.

100: central server
200: user terminal
300: distributor server

Claims

A method of generating a sound source based on AI, performed by a user terminal, the method comprising:
(a) receiving voice information from a user;
(b) selecting a comparison singer having a predetermined similarity to the voice information; and
(c) generating a user sound source by mixing and mastering the voice information based on a pre-stored sound source adjustment value mapped to the comparison singer,
wherein the voice information is a recording of a voice or song produced by the user's own utterance,
wherein the sound source adjustment value includes a mixing adjustment value and a mastering adjustment value,
wherein, before step (a), a preset MR sound source is provided to the user to induce the user to sing along with the MR sound source, and in step (a) the voice information is received from the user for each of the low, middle, and high registers, and
wherein, after step (c), a machine learning model is built that takes as input values data on the genre, pitch, and timbre of a plurality of sound sources applied to a plurality of pieces of voice information and has as output values the mixing condition value and the mastering condition value for generating the user sound source, thereby increasing the accuracy of the mixing condition value and the mastering condition value when the user sound source is generated from the voice information of a new user.

(Deleted)

The method of claim 1, wherein step (b) comprises comparing the voice information with pre-stored singer voice information of at least one comparison singer, and calculating a similarity to the singer voice information based on the genre, timbre, and pitch of the voice information.

The method of claim 3, wherein, when preset genre information is assigned to the singer voice information, genre information associated with the user is quantified based on either preferred genre information previously received from the user or genre information corresponding to the voice information, and the comparison singer is selected by comparing the result with the genre information assigned to each piece of singer voice information.

The method of claim 3, wherein the timbre of the voice information is quantified and compared with the timbre value included in the singer voice information of each comparison singer, and the comparison singer with the smallest difference is selected.

The method of claim 3, wherein the pitch of the voice information is quantified and compared with the pitch value included in the singer voice information of each comparison singer, and the comparison singer with the smallest difference is selected.

The method of claim 1, wherein, after step (b), a category of the voice information is classified based on the result of selecting the comparison singer, and the sound source adjustment value corresponding to the category is set as the user's sound source adjustment value.

The method of claim 1, wherein the sound source adjustment value consists of a mixing condition value and a mastering adjustment value, and the sound source adjustment value is mapped and stored differently for each original song of the comparison singer.

The method of claim 8, wherein at least one sound source adjustment value may be assigned to one original song of the comparison singer, and two or more sound source adjustment values are set separately according to at least one criterion among the time signature and the melody of the original song.

The method of claim 1, wherein, after step (c), upon receiving from the user a genre, timbre, and pitch for modifying the voice information, a custom sound source adjustment value is generated based on the received genre, timbre, and pitch, and the voice information is modified accordingly.

The method of claim 10, wherein, based on the custom sound source adjustment value, the pre-stored sound source adjustment value associated with the singer voice information of the comparison singer is corrected.

(Deleted)

The method of claim 1, wherein, after step (c), the user sound source is uploaded to a central server and streamed to other user terminals.

A device for generating a sound source based on AI, comprising:
a memory storing a program for generating a sound source based on AI; and
a processor that generates a sound source based on AI by executing the program stored in the memory,
wherein the processor receives voice information from a user, selects a comparison singer having a predetermined similarity to the voice information, and generates a user sound source by mixing and mastering the voice information based on a pre-stored sound source adjustment value mapped to the comparison singer, the voice information being a recording of a voice or song produced by the user's own utterance, and the sound source adjustment value including a mixing adjustment value and a mastering adjustment value,
wherein, prior to selecting the comparison singer, the processor provides a preset MR sound source to the user to induce the user to sing along with the MR sound source, and receives the voice information for each of the low, middle, and high registers, and
wherein, after generating the user sound source, the processor builds a machine learning model that takes as input values data on the genre, pitch, and timbre of a plurality of sound sources applied to a plurality of pieces of voice information and has as output values the mixing condition value and the mastering condition value for generating the user sound source, thereby increasing the accuracy of the mixing condition value and the mastering condition value when the user sound source is generated from the voice information of a new user.

A computer-readable storage medium on which a program for performing the method of generating a sound source based on AI according to claim 1 is recorded.