KR20100007102A

KR20100007102A - Online digital contents management system

Info

Publication number: KR20100007102A
Application number: KR1020080067579A
Authority: KR
Inventors: 박종진; 구광효
Original assignee: (주)모그인터렉티브
Priority date: 2008-07-11
Filing date: 2008-07-11
Publication date: 2010-01-22
Also published as: KR101002732B1

Abstract

PURPOSE: An online digital content management system is provided to manage an individual's digital content online by using an audio matching technique. CONSTITUTION: A feature extraction unit extracts feature values from the audio data of digital content selected by a user, and a feature matching unit (27) compares the extracted feature values with the audio feature values stored in a feature database to find matching digital content. A content management unit (29) provides the user terminal with detailed information on the matched digital content and adds that content to the user's personal possession list.

Description

ONLINE DIGITAL CONTENTS MANAGEMENT SYSTEM

The present invention relates to an online digital content management system.

With the recent development of data communication over the Internet and mobile networks, many people can easily access a wide range of multimedia data. For example, many sites now sell, provide, or share music, movies, and the like online, and users can visit these sites to obtain the multimedia data they want. Users can also buy CDs, DVDs, and the like offline or online, convert the data, store it on a portable terminal or personal PC, and easily play back the multimedia data they own.

However, for a user to play or use their own multimedia data on several devices, the data must be stored on each device separately. For example, a user who wants to listen to their music on a home computer, an office computer, and an MP3 player used while commuting must store it on all of those devices. This is very cumbersome.

In addition, when playing audio files on their own terminal, users often cannot find out the detailed information of the data they hold. Audio files traded online are sometimes distributed outside legitimate channels, and files obtained that way frequently lack a proper file name, artist information, album information, song title, copyright information, and so on. Even when a file was purchased through a legitimate channel, its attribute information may be damaged because the user renamed the file or managed it carelessly.

Accordingly, an object of the present invention is to provide an online digital content management system that uses audio matching technology to manage, online, the digital content held by an individual. A further object is to provide an online digital content management system and a digital content management server that can automatically organize and display the detailed information of the digital content held by an individual and can perform settlement based on copyright information. A further object is to provide an online digital content management system that, when an individual downloads their own digital content again, corrects the file name of that content and adds meta information before delivering it.

According to the present invention, the above objects can be achieved by an online digital content management system comprising: a media storage unit in which digital content is stored; a feature database in which audio feature values for the digital content are stored; a personal database in which a personal possession list of the digital content held by each user is stored; a feature extraction unit that extracts feature values from the audio data of digital content the user has selected for upload; a feature matching unit that compares the extracted feature values with the audio feature values stored in the feature database to search for matching digital content; and a content management unit that provides detailed information on the matched digital content to the user terminal and adds it to the user's personal possession list.

The system may further include the user terminal, wherein the media storage unit, the feature database, the personal database, and the content management unit are provided in a digital content management server, and the feature extraction unit and the feature matching unit are each provided in either the user terminal or the digital content management server.

Here, the detailed information of the digital content provided to the user terminal may include at least two of a title, an artist name, a producer, an album name, a playing time, a copyright holder, an album image, and lyrics.

In addition, in response to the user's request to play or download digital content included in the personal possession list, the content management unit may extract the digital content corresponding to the request from the feature database and stream or transmit it, and the digital content transmitted in response to a download request may include a proper file name and attribute information.

When the matching fails, the content management unit may receive the digital content the user selected for upload and add it to the database.

Meanwhile, the feature extraction unit may take a portion of the audio data, from the playback start time up to a certain playback time, convert it so that it has a predetermined number of bits, and decode it; convert the decoded data to mono; normalize the mono data; search for the start point at which sound begins; and extract feature vectors from the data after that start point.
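The mono conversion and normalization steps above can be sketched as follows. This is a minimal illustration, not the patented implementation: the function names, the channel-averaging rule, and the use of peak normalization are all assumptions, since the text only says the data is converted to mono and normalized.

```python
from typing import List, Sequence, Tuple


def take_leading_segment(samples: Sequence[float], sample_rate: int,
                         seconds: float) -> List[float]:
    """Keep only the audio from the start of playback up to `seconds`."""
    return list(samples[: int(sample_rate * seconds)])


def to_mono(stereo: Sequence[Tuple[float, float]]) -> List[float]:
    """Average left and right channels (assumed down-mix rule)."""
    return [(left + right) / 2.0 for left, right in stereo]


def peak_normalize(samples: Sequence[float],
                   target_peak: float = 1.0) -> List[float]:
    """Scale samples so the largest absolute value equals target_peak
    (peak normalization is an assumption, not stated in the source)."""
    peak = max((abs(s) for s in samples), default=0.0)
    if peak == 0.0:
        return list(samples)  # pure silence: nothing to scale
    scale = target_peak / peak
    return [s * scale for s in samples]
```

Normalizing before the start-point search means the same energy threshold can be applied to quiet and loud encodings of one track.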

In addition, the feature extraction unit may examine the normalized data frame by frame for the presence of a minimum waveform model that constitutes a sound and, when at least a certain number of consecutive frames containing the minimum waveform model are detected, determine the position at which the minimum waveform model was first detected among those frames as the start point. The minimum waveform model may contain at least a certain number of samples whose energy is at or above a predetermined value.
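A minimal sketch of this start-point search is shown below. The parameter names (`frame_size`, `energy_threshold`, `min_count`, `min_run`) are hypothetical, sample energy is taken to be the squared amplitude, and the start point is reported as the offset of the first frame in the qualifying run rather than the exact sample; all of these are assumptions made for illustration.

```python
from typing import Optional, Sequence


def frame_has_min_waveform(frame: Sequence[float], energy_threshold: float,
                           min_count: int) -> bool:
    """True if the frame holds at least min_count samples whose energy
    (assumed here to be amplitude squared) reaches energy_threshold."""
    strong = sum(1 for s in frame if s * s >= energy_threshold)
    return strong >= min_count


def find_start_point(samples: Sequence[float], frame_size: int,
                     energy_threshold: float, min_count: int,
                     min_run: int) -> Optional[int]:
    """Return the offset of the first frame of the first run of at least
    min_run consecutive qualifying frames, or None if no run exists."""
    run_start = 0
    run_len = 0
    for offset in range(0, len(samples) - frame_size + 1, frame_size):
        frame = samples[offset: offset + frame_size]
        if frame_has_min_waveform(frame, energy_threshold, min_count):
            if run_len == 0:
                run_start = offset  # remember where this run began
            run_len += 1
            if run_len >= min_run:
                return run_start
        else:
            run_len = 0  # an isolated burst does not count as a start
    return None
```

Requiring several consecutive frames keeps a short click or noise burst in the leading silence from being mistaken for the start of the music.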

Furthermore, the feature extraction unit may sum the extracted feature vectors over every fixed number of frames to produce summed data, compute the differences between the summed data, and output those differences as the feature values.
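The summation and differencing step can be illustrated with the sketch below. It assumes each per-frame feature vector is a list of numbers of equal length, that frames are grouped into non-overlapping blocks, and that differences are taken between consecutive blocks; the text does not state these details, and the function names are hypothetical.

```python
from typing import List, Sequence


def summed_blocks(feature_vectors: Sequence[Sequence[float]],
                  frames_per_block: int) -> List[List[float]]:
    """Add feature vectors element-wise over each full block of frames."""
    blocks = []
    last_start = len(feature_vectors) - frames_per_block
    for start in range(0, last_start + 1, frames_per_block):
        group = feature_vectors[start: start + frames_per_block]
        blocks.append([sum(column) for column in zip(*group)])
    return blocks


def block_differences(blocks: Sequence[Sequence[float]]) -> List[List[float]]:
    """Element-wise difference between each block and the one before it."""
    return [[cur - prev for prev, cur in zip(a, b)]
            for a, b in zip(blocks, blocks[1:])]


vectors = [[1, 2], [3, 4], [5, 6], [7, 8], [0, 0], [1, 1]]
blocks = summed_blocks(vectors, frames_per_block=2)  # [[4, 6], [12, 14], [1, 1]]
features = block_differences(blocks)                 # [[8, 8], [-11, -13]]
```

Working with differences rather than absolute sums makes the resulting values less sensitive to a constant offset in overall level between two encodings of the same recording.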

As described above, the present invention provides an online digital content management system that uses audio matching technology to manage, online, the digital content held by an individual. It also provides an online digital content management system and a digital content management server that can automatically organize and display the detailed information of the digital content held by an individual and can perform settlement based on copyright information. It further provides an online digital content management system that, when an individual downloads their own digital content again, corrects the file name of that content and adds meta information before delivering it.

Hereinafter, specific embodiments of the present invention will be described with reference to the drawings.

FIG. 1 is a schematic diagram of an online digital content management system according to a first embodiment of the present invention. As shown in FIG. 1, the system includes user terminals (10a, 10b, 10c; hereinafter 10), a digital content management server (20), and a network (30) that links them.

The user terminal (10) is a personal terminal for receiving the digital content management service; various devices such as a personal PC, a notebook computer, a PDA, or an MP3 player may be used. Any other device that can connect to the digital content management server (20) over the Internet may also serve as the user terminal (10) of the present invention.

The user terminal (10) has a module for connecting to the digital content management server (20) and a module for uploading data.

Through the user terminal (10), the user can connect to the digital content management server (20) and upload digital content they own, or store on their terminal, play, delete from the list, or edit their content held on the digital content management server (20). Here, digital content covers multimedia data in various digital forms, such as music, video, and Flash; for convenience, the description below uses music files as the example.

The online digital content management system according to the present invention includes a feature extraction unit (not shown) that extracts audio feature values from the digital content the user has selected for upload, and a feature matching unit (27) that matches feature values against each other. Each may be provided in either the user terminal (10) or the digital content management server (20). In this embodiment, as an example, the feature extraction unit is installed on the user terminal (10) and the feature matching unit (27) is installed on the digital content management server.

The feature extraction unit according to this embodiment is a web-installed program, for example an ActiveX program. Specifically, when the user connects to the digital content management server (20) and selects the upload function for digital content held on their terminal, or stored on a recording medium connected to or inserted into the terminal, the feature extraction unit installed on the user terminal (10) extracts feature values from the audio data of the selected digital content and transmits them to the digital content management server (20) over the network (30).

The digital content management server (20) works with the user terminal (10) over the network (30) to deliver the digital content providing service of the present invention, and performs the corresponding function in response to each request sent from the user terminal (10).

As shown in FIG. 1, the digital content management server (20) includes a feature matching unit (27), a media storage unit (21), a feature database (23), a personal database (25), and a content management unit (29).

The media storage unit (21) is a database of various digital content. Besides the content itself, it may hold content attribute information such as title, album name, playing time, author, copyright holder, performer, and producer. It may additionally hold detailed information such as album images and lyrics.

The feature database (23) stores the audio feature values of the digital content held in the media storage unit (21) and serves as the reference data against which the audio feature values sent from the user terminal (10) are matched. The audio feature values in the feature database (23) are extracted and stored using a method similar to the extraction method used by the feature extraction unit installed on the user terminal (10).

The personal database (25) stores the personal possession list of the content each user holds, and may store various user information such as personal details, the list of held content, and history information. The personal possession list is presented to the user as a web page, and the digital content management server may offer management functions for sorting, grouping, and editing the list of content the user holds.

The feature matching unit (27) matches the audio feature values sent from the user terminal (10) against the feature values stored in the feature database (23) to find the matching content.
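The passage above does not specify how the comparison is scored, so the following is only an illustrative sketch of one possible matcher: a nearest-neighbour search by Euclidean distance with an acceptance threshold. The metric, the threshold `max_distance`, and the dictionary layout of the feature database are all assumptions, not details taken from the source.

```python
import math
from typing import Dict, Optional, Sequence


def match_features(query: Sequence[float],
                   feature_db: Dict[str, Sequence[float]],
                   max_distance: float) -> Optional[str]:
    """Return the id of the stored entry closest to `query`, or None when
    even the closest entry is farther away than max_distance."""
    best_id: Optional[str] = None
    best_dist = math.inf
    for content_id, stored in feature_db.items():
        if len(stored) != len(query):
            continue  # only compare vectors of the same length
        dist = math.dist(query, stored)  # Euclidean distance (assumed metric)
        if dist < best_dist:
            best_id, best_dist = content_id, dist
    return best_id if best_dist <= max_distance else None
```

Returning `None` for a weak best match corresponds to the "matching failed" branch of the flow, in which the server asks the terminal for the full file.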

The content management unit (29) works with the user terminal (10) over the network (30) to deliver the digital content management service of the present invention, and performs the corresponding function in response to each request sent from the user terminal (10).

Specifically, based on the result from the feature matching unit (27), the content management unit (29) provides the user terminal (10) with detailed information on the matched digital content and, at the same time, adds that content's information to the user's held-content list in the personal database (25).

Here, the detailed information provided to the user may include attribute information such as title, album name, playing time, author, copyright holder, performer, and producer. This information is organized and presented to the user as a list, from which the user can learn the details of the content they hold. For example, even if a file the user holds has damaged attribute information, the system of the present invention makes that content's attribute information easy to find.

As described above, the user can connect to the digital content management server (20) to check and manage the list of content they hold and its detailed information. The content management unit (29) provides the management functions for these user requests and may offer services such as playback, content download, album image download, and lyrics display.

For example, when the user connects to the content management unit (29) and chooses to view the list of content they hold, the content management unit (29) retrieves that user's held-content list from the personal database (25) and provides the list and its detailed information to the user terminal (10). When the user chooses to play or download content in the personal possession list, the selected content can be retrieved from the media storage unit (21) and streamed or transmitted to the user terminal.

The data transmitted in response to a download request preferably includes a proper file name, meta information, and so on for the content. For example, for music file formats such as MP3 that can store additional meta information in the file, the song title, album name, artist name, year of release, track number, and the like are filled in correctly before the file is delivered.
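As a small illustration of delivering a file under a cleaned-up name, the sketch below builds a download file name from a metadata record. The naming pattern, the field names (`track`, `artist`, `title`), and the set of characters replaced are assumptions chosen for the example; the source only says that the file name and meta information should be correct. Writing the tags into the MP3 file itself is not shown.

```python
import re
from typing import Mapping


def sanitize(text: str) -> str:
    """Replace characters that are invalid in common file systems."""
    return re.sub(r'[\\/:*?"<>|]', "_", text).strip()


def build_download_name(meta: Mapping[str, object],
                        extension: str = "mp3") -> str:
    """Compose 'NN - Artist - Title.ext' from hypothetical metadata fields."""
    track = int(meta.get("track", 0))
    artist = sanitize(str(meta.get("artist", "Unknown Artist")))
    title = sanitize(str(meta.get("title", "Unknown Title")))
    return f"{track:02d} - {artist} - {title}.{extension}"


name = build_download_name({"track": 3, "artist": "AC/DC",
                            "title": "Back in Black"})
```

Because the name is generated from the server's own record, a file the user uploaded as, say, an arbitrarily renamed track comes back with a consistent, readable name.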

With this system, users can see the detailed information on the content they have uploaded at a glance, and can stream that content wherever they can reach the digital content management server. For example, a user can upload music stored on a home PC to the digital content management server (20), then connect to the server from another terminal such as an MP3 player, PDA, or mobile phone and download it to that terminal. Because the downloaded data carries a proper file name and file attribute information, this is very useful to the user.

For example, even if the file name of the user's own digital content has been changed arbitrarily or its attribute information is damaged, the system lets the user download and play digital content with accurate attribute information, and also lets the user download and play a different version of the same track they hold.

In addition, because the system of the present invention can provide copyright holder information, it can also perform settlement, for example for transactions between individuals, based on that copyright information.

The digital content management server (20) described above may be a single physical device, or its components may be independent modules that operate together over communication links. For example, the content management unit (29) may be implemented as a computer running a web server program, a management program, and the like, and the media storage unit (21), the feature database (23), and the personal database (25) may each be implemented as DB servers that work with the content management unit (29). The feature matching unit (27) can be implemented as a software program installed and run on a server computer.

The overall operation of the digital content management system of FIG. 1 is described below. FIG. 2 is a control flowchart of the digital content management system of FIG. 1.

Referring to FIG. 2, to receive the service the user registers as a member on the web site of the digital content management server (20) (S10). The digital content management server (20) then installs the feature extraction unit, the program needed to provide the service, on the user terminal (10) over the web (S11).

The user connects to the digital content management server (20) and selects the upload function offered on the web page, choosing for upload digital content currently stored on their terminal or on a CD, DVD, memory device, or the like connected to or inserted into the terminal (S12).

When the user selects the upload function, the feature extraction unit installed on the user terminal (10) runs and extracts feature values from the audio data of the digital content selected for upload (S13). The feature extraction unit may also support CD ripping, in which case the data on a CD inserted into the user terminal (10) is converted to a file format such as MP3 before the feature values are extracted.

The extracted feature values are transmitted to the digital content management server (20) over the network (30) (S14).

The feature matching unit (27) compares the audio feature values sent from the user terminal (10) with the feature values in the feature database (23) (S15) and searches for the matching digital content.

If the matching succeeds (S16), the content management unit (29) provides the detailed information of the matched digital content to the user terminal (10) and adds that content's information to the user's held-content list (S17).

As described above, the detailed information of the successfully matched digital content (for example, song title, album name, artist) may be compiled into a list and provided to the user. The content list provided to the user may include information such as the content's title, author, performer, playing time, and album jacket.

If the matching fails (S16), the content management unit (29) asks the user terminal (10) to upload the full data (S18), the user terminal (10) sends the entire requested content to the content management unit (29), and the content management unit (29) adds the received content to the database (S19). For example, if the uploaded data properly includes attribute information such as meta information, the digital content is added to the media storage unit (21), feature values are extracted from its audio data and added to the feature database (23), and the content's information is added to the personal database (25).
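The server-side branch of steps S15 to S19 can be sketched as below, using in-memory dictionaries in place of the three stores. The class and function names, the callable parameters, and the reuse of the client-supplied feature values on a failed match are all simplifying assumptions; the text describes the server extracting feature values from the uploaded audio itself.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Sequence


@dataclass
class Server:
    media_store: Dict[str, bytes] = field(default_factory=dict)       # media storage unit (21)
    feature_db: Dict[str, List[float]] = field(default_factory=dict)  # feature database (23)
    personal_db: Dict[str, List[str]] = field(default_factory=dict)   # personal database (25)


def handle_upload(server: Server, user_id: str, features: Sequence[float],
                  match: Callable[[Sequence[float], Dict[str, List[float]]],
                                  Optional[str]],
                  fetch_full_file: Callable[[], bytes], new_id: str) -> str:
    """Match features (S15); on failure request and store the file (S18-S19);
    in both cases add the content to the user's possession list (S17/S19)."""
    content_id = match(features, server.feature_db)          # S15
    if content_id is None:                                   # S16: no match
        server.media_store[new_id] = fetch_full_file()       # S18: full upload
        server.feature_db[new_id] = list(features)           # S19 (simplified)
        content_id = new_id
    owned = server.personal_db.setdefault(user_id, [])
    if content_id not in owned:
        owned.append(content_id)
    return content_id
```

Note that the full file crosses the network only on a failed match; for content the server already holds, only the feature values are sent.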

In this way, content that is not yet in the database of the media storage unit (21) can be added using the content users upload, so the server's database grows as the number of users and the amount of uploaded content increase.

FIG. 3 is a schematic diagram of a method of providing a streaming service for digital content in the digital content management system of FIG. 1.

Referring to FIG. 3, when the user connects to the digital content management server (20) over the network (30) and logs in (S20), the content management unit (29) provides the logged-in user's page (S21). From this page, the user can upload content stored on their user terminal (10), or play, save, delete, or share with other users the content they have already uploaded.

FIG. 3 illustrates how a user plays content they have uploaded. When the user selects the held-content list on the web page (S22), the content management unit (29) provides that list to the user (S23). The content list provided to the user may include information such as the content's title, author, performer, playing time, and album jacket.

The user can choose to play the desired digital content from the content list (S24); the content management unit (29) then retrieves the selected digital content from the media storage unit (21) and streams it to the user terminal (10) (S25).

Likewise, when the user chooses to save (download) the desired content from the content list, the content management unit (29) retrieves that content from the media storage unit (21) and transmits it to the user terminal (10).

For example, suppose a user connects to the digital content management server (20) and uploads music file 'A'. When the user later chooses to download music file 'A' from the held-content list, they receive the file held by the digital content management server (20), not the file they originally had. The downloaded file contains correct content attribute information and can also be delivered under a proper file name. Thus, a user who uploads a music file lacking such content information and then downloads it from the held-content list receives a music file with the content attribute information properly included.

As another example, the digital content management server (20) according to an embodiment of the present invention may provide a content sharing service between users. When a user makes content they hold available to other users, the digital content management server (20) provides the other users who want to share it with a list of the detailed information of the shared content and, when another user wants to play or download it, retrieves the content from the media storage unit (21) and streams or transmits it to that user.

As described above, the digital content management system according to the present invention uses audio feature value extraction and matching technology: it extracts feature values from content owned by the user and, for content that is matched successfully, uses a pre-built DB to show the user the detailed information of the uploaded content and to let the user stream or download that content at any time. To provide such a service, it is above all important that the feature values of audio data be extracted and matched quickly and accurately.

Hereinafter, the audio feature value extraction method used in the feature extraction unit installed in the user terminal (10) or the digital content management server (20), and the audio feature value extraction method used when building the feature database (23) of the digital content management server (20), will be described in detail.

FIG. 4 is a flowchart of a method of extracting a feature vector from audio data according to an embodiment of the present invention, and FIG. 5 shows examples of waveforms at each stage of the process of FIG. 4.

In an embodiment of the present invention, the feature vector is extracted from only part of the audio data of the digital content rather than from all of it. Only part of the data is used in order to increase speed, and this is possible because a highly reliable sound source matching algorithm is used. The entire data may of course be used in some cases.

Referring to FIG. 4, an embodiment of the present invention uses about 20 seconds of data from the beginning of the audio data (music file) (S30). The amount of data used may of course be changed as appropriate for the embodiment. For example, for an MP3 file, which is compressed in units of time, the data up to the 20-second point is used.

Referring to FIG. 5, (a) shows the waveform of the input MP3 file, and (b) shows the waveform of the audio data after the first 20 seconds have been cut from this MP3 file.

In an embodiment of the present invention, before the feature vector is extracted, a preprocessing step converts the acquired beginning portion of the audio data into a fixed format. The same sound source very often comes in different formats depending on the encoding environment, so converting these varied sources to a fixed format before feature vector extraction further improves sound source matching performance. The preprocessing according to an embodiment of the present invention comprises a series of steps including conversion to a fixed bit rate, decoding, normalization, and start-point detection and removal.

Specifically, referring to FIG. 5, the bit rate is first changed so that the audio data has a fixed bit rate (S31). Even for the same sound source, the number of features output may differ when the bit rate differs, so the present invention preprocesses the data to a predetermined bit rate.

For example, audio data may be encoded at various bit rates such as 32 Kbps, 64 Kbps, 128 Kbps, 192 Kbps, and 320 Kbps. In an embodiment of the present invention, all data is converted to 64 Kbps in consideration of speed. In FIG. 5, (c) shows the waveform after the bit rate has been changed to 64 Kbps.

The audio data converted to the fixed bit rate is then decoded (S32). Music files come in various formats, for example WAVE, MP3, WMA, ASF, and OGG. In an embodiment of the present invention, the compressed file is decoded to recover the original PCM data, and the feature vector is extracted from that data. In FIG. 5, (d) shows the waveform after the MP3 file has been decoded. At this point the file format changes from MP3 to WAV.

By preprocessing the audio data into a fixed format before extracting the feature vector in this way, the extracted feature vector no longer depends on the file format or bit rate, which further improves the matching success rate.

The decoded audio data is then normalized with respect to volume so that features can be extracted more reliably (S33). FIG. 5(e) shows (d) after volume normalization. As shown, the amplitude of the waveform as a whole has increased.

After normalization, the audio data is converted to mono (S34). When the audio data has multiple channels, the sound may differ from channel to channel, so it is converted to a single mono channel. FIG. 5(f) shows the waveform after the stereo data of (e) has been converted to mono.
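The normalization (S33) and mono conversion (S34) steps can be sketched as follows. This is a minimal illustration only: the text does not say which normalization is used, so peak normalization to the 16-bit full scale is an assumption here, as is averaging the two channels for the mono downmix. The function names `normalize_peak` and `to_mono` are hypothetical.

```python
def normalize_peak(samples, full_scale=32767):
    """Scale integer PCM samples so the loudest one reaches full_scale.

    Peak normalization is an assumption; the source only says the data
    is normalized with respect to volume.
    """
    peak = max((abs(s) for s in samples), default=0)
    if peak == 0:
        return list(samples)  # pure silence: nothing to scale
    gain = full_scale / peak
    return [int(round(s * gain)) for s in samples]


def to_mono(left, right):
    """Average two channels sample by sample into one mono channel (assumed downmix)."""
    return [(l + r) // 2 for l, r in zip(left, right)]


quiet = [100, -200, 50]
loud = normalize_peak(quiet)          # the -200 sample is scaled to -32767
mono = to_mono([10, 20], [30, 40])    # [20, 30]
```

Averaging keeps the mono signal within the same numeric range as the inputs, which is why it is used in this sketch instead of a plain sum.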

When the above processing is complete, the point at which sound begins is searched for in the audio data, which now has a fixed format (S35). Even for the same sound source, the point at which the music starts may differ depending on the encoding environment, and in that case sound source matching can be thrown off. For example, digital music data undergoes a slight change in waveform every time it is encoded. This change shifts the time code, so that the time axes of the sound sources being compared no longer line up, which makes it difficult to find identical data.

Therefore, the present invention improves the matching success rate by finding the start point of the music and then removing the data before it (S36). FIG. 5(g) shows the start point found in (e), and (h) shows the waveform of the audio data after the data preceding the start point has been removed.

Hereinafter, a specific method of detecting the start point of audio data according to an embodiment of the present invention will be described with reference to FIGS. 6 to 10.

The start-point detection method according to an embodiment of the present invention divides the PCM samples of the preprocessed audio data into frames and checks each frame for the presence of a minimum waveform model in order to find the first frame that constitutes sound. To keep noise from affecting the result, exception conditions based on the characteristics of noise are applied when locating the start point.

The minimum waveform model used for start-point detection is described below. A person hears sound because vibrating air stimulates the auditory organs, and this vibration of the air forms a wave. A wave is the phenomenon in which the vibration of a medium, arising at one point in the medium, propagates periodically through the medium, with energy at each point in space being cyclically converted from one form to another. Waves generally propagate with a regular period, like a sine wave, but the wave of a piece of music is a mixture of various instruments and voices and therefore has an irregular period. The energy of the wave represents the loudness of the sound.

The minimum waveform model is a half-cycle waveform, within the irregular-period wave of the music, whose energy values all have one sign. The sign of an energy value refers to whether it is negative or positive relative to the zero level of the waveform. A half-cycle waveform does not automatically qualify as a minimum waveform model: the number of samples forming the half-cycle waveform must be at least a certain number. Hereinafter, the minimum number of samples that can form a minimum waveform model is denoted N.

FIG. 6 is a graph of minimum waveform models according to an embodiment of the present invention. Referring to FIG. 6, seven models whose half cycle is longer than N are detected (N is set to 5 in this embodiment). The samples counted here are the values obtained by sampling the energy over time in order to convert the analog music wave into a digital sound source. A CD-quality sound source typically has 44,100 samples per second. A minimum waveform model formed from a half-cycle waveform in this way is used as the smallest unit constituting sound.

In an embodiment of the present invention, to raise the probability that a minimum waveform model really is sound, the check for minimum waveform models is performed not sample by sample but frame by frame, a frame being a group of samples. A sample-by-sample check would decide the start point from the first minimum waveform model alone, whereas a frame-by-frame check decides after looking at all the minimum waveform models in the frame, so the probability that the result is sound is higher. In an embodiment of the present invention, the frame length is set to 20 ms (880 samples), the unit widely used in speech recognition as the shortest span in which a person can perceive sound.

In start-point detection, however, if the digital sound source contains noise, the noise also has the wave and energy that make up sound, so minimum waveform models exist in it as well. As a result, for a digital sound source containing noise, the noise may be found as the start point.

To detect the start point of the music without being affected by noise, the present invention classifies the characteristics of waveforms that contain noise and applies exception conditions based on this classification to minimize errors.

FIGS. 7A to 7D show the characteristics of noise waveforms present in digital sound sources.

In a noise waveform such as that of FIG. 7A, the noise cannot be identified by looking at the plot. However, when the presence of minimum waveform models is checked in order to detect the start point, a start point is detected in the silent portion before the sound begins. This is because a waveform whose energy is too low to be perceived by ear exists in the silent portion. To escape the influence of such low-energy noise, when counting the N samples that determine a minimum waveform model, a sample with low energy is not included in the count. The criterion for a sample too quiet to be perceived by ear is less than 0.5% of the full-scale energy. Since the maximum energy of PCM data is 32767, the threshold for "less than about 0.5%" can be set to 128 (2^7).
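The minimum waveform model and the low-energy exception can be illustrated with a short sketch. It treats each run of consecutive same-sign samples as one half cycle and counts only samples whose magnitude is at least 128 toward N. Treating a zero-valued sample as the end of a half cycle is an assumption, and the name `find_minimum_waveforms` and the returned tuple layout are hypothetical.

```python
def find_minimum_waveforms(samples, n_min=5, floor=128):
    """Return (start_index, sign, counted_samples) for each qualifying half cycle.

    A half cycle qualifies when at least n_min of its samples have a
    magnitude of floor or more (values taken from the embodiment).
    """
    models = []
    start, sign, count = 0, 0, 0
    for i, s in enumerate(samples):
        cur = (s > 0) - (s < 0)          # +1, -1, or 0
        if cur != sign:                  # sign changed: close the current half cycle
            if sign != 0 and count >= n_min:
                models.append((start, sign, count))
            start, sign, count = i, cur, 0
        if cur != 0 and abs(s) >= floor:
            count += 1                   # quiet samples are not counted toward N
    if sign != 0 and count >= n_min:     # close the final half cycle
        models.append((start, sign, count))
    return models


# Two loud half cycles, then a positive half cycle that is mostly too quiet.
pcm = [200] * 6 + [-300] * 5 + [50] * 10 + [400] * 3
print(find_minimum_waveforms(pcm))  # [(0, 1, 6), (6, -1, 5)]
```

The last half cycle spans 13 samples but only three of them reach the threshold, so it is not reported as a minimum waveform model.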

FIG. 7B shows noise that has energy perceptible to the ear but is not spread along the time axis, occurring at a single point only. Noise of this kind is handled by checking for minimum waveform models not in one frame only but in several consecutive frames, and deciding on a start point only when minimum waveform models are detected in all of them in succession. For example, in an embodiment of the present invention, when minimum waveform models are present in three consecutive frames, the first of the three frames is determined to be the start frame.

FIG. 7C shows noise that has energy perceptible to the ear, extends over a long stretch of the time axis, and whose energy values are concentrated on one particular sign. In such a noise waveform, the sign of the samples stays constant without changing. Taking this into account when detecting minimum waveform models, the waveform can be regarded as noise if, for example, the number of model samples whose sign does not change is 1/3 of the frame length or more.
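One way to read the 1/3 criterion is as a test on the longest run of same-sign samples within a frame. The sketch below follows that reading, which is an interpretation and not something the text spells out. The names `longest_same_sign_run` and `looks_like_one_sided_noise` are hypothetical.

```python
def longest_same_sign_run(frame):
    """Length of the longest run of consecutive samples sharing one nonzero sign."""
    best = run = 0
    prev = 0
    for s in frame:
        cur = (s > 0) - (s < 0)
        if cur != 0 and cur == prev:
            run += 1
        else:
            run = 1 if cur != 0 else 0   # a new run starts, or a zero breaks it
        prev = cur
        best = max(best, run)
    return best


def looks_like_one_sided_noise(frame):
    """True when the longest same-sign run covers 1/3 of the frame or more."""
    return 3 * longest_same_sign_run(frame) >= len(frame)


offset_noise = [500] * 400 + [300, -300] * 240   # 880 samples, long positive run
music_like = ([300] * 10 + [-300] * 10) * 44     # 880 samples, sign flips often
print(looks_like_one_sided_noise(offset_noise))  # True
print(looks_like_one_sided_noise(music_like))    # False
```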

The noise of FIG. 7D has energy perceptible to the ear and is spread evenly along the time axis. Noise with these characteristics is difficult to handle because it resembles the waveform of actual sound. However, such noise is often inserted deliberately at the beginning of a piece of music, and in that case it may be regarded as sound, so no problem arises.

Hereinafter, an algorithm that detects the start point in a music file according to the detection principle described above will be described in detail. FIG. 8 is a flowchart of a method of detecting the start point according to an embodiment of the present invention, and Table 1 below illustrates a frame inspection table.

Referring to FIG. 8 and Table 1, the preprocessed PCM data is received (S40) and examined frame by frame, starting from the first frame (S41). If a minimum waveform model is present in the frame (S42), an entry is written to the frame inspection table (S43). The entry records the sample position and the number of minimum waveform models for each sign. As described above, only samples whose energy is at or above a certain value count toward a minimum waveform model. In this embodiment, samples with an energy of 128 or more are counted, and when five or more are found the half cycle is detected as a minimum waveform model and written to the frame inspection table (this handles the noise of FIG. 7A).

If no minimum waveform model is detected in the frame (S42), the frame inspection table is reset (S44) and the next frame is examined (S45, S41).

Frames continue to be examined (S45, S41) until the number of frames recorded in the frame inspection table as containing a minimum waveform model reaches 3 (S46), that is, until three consecutive frames contain a minimum waveform model (this handles the noise of FIG. 7B).

Once the count reaches 3, a ratio check is performed to determine whether the number of model samples whose sign does not change is less than 1/3 of the frame length (S47) (this handles the noise of FIG. 7C).

If the ratio check passes, the first frame position in the frame inspection table becomes the start point of the music (S48).

If the frame count has reached 3 but the ratio check finds that the number of model samples whose sign does not change is 1/3 of the frame length or more, the frame inspection table is reset (S44) and examination resumes from the next frame (S41).
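The flow of FIG. 8 (S40 to S48) can be sketched as below, using the values given for the embodiment: 880-sample frames, N = 5, an energy threshold of 128, and three consecutive frames. Several details are assumptions made for illustration: the frame inspection table is simplified to (frame index, longest same-sign run) pairs, the 1/3 ratio check is applied to each of the three frames, and a trailing partial frame is ignored. All names are hypothetical.

```python
FRAME_LEN = 880     # 20 ms frame, as stated for the embodiment
N_MIN = 5           # minimum counted samples per minimum waveform model
FLOOR = 128         # samples quieter than this are not counted
NEED_FRAMES = 3     # consecutive frames that must contain a model


def half_cycles(frame):
    """Yield (length, loud_count) for each run of same-sign samples."""
    sign = length = loud = 0
    for s in frame:
        cur = (s > 0) - (s < 0)
        if cur != sign:
            if sign != 0:
                yield length, loud
            sign, length, loud = cur, 0, 0
        if cur != 0:
            length += 1
            if abs(s) >= FLOOR:
                loud += 1
    if sign != 0:
        yield length, loud


def detect_start_frame(pcm):
    """Return the index of the start frame, or None if none is found."""
    table = []                                       # simplified frame inspection table
    for idx in range(len(pcm) // FRAME_LEN):         # S41, S45
        frame = pcm[idx * FRAME_LEN:(idx + 1) * FRAME_LEN]
        runs = list(half_cycles(frame))
        if not any(loud >= N_MIN for _, loud in runs):   # S42: no model in this frame
            table = []                               # S44: reset the table
            continue
        table.append((idx, max(length for length, _ in runs)))  # S43
        if len(table) == NEED_FRAMES:                # S46
            if all(3 * longest < FRAME_LEN for _, longest in table):  # S47
                return table[0][0]                   # S48: first frame in the table
            table = []                               # S44: ratio check failed
    return None


wave = [1000] * 10 + [-1000] * 10                    # 20-sample stand-in for music
print(detect_start_frame([0] * 1760 + wave * 220))   # 2
```

In the usage line, two silent frames precede the signal, so frames 2, 3, and 4 fill the table and frame 2 is reported as the start frame.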

Because noise is taken into account when detecting the start point of the sound, an embodiment of the present invention minimizes the possibility of error and achieves good performance.

To evaluate the performance of the start-point detection algorithm described above by music genre and by number of samples, an experiment was carried out. The results are shown in FIGS. 9 and 10. FIG. 9 is a graph of performance as a function of N, and FIG. 10 is a graph of performance by music genre in this experiment.

The digital sound sources used in the experiment were sampled at 44,100 Hz and 16 bits, and to determine whether genre and rhythm affect start-point detection, the music was classified into 15 groups as shown in Table 2 below. Each group contained 40 songs, for a total of 600 songs.

In this experiment, N, which determines the minimum waveform model of the music, was set to each value from 1 to 10, and the start position found for each value was detected in units of frames. A program was written that displays the music as a waveform plot and allows a frame to be selected, and the detected results were compared with frames selected by hand using that program. To verify detection performance, the tolerance was set to one frame on either side, and a result outside this tolerance was counted as a failure.
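The scoring rule used in the experiment (a detection counts as correct when it lies within one frame of the manually chosen frame) amounts to the small calculation sketched below. The function name `agreement_rate` and the sample numbers are hypothetical and are not data from the experiment.

```python
def agreement_rate(detected, reference, tolerance=1):
    """Fraction of songs whose detected start frame is within tolerance of the reference.

    A detection of None (no start frame found) counts as a failure.
    """
    hits = sum(
        1
        for d, r in zip(detected, reference)
        if d is not None and abs(d - r) <= tolerance
    )
    return hits / len(reference)


# Made-up frame indices for four songs: two fall within the tolerance.
print(agreement_rate([10, 12, 30, None], [10, 11, 20, 5]))  # 0.5
```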

In FIG. 9, the horizontal axis is N, the number of samples in a minimum waveform model, and the vertical axis is the rate of agreement with the manually selected frames. As FIG. 9 shows, a waveform with N equal to 1 cannot be regarded as a minimum waveform model, whereas a half-cycle waveform of at least two samples can. The minimum waveform model with five samples performed best in the experiment, at about 86%, and performance gradually declines from six onward. Based on these results, N is preferably chosen between 2 and 5.

FIG. 10 shows performance measured by genre using N = 5, the value that performed best in the measurements over N. Genres in which the sound begins softly, such as ballads and jazz, showed lower agreement rates of between 70% and 80%, while genres whose openings are strong or percussive showed high agreement rates. Overall, the performance can be considered relatively good.

A feature vector is extracted from the audio data from which the data before the start point has been removed by the process described above. The feature vector is extracted through the steps of applying a fast Fourier transform (FFT) to that data, applying a mel filter bank to the FFT output, taking the logarithm of the filter bank output, and applying a discrete cosine transform (DCT) to the log-transformed data. Each of these transforms is a known technique, and a detailed description of each is omitted.

Note, however, that the feature vector extraction according to the present invention can extract MFCC feature vectors without windowing or other additional steps, with no significant difference in performance. An embodiment of the present invention uses 13th-order feature vectors.
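The four-stage chain (Fourier transform, mel filter bank, logarithm, DCT) can be illustrated for a single frame with the standard-library sketch below. It is not the patented implementation: a naive DFT stands in for the FFT (it produces the same spectrum, only more slowly), and the mel formula, the 26 triangular filters, the bin mapping, and the small floor applied before the logarithm are common conventions assumed here. Only the 13 output coefficients and the absence of windowing come from the text. All names are hypothetical.

```python
import math


def power_spectrum(frame):
    """Naive DFT power spectrum for bins 0..n/2 (no window is applied)."""
    n = len(frame)
    spec = []
    for k in range(n // 2 + 1):
        re = sum(x * math.cos(2 * math.pi * k * t / n) for t, x in enumerate(frame))
        im = -sum(x * math.sin(2 * math.pi * k * t / n) for t, x in enumerate(frame))
        spec.append(re * re + im * im)
    return spec


def hz_to_mel(f):
    return 2595.0 * math.log10(1.0 + f / 700.0)


def mel_to_hz(m):
    return 700.0 * (10 ** (m / 2595.0) - 1.0)


def mel_filterbank(n_filters, n_fft, sample_rate):
    """Triangular filters whose centers are equally spaced on the mel scale."""
    top = hz_to_mel(sample_rate / 2)
    points = [mel_to_hz(top * i / (n_filters + 1)) for i in range(n_filters + 2)]
    bins = [int(math.floor((n_fft + 1) * p / sample_rate)) for p in points]
    bank = []
    for j in range(1, n_filters + 1):
        lo, mid, hi = bins[j - 1], bins[j], bins[j + 1]
        row = [0.0] * (n_fft // 2 + 1)
        for k in range(lo, mid):          # rising edge of the triangle
            row[k] = (k - lo) / (mid - lo)
        for k in range(mid, hi):          # falling edge of the triangle
            row[k] = (hi - k) / (hi - mid)
        bank.append(row)
    return bank


def mfcc(frame, sample_rate=44100, n_filters=26, n_coeffs=13):
    """DFT -> mel filter bank -> log -> DCT-II, returning n_coeffs values."""
    spec = power_spectrum(frame)
    bank = mel_filterbank(n_filters, len(frame), sample_rate)
    log_e = [math.log(max(sum(w * p for w, p in zip(row, spec)), 1e-10)) for row in bank]
    return [
        sum(e * math.cos(math.pi * c * (m + 0.5) / n_filters) for m, e in enumerate(log_e))
        for c in range(n_coeffs)
    ]


frame = [((t * 37) % 200) - 100 for t in range(256)]   # arbitrary test signal
coeffs = mfcc(frame)                                    # list of 13 floats
```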

Hereinafter, post-processing of the extracted feature vectors will be described with reference to FIG. 11.

FIG. 11 is a flowchart of a process of post-processing feature vectors according to an embodiment of the present invention. Referring to FIG. 11, the extracted 13th-order MFCC feature vectors are summed in groups of 50 frames (S50). Since each frame has a 13th-order feature vector, 50 frames contain 13 × 50 = 650 feature values in total, and these are all added together, group by group, to produce 81 summed values in all. For example, when the data before the start point is removed from about 20 seconds of starting data, roughly 50 × 81 frames generally remain. The feature vectors are grouped and summed 50 frames at a time, yielding the 81 summed values that are used.

The differences between the 81 summed values are then computed (S51) and taken as the feature values of the sound source. For example, if the 81 summed values are denoted Sum₁, Sum₂, Sum₃, ..., Sum₈₁, the difference values are defined as Sumᵢ₊₁ - Sumᵢ. Since 81 summed values are used, 80 difference values (feature values, or feature data) are produced in total. If a large amount of data before the start point was cut off, fewer than 81 summed values may exist, and in that case the corresponding difference values are set to 0.

Difference values are used as the feature values because, if the 81 summed values were used as they are, differences in volume could remain even after normalization. The slope data given by the differences is therefore used as the final feature data.
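The post-processing (S50, S51) can be sketched as follows: each group of 50 frames is collapsed into one summed value, and the 80 consecutive differences become the feature data, padded with zeros when fewer than 81 sums are available. Dropping an incomplete trailing group is an assumption, and the names `block_sums` and `slope_features` are hypothetical.

```python
def block_sums(mfcc_frames, block=50, n_blocks=81):
    """Add up every coefficient in each full group of `block` frames."""
    sums = []
    for b in range(n_blocks):
        chunk = mfcc_frames[b * block:(b + 1) * block]
        if len(chunk) < block:
            break                      # incomplete group: stop (assumed behavior)
        sums.append(sum(sum(vec) for vec in chunk))
    return sums


def slope_features(sums, n_blocks=81):
    """Differences Sum[i+1] - Sum[i], zero-padded to n_blocks - 1 values."""
    diffs = [sums[i + 1] - sums[i] for i in range(len(sums) - 1)]
    return diffs + [0.0] * ((n_blocks - 1) - len(diffs))


# 150 made-up frames of 13 coefficients each: only three full groups exist.
frames = [[float(i)] * 13 for i in range(150)]
features = slope_features(block_sums(frames))
print(len(features), features[:3])     # 80 [32500.0, 32500.0, 0.0]
```

In this made-up input only two real differences exist, so the remaining 78 feature values are zero, which mirrors the zero-fill rule for heavily trimmed audio.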

The feature extraction unit installed in the user terminal (10) extracts feature values, by the process described above, from part of the data of the music file that the user has selected for upload, and transmits them over the network (30) to the content management unit (29). The feature values stored in the feature database (23) were extracted by the same process.

The feature matching unit (27) attempts to match the audio feature values of the music file, received over the network (30), against the many sets of feature data stored in the feature database (23).

FIG. 12 is a flowchart of a process in which the feature matching unit (27) performs sound source matching according to an embodiment of the present invention. First, when the music file to be matched contains metadata, the feature extraction unit transmits that metadata to the digital content management server (20), and the feature matching unit (27) attempts matching using the metadata. When the music file contains no metadata, the feature values of the sound source are extracted by the process described above and transmitted over the network (30) to the digital content management server (20).

Referring to FIG. 12, the feature matching unit (27) compares the extracted feature data with the feature data stored in the database (S60) and sums the differences between them (S61). For example, the 80 feature values of the sound source just extracted are compared with the 80 feature values of each of the sound sources stored in the database, and the differences are summed.
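Steps S60 and S61 can be sketched as a distance computation followed by a ranking. The text says only that the differences are summed; taking the absolute value of each difference is an assumption made so that positive and negative differences do not cancel. The names `feature_distance` and `rank_candidates`, and the toy database, are hypothetical.

```python
def feature_distance(a, b):
    """Sum of absolute element-wise differences between two feature lists."""
    return sum(abs(x - y) for x, y in zip(a, b))


def rank_candidates(query, database, top=3):
    """Return the `top` (distance, title) pairs with the smallest distance."""
    scored = sorted((feature_distance(query, feats), title)
                    for title, feats in database.items())
    return scored[:top]


# Toy database with 2 feature values per song instead of 80.
db = {"A": [1.0, 2.0], "B": [1.0, 5.0], "C": [10.0, 10.0]}
print(rank_candidates([1.0, 2.5], db))
# [(0.5, 'A'), (2.5, 'B'), (16.5, 'C')]
```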

Table 3 below is an example table illustrating the result of a sound source matching decision.

In this example, the user selected "가시" by 버즈 (Buzz) on his or her terminal for upload. The feature extraction unit extracted feature values from part of the data at the beginning of the selected song and transmitted them to the digital content management server (20). The table shows the result of the feature matching unit (27) attempting to match these against the feature vectors of the many sound sources stored in the feature database (23). The table lists the three songs with the smallest differences. The figures 457.22, 2029.27, and 2162.09 beside the song titles are the summed feature-data differences between the song being matched and each listed song. For matching, these values are rounded to the nearest integer.

Let the summed difference between the feature values of the song being matched and those of "가시" stored in the feature database (23) be Value1 = 457, that for "거짓말" be Value2 = 2029, and that for "체념" be Value3 = 2162, and let DIS1 = (Value2 - Value1), DIS2 = (Value3 - Value2), and DIS3 = abs(DIS2 - DIS1).

Referring to FIG. 12, if DIS1 is greater than 5000 (S62), matching is deemed to have failed (S63). In this case the difference is too large, so it is judged that there is no matching music.

If DIS1 is 100 or less (S64), matching is deemed successful (S65). A difference of 100 or less can be regarded as a near match, so the song is judged to be the matching song.

If DIS1 exceeds 100 but DIS3 is greater than 200 (S66), matching is also regarded as successful (S65).

Matching is likewise regarded as successful when DIS1 is 0 (S67, S65). This typically occurs when a song has been remade, so that the same song appears several times in the song database.

Applying these criteria to Table 3, DIS3 is greater than 200, so the matching succeeds. That is, the song being matched is determined to be Buzz's "가시".
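The decision steps of FIG. 12 can be sketched as a small function. This is an illustrative reading of the criteria, not a definitive implementation: the function name `is_match` is hypothetical, the inputs are assumed to be the three smallest rounded difference sums in ascending order, and the final fallback (no criterion satisfied) is assumed to mean failure because the text does not state that case.

```python
def is_match(value1: int, value2: int, value3: int) -> bool:
    """Apply the thresholds described for steps S62 to S67 (illustrative sketch)."""
    dis1 = value2 - value1
    dis2 = value3 - value2
    dis3 = abs(dis2 - dis1)

    if dis1 > 5000:   # S62 -> S63: difference too large, no matching music
        return False
    if dis1 <= 100:   # S64 -> S65: near match (also covers dis1 == 0, S67)
        return True
    if dis3 > 200:    # S66 -> S65: dis1 exceeds 100 but dis3 exceeds 200
        return True
    return False      # assumption: the text does not specify this case


# Values from the worked example: DIS1 = 1572, DIS2 = 133, DIS3 = 1439.
print(is_match(457, 2029, 2162))  # True
```

With the example values, the S66 branch applies, which agrees with the conclusion that the matched song is Buzz's "가시".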

Accordingly, the content management unit 29 provides detailed information on Buzz's "가시" to the user terminal 10 and adds Buzz's "가시" to the user's list of held content in the personal database 25. Later, when the user selects Buzz's "가시" from the list of held content and chooses playback or download, the content management unit 29 streams or downloads the copy of Buzz's "가시" stored in the media storage unit (23).

Hereinafter, a conventional example in which sound source matching is attempted without detecting the starting point is compared with an example in which sound source matching is attempted after detecting the starting point according to the present invention.

FIGS. 13A to 13C show an example in which sound source matching is attempted without detecting the starting point. FIG. 13A shows the waveform of the reference file (title: "행복을 주는 사람", artist: 해바라기), FIG. 13B shows the waveform of the file being compared (title: "행복을 주는 사람", artist: 해바라기), and FIG. 13C shows the matching graph.

As FIGS. 13A and 13B show at a glance, the two files have different starting points. In the matching result of FIG. 13C as well, the graphs are offset from each other because the starting points differ, and the sum of the differences between the reference feature values and the compared feature values reaches as much as 2576.56. This is a clear example showing that, when matching is attempted without detecting the starting point, matching can often fail even for the same sound source.

FIGS. 14A to 14C show an example in which sound source matching is attempted after detecting the starting point according to the present invention. The song is the same as in FIGS. 13A to 13C.

As shown in FIGS. 14A and 14B, a starting point has been detected in each of the two files to be matched. FIG. 14C shows the result of attempting matching after detecting the starting point and removing the data that precedes it.

As FIG. 14C shows, detecting the starting point removes the offset along the time axis, so the sum of the feature value differences drops to 488.58.
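The effect of a time-axis offset on a sum of differences can be illustrated with toy data. The sketch below is an assumption-laden simplification: the sequences are invented numbers rather than feature values from the figures, and `trim_leading_silence` is a hypothetical stand-in that simply drops leading zeros instead of performing the starting point detection described in this document.

```python
def sum_abs_diff(a, b):
    """Sum of absolute element-wise differences over the common length."""
    n = min(len(a), len(b))
    return sum(abs(x - y) for x, y in zip(a[:n], b[:n]))

def trim_leading_silence(values, threshold=0):
    """Drop leading values whose magnitude does not exceed the threshold (toy stand-in)."""
    for i, v in enumerate(values):
        if abs(v) > threshold:
            return values[i:]
    return []

# The same toy "song" with different amounts of leading silence.
reference = [0, 0, 5, 9, 4, 7, 2, 8, 3, 6]
compared = [0, 0, 0, 0, 0, 5, 9, 4, 7, 2, 8, 3, 6]

misaligned = sum_abs_diff(reference, compared)
aligned = sum_abs_diff(trim_leading_silence(reference),
                       trim_leading_silence(compared))

print(misaligned)  # 39
print(aligned)     # 0
```

In this toy case the identical content yields a nonzero difference only because of the offset; once both sequences start at their first non-silent value, the difference vanishes. Real recordings would not align perfectly, which is consistent with the nonzero value reported for FIG. 14C.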

As described above, the present invention can improve matching performance by converting the bit rate, format, and other properties of the music into a fixed form, detecting the starting point, and then extracting the feature vector.

Table 4 shows experimental data on the time required for the feature vector extraction and matching process according to an embodiment of the present invention.

In Table 4, times are given to two decimal places. As the table shows, only bit conversion, decoding, and feature vector extraction take a noticeable amount of time, while the other steps take almost none, and extracting the feature vector takes only 1.21 seconds in total. The experimental results also confirmed that feature vector extraction and matching complete within 2 to 3 seconds on average. This processing time is roughly 1/100 of that required by the conventional DTW method. The experiments further confirmed a matching success rate of 97%.

Because the feature extraction and matching technique described above greatly shortens processing time compared with the prior art and achieves a high matching success rate, it is very useful for the digital content management system of the present invention, which requires real-time sound source matching.

Hereinafter, an online digital content management system according to a second embodiment of the present invention will be described.

In the embodiment described above, the feature extraction unit is installed in the user terminal 10. In the second embodiment of the present invention, it is installed in the digital content management server 20. That is, only a module for uploading data is installed in the user terminal 10, and when the user uses that module to select content stored on the terminal for upload, the content is transmitted to the digital content management server 20.

The feature extraction unit installed in the digital content management server 20 then extracts the feature values, and the feature matching unit performs feature value matching. The remaining parts are similar to the embodiment described above, so a detailed description of them is omitted.

Hereinafter, an online digital content management system according to a third embodiment of the present invention will be described. In the third embodiment, both the feature extraction unit and the feature matching unit are installed in the user terminal 10. That is, when the user selects content for upload, the feature extraction unit installed in the user terminal 10 extracts the feature values of that content, and the feature matching unit performs feature value matching in cooperation with the digital content management server 20. The matching result is transmitted to the digital content management server 20 through the network 30. The remaining parts are similar to the embodiment described above, so a detailed description of them is omitted.

The preprocessing, feature value extraction, and sound source matching described above can be implemented by software algorithms that perform the described operations and/or processing. Although several processing steps were mentioned as part of the preprocessing, some of these steps may be omitted. In addition, although the embodiments above present specific criteria for sound source matching, the criteria applied may of course differ.

Furthermore, although the embodiments above were described using music files as an example, the present invention can of course also be applied to video content that contains audio data. It is apparent to those skilled in the art that, when the present invention is applied to video content, criteria different from those of the embodiments above may be applied.

Although several embodiments of the present invention have been shown and described, those of ordinary skill in the art to which the present invention pertains will appreciate that these embodiments may be modified without departing from the principles or spirit of the invention. The scope of the invention is defined by the appended claims and their equivalents.

FIG. 1 is a schematic diagram of an online digital content management system according to a first embodiment of the present invention;

FIG. 2 is a control flow diagram of the system of FIG. 1;

FIG. 3 is a schematic diagram of a method for providing a streaming service of digital content in the system of FIG. 1;

FIG. 4 is a flowchart of a method for extracting a feature vector of audio data from digital content according to an embodiment of the present invention;

FIG. 5 shows examples of waveforms produced by the process of FIG. 4;

FIG. 6 is a diagram explaining the minimum waveform model used for starting point detection according to an embodiment of the present invention;

FIGS. 7A to 7D show characteristics of noise waveforms contained in the audio data of digital content;

FIG. 8 is a flowchart of a method for detecting a starting point according to an embodiment of the present invention;

FIG. 9 is an example of a performance measurement graph as a function of the number of samples forming the minimum waveform model;

FIG. 10 is an example of a performance measurement graph by music genre;

FIG. 11 is a flowchart of a method for calculating feature data according to an embodiment of the present invention;

FIG. 12 is a flowchart of a method for matching a sound source according to an embodiment of the present invention;

FIGS. 13A to 13C show an example in which sound source matching is attempted without detecting the starting point; and

FIGS. 14A to 14C show an example in which sound source matching is attempted after detecting the starting point according to the present invention.

Claims

An online digital content management system, comprising:

a media storage unit storing digital content;

a feature database storing audio feature values for the digital content;

a personal database storing a personal possession list of digital content held by each user;

a feature extraction unit extracting feature values from audio data of digital content selected for upload by a user;

a feature matching unit searching for matching digital content by comparing the extracted feature values with the audio feature values stored in the feature database; and

a content management unit providing detailed information on the matched digital content to a user terminal and including it in the personal possession list of the user.

The system of claim 1,

further comprising a digital content management server interworking with the user terminal through a network,

wherein the media storage unit, the feature database, the personal database, and the content management unit are provided in the digital content management server, and

the feature extraction unit and the feature matching unit are provided in either the user terminal or the digital content management server.

The system of claim 1 or 2,

wherein the detailed information of the digital content provided to the user terminal includes at least two of a title, an artist name, a producer, an album name, a play time, a copyright holder, an album image, and lyrics.

The system of claim 1 or 2,

wherein, in response to a user's request to play or download digital content included in the personal possession list, the content management unit extracts the digital content corresponding to the request from the feature database and streams or transmits it, and

wherein the digital content transmitted in response to the download request normally includes a file name and attribute information.

The system of claim 4,

wherein, when the matching fails, the content management unit receives the digital content selected for upload by the user and adds it to the database.

The system of claim 1,

wherein the feature extraction unit converts and decodes a portion of the audio data corresponding to a predetermined playback time from the start of playback, converts the decoded data into a mono format, normalizes the data converted into the mono format, searches for a starting point at which sound begins, and extracts a feature vector from the data after the found starting point.

The system of claim 6,

wherein the feature extraction unit checks, frame by frame, whether a minimum waveform model constituting sound exists in the normalized data, detects at least a predetermined number of consecutive frames in which the minimum waveform model exists, and determines as the starting point the position at which the first minimum waveform model in the detected frames is found, and

wherein the minimum waveform model is composed of a number of samples each having energy at or above a predetermined value.

The system of claim 7,

wherein the feature extraction unit sums the extracted feature vectors over a predetermined number of frames to calculate sum data, calculates difference values between the sum data, and outputs the resulting values as the feature values.