KR20190104671A

KR20190104671A - The content based clean cloud systems and method

Info

Publication number: KR20190104671A
Application number: KR1020180025035A
Authority: KR
Inventors: 이성섭
Original assignee: (주)미래기술
Priority date: 2018-03-02
Filing date: 2018-03-02
Publication date: 2019-09-11
Also published as: KR102141411B1

Abstract

The present invention relates to a content-based clean cloud management system and a method thereof. In a cloud-storage-based service system, the system checks whether content stored in mass storage is duplicated and removes the duplicate files. This improves the efficiency of the storage space, supports a high-quality video service, and provides a service suited to determining whether content is copyrighted according to distributed search rules.

Description

The content based clean cloud systems and method

The present invention relates to a content-based clean cloud management system and a method thereof. By checking whether content stored in mass storage is duplicated and removing the duplicate files, the system improves storage efficiency in a cloud-storage-based service system, supports a high-quality video service, and provides a service suited to determining whether content is copyrighted according to distributed search rules.

Services that transmit, store, and search relatively large video content, in addition to images, over networks are increasing. As a result, providers of video-related services must store, maintain, and manage vast amounts of content.

As stored content grows exponentially, service providers need a way to store and retrieve large volumes of content efficiently. For efficient content management, duplicate files must be detected and, among files with identical content, the low-quality copies removed so that users are offered only high-quality content.

Various methods of removing duplicate content have been attempted, but they are inefficient in terms of accuracy and processing time. In particular, a technique for managing duplicates of the same content according to their quality is urgently needed.

Advances in communication environments and digital devices have created explosive demand for digital content, yet most movies, music, and similar content, typified by formats such as MP3 and DivX, is shared illegally through online service providers (OSPs), and cases of copyright infringement have increased sharply as a result. With the arrival of the Web 2.0 era, users who once only consumed digital content are becoming active producers through user-created content (hereinafter "UCC"). In the process, copyright infringement is accelerating as existing works such as movies, dramas, and music are edited into UCC.

Various techniques have been developed to overcome these problems. One such conventional technique is Korean Patent Publication No. 10-2016-0113879 (Image Quality Based Duplicate File Management System and Method; hereinafter the "prior art").

The prior art provides an image-quality-based duplicate file management system comprising: a duplicate file detection unit that uses fingerprinting to improve the accuracy of duplicate detection and uses the fingerprints of video files to detect duplicates among the video files stored in file storage; an image quality measurement unit that measures the image quality of the detected video files in line with image quality criteria as perceived by humans; and a duplicate file management unit that moves some of the detected video files to other storage, or deletes them, based on the measured image quality. By comparing image quality across duplicate files and deleting the low-quality duplicates according to the result, it uses storage space efficiently and offers users only high-quality content.

However, the prior art cannot prevent the accelerating infringement of copyright, and technical measures for copyright protection are needed. It is also difficult to determine whether a given piece of content is copyrighted.

Korean Patent Publication No. 10-2016-0113879 (Image Quality Based Duplicate File Management System and Method)
Korean Patent Publication No. 10-2017-0063077 (Media Content Identification Method)

To solve the problems described above, an object of the present invention is to provide a content-based clean cloud management system and method, based on image quality, that improve the accuracy of duplicate detection through fingerprinting, compare image quality across duplicate files, and delete the low-quality duplicates according to the result, so that storage space is used efficiently and users are offered only high-quality content.

A further object is to provide a content-based clean cloud management system and method in which the filtering database is distributed across priority work nodes and general work nodes, and copyright searches on digital content are filtered in priority order, improving filtering efficiency as the volume of content to be filtered grows.

To achieve these objects, the present invention comprises: a duplicate file detection unit that detects duplicate video files among the video files stored in file storage using the fingerprints of the video files; an image quality measurement unit that generates an image quality measurement criterion and measures the image quality of the detected video files according to the generated criterion; a duplicate file management unit that moves some of the detected video files to other storage, or deletes them, based on the measured image quality; and a distributed filtering unit that selectively performs filtering according to whether digital content is copyrighted.

The distributed filtering unit includes a search server, a priority work node module, a general work node module, and a filtering information database. The search server controls filtering so that, among the plurality of priority work node databases, those with relatively high priority are used first and those with relatively low priority are used afterwards, and the filtered results are delivered to the search server.

The duplicate file management unit stores, in the fingerprint database, the fingerprint of the video file that has the highest measured image quality and does not infringe copyright. Among the detected video files, and apart from the file with the highest measured image quality, it deletes the video files found to infringe copyright.

The method comprises:
acquiring ultra-high-definition video or images from the video files stored in file storage (S400);
generating, from the acquired video or images, an image converted to brightness component values at a fixed size (S410);
dividing the converted image into N blocks and selecting, from the divided blocks, those that contain sharp edges (S420);
extracting, for each selected block, a feature vector for image quality measurement, and calculating the mean and variance of the extracted feature vectors (S430);
generating the calculated mean and variance of the feature vectors as the criterion for image quality measurement (S440); and
storing the generated criterion and using it to measure the image quality of duplicate video files (S450).
After the image quality is measured, the method further comprises:
checking whether filtering by copyright status has been requested through the search server (S460);
performing a sequential search of the priority work nodes when the request is confirmed (S470);
determining in sequence, during the sequential search, whether filtering applies (S480);
determining whether the content is copyrighted (S490); and
if the content is copyrighted, performing filtering at the priority work node and delivering the filtering result information to the search server (S500).

If the content is not found to be copyrighted, the method may further comprise: searching the general work nodes (S510); analyzing the search results at the general work node to determine whether the content is copyrighted (S520); and, if the content is found to be copyrighted, performing filtering at the work node and delivering the filtering result information to the search server (S500).

According to the present invention, duplicate video files in mass storage are identified and managed using fingerprinting, which improves the efficiency of storage operation. Building the database from only the high-quality copies, as determined by measuring the quality of the duplicates, improves service quality, and the copyright protection technique reduces the storage cost of a content-based, large-scale video cloud service.

In addition, the plurality of priority work node databases, generated through distribution rules according to the filtering target information, are used first to determine whether selective filtering by copyright status applies and to perform that filtering. When the volume of content to be filtered grows, this blocks much of the search load before it reaches the general work nodes, which improves filtering response time.

Finally, because the invention is provided as a cloud-based solution, it can be applied to overseas video content distribution services.

FIG. 1 is a block diagram showing the structure of a content-based clean cloud management system according to an embodiment of the present invention.
FIGS. 2 and 3 are diagrams illustrating the management method of the content-based clean cloud management system according to an embodiment of the present invention.

Embodiments of the present invention are described in detail below with reference to the accompanying drawings so that those of ordinary skill in the art can readily practice the invention. The invention may, however, be embodied in many different forms and is not limited to the embodiments described here.

A content-based clean cloud management system and method according to an embodiment of the present invention are described below.

FIG. 1 shows the structure of the content-based clean cloud management system and method according to an embodiment of the present invention.

FIGS. 2 and 3 illustrate the management method of the content-based clean cloud management system according to an embodiment of the present invention.

Referring to FIG. 1, the duplicate file management system (100) includes a duplicate file detection unit (110), an image quality measurement unit (120), a duplicate file management unit (130), and a distributed filtering unit (140). The file storage (200) is the storage in which video files are kept, and the fingerprint DB (300) is the database holding the fingerprints of the video files stored in the file storage (200).

The duplicate file detection unit (110) detects, for a video file stored in the file storage (200), whether an identical video file exists.

Here, a duplicate file means not only a file whose meta information, such as file size and name, is identical, but also a video file whose content is identical even though its meta information differs.

The duplicate file detection unit (110) includes a fingerprint extractor (111) and a duplicate file finder (112). It may detect duplicate files upon being connected to the file storage (200), or it may run detection when it confirms that a new file has been added to the file storage (200).

The fingerprint extractor (111) extracts fingerprints from multiple segments of each video file in the file storage (200) for which duplicate management is required, and passes them to the duplicate file finder (112).

The duplicate file finder (112) compares the fingerprints received from the fingerprint extractor (111) with the fingerprints stored in the fingerprint DB (300) to determine whether a duplicate video file exists in the file storage (200).
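The comparison performed by the duplicate file finder (112) can be sketched as follows. The description does not specify the fingerprint representation or the matching rule, so this is only an illustration under stated assumptions: each fingerprint is taken to be a list of 32-bit integers, and two files are treated as duplicates when their bit error rate is at or below a threshold. The names `bit_error_rate` and `find_duplicate` and the threshold value are hypothetical.

```python
from typing import Dict, List, Optional


def bit_error_rate(a: List[int], b: List[int], bits: int = 32) -> float:
    """Fraction of differing bits between two equal-length fingerprints."""
    if len(a) != len(b) or not a:
        raise ValueError("fingerprints must be non-empty and of equal length")
    differing = sum(bin(x ^ y).count("1") for x, y in zip(a, b))
    return differing / (len(a) * bits)


def find_duplicate(
    query: List[int],
    fingerprint_db: Dict[str, List[int]],
    threshold: float = 0.1,  # assumed value, not taken from the description
) -> Optional[str]:
    """Return the ID of the closest stored fingerprint within the threshold.

    Returns None when no stored fingerprint is close enough, which corresponds
    to the "no duplicate found" case.
    """
    best_id, best_ber = None, threshold
    for file_id, stored in fingerprint_db.items():
        if len(stored) != len(query):
            continue  # this sketch only compares fingerprints of equal length
        ber = bit_error_rate(query, stored)
        if ber <= best_ber:
            best_id, best_ber = file_id, ber
    return best_id
```

A bit-error-rate comparison tolerates small differences caused by re-encoding, which an exact hash comparison would not.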

The fingerprinting algorithm used for duplicate detection is not limited. In one embodiment of the present invention, the audio signal contained in the video file is used as the fingerprint, which offers better processing speed and accuracy than using the video signal.

When a duplicate of a video file is found, the duplicate file finder (112) passes information about the duplicate video files to the image quality measurement unit (120).

If no duplicate is found, information about the video file may be sent to the duplicate file management unit (130).

On receiving information about duplicate video files from the duplicate file detection unit (110), the image quality measurement unit (120) measures the image quality of the duplicates and outputs the result.

Specifically, the image quality measurement unit (120) generates an image quality measurement criterion from the video files stored in the file storage (200) and measures the quality of video files according to that criterion.

A video file sent to the duplicate file management unit (130) is then passed to the distributed filtering unit (140) to determine whether it infringes copyright.

The distributed filtering unit (140) may include a search server (142), a priority work node module (144), a general work node module (146), a filtering information database (148), and the like.

The search server (142) controls filtering so that, among the plurality of priority work node databases, those with relatively high priority are used first and those with relatively low priority are used afterwards. The filtered results are delivered to the search server.

The filtering information database (148) contains file information, hash information, meta information, and feature information.

The distributed filtering unit (140) is a device that selectively performs filtering according to whether digital content is copyrighted. Based on the filtering information database (148), the search server (142) builds a plurality of priority work node databases (DB) and a plurality of general work node databases (DB'), distributed according to distribution rules. When filtering by copyright status is requested from the search server (142), it directs the priority work node module (144) and the general work node module (146) to carry out, according to the search rules, the search, the determination of whether filtering applies, the copyright check, and the filtering itself. Accordingly, the priority work node module (144) performs a sequential search of the priority work nodes using the priority work node databases (DB), determines in sequence during that search whether filtering applies, and, if the content is copyrighted, filters it and delivers the result information to the search server (142).

If copyright status cannot be determined by the sequential search of the priority work nodes, the general work node module (146) of the distributed filtering unit (140) searches the general work nodes using the plurality of general work node databases (DB') and analyzes the search results. If the content is copyrighted, it filters the content and delivers the result information to the search server (142); if not, it delivers the corresponding result information to the search server (142).

The copyright status delivered to the search server (142) is passed on to the duplicate file management unit (130). A file that is the highest-quality copy among the files of identical content stored in the file storage (200), and that does not infringe copyright, is a clean file, so its fingerprint is stored and managed in the fingerprint DB (300).

The duplicate file management unit (130) stores, in the fingerprint DB (300), the fingerprint of the video file that has the highest measured image quality and does not infringe copyright. Among the detected video files, and apart from the file with the highest measured image quality, it deletes the video files found to infringe copyright.

Referring to FIGS. 2 and 3, the image quality measurement unit (120) acquires ultra-high-definition video or images (for example, 1080p, or 2K resolution and above) from the video files stored in the file storage (200) (S400), and generates from the acquired images an image converted to brightness component values at a fixed size (S410).

The embodiment used 200 high-quality images of a fixed size (1920 × 1080), but both the size and the number of images may be changed.
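Step S410 can be illustrated with a short sketch. The description says only that the image is converted to brightness component values at a fixed size. The ITU-R BT.601 luma weights and the nearest-neighbour resize used here are assumptions chosen for illustration, and the function names are hypothetical.

```python
from typing import List, Tuple

Pixel = Tuple[int, int, int]  # (R, G, B), each in 0..255


def to_luma(image: List[List[Pixel]]) -> List[List[float]]:
    """Convert an RGB image to brightness values using BT.601 weights (assumed)."""
    return [[0.299 * r + 0.587 * g + 0.114 * b for (r, g, b) in row] for row in image]


def resize_nearest(image: List[list], width: int, height: int) -> List[list]:
    """Resize a 2-D grid to width x height by nearest-neighbour sampling."""
    src_h, src_w = len(image), len(image[0])
    return [
        [image[y * src_h // height][x * src_w // width] for x in range(width)]
        for y in range(height)
    ]
```

In the embodiment the target size would be 1920 × 1080; a production system would more likely use an image library with proper interpolation than this pure-Python loop.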

The image quality measurement unit (120) divides the converted image into N blocks and selects only the blocks that contain sharp edges (S420).

To select the blocks that contain sharp edges, the block with the highest variance of pixel values within the converted image is found first. Blocks whose variance is at or above a set fraction of that maximum (for example, at least 75% of the highest variance) are then selected as blocks containing sharp edges.
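The block selection rule above can be sketched as follows, under a few assumptions the description leaves open: blocks are taken to be square, non-overlapping tiles, partial tiles at the image edge are dropped, and the population variance is used. The function names are hypothetical.

```python
from statistics import pvariance
from typing import List

Block = List[List[float]]


def split_into_blocks(image: List[List[float]], block_size: int) -> List[Block]:
    """Split a brightness image into non-overlapping square tiles.

    Partial tiles at the right and bottom edges are dropped (an assumption).
    """
    height, width = len(image), len(image[0])
    blocks = []
    for top in range(0, height - block_size + 1, block_size):
        for left in range(0, width - block_size + 1, block_size):
            blocks.append([row[left:left + block_size] for row in image[top:top + block_size]])
    return blocks


def block_variance(block: Block) -> float:
    """Population variance of all pixel values in one block."""
    return pvariance([value for row in block for value in row])


def select_sharp_blocks(blocks: List[Block], ratio: float = 0.75) -> List[Block]:
    """Keep blocks whose variance is at least `ratio` times the maximum variance."""
    if not blocks:
        return []
    variances = [block_variance(b) for b in blocks]
    cutoff = ratio * max(variances)
    return [b for b, v in zip(blocks, variances) if v >= cutoff]
```

Note that if every block is perfectly flat, the maximum variance is zero and all blocks pass the cutoff; a real implementation might add a minimum-variance floor.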

A feature vector for image quality measurement is then extracted from each selected block, and the mean and variance of the extracted feature vectors are calculated (S430). The feature values use normalized edge information at the image-pixel level.

The calculated mean and variance are generated as the criterion for image quality measurement (S440).
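Steps S430 and S440 amount to summarizing the per-block feature vectors by their mean and variance. A minimal sketch follows, assuming the statistics are taken per dimension and using the population variance; the description does not define the feature extraction itself, so the vectors are treated as given. `QualityReference` and `build_reference` are hypothetical names.

```python
from typing import List, NamedTuple


class QualityReference(NamedTuple):
    """Per-dimension statistics used as the image quality criterion."""
    mean: List[float]
    variance: List[float]


def build_reference(feature_vectors: List[List[float]]) -> QualityReference:
    """Compute the per-dimension mean and population variance of the vectors."""
    if not feature_vectors:
        raise ValueError("at least one feature vector is required")
    n = len(feature_vectors)
    dims = len(feature_vectors[0])
    mean = [sum(v[d] for v in feature_vectors) / n for d in range(dims)]
    variance = [
        sum((v[d] - mean[d]) ** 2 for v in feature_vectors) / n for d in range(dims)
    ]
    return QualityReference(mean, variance)
```

How a candidate file is scored against this reference is not stated in the description, so the sketch stops at building the reference.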

The image quality measurement unit (120) stores the generated criterion and uses it to measure the image quality of duplicate video files (S450). The criterion may also be refreshed at a preset time interval, or when the number of new files uploaded to the file storage (200) reaches a preset count.
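The two refresh triggers can be expressed as a small policy object. This is a sketch only: the class name, field names, and the choice to combine the two triggers with a logical OR are assumptions, since the description presents them as alternatives without giving concrete values.

```python
from dataclasses import dataclass


@dataclass
class RefreshPolicy:
    """Decides when the image quality criterion should be regenerated."""
    max_age_seconds: float  # preset time interval (value is deployment-specific)
    max_uploads: int        # preset number of new uploads

    def should_refresh(self, seconds_since_update: float, uploads_since_update: int) -> bool:
        # Refresh when either trigger fires (combining with OR is an assumption).
        return (
            seconds_since_update >= self.max_age_seconds
            or uploads_since_update >= self.max_uploads
        )
```

For example, `RefreshPolicy(max_age_seconds=3600, max_uploads=100)` would refresh hourly or after every 100 uploads, whichever comes first.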

After the image quality is measured, the search server (142) of the distributed filtering unit (140), in the standby mode of the filtering service system, checks whether filtering by copyright status has been requested (S460).

If the check (S460) shows that filtering by copyright status has been requested, the search server (142) passes the received filtering target information (for example, file information, hash information, meta information, and feature information) to the priority work node module (144). The priority work node module (144) uses the plurality of priority work node databases (DB) to perform a sequential search of the priority work nodes according to the search rules (S470), determines in sequence during the search whether filtering applies (S480), and then checks whether the content is copyrighted (S490).

The search rules may be applied as follows. Suppose priority among the priority work node databases (DB) is set by weights (for example, real values with 0 < weight < 1), and DB 1, DB 2, and DBL carry weights that give them high priority. The search, the determination of whether filtering applies, and the copyright check are then performed with those databases first, and afterwards, in turn, with the remaining priority work node databases.
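The weight-based ordering can be sketched as a simple sort. The description states only that weights are real values strictly between 0 and 1 and that higher-priority databases are searched first; treating a larger weight as higher priority is an assumption, and `search_order` is a hypothetical name.

```python
from typing import List, Tuple


def search_order(weighted_dbs: List[Tuple[str, float]]) -> List[str]:
    """Return database names ordered from highest to lowest weight.

    Assumes a larger weight means higher priority. Ties keep their input
    order because Python's sort is stable.
    """
    for name, weight in weighted_dbs:
        if not 0.0 < weight < 1.0:
            raise ValueError(f"weight for {name!r} must satisfy 0 < weight < 1")
    ranked = sorted(weighted_dbs, key=lambda item: item[1], reverse=True)
    return [name for name, _ in ranked]
```

The returned list gives the order in which the priority work node databases would be consulted.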

If the check shows that the content is copyrighted, the priority work node module (144) filters the content and delivers the filtering result information to the search server (142) (S500).

If the check shows that the content is not copyrighted (including cases where copyright status cannot be determined), the general work node module (146) searches the general work nodes using the plurality of general work node databases (DB') (S510).

The general work node module (146) analyzes the search results to determine whether the content is copyrighted (S520). If this step also finds no copyright, the process may return to the general work node search (S510) and repeat.
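The two-tier flow of steps S470 to S520 can be sketched as follows. To keep the example self-contained, each work node database is modelled as a set of content hashes known to be copyrighted; this representation, the function name `filter_content`, and the returned tier labels are all assumptions. The optional return to S510 is omitted, so the general tier is searched once.

```python
from typing import Sequence, Set, Tuple


def filter_content(
    content_hash: str,
    priority_dbs: Sequence[Set[str]],
    general_dbs: Sequence[Set[str]],
) -> Tuple[bool, str]:
    """Return (copyrighted, tier), where tier is 'priority', 'general' or 'none'."""
    # S470 to S490: search the priority work node databases in order.
    for db in priority_dbs:
        if content_hash in db:
            return True, "priority"  # S500: filter and report to the search server
    # S510 to S520: fall back to the general work node databases.
    for db in general_dbs:
        if content_hash in db:
            return True, "general"
    # No database lists the content as copyrighted.
    return False, "none"
```

Because a hit in the priority tier returns immediately, frequently matched content never reaches the general work nodes, which is the source of the response-time benefit the description claims.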

The plurality of priority work node databases preferably use feature information, meta information, hash information, and filtering keyword information to classify content type, release date, popularity ranking, filtering search ranking, and filtering target information. Each of the plurality of general work node databases is preferably generated to include at least one of feature information, meta information, hash information, and filtering keyword information.

The filtering target information includes file information, hash information, meta information, and feature information.

Thus, the filtering information database is distributed into a plurality of priority work node databases and a plurality of general work node databases. On a filtering request, a filtering search is performed selectively with the priority work node databases and then with the general work node databases, so that filtering is carried out efficiently through distributed search and data processing.

Although various embodiments of the present invention have been presented and described above, the present invention is not necessarily limited to them. Those of ordinary skill in the art to which the present invention pertains will readily appreciate that various substitutions, modifications, and changes are possible without departing from the technical spirit of the present invention.

100: duplicate file management system
110: duplicate file detection unit
120: image quality measuring unit
130: duplicate file management unit
140: distributed filtering unit

Claims

A content-based clean cloud management system comprising:
a duplicate file detection unit that detects duplicate video files among the video files stored in a file storage by using fingerprints of the video files;
an image quality measuring unit that generates an image quality measurement criterion and measures the image quality of the detected video files according to the generated criterion;
a duplicate file management unit that moves some of the detected video files to another storage, or deletes them, based on the measured image quality; and
a distributed filtering unit that selectively performs filtering according to the copyright of digital content.

The content-based clean cloud management system of claim 1,
wherein the distributed filtering unit
includes a priority work node module, a general work node module, and a filtering information database, and
performs control such that filtering is first selectively performed using the priority work node database having the higher priority among the plurality of priority work node databases, filtering is then selectively performed using the priority work node database having the lower priority, and the filtering result is transmitted to the search server.

The content-based clean cloud management system of claim 1,
wherein the duplicate file management unit
stores, in a fingerprint database, the fingerprint of the video file that has the highest measured image quality and does not infringe copyright, and
deletes, from among the detected video files, the video files that infringe copyright, excluding the video file having the highest measured image quality.

A content-based clean cloud management method comprising:
acquiring an ultra-high-definition video or image from among the video files stored in the file storage (S400);
generating, from the acquired video or image, an image of fixed size converted to brightness component values (S410);
dividing the converted video or image into N blocks and selecting, from the divided blocks, the blocks that contain sharp outlines (S420);
extracting a feature vector for image quality measurement from each of the selected blocks and calculating the mean and variance of the extracted feature vectors (S430);
generating the calculated mean and variance of the feature vectors as an image quality measurement criterion (S440);
storing the generated image quality measurement criterion and using it to measure the image quality of duplicate video files (S450);
determining, after the image quality is measured, whether filtering according to the presence or absence of copyright has been requested through a search server (S460);
performing a sequential search of work nodes when the request for filtering according to copyright is confirmed (S470);
sequentially determining, during the sequential search, whether filtering is to be performed (S480);
determining whether the corresponding content is copyrighted (S490); and
if the corresponding content is copyrighted, first performing filtering in a work node and then transmitting the filtering result information to the search server (S500).

The content-based clean cloud management method of claim 4, further comprising:
performing a search of general work nodes when the corresponding content is not copyrighted (S510);
analyzing the search result in the general work node to determine whether the corresponding content is copyrighted (S520); and
if the determination shows that the corresponding content is copyrighted, first performing filtering in a work node and then transmitting the filtering result information to the search server (S500).

The content-based clean cloud management method of claim 5,
wherein, if it is determined in step S520 that the corresponding content is not copyrighted, the process returns to step S510.