KR102382956B1

KR102382956B1 - System and method for preventing Illegal outflow of sharing content using n-gram analysis

Info

Publication number: KR102382956B1
Application number: KR1020200158368A
Authority: KR
Inventors: 남기효; 정문권; 오세민
Original assignee: (주)유엠로직스
Priority date: 2020-11-24
Filing date: 2020-11-24
Publication date: 2022-04-06

Abstract

The present invention relates to a system and method for preventing illegal leakage of shared content using n-grams. More specifically, the system includes: an original data processing unit (100) that receives original content data provided through a content sharing service and applies a predetermined analysis algorithm to generate original information; a shared data processing unit (200) that receives the shared content data derived from the original content data through the content sharing service, monitors data temporarily stored while the shared content data is being provided, collects the temporarily stored data according to the monitoring result, and applies the same predetermined analysis algorithm to it to generate monitoring information; and a leakage prevention unit (300) that compares the original information with the monitoring information to determine whether the monitoring information is contained in the original information, and thereby whether the shared content data has been illegally leaked.

Description

System and method for preventing Illegal outflow of sharing content using n-gram analysis

The present invention relates to a system and method for preventing illegal leakage of shared content using n-grams, and more particularly to a system and method that use n-grams to prevent the illegal leakage of shared content that can occur when digital content is shared through widely used video communication services.

As infections caused by the novel coronavirus and similar diseases have persisted, many companies, schools, and households have turned to video communication services to carry on their daily lives. Because these interactions are non-face-to-face, however, side effects keep emerging.

For example, a video conference necessarily shares voice data and, in some cases, digital content such as documents or images. Even important content containing internal confidential information may be shared, and in a non-face-to-face setting such content can easily be stored illegally, through unauthorized copying or screen capture, and then leaked externally.

Conventional techniques for preventing illegal screen capture include monitoring the receiving user's screen-capture events and blocking them, clearing the clipboard of the system on which the user receives the video communication service, and forcibly terminating screen-capture programs. However, these techniques block not only illegal capture of shared content but also the screen captures and clipboard use the user needs for other tasks performed while attending the video session. Blocking or wiping these causes data loss or interferes with the user's work.

Accordingly, there is a need for technology that prevents illegal screen capture of digital content shared through a video communication service without damaging other content data or interfering with ordinary work.

In this regard, Korean Patent Publication No. 10-2019-0033800 ("Apparatus and method for security management of video conference data") discloses a technique for securely storing and managing video conference data and securely sharing it based on security policies and permission information.

Korean Patent Publication No. 10-2019-0033800 (published April 1, 2019)

The present invention has been devised to solve the problems of the prior art described above. An object of the present invention is to provide a system and method for preventing illegal leakage of shared content using n-grams that, while digital content is being shared, deletes temporarily stored data only when it relates to the digital content being shared. Illegal leakage of the shared content is thereby prevented without loss of other user data or degradation of ordinary work.

A system for preventing illegal leakage of shared content using n-grams according to an embodiment of the present invention preferably includes: an original data processing unit (100) that receives original content data provided through a content sharing service and applies a preset analysis algorithm to generate original information; a shared data processing unit (200) that receives, through the content sharing service, the shared content data provided from the original content data, monitors data temporarily stored while the shared content data is being provided, collects the temporarily stored data according to the monitoring result, and applies the preset analysis algorithm to it to generate monitoring information; and a leakage prevention unit (300) that compares the original information with the monitoring information, determines whether the monitoring information is contained in the original information, and thereby determines whether the shared content data has been illegally leaked.

Further, the original data processing unit (100) preferably further includes: a type analysis unit (110) that classifies the original content data according to preset content types and uses the classified content type to generate original content meta information; an n-gram generation unit (120) that analyzes the original content data based on a pre-stored n-gram model to generate n-gram information of the original content data; and an original information generation unit (130) that normalizes the original content meta information and the n-gram information of the original content data and then combines them to generate the original information.

Further, the shared data processing unit (200) preferably further includes: a temporary storage monitoring unit (210) that monitors whether temporarily stored data is generated while the shared content data derived from the original content data is being provided through the content sharing service; a type analysis unit (220) that classifies the temporarily stored data detected by the temporary storage monitoring unit (210) according to preset content types and uses the classified content type to generate shared content meta information; an n-gram generation unit (230) that analyzes the temporarily stored data based on a pre-stored n-gram model to generate n-gram information of the temporarily stored data; and a monitoring information generation unit (240) that normalizes the shared content meta information and the n-gram information of the temporarily stored data and then combines them to generate the monitoring information.

Further, the leakage prevention unit (300) preferably further includes: a determination unit (310) that compares the original information with the monitoring information to determine whether the monitoring information is contained in the original information; and a security processing unit (320) that, when the determination unit (310) finds that the monitoring information is contained in the original information, determines that the shared content data has been illegally leaked and deletes the temporarily stored data.
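The patent does not specify how "contained in" is measured. One plausible reading, sketched below as an assumption, is an n-gram containment score: the fraction of the monitored data's n-grams (with multiplicity) that also occur in the original. The threshold of 0.8, n = 3, and the dict standing in for a clipboard are illustrative choices, and `judge_and_secure` is a hypothetical name combining units (310) and (320).

```python
from collections import Counter


def char_ngrams(text: str, n: int = 3) -> Counter:
    """Count overlapping character n-grams of `text`."""
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


def containment(monitor: Counter, original: Counter) -> float:
    """Fraction of monitored n-grams (with multiplicity) that also occur in the original."""
    total = sum(monitor.values())
    if total == 0:
        return 0.0
    shared = sum((monitor & original).values())  # multiset intersection keeps min counts
    return shared / total


def judge_and_secure(clipboard: dict, original_text: str,
                     threshold: float = 0.8, n: int = 3) -> bool:
    """Hypothetical determination unit (310) plus security processing unit (320).

    If clipboard["data"] is judged to be contained in the original content, delete it
    and return True; otherwise leave the clipboard untouched and return False.
    """
    monitor = char_ngrams(clipboard.get("data", ""), n)
    score = containment(monitor, char_ngrams(original_text, n))
    if score >= threshold:
        clipboard.pop("data", None)  # delete only the entry judged to be a leak
        return True
    return False
```

Measuring containment rather than symmetric similarity means a short copied excerpt of a long shared document still scores high, while unrelated clipboard text scores near zero and survives.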

A method for preventing illegal leakage of shared content using n-grams according to an embodiment of the present invention preferably includes: an original information generation step (S100) in which an original data processing unit receives original content data provided through a content sharing service and generates original information through a preset analysis algorithm; a temporary storage monitoring step (S200) in which a shared data processing unit monitors whether temporarily stored data is generated while shared content data derived from the original content data is being provided through the content sharing service; a monitoring information generation step (S300) in which, when temporarily stored data is generated during the temporary storage monitoring step (S200), the generated data is collected and monitoring information is generated through the preset analysis algorithm; an illegality determination step (S400) in which a leakage prevention unit receives the original information from step S100 and the monitoring information from step S300, compares them to determine whether the monitoring information is contained in the original information, and thereby determines whether the shared content data has been illegally leaked; and a security step (S500) in which, when the determination in step S400 finds that the monitoring information is contained in the original information, the shared content data is judged to have been illegally leaked and the temporarily stored data is deleted.
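The five steps can be traced end to end in a toy, text-only sketch. Everything here is an assumption for illustration: plain character-trigram sets stand in for the "preset analysis algorithm", a list of strings stands in for clipboard events observed during sharing, and `run_session` is a hypothetical name.

```python
def ngram_set(text: str, n: int) -> set:
    """Distinct character n-grams of `text`."""
    return {text[i:i + n] for i in range(len(text) - n + 1)}


def run_session(original_text, clipboard_events, n=3, threshold=0.8):
    """Walk S100 to S500 for a sequence of observed clipboard items.

    Returns (kept, deleted): items left alone and items judged to be leaks.
    """
    # S100: original information generated from the original content data
    original_info = ngram_set(original_text, n)
    kept, deleted = [], []
    # S200: each event is a temporarily stored item observed while sharing is active
    for item in clipboard_events:
        # S300: monitoring information generated from the temporarily stored data
        monitor_info = ngram_set(item, n)
        # S400: is the monitoring information contained in the original information?
        contained = bool(monitor_info) and (
            len(monitor_info & original_info) / len(monitor_info) >= threshold
        )
        # S500: delete only the items judged to be illegal leaks
        (deleted if contained else kept).append(item)
    return kept, deleted
```

For example, `run_session("Project Falcon launch date: 2021-03-15", ["launch date: 2021-03-15", "buy milk"])` keeps the unrelated note and flags only the copied excerpt.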

Further, the original information generation step (S100) preferably further includes: a meta information generation step (S110) of classifying the content type of the original content data and generating original content meta information according to the classified content type; an original n-gram generation step (S120) of analyzing the original content data based on a pre-stored n-gram model to generate n-gram information of the original content data; and an original information generation step (S130) of receiving the original content meta information from step S110 and the n-gram information of the original content data from step S120, normalizing them, and combining them to generate the original information.

Further, the monitoring information generation step (S300) preferably further includes: a meta information generation step (S310) of classifying the content type of the temporarily stored data and generating shared content meta information according to the classified content type; a shared n-gram generation step (S320) of analyzing the temporarily stored data based on a pre-stored n-gram model to generate n-gram information of the temporarily stored data; and a monitoring information generation step (S330) of receiving the shared content meta information from step S310 and the n-gram information of the temporarily stored data from step S320, normalizing them, and combining them to generate the monitoring information.

With the configuration described above, the system and method for preventing illegal leakage of shared content using n-grams according to the present invention can prevent the illegal leakage of shared content that may occur when digital content is shared through widely used video communication services.

Specifically, while a content sharing service such as a video conference is in use, the invention reads both the content being shared (by the video conference content provider) and the temporarily stored data (clipboard data) of each user (video conference participant), and deletes only the clipboard data that contains the shared content to be protected. This prevents illegal leakage through illegal storage of shared content by video conference participants, without losing user data unrelated to the conference or degrading ordinary work.

In other words, by comparing the content shared through a content sharing service such as a video conference with the data on the user's clipboard, and deleting only the clipboard data that contains the protected shared content, the invention prevents illegal leakage of shared content without unnecessary loss of user data or degradation of ordinary work.

FIG. 1 is an exemplary configuration diagram of a system for preventing illegal leakage of shared content using n-grams according to an embodiment of the present invention.
FIG. 2 is an exemplary operation diagram of the system for preventing illegal leakage of shared content using n-grams according to an embodiment of the present invention.
FIG. 3 is an exemplary flowchart of a method for preventing illegal leakage of shared content using n-grams according to an embodiment of the present invention.

Hereinafter, the system and method for preventing illegal leakage of shared content using n-grams according to the present invention will be described in detail with reference to the accompanying drawings. The drawings are provided as examples so that the spirit of the present invention can be fully conveyed to those skilled in the art. Accordingly, the present invention is not limited to the drawings presented below and may be embodied in other forms. Like reference numerals refer to like elements throughout the specification.

Unless otherwise defined, technical and scientific terms used herein have the meanings commonly understood by those of ordinary skill in the art to which this invention belongs. Descriptions of known functions and configurations that may unnecessarily obscure the subject matter of the present invention are omitted from the following description and the accompanying drawings.

In addition, a system refers to a set of components, including devices, instruments, and means, that are organized and interact regularly to perform the required functions.

The system and method for preventing illegal leakage of shared content using n-grams according to an embodiment of the present invention read both the content shared (by the video conference content provider) and the temporarily stored data (clipboard data) of the users (video conference participants) while a content sharing service such as a video conference is in use, and delete only the clipboard data that contains the shared content to be protected. Illegal leakage through illegal storage of shared content among video conference participants is thereby prevented without losing user data unrelated to the conference or degrading ordinary work.

To this end, as shown in FIG. 2, while a content sharing service such as a video conference is being provided, the system and method generate n-grams both for the content data shared among the participants and for each participant's clipboard data (temporarily stored data and the like), and compare them to prevent illegal leakage such as illegal storage.

As shown in FIGS. 1 and 2, the system according to an embodiment of the present invention preferably includes an original data processing unit (100), a shared data processing unit (200), and a leakage prevention unit (300). The original data processing unit (100) and the shared data processing unit (200) are preferably installed in the applications of users of a content sharing service such as a video conference.

Specifically, the original data processing unit (100) is preferably activated when the user provides original content data through a content sharing service such as a video conference, and the shared data processing unit (200) is preferably activated when the user receives shared content data through that service. Because a content sharing service such as a video conference shares data bidirectionally, the original data processing unit (100) and the shared data processing unit (200) are preferably included on both sides, rather than on only one side (the meeting organizer or a meeting participant).

In addition, to improve the reliability of the content sharing service, the application providing the service preferably includes the leakage prevention unit (300), which determines whether shared content data has been illegally leaked.

Each component is described in detail below.

The original data processing unit (100) receives original content data provided through a content sharing service (for example, a video conference), that is, original content data input by one of the video conference participants, and preferably applies a preset analysis algorithm to it to generate original information, which serves as the reference for determining illegal leakage of shared content data.

As shown in FIG. 1, the original data processing unit (100) preferably further includes a type analysis unit (110), an n-gram generation unit (120), and an original information generation unit (130).

The type analysis unit (110) preferably classifies the original content data according to preset content types. The preset content types are preferably file, image, text, video, and audio.

The type analysis unit (110) preferably uses the classified content type to generate meta information about the original content, that is, original content meta information.
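The patent names the five content types but not how data is assigned to them. The sketch below assumes a simple heuristic: Python `str` is text, and raw bytes are classified by well-known file signatures (magic bytes), falling back to "file". The meta fields (type, size, SHA-256 digest) are an illustrative choice, and `classify` and `make_meta` are hypothetical names.

```python
import hashlib


def classify(data) -> str:
    """Map raw content to one of the preset types: text, image, audio, video, file.

    Heuristic only: checks a few common magic-byte signatures.
    """
    if isinstance(data, str):
        return "text"
    head = bytes(data[:16])
    if head.startswith((b"\x89PNG\r\n\x1a\n", b"\xff\xd8\xff", b"GIF8")):
        return "image"  # PNG, JPEG, GIF
    if head[:4] == b"RIFF" and head[8:12] == b"WAVE":
        return "audio"  # WAV
    if head.startswith(b"ID3"):
        return "audio"  # MP3 with an ID3 tag
    if head[:4] == b"RIFF" and head[8:12] == b"AVI ":
        return "video"  # AVI
    if head[4:8] == b"ftyp":
        return "video"  # ISO media (MP4/MOV); simplification, since M4A audio also uses it
    return "file"


def make_meta(data) -> dict:
    """Build simple meta information for the content: type, size in bytes, SHA-256."""
    raw = data.encode("utf-8") if isinstance(data, str) else bytes(data)
    return {
        "type": classify(data),
        "size": len(raw),
        "sha256": hashlib.sha256(raw).hexdigest(),
    }
```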

The n-gram generation unit (120) preferably analyzes the original content data based on a pre-stored n-gram model to generate n-gram information of the original content data.

The n-gram algorithm extracts features from natural-language text so that the text can be treated as a simple sequence of symbols. To characterize a symbol string, it checks whether the same substrings repeat: the string is cut into substrings of n consecutive symbols, and identical substrings are counted and analyzed. This yields the set of n-symbol sequences that can be extracted from the string, and comparing the elements of the n-gram sets extracted from two strings shows how similar those strings are. Many shared n-grams mean the two strings share many identical patterns of length n, which allows their similarity to be estimated.
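As a minimal sketch of the paragraph above, the code below counts n-grams with a sliding window of step 1 over any symbol sequence (characters, bytes, or tokens), then scores two strings with Jaccard similarity over their distinct n-grams. The patent does not name a similarity measure; Jaccard is one common choice, used here as an assumption.

```python
from collections import Counter
from typing import Hashable, Sequence


def ngrams(symbols: Sequence[Hashable], n: int) -> Counter:
    """Count every run of n consecutive symbols (sliding window, step 1)."""
    return Counter(tuple(symbols[i:i + n]) for i in range(len(symbols) - n + 1))


def jaccard(a: Counter, b: Counter) -> float:
    """Share of distinct n-grams that the two sequences have in common."""
    sa, sb = set(a), set(b)
    if not sa and not sb:
        return 1.0  # two empty sequences are treated as identical
    return len(sa & sb) / len(sa | sb)
```

For instance, `ngrams("abcab", 2)` counts `('a', 'b')` twice, and `"abcab"` and `"abcd"` share two of four distinct bigrams, giving a similarity of 0.5.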

Using this property, the system according to an embodiment of the present invention determines whether illegal leakage has occurred by analyzing the similarity between two symbol strings: the n-gram information generated by the original data processing unit (100) from the original content data, and the n-gram information generated by the shared data processing unit (200) from the temporarily stored data that a user receiving the shared content data (such as a video conference participant) creates while that data is being provided.

However, the system according to an embodiment of the present invention preferably generates and uses n-gram information not only for text data but also for file, image, video, and audio content.

To this end, the n-gram generation unit (120) preferably uses the content type classified by the type analysis unit (110) to generate n-gram information as follows: file content is analyzed in binary form, image content in pixel form, audio content in frequency form, and video content in a combined pixel-and-frequency form.

Specifically, when the content type classified by the type analysis unit (110) is file content, the n-gram generation unit (120) splits the file into the text region of the file header and the data portion, and extracts each.

It then preferably extracts a string signature from the text region, converts the binary data region into a character array, hashes it at a fixed ratio that depends on the file content size, and extracts hash vectors to generate the respective n-gram information.
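The file-content procedure leaves the header length, n, hash, and sampling ratio unspecified, so the sketch below fills them with labeled assumptions. The header is taken as the first `header_len` bytes, its string signature is the list of printable ASCII runs of at least 4 characters (as the Unix `strings` tool reports), and the data region is decoded as Latin-1 so each byte becomes one character. Byte n-grams are sampled at a stride that grows with file size and feature-hashed into a fixed-length vector. `zlib.crc32` is used because Python's built-in `hash()` of `bytes` is randomized per process and would not give reproducible vectors.

```python
import re
import zlib


def file_features(blob: bytes, header_len: int = 64, n: int = 4,
                  dims: int = 256, max_samples: int = 4096):
    """Split a file into header and data regions and derive n-gram-based features.

    header_len, n, dims and max_samples are illustrative defaults, not patent values.
    Returns (string_signature, hash_vector).
    """
    header, body = blob[:header_len], blob[header_len:]
    # String signature: printable ASCII runs of length >= 4 in the header region.
    signature = [m.decode("ascii") for m in re.findall(rb"[\x20-\x7e]{4,}", header)]
    # Treat the binary data region as a character array (Latin-1: one byte, one char).
    chars = body.decode("latin-1")
    total = max(len(chars) - n + 1, 0)
    # Size-dependent stride keeps the number of sampled n-grams at most about max_samples.
    stride = max(1, total // max_samples)
    vector = [0] * dims
    for i in range(0, total, stride):
        gram = chars[i:i + n].encode("latin-1")
        vector[zlib.crc32(gram) % dims] += 1  # feature hashing with a stable hash
    return signature, vector
```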

When the content type classified by the type analysis unit (110) is image content, the n-gram generation unit (120) extracts color, texture, and shape features separately. Color (RGB) information is preferably extracted by building a histogram of the per-pixel RGB values, texture information by the Grey-Level Co-occurrence Matrix (GLCM) method, and shape information by a region-based method; n-gram information is then generated for each feature.
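The three image features can be sketched in pure Python, assuming an image given as rows of `(R, G, B)` tuples. Quantized RGB bin counts give the color histogram, a normalized GLCM for one pixel offset gives texture (summarized here by its contrast statistic), and the area ratio and bounding-box fill of a thresholded dark region stand in for a region-based shape descriptor. Bin counts, grey levels, the offset, and the threshold are all illustrative assumptions, and turning these features into n-grams is left out.

```python
def to_gray(pixels):
    """Convert rows of (R, G, B) tuples to 0-255 luminance (ITU-R BT.601 weights)."""
    return [[int(0.299 * r + 0.587 * g + 0.114 * b) for r, g, b in row] for row in pixels]


def color_histogram(pixels, bins=4):
    """Quantize each channel into `bins` levels and count (r, g, b) bin triples."""
    step = 256 // bins
    hist = [0] * (bins ** 3)
    for row in pixels:
        for r, g, b in row:
            rb, gb, bb = (min(c // step, bins - 1) for c in (r, g, b))
            hist[rb * bins * bins + gb * bins + bb] += 1
    return hist


def glcm(gray, levels=8, dx=1, dy=0):
    """Grey-Level Co-occurrence Matrix for non-negative offset (dx, dy), normalized to sum 1."""
    q = [[min(v * levels // 256, levels - 1) for v in row] for row in gray]
    m = [[0] * levels for _ in range(levels)]
    h, w, count = len(q), len(q[0]), 0
    for y in range(h - dy):
        for x in range(w - dx):
            m[q[y][x]][q[y + dy][x + dx]] += 1
            count += 1
    return [[c / count for c in row] for row in m] if count else m


def glcm_contrast(p):
    """Haralick-style contrast: weights co-occurrences by squared grey-level difference."""
    return sum(p[i][j] * (i - j) ** 2 for i in range(len(p)) for j in range(len(p)))


def region_shape(gray, threshold=128):
    """Area ratio and bounding-box extent of pixels darker than `threshold`."""
    pts = [(y, x) for y, row in enumerate(gray) for x, v in enumerate(row) if v < threshold]
    if not pts:
        return 0.0, 0.0
    ys, xs = [p[0] for p in pts], [p[1] for p in pts]
    box = (max(ys) - min(ys) + 1) * (max(xs) - min(xs) + 1)
    return len(pts) / (len(gray) * len(gray[0])), len(pts) / box
```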

When the content type is audio content, the n-gram generation unit (120) preferably classifies the frequency domain using a spectral method to extract features, and generates n-gram information from them.
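One simple spectral scheme, assumed here for illustration since the patent gives no details, is to frame the signal, take a discrete Fourier transform of each frame, keep the index of the strongest non-DC frequency bin as that frame's symbol, and form n-grams over the resulting symbol sequence. The naive O(N²) DFT keeps the sketch dependency-free; a real implementation would use an FFT. Frame and hop sizes are arbitrary defaults.

```python
import cmath
import math


def dominant_bins(samples, frame=64, hop=32):
    """Return the strongest DFT bin (excluding DC) for each frame of the signal."""
    tokens = []
    for start in range(0, len(samples) - frame + 1, hop):
        chunk = samples[start:start + frame]
        mags = []
        for k in range(1, frame // 2):  # positive frequencies below Nyquist
            s = sum(chunk[t] * cmath.exp(-2j * math.pi * k * t / frame) for t in range(frame))
            mags.append(abs(s))
        tokens.append(1 + mags.index(max(mags)))  # +1 because bin 0 (DC) was skipped
    return tokens


def audio_ngrams(samples, n=3, frame=64, hop=32):
    """N-grams over the per-frame dominant-frequency symbols."""
    bins = dominant_bins(samples, frame, hop)
    return [tuple(bins[i:i + n]) for i in range(len(bins) - n + 1)]
```

A pure tone that completes exactly 5 cycles per 64-sample frame yields the symbol 5 in every frame, so its trigrams are all `(5, 5, 5)`.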

When the content type is video content, the n-gram generation unit (120) preferably extracts the color, texture, and shape features used for image content, the frequency-domain features used for audio content, and the timeline features of the video content. It then performs n-gram extraction on these features and generates n-gram information for each resulting timeline n-gram.
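A timeline n-gram can be sketched by assuming that each video frame has already been reduced to a visual token (from image-style features) and an audio token (from audio-style features of its sound slice), both tagged with a timestamp. Consecutive frames are then grouped into n-grams that remember where they start on the timeline. The frame tuple layout and `timeline_ngrams` are hypothetical.

```python
from typing import List, Tuple

Frame = Tuple[float, str, int]  # (timestamp in seconds, visual token, audio token)


def timeline_ngrams(frames: List[Frame], n: int = 3):
    """Group per-frame (visual, audio) tokens into n-grams along the timeline.

    Returns a list of (start_time, n-gram) pairs so each n-gram keeps its position.
    """
    frames = sorted(frames, key=lambda f: f[0])  # enforce timeline order
    grams = []
    for i in range(len(frames) - n + 1):
        window = frames[i:i + n]
        grams.append((window[0][0], tuple((visual, audio) for _, visual, audio in window)))
    return grams
```

Keeping the start time alongside each n-gram lets a match against captured data also point to where in the shared video the captured segment came from.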

The original information generation unit (130) preferably normalizes the original content meta information and the n-gram information of the original content data and then combines them to generate the original information, which serves as the reference for determining illegal leakage of shared content data.
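The kind of normalization is not stated. The sketch below assumes L2 normalization of the n-gram count vector, so long and short content become comparable by direction rather than size, plus light canonicalization of the meta information. The two parts are combined into a single record. `build_info` and the record layout are hypothetical.

```python
import math
from collections import Counter


def l2_normalize(counts: Counter) -> dict:
    """Scale n-gram counts so the vector has unit Euclidean length."""
    norm = math.sqrt(sum(v * v for v in counts.values()))
    return {k: v / norm for k, v in counts.items()} if norm else {}


def build_info(meta: dict, ngram_counts: Counter) -> dict:
    """Combine normalized meta information and normalized n-gram information."""
    norm_meta = {
        "type": str(meta.get("type", "file")).strip().lower(),  # canonical type label
        "size": int(meta.get("size", 0)),
    }
    return {"meta": norm_meta, "ngrams": l2_normalize(ngram_counts)}
```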

The shared data processing unit 200 preferably receives the shared content data provided through the content sharing service (for example, a video conference), monitors the data that is temporarily stored while the shared content data is being provided, collects the temporarily stored data according to the monitoring result, and applies it to the preset analysis algorithm to generate monitoring information.

More specifically, the system and method according to an embodiment of the present invention assume that shared content data provided through a content sharing service is leaked when a recipient (for example, a video-conference participant) stores it without authorization. Because the goal is to prevent such leakage while distinguishing illegally stored shared content data from data the user stores through normal means, it is preferable to monitor the data temporarily stored while the shared content data is being provided, rather than the transmitted shared content data itself.

Because monitoring temporarily stored data requires continuous monitoring of the user's application in which the shared data processing unit 200 is included, the user's consent must of course be obtained in advance.

In addition, to minimize inconvenience to users (video-conference participants), it is preferable that, before sharing original content data containing important confidential information (that is, content that must not leak outside), the user providing that data indicates through a separate input operation that the content data entered from then on requires illegal-leakage monitoring, so that the original data processing unit 100 and the shared data processing unit 200 then perform their operations.

That is, if the original data processing unit 100, the shared data processing unit 200, and the leakage prevention unit 300 operated throughout the entire content sharing service, data processing could become overloaded, or the application of the user receiving the shared content data could unexpectedly become inconvenient to use. To prevent this, a so-called "wake-up signal" is preferably provided to the original data processing unit 100, the shared data processing unit 200, and the leakage prevention unit 300 through a separate input operation.
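The wake-up behavior amounts to a gate in front of the analysis. The sketch below is hypothetical: `LeakageGuard` and its `analyze` callback are illustrative names, not part of the described system, and the callback stands in for the n-gram analysis and comparison.

```python
from typing import Any, Callable, Optional

class LeakageGuard:
    """Runs the (hypothetical) analysis callback only between wake_up() and sleep()."""

    def __init__(self, analyze: Callable[[Any], Any]):
        self._analyze = analyze
        self.active = False

    def wake_up(self) -> None:
        # Triggered by the presenter's separate input marking upcoming content as confidential.
        self.active = True

    def sleep(self) -> None:
        self.active = False

    def on_clipboard_change(self, data: Any) -> Optional[Any]:
        # Outside a wake-up window, do nothing so ordinary clipboard use costs nothing.
        return self._analyze(data) if self.active else None
```

Clipboard events that arrive before `wake_up()` or after `sleep()` are ignored, which is how the gate avoids the overload described above.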

Meanwhile, as shown in FIG. 1, the shared data processing unit 200 preferably further includes a temporary storage monitoring unit 210, a type analysis unit 220, an n-gram generation unit 230, and a monitoring information generation unit 240.

As described above, while the shared content data based on the original content data is provided to the user through the content sharing service, the temporary storage monitoring unit 210 preferably monitors the user's application continuously to detect whether temporarily stored data is generated. For example, it preferably monitors whether temporarily stored data is created through the clipboard, which is used frequently on computers.
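Python's standard library has no portable clipboard API, so the sketch below polls a caller-supplied `read_clipboard` function (a hypothetical stand-in) and yields the content whenever it changes. A production monitor would more likely subscribe to an operating-system change notification, such as `AddClipboardFormatListener` on Windows, rather than poll.

```python
import time
from typing import Callable, Iterator, Optional

def watch_clipboard(read_clipboard: Callable[[], Optional[str]],
                    polls: int, interval: float = 0.0) -> Iterator[str]:
    """Yield clipboard contents each time they differ from the previous poll."""
    last = read_clipboard()
    for _ in range(polls):
        time.sleep(interval)
        current = read_clipboard()
        # Only a change counts as newly generated temporarily stored data.
        if current is not None and current != last:
            yield current
        last = current
```

Fed the successive readings `"a", "a", "b", "b", "c"`, the generator yields `"b"` and then `"c"`.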

The type analysis unit 220 preferably classifies the temporarily stored data collected by the temporary storage monitoring unit 210 according to preset content types, which are preferably file, image, text, video, and audio.

In addition, the type analysis unit 220 preferably uses the classified content type to generate meta information about the temporarily stored data, that is, shared content meta information.
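Classification into the five preset types could be approximated from leading byte signatures, as sketched below. The signatures checked (PNG, JPEG, GIF, RIFF/WAVE, and the MP4 `ftyp` box) are real, but the overall heuristic is an assumption: `ftyp` also appears in some audio containers, and a clipboard's own format tag would often be a more reliable signal. The fields of the meta record are hypothetical.

```python
IMAGE_SIGNATURES = (b"\x89PNG\r\n\x1a\n", b"\xff\xd8\xff", b"GIF87a", b"GIF89a")

def classify(data: bytes) -> str:
    """Classify raw bytes into one of the five preset content types (heuristic)."""
    if data.startswith(IMAGE_SIGNATURES):
        return "image"
    if data[:4] == b"RIFF" and data[8:12] == b"WAVE":
        return "audio"
    if data[4:8] == b"ftyp":  # ISO base media (e.g. MP4) container
        return "video"
    try:
        data.decode("utf-8")
        return "text"
    except UnicodeDecodeError:
        return "file"  # anything else is treated as opaque binary

def make_meta(data: bytes) -> dict:
    """Hypothetical shared content meta information built from the classified type."""
    return {"type": classify(data), "size": len(data)}
```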

The n-gram generation unit 230 preferably analyzes the temporarily stored data based on a pre-stored n-gram model and generates n-gram information of the temporarily stored data. Like the n-gram generation unit 120 of the original data processing unit 100, the n-gram generation unit 230 preferably generates and uses n-gram information not only for text data but also for file, image, video, and audio content. To this end, using the content type classified by the type analysis unit 220, the n-gram generation unit 230 preferably analyzes the temporarily stored data in binary form if it is file content, in pixel form if it is image content, and in frequency form if it is audio content, and generates n-gram information based on a combination of pixels and frequencies if it is video content.
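The type-dependent analysis can be organized as a dispatch table keyed by the classified content type. Only the file (binary) and text branches are filled in below; the registry and function names are hypothetical.

```python
from typing import Callable, Dict, Set

def byte_ngrams(data: bytes, n: int = 4) -> Set[bytes]:
    """File content: n-grams taken directly over the raw bytes."""
    return {data[i:i + n] for i in range(len(data) - n + 1)}

def char_ngrams(data: bytes, n: int = 4) -> Set[str]:
    """Text content: character n-grams over the UTF-8-decoded text."""
    text = data.decode("utf-8")
    return {text[i:i + n] for i in range(len(text) - n + 1)}

# Hypothetical registry; pixel-, frequency- and pixel+frequency-based extractors for
# image, audio and video content would be registered under their type names the same way.
EXTRACTORS: Dict[str, Callable[[bytes, int], set]] = {
    "file": byte_ngrams,
    "text": char_ngrams,
}

def extract_ngrams(content_type: str, data: bytes, n: int = 4) -> set:
    """Dispatch the temporarily stored data to the extractor for its classified type."""
    extractor = EXTRACTORS.get(content_type)
    if extractor is None:
        raise ValueError(f"no n-gram extractor registered for {content_type!r}")
    return extractor(data, n)
```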

The monitoring information generation unit 240 preferably normalizes the shared content meta information and the n-gram information of the temporarily stored data and then combines them to generate the monitoring information, that is, the information from which illegal leakage of shared content data can be determined.

The leakage prevention unit 300 preferably compares and analyzes the original information and the monitoring information to determine whether the monitoring information is included in the original information, and thereby determines whether the shared content data has been illegally leaked.

To this end, as shown in FIG. 1, the leakage prevention unit 300 preferably includes a determination unit 310 and a security processing unit 320.

The determination unit 310 preferably compares the original information with the monitoring information to determine whether the monitoring information is included in the original information. Preferably, this determination obtains, from the symbol string of each generated set of n-gram information, the set of sequences of n consecutive symbols that can be extracted, compares the elements of the sets for similarity or identity, and determines whether the elements of the n-gram set extracted from the monitoring information intersect with the elements of the n-gram set extracted from the original information.
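Read literally, any non-empty intersection of the two n-gram sets would count as inclusion. The sketch below instead measures containment, the fraction of the monitored n-grams that also occur in the original, and compares it with a threshold; the threshold of 0.8 and n = 3 are assumptions meant to keep short, common n-grams from triggering false positives.

```python
def ngram_set(symbols, n: int):
    """Set of n consecutive symbols extractable from a symbol string."""
    return {tuple(symbols[i:i + n]) for i in range(len(symbols) - n + 1)}

def containment(original, monitored, n: int = 3) -> float:
    """Fraction of the monitored n-grams that intersect the original's n-gram set."""
    mon = ngram_set(monitored, n)
    if not mon:
        return 0.0
    return len(mon & ngram_set(original, n)) / len(mon)

def is_leak(original, monitored, n: int = 3, threshold: float = 0.8) -> bool:
    # threshold is a hypothetical tuning parameter; the text only requires an intersection.
    return containment(original, monitored, n) >= threshold
```

For example, copying the phrase "revenue forecast" out of "the quarterly revenue forecast is confidential" gives a containment of 1.0 and is flagged, while an unrelated note is not.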

According to the determination result of the determination unit 310, if the monitoring information is included in the original information, the security processing unit 320 preferably determines that illegal storage has occurred and deletes the temporarily stored data to prevent illegal leakage of the shared content data.
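Deleting only the flagged data can be sketched as a filter over the temporarily stored entries. Here the clipboard is modeled, as an assumption, as a list of text entries (as with a clipboard history), and `is_leak` is a hypothetical callable standing in for the determination unit's decision.

```python
from typing import Callable, List

def purge_leaks(entries: List[str], is_leak: Callable[[str], bool]) -> List[str]:
    """Return the entries that survive: only those judged to be leaks are deleted."""
    return [entry for entry in entries if not is_leak(entry)]
```

A simple substring check can stand in for `is_leak` when trying it out: with protected text "project aurora budget", the entry "aurora budget" is removed while "meeting at 3" is kept.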

FIG. 3 is an exemplary flowchart illustrating a method for preventing illegal leakage of shared content using n-grams according to an embodiment of the present invention. The method is described in detail below with reference to FIG. 3.

As shown in FIG. 3, the method for preventing illegal leakage of shared content using n-grams according to an embodiment of the present invention preferably includes an original information generation step (S100), a temporary storage monitoring step (S200), a monitoring information generation step (S300), an illegality determination step (S400), and a security step (S500).
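The five steps can be tied together in a text-only sketch. It assumes character 4-grams as the n-gram model, a single clipboard string as the temporarily stored data, and a hypothetical containment threshold of 0.8; it illustrates the flow and is not the claimed method's implementation.

```python
from typing import Optional, Set

def char_ngrams(text: str, n: int = 4) -> Set[str]:
    return {text[i:i + n] for i in range(len(text) - n + 1)}

def run_pipeline(original: str, clipboard: Optional[str], n: int = 4,
                 threshold: float = 0.8) -> Optional[str]:
    """Return the clipboard content left after S100-S500 (None if nothing remains)."""
    original_info = char_ngrams(original, n)            # S100: original information
    if clipboard is None:                               # S200: nothing temporarily stored
        return None
    monitoring_info = char_ngrams(clipboard, n)         # S300: monitoring information
    if not monitoring_info:
        return clipboard
    overlap = len(monitoring_info & original_info) / len(monitoring_info)  # S400
    return None if overlap >= threshold else clipboard  # S500: delete only on a leak
```

Copying "merger terms" out of the shared text "Q3 merger terms: 40 per share" clears the clipboard, whereas an unrelated "grocery list" survives.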

Each step is described in detail below.

In the original information generation step (S100), the original data processing unit 100 receives original content data provided through a content sharing service (for example, a video conference), that is, the original content data input by one of the video-conference participants, and applies it to a preset analysis algorithm to generate original information that serves as the reference for determining illegal leakage of shared content data.

More specifically, the original information generation step (S100) preferably further includes a meta information generation step (S110), an original n-gram generation step (S120), and an original information generation step (S130).

In the meta information generation step (S110), the type analysis unit 110 preferably classifies the original content data according to preset content types and uses the classified content type to generate meta information about the original content, that is, original content meta information.

The preset content types are preferably file, image, text, video, and audio.

In the original n-gram generation step (S120), the n-gram generation unit 120 analyzes the original content data based on a pre-stored n-gram model and generates n-gram information of the original content data.

On this basis, the n-gram information generated from the original content data in the original information generation step (S100) and the n-gram information generated, in the monitoring information generation step (S300), from the temporarily stored data created by a user receiving the shared content data (for example, a video-conference participant) can be used to analyze the correspondence between the two symbol strings and determine whether illegal leakage has occurred.

Note that the method for preventing illegal leakage of shared content using n-grams according to an embodiment of the present invention preferably generates and uses n-gram information not only for text data but also for file, image, video, and audio content.

To this end, the original n-gram generation step (S120) preferably uses the content type classified in the meta information generation step (S110) to analyze the original content data in binary form if it is file content, in pixel form if it is image content, and in frequency form if it is audio content, and to generate n-gram information based on a combination of pixels and frequencies if it is video content.

More specifically, when the content type classified in the meta information generation step (S110) is file content, the original n-gram generation step (S120) separates and extracts the text area of the file header and the data portion.
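Separating the header's text area from the data portion might look like the sketch below. Real file formats define their headers differently, so the fixed `header_size` and the rule that "text" means runs of at least four printable ASCII bytes are both assumptions; `split_file` and `file_ngrams` are hypothetical names.

```python
import re
from typing import List, Tuple

PRINTABLE_RUN = re.compile(rb"[\x20-\x7e]{4,}")  # printable ASCII runs of length >= 4

def split_file(data: bytes, header_size: int = 64) -> Tuple[List[str], bytes]:
    """Split into the printable-text strings found in the header and the remaining data bytes."""
    header, body = data[:header_size], data[header_size:]
    header_text = [m.group().decode("ascii") for m in PRINTABLE_RUN.finditer(header)]
    return header_text, body

def file_ngrams(data: bytes, n: int = 4, header_size: int = 64):
    """Character n-grams for the header text and byte n-grams for the data portion."""
    header_text, body = split_file(data, header_size)
    text_grams = {s[i:i + n] for s in header_text for i in range(len(s) - n + 1)}
    data_grams = {body[i:i + n] for i in range(len(body) - n + 1)}
    return text_grams, data_grams
```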

In addition, when the content type classified in the meta information generation step (S110) is image content, the original n-gram generation step (S120) extracts color, texture, and shape features separately. Preferably, color (RGB) information is extracted by building a histogram of the per-pixel RGB values, texture information by the GLCM (Grey-Level Co-occurrence Matrix) method, and shape information by a region-based method, and separate n-gram information is generated for each of these features.

In addition, when the content type classified in the meta information generation step (S110) is audio content, the original n-gram generation step (S120) preferably extracts features by analyzing the frequency domain spectrally and generates n-gram information from those features.

In addition, when the content type classified in the meta information generation step (S110) is video content, the original n-gram generation step (S120) preferably extracts the color, texture, and shape features used for image content, the frequency-domain features used for audio content, and timeline features of the video content, performs n-gram analysis on them, and generates separate n-gram information for each resulting timeline n-gram.

In the original information generation step (S130), the original information generation unit 130 preferably receives the original content meta information from the meta information generation step (S110) and the n-gram information of the original content data from the original n-gram generation step (S120), normalizes them, and combines them to generate the original information. That is, it generates the original information that serves as the reference for determining illegal leakage of shared content data.

In the temporary storage monitoring step (S200), while the shared content data based on the original content data is provided to the user through the content sharing service, the temporary storage monitoring unit 210 of the shared data processing unit 200 preferably monitors the user's application continuously to detect whether temporarily stored data is generated. For example, it preferably monitors whether temporarily stored data is created through the clipboard, which is used frequently on computers.

In the monitoring information generation step (S300), if temporarily stored data is generated while the shared content data is being provided, as detected in the temporary storage monitoring step (S200), the shared data processing unit 200 preferably collects the generated temporarily stored data and applies it to the preset analysis algorithm to generate monitoring information.

More specifically, the monitoring information generation step (S300) preferably further includes a meta information generation step (S310), a shared n-gram generation step (S320), and a monitoring information generation step (S330).

In the meta information generation step (S310), the type analysis unit 220 preferably classifies the temporarily stored data collected in the temporary storage monitoring step (S200) according to preset content types and uses the classified content type to generate meta information about the temporarily stored data, that is, shared content meta information.

In the shared n-gram generation step (S320), the n-gram generation unit 230 preferably analyzes the temporarily stored data based on a pre-stored n-gram model and generates n-gram information of the temporarily stored data.

The shared n-gram generation step (S320) likewise preferably generates and uses n-gram information not only for text data but also for file, image, video, and audio content. To this end, using the content type classified in the meta information generation step (S310), it preferably analyzes the temporarily stored data in binary form if it is file content, in pixel form if it is image content, and in frequency form if it is audio content, and generates n-gram information based on a combination of pixels and frequencies if it is video content.

In the monitoring information generation step (S330), the monitoring information generation unit 240 receives the shared content meta information from the meta information generation step (S310) and the n-gram information of the temporarily stored data from the shared n-gram generation step (S320), normalizes them, and combines them to generate the monitoring information, from which illegal leakage of shared content data can be determined.

In the illegality determination step (S400), the leakage prevention unit 300 receives the original information from the original information generation step (S100) and the monitoring information from the monitoring information generation step (S300), and compares and analyzes them to determine whether the monitoring information is included in the original information.

More specifically, the illegality determination step (S400) compares the original information with the monitoring information to determine whether the monitoring information is included in the original information. Preferably, this determination obtains, from the symbol string of each generated set of n-gram information, the set of sequences of n consecutive symbols that can be extracted, compares the elements of the sets for similarity or identity, and determines whether the elements of the n-gram set extracted from the monitoring information intersect with the elements of the n-gram set extracted from the original information.

In the security step (S500), if the monitoring information is included in the original information according to the determination result of the illegality determination step (S400), it is determined that illegal storage has occurred, and the temporarily stored data is preferably deleted to prevent illegal leakage of the shared content data.

In other words, the system and method for preventing illegal leakage of shared content using n-grams according to an embodiment of the present invention examine both the content shared through a content sharing service such as a video conference and the data on the user's clipboard, and delete only the clipboard data that contains the protected shared content. This has the advantage of preventing illegal leakage of shared content without unnecessary loss of user data or disruption of ordinary work.

As described above, the present invention has been explained with reference to specific matters such as particular components and to limited embodiments and drawings, but these are provided only to aid a general understanding of the present invention. The present invention is not limited to the above embodiment, and those of ordinary skill in the art to which the present invention pertains can make various modifications and variations from this description.

Therefore, the spirit of the present invention should not be limited to the described embodiment; not only the claims set forth below but also everything equal or equivalent to those claims falls within the scope of the spirit of the present invention.

100: original data processing unit
110: type analysis unit
120: n-gram generation unit
130: original information generation unit
200: shared data processing unit
210: temporary storage monitoring unit
220: type analysis unit
230: n-gram generation unit
240: monitoring information generation unit
300: leakage prevention unit
310: determination unit
320: security processing unit

Claims

A system for preventing illegal leakage of shared content using n-grams, comprising:
an original data processing unit 100 that receives original content data provided through a content sharing service and generates original information by applying a preset analysis algorithm;
a shared data processing unit 200 that receives the shared content data provided from the original content data through the content sharing service, monitors the data temporarily stored while the shared content data is provided, collects the temporarily stored data according to the monitoring result, and generates monitoring information by applying a preset analysis algorithm; and
a leakage prevention unit 300 that compares and analyzes the original information and the monitoring information, determines whether the monitoring information is included in the original information, and thereby determines whether the shared content data has been illegally leaked,
wherein the shared data processing unit 200 comprises:
a temporary storage monitoring unit 210 that monitors whether temporary storage data is generated while the shared content data based on the original content data is provided through the content sharing service;
a type analysis unit 220 that classifies the temporary storage data monitored by the temporary storage monitoring unit 210 according to a preset content type and generates shared content meta information using the classified content type;
an n-gram generation unit 230 that analyzes the temporary storage data based on a pre-stored n-gram model and generates n-gram information of the temporary storage data; and
a monitoring information generation unit 240 that normalizes the shared content meta information and the n-gram information of the temporary storage data and then combines them to generate the monitoring information.

The system of claim 1, wherein the original data processing unit 100 further comprises:
a type analysis unit 110 that classifies the original content data according to a preset content type and generates original content meta information using the classified content type;
an n-gram generation unit 120 that analyzes the original content data based on a pre-stored n-gram model and generates n-gram information of the original content data; and
an original information generation unit 130 that normalizes the original content meta information and the n-gram information of the original content data and then combines them to generate the original information.

delete

The system of claim 1, wherein the leakage prevention unit 300 further comprises:
a determination unit 310 that compares the original information with the monitoring information and determines whether the monitoring information is included in the original information; and
a security processing unit 320 that, when the determination unit 310 determines that the monitoring information is included in the original information, determines that the shared content data has been illegally leaked and deletes the temporary storage data.

A method for preventing illegal leakage of shared content using n-grams, comprising:
an original information generation step (S100) of receiving, in an original data processing unit, original content data provided through a content sharing service, and generating original information through a preset analysis algorithm;
a temporary storage monitoring step (S200) of monitoring, in a shared data processing unit, whether temporary storage data is generated while the shared content data based on the original content data is provided through the content sharing service;
a monitoring information generation step (S300) of, when temporary storage data is generated while the shared content data is provided, as monitored in the temporary storage monitoring step (S200), collecting the generated temporary storage data and generating monitoring information through a preset analysis algorithm;
an illegality determination step (S400) of receiving, in a leakage prevention unit, the original information from the original information generation step (S100) and the monitoring information from the monitoring information generation step (S300), comparing and analyzing them to determine whether the monitoring information is included in the original information, and thereby determining whether the shared content data has been illegally leaked; and
a security step (S500) of determining that the shared content data has been illegally leaked when, according to the result of the illegality determination step (S400), the monitoring information is included in the original information, and deleting the temporary storage data,
wherein the monitoring information generation step (S300) comprises:
a meta information generation step (S310) of classifying the content type of the temporary storage data and generating shared content meta information according to the classified content type;
a shared n-gram generation step (S320) of analyzing the temporary storage data based on a pre-stored n-gram model to generate n-gram information of the temporary storage data; and
a monitoring information generation step (S330) of receiving the shared content meta information from the meta information generation step (S310) and the n-gram information of the temporary storage data from the shared n-gram generation step (S320), normalizing them, and combining them to generate the monitoring information.

6. The method of claim 5, wherein the original information generation step (S100) further comprises:
a meta information generation step (S110) of classifying the content type of the original content data and generating original content meta information according to the classified content type;
an original n-gram generation step (S120) of analyzing the original content data based on a pre-stored n-gram model to generate n-gram information of the original content data; and
an original information generation step (S130) of receiving the original content meta information from the meta information generation step (S110) and the n-gram information of the original content data from the original n-gram generation step (S120), normalizing them, and combining them to generate the original information.

delete