KR20230109038A

KR20230109038A - Method of automatically detecting harmful media, and System there-for

Info

Publication number: KR20230109038A
Application number: KR1020220004938A
Authority: KR
Inventors: 이소민; 유현; 김종민; 김태룡; 강성민; 허원무; 전유민; 조서연; 김다솜; 김예령
Original assignee: 이소민
Priority date: 2022-01-12
Filing date: 2022-01-12
Publication date: 2023-07-19

Abstract

The automatic harmful media detection system of the present invention includes: a reporting web server that provides a report page to a user, receives a report of a victim media file or a URL from the user through the report page, extracts a media link from the URL and downloads the media file when the report target is a URL, extracts video identification data (VID) from the downloaded media file, and extracts the VID directly from the media file when the report target is a media file; an API server that receives the VID from the reporting web server, stores it in an API database, transmits the VID to the Korea Communications Standards Commission server, and receives the report result from that server and forwards it to the reporting web server, so that the user can check the report result on the result page of the reporting web server; a media similarity comparator that extracts a VID from media collected as evidence by an investigator, compares the extracted VID with the media VIDs stored in the database of the API server to calculate the similarity between the two media, and groups highly similar media into similar media groups; and a harmful media crawler that automatically collects harmful media, extracts VIDs from the collected harmful media, and stores them in the database of the API server.

Description

Method and system for automatically detecting harmful media {Method of automatically detecting harmful media, and System there-for}

The present invention relates to a service method and system for automatically detecting victim media or harmful media resulting from digital sex crimes, reporting it, and requesting its deletion.

The distribution of illegally filmed footage and sex crime videos, which accounts for most digital sex crimes, plays a major role in prolonging the harm to victims. There are currently two main ways to have such videos deleted: through government-supported programs, or through "digital undertaker" companies (private content removal services). Both, however, require the victim to disclose their identity when reporting, and in both cases the organization ends up holding the victim's video.

In addition, because digital undertaker companies carry a risk of data leakage and of becoming involved in crime, the psychological barrier to reporting is high for victims.

Moreover, because a video is a digital file, it spreads easily through pornography distribution sites and messenger applications. To confirm the harm, a victim must check numerous pornography distribution sites one by one and report each instance separately, which exposes the victim to the harm again and again. A better approach is therefore needed.

Digital files carry different artifacts depending on the recording device and the means of transmission. In particular, each messenger platform on which digital sex crimes occur or victim videos are distributed leaves its own artifacts, so analyzing them could make it possible to trace the distribution path. Little research has been done in this area, however.

When searching a digital sex crime suspect's PC for victim media files, investigators currently use a hash-based "illegal footage tracking system". It identifies files by comparing a hash set of known obscene videos against the media files collected as evidence. However, most domestic pornography sites alter the videos, for example by inserting advertisements, so the altered copies cannot be detected by hash-based matching. In addition, there are few safeguards to protect investigators from psychological distress while they review the media.

Thus, conventional approaches judge a video to be the same harmful video only when the comparison target differs little from the original. For a resized video (scaled down or up) or a video with an inserted advertisement, they either fail to make a determination or conclude that it is a different video.

In practice, at the National Police Agency, when the same harmful video has been resized (reduced) or has had noise or advertisements inserted, the existing system cannot identify it or identifies it poorly, so investigators judge it directly by eye. Visual inspection is inefficient and may cause secondary harm.

The technical problem addressed by the present invention is to provide a service method and system that can automatically detect victim media resulting from digital sex crimes, with improved discrimination even when the comparison target differs from the original.

According to an embodiment of the present invention, there is provided an automatic harmful media detection system including: a reporting web server that provides a report page to a user, receives a report of a victim media file or a URL from the user through the report page, extracts a media link from the URL and downloads the media file when the report target is a URL, extracts video identification data (VID) from the downloaded media file, and extracts the VID directly from the media file when the report target is a media file; an API server that receives the VID from the reporting web server, stores it in an API database, transmits the VID to the Korea Communications Standards Commission server, and receives the report result from that server and forwards it to the reporting web server, so that the user can check the report result on the result page of the reporting web server; a media similarity comparator that extracts a VID from media collected as evidence by an investigator, compares the extracted VID with the media VIDs stored in the database of the API server to calculate the similarity between the two media, and groups highly similar media into similar media groups; and a harmful media crawler that automatically collects harmful media, extracts VIDs from the collected harmful media, and stores them in the database of the API server.

According to an embodiment of the present invention, there is provided an automatic harmful media detection method including: providing a report page to a user and receiving a report of a victim media file or a URL from the user through the report page; when the report target is a URL, extracting a media link from the URL, downloading the media file, and extracting video identification data (hereinafter, VID) from the downloaded media file, or, when the report target is a media file, extracting the VID from the media file, and then deleting the media file; receiving the VID from the reporting web server, storing it in a database, transmitting the VID to a reporting agency server, and receiving the report result from the reporting agency server and forwarding it to the reporting web server, so that the user can check the report result on the result page of the reporting web server; extracting a VID from media collected as evidence and comparing it with the media VIDs stored in the database to calculate the similarity between the two media; deleting the collected evidence media; grouping highly similar media into similar media groups using the calculated similarity; automatically collecting harmful media, extracting VIDs from the collected harmful media, and storing them in the database; and deleting the collected harmful media.

According to the present invention, the VID is extracted and stored instead of the victim's original media (video), so the risk of leakage and of retention of the original is low, and the victim's personal information can be protected accordingly.

In addition, when the similarity between media is calculated, it is evaluated from multiple angles through a multi-faceted similarity check over the entire media (sound, image, aspect ratio, and so on). As a result, no separate analysis by the victim or the investigator is needed, and only media at or above the reporting threshold similarity (for example, 90%) is reported automatically.

In addition, because the investigator does not have to watch the media directly, psychological trauma and secondary harm can be prevented.

Besides the investigator, the reporter can also check the status in real time, and altered videos (resized or with inserted advertisements) can also be compared for similarity. The time needed to screen for identical videos is reduced, and the accuracy of similarity determination is improved. Because the original video is not needed after VID extraction, only a minimal database is required.

FIG. 1 is a diagram illustrating an automatic harmful media detection system according to an embodiment of the present invention.
FIG. 2 is a flowchart illustrating the operation of a reporting web server according to an embodiment of the present invention.
FIG. 3 is a flowchart illustrating the operation of a media similarity comparator according to an embodiment of the present invention.
FIG. 4 is a flowchart illustrating the operation of a harmful media crawler according to an embodiment of the present invention.

The specific structural and functional descriptions of the embodiments disclosed in this specification are given only to illustrate embodiments according to the concept of the present invention. Such embodiments may be implemented in various forms and are not limited to the embodiments described herein.

Because embodiments according to the concept of the present invention can be modified in various ways and can take various forms, the embodiments are illustrated in the drawings and described in detail in this specification. This is not intended to limit the embodiments to the specific forms disclosed; the invention covers all modifications, equivalents, and substitutes that fall within its spirit and technical scope.

Preferred embodiments of the present invention are described in detail below with reference to the drawings. The terms and words used in this specification and in the claims should not be construed as limited to their ordinary or dictionary meanings, but should be interpreted with the meanings and concepts consistent with the technical idea of the present invention.

FIG. 1 is a diagram illustrating an automatic harmful media detection system according to an embodiment of the present invention. "Harmful media" mainly refers to victim media from digital sex crimes, but the term may also cover media such as videos and photos that are related to crime or harmful to the general public.

The following description uses, as its main example, a system and method that automatically detects and reports victim media from digital sex crimes, but the invention is not limited to such media.

Referring to FIG. 1, the automatic harmful media detection system includes a reporting web server 10, a media similarity comparator 20, an API server 30, and a harmful media crawler 40.

The reporting web server 10 provides a website through which victims of digital sex crimes can report the harm and have the report received. It is configured so that a user can request tracking of digital sex crime media (videos, photos, and so on) and file a report simply by entering a URL or attaching a media file.

When a digital sex crime report is received, the reporting web server 10 tracks links to sites where videos similar to the reported material have been uploaded, so that they are reported automatically to a reporting agency 50 (for example, the Korea Communications Standards Commission).

FIG. 2 is a flowchart illustrating the operation of the reporting web server 10 according to an embodiment of the present invention.

When a user (for example, a victim of a digital sex crime) finds a victim media file or URL, the user connects to the reporting site provided by the reporting web server 10 (S120), opens the report page, and files a report by entering the report target and related details (S130). The reporting web server 10 determines whether the report target is a URL (S140). If it is a URL, the server extracts a media link from the URL (S150), downloads the media file using the extracted link (S160), and extracts video identification data (VID) from the downloaded file (S170). If the report target is a media file rather than a URL, the server skips these steps and extracts the VID directly from the media file (S170). The media file is deleted once the VID has been extracted.
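The branch between URL reports and file reports, and the deletion of the media after VID extraction, can be sketched as follows. This is a minimal illustration rather than the patented implementation: the function name `handle_report` and the three callables passed to it (`extract_media_link`, `download`, `extract_vid`) are hypothetical placeholders for steps S150, S160, and S170.

```python
import os
import tempfile
from typing import Callable


def handle_report(target: str, is_url: bool,
                  extract_media_link: Callable[[str], str],
                  download: Callable[[str], str],
                  extract_vid: Callable[[str], dict]) -> dict:
    """Return the VID for a report target and delete the media file afterwards."""
    if is_url:                                    # S140: is the target a URL?
        media_link = extract_media_link(target)   # S150: find the media link
        media_path = download(media_link)         # S160: download to a local path
    else:
        media_path = target                       # target is an uploaded file path
    try:
        return extract_vid(media_path)            # S170: extract the VID
    finally:
        # The media file is removed whether or not extraction succeeded,
        # so only the VID ever leaves this function.
        if os.path.exists(media_path):
            os.remove(media_path)


# Usage with stand-in callables (hypothetical; a real extractor would compute features).
fd, sample_path = tempfile.mkstemp(suffix=".mp4")
os.close(fd)
vid = handle_report(sample_path, False, lambda u: u, lambda link: link,
                    lambda path: {"MediaType": "video"})
```

Placing the deletion in a `finally` clause is one way to make sure the uploaded or downloaded file does not linger when extraction raises an error.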

VID stands for Video Identification Data. It is a data storage type intended to avoid hash-based video comparison, that is, the exact matching of file hashes described above.

In one embodiment, the VID has the structure shown in Table 1 below, but is not limited thereto.

Referring to Table 1, the VID structure may include: an identifier (ID) serving as the primary key (PRIMARY KEY) of the database; a media type (MediaType) indicating whether the media is an image, music, or a video; the width-to-height ratio of the image or video (Ratio); the playback time of the music or video (PlayTime); the latitude (Latitude), longitude (Longitude), and file creation date (CreatedTime) taken from the file's EXIF (exchangeable image file format) values; an array of feature data for the sound (SoundData); an array of frame feature data for the image or video (FrameData); the ID of the similar sound group (SoundGroupID); the ID of the similar frame group (FrameGroupID); the date the media was first reported or crawled (FirstCatchedData); and the date it was most recently reported or crawled (LastCatchedData).
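As an illustration of the fields listed above, the VID record could be modeled as a simple data class. The field names follow the text; the Python types, default values, and units are assumptions, since the specification does not state them here.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import List, Optional


@dataclass
class VID:
    """Sketch of a VID record; types and defaults are assumed, names follow the text."""
    ID: int                                   # primary key in the database
    MediaType: str                            # e.g. "image", "music", "video"
    Ratio: Optional[float] = None             # width / height
    PlayTime: Optional[float] = None          # playback time (unit assumed: seconds)
    Latitude: Optional[float] = None          # from EXIF
    Longitude: Optional[float] = None         # from EXIF
    CreatedTime: Optional[datetime] = None    # file creation date from EXIF
    SoundData: List[int] = field(default_factory=list)   # sound feature array
    FrameData: List[int] = field(default_factory=list)   # per-frame feature array
    SoundGroupID: Optional[int] = None        # similar sound group
    FrameGroupID: Optional[int] = None        # similar frame group
    FirstCatchedData: Optional[datetime] = None  # first reported/crawled date
    LastCatchedData: Optional[datetime] = None   # latest reported/crawled date


example = VID(ID=1, MediaType="video", Ratio=16 / 9, FrameData=[0x0F0F, 0x3C3C])
```

Note that no field holds pixel or audio samples; the record carries only derived features and metadata.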

As described above, the VID structure does not store the original video and is designed so that the original video cannot be inferred from the VID. Because the similarity between media can still be compared through their VIDs, the structure is well suited to comparing digital sex crime videos, which carry a risk of further harm and redistribution.

In one embodiment, representative frames are extracted from the media (video or photo) by a scene segmentation algorithm, a Dhash is computed for each representative frame once only, and the resulting Dhash values are stored in the FrameData of the VID.

The extracted Dhash values form a series of hashes from which the original video cannot be derived, but which still allow scene-by-scene comparison; similar videos yield similar values.
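The property that similar frames yield similar hash values can be illustrated with a small sketch. It assumes that "Dhash" refers to the widely used difference hash, in which each bit records whether a pixel is brighter than its right-hand neighbor; the specification does not define the algorithm. The input here is an already downscaled grayscale grid, so the resizing step that a full implementation would perform is omitted, and all function names are illustrative.

```python
from typing import Sequence


def dhash(gray: Sequence[Sequence[int]]) -> int:
    """Difference hash of a small grayscale grid.

    Each row of n+1 pixels contributes n bits: 1 if a pixel is brighter
    than the pixel to its right, otherwise 0.
    """
    bits = 0
    for row in gray:
        for left, right in zip(row, row[1:]):
            bits = (bits << 1) | (1 if left > right else 0)
    return bits


def hamming_distance(a: int, b: int) -> int:
    """Number of bit positions in which two hashes differ."""
    return bin(a ^ b).count("1")


def frame_similarity(a: int, b: int, n_bits: int = 64) -> float:
    """Similarity in [0, 1]: 1.0 means identical hashes."""
    return 1.0 - hamming_distance(a, b) / n_bits


frame = [[3, 2, 1], [1, 2, 3]]
brighter = [[13, 12, 11], [11, 12, 13]]   # same scene, uniformly brighter
```

Because only the direction of each brightness change is kept, `dhash(frame)` and `dhash(brighter)` are equal, and the pixel values themselves cannot be read back from the hash.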

Conventional video comparison services store the original image or video files for frame comparison and extract data from them at every comparison. In this embodiment, by contrast, Dhash values are stored in the FrameData of the VID, so image similarity can be determined compactly and accurately, and a precise similarity judgment is possible in a short time.

The reporting web server 10 transmits the extracted VID data to the reporting agency 50 (for example, the Korea Communications Standards Commission) via the database 301 of the API server, so that the report is filed with the agency automatically. The reporting agency 50 transmits the report result to the reporting web server 10 via the database 301 of the API server. After reporting, the user can therefore open the result page of the reporting web server 10 (S180) and check the report result (S190).

Existing services require personal information to be disclosed when harm is reported, and the organization has to keep the video. According to an embodiment of the present invention, by contrast, anonymous reporting is supported; only the VID, which is video pattern data from which the original cannot be recovered, is extracted from the reported video, and the reported video is then automatically discarded (deleted). In addition, the victim does not have to search pornography sites for uploaded copies and report them one by one, because the harmful media crawler 40 searches for them and files the linked reports automatically. The crawler 40 continuously extracts VIDs from harmful media on pornography distribution sites; when a report is received, the reported VID is compared with the media VIDs collected from those sites so far, and matches are reported automatically to the reporting agency 50.

FIG. 3 is a flowchart illustrating the operation of the media similarity comparator according to an embodiment of the present invention.

The media similarity comparator 20 may be implemented as a computer used by an investigator or police officer together with a media similarity comparison program installed and run on that computer.

The media similarity comparison program is intended for investigators. It assists an investigation by analyzing the media the investigator has collected as evidence, extracting a timeline, and visualizing the results.

The operation flow of the media similarity comparator is described below with reference to FIG. 3.

The investigator collects victim media as evidence (S210) and runs the media similarity comparison program (S215). If the collected media belongs to a new case, the program creates a new case (S220), extracts the VID of the collected media (S225), and compares the extracted VID with the VIDs stored in the API server database (S230). The extracted VID and the comparison result data may be stored in the database 301 of the API server.

Once the VID has been extracted from the collected media, the original media (that is, the collected evidence media) is deleted (discarded).

The similarity between two media can be calculated by comparing the extracted VID with a VID stored in the database.
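The specification states that similarity is assessed across sound, image, and aspect ratio but does not give a formula. The sketch below shows one possible way to combine them, as a weighted sum of a frame score, a sound score, and a ratio score. The weights, the best-match scoring rule, and all function names are assumptions made for illustration only.

```python
from typing import Sequence, Tuple


def hash_list_similarity(a: Sequence[int], b: Sequence[int], n_bits: int = 64) -> float:
    """For each hash in `a`, take its best bitwise match in `b`, then average."""
    if not a or not b:
        return 0.0
    total = 0.0
    for ha in a:
        best = min(bin(ha ^ hb).count("1") for hb in b)
        total += 1.0 - best / n_bits
    return total / len(a)


def ratio_similarity(r1: float, r2: float) -> float:
    """1.0 for equal aspect ratios, approaching 0 as they diverge."""
    if r1 <= 0 or r2 <= 0:
        return 0.0
    return min(r1, r2) / max(r1, r2)


def media_similarity(frames_a: Sequence[int], frames_b: Sequence[int],
                     sound_a: Sequence[int], sound_b: Sequence[int],
                     ratio_a: float, ratio_b: float,
                     weights: Tuple[float, float, float] = (0.6, 0.3, 0.1)) -> float:
    """Weighted combination of frame, sound, and ratio similarity (weights assumed)."""
    w_frame, w_sound, w_ratio = weights
    return (w_frame * hash_list_similarity(frames_a, frames_b)
            + w_sound * hash_list_similarity(sound_a, sound_b)
            + w_ratio * ratio_similarity(ratio_a, ratio_b))


# A stored copy with one extra (advertisement) scene still matches the original fully,
# because extra hashes on the stored side do not lower the best-match average.
original_frames = [0xAAAA, 0x5555]
with_ad_frames = [0xFFFF, 0xAAAA, 0x5555]
score = hash_list_similarity(original_frames, with_ad_frames)
```

Scoring each query hash against its best match, rather than comparing position by position, is one way to tolerate inserted scenes; a production system might also need to handle reordered or trimmed content.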

Highly similar media can also be grouped to create similar media groups (S235). For example, media whose similarity is at or above a reference value (for example, 80%) are placed in the same similar media group, and a result file of a specific type (for example, a .djj file) is generated from this grouping (S240). Once the result file has been generated, it can be opened and viewed (S250).
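Grouping by a reference similarity can be sketched with a union-find structure: any two media whose similarity reaches the threshold are merged into the same group. Whether the actual system groups transitively in this way is not stated, so treat this as one plausible reading; `group_similar` and its parameters are illustrative names.

```python
from typing import Callable, Dict, Hashable, List, Sequence


def group_similar(ids: Sequence[Hashable],
                  similarity: Callable[[Hashable, Hashable], float],
                  threshold: float = 0.8) -> List[List[Hashable]]:
    """Group media IDs so that pairs with similarity >= threshold share a group."""
    parent: Dict[Hashable, Hashable] = {i: i for i in ids}

    def find(x: Hashable) -> Hashable:
        # Follow parent links to the group representative, shortening the path.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for index, a in enumerate(ids):
        for b in ids[index + 1:]:
            if similarity(a, b) >= threshold:
                parent[find(a)] = find(b)     # merge the two groups

    groups: Dict[Hashable, List[Hashable]] = {}
    for i in ids:
        groups.setdefault(find(i), []).append(i)
    return list(groups.values())


# Example scores (made up): a~b and b~c exceed 80%, d matches nothing.
scores = {("a", "b"): 0.90, ("b", "c"): 0.85}


def lookup(x: str, y: str) -> float:
    return scores.get((x, y), scores.get((y, x), 0.0))


groups = group_similar(["a", "b", "c", "d"], lookup)
```

With these scores, "a" and "c" end up in the same group through "b" even though they were never directly matched, which is the practical effect of transitive grouping.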

If the collected victim media does not belong to a new case, the investigator loads the existing case (S245) and views its result file (S250).

The media similarity comparator 20 assists the investigation by providing the investigator with a case creation information screen (S255), a similarity comparison chart screen (S260), and a similarity group visualization screen (S270).

Through the case creation information screen, the investigator can view information about each case, such as the case creation date, the time taken for analysis, and the incident details. Through the similarity group visualization screen (S270), the investigator can check the similarity between two or more media and can also edit nodes, extract files, and check the timeline.

Because the API server 30 holds no information on criminal sites at the outset, it stores a seed URL list for the initial collection of victim media VIDs. The seed URL list includes the VIDs of media extracted through the harmful media crawler 40. The crawl targets for the seed URL list include at least one of domestic and foreign pornography sites, illegal sites, Hidden Wiki sites, dark web advertising sites, and dark web search sites.

The harmful media crawler 40 uses a search engine robot to automatically collect victim media from the many servers and/or websites on the network.

The harmful media crawler 40 crawls the seed URL list of the API server 30, collects victim media, and extracts VIDs. The VIDs extracted by the harmful media crawler 40 are transmitted to the API server 30. After a VID has been extracted from a piece of victim media, the media is automatically deleted (discarded).

The harmful media crawler 40 initially crawls the seed URL list. As crawling proceeds, it collects the internal and external links found on the target websites and continuously updates the list of URLs to crawl.
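The continuous update of the crawl list can be sketched with the Python standard library: links are parsed from a fetched page, resolved against the page URL, and appended to the queue if they have not been seen before. The names `LinkCollector` and `update_frontier` are illustrative, the page is passed in as a string rather than fetched, and the example domains are placeholders.

```python
from collections import deque
from html.parser import HTMLParser
from typing import Deque, List, Set
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collect absolute URLs from the href attributes of <a> tags."""

    def __init__(self, base_url: str) -> None:
        super().__init__()
        self.base_url = base_url
        self.links: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    # Resolve relative (internal) links against the page URL.
                    self.links.append(urljoin(self.base_url, value))


def update_frontier(frontier: Deque[str], seen: Set[str],
                    base_url: str, html: str) -> int:
    """Append unseen links from `html` to the crawl queue; return how many were added."""
    parser = LinkCollector(base_url)
    parser.feed(html)
    added = 0
    for link in parser.links:
        if link not in seen:
            seen.add(link)
            frontier.append(link)
            added += 1
    return added


frontier = deque(["https://seed.example/"])
seen = set(frontier)
page = ('<a href="/a">internal</a>'
        '<a href="https://other.example/b">external</a>'
        '<a href="/a">duplicate</a>')
added = update_frontier(frontier, seen, "https://seed.example/", page)
```

Keeping a separate `seen` set prevents the same URL from being queued repeatedly as sites link back to one another.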

FIG. 4 is a flowchart illustrating the operation of the harmful media crawler 40 according to an embodiment of the present invention.

Referring to FIG. 4, the harmful media crawler 40 loads the harmful media URLs and keywords (S310) and starts crawling. When the harmful media crawler 40 first runs, the harmful media URL list may be the seed URL list of the API server 30.

The crawler determines whether any URLs remain to be crawled (S320). If so, it determines whether the current URL points to a media file (S325); if it does, the crawler downloads the media file and extracts the VID (S330, S335). The extracted VID is transmitted to the API server (S340).

If the URL is found in step S325 not to be a media file, the crawler determines whether it contains a harmful keyword (S345). If it does, the crawler checks whether the URL matches a registered regular expression (S350), and a link that matches is registered in the crawl target list immediately (S355).

For a link that does not match the regular expression, the site depth index is incremented by 1 (S360), and if the site depth index is at or above a specific value (for example, 5), all links within the site are registered in the crawl target list (S370).

If no harmful keyword is found in step S345, the harmful site index is decremented by 1 (S375). If the harmful site index is below a specific value (for example, 5), the process returns to step S320 and crawling continues with the next URL.

If the harmful site index is at or above the specific value (for example, 5), the crawler checks whether the URL matches a registered regular expression (S350). A link that matches is registered in the crawl target list immediately (S355). For a link that does not match, the site depth index is incremented by 1 (S360), and if the site depth index is at or above a specific value (for example, 5), all links within the site are registered in the crawl target list (S370).

If there is no URL to be crawled in step S320, the crawler stays idle for a specific period of time and then returns to step S320.
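The overall control flow from S320 through S375 can be sketched as a single loop. This is a minimal illustration under assumptions, not the patented implementation: all function and parameter names are hypothetical, the four callbacks stand in for the media check, VID extraction and transmission, keyword check, and link registration described above, the harmful site index is modeled as one counter whose starting value is supplied by the caller (the text does not give its initial value), and `max_idle_rounds` exists only so that the sketch terminates, whereas the described crawler keeps returning to S320.

```python
import time
from collections import deque


def crawl(queue: deque, is_media, send_vid, has_harmful_keyword, register_links,
          harmful_index, index_threshold=5, idle_seconds=1.0, max_idle_rounds=1):
    """Run the S320-S375 loop and return the final harmful site index."""
    idle_rounds = 0
    while idle_rounds < max_idle_rounds:
        if not queue:                        # S320: no URL left to crawl
            time.sleep(idle_seconds)         # stay idle, then check again
            idle_rounds += 1
            continue
        idle_rounds = 0
        url = queue.popleft()
        if is_media(url):                    # S325: the URL is a media file
            send_vid(url)                    # S330-S340: download, extract VID, send
        elif has_harmful_keyword(url):       # S345: harmful keyword present
            register_links(url)              # S350-S370
        else:
            harmful_index -= 1               # S375
            if harmful_index >= index_threshold:
                register_links(url)          # still treated as a harmful site
            # otherwise fall through and take the next URL (back to S320)
    return harmful_index
```

Passing the individual steps in as callables keeps the sketch independent of any particular download, extraction, or parsing library.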

As described above, according to an embodiment of the present invention, the VID is extracted and stored instead of the victim's original media (video), so the risk of the media being leaked or retained is low. For this reason, the victim's personal information can be protected.

In addition, when the similarity between media is calculated, it is evaluated from multiple angles through a multi-faceted similarity check over the entire media (audio + image + aspect ratio, etc.). Accordingly, no separate analysis by the victim or an investigator is required, and only media whose similarity is at or above the reporting threshold (e.g., 90%) can be reported automatically.
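As a rough illustration of a multi-faceted check with a reporting threshold, the sketch below compares two VIDs field by field. The field names follow the VID components listed in the claims (MediaType, Ratio, PlayTime, SoundData, FrameData); everything else is an assumption made for this example, since the text does not specify how the individual similarities are computed or combined. Here cosine similarity is used for the feature arrays, a min/max ratio for the scalar fields, and an unweighted average for the combination.

```python
import math
from dataclasses import dataclass
from typing import List

REPORT_THRESHOLD = 0.90  # the "reporting threshold (e.g., 90%)" from the text


@dataclass
class VID:
    media_type: str          # MediaType: "image", "music" or "video"
    ratio: float             # Ratio: width / height
    play_time: float         # PlayTime, in seconds
    sound_data: List[float]  # SoundData feature array
    frame_data: List[float]  # FrameData feature array


def cosine(a, b):
    """Cosine similarity of two equal-length vectors; 0.0 if either has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def closeness(x, y):
    """1.0 when two non-negative scalars are equal, approaching 0.0 as they diverge."""
    return min(x, y) / max(x, y) if max(x, y) > 0 else 1.0


def similarity(a: VID, b: VID) -> float:
    """Unweighted average of per-field similarities (an assumed combination rule)."""
    if a.media_type != b.media_type:
        return 0.0
    parts = [
        closeness(a.ratio, b.ratio),
        closeness(a.play_time, b.play_time),
        cosine(a.sound_data, b.sound_data),
        cosine(a.frame_data, b.frame_data),
    ]
    return sum(parts) / len(parts)


def should_report(a: VID, b: VID, threshold: float = REPORT_THRESHOLD) -> bool:
    """True when the combined similarity reaches the reporting threshold."""
    return similarity(a, b) >= threshold
```

A real system would likely weight the components differently, for example to tolerate resized copies, but the thresholding step would look the same.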

Unlike conventional techniques, the investigator does not have to watch the media directly, so psychological trauma and secondary victimization can be prevented.

In addition, after the VID is extracted from the media, the media file is automatically deleted or discarded, and the media itself is not stored in the database or anywhere else in the system. It is also impossible to restore the media from the VID. Accordingly, there is no risk of harmful media leaking, and investigators, victims, reporters, and others are less likely to be exposed to harmful media.
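The extract-then-delete behavior can be sketched as below. This is a minimal illustration, assuming the media has been downloaded to a local file; `extract_vid_and_discard` and the `extract_vid` callback are hypothetical names, and the actual VID extraction is not shown.

```python
import os


def extract_vid_and_discard(media_path, extract_vid):
    """Return the VID for media_path and delete the file afterwards.

    extract_vid is a caller-supplied function (hypothetical here) that maps a
    file path to a VID. The file is removed even if extraction raises, so the
    media itself never outlives this call.
    """
    try:
        return extract_vid(media_path)
    finally:
        if os.path.exists(media_path):
            os.remove(media_path)
```

Placing the deletion in a `finally` clause means a failed extraction does not leave the media file behind on disk.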

Also, unlike conventional techniques, the present invention allows the reporter as well as the investigator to check the status in real time, and it can also compare the similarity of altered videos (resized, or with advertisements inserted). The time needed to identify identical videos is reduced, and the accuracy of similarity determination is improved. Because the original video is not needed after VID extraction, only a minimal database is required.

Although the present invention has been described with reference to the embodiments shown in the drawings, these are merely exemplary, and those of ordinary skill in the art will understand that various modifications and other equivalent embodiments are possible. Therefore, the true scope of technical protection of the present invention should be determined by the technical idea of the appended claims.

10: Reporting web server
20: Media similarity comparator
30: API server
40: Harmful media crawler

Claims

An automatic harmful media detection system, comprising:
a reporting web server that provides a report page to a user, receives a report of a victim media file or URL from the user through the report page, extracts a media link from the URL, downloads the media file, and extracts video identification data (VID) from the downloaded media file if the report target is a URL, and extracts the VID from the media file if the report target is a media file;
an API server that receives the VID from the reporting web server, stores it in an API database, transmits the VID to the Korea Communications Standards Commission server, receives a report result from the Korea Communications Standards Commission server, and transmits the result to the reporting web server, so that the user can access a result page of the reporting web server and check the report result;
a media similarity comparator that extracts a VID from media collected as evidence by an investigator, compares the extracted VID with the stored media VIDs in the database of the API server, calculates the similarity between the two media, and groups highly similar media to generate a similar media group; and
a harmful media crawler that automatically collects harmful media, extracts VIDs from the collected harmful media, and stores them in the database of the API server.

The automatic harmful media detection system according to claim 1,
wherein a storage unit stores a list of seed URLs for initial collection of harmful media, and
the harmful media crawler initially crawls the seed URLs and transmits the VIDs of the extracted harmful media back to the storage unit, thereby updating it.

The automatic harmful media detection system according to claim 1, wherein the VID comprises:
a media type (MediaType) indicating whether the harmful media is an image, music, or video;
ratio information (Ratio) indicating the horizontal/vertical ratio of an image or video;
a play time (PlayTime) of music or video;
a feature data array for a sound source (SoundData); and
a frame feature data array (FrameData) for an image or video.

A method for automatically detecting harmful media, comprising:
providing a report page to a user and receiving, through the report page, a report of a victim media file or URL from the user;
if the report target received from the user is a URL, extracting a media link from the URL, downloading the media file, and extracting video identification data (hereinafter referred to as VID) from the downloaded media file, and if the report target is a media file, extracting a VID from the media file, and then deleting the media file;
receiving the VID from the reporting web server, storing it in a database, transmitting the VID to a reporting agency server, receiving a report result from the reporting agency server, and transmitting the result to the reporting web server, so that the user can access a result page of the reporting web server and check the report result;
extracting a VID from media collected as evidence, comparing the extracted VID with the media VIDs stored in the database, and calculating a similarity between the two media;
deleting the media collected as evidence;
generating a similar media group by grouping media having a high similarity using the calculated similarity;
automatically collecting harmful media, extracting VIDs from the collected harmful media, and storing them in the database; and
deleting the collected harmful media.

The method according to claim 4, further comprising:
storing a list of seed URLs for initial collection of harmful media; and
initially crawling the seed URL list and, as the crawling progresses, collecting internal and external links in the crawled websites to update the list of URLs to crawl.

The method according to claim 4,
further comprising automatically reporting the collected media by transmitting its VID to the reporting agency server when the calculated similarity is equal to or greater than a reporting threshold.