KR102574353B1

KR102574353B1 - Device Resource-based Adaptive Frame Extraction and Streaming Control System and Method for Blocking Obscene Videos in Mobile Devices

Info

Publication number: KR102574353B1
Application number: KR1020210154308A
Authority: KR
Inventors: 정광수; 강정호; 김민수
Original assignee: 광운대학교 산학협력단
Priority date: 2021-11-10
Filing date: 2021-11-10
Publication date: 2023-09-01
Also published as: KR20230068207A

Abstract

Disclosed are a terminal-resource-based adaptive frame extraction and streaming control system and method for blocking harmful videos on a mobile terminal. The system comprises a server and a user terminal. The user terminal includes an adaptive frame extraction module that receives video frames and extracts frames adaptively, a deep-learning-based harmfulness analysis engine, and a streaming control module that applies mosaic processing to harmful videos. The terminal resources relevant to blocking harmful videos consist of factors related to the terminal's processing capability and the response results for harmful videos, the latter being the number of extracted frames and the time spent on harmfulness analysis. Frames of the video segments received from the server are extracted in real time and passed to the harmfulness analysis engine, which forwards the harmfulness analysis result for each extracted frame to a harmfulness change determination module that determines how the harmfulness of the video being played changes. The harmfulness determination result is used, together with changes in the terminal resources needed for harmfulness analysis, to adjust the frame extraction method. The terminal resource characteristic that serves as the criterion for adjusting the frame extraction method is the remaining battery state of the user terminal, which is classified, according to the terminal's specifications, into a state judged sufficient (40%), a state that is the criterion for power saving mode (20%), and a state that is the criterion for ultra power saving mode (10%). The system operates on HTTP adaptive streaming, which adaptively determines the bit rate of video segments.

Description

Device Resource-based Adaptive Frame Extraction and Streaming Control System and Method for Blocking Obscene Videos in Mobile Devices

The present invention relates to a user-terminal-resource-based adaptive frame extraction and streaming control system and method that, in order to block harmful videos on a mobile terminal, extracts frames adaptively in consideration of the resources of the terminal in use (the battery state of the user terminal) and applies mosaic processing to harmful videos.

As computers and smart devices have become widespread, the wired and wireless Internet has been flooded with harmful sites and harmful videos that can be reached from PCs and smart devices. Those who need protection, such as children and adolescents, can access harmful sites, download adult or illegal videos, or are otherwise exposed to harmful videos.

To address this, technology is needed that blocks harmful sites and harmful videos for young children and adolescents.

The terminal resources relevant to blocking harmful videos are divided into factors related to processing capability and the response results for harmful videos. The processing-capability factors are CPU utilization and battery consumption; the response results are the number of extracted frames and the time spent on harmfulness analysis. The user terminal contains a deep-learning-based harmfulness analysis engine. Frames of the video segments received from the server are extracted in real time and passed to the harmfulness analysis engine. The engine forwards the harmfulness analysis result for each extracted frame to the harmfulness change determination module, which determines how the harmfulness of the video being played changes. The harmfulness determination result is used, together with changes in the terminal resources needed for harmfulness analysis, to adjust the frame extraction method. The terminal resource characteristic that serves as the criterion for adjusting the frame extraction method is the terminal's remaining battery. The remaining battery state is divided, according to the terminal's specifications, into a state judged sufficient (40%), a state that is the criterion for power saving mode (20%), and a state that is the criterion for ultra power saving mode (10%). When the remaining battery is 40% or more, an adaptive frame extraction algorithm based on harmfulness change and inter-frame similarity comparison is applied. When the remaining battery state calls for preparing to enter power saving mode, the terminal prepares for rapid battery drain by controlling the image resizing size and the SSIM threshold so that harmful video blocking places less load on the CPU. When the remaining battery is in the power saving or ultra power saving state, frame extraction interval control and similarity comparison are additionally applied to reduce the load on the terminal.
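The battery classification described above can be sketched as a small lookup. This is a minimal illustration, not the patent's implementation: the names `BatteryState` and `classify_battery` are hypothetical, and the handling of the exact boundary values (for example, whether exactly 20% already counts as power saving) and the mapping of the "prepare for power saving" state to the range between 20% and 40% are assumptions, since the text only names the 40%, 20%, and 10% reference points.

```python
from enum import Enum


class BatteryState(Enum):
    """Hypothetical labels for the battery states named in the text."""
    SUFFICIENT = "sufficient"                  # 40% or more
    PREPARE_POWER_SAVING = "prepare"           # assumed: below 40%, above 20%
    POWER_SAVING = "power_saving"              # assumed: 20% or less, above 10%
    ULTRA_POWER_SAVING = "ultra_power_saving"  # assumed: 10% or less


def classify_battery(percent: float) -> BatteryState:
    """Map a remaining-battery percentage to one of the four states."""
    if not 0 <= percent <= 100:
        raise ValueError("battery percentage must be in [0, 100]")
    if percent >= 40:
        return BatteryState.SUFFICIENT
    if percent > 20:
        return BatteryState.PREPARE_POWER_SAVING
    if percent > 10:
        return BatteryState.POWER_SAVING
    return BatteryState.ULTRA_POWER_SAVING
```

Because the text notes that thresholds may differ per terminal specification, a real system would likely take the three reference values as parameters rather than hard-coding them.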

As related prior art 1, Patent Registration No. 10-06877320000 discloses a "Method and apparatus for blocking harmful videos using content-based multi-modal feature values".

The method and apparatus for blocking harmful videos using content-based multi-modal feature values comprise the steps of: building a discrimination model consisting of a harmful video classification model and a non-harmful video classification model; separating video content into a video stream and an audio stream; extracting predetermined visual feature values from the separated video stream to detect shot boundaries, then extracting at least one key frame from each shot and generating shot and key frame information; performing predetermined pre-processing on the video stream and then extracting multi-modal feature values of the key frames based on the shot and key frame information; inputting the multi-modal feature values into the discrimination model to determine the harmfulness of each shot; and combining the shot-level determination results to determine the harmfulness of the video content.

By using a progressive blocking method that combines a keyframe-based determination engine with a frame-based determination engine, and by using multi-modal feature values, the method can be applied to blocking harmful videos on the Internet, controlling the distribution of harmful videos over P2P, and real-time monitoring of harmful information in digital broadcast streams.

As related prior art 2, Patent Registration No. 10-09071720000 discloses a "System and method for multi-stage blocking of harmful videos in a video distribution environment".

The multi-stage blocking system for harmful videos comprises:

a filtering manager that generates a summary image by merging images extracted from video content and detects harmful videos by analyzing the summary image; and

an evaluation analysis server that detects harmful videos by analyzing reaction information from consumers watching the video content and transmits information on the video content identified as harmful to the filtering manager.

With the multi-stage blocking system and method, harmful videos can be blocked efficiently throughout the production, distribution, and consumption of videos, and feedback makes it possible to detect harmful videos that were missed at an earlier stage. In addition, because the filtering is based on the content of the video, harmful videos can be blocked efficiently even among user-created content (UCC).

As related prior art 3, Patent Registration No. 10-21690730000 discloses a "System and method for blocking real-time streaming based on video harmfulness analysis results".

FIG. 1 shows the structure of a conventional real-time streaming blocking system based on video harmfulness analysis results.

The real-time streaming blocking system based on video harmfulness analysis results comprises:

a video server that, for HTTP adaptive streaming, stores video data encoded as segments of fixed playback length at several different qualities (bitrates) and provides real-time streaming video data; and

a user terminal on which an HTTP adaptive streaming client is installed and which uses a deep-learning-based harmfulness analysis engine to compare inter-frame similarity against harmful original images, calculates scene length information for each frame and a per-frame harmfulness probability, and decides whether to block streaming according to the final harmfulness grade, thereby blocking harmful videos,

wherein the user terminal, on which the HTTP adaptive streaming client is installed, downloads from the video server an MPD (Media Presentation Description) file describing the video to be requested, and requests video segments of a particular quality (bitrate) according to the measured network bandwidth and the state of the terminal, and

the deep-learning-based harmfulness analysis engine uses a CNN (Convolutional Neural Network) model, learns the characteristics of the input images for a given data set of video segments, compares inter-frame similarity against harmful original images to calculate scene length information for each frame and a per-frame harmfulness probability, analyzes the harmfulness of the video, and provides the result to the video streaming blocking control module in the form of a harmfulness probability for each video frame.
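The segment request step above, in which the client picks a quality according to measured bandwidth, can be illustrated with a simple throughput-based rule. This is a generic sketch of rate selection in HTTP adaptive streaming, not the selection logic of the cited patent: the function name `select_bitrate`, the `safety_factor` parameter, and the example bitrate ladder are all assumptions, and the terminal state that the text also mentions is not modelled here.

```python
def select_bitrate(available_bps, measured_bandwidth_bps, safety_factor=0.8):
    """Pick the highest advertised bitrate that fits within a bandwidth budget.

    available_bps: bitrates (bits per second) listed for the video, e.g. in the MPD.
    measured_bandwidth_bps: the client's current throughput estimate.
    safety_factor: hypothetical margin so the choice stays below the estimate.
    """
    if not available_bps:
        raise ValueError("at least one bitrate must be available")
    budget = measured_bandwidth_bps * safety_factor
    affordable = [b for b in sorted(available_bps) if b <= budget]
    # Fall back to the lowest quality when even that exceeds the budget.
    return affordable[-1] if affordable else min(available_bps)
```

For example, with a ladder of 500 kbit/s, 1 Mbit/s, and 2.5 Mbit/s and a 2 Mbit/s throughput estimate, the budget is 1.6 Mbit/s and the 1 Mbit/s representation is chosen.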

The user terminal comprises:

a deep-learning-based harmfulness analysis engine that extracts video frames from the streamed video segments, learns the characteristics of the input images of each video frame, compares the harmful original image with the comparison image (that is, compares inter-frame similarity against the harmful original image) to calculate scene length information for each frame and a per-frame harmfulness probability, analyzes the harmfulness of the video, and provides a harmfulness analysis result for each video frame;

a video frame extraction module that extracts video frames while excluding duplicate frames, in order to make the harmfulness analysis more efficient; and

a video streaming blocking control module that calculates a final harmfulness grade from the harmfulness analysis results and the scene length information, and blocks streaming of videos whose final grade is harmful.

For scene length analysis based on inter-frame similarity comparison, the deep-learning-based harmfulness analysis engine calculates the SSIM (Structural Similarity Index) value to compare the similarity between an original image containing a learned harmful scene and the comparison image of a frame.

The SSIM value is calculated by combining the mean luminance I(x,y) of the pixels of the two images, the contrast c(x,y), and the structure index s(x,y). The weights a, b, and c applied to these three terms are all set to 1. The SSIM value measures the similarity between the original image and the comparison image, and the calculated SSIM value is compared with a specific threshold to decide whether the two images are similar.
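For reference, the standard definition of SSIM is written out below. It is the commonly used textbook form and is not reproduced from the patent text. Here l(x,y) corresponds to the luminance term written I(x,y) above, the exponents α, β, γ correspond to the weights a, b, c, and the stabilizing constants C1, C2, C3 are part of the standard definition rather than something the source states.

```latex
\mathrm{SSIM}(x,y) = \bigl[l(x,y)\bigr]^{\alpha}\,\bigl[c(x,y)\bigr]^{\beta}\,\bigl[s(x,y)\bigr]^{\gamma},
\qquad
l(x,y) = \frac{2\mu_x\mu_y + C_1}{\mu_x^2 + \mu_y^2 + C_1},\quad
c(x,y) = \frac{2\sigma_x\sigma_y + C_2}{\sigma_x^2 + \sigma_y^2 + C_2},\quad
s(x,y) = \frac{\sigma_{xy} + C_3}{\sigma_x\sigma_y + C_3}

% With \alpha = \beta = \gamma = 1 and the usual choice C_3 = C_2 / 2:
\mathrm{SSIM}(x,y) = \frac{(2\mu_x\mu_y + C_1)\,(2\sigma_{xy} + C_2)}
                          {(\mu_x^2 + \mu_y^2 + C_1)\,(\sigma_x^2 + \sigma_y^2 + C_2)}
```

Here μ denotes the mean, σ² the variance, and σ_xy the covariance of the pixel values of the two images. Setting all three weights to 1, as the text describes, is what allows the three terms to collapse into the single fraction.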

The deep-learning-based harmfulness analysis engine classifies the video frames extracted from the video segment into scenes and uses the harmfulness analysis result for each video frame together with the scene length information to calculate a harmfulness grade for each scene and a final harmfulness grade used to decide whether to block streaming.

The harmfulness analysis result for a video frame given as input to the harmfulness analysis engine is expressed as a per-frame harmfulness probability.

However,

Patent Registration No. 10-06877320000 (registered February 21, 2007), "Method and apparatus for blocking harmful videos using content-based multi-modal feature values", Korea Electronics and Telecommunications Research Institute; Korea Information and Communications University Industry-University Cooperation Foundation
Patent Registration No. 10-09071720000 (registered July 2, 2009), "System and method for multi-stage blocking of harmful videos in a video distribution environment", SK Telecom Co., Ltd.
Patent Registration No. 10-21690730000 (registered October 16, 2020), "System and method for blocking real-time streaming based on video harmfulness analysis results", Kwangwoon University Industry-University Cooperation Foundation

To solve the above problem, an object of the present invention is to provide a terminal-resource-based adaptive frame extraction and streaming control system and method for blocking harmful videos on a mobile terminal, which extracts frames adaptively in consideration of the resources of the terminal in use (the battery state) and applies mosaic processing to harmful videos. Specifically, the remaining battery state of the user terminal is classified and the frame extraction algorithm is chosen adaptively according to that state. By extracting frames and determining harmfulness changes in a way suited to the state of the terminal playing the video, the proposed system minimizes battery consumption on the user terminal while blocking harmful videos.

To achieve the object of the present invention, a terminal-resource-based adaptive frame extraction and streaming control system for blocking harmful videos on a mobile terminal comprises a server; and a user terminal having an adaptive frame extraction module, a deep-learning-based harmfulness analysis engine, and a streaming control module that applies mosaic processing to harmful videos,
wherein the terminal resources relevant to blocking harmful videos on the user terminal consist of factors related to the terminal's processing capability and the response results for harmful videos, the response results being the number of extracted frames and the time spent on harmfulness analysis,
the factors related to the terminal's processing capability include CPU utilization and battery consumption,
frames of the video segments received from the server are extracted in real time and passed to the harmfulness analysis engine, the harmfulness analysis engine forwards the harmfulness analysis result for each extracted frame to a harmfulness change determination module to determine how the harmfulness of the video being played changes, the harmfulness determination result is used, together with changes in the terminal resources needed for harmfulness analysis, to adjust the frame extraction method, and the remaining battery state of the user terminal is used as the terminal resource characteristic that serves as the criterion for the frame extraction method,
the remaining battery state of the user terminal is divided, according to the terminal's specifications, into a state judged sufficient (40%), a state that is the criterion for power saving mode (20%), and a state that is the criterion for ultra power saving mode (10%), and the system operates on terminal-resource-based adaptive frame extraction for blocking harmful videos on a mobile terminal and on HTTP adaptive streaming, which adaptively determines the bit rate of video segments,
the adaptive frame extraction module extracts video frames while excluding duplicate frames in order to make the harmfulness analysis more efficient, calculates the SSIM value for two adjacent video frames, and compares the calculated SSIM value with a specific threshold to judge their similarity, and
when the battery state of the user terminal falls to the power saving mode level (20%) or below, the image resizing size and SSIM threshold adjusted in the previous stage are maintained and the frame extraction interval is increased to twice the baseline frame extraction interval, which reduces the number of times the harmfulness analysis engine runs and thereby the load on terminal resources.
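The duplicate-frame exclusion described above, in which the SSIM of two adjacent frames is compared against a threshold, can be sketched as follows. This is a minimal pure-Python illustration under stated assumptions: frames are flat lists of grayscale pixel values, SSIM is computed once over the whole frame rather than over sliding windows, the constants follow the common K1 = 0.01, K2 = 0.03 convention, and the names `global_ssim` and `select_frames` as well as the threshold in the usage note are hypothetical. A frame counts as a duplicate when its SSIM with the immediately preceding frame is at or above the threshold.

```python
def global_ssim(x, y, dynamic_range=255.0):
    """Single-window SSIM of two equally sized grayscale frames (flat lists)."""
    if len(x) != len(y) or not x:
        raise ValueError("frames must be non-empty and the same size")
    n = len(x)
    c1 = (0.01 * dynamic_range) ** 2
    c2 = (0.03 * dynamic_range) ** 2
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    var_x = sum((a - mean_x) ** 2 for a in x) / n
    var_y = sum((b - mean_y) ** 2 for b in y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / n
    numerator = (2 * mean_x * mean_y + c1) * (2 * cov + c2)
    denominator = (mean_x ** 2 + mean_y ** 2 + c1) * (var_x + var_y + c2)
    return numerator / denominator


def select_frames(frames, ssim_threshold):
    """Return indices of frames to analyze, skipping near-duplicates.

    The first frame is always kept; each later frame is kept only when its
    SSIM with the adjacent previous frame falls below the threshold.
    """
    kept = []
    for index, frame in enumerate(frames):
        if index == 0 or global_ssim(frames[index - 1], frame) < ssim_threshold:
            kept.append(index)
    return kept
```

With four frames where the second repeats the first and the fourth repeats the third, `select_frames(frames, 0.9)` keeps only the first and third. Lowering the threshold makes more frames count as duplicates, so fewer frames reach the harmfulness analysis engine.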

In addition, to achieve another object of the present invention, a terminal-resource-based adaptive frame extraction and streaming control method for blocking harmful videos on a mobile terminal is performed

in a terminal-resource (battery state) based adaptive frame extraction and streaming control system comprising a server; and a user terminal having an adaptive frame extraction module, a deep-learning-based harmfulness analysis engine, and a streaming control module that applies mosaic processing to harmful videos,
wherein the terminal resources relevant to blocking harmful videos on the user terminal consist of factors related to the terminal's processing capability and the response results for harmful videos, the response results being the number of extracted frames and the time spent on harmfulness analysis,
the factors related to the terminal's processing capability include CPU utilization and battery consumption,

frames of the video segments received from the server are extracted in real time and passed to the harmfulness analysis engine, the harmfulness analysis engine forwards the harmfulness analysis result for each extracted frame to a harmfulness change determination module to determine how the harmfulness of the video being played changes, the harmfulness determination result is used, together with changes in the terminal resources needed for harmfulness analysis, to adjust the frame extraction method, and the remaining battery state of the user terminal is used as the terminal resource characteristic that serves as the criterion for the frame extraction method,

the method comprising a step in which the remaining battery state of the user terminal is divided, according to the terminal's specifications, into a state judged sufficient (40%), a state that is the criterion for power saving mode (20%), and a state that is the criterion for ultra power saving mode (10%), and operation proceeds on terminal-resource-based adaptive frame extraction for blocking harmful videos on a mobile terminal and on HTTP adaptive streaming, which adaptively determines the bit rate of video segments,
wherein the adaptive frame extraction module extracts video frames while excluding duplicate frames in order to make the harmfulness analysis more efficient, calculates the SSIM value for two adjacent video frames, and compares the calculated SSIM value with a specific threshold to judge their similarity, and
when the battery state of the user terminal falls to the power saving mode level (20%) or below, the image resizing size and SSIM threshold adjusted in the previous stage are maintained and the frame extraction interval is increased to twice the baseline frame extraction interval, which reduces the number of times the harmfulness analysis engine runs and thereby the load on terminal resources.

The terminal-resource-based adaptive frame extraction and streaming control system of the present invention classifies the remaining battery state of the terminal and chooses the frame extraction algorithm adaptively according to that state. By extracting frames and determining harmfulness changes in a way suited to the state of the terminal playing the video, the proposed system minimizes the terminal's battery consumption while blocking harmful videos.

When the battery state is sufficient, an algorithm that prioritizes the performance of harmfulness change determination is applied. When battery drain accelerates and the terminal must prepare to enter power saving mode, the image resizing size is reduced to lower the CPU load of similarity comparison, and the SSIM threshold is lowered to reduce the load caused by the harmfulness analysis engine. When the terminal has entered power saving mode, the frame extraction interval is doubled to reduce the load of harmfulness analysis. When the terminal has entered ultra power saving mode, the frame extraction interval is tripled and similarity comparison is applied to every harmfulness change section, minimizing the load of harmfulness analysis. The thresholds and battery state criteria used in the proposed system can be set to different values according to the preference of the service provider offering the streaming service. In addition to the mosaic processing that the proposed system uses as its streaming control method, other methods can be applied, such as blocking the streaming session, lowering the video quality, or displaying a harmful-video notice.
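The per-state adjustments in the paragraph above can be summarized as a settings table. The sketch below is illustrative only. The interval multipliers (1, 1, 2, 3) and the direction of each change follow the text, but the concrete resize sizes and SSIM thresholds are hypothetical placeholders because the source gives no numbers, and the state names, `ExtractionSettings`, and `settings_for` are invented for this example. Setting `compare_all_sections` to false outside ultra power saving mode is also an assumption; the text only states that the comparison covers every harmfulness change section in that mode.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ExtractionSettings:
    resize_px: int               # side length frames are resized to before comparison
    ssim_threshold: float        # SSIM at or above this counts as "similar"
    interval_multiplier: int     # factor applied to the baseline extraction interval
    compare_all_sections: bool   # similarity comparison in every harmfulness change section


# Hypothetical values: the source states only that these are reduced, not by how much.
BASE_RESIZE_PX, REDUCED_RESIZE_PX = 224, 112
BASE_SSIM_THRESHOLD, REDUCED_SSIM_THRESHOLD = 0.90, 0.80


def settings_for(state: str) -> ExtractionSettings:
    """Return the extraction settings for a battery state name (names are assumed)."""
    if state == "sufficient":
        return ExtractionSettings(BASE_RESIZE_PX, BASE_SSIM_THRESHOLD, 1, False)
    if state == "prepare":
        # Smaller resize and lower SSIM threshold, interval unchanged.
        return ExtractionSettings(REDUCED_RESIZE_PX, REDUCED_SSIM_THRESHOLD, 1, False)
    if state == "power_saving":
        # Keep the reduced values and double the extraction interval.
        return ExtractionSettings(REDUCED_RESIZE_PX, REDUCED_SSIM_THRESHOLD, 2, False)
    if state == "ultra_power_saving":
        # Triple the interval and compare similarity in every section.
        return ExtractionSettings(REDUCED_RESIZE_PX, REDUCED_SSIM_THRESHOLD, 3, True)
    raise ValueError(f"unknown battery state: {state!r}")
```

Keeping the values in one table makes it straightforward for a service provider to override them, which the text explicitly allows.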

FIG. 1 shows the structure of a conventional real-time streaming blocking system based on video harmfulness analysis results.
FIG. 2 is an overall flowchart of the adaptive frame extraction technique based on terminal resource characteristics according to the present invention.
FIG. 3 shows the harmfulness-change-based adaptive frame extraction and streaming control process when the battery of the user terminal is sufficient.
FIG. 4 shows the determination of video harmfulness change based on the harmfulness analysis result for each extracted frame.
FIG. 5 shows how the frame extraction interval and similarity comparison are decided for each harmfulness stage of the video when the battery of the user terminal is sufficient.
FIG. 6 shows an example of similarity comparison between video frames.
FIG. 7 shows an example of mosaic processing by averaging image information.
FIG. 8 shows how the frame extraction interval and similarity comparison are decided for each harmfulness stage when the battery state of the user terminal calls for preparing to enter power saving mode.
FIG. 9 shows how the frame extraction interval and similarity comparison are decided for each harmfulness stage when the battery state of the user terminal is power saving mode (20%).
FIG. 10 shows how the frame extraction interval and similarity comparison are decided for each harmfulness stage when the battery state of the user terminal is ultra power saving mode (10%).

Hereinafter, the configuration and operation of preferred embodiments of the present invention are described in detail with reference to the accompanying drawings. Where a detailed description of related known technology or configurations would unnecessarily obscure the gist of the invention, that description is omitted. The same reference numerals are used for the same components across different drawings.

The present invention provides a terminal resource-based adaptive frame extraction and streaming control system and method for blocking harmful videos in a mobile terminal.

The terminal resources required to block harmful videos are divided into factors related to the processing capacity of the terminal and the results of responding to harmful videos. The factors related to processing capacity are CPU usage and battery consumption; the response results are the number of extracted frames and the time spent on harmfulness analysis. The user terminal contains a deep learning-based harmfulness analysis engine. Frames of the video segments received from the server are extracted in real time and passed to the harmfulness analysis engine. The harmfulness analysis engine passes the harmfulness analysis result for each extracted frame to the harmfulness change determination module, which determines the change in harmfulness of the video being played. The harmfulness determination result is used, together with the change in the terminal resources required for harmfulness analysis, to adjust the frame extraction method.

The terminal resource characteristic that serves as the criterion for adjusting the frame extraction method is the remaining battery of the user terminal. Depending on the specifications of the terminal, the remaining battery state is divided by three reference levels: the level judged to be sufficient (40%), the level at which power saving mode begins (20%), and the level at which ultra power saving mode begins (10%). If the remaining battery of the terminal is 40% or more, an adaptive frame extraction algorithm based on harmfulness change and inter-frame similarity comparison is applied. When the remaining battery state requires preparing to enter power saving mode, the system prepares for rapid battery drain by controlling the image resizing size and the SSIM threshold, which have only a small effect on CPU load during harmful video blocking. When the remaining battery is in the power saving or ultra power saving state, frame extraction interval control and similarity comparison are additionally applied to reduce the load on the terminal.

The terminal resource-based adaptive frame extraction and streaming control system for blocking harmful videos in a mobile terminal operates on top of HTTP (Hyper Text Transfer Protocol) adaptive streaming. HTTP adaptive streaming is a technology for ensuring the user's quality of experience (QoE) when streaming video over the Internet. Because it adaptively determines the bit rate of the video segment to request according to the current network condition, it can provide a seamless streaming service. In addition, because it operates over HTTP, it does not run into firewall problems and can use existing web servers. In HTTP adaptive streaming, the server encodes and stores a video as segments that have a fixed playback length and different bit rates. Information about the video segments is stored in an MPD (Media Presentation Description) file and delivered to the client after streaming starts. The client uses the segment information in the received MPD file and the measured network condition to determine the bit rate of the next video segment to request.
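The client-side bit rate decision described above can be sketched as follows. This is a minimal illustration, not the selection logic of the patent or of any particular player: the `Representation` type, the `choose_bitrate` function, the `safety_factor` margin, and the example bit rates are all assumptions introduced here, and a simple throughput-based rule stands in for whatever rule a real client applies.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class Representation:
    """One encoding of the video, as it might be listed in an MPD file."""
    label: str         # e.g. "720p" (hypothetical label)
    bitrate_bps: int   # bit rate of segments encoded at this quality


def choose_bitrate(representations: Sequence[Representation],
                   measured_throughput_bps: float,
                   safety_factor: float = 0.8) -> Representation:
    """Pick the highest bit rate that fits within a fraction of the throughput.

    Falls back to the lowest bit rate when nothing fits, so playback can
    continue on a poor network.
    """
    if not representations:
        raise ValueError("at least one representation is required")
    budget = measured_throughput_bps * safety_factor
    ordered = sorted(representations, key=lambda r: r.bitrate_bps)
    best = ordered[0]
    for rep in ordered:
        if rep.bitrate_bps <= budget:
            best = rep
    return best


# Hypothetical representations; the bit rates are illustrative only.
REPS = [Representation("240p", 400_000), Representation("720p", 2_500_000)]
```

With these example values, a measured throughput of 5 Mbit/s selects the 720p representation, while 1 Mbit/s selects 240p.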

When the server streams a video, the MPD file contains the video segment information (for example, the duration of segments at 720p and 240p quality), the quality of the video segments (bit rate), and the server URL information.

In the terminal resource-based adaptive frame extraction and streaming control system for blocking harmful videos in a mobile terminal, the deep learning-based harmfulness analysis engine that analyzes the extracted frames is built from a CNN (Convolutional Neural Network) model. Because a CNN model is specialized in learning from the features of input images, it is well suited to distinguishing harmful images from harmless ones. When a frame extracted in the user terminal is passed to the harmfulness analysis engine, the engine outputs, as its analysis result, whether the frame is harmful and the time spent on the analysis. The present invention relates to an adaptive frame extraction and streaming control mechanism that reflects terminal resource characteristics in order to respond in real time to streaming video whose harmfulness changes; the specific mechanism by which the harmfulness analysis engine decides whether an input image is harmful is not described. In this document, a harmful video means a video with obscene content that is unsuitable for minors.

A CNN-based model receives an image as input and extracts its features in order to classify the scene of the image. The harmfulness analysis engine is a harmful image classification engine obtained by training a CNN-based model on a data set classified into harmful and harmless images. Each image in the training set is labeled in advance as harmful or harmless. When an arbitrary image is input, the harmfulness analysis engine extracts the scene information of the image and uses the trained model to classify the image as harmful or harmless. The scene information of an image refers to things such as the distribution of pixel RGB values, motion detection results within the scene, and HSV (Hue, Saturation, Value) values.

Accordingly, a frame passed to the harmfulness analysis engine should be a representative frame of the scene whose harmfulness is to be judged. To extract representative frames, the present invention uses similarity comparison to divide the video into scenes on the basis of non-duplicate frames and extracts frames accordingly. The similarity comparison is not performed inside the harmfulness analysis engine; it is performed by the frame extraction algorithm. The similarity is computed with the formulas given in this description after the extracted frame has been converted into Mat form.

FIG. 2 shows the overall flowchart of the terminal resource-based adaptive frame extraction and streaming control system for blocking harmful videos in a mobile terminal according to the present invention. The terminal resources (remaining battery, CPU load) needed to analyze the harmfulness of a playing video in real time affect the load of harmfulness analysis. Using the remaining battery of the terminal playing the video as the criterion, the proposed system adjusts the contexts that affect the load on terminal resources, and thereby performs frame extraction that takes into account the processing capacity available for harmfulness analysis. The remaining battery state of the user terminal is classified into four cases: the battery is sufficient (40% or more); the battery consumption rate must be reduced in preparation for entering power saving mode (40% to 20%); the terminal has entered power saving mode (20% to 10%); and the terminal has entered ultra power saving mode (10% to 0%). The change in the algorithm for each battery state is described in detail below. The criteria for judging the battery state are values that the service provider can choose according to the specifications of the terminal; in the present invention they are set to 40%, 20%, and 10%.
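The four battery states can be expressed as a small classification function. The sketch below is illustrative only: the names `BatteryState` and `classify_battery` are hypothetical, and because the text does not say which state the exact boundary values 20% and 10% belong to, the code assumes that exactly 20% already counts as power saving mode and exactly 10% as ultra power saving mode.

```python
from enum import Enum


class BatteryState(Enum):
    SUFFICIENT = "sufficient"                      # 40% or more
    PREPARE_POWER_SAVING = "prepare_power_saving"  # below 40%, above 20%
    POWER_SAVING = "power_saving"                  # 20% down to above 10%
    ULTRA_POWER_SAVING = "ultra_power_saving"      # 10% or less


def classify_battery(percent: float,
                     sufficient: float = 40.0,
                     power_saving: float = 20.0,
                     ultra: float = 10.0) -> BatteryState:
    """Map a remaining-battery percentage to one of the four states.

    The thresholds default to the 40/20/10 values given in the text and can
    be overridden, since the text leaves them to the service provider.
    Boundary handling at 20% and 10% is an assumption of this sketch.
    """
    if not 0.0 <= percent <= 100.0:
        raise ValueError("percent must be between 0 and 100")
    if percent >= sufficient:
        return BatteryState.SUFFICIENT
    if percent > power_saving:
        return BatteryState.PREPARE_POWER_SAVING
    if percent > ultra:
        return BatteryState.POWER_SAVING
    return BatteryState.ULTRA_POWER_SAVING
```

Keeping the thresholds as parameters mirrors the statement that the service provider may select them according to the terminal's specifications.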

FIG. 3 shows the harmfulness analysis and streaming control process of the terminal resource-based adaptive frame extraction and streaming control system for blocking harmful videos in a mobile terminal when the remaining battery of the user terminal is sufficient.

When a video stream is transmitted from the server and the remaining battery of the user terminal is sufficient, the proposed system extracts video frames according to the HTTP-based adaptive frame extraction algorithm. The extracted frames are passed to the deep learning-based harmfulness analysis engine 100. The harmfulness analysis engine 100 analyzes the input video frames and passes the result, in the form of a per-frame harmfulness probability, to the video streaming blocking control module 120 and the adaptive frame extraction module 110.

The adaptive frame extraction module 110 determines the change in harmfulness of the video from the harmfulness analysis result, and decides the frame extraction interval and whether to apply similarity comparison according to that change. To minimize load by avoiding the extraction of duplicate frames, the proposed system extracts frames at a fixed interval with similarity comparison until the video is judged to be harmful. Extraction by inter-frame similarity comparison calculates the SSIM value: if the value exceeds a preset threshold, the frames are judged to be similar and the frame is excluded from extraction; if the value is below the threshold, the frames are judged to belong to different scenes and the frame is included in extraction.
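The threshold rule above can be sketched as a filter over candidate frames. This is a minimal illustration under stated assumptions: `select_frames` is a hypothetical name, the similarity measure is passed in as a function so that any SSIM implementation can be plugged in, the first candidate is always kept, each candidate is compared with the candidate immediately before it, and a score exactly equal to the threshold is treated as "not exceeding" it.

```python
from typing import Callable, List, Sequence, TypeVar

Frame = TypeVar("Frame")


def select_frames(candidates: Sequence[Frame],
                  similarity: Callable[[Frame, Frame], float],
                  threshold: float = 0.8) -> List[int]:
    """Return the indices of candidate frames to send for analysis.

    A candidate whose similarity to the preceding candidate exceeds the
    threshold is treated as a duplicate of the same scene and skipped.
    """
    kept: List[int] = []
    for i, frame in enumerate(candidates):
        if i == 0:
            kept.append(i)  # assumption: the first frame is always analyzed
            continue
        if similarity(candidates[i - 1], frame) > threshold:
            continue        # similar frame: excluded from extraction
        kept.append(i)      # different scene: included in extraction
    return kept


def toy_similarity(a: float, b: float) -> float:
    """Stand-in for SSIM on scalar 'frames', for demonstration only."""
    return 1.0 - abs(a - b)
```

For the toy input `[0.0, 0.05, 0.9, 0.95, 0.1]` the filter keeps the frames at indices 0, 2, and 4, that is, the first frame of each run of similar values.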

The terminal resource-based adaptive frame extraction and streaming control system for blocking harmful videos in a mobile terminal according to the present invention includes a server and a user terminal having an adaptive frame extraction module, a deep learning-based harmfulness analysis engine, and a streaming control module that applies mosaic processing to harmful videos,
wherein the terminal resources required for the user terminal to block harmful videos comprise factors related to the processing capacity of the terminal and, as the results of responding to harmful videos, the number of extracted frames and the time spent on harmfulness analysis,
the factors related to the processing capacity of the terminal include CPU usage and battery consumption,
frames of the video segments received from the server are extracted in real time and passed to the harmfulness analysis engine, the harmfulness analysis engine passes the harmfulness analysis result for each extracted frame to the harmfulness change determination module to determine the change in harmfulness of the video being played, the harmfulness determination result is used, together with the change in the terminal resources required for harmfulness analysis, to adjust the frame extraction method, and the terminal resource characteristic that serves as the criterion for adjusting the frame extraction method is the state of the remaining battery of the user terminal,
the state of the remaining battery of the user terminal is divided, according to the specifications of the terminal, by the level judged to be sufficient (40%), the level at which power saving mode begins (20%), and the level at which ultra power saving mode begins (10%), and the system operates on the basis of terminal resource-based adaptive frame extraction for blocking harmful videos in the mobile terminal and HTTP adaptive streaming, which adaptively determines the bit rate of video segments,
the adaptive frame extraction module extracts video frames while excluding duplicate video frames in order to increase the efficiency of video harmfulness analysis, calculates the SSIM value for two adjacent video frames, and judges their similarity by comparing the calculated SSIM value with a specific threshold, and
when the battery state of the user terminal has entered power saving mode (20%), the image resizing size and SSIM threshold adjusted in the previous stage are maintained and the frame extraction interval is increased to twice the reference frame extraction interval, which reduces the number of times the harmfulness analysis engine runs and thereby reduces the load on terminal resources.

In addition, the terminal resource-based adaptive frame extraction and streaming control method for blocking harmful videos in a mobile terminal according to the present invention is performed

in a terminal resource (battery state)-based adaptive frame extraction and streaming control system including a server and a user terminal having an adaptive frame extraction module, a deep learning-based harmfulness analysis engine, and a streaming control module that applies mosaic processing to harmful videos,
wherein the terminal resources required for the user terminal to block harmful videos comprise factors related to the processing capacity of the terminal and, as the results of responding to harmful videos, the number of extracted frames and the time spent on harmfulness analysis,
the factors related to the processing capacity of the terminal include CPU usage and battery consumption, and

frames of the video segments received from the server are extracted in real time and passed to the harmfulness analysis engine, the harmfulness analysis engine passes the harmfulness analysis result for each extracted frame to the harmfulness change determination module to determine the change in harmfulness of the video being played, the harmfulness determination result is used, together with the change in the terminal resources required for harmfulness analysis, to adjust the frame extraction method, and the terminal resource characteristic that serves as the criterion for adjusting the frame extraction method is the state of the remaining battery of the user terminal.

FIG. 4 shows the process of determining the change in harmfulness of a streaming video from the harmfulness analysis result for each extracted frame when the remaining battery of the user terminal is sufficient. The harmfulness change is determined once per segment playback length, and the harmfulness of the next video segment is predicted from the analysis result of the last frame of the current segment. (a) Harmless → harmful: the last frame of a harmless video segment is judged to be a harmful frame. The next video segment is predicted to be harmful and, to respond quickly to harmful video streaming, frames are extracted within that segment at a short fixed interval without applying similarity comparison. (b) Harmful → harmful: the last frame of a harmful video segment is judged to be a harmful frame. To balance the load of frame extraction against real-time response to harmful video streaming, frames are extracted without similarity comparison at an interval between those of case (a) and case (c). (c) Harmful → harmless: the last frame of a harmful video segment is judged to be a harmless frame. To minimize unnecessary frame extraction, frames are extracted within that segment at a long fixed interval with similarity comparison applied.
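The three transitions (a) to (c) amount to a lookup from the current segment's status and its last frame's result to an extraction policy for the next segment. The sketch below is hypothetical: `ExtractionPolicy` and `next_segment_policy` are names introduced here, the intervals are kept as the labels "short", "medium", and "long" because the concrete values appear only in the figures, and the harmless → harmless case, which is not among (a) to (c), is assumed to reuse the long interval with similarity comparison.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ExtractionPolicy:
    interval: str          # "short", "medium", or "long"
    use_similarity: bool   # False means every sampled frame is extracted


def next_segment_policy(segment_was_harmful: bool,
                        last_frame_harmful: bool) -> ExtractionPolicy:
    """Choose the extraction policy for the next segment."""
    if last_frame_harmful and not segment_was_harmful:
        # (a) harmless -> harmful: react quickly, no similarity comparison
        return ExtractionPolicy("short", use_similarity=False)
    if last_frame_harmful and segment_was_harmful:
        # (b) harmful -> harmful: balance load against responsiveness
        return ExtractionPolicy("medium", use_similarity=False)
    if segment_was_harmful:
        # (c) harmful -> harmless: minimize unnecessary extraction
        return ExtractionPolicy("long", use_similarity=True)
    # harmless -> harmless: not listed in (a)-(c); assumed equal to (c)
    return ExtractionPolicy("long", use_similarity=True)
```

A caller would evaluate this once per segment, after the last frame of the segment has been analyzed.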

FIG. 5 shows the frame extraction interval and similarity comparison decisions according to the change in video harmfulness. The frame extraction intervals are defined for 30 FPS video. When the streaming video changes to a harmful part, the shortest interval and forced extraction (extraction without similarity comparison) are applied; when the harmful part continues, the medium interval and forced extraction are applied; and when the streaming video changes to a harmless part, the longest interval and similarity comparison are applied. The frame extraction intervals used in this stage serve as the reference for the intervals used in the other stages. In the proposed technique, the candidate intervals were set through repeated experiments in advance to the minimum number of frames needed to judge the harmfulness of a streaming video properly without placing excessive load on the harmfulness analysis engine.

FIG. 6 shows an example of inter-frame similarity comparison in which the SSIM (Structural Similarity index) value is calculated for the similarity-based frame extraction proposed in this system. SSIM is a representative metric used mainly to evaluate the perceptual quality of 2D images. It rests on the assumption that, because the human visual system is specialized in deriving structural information from images, the degree of distortion of image structure has the greatest effect on perceptual quality. Accordingly, the SSIM value is calculated from the brightness information and distortion information of each pixel of the two images being compared.

The streaming control system based on video stream characteristics proposed in the present invention compares the similarity between extracted video frames on the basis of the SSIM value. As shown in FIG. 6, the SSIM value for the 10th and 11th extracted video frames is small, at 0.4795. When the SSIM value is small, the proposed system judges that the frames belong to different scenes. In contrast, the SSIM value for the 11th and 12th video frames is large, at 0.8258, so in this case the frames are judged to belong to the same scene. The example of FIG. 6 is intended only to show how similarity-based video frame extraction operates; it differs from the mechanism by which the proposed system distinguishes the multiple scenes contained in a video segment and extracts a representative frame for each scene. In the present invention, a scene change is identified at the point where the SSIM value changes greatly when adjacent video frames are compared. For example, in FIG. 4, the scene containing the first extracted video frame can be regarded as a group of frames similar to that extracted frame. The scene containing the second extracted video frame, in turn, can be regarded as a scene that starts at the frame where the SSIM value changes greatly relative to the scene containing the first extracted frame.

Let x be the original image and y the comparison image. To calculate the SSIM value, the average brightness term I(x, y) of the pixels of the two images is first calculated as follows, where $\mu_x$ is the average brightness of the original image x, $\mu_y$ is the average brightness of the comparison image y, and $C_1$ is the image brightness constant. The value given by this formula is a normalized value that combines the average brightness of the original image and of the comparison image so as to represent the average brightness of the two images.

$$I(x, y) = \frac{2\mu_x\mu_y + C_1}{\mu_x^2 + \mu_y^2 + C_1}$$

After the average brightness has been calculated, the standard deviation of each image is defined as the contrast of that image, and the contrast term c(x, y) of the two images is calculated as follows, where $\sigma_x$ is the brightness standard deviation of the original image x, $\sigma_y$ is the brightness standard deviation of the comparison image y, and $C_2$ is the image contrast constant. As with the average brightness formula, the value given by this formula is a normalized value that combines the brightness standard deviations of the original and comparison images so as to represent the contrast of the two images.

$$c(x, y) = \frac{2\sigma_x\sigma_y + C_2}{\sigma_x^2 + \sigma_y^2 + C_2}$$

After the image contrast has been calculated, the image structure index s(x, y) is calculated from the brightness covariance and the brightness standard deviations of the two images as follows. The brightness covariance $\sigma_{xy}$ indicates how strongly the brightness information of the two images is correlated, and $C_3$ is the image structure constant. The value given by this formula is the ratio of the brightness covariance between the two images to the product of their individual brightness standard deviations.

$$s(x, y) = \frac{\sigma_{xy} + C_3}{\sigma_x\sigma_y + C_3}$$

The image structure constant $C_3$ is calculated from the image contrast constant $C_2$ as follows. Because the contrast calculation involves the brightness standard deviations of the images, it is related to the calculation of the image structure index; however, because the structure index uses the brightness covariance, the structure constant is calculated in this way to reduce the bias introduced by the constant term.

$$C_3 = \frac{C_2}{2}$$

Finally, the SSIM value is calculated by combining the average brightness of the two images, the image contrast, and the image structure index as follows. The weights a, b, and c are all set to 1. The formula compares the similarity between the original image and the comparison image by reflecting every element that affects human perceptual quality; because the weights are all set to 1, each element has the same influence on the SSIM calculation.

$$\mathrm{SSIM}(x, y) = I(x, y)^{a} \cdot c(x, y)^{b} \cdot s(x, y)^{c}$$
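The calculation described above can be sketched with NumPy. This is a simplified illustration, not the patent's implementation: `ssim_global` is a hypothetical name, the statistics are taken over the whole image in a single window rather than over local windows, the weights a, b, and c are fixed at 1, and the brightness and contrast constants use the conventional defaults (0.01 × range)² and (0.03 × range)², which are assumptions since the text does not give their values.

```python
import numpy as np


def ssim_global(x, y, data_range: float = 255.0,
                k1: float = 0.01, k2: float = 0.03) -> float:
    """Single-window SSIM of two equally sized grayscale images."""
    x = np.asarray(x, dtype=np.float64)
    y = np.asarray(y, dtype=np.float64)
    if x.shape != y.shape:
        raise ValueError("images must have the same shape")

    c1 = (k1 * data_range) ** 2   # brightness constant (assumed default)
    c2 = (k2 * data_range) ** 2   # contrast constant (assumed default)
    c3 = c2 / 2.0                 # structure constant derived from c2

    mu_x, mu_y = x.mean(), y.mean()
    sigma_x, sigma_y = x.std(), y.std()            # population std. dev.
    sigma_xy = ((x - mu_x) * (y - mu_y)).mean()    # population covariance

    brightness = (2 * mu_x * mu_y + c1) / (mu_x ** 2 + mu_y ** 2 + c1)
    contrast = (2 * sigma_x * sigma_y + c2) / (sigma_x ** 2 + sigma_y ** 2 + c2)
    structure = (sigma_xy + c3) / (sigma_x * sigma_y + c3)

    # Weights a = b = c = 1, so the three terms are simply multiplied.
    return float(brightness * contrast * structure)


# Example input: a 16 x 16 gradient image with values 0..255.
GRADIENT = np.arange(256, dtype=np.float64).reshape(16, 16)
```

Comparing an image with itself yields a value of 1, and comparing the gradient with its inverse (`255 - GRADIENT`) yields a negative value because the brightness covariance is negative.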

FIG. 7 shows an example of the image information averaging mosaic processing used in the proposed streaming control system based on terminal resource characteristics.

When the harmfulness analysis engine 100 detects a harmful video, the streaming control module 120 may apply mosaic processing or block the stream.

To respond to harmful video streaming in real time, the terminal adds a layer on top of the video frame being played and divides the area of the frame into a number of grid cells. Mosaic processing is performed by averaging the image information of the pixels contained in each grid cell. Accordingly, when the number of grid cells is small, the mosaic effect is stronger but the processing time increases because the averaging of image information becomes more complex; when the number of grid cells is large, the processing time decreases but the mosaic effect is not strong enough to reduce exposure to harmful video effectively. Because the mosaic method used by the proposed system adds a separate layer rather than directly accessing the image information of the video frame being played, harmfulness analysis remains possible without damaging the information of the extracted video frames.
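Averaging the pixels inside each grid cell can be sketched as follows. This is an illustration under stated assumptions: `mosaic` is a hypothetical name, the frame is a NumPy array of shape (height, width) or (height, width, channels), and returning a new array while leaving the input untouched stands in for drawing on a separate overlay layer.

```python
import numpy as np


def mosaic(frame, grid_rows: int, grid_cols: int) -> np.ndarray:
    """Return a mosaic copy of the frame; the input frame is not modified."""
    frame = np.asarray(frame)
    height, width = frame.shape[:2]
    if not (1 <= grid_rows <= height and 1 <= grid_cols <= width):
        raise ValueError("grid size must be between 1 and the frame size")

    # Integer cell boundaries; every cell contains at least one pixel.
    row_edges = [i * height // grid_rows for i in range(grid_rows + 1)]
    col_edges = [j * width // grid_cols for j in range(grid_cols + 1)]

    out = np.empty_like(frame)
    for r0, r1 in zip(row_edges[:-1], row_edges[1:]):
        for c0, c1 in zip(col_edges[:-1], col_edges[1:]):
            cell = frame[r0:r1, c0:c1]
            # Replace every pixel in the cell with the cell's mean value
            # (per channel for color frames).
            out[r0:r1, c0:c1] = cell.mean(axis=(0, 1))
    return out


# Small grayscale example frame (values are illustrative).
SAMPLE = np.array([[0, 2, 10, 20],
                   [4, 6, 30, 40],
                   [1, 1, 5, 5],
                   [1, 1, 5, 5]], dtype=np.float64)
```

Applying `mosaic(SAMPLE, 2, 2)` replaces the four 2 × 2 cells with their means 3, 25, 1, and 5.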

FIG. 8 shows the frame extraction interval and similarity comparison decisions for each harmfulness stage when the battery state of the user terminal requires preparing to enter power saving mode (20%). When the battery of the user terminal has drained to the point where entry into power saving mode must be prepared for, the proposed system adjusts two parameters of the similarity comparison, whose effect on CPU load varies relatively little: the size to which images are resized and the SSIM threshold used as the criterion for similarity. When the battery is sufficient, the image resizing size is 200 by 200 and the SSIM threshold is 0.8; otherwise, 100 by 100 and 0.7 are used. Reducing the image resizing size lowers the CPU load of the similarity comparison, and reducing the SSIM threshold lowers the number of extracted frames and therefore the load of harmfulness analysis.

FIG. 9 shows the frame extraction interval and similarity comparison decisions for each harmfulness stage in power saving mode. When the battery state of the user terminal has entered power saving mode (20-10%), the proposed system keeps the image resizing size and SSIM threshold adjusted in the previous stage. In addition, it increases the frame extraction interval to twice the reference frame extraction interval, which reduces the number of times the harmfulness analysis engine runs and thereby reduces the load on terminal resources.

FIG. 10 shows the frame extraction interval and similarity comparison decisions for each harmfulness stage when the battery state of the user terminal is in ultra power saving mode (10-0%). When the user terminal has entered ultra power saving mode, the proposed system keeps the image resizing size and SSIM threshold at the values of the previous stage. It also increases the frame extraction interval to three times the reference frame extraction interval and applies similarity comparison uniformly to all harmfulness sections to minimize frame extraction. This minimizes the number of times the harmfulness analysis engine runs and therefore the load on terminal resources. The multiplier applied to the reference frame extraction interval is a constant that the service provider can set according to the environment; in the proposed system it is set to 2 for power saving mode and 3 for ultra power saving mode.
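The battery-dependent settings described for FIGS. 8 to 10 can be summarized as a small lookup. The sketch below is illustrative only: the names `ExtractionPolicy` and `select_policy` are hypothetical, the handling of the exact boundary values (40, 20, 10) is an assumption, and the single flag `compare_all_sections` simplifies the per-stage similarity decisions shown in the figures.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ExtractionPolicy:
    resize: int                 # side of the square image used for SSIM, in pixels
    ssim_threshold: float       # frames with SSIM above this count as duplicates
    interval_multiplier: int    # factor applied to the reference extraction interval
    compare_all_sections: bool  # apply similarity comparison to every harmfulness section


def select_policy(battery_percent: float) -> ExtractionPolicy:
    """Map remaining battery (%) to the settings stated in the description.

    Boundary handling (>= versus >) is an assumption of this sketch.
    """
    if battery_percent >= 40:   # battery sufficient
        return ExtractionPolicy(200, 0.8, 1, False)
    if battery_percent >= 20:   # preparing for power saving mode
        return ExtractionPolicy(100, 0.7, 1, False)
    if battery_percent >= 10:   # power saving mode: interval doubled
        return ExtractionPolicy(100, 0.7, 2, False)
    return ExtractionPolicy(100, 0.7, 3, True)  # ultra power saving mode


policy = select_policy(15)
```

Because the thresholds and multipliers are described as values the service provider may choose, a real deployment would more likely read them from configuration than hard-code them as done here.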

The terminal resource-based adaptive frame extraction and streaming control system for blocking harmful videos in a mobile terminal proposed in the present invention classifies the remaining battery state of the terminal and adaptively determines the frame extraction algorithm according to that state. The proposed system aims to minimize battery consumption while blocking harmful videos by performing frame extraction and harmfulness change determination suited to the state of the terminal playing the video. When the battery is sufficient, an algorithm that prioritizes the performance of harmfulness change determination is applied. When the battery consumption rate of the user terminal increases and the terminal must prepare to enter power saving mode (20%), the image resizing size is reduced to lower the CPU load required for similarity comparison, and the SSIM threshold is lowered to reduce the load caused by the harmfulness analysis engine. When the battery state of the user terminal has entered power saving mode (20%), the frame extraction interval is doubled to reduce the load caused by harmfulness analysis, and when the terminal has entered ultra power saving mode (10-0%), the frame extraction interval is tripled and similarity comparison is applied to all harmfulness change sections to minimize the load caused by harmfulness analysis.

For reference, a CNN (Convolutional Neural Network)-based model receives an image as input, extracts the features of that image, and classifies the scene it shows. The CNN is a model devised to process images more effectively by applying an artificial neural network to the filtering techniques used in image processing. Input image data is represented as a three-dimensional tensor composed of height, width, and channels. By extracting the features of the images that arrive in tensor form, labeling those features with specific values, and then training, the model learns to distinguish a variety of images. The extracted features include the edges of the image, the distribution of colors, and the degree of distortion; these features are turned into a feature map, which after training is compared with the features of incoming images. The structure of the CNN used to build the feature map is described in detail below. The harmfulness analysis engine is a harmful image classification engine obtained by training a CNN-based model on a data set classified into harmful and harmless images. Each image in the training set is labeled in advance as harmful or harmless. When an arbitrary image is input, the harmfulness analysis engine extracts the scene information of the image and uses the trained model to classify the image as harmful or harmless. The scene information of an image refers to items such as the distribution of pixel RGB values, the result of motion detection within the scene, and HSV (Hue, Saturation, Value) values.

In the present invention, a MobileNet-based CNN harmfulness analysis engine is used so that the model can perform harmfulness analysis in real time. FIG. 11 shows the standard convolution operation of a CNN and the depthwise and pointwise convolution operations of MobileNet. For a standard convolution, the filter size is K x K, the height and width of the image are F, the number of input channels is N, and the number of filters is M, so the total amount of computation is $F^2 K^2 N M$. Depthwise convolution, in contrast, uses a separate kernel for each channel of the input feature map, so the number of filters equals the number of input channels. Because the convolution is performed in the spatial direction only and not in the channel direction, the total amount of computation is $F^2 K^2 N$, which is smaller than that of a standard convolution. Pointwise convolution uses a 1x1 convolution, so the number of filters can be reduced, and this dimensionality reduction lowers the amount of computation. Because it operates in the channel direction, the opposite of depthwise convolution, its total amount of computation is $F^2 N M$.

FIG. 11 is a diagram comparing the convolution operation methods.
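As an illustration of the two operations compared in FIG. 11, the following minimal Python sketch applies a depthwise convolution followed by a pointwise convolution to a small multi-channel input. It is not the engine's implementation: the function names, the nested-list data layout, and the use of stride 1 without padding are assumptions made only for this example.

```python
def depthwise_conv(x, kernels):
    """Spatial convolution with one K x K kernel per input channel.

    x:       input laid out as [channels][height][width]
    kernels: one kernel per channel, laid out as [channels][K][K]
    Stride 1, no padding (assumption of this sketch).
    """
    k = len(kernels[0])
    out = []
    for ch, plane in enumerate(x):
        h, w = len(plane), len(plane[0])
        out.append([
            [
                sum(plane[i + di][j + dj] * kernels[ch][di][dj]
                    for di in range(k) for dj in range(k))
                for j in range(w - k + 1)
            ]
            for i in range(h - k + 1)
        ])
    return out


def pointwise_conv(x, weights):
    """1x1 convolution: mixes channels at each pixel.

    weights: laid out as [filters][channels]
    """
    n, h, w = len(x), len(x[0]), len(x[0][0])
    return [
        [[sum(wt[ch] * x[ch][i][j] for ch in range(n)) for j in range(w)]
         for i in range(h)]
        for wt in weights
    ]


image = [
    [[1, 2, 3], [4, 5, 6], [7, 8, 9]],   # channel 0
    [[1, 1, 1], [1, 1, 1], [1, 1, 1]],   # channel 1
]
ones = [[1, 1], [1, 1]]
spatial = depthwise_conv(image, [ones, ones])      # channels stay separate
mixed = pointwise_conv(spatial, [[1, 1], [1, -1]])  # channels are combined
```

The depthwise step never sums across channels and the pointwise step never looks at neighboring pixels, which is the separation of spatial and channel directions that the description attributes to MobileNet.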

FIG. 12 shows the overall structure of the MobileNet-based CNN used in the harmfulness analysis engine of the present invention. Unlike a standard convolution, which performs the channel-direction and spatial-direction operations at the same time, this structure, also called depthwise separable convolution, uses depthwise convolution and pointwise convolution separately, performing the spatial-direction and channel-direction operations independently and then combining the results. Comparing the computation of a standard convolution, $F^2 K^2 N M$, with that of a depthwise separable convolution, $(F^2 K^2 N) + (F^2 N M)$, and assuming F = 5, N = 2, M = 64, and K = 2, the amount of computation differs by a factor of about 3.76.

FIG. 12 shows the structure of the depthwise separable convolution.
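The factor of about 3.76 can be checked with a few lines of arithmetic. The sketch below assumes the usual MobileNet operation counts (F·F·K·K·N·M for a standard convolution, F·F·K·K·N for the depthwise step, and F·F·N·M for the pointwise step); the function names are hypothetical.

```python
def standard_conv_cost(f, k, n, m):
    """Multiply-accumulate count of a standard K x K convolution."""
    return f * f * k * k * n * m


def depthwise_cost(f, k, n):
    """One K x K kernel per input channel, spatial direction only."""
    return f * f * k * k * n


def pointwise_cost(f, n, m):
    """1x1 convolution across channels."""
    return f * f * n * m


F, N, M, K = 5, 2, 64, 2  # values used in the description
standard = standard_conv_cost(F, K, N, M)
separable = depthwise_cost(F, K, N) + pointwise_cost(F, N, M)
ratio = standard / separable
print(standard, separable, round(ratio, 2))  # prints: 12800 3400 3.76
```

With these values the pointwise step accounts for 3200 of the 3400 operations, so the saving comes mainly from no longer multiplying the K x K spatial cost by the number of filters M.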

Accordingly, the frame delivered to the harmfulness analysis engine must be a representative frame of the scene whose harmfulness is to be determined. To extract representative frames, the present invention uses similarity comparison to divide the video into scenes on the basis of non-duplicate frames and extracts frames accordingly. The similarity comparison is performed in the frame extraction algorithm, not inside the harmfulness analysis engine. The extracted frames are converted to Mat form, and the similarity is then calculated with the formulas below, where x and y are the images converted to Mat form and the values of a, b, and c are all set to 1.

The average brightness is a value normalized to represent the average brightness of the two images by combining the average brightness of the original image and of the comparison image:

$$I(x, y) = \frac{2\mu_x \mu_y + C_1}{\mu_x^2 + \mu_y^2 + C_1}$$

Here $\mu_x$ is the average brightness of the original image x, $\mu_y$ is the average brightness of the comparison image y, and $C_1$ is the image brightness constant.

The image contrast is a value normalized to represent the contrast of the two images by combining the brightness standard deviations of the original image and of the comparison image:

$$c(x, y) = \frac{2\sigma_x \sigma_y + C_2}{\sigma_x^2 + \sigma_y^2 + C_2}$$

Here $\sigma_x$ is the brightness standard deviation of the original image x, $\sigma_y$ is the brightness standard deviation of the comparison image y, and $C_2$ is the image contrast constant.

The image structure index is the ratio of the brightness covariance between the two images to the product of their individual brightness standard deviations:

$$s(x, y) = \frac{\sigma_{xy} + C_3}{\sigma_x \sigma_y + C_3}$$

Here the brightness covariance $\sigma_{xy}$ indicates how strongly the brightness information of the two images is correlated, and $C_3$ is the image structure constant.

The weights a, b, and c determine how strongly the average brightness, the image contrast, and the image structure index are each reflected in the SSIM value:

$$\mathrm{SSIM}(x, y) = I(x, y)^{a} \cdot c(x, y)^{b} \cdot s(x, y)^{c}$$

Because the similarity between the original image and the comparison image must reflect every factor that affects perceived quality, all weights are set to 1, so that each factor has the same influence on the SSIM calculation.
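The SSIM-based duplicate check can be sketched in plain Python as follows. Several points are assumptions of this sketch rather than statements of the patent: frames are flattened lists of 8-bit brightness values, statistics are computed over the whole image instead of local windows, the constants use the commonly cited defaults C1 = (0.01 x 255)^2 and C2 = (0.03 x 255)^2 with C3 = C2 / 2, and the helper names are hypothetical. The description converts frames to Mat form, which suggests an image library is used in practice; none is used here.

```python
import math


def ssim(x, y, c1=6.5025, c2=58.5225, a=1, b=1, c=1):
    """Global SSIM of two equally sized, flattened grayscale images."""
    n = len(x)
    mu_x = sum(x) / n
    mu_y = sum(y) / n
    var_x = sum((p - mu_x) ** 2 for p in x) / n
    var_y = sum((q - mu_y) ** 2 for q in y) / n
    cov_xy = sum((p - mu_x) * (q - mu_y) for p, q in zip(x, y)) / n
    sd_x, sd_y = math.sqrt(var_x), math.sqrt(var_y)
    c3 = c2 / 2  # assumed relation between the structure and contrast constants

    brightness = (2 * mu_x * mu_y + c1) / (mu_x ** 2 + mu_y ** 2 + c1)
    contrast = (2 * sd_x * sd_y + c2) / (var_x + var_y + c2)
    structure = (cov_xy + c3) / (sd_x * sd_y + c3)
    # Integer weights keep the power well defined even when structure < 0.
    return brightness ** a * contrast ** b * structure ** c


def extract_representatives(frames, threshold=0.8):
    """Return indices of frames kept for harmfulness analysis.

    The first frame is always kept; a later frame is kept only when its
    SSIM against the previous adjacent frame does not exceed the threshold.
    """
    kept = []
    for i, frame in enumerate(frames):
        if i == 0 or ssim(frames[i - 1], frame) <= threshold:
            kept.append(i)
    return kept


scene_a = [0, 50, 100, 150, 200, 250]
scene_b = list(reversed(scene_a))
kept = extract_representatives([scene_a, scene_a, scene_b, scene_b])
```

In this toy sequence only the first frame of each scene survives, so the analysis engine would run twice instead of four times; lowering the threshold to 0.7 makes the check discard more frames as duplicates.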

The terminal resource-based adaptive frame extraction and streaming control system for blocking harmful videos in a mobile terminal proposed in the present invention classifies the remaining battery state of the terminal and adaptively determines the frame extraction algorithm according to that state. The proposed system aims to minimize battery consumption while blocking harmful videos by performing frame extraction and harmfulness change determination suited to the state of the terminal playing the video. When the battery is sufficient, an algorithm that prioritizes the performance of harmfulness change determination is applied. When the battery consumption rate increases and the terminal must prepare to enter power saving mode, the image resizing size is reduced to lower the CPU load required for similarity comparison, and the SSIM threshold is lowered to reduce the load caused by the harmfulness analysis engine. When the terminal has entered power saving mode, the frame extraction interval is doubled to reduce the load caused by harmfulness analysis, and when the terminal has entered ultra power saving mode, the frame extraction interval is tripled and similarity comparison is applied to all harmfulness change sections to minimize the load caused by harmfulness analysis. The thresholds and battery state criteria used in the proposed system may be set to different values according to the preference of the service provider offering the streaming service. In addition to the mosaic processing that the proposed system uses as its streaming control method, various other methods can be applied, such as blocking the streaming session, degrading video quality, or displaying a harmful video notice.

Embodiments according to the present invention may be implemented in the form of program instructions executable by various computer means and recorded on a computer-readable recording medium. The computer-readable recording medium may include program instructions, data files, and data structures, alone or in combination. Computer-readable recording media may include hardware devices configured to store and execute program instructions, such as magnetic media (hard disks, floppy disks, and magnetic tape), optical media (CD-ROM, DVD), magneto-optical media (floptical disks), and storage media such as ROM, RAM, flash memory, and other storage. Examples of program instructions include not only machine code produced by a compiler but also high-level language code that can be executed by a computer using an interpreter. The hardware device may be configured to operate as one or more software modules in order to perform the operations of the present invention.

As described above, the method of the present invention may be implemented as a program and stored, in a form readable by computer software, on a recording medium (CD-ROM, RAM, ROM, memory card, hard disk, magneto-optical disk, storage device, etc.).

Although the present invention has been described with reference to specific embodiments, it is not limited to the same configuration and operation as those embodiments, which serve only to illustrate its technical idea. It may be implemented with various modifications within the technical spirit and scope of the present invention, and the scope of the present invention should be determined by the claims that follow.

100: harmfulness analysis engine
110: adaptive frame extraction module
120: streaming control module

Claims

server;
A user terminal having an adaptive frame extraction module, a deep learning-based harmfulness analysis engine, and a streaming control module that performs mosaic processing in the case of harmful videos,
The terminal resources required to block harmful videos on the user terminal include factors related to the processing capacity of the terminal and, as results of responding to harmful videos, the number of extracted frames and the time required for harmfulness analysis,
Among the terminal resources required to block harmful videos on the user terminal, the factors related to the processing capacity of the terminal include CPU occupancy and battery consumption,
The frame of the video segment received from the server is extracted in real time and transmitted to the harmfulness analysis engine, and the harmfulness analysis engine transmits the harmfulness analysis result for each frame to the harmfulness change determination module to determine the harmfulness change of the video being played. The harmfulness determination result is used to adjust the frame extraction method along with the change in terminal resources necessary for the harmfulness analysis, and the terminal resource characteristic that is the criterion for the frame extraction method is the state of the remaining battery of the user terminal.
The state of the remaining battery of the user terminal is divided into a state that is determined to be sufficient according to the specifications of the terminal (40%), a state that is a standard for power saving mode (20%), and a state that is a standard for ultra power saving mode (10%), It operates based on HTTP adaptive streaming that adaptively determines the bit rate of video segments and adaptive frame extraction based on terminal resources for blocking harmful videos in mobile terminals,
The adaptive frame extraction module extracts video frames while excluding duplicate video frames to increase the efficiency of video harmfulness analysis, calculates the SSIM value for two adjacent video frames, and compares the calculated SSIM value with a specific threshold to determine the degree of similarity,
When the battery state of the user terminal enters the power saving mode (20%), the image resizing size and SSIM threshold adjusted in the previous stage are maintained and the frame extraction interval is increased to twice the reference frame extraction interval, which reduces the number of times the harmfulness analysis engine operates and thereby reduces the load on terminal resources, in the terminal resource-based adaptive frame extraction and streaming control system for blocking harmful videos in a mobile terminal.

delete

According to claim 1,
When the remaining battery state of the user terminal is 40% or more, an adaptive frame extraction algorithm based on harmfulness change and inter-frame similarity comparison is applied,
When the remaining battery state of the user terminal requires preparing to enter the power saving mode (20%), the system prepares for rapid battery consumption by controlling the image resizing size and SSIM threshold, which have little effect on CPU load in the process of blocking harmful videos,
When the remaining battery state of the user terminal is in power saving mode or ultra power saving mode, frame extraction interval control and similarity comparison are additionally applied to reduce the load on the terminal, in the terminal resource-based adaptive frame extraction and streaming control system for blocking harmful videos in a mobile terminal.

According to claim 1,
In HTTP adaptive streaming, the server encodes a video into segments that have a fixed playback length and different bit rates and stores them; information about the video segments is stored as a Media Presentation Description (MPD) file and sent to the client after streaming starts; and the client determines the bit rate of the next video segment to request using the video segment information in the received MPD file and the measured network state,
The MPD file includes video segment information (720P, 240P quality duration), video segment quality (bit rate), and server URL information used when the server streams the video, in the terminal resource-based adaptive frame extraction and streaming control system for blocking harmful videos in a mobile terminal.

According to claim 1,
The deep learning-based harmfulness analysis engine, which performs harmfulness analysis on the frames extracted in the terminal resource-based adaptive frame extraction and streaming control system for blocking harmful videos in a mobile terminal, uses a Convolutional Neural Network (CNN) model,
The harmfulness analysis engine is a harmful image classification engine obtained by training a CNN-based model on a data set classified into harmful and harmless images; each image in the training set is labeled in advance as harmful or harmless; when an arbitrary image is input, the harmfulness analysis engine extracts the scene information of the image and classifies the image as harmful or harmless using the trained model; and the scene information of the image includes the distribution of pixel RGB values, the result of motion detection within the scene, and HSV (Hue, Saturation, Value) values,
To extract a representative frame of the scene whose harmfulness is to be determined, the frames delivered to the harmfulness analysis engine are obtained by dividing the video into scenes on the basis of non-duplicate frames through similarity comparison and extracting frames accordingly, and the similarity comparison is performed in the frame extraction algorithm rather than inside the harmfulness analysis engine, in the terminal resource-based adaptive frame extraction and streaming control system for blocking harmful videos in a mobile terminal.

According to claim 1,
The user terminal resources (remaining battery, CPU load) required to analyze the harmfulness of the video being played in real time affect the load of harmfulness analysis, and the load of the terminal resource is affected based on the remaining battery of the device playing the video. By controlling contexts, the frame extraction technique considering the processing capacity of the user terminal used for hazard analysis is performed. It is classified into the case of preparing for power saving mode by reducing the speed (40~20%), entering power saving mode (20~10%) and ultra power saving mode (10~0%). , The criterion for determining the battery state of the user terminal is a value selectable by the service provider according to the specifications of the terminal, and 40%, 20%, and 10% are set as the criteria, and terminal resources for blocking harmful videos in the mobile terminal based adaptive frame extraction and streaming control system.

According to claim 1,
When a video stream is transmitted from the server and the remaining battery of the user terminal is in a sufficient state, video frames are extracted according to an HTTP-based adaptive frame extraction algorithm and the extracted frames are delivered to the deep learning-based harmfulness analysis engine; the harmfulness analysis engine analyzes the input video frames and delivers the analysis result, in the form of a harmfulness probability for each frame, to the video streaming blocking control module and the adaptive frame extraction module,
The adaptive frame extraction module determines the harmfulness change of the video from the harmfulness analysis result, decides the frame extraction interval and whether to apply similarity comparison according to that change, and prevents the extraction of duplicate frames to minimize load; until the video is determined to be harmful, frames are extracted by applying a fixed interval together with similarity comparison; and in extraction based on inter-frame similarity comparison, the SSIM value is calculated, a frame whose value exceeds a preset threshold is judged to be a similar frame and excluded from extraction, and a frame whose value is lower than the threshold is judged to belong to a different scene and included in extraction, in the terminal resource-based adaptive frame extraction and streaming control system for blocking harmful videos in a mobile terminal.

According to claim 1,
When the similarity between frames is compared by calculating the SSIM (Structural Similarity) value for frame extraction based on similarity comparison between video frames, the SSIM value reflects the per-pixel brightness information and distortion information of the two images being compared, and the video stream characteristic-based streaming control system compares the similarity between extracted video frames on the basis of the SSIM value,
The SSIM value for the 10th and 11th frames among the extracted video frames is small, 0.4795, and when the SSIM value is small the video stream characteristic-based streaming control system determines that the frames belong to different scenes; in contrast, the SSIM value for the 11th and 12th video frames is large, 0.8258, and in this case the frames are determined to belong to the same scene,
To extract video frames based on similarity comparison, the video stream characteristic-based streaming control system distinguishes the multiple scenes contained in a video segment, the criterion for a scene change being the point at which the SSIM value changes greatly when the similarity between adjacent video frames is compared; the scene containing the first extracted video frame can be regarded as a group of frames similar to that extracted frame, whereas the scene containing the second extracted video frame can be regarded as a scene that begins at a frame whose SSIM value changes greatly relative to the scene containing the first extracted video frame,
When x is the original image and y is the comparison image, in order to calculate the SSIM value, the average brightness I(x, y) of the pixels of the two images is first calculated by the following formula, where $\mu_x$ is the average brightness of the original image x, $\mu_y$ is the average brightness of the comparison image y, and $C_1$ is the image brightness constant, and the value calculated by the formula is a value normalized to represent the average brightness of the two images by combining the average brightness of the original image and of the comparison image,

$$I(x, y) = \frac{2\mu_x \mu_y + C_1}{\mu_x^2 + \mu_y^2 + C_1}$$

After the average brightness is calculated, the image contrast c(x, y) is calculated from the brightness standard deviations of the two images by the following formula, where $\sigma_x$ is the brightness standard deviation of the original image x, $\sigma_y$ is the brightness standard deviation of the comparison image y, and $C_2$ is the image contrast constant, and the value calculated by the formula is a value normalized to represent the contrast of the two images by combining the brightness standard deviations of the original image and of the comparison image,

$$c(x, y) = \frac{2\sigma_x \sigma_y + C_2}{\sigma_x^2 + \sigma_y^2 + C_2}$$

After the image contrast is calculated, the image structure index s(x, y) is calculated from the brightness covariance and the brightness standard deviations of the two images by the following formula, where the brightness covariance $\sigma_{xy}$ indicates how strongly the brightness information of the two images is correlated and $C_3$ is the image structure constant, and the value calculated by the formula is the ratio of the brightness covariance between the two images to the product of their individual brightness standard deviations,

$$s(x, y) = \frac{\sigma_{xy} + C_3}{\sigma_x \sigma_y + C_3}$$

The image structure constant $C_3$ is calculated from the image contrast constant $C_2$ by the following formula,

$$C_3 = \frac{C_2}{2}$$

Finally, the SSIM value is calculated by the following formula, which combines the average brightness of the two images, the image contrast, and the image structure index; the weights a, b, and c are all set to 1 so that each factor contributes equally when the similarity between the original image and the comparison image is compared,

$$\mathrm{SSIM}(x, y) = I(x, y)^{a} \cdot c(x, y)^{b} \cdot s(x, y)^{c}$$

Device resource-based adaptive frame extraction and streaming control system for blocking harmful video in mobile terminal.

According to claim 1,
When battery consumption of the user terminal increases and the terminal must prepare to enter the power saving mode (20%), the image resizing size used in the similarity comparison process, whose CPU load fluctuates relatively little, and the SSIM threshold used as the similarity criterion are adjusted; the image resizing size and SSIM threshold are 200 by 200 and 0.8, respectively, when the battery is sufficient, and 100 by 100 and 0.7 otherwise; and reducing the image resizing size lowers the CPU load required for similarity comparison while lowering the SSIM threshold reduces the number of extracted frames and thereby the load caused by harmfulness analysis, in the terminal resource-based adaptive frame extraction and streaming control system for blocking harmful videos in a mobile terminal.

delete

According to claim 1,
When the battery state of the user terminal is in ultra power saving mode (10-0%), the frame extraction interval and similarity comparison are determined for each harmfulness stage; when the terminal has entered ultra power saving mode, the image resizing size and SSIM threshold are kept at the values of the previous stage, the frame extraction interval is increased to three times the reference frame extraction interval, and similarity comparison is applied uniformly to all harmfulness sections to minimize frame extraction, which minimizes the number of times the harmfulness analysis engine operates and thereby the load on terminal resources; and the multiplier applied to the reference frame extraction interval is a constant that the service provider can set according to the environment, set to 2 and 3, respectively, in the proposed system, in the terminal resource-based adaptive frame extraction and streaming control system for blocking harmful videos in a mobile terminal.

In a terminal resource (battery state)-based adaptive frame extraction and streaming control system including a server and a user terminal having an adaptive frame extraction module, a deep learning-based harmfulness analysis engine, and a streaming control module that performs mosaic processing on harmful videos,
The terminal resources required to block harmful videos on the user terminal include factors related to the processing capacity of the terminal and, as the result of responding to harmful videos, the number of extracted frames and the time required for harmfulness analysis,
Among the terminal resources required to block harmful videos on the user terminal, the factors related to the processing capacity of the terminal include CPU occupancy and battery consumption,
The frames of the video segment received from the server are extracted in real time and transmitted to the harmfulness analysis engine, and the harmfulness analysis engine transmits the harmfulness analysis result for each frame to the harmfulness change determination module to determine the harmfulness change of the video being played. The harmfulness determination result is used, together with the change in terminal resources required for harmfulness analysis, to adjust the frame extraction method, and the terminal resource characteristic that is the criterion for adjusting the frame extraction method is the state of the remaining battery of the user terminal,
The state of the remaining battery of the user terminal is divided into a state determined to be sufficient according to the specifications of the terminal (40%), a state that is the criterion for power saving mode (20%), and a state that is the criterion for ultra power saving mode (10%), and the method operates on the basis of terminal resource-based adaptive frame extraction for blocking harmful videos in a mobile terminal and HTTP adaptive streaming that adaptively determines the bit rate of video segments,
The adaptive frame extraction module extracts video frames excluding duplicate video frames to increase the efficiency of video harmfulness analysis, calculates the SSIM value for two adjacent video frames, and compares the calculated SSIM value with a specific threshold to determine the degree of similarity,
When the battery state of the user terminal enters the power saving mode (20%), the image resizing size and the SSIM threshold adjusted in the previous stage are maintained and the frame extraction interval is increased to twice the reference frame extraction interval, so that the number of times the harmfulness analysis engine operates is reduced and the load on terminal resources is reduced, in a terminal resource-based adaptive frame extraction and streaming control method for blocking harmful videos in a mobile terminal.

delete

According to claim 12,
When the remaining battery of the user terminal is 40% or more, an adaptive frame extraction algorithm based on harmfulness changes and inter-frame similarity comparison is applied,
When the user terminal needs to prepare for entering power saving mode, rapid battery consumption is prepared for by adjusting the image resizing size and the SSIM threshold, which have little effect on CPU load in the process of blocking harmful videos,
When the remaining battery of the user terminal is in power saving mode or ultra power saving mode, frame extraction interval control and similarity comparison are additionally applied to reduce the load on the terminal, in the terminal resource-based adaptive frame extraction and streaming control method for blocking harmful videos in a mobile terminal.

According to claim 12,
In HTTP adaptive streaming, the server encodes and stores a video as segments having a fixed playback length and different bit rates, information about the video segments is stored in a Media Presentation Description (MPD) file and sent to the client after streaming starts, and the client determines the bit rate of the video segment to be requested next using the segment information in the received MPD file and the measured network state, in the terminal resource-based adaptive frame extraction and streaming control method for blocking harmful videos in a mobile terminal.
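The claim does not say which rule the client uses to pick the next bit rate. As an illustration only, the sketch below assumes a common throughput-based heuristic: choose the highest bit rate listed in the MPD that fits within the measured throughput scaled by a safety factor. The function name `select_bitrate` and the `safety` parameter are hypothetical.

```python
from typing import Sequence


def select_bitrate(available_bps: Sequence[int],
                   measured_throughput_bps: float,
                   safety: float = 0.8) -> int:
    """Pick the bit rate for the next segment request.

    available_bps: bit rates advertised for the segments in the MPD file.
    measured_throughput_bps: network throughput measured by the client.
    safety: assumed margin so the choice stays below the measured throughput.
    """
    if not available_bps:
        raise ValueError("the MPD must advertise at least one bit rate")
    budget = measured_throughput_bps * safety
    candidates = [rate for rate in available_bps if rate <= budget]
    # Fall back to the lowest bit rate when nothing fits the budget.
    return max(candidates) if candidates else min(available_bps)
```

For example, with bit rates of 500 kbps, 1 Mbps, and 2.5 Mbps and a measured throughput of 1.5 Mbps, the budget is 1.2 Mbps and the 1 Mbps representation is requested.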

According to claim 12,
The deep learning-based harmfulness analysis engine, which performs harmfulness analysis on frames extracted in the terminal resource-based adaptive frame extraction and streaming control system for blocking harmful videos in a mobile terminal, uses a Convolutional Neural Network (CNN) model,
The harmfulness analysis engine is a harmful image classification engine obtained by training a CNN-based model on a data set classified into harmful and harmless images, each image in the training set being labeled in advance as harmful or harmless. When an arbitrary image is input, the harmfulness analysis engine extracts the scene information of the image and classifies the image as harmful or harmless using the trained model, and the scene information of the image is the distribution of pixel RGB values, the motion detection results in the scene, and the HSV (Hue, Saturation, Value) values,
The frames transmitted to the harmfulness analysis engine are representative frames of the scenes whose harmfulness is to be determined, the scenes being divided on the basis of non-overlapping frames through similarity comparison, and the similarity comparison calculation is performed not inside the harmfulness analysis engine but in the frame extraction algorithm, in the terminal resource-based adaptive frame extraction and streaming control method for blocking harmful videos in a mobile terminal.
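The claim names the kinds of scene information used (RGB distribution, motion detection, HSV values) but not how they are computed. The sketch below is one assumed way to derive two of them from raw pixels with the Python standard library. The function `scene_features`, the coarse histogram binning, and the use of mean saturation and mean value are all illustrative choices, and motion detection is left out.

```python
import colorsys
from collections import Counter
from typing import Dict, Iterable, Tuple

Pixel = Tuple[int, int, int]  # (R, G, B), each in 0..255


def scene_features(pixels: Iterable[Pixel], bins: int = 4) -> Dict[str, object]:
    """Summarize a frame by a coarse RGB histogram and mean HSV saturation/value."""
    pixel_list = list(pixels)
    if not pixel_list:
        raise ValueError("at least one pixel is required")

    # Coarse RGB distribution: each channel is quantized into `bins` levels.
    histogram = Counter(
        (r * bins // 256, g * bins // 256, b * bins // 256)
        for r, g, b in pixel_list
    )

    # Convert every pixel to HSV (colorsys expects channels in 0..1).
    hsv = [colorsys.rgb_to_hsv(r / 255, g / 255, b / 255) for r, g, b in pixel_list]
    count = len(pixel_list)
    return {
        "rgb_histogram": dict(histogram),
        "mean_saturation": sum(s for _, s, _ in hsv) / count,
        "mean_value": sum(v for _, _, v in hsv) / count,
    }
```

In the claimed system such information is extracted by the harmfulness analysis engine itself, so a hand-written summary like this only shows what the named quantities look like.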

According to claim 12,
The user terminal resources (remaining battery, CPU load) required to analyze the harmfulness of the video being played in real time affect the load of harmfulness analysis, and frame extraction that takes into account the processing capacity of the user terminal used for harmfulness analysis is performed by adjusting, based on the remaining battery of the terminal playing the video, the factors that affect the load on terminal resources. According to battery consumption, the remaining battery state of the user terminal is classified into the case where it is necessary to prepare for power saving mode as the battery decreases (40~20%), the case of entering power saving mode (20~10%), and the case of ultra power saving mode (10~0%). The criteria for determining the battery state of the user terminal are values selectable by the service provider according to the specifications of the terminal, and 40%, 20%, and 10% are set as the criteria, in the terminal resource-based adaptive frame extraction and streaming control method for blocking harmful videos in a mobile terminal.
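The battery classification in this claim can be sketched as a small Python function. The enum `BatteryStage` and the function `classify_battery` are hypothetical names, and assigning an exact boundary value (for example exactly 20%) to the higher stage is an assumption, since the claim does not say which side the boundaries fall on.

```python
from enum import Enum
from typing import Tuple


class BatteryStage(Enum):
    SUFFICIENT = "sufficient"                      # 40% or more
    PREPARE_POWER_SAVING = "prepare_power_saving"  # 40~20%
    POWER_SAVING = "power_saving"                  # 20~10%
    ULTRA_POWER_SAVING = "ultra_power_saving"      # 10~0%


def classify_battery(percent: float,
                     criteria: Tuple[float, float, float] = (40.0, 20.0, 10.0)
                     ) -> BatteryStage:
    """Map the remaining battery percentage to a stage.

    criteria holds the (sufficient, power saving, ultra power saving) limits,
    which the service provider may change to suit the terminal.
    """
    sufficient, power_saving, ultra = criteria
    if percent >= sufficient:
        return BatteryStage.SUFFICIENT
    if percent >= power_saving:
        return BatteryStage.PREPARE_POWER_SAVING
    if percent >= ultra:
        return BatteryStage.POWER_SAVING
    return BatteryStage.ULTRA_POWER_SAVING
```

Passing the criteria as a parameter reflects the statement that they are selectable by the service provider according to the specifications of the terminal.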

According to claim 12,
When a video stream is transmitted from the server and the remaining battery of the user terminal is in a sufficient state, video frames are extracted according to an HTTP-based adaptive frame extraction algorithm and the extracted frames are transmitted to the deep learning-based harmfulness analysis engine, and the harmfulness analysis engine performs harmfulness analysis on the input video frames and transmits the result, in the form of a harmfulness probability for each frame, to the video streaming blocking control module and the adaptive frame extraction module,
The adaptive frame extraction module determines the change in the harmfulness of the video from the harmfulness analysis result and determines the frame extraction interval and the use of similarity comparison according to that change. To minimize the load by preventing the extraction of duplicate frames, the proposed system extracts frames by applying a fixed interval and similarity comparison until the video is determined to be harmful. In extraction through inter-frame similarity comparison, the SSIM value is calculated, a frame whose value exceeds a preset threshold is determined to be a similar frame and excluded from the extraction targets, and a frame whose value is lower than the threshold is determined to be a frame of a different scene and included in the extraction targets, in the terminal resource-based adaptive frame extraction and streaming control method for blocking harmful videos in a mobile terminal.
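The fixed-interval extraction with similarity filtering described in this claim can be sketched as follows. This is a simplified illustration: the function `extract_frames` is hypothetical, the similarity measure is passed in as a callable so that any SSIM implementation can be used, and comparing each sampled frame with the previously sampled frame is one assumed reading of "adjacent" frames.

```python
from typing import Callable, List, Sequence, TypeVar

Frame = TypeVar("Frame")


def extract_frames(frames: Sequence[Frame],
                   interval: int,
                   threshold: float,
                   similarity: Callable[[Frame, Frame], float]) -> List[int]:
    """Return the indices of frames to pass to the harmfulness analysis engine.

    Frames are sampled every `interval` frames. A sampled frame is dropped when
    its similarity to the previously sampled frame exceeds `threshold`.
    """
    if interval < 1:
        raise ValueError("interval must be at least 1")

    selected: List[int] = []
    previous = None  # index of the previously sampled frame
    for index in range(0, len(frames), interval):
        is_duplicate = (
            previous is not None
            and similarity(frames[previous], frames[index]) > threshold
        )
        if not is_duplicate:
            selected.append(index)  # frame of a different scene: extract it
        previous = index
    return selected
```

With a toy similarity such as `lambda a, b: 1 - abs(a - b)` on scalar "frames", `extract_frames([0.0, 0.05, 0.9, 0.95, 0.1], 1, 0.8, ...)` keeps only the frames at which the content changes.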

According to claim 12,
When the similarity between frames is compared by calculating the SSIM (Structural Similarity) value for frame extraction based on similarity comparison between video frames, the SSIM value is calculated using the brightness information and distortion information for each pixel of the two images to be compared, and the video stream characteristic-based streaming control system compares the similarity between extracted video frames based on the SSIM value,
The SSIM value for the 10th and 11th frames among the extracted video frames is small, 0.4795, and when the SSIM value is small the video stream characteristic-based streaming control system determines that the frames belong to different scenes, whereas the SSIM value for the 11th and 12th video frames is large, 0.8258, and in this case it is determined that the frames belong to the same scene,
To extract video frames based on similarity comparison, the video stream characteristic-based streaming control system distinguishes the multiple scenes included in a video segment, and the criterion for a scene change is the point where the SSIM value changes greatly when the similarity between adjacent video frames is compared. The scene containing the first extracted video frame can be viewed as a scene in which frames similar to the extracted frame are gathered, while the scene containing the second extracted video frame can be viewed as a scene starting from a frame whose SSIM value changes significantly compared with the scene containing the first extracted video frame,
When x is the original image and y is the comparison image, in order to calculate the SSIM value, the average brightness $l(x, y)$ of the pixels of the two images is first calculated as follows, where $\mu_x$ is the average brightness of the original image x, $\mu_y$ is the average brightness of the comparison image y, and $C_1$ is the image brightness constant, and the value calculated by this formula is a normalized value that represents the average brightness of the two images by combining the average brightness of the original image and the comparison image,

$$l(x, y) = \frac{2\mu_x\mu_y + C_1}{\mu_x^2 + \mu_y^2 + C_1}$$

After the average brightness is calculated, the image contrast $c(x, y)$ is calculated as in the following formula, taking the brightness standard deviation of each image as its contrast value, where $\sigma_x$ is the brightness standard deviation of the original image x, $\sigma_y$ is the brightness standard deviation of the comparison image y, and $C_2$ is the image contrast constant, and the value calculated by this formula is a normalized value that represents the contrast of the two images by combining the brightness standard deviations of the original image and the comparison image,

$$c(x, y) = \frac{2\sigma_x\sigma_y + C_2}{\sigma_x^2 + \sigma_y^2 + C_2}$$

After the image contrast is calculated, the image structure index $s(x, y)$ is calculated using the brightness covariance $\sigma_{xy}$ between the two images and the brightness standard deviations, where the brightness covariance indicates how strongly the brightness information of the two images is correlated and $C_3$ is the image structure constant, and the value calculated by this formula is the ratio of the brightness covariance between the two images to the brightness standard deviations of the two images,

$$s(x, y) = \frac{\sigma_{xy} + C_3}{\sigma_x\sigma_y + C_3}$$

The image structure constant $C_3$ is calculated from the image contrast constant $C_2$ as in the following formula,

$$C_3 = \frac{C_2}{2}$$

Finally, the SSIM value is calculated as follows by combining the average brightness of the two images, the contrast value of the images, and the image structure index, and is used to compare the similarity between the original image and the comparison image,

$$\mathrm{SSIM}(x, y) = [l(x, y)]^{a} \cdot [c(x, y)]^{b} \cdot [s(x, y)]^{c}$$

Since the weights a, b, and c are all set to 1, the SSIM value is calculated as,

$$\mathrm{SSIM}(x, y) = \frac{(2\mu_x\mu_y + C_1)(2\sigma_{xy} + C_2)}{(\mu_x^2 + \mu_y^2 + C_1)(\sigma_x^2 + \sigma_y^2 + C_2)}$$

A terminal resource-based adaptive frame extraction and streaming control method for blocking harmful videos in a mobile terminal.
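The SSIM calculation and the scene decision in this claim can be illustrated with a short pure-Python sketch. Several points are assumptions that the claim does not state: the statistics are taken over the whole image as a single window, population variance and covariance are used, and the constants follow the widely used choice C1 = (0.01 L)^2, C2 = (0.03 L)^2, C3 = C2 / 2 with L = 255. The functions `ssim` and `split_scenes` are hypothetical names.

```python
from math import sqrt
from typing import List, Sequence


def ssim(x: Sequence[float], y: Sequence[float],
         dynamic_range: float = 255.0, k1: float = 0.01, k2: float = 0.03) -> float:
    """SSIM of two equally sized grayscale images given as flat pixel lists.

    Weights a, b, c are all 1, so the result is luminance * contrast * structure.
    """
    if len(x) != len(y) or not x:
        raise ValueError("images must be non-empty and of equal size")
    n = len(x)
    c1 = (k1 * dynamic_range) ** 2   # image brightness constant (assumed value)
    c2 = (k2 * dynamic_range) ** 2   # image contrast constant (assumed value)
    c3 = c2 / 2                      # image structure constant

    mu_x, mu_y = sum(x) / n, sum(y) / n
    var_x = sum((p - mu_x) ** 2 for p in x) / n
    var_y = sum((q - mu_y) ** 2 for q in y) / n
    cov_xy = sum((p - mu_x) * (q - mu_y) for p, q in zip(x, y)) / n
    sd_x, sd_y = sqrt(var_x), sqrt(var_y)

    luminance = (2 * mu_x * mu_y + c1) / (mu_x ** 2 + mu_y ** 2 + c1)
    contrast = (2 * sd_x * sd_y + c2) / (var_x + var_y + c2)
    structure = (cov_xy + c3) / (sd_x * sd_y + c3)
    return luminance * contrast * structure


def split_scenes(adjacent_ssim: Sequence[float], threshold: float,
                 first_index: int = 0) -> List[List[int]]:
    """Group consecutive frames into scenes.

    adjacent_ssim[i] is the SSIM between frame first_index + i and the next
    frame. A value above the threshold keeps both frames in the same scene;
    otherwise a new scene starts at the later frame.
    """
    scenes = [[first_index]]
    for offset, value in enumerate(adjacent_ssim, start=1):
        frame = first_index + offset
        if value > threshold:
            scenes[-1].append(frame)
        else:
            scenes.append([frame])
    return scenes
```

Using the values from the claim with a threshold of 0.8, `split_scenes([0.4795, 0.8258], 0.8, first_index=10)` places frame 10 in one scene and frames 11 and 12 together in the next. Practical SSIM implementations usually average the index over local windows rather than using one global window.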

According to claim 12,
When the battery consumption of the user terminal increases and it is necessary to prepare for entering the power saving mode (20%), the image resizing size used in the similarity comparison process, which causes relatively small fluctuations in CPU load, and the SSIM threshold that is the criterion for similarity comparison are adjusted. The image resizing size and the SSIM threshold are 200 by 200 and 0.8, respectively, when the battery is sufficient, and 100 by 100 and 0.7, respectively, otherwise. The required CPU load can thereby be reduced, and reducing the SSIM threshold reduces the number of extracted frames and thus the load caused by harmfulness analysis, in the terminal resource-based adaptive frame extraction and streaming control method for blocking harmful videos in a mobile terminal.

delete

According to claim 12,
When the battery state of the user terminal is in ultra power saving mode (10-0%), the frame extraction interval and the use of similarity comparison are determined according to the harmfulness stage. When the terminal enters ultra power saving mode, the image resizing size and the SSIM threshold keep the values of the previous stage, the frame extraction interval is increased to 3 times the reference frame extraction interval, and similarity comparison is applied uniformly to all harmfulness sections to minimize frame extraction, so that the number of times the harmfulness analysis engine operates is minimized and the load on terminal resources is minimized. The multiplier for increasing the reference frame extraction interval is a constant that can be set by the service provider according to the environment, in the terminal resource-based adaptive frame extraction and streaming control method for blocking harmful videos in a mobile terminal.
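The interval scaling described in the claims (twice the reference interval in power saving mode, three times in ultra power saving mode) can be sketched as below. The function `extraction_interval` is a hypothetical name, expressing the interval in frames is an assumption, and treating exactly 20% and exactly 10% as belonging to the higher stage is also an assumption.

```python
def extraction_interval(reference_interval: int,
                        battery_percent: float,
                        power_saving_multiplier: int = 2,
                        ultra_multiplier: int = 3,
                        power_saving_percent: float = 20.0,
                        ultra_percent: float = 10.0) -> int:
    """Return the frame extraction interval for the current remaining battery.

    The multipliers are constants that the service provider may set according
    to the environment; 2 and 3 are the values given in the claims.
    """
    if reference_interval < 1:
        raise ValueError("reference_interval must be at least 1")
    if battery_percent < ultra_percent:
        return reference_interval * ultra_multiplier         # ultra power saving mode
    if battery_percent < power_saving_percent:
        return reference_interval * power_saving_multiplier  # power saving mode
    return reference_interval                                # no interval scaling
```

A longer interval means fewer sampled frames per segment, which directly lowers how often the harmfulness analysis engine has to run.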