KR20240075985A

KR20240075985A - Vision system based on deep learning for video object de-identification

Info

Publication number: KR20240075985A
Application number: KR1020220157953A
Authority: KR
Inventors: 추연승; 박용석; 김현식
Original assignee: 한국전자기술연구원
Priority date: 2022-11-23
Filing date: 2022-11-23
Publication date: 2024-05-30

Abstract

A method and system for video object de-identification are provided. The de-identification method according to an embodiment of the present invention divides the frames of a video sequence, receives a selection of a de-identification target in the initial frame among the divided frames, and de-identifies the target selected in the initial frame in the frames that follow it. This prevents the missed and false detections that occur during manual de-identification by an editor, and reduces editor fatigue.

Description

Vision system based on deep learning for video object de-identification

The present invention relates to deep learning-based computer vision technology, and more specifically to a method of detecting and tracking a specific object in a video sequence and de-identifying it.

Since the start of the 21st century, the development of broadcast media such as TV and communication media such as the Internet has made video much easier to access. Inevitably, the drawbacks have also appeared in extreme forms: illegal footage such as deepfakes and hidden-camera recordings has begun to appear, and infringements of personal information, copyright, and portrait rights involving vehicle license plates, people's faces, and the like have become more frequent. The need for de-identification technology to counter this has therefore grown considerably.

In particular, producers of media and content must block specific information through a de-identification process before transmitting video over broadcast or communication media. However, inspecting a video sequence frame by frame is highly fatiguing and leaves ample room for mistakes caused by missed or false detections. It is therefore necessary to automate both the recognition of object information, by applying vision techniques such as object detection, region segmentation, re-identification, and tracking, and the de-identification performed on the basis of the recognized information.

As mentioned above, FIG. 1 shows an example in which the region of a specific person who appears unintentionally is detected and de-identified. De-identifying specific information in this way is an indispensable step.

In addition, in media and content video, the information in the image can change greatly when the scene changes. Such scene changes fall broadly into two types. The first includes fade-ins, fade-outs, and cuts (cut-scene), which result from editing together footage shot in different environments. The second is a scene change caused by a change in camera pose: during shooting, the camera's field of view (FOV) can change as the position of the object of interest changes. FIG. 2 shows examples of these situations. With such scene changes, a conventional tracking process may mistrack, that is, follow the wrong object. Re-identification and object tracking then have to be repeated for each scene change, which is again cumbersome and tiring work.

Accordingly, a de-identification method that is both accurate and fully automated is needed.

The present invention was devised to solve the problems described above, and its object is to provide an automated de-identification method and system for a specific object in a video sequence that sequentially apply the computer vision techniques of scene change detection, object detection, tracking, and region segmentation.

To achieve the above object, a video object de-identification method according to an embodiment of the present invention includes: dividing the frames of a video sequence; receiving a selection of a de-identification target in an initial frame among the divided frames; and de-identifying, in frames after the initial frame, the de-identification target selected in the initial frame.

The dividing step may divide the frames by detecting scene transitions in the video sequence.

The dividing step may divide the frames through scene transition detection that uses at least one of keypoint comparison, color histogram comparison, and sequence feature comparison on the frames of the video sequence.

A first score may be calculated through keypoint comparison of the frames of the video sequence, a second score through color histogram comparison, and a third score through sequence feature comparison, and scene transitions may be detected on the basis of a final score that integrates the first, second, and third scores.

The selecting step may include: detecting objects in the initial frame; segmenting the regions in which the detected objects appear; and receiving a selection of at least one of the detected objects as the de-identification target.

In the step of receiving the selection, the object to be de-identified may be selected by the user through a user interface on which the detected objects are displayed.

The de-identifying step may include: tracking, in the current frame, the de-identification target of the previous frame; segmenting the region in which the tracked object appears; and de-identifying the segmented region.

The de-identifying step may further include: if the de-identification target of the previous frame is not tracked in the current frame, detecting objects in the current frame; determining the de-identification target among the detected objects; tracking the determined de-identification target; segmenting the region in which the tracked object appears; and de-identifying the segmented region.

The de-identifying step may de-identify the segmented region through any one of blurring, erasing, and face generation and synthesis.

According to another aspect of the present invention, a video object de-identification system is provided, comprising: a processor that divides the frames of a video sequence, receives a selection of a de-identification target in an initial frame among the divided frames, and de-identifies, in frames after the initial frame, the de-identification target selected in the initial frame; and a storage unit that provides the storage space the processor requires.

According to yet another aspect of the present invention, a video object de-identification method is provided, comprising: receiving a selection of a de-identification target in an initial frame among the frames constituting a video sequence; and de-identifying, in frames after the initial frame, the de-identification target selected in the initial frame.

According to yet another aspect of the present invention, a video object de-identification system is provided, comprising: an input unit that receives a selection of a de-identification target in an initial frame among the frames constituting a video sequence; and a processor that de-identifies, in frames after the initial frame, the de-identification target selected in the initial frame.

As described above, according to embodiments of the present invention, sequentially applying the computer vision techniques of scene change detection, object detection, tracking, and region segmentation enables automated de-identification of a specific object in a video sequence. This prevents the missed and false detections that occur during manual de-identification by an editor and reduces editor fatigue. In particular, de-identification becomes easy to apply for everyone from solo creators, who have limited access to editing technology and are short of hands, to broadcasters and content producers that edit a wide variety of media.

Furthermore, according to embodiments of the present invention, a consistent de-identification process can be applied to large volumes of media and video data, and the de-identification target can be automatically identified and de-identified across scene changes, so no separate editing is needed for each scene, which further improves user convenience.

FIG. 1 is an example of person de-identification,
FIG. 2 is an example of scene changes,
FIG. 3 is a flowchart provided to explain a video object de-identification method according to an embodiment of the present invention,
FIG. 4 is a detailed flowchart of step S110 of FIG. 3,
FIG. 5 is an example of keypoint comparison,
FIG. 6 is a detailed flowchart of step S120 of FIG. 3,
FIG. 7 is an example of a user interface for selecting a de-identification target,
FIG. 8 is a detailed flowchart of step S130 of FIG. 3, and
FIG. 9 is a diagram showing the configuration of a video object de-identification system according to another embodiment of the present invention.

Hereinafter, the present invention is described in more detail with reference to the drawings.

Embodiments of the present invention present an accurate and automated video object de-identification method and system.

This is a technique that detects scene changes in an input video sequence, receives a selection of a specific object to de-identify, and de-identifies the region of that object throughout the video so that it can no longer be identified.

Unlike the conventional approach in which an editor de-identifies video manually, frame by frame, the system itself recognizes and tracks the object information specified in advance by the user and performs the de-identification. It detects scene changes, divides the video into scenes accordingly, re-identifies the target in each scene, and de-identifies the identified object.

FIG. 3 is a flowchart provided to explain a video object de-identification method according to an embodiment of the present invention.

For video object de-identification, scene transitions in the video sequence are first detected and the frames are divided accordingly (S110). That is, the video sequence is divided into shots or scenes. Step S110 is described in detail below with reference to FIG. 4.

Next, a selection of the de-identification target is received in the initial (first) frame among the frames divided in step S110, for example among the frames that make up the same shot (S120). The user makes this selection by choosing, from the objects detected in the initial frame, those that need to be de-identified. Step S120 is described in detail below with reference to FIG. 6.

Then, in the frames after the initial frame among the frames divided in step S110, the de-identification target selected in step S120 is tracked and de-identified (S130). Step S130 is described in detail below with reference to FIG. 8.

The video sequence dividing step (S110) is described in detail below with reference to FIG. 4. FIG. 4 is a detailed flowchart of step S110 of FIG. 3.

To divide the frames of the video sequence into units of the same scene, the frames at which the scene changes must be detected, and this requires comparing adjacent frames (S111).

First, keypoints are extracted and tracked across adjacent frames of the video sequence, and the number of valid keypoints is recorded as a score (S112). FIG. 5 illustrates the keypoint comparison between adjacent frames. The red points are valid keypoints that are also found in the next frame, and they count toward the score. The blue points are not tracked into the next frame; their tracking ends in the current frame, so they are not valid keypoints and do not count toward the score.

When the number of valid keypoints being tracked falls below a preset threshold, tracking is stopped and the scores from the tracking start frame to the tracking stop frame are normalized. New keypoints are then extracted in the tracking stop frame and tracking resumes. This process is repeated until the last frame of the video sequence.
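The keypoint scoring of step S112 can be sketched roughly as follows. This is a minimal illustration, not the implementation described in the patent: the function name `keypoint_scores`, the `extract` and `track` callables, and the toy frames (sets of integer feature IDs) are all hypothetical, and dividing by the largest count in each tracking run is only one assumed way to normalize, since the text does not specify the normalization.

```python
from typing import Callable, List, Sequence, Set


def keypoint_scores(
    frames: Sequence[Set[int]],
    extract: Callable[[Set[int]], Set[int]],
    track: Callable[[Set[int], Set[int]], Set[int]],
    threshold: int,
) -> List[float]:
    """Score each frame by its surviving keypoints, normalized per tracking run."""
    scores = [0.0] * len(frames)
    if not frames:
        return scores

    def normalise(start: int, raw: List[int]) -> None:
        # Assumption: scale each run by its largest keypoint count.
        peak = max(raw, default=0) or 1
        for offset, count in enumerate(raw):
            scores[start + offset] = count / peak

    points = extract(frames[0])            # keypoints at the tracking start frame
    start, raw = 0, [len(points)]
    for i in range(1, len(frames)):
        points = track(points, frames[i])  # keypoints still found in frame i
        raw.append(len(points))
        if len(points) < threshold:        # too few survivors: stop tracking
            normalise(start, raw)          # covers start frame .. stop frame
            points = extract(frames[i])    # re-extract in the stop frame
            start, raw = i + 1, []
    normalise(start, raw)                  # close the final run
    return scores


# Toy data: each frame is the set of feature IDs visible in it.
toy_frames = [{1, 2, 3, 4}, {1, 2, 3, 4}, {1, 2, 3}, {7, 8, 9, 10}, {7, 8, 9, 10}]
toy_scores = keypoint_scores(
    toy_frames, extract=set, track=lambda pts, frame: pts & frame, threshold=2
)
print(toy_scores)  # [1.0, 1.0, 0.75, 0.0, 1.0]
```

In the toy run the score collapses to 0.0 at the fourth frame, where none of the tracked keypoints survive, which is the kind of drop a scene transition would be expected to produce. With real images, `extract` and `track` would be replaced by a keypoint detector and an optical-flow or descriptor-matching tracker.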

Next, the color histograms of adjacent frames of the video sequence are compared. The color histogram is normalized separately for each of the three channels R, G, and B, and the score is calculated by taking the differences from the previous frame and averaging them (S113). This process is likewise repeated until the last frame of the video sequence.
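A rough sketch of the color histogram score of step S113 is given here. The names `channel_histograms` and `histogram_scores`, the bin count of 8, and the choice of half the summed absolute bin difference as the per-channel distance are assumptions made for illustration; the text only states that the per-channel histograms are normalized, differenced against the previous frame, and averaged. Frames are modeled as flat lists of `(R, G, B)` tuples with 8-bit values.

```python
from typing import List, Sequence, Tuple

Pixel = Tuple[int, int, int]  # (R, G, B), each in 0..255


def channel_histograms(pixels: Sequence[Pixel], bins: int = 8) -> List[List[float]]:
    """Return one normalized histogram per channel (each sums to 1)."""
    hists = [[0.0] * bins for _ in range(3)]
    for pixel in pixels:
        for channel in range(3):
            hists[channel][pixel[channel] * bins // 256] += 1.0
    total = len(pixels) or 1
    return [[count / total for count in hist] for hist in hists]


def histogram_scores(frames: Sequence[Sequence[Pixel]], bins: int = 8) -> List[float]:
    """Score each frame by its histogram difference from the previous frame."""
    scores = [0.0] * len(frames)
    previous = None
    for i, frame in enumerate(frames):
        current = channel_histograms(frame, bins)
        if previous is not None:
            # Assumed distance: half the L1 difference, so each channel is in [0, 1].
            per_channel = [
                0.5 * sum(abs(a - b) for a, b in zip(cur, prev))
                for cur, prev in zip(current, previous)
            ]
            scores[i] = sum(per_channel) / 3.0  # average over R, G, B
        previous = current
    return scores


red = [(255, 0, 0)] * 4
blue = [(0, 0, 255)] * 4
toy_hist_scores = histogram_scores([red, red, blue])
```

For the toy input, the score stays at 0 between the two identical red frames and rises to about 0.67 at the red-to-blue change, because the R and B histograms differ completely while the G histogram is unchanged.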

In addition, as many frames as a preset frame selection parameter n are selected, and a CNN is used to extract and compare sequence features to calculate a score (S114). The CNN is a network trained to extract sequence features for video scene transition detection. This process is also repeated until the last frame of the video sequence.
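The comparison part of step S114 might look like the sketch below. The trained CNN is not reproduced: `extract` is a hypothetical stand-in callable that maps a window of n frames to a feature vector, and the toy extractor `mean_vector` simply averages the frames. Comparing the window before a frame with the window starting at it, and using cosine distance as the score, are assumptions, since the text does not say how the features are compared.

```python
import math
from typing import Callable, List, Sequence

Vector = Sequence[float]


def cosine_distance(a: Vector, b: Vector) -> float:
    """Return 1 - cosine similarity, or 0.0 when either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norm if norm else 0.0


def sequence_scores(
    frames: Sequence[Vector],
    n: int,
    extract: Callable[[Sequence[Vector]], Vector],
) -> List[float]:
    """Compare the n frames before index i with the n frames starting at i."""
    scores = [0.0] * len(frames)
    for i in range(n, len(frames) - n + 1):
        before = extract(frames[i - n:i])
        after = extract(frames[i:i + n])
        scores[i] = cosine_distance(before, after)
    return scores


def mean_vector(window: Sequence[Vector]) -> List[float]:
    """Toy stand-in for the CNN: average the frames in the window."""
    return [sum(column) / len(window) for column in zip(*window)]


toy_frames = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]]
toy_seq_scores = sequence_scores(toy_frames, n=2, extract=mean_vector)
```

In this toy case only index 2 has a full window on both sides, and the two windows have orthogonal features, so its score is the maximum distance of 1.0.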

Finally, the scores calculated through the keypoint comparison, the color histogram comparison, and the sequence feature comparison are integrated into a final score (S115). Scene changes are then detected by determining whether the final score departs from its local minima and maxima (S116), and the frames are divided accordingly (S117).
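Steps S115 to S117 can be illustrated with the sketch below, under several stated assumptions. The text does not give the integration formula or the exact extremum test, so the weighted average in `fuse_scores`, the inversion of the keypoint score (treated here as a similarity, so that all three terms grow when the scene changes), the `min_peak` floor, and the local-maximum rule in `detect_scene_changes` are all illustrative choices, and every function name is hypothetical.

```python
from typing import List, Sequence, Tuple


def fuse_scores(
    keypoint: Sequence[float],
    histogram: Sequence[float],
    sequence: Sequence[float],
    weights: Tuple[float, float, float] = (1.0, 1.0, 1.0),
) -> List[float]:
    """Combine three per-frame scores into one change score in [0, 1]."""
    w_k, w_h, w_s = weights
    total = w_k + w_h + w_s
    return [
        # Assumption: a high keypoint score means "same scene", so invert it.
        (w_k * (1.0 - k) + w_h * h + w_s * s) / total
        for k, h, s in zip(keypoint, histogram, sequence)
    ]


def detect_scene_changes(final: Sequence[float], min_peak: float = 0.5) -> List[int]:
    """Return the indices where the final score is a sufficiently high local maximum."""
    changes = []
    for i in range(1, len(final)):
        left = final[i - 1]
        right = final[i + 1] if i + 1 < len(final) else float("-inf")
        if final[i] >= min_peak and final[i] > left and final[i] >= right:
            changes.append(i)
    return changes


def split_frames(frame_count: int, changes: Sequence[int]) -> List[range]:
    """Divide frame indices into runs that each start at a detected change."""
    bounds = [0, *changes, frame_count]
    return [range(a, b) for a, b in zip(bounds, bounds[1:]) if a < b]


final = fuse_scores([1.0, 1.0, 0.0, 1.0], [0.0, 0.0, 1.0, 0.0], [0.0, 0.0, 1.0, 0.0])
changes = detect_scene_changes(final)
shots = split_frames(len(final), changes)
print(changes, shots)  # [2] [range(0, 2), range(2, 4)]
```

Each resulting range is one group of frames, and its first index plays the role of the initial frame in which the de-identification target is selected.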

The de-identification target selection step (S120) is described in detail below with reference to FIG. 6. FIG. 6 is a detailed flowchart of step S120 of FIG. 3.

To select the de-identification target, objects are first detected in the initial (first) frame among the divided frames (S121), and the regions in which the detected objects appear are segmented (S122). Next, reflecting the segmentation result of step S122, a selection of the de-identification target from among the detected objects is received from the user (S123).

The selected de-identification target is subsequently tracked in step S130.

FIG. 7 illustrates a user interface for selecting the de-identification target in step S123. The left image of FIG. 7 shows the case in which the user has not selected an object, and the right image shows the case in which the mouse pointer has been placed over an object to select it as the de-identification target, so that the region of that object is highlighted.
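The pointer-based selection of step S123 amounts to a hit test against the segmented regions. The sketch below assumes each region is a set of `(x, y)` pixel coordinates keyed by an object ID; the names `object_under_pointer` and `select_target` and the IDs `person_1` and `person_2` are hypothetical.

```python
from typing import Dict, Optional, Set, Tuple

Point = Tuple[int, int]


def object_under_pointer(pointer: Point, regions: Dict[str, Set[Point]]) -> Optional[str]:
    """Return the ID of the object whose region contains the pointer, if any."""
    for object_id, region in regions.items():
        if pointer in region:
            return object_id
    return None


def select_target(selected: Set[str], pointer: Point, regions: Dict[str, Set[Point]]) -> Set[str]:
    """Add the object under the pointer to the set of de-identification targets."""
    hit = object_under_pointer(pointer, regions)
    return set(selected) if hit is None else set(selected) | {hit}


regions = {"person_1": {(0, 0), (0, 1)}, "person_2": {(5, 5)}}
hovered = object_under_pointer((0, 1), regions)   # region to highlight
targets = select_target(set(), (5, 5), regions)   # result of a click
```

A user interface would call the first function on pointer movement to decide which region to highlight and the second on a click to record the choice.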

The de-identification step (S130) is described in detail below with reference to FIG. 8. FIG. 8 is a detailed flowchart of step S130 of FIG. 3.

To de-identify the target object, in frames after the first frame the de-identification target that was tracked up to the previous frame continues to be tracked (S131). If tracking of the target object succeeds (S131-Yes), the object continues to be tracked (S132) and the region in which it appears is segmented (S133). The object region segmented in step S133 is then de-identified (S134), and step S131 is repeated for the next frame.

The de-identification in step S134 can be performed by blurring, erasing, or face generation and synthesis, as well as by other methods.
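Of the options listed, blurring is the simplest to illustrate. The sketch below applies a box blur only to the pixels of the segmented region and leaves the rest of the frame untouched. The grayscale list-of-lists image, the `(row, column)` region format, and the name `blur_region` are assumptions for the example; the patent does not prescribe a particular blur.

```python
from typing import List, Set, Tuple

Image = List[List[float]]  # grayscale image, indexed as image[row][col]


def blur_region(image: Image, region: Set[Tuple[int, int]], radius: int = 1) -> Image:
    """Return a copy of image with a box blur applied only inside region."""
    height, width = len(image), len(image[0])
    result = [row[:] for row in image]  # pixels outside the region are copied as-is
    for row, col in region:
        window = [
            image[r][c]
            for r in range(max(0, row - radius), min(height, row + radius + 1))
            for c in range(max(0, col - radius), min(width, col + radius + 1))
        ]
        result[row][col] = sum(window) / len(window)
    return result


image = [
    [0.0, 0.0, 0.0],
    [0.0, 9.0, 0.0],
    [0.0, 0.0, 0.0],
]
blurred = blur_region(image, {(1, 1)})
print(blurred[1][1])  # 1.0
```

A larger `radius`, or repeated application, removes more detail. Erasing could be sketched the same way by writing a constant value into every pixel of the region.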

If, on the other hand, tracking of the target object fails (S131-No), objects are detected again in the current frame (S135), and it is determined whether the de-identification target is among the detected objects (S136).

If the de-identification target is determined to be among the detected objects (S136-Yes), that object is tracked (S132), the region in which it appears is segmented (S133), and the segmented object region is de-identified (S134). Step S131 is then repeated for the next frame.

If, however, the de-identification target is determined not to be among the detected objects (S136-No), step S131 is repeated for the next frame.
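The control flow of steps S131 to S136 can be summarized in a short loop. Everything about the data model below is a toy assumption made so the example runs: a frame is a dictionary from object label to x position, the tracker succeeds only for small movements, label equality stands in for re-identification in S136, and segmentation and de-identification are collapsed into replacing the target's entry with a `"***"` entry. The function names are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

Frame = Dict[str, int]   # toy frame: object label -> x position
Track = Tuple[str, int]  # toy track state: (label, last known x position)


def track(frame: Frame, state: Track, max_step: int = 1) -> Optional[Track]:
    """S131: succeed only if the target is present and moved at most max_step."""
    label, x = state
    if label in frame and abs(frame[label] - x) <= max_step:
        return (label, frame[label])
    return None


def detect(frame: Frame) -> List[Track]:
    """S135: return every object found in the frame."""
    return list(frame.items())


def anonymise(frame: Frame, found: Track) -> Frame:
    """S133 and S134: replace the target's entry with a masked one."""
    hidden = {label: x for label, x in frame.items() if label != found[0]}
    hidden["***"] = found[1]
    return hidden


def deidentify(frames: List[Frame], target: Track) -> List[Frame]:
    output, state = [], target
    for frame in frames:
        found = track(frame, state)              # S131
        if found is None:                        # S131-No: detect again
            found = next(                        # S135, S136
                (obj for obj in detect(frame) if obj[0] == target[0]), None
            )
        if found is None:                        # S136-No: leave frame unchanged
            output.append(dict(frame))
            continue
        state = found                            # S132: keep tracking this object
        output.append(anonymise(frame, found))   # S133, S134
    return output


toy_frames = [{"bob": 0, "alice": 5}, {"alice": 5}, {"bob": 9, "alice": 5}]
toy_result = deidentify(toy_frames, target=("bob", 0))
```

In the toy run the target is tracked in the first frame, is absent from the second (which is passed through unchanged), and is recovered by re-detection in the third, where it has moved too far for the tracker to follow.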

FIG. 9 shows the configuration of a video object de-identification system according to another embodiment of the present invention. As illustrated, the system can be implemented as a computing system comprising a communication unit 210, an output unit 220, a processor 230, an input unit 240, and a storage unit 250.

The communication unit 210 is a communication means for communicating with external devices and connecting to external networks, the output unit 220 displays the execution results of the processor 230, and the input unit 240 passes user commands to the processor 230.

In relation to the embodiment of the present invention, the communication unit 210 may receive a video sequence captured by a camera, or a video sequence provided by a broadcast streaming service or a video streaming service.

The processor 230 performs the video object de-identification method shown in FIG. 3. The storage unit 250 provides the storage space the processor 230 needs to function and operate. In relation to the embodiment of the present invention, the storage unit 250 may also store the video sequence to be de-identified.

The video object de-identification method and system have been described in detail above with reference to preferred embodiments.

The embodiments of the present invention present an automated de-identification process for a specific object (the de-identification target) in a video sequence by sequentially applying the computer vision techniques of scene change detection, object detection, tracking, and region segmentation.

This prevents the missed and false detections that occur during manual de-identification by an editor and reduces editor fatigue, and it makes de-identification easy to apply for everyone from solo creators, who have limited access to editing technology and are short of hands, to broadcasters and content producers that edit a wide variety of media.

In addition, a consistent de-identification process can be applied to large volumes of media and video data, and the de-identification target can be automatically identified and de-identified across scene changes, so there is little need to edit scene by scene, which improves user convenience.

The technical idea of the present invention can of course also be applied to a computer-readable recording medium containing a computer program that performs the functions of the apparatus and method according to this embodiment. The technical ideas according to various embodiments of the present invention may also be implemented as computer-readable code recorded on a computer-readable recording medium. The computer-readable recording medium may be any data storage device that can be read by a computer and can store data, for example a ROM, RAM, CD-ROM, magnetic tape, floppy disk, optical disk, or hard disk drive. Computer-readable code or programs stored on a computer-readable recording medium may also be transmitted over a network connecting computers.

Although preferred embodiments of the present invention have been shown and described above, the present invention is not limited to the specific embodiments described. Those of ordinary skill in the art to which the invention pertains can make various modifications without departing from the gist of the invention as claimed in the claims, and such modifications should not be understood separately from the technical idea or outlook of the present invention.

210: communication unit
220: output unit
230: processor
240: input unit
250: storage unit

Claims

A video object de-identification method comprising:
dividing the frames of a video sequence;
receiving a selection of a de-identification target in an initial frame among the divided frames; and
de-identifying, in frames after the initial frame, the de-identification target selected in the initial frame.

The video object de-identification method of claim 1,
wherein the dividing step divides the frames by detecting scene transitions in the video sequence.

The video object de-identification method of claim 2,
wherein the dividing step divides the frames through scene transition detection using at least one of keypoint comparison, color histogram comparison, and sequence feature comparison on the frames of the video sequence.

The video object de-identification method of claim 3,
wherein a first score is calculated through keypoint comparison of the frames of the video sequence, a second score through color histogram comparison, and a third score through sequence feature comparison, and scene transitions are detected on the basis of a final score that integrates the first score, the second score, and the third score.

The video object de-identification method of claim 1,
wherein the selecting step comprises:
detecting objects in the initial frame;
segmenting the regions in which the detected objects appear; and
receiving a selection of at least one of the detected objects as the de-identification target.

The video object de-identification method of claim 5,
wherein, in the step of receiving the selection, the object to be de-identified is selected by the user through a user interface on which the detected objects are displayed.

The video object de-identification method of claim 5,
wherein the de-identifying step comprises:
tracking, in the current frame, the de-identification target of the previous frame;
segmenting the region in which the tracked object appears; and
de-identifying the segmented region.

The video object de-identification method of claim 6,
wherein the de-identifying step comprises:
if the de-identification target of the previous frame is not tracked in the current frame, detecting objects in the current frame;
determining the de-identification target among the detected objects;
tracking the determined de-identification target;
segmenting the region in which the tracked object appears; and
de-identifying the segmented region.

The video object de-identification method of claim 5,
wherein the de-identifying step de-identifies the segmented region through any one of blurring, erasing, and face generation and synthesis.

A video object de-identification system comprising:
a processor that divides the frames of a video sequence, receives a selection of a de-identification target in an initial frame among the divided frames, and de-identifies, in frames after the initial frame, the de-identification target selected in the initial frame; and
a storage unit that provides the storage space the processor requires.

A video object de-identification method comprising:
receiving a selection of a de-identification target in an initial frame among the frames constituting a video sequence; and
de-identifying, in frames after the initial frame, the de-identification target selected in the initial frame.

A video object de-identification system comprising:
an input unit that receives a selection of a de-identification target in an initial frame among the frames constituting a video sequence; and
a processor that de-identifies, in frames after the initial frame, the de-identification target selected in the initial frame.