KR20240003467A

KR20240003467A - Video content providing system based on motion recognition

Info

Publication number: KR20240003467A
Application number: KR1020220081029A
Authority: KR
Inventors: 신재준
Original assignee: (주)아트인인터랙션
Priority date: 2022-07-01
Filing date: 2022-07-01
Publication date: 2024-01-09

Abstract

The present invention relates to a video content providing system based on motion recognition that recognizes a user's gesture or voice and outputs an interactive video based on the recognized gesture or voice pattern.

Description

{Video content providing system based on motion recognition}

The present invention relates to a video content providing system based on motion recognition and, more specifically, to a system that recognizes a user's gesture or voice and outputs an interactive video based on the recognized gesture or voice pattern.

Recently, rapid progress in the video field has led to the development and application of a variety of video technologies. In particular, technologies have emerged for generating virtual reality images and for playing and viewing them on a computer or user terminal.

Here, virtual reality (VR) refers to a simulation technology that, although not actual reality, lets a user experience a lifelike environment through three-dimensional sight. Because it is built on virtual spaces and objects, virtual reality can provide users with additional experiences or information that are difficult to obtain in the real world alone.

In other words, virtual reality immerses the user in a three-dimensional virtual environment and lets the user have a realistic experience within that environment.

Because of these characteristics, virtual reality can be applied not only to fields such as games and animation but also to a variety of other environments, and it is drawing particular attention as a next-generation display technology suited to ubiquitous environments.

Meanwhile, as virtual reality technology has advanced, the problem of interaction between the virtual reality and the user, that is, of how the user inputs commands in virtual reality, has come to the fore.

More specifically, users access virtual reality through computers, mobile terminals, VR devices, and the like, and on these devices the user must enter commands with standard input devices such as a keyboard, mouse, or touch sensor.

Consequently, because standard input devices such as a keyboard and mouse must be used even in a three-dimensional space such as virtual reality, it is difficult to smoothly perform the desired inputs or actions in a realistic virtual reality.

The background art described above is technical information that the inventor possessed in order to derive the present invention or acquired in the course of deriving it, and it is not necessarily known art disclosed to the general public before the filing of the present application.

Korean Patent No. 10-1656025

One aspect of the present invention provides a video content providing system based on motion recognition that recognizes a user's gesture or voice and outputs an interactive video based on the recognized gesture or voice pattern.

Another aspect of the present invention provides a video content providing system based on motion recognition that, by precisely recognizing the user's gesture or voice, can prevent the interactive video from being output differently than intended.

The technical problems addressed by the present invention are not limited to those mentioned above, and other technical problems not mentioned will be clearly understood by those skilled in the art from the description below.

A video content providing system based on motion recognition according to an embodiment of the present invention recognizes a user's gesture or voice and outputs an interactive video based on the recognized gesture or voice pattern.

The video content providing system based on motion recognition includes:

a sensing unit that senses the user's motion or voice and generates sensing information; and

a content management unit that outputs video content and controls objects in the video content in real time based on the sensing information.

The sensing unit includes:

a motion detection unit that detects the user's motion; and

a voice detection unit that detects the user's voice.

The motion detection unit includes:

an object extraction unit that analyzes a captured image of the user to extract a hand object and generates three-dimensional coordinate values for the extracted hand object; and

a sensing information generation unit that determines which of a plurality of pre-stored pieces of motion information the motion detected from the captured image corresponds to, and generates sensing information according to the result.

The object extraction unit

extracts the outline of the extracted hand object, detects the center point of the hand object from that outline, and sets the three-dimensional coordinate value corresponding to the detected center point as the representative coordinate value of the hand object.

The voice detection unit

analyzes the voice signal input from the user, determines which of the plurality of pre-stored pieces of motion information the voice signal corresponds to, and generates sensing information according to the result.

The content management unit

displays an initial screen when the motion information corresponding to the sensing information is determined to correspond to the reference motion information among the plurality of pre-stored pieces of motion information.

The initial screen displays:

a first icon object for selecting execution of a motion-control-based interactive video output mode, which controls objects in the video content in real time according to the user's motion;

a second icon object for selecting execution of a voice-control-based interactive video output mode, which controls objects in the video content in real time according to the user's voice; and

a third icon object for selecting the background music that is output together with the interactive video content when it is played.

Based on the representative coordinate value detected while the initial screen is displayed, the content management unit detects which of the first icon object, the second icon object, and the third icon object the user selects.

When the first icon object is confirmed to be selected, the content management unit generates trajectory information corresponding to the sensing information and controls the output of interactive video content in which an object in the video content is moved in real time according to the trajectory information.

When the second icon object is confirmed to be selected, the content management unit controls the output, based on the sensing information received from the voice detection unit, of interactive video content in which an object in the video content performs an action corresponding to the sensing information.

According to the aspect of the present invention described above, the user's gesture or voice can be recognized and an interactive video can be output based on the recognized gesture or voice pattern.

In addition, by precisely recognizing the user's gesture or voice, the system can prevent the interactive video from being output differently than intended.

FIG. 1 is a diagram showing the schematic configuration of a video content providing system based on motion recognition according to an embodiment of the present invention.
FIG. 2 is a diagram showing the detailed configuration of the sensing unit shown in FIG. 1.
FIGS. 3 to 6 are diagrams showing specific examples of video content output by the content management unit shown in FIG. 1.

The detailed description of the present invention that follows refers to the accompanying drawings, which show, by way of example, specific embodiments in which the invention may be practiced. These embodiments are described in sufficient detail to enable those skilled in the art to practice the invention. It should be understood that the various embodiments of the invention differ from one another but need not be mutually exclusive. For example, a specific shape, structure, or characteristic described here in connection with one embodiment may be implemented in another embodiment without departing from the spirit and scope of the invention. It should also be understood that the position or arrangement of individual components within each disclosed embodiment may be changed without departing from the spirit and scope of the invention. Accordingly, the detailed description that follows is not to be taken in a limiting sense, and the scope of the invention, if properly described, is limited only by the appended claims together with the full range of equivalents to which those claims are entitled. In the drawings, like reference numerals refer to the same or similar functions throughout the several aspects.

Hereinafter, preferred embodiments of the present invention are described in more detail with reference to the drawings.

FIG. 1 is a diagram showing the schematic configuration of a video content providing system based on motion recognition according to an embodiment of the present invention.

The video content providing system based on motion recognition according to the present invention recognizes a user's gesture or voice and outputs an interactive video in which a specific object in the video content is controlled in real time based on the recognized gesture or voice pattern.

To this end, the video content providing system based on motion recognition according to an embodiment of the present invention includes a sensing unit 10 and a content management unit 20.

The sensing unit 10 senses the user's motion or voice and generates sensing information.

Specifically, as shown in FIG. 2, the sensing unit 10 includes a motion detection unit 110 and a voice detection unit 120.

The motion detection unit 110 detects the user's motion. For example, the motion detection unit 110 may be implemented as a camera module that captures an image of the user standing in front of the screen.

The voice detection unit 120 detects the user's voice. For example, the voice detection unit 120 may be implemented as a microphone or similar device.

As described, the sensing unit 10 consists of the motion detection unit 110 and the voice detection unit 120 and can sense the user's motion and voice, but it is not limited to this configuration; a single module may sense both motion and voice. For example, the sensing unit 10 may detect the user's voice by analyzing the audio signal included in the captured video.

In some other embodiments, the sensing unit 10 only senses the user's motion or voice, and the corresponding sensing information is generated by the content management unit 20 described below.

The content management unit 20 outputs video content on the screen. To this end, the content management unit 20 may be implemented as a device capable of outputting video, such as a projector or a PC.

The content management unit 20 is connected to the sensing unit 10 through wired or wireless communication. It receives the sensing information generated by the sensing unit 10 and, based on that information, causes interactive video content in which the movement, actions, and so on of objects in the video content are controlled in real time to be output on the screen.
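The relationship between the two units can be sketched as a small message-passing structure. This is only an illustration under stated assumptions: the class names, the `motion_id` field, and the callback-style link are hypothetical and stand in for whatever wired or wireless link an actual implementation would use; the patent does not define a message format.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass(frozen=True)
class SensingInfo:
    """Hypothetical message passed from the sensing unit to the content management unit."""
    source: str                                   # "motion" or "voice"
    motion_id: str                                # which pre-stored motion information was matched
    hand_xyz: Optional[Tuple[float, float, float]] = None  # representative coordinate, if any


class ContentManager:
    """Stands in for content management unit 20: receives sensing info and reacts to it."""

    def __init__(self) -> None:
        self.log: List[str] = []

    def on_sensing_info(self, info: SensingInfo) -> str:
        self.log.append(info.motion_id)
        # A real system would update the on-screen object here.
        return f"apply {info.motion_id} from {info.source}"


class SensingUnit:
    """Stands in for sensing unit 10: forwards sensing info over an abstract link."""

    def __init__(self, sink: Callable[[SensingInfo], str]) -> None:
        self._sink = sink  # the wired/wireless link is abstracted as a callable

    def report(self, source: str, motion_id: str,
               hand_xyz: Optional[Tuple[float, float, float]] = None) -> str:
        return self._sink(SensingInfo(source, motion_id, hand_xyz))
```

Keeping the link as a plain callable means the same `SensingUnit` could be wired to a local object or to a network sender without changing the sensing code.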

The specific functions of the sensing unit 10 and the content management unit 20 are described below.

The motion detection unit 110 of the sensing unit 10 includes an object extraction unit and a sensing information generation unit.

The object extraction unit analyzes a captured image of the user to extract a hand object and generates three-dimensional coordinate values for the extracted hand object.

In one embodiment, the object extraction unit extracts the outline of the extracted hand object, detects the center point of the hand object from that outline, and sets the three-dimensional coordinate value corresponding to the detected center point as the representative coordinate value of the hand object.

In one embodiment, the object extraction unit separates the hand from the background based on the depth values of the image, extracts the hand region by applying a threshold to the depth values, and converts the result to a grayscale image and then to a binarized image in order to recognize fingers within the hand region. The object extraction unit can detect the center point and the outline of the hand from the binarized image using contours: it detects the center point by computing the average of the contour coordinates, and it detects the outline by connecting the contour coordinates. For each of the pixels in the image, which have differing pixel values, the object extraction unit searches for and connects pixels having the same value, and it can simplify the pixel values by binarizing the image.
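The depth-threshold, binarization, and contour-averaging steps described above can be sketched in plain Python. This is a minimal sketch, not the patented implementation: the depth map is assumed to be a list of rows of distances, the `near`/`far` threshold pair and the 4-neighbour contour test are illustrative choices, and taking the mean contour depth as the z value is an assumption, since the text does not say how the depth of the center point is obtained.

```python
from typing import List, Optional, Tuple

Depth = List[List[float]]


def binarize(depth: Depth, near: float, far: float) -> List[List[int]]:
    """Mark pixels whose depth lies inside [near, far] as hand (1), everything else as background (0)."""
    return [[1 if near <= d <= far else 0 for d in row] for row in depth]


def contour_points(mask: List[List[int]]) -> List[Tuple[int, int]]:
    """Return hand pixels that touch the background or the image border (4-neighbourhood)."""
    h, w = len(mask), len(mask[0])
    pts = []
    for y in range(h):
        for x in range(w):
            if not mask[y][x]:
                continue
            neighbours = ((x - 1, y), (x + 1, y), (x, y - 1), (x, y + 1))
            if any(not (0 <= nx < w and 0 <= ny < h) or not mask[ny][nx]
                   for nx, ny in neighbours):
                pts.append((x, y))
    return pts


def representative_coordinate(depth: Depth, near: float,
                              far: float) -> Optional[Tuple[float, float, float]]:
    """Average the contour coordinates to get the hand's center point as (x, y, z)."""
    if not depth or not depth[0]:
        return None
    pts = contour_points(binarize(depth, near, far))
    if not pts:
        return None  # no hand inside the depth band
    cx = sum(x for x, _ in pts) / len(pts)
    cy = sum(y for _, y in pts) / len(pts)
    cz = sum(depth[y][x] for x, y in pts) / len(pts)  # assumption: mean contour depth
    return (cx, cy, cz)
```

A production system would more likely use an image-processing library for thresholding and contour tracing; the pure-Python version is only meant to make the averaging step concrete.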

The sensing information generation unit determines which of the plurality of pre-stored pieces of motion information the motion detected from the captured image corresponds to, and generates sensing information according to the result.

Similarly, the voice detection unit analyzes the voice signal input from the user, determines which of the plurality of pre-stored pieces of motion information the voice signal corresponds to, and generates sensing information according to the result.

In this way, the motion detection unit 110 detects what motion the user is making from the movement of the hand object, the voice detection unit 120 detects what word the user is saying by analyzing the voice signal, and each generates sensing information corresponding to the detected motion or voice.

The content management unit displays an initial screen when the motion information corresponding to the sensing information is determined to correspond to the reference motion information among the plurality of pre-stored pieces of motion information. For example, the content management unit may display the initial screen when it receives reference sensing information from the sensing unit, such as sensing information indicating that the user has clapped three times. This reference sensing information can of course be configured in various ways depending on the usage environment.
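The reference-motion trigger amounts to a two-state switch. The sketch below assumes the sensing information arrives as a string identifier; the identifier `"clap_three_times"` and the screen names `"idle"` and `"initial"` are hypothetical labels chosen for illustration.

```python
class ScreenState:
    """Shows the initial screen once the configured reference motion is reported."""

    def __init__(self, reference_motion_id: str = "clap_three_times") -> None:
        self.reference_motion_id = reference_motion_id  # configurable per usage environment
        self.screen = "idle"

    def on_sensing_info(self, motion_id: str) -> str:
        # Only the reference motion moves the system from idle to the initial screen.
        if self.screen == "idle" and motion_id == self.reference_motion_id:
            self.screen = "initial"
        return self.screen
```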

FIGS. 3 and 4 are diagrams showing an example of the initial screen output by the content management unit.

As shown, the content management unit displays a first icon object (o1), a second icon object (o2), a third icon object (o3), and a hand object (h) on the initial screen.

The first icon object (o1) is for selecting execution of the motion-control-based interactive video output mode, which controls objects in the video content in real time according to the user's motion.

The second icon object (o2) is for selecting execution of the voice-control-based interactive video output mode, which controls objects in the video content in real time according to the user's voice.

The third icon object (o3) is for selecting the background music that is output together with the interactive video content when it is played.

The hand object (h) indicates which position on the initial screen the user is pointing at.

While the initial screen is displayed, the content management unit draws the hand object (h) within the initial screen at the position given by the detected representative coordinate value, and detects which of the first icon object, the second icon object, and the third icon object the user selects.

When the hand object (h) is found to be positioned over a particular icon object, the content management unit may display a time bar object (t) around that icon object. The time bar object (t) is a ring that fills clockwise from the 12 o'clock position of the icon object; for example, it forms a complete circle once 1 second has elapsed. Displaying the time bar object (t) when the user points at an icon object lets the content management unit determine whether the selection was unintended, and it shows the user how much time remains until the icon is selected.
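The icon hit test and the time bar can be sketched as a dwell timer. This is an illustrative sketch under assumptions not found in the patent: icons are modelled as circles in screen coordinates, the hand position is already projected to 2D, and the caller supplies a monotonically increasing timestamp in seconds. The names `Icon` and `DwellSelector` are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Icon:
    name: str
    cx: float
    cy: float
    radius: float

    def contains(self, x: float, y: float) -> bool:
        return (x - self.cx) ** 2 + (y - self.cy) ** 2 <= self.radius ** 2


class DwellSelector:
    """Selects an icon once the hand object has stayed on it for dwell_seconds."""

    def __init__(self, icons: List[Icon], dwell_seconds: float = 1.0) -> None:
        self.icons = icons
        self.dwell_seconds = dwell_seconds
        self._hover: Optional[str] = None
        self._since = 0.0

    def update(self, x: float, y: float,
               now: float) -> Tuple[Optional[str], float, Optional[str]]:
        """Return (hovered icon, time-bar sweep angle in degrees, selected icon)."""
        hit = next((i.name for i in self.icons if i.contains(x, y)), None)
        if hit != self._hover:
            # The hand moved to another icon (or off all icons): restart the timer.
            self._hover, self._since = hit, now
        if hit is None:
            return None, 0.0, None
        progress = min((now - self._since) / self.dwell_seconds, 1.0)
        selected = hit if progress >= 1.0 else None
        # The sweep angle is measured clockwise from the 12 o'clock position.
        return hit, 360.0 * progress, selected
```

Resetting the timer whenever the hovered icon changes is what guards against unintended selections: a hand that merely passes over an icon never completes the ring.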

FIGS. 5 and 6 are diagrams showing an example of the video content output in the motion-control-based interactive video output mode after the first icon object has been selected.

When the first icon object is confirmed to be selected, the content management unit 20 generates trajectory information corresponding to the sensing information.

To this end, the sensing unit 10 analyzes the pattern of position changes of the representative coordinate point collected over a specific period and determines which of the plurality of pre-stored pieces of motion information the hand motion detected from the captured image corresponds to. The sensing unit 10 then generates sensing information indicating which motion information the detected hand motion matches and transmits it to the content management unit 20.
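One simple way to match a position-change pattern against stored motion information is to compare the net displacement of the representative coordinate with a set of direction templates. The patent does not specify the matching method, so everything below is an assumption made for illustration: the template table, the cosine-similarity test, the `min_travel` and `min_cosine` thresholds, and the convention that image y grows downward and depth shrinks toward the camera.

```python
import math
from typing import Dict, List, Optional, Tuple

Point3 = Tuple[float, float, float]

# Hypothetical pre-stored motion information: one unit direction vector per motion.
MOTION_TEMPLATES: Dict[str, Point3] = {
    "swipe_right": (1.0, 0.0, 0.0),
    "swipe_left": (-1.0, 0.0, 0.0),
    "swipe_up": (0.0, -1.0, 0.0),    # image y grows downward
    "swipe_down": (0.0, 1.0, 0.0),
    "push": (0.0, 0.0, -1.0),        # toward the camera: depth decreases
}


def classify_motion(samples: List[Point3], min_travel: float = 50.0,
                    min_cosine: float = 0.8) -> Optional[str]:
    """Match the net displacement of the samples to the closest stored direction."""
    if len(samples) < 2:
        return None
    (x0, y0, z0), (x1, y1, z1) = samples[0], samples[-1]
    d = (x1 - x0, y1 - y0, z1 - z0)
    length = math.sqrt(sum(c * c for c in d))
    if length < min_travel:
        return None  # too small to count as a deliberate gesture
    best_id, best_cos = None, min_cosine
    for motion_id, t in MOTION_TEMPLATES.items():
        cos = sum(a * b for a, b in zip(d, t)) / length  # templates are unit vectors
        if cos >= best_cos:
            best_id, best_cos = motion_id, cos
    return best_id
```

Using only the first and last samples ignores the shape of the path, so curved gestures such as circles would need a richer comparison (for example, resampled-path matching); that choice is left open here.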

The content management unit 20 generates trajectory information based on the sensing information received from the sensing unit 10 and may display the generated trajectory information (r) together with the video content.

However, the generated trajectory information (r) does not have to be displayed in the video content; in some cases it may be hidden. Displaying it lets the user confirm that the motion was input correctly, while hiding it keeps unnecessary information out of the video content.

The content management unit 20 can then, as shown in FIG. 6, control the output of interactive video content in which an object in the video content is moved in real time according to the trajectory information.
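Moving an object along the trajectory can be sketched as interpolation along a polyline. The sketch assumes the trajectory information is a list of 2D screen points and that the renderer asks for the object's position as a fraction of the path completed; both are illustrative assumptions, and `point_along` is a hypothetical helper name.

```python
import math
from typing import List, Tuple

Point2 = Tuple[float, float]


def point_along(path: List[Point2], fraction: float) -> Point2:
    """Return the point at the given fraction (0..1) of the polyline's total length."""
    if not path:
        raise ValueError("path must contain at least one point")
    if len(path) == 1:
        return path[0]
    fraction = min(max(fraction, 0.0), 1.0)
    seg_lengths = [math.dist(a, b) for a, b in zip(path, path[1:])]
    total = sum(seg_lengths)
    if total == 0:
        return path[0]  # all points coincide
    target = fraction * total
    for (a, b), seg in zip(zip(path, path[1:]), seg_lengths):
        if seg > 0 and target <= seg:
            t = target / seg
            return (a[0] + t * (b[0] - a[0]), a[1] + t * (b[1] - a[1]))
        target -= seg
    return path[-1]
```

Calling `point_along(path, elapsed / duration)` once per frame moves the object at constant speed along the path, regardless of how unevenly the trajectory points were sampled.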

When the second icon object is confirmed to be selected, the content management unit 20 controls the output, based on the sensing information received from the voice detection unit, of interactive video content in which an object in the video content performs an action corresponding to the sensing information.

For example, when the second icon object is confirmed to be selected, the content management unit 20 may display, as shown in FIG. 7, a list of emotional expressions that a specific object in the video content (a fish object in the drawing) can perform, and may control that object to perform, in real time, the emotional expression corresponding to the voice signal detected from the user.
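The mapping from a recognized utterance to an emotional expression can be sketched as a keyword lookup. The sketch assumes that speech has already been transcribed to text by some upstream recognizer, which is outside the scope of this example; the expression names and animation identifiers in `EXPRESSIONS` are hypothetical and do not come from the patent.

```python
from typing import Dict, Optional

# Hypothetical list of expressions the on-screen object can perform.
EXPRESSIONS: Dict[str, str] = {
    "happy": "play_happy_animation",
    "sad": "play_sad_animation",
    "angry": "play_angry_animation",
    "surprised": "play_surprised_animation",
}


def expression_for(transcript: str) -> Optional[str]:
    """Return the animation for the first known expression word in the transcript."""
    for word in transcript.lower().split():
        key = word.strip(".,!?")
        if key in EXPRESSIONS:
            return EXPRESSIONS[key]
    return None  # no listed expression was spoken
```

Showing the list of supported expressions on screen, as the description suggests, keeps the vocabulary small enough for an exact-match lookup like this to be workable.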

In this way, the video content providing system based on motion recognition according to the present invention can maximize user immersion by recognizing the user's gesture or voice and outputting an interactive video based on the recognized gesture or voice pattern.

The technology according to the present invention may be implemented as an application, or in the form of program instructions that can be executed by various computer components and recorded on a computer-readable recording medium. The computer-readable recording medium may include program instructions, data files, data structures, and the like, alone or in combination.

The program instructions recorded on the computer-readable recording medium may be specially designed and configured for the present invention, or they may be known to and usable by those skilled in the computer software field.

Examples of computer-readable recording media include magnetic media such as hard disks, floppy disks, and magnetic tape; optical recording media such as CD-ROMs and DVDs; magneto-optical media such as floptical disks; and hardware devices specially configured to store and execute program instructions, such as ROM, RAM, and flash memory.

Examples of program instructions include not only machine code such as that produced by a compiler but also high-level language code that a computer can execute using an interpreter or the like. The hardware device may be configured to operate as one or more software modules in order to perform the processing according to the present invention, and vice versa.

Although the invention has been described above with reference to embodiments, those skilled in the art will understand that it can be modified and changed in various ways without departing from the spirit and scope of the invention as set forth in the claims below.

10: Sensing unit
20: Content management unit

Claims

A video content provision system based on motion recognition that recognizes a user's gesture or voice and outputs an interactive video based on the recognized gesture or voice pattern.

The system according to claim 1,
wherein the video content providing system based on motion recognition comprises:
a sensing unit that senses the user's motion or voice and generates sensing information; and
a content management unit that outputs video content and controls objects in the video content in real time based on the sensing information.

The system according to claim 2,
wherein the sensing unit comprises:
a motion detection unit that detects the user's motion; and
a voice detection unit that detects the user's voice,
wherein the motion detection unit comprises:
an object extraction unit that analyzes a captured image of the user to extract a hand object and generates three-dimensional coordinate values for the extracted hand object; and
a sensing information generation unit that determines which of a plurality of pre-stored pieces of motion information the motion detected from the captured image corresponds to, and generates sensing information according to the result,
wherein the object extraction unit
extracts the outline of the extracted hand object, detects the center point of the hand object from that outline, and sets the three-dimensional coordinate value corresponding to the detected center point as the representative coordinate value of the hand object, and
wherein the voice detection unit
analyzes a voice signal input from the user, determines which of the plurality of pre-stored pieces of motion information the voice signal corresponds to, and generates sensing information according to the result.

The system according to claim 3,
wherein the content management unit
displays an initial screen when the motion information corresponding to the sensing information is determined to correspond to reference motion information among the plurality of pre-stored pieces of motion information,
wherein the initial screen displays:
a first icon object for selecting execution of a motion-control-based interactive video output mode that controls objects in the video content in real time according to the user's motion;
a second icon object for selecting execution of a voice-control-based interactive video output mode that controls objects in the video content in real time according to the user's voice; and
a third icon object for selecting background music that is output together with the interactive video content when it is played, and
wherein the content management unit
detects, based on the representative coordinate value detected while the initial screen is displayed, which of the first icon object, the second icon object, and the third icon object the user selects,
when the first icon object is confirmed to be selected, generates trajectory information corresponding to the sensing information and controls output of interactive video content in which an object in the video content is moved in real time according to the trajectory information, and
when the second icon object is confirmed to be selected, controls output, based on the sensing information received from the voice detection unit, of interactive video content in which an object in the video content performs an action corresponding to the sensing information.