KR101701813B1

KR101701813B1 - Display apparatus and Method for changing Voice thereof

Info

Publication number: KR101701813B1
Application number: KR1020110115201A
Authority: KR
Inventors: 아디티 가르그; 카스쑤리 자야찬드 야들라팔리
Original assignee: Samsung Electronics Co., Ltd. (삼성전자주식회사)
Priority date: 2011-04-11
Filing date: 2011-11-07
Publication date: 2017-02-13
Also published as: KR20120115928A

Abstract

A display apparatus and a voice conversion method thereof are provided. In the voice conversion method, when a first video frame is input, at least one entity included in the first video frame is detected; when one of the detected entities is selected, the selected entity is stored; when one of a plurality of pre-stored voice samples is selected, the selected voice sample is stored in association with the selected entity; and when a second video frame including the selected entity is input, the voice of the selected entity is converted into the selected voice sample and output. Because the voice of an entity in the video frame is changed to the selected voice sample, the user can be provided with voice-customized content.

Description

Display apparatus and voice conversion method thereof

The present invention relates to a display apparatus and a voice conversion method thereof, and more particularly to a display apparatus that customizes and converts the audio of content, and a voice conversion method thereof.

In general, Internet Protocol television (IPTV) is a system for providing multimedia services, such as audio and video information, over a network-based IP. The multimedia services may include live TV, video on demand (VOD), and time-shifted programming. Conventionally, it has been possible to replace the face of an entity displayed in a video clip. Here, an entity is defined as the face of a specific character selected by the user in the video clip. Face recognition methods can be used to effectively replace an entity selected by the user in a video clip with another entity. It has also been possible to change the dialogue of an entity selected by the user: the user could select a first dialogue from the video and replace it with a second dialogue. However, a technique for changing the voice of the selected entity has been needed.

Accordingly, there is a need to develop a method and system for customizing the voice of a selected entity.

An object of the present invention is to provide a display apparatus and a voice conversion method for customizing the voice of an entity selected by the user from among the entities included in an input video frame.

According to an embodiment of the present invention for achieving the above object, a voice conversion method of a display apparatus includes: detecting, when a first video frame is input, at least one entity included in the first video frame; storing, when one of the detected entities is selected, the selected entity; storing, when one of a plurality of pre-stored voice samples is selected, the selected voice sample in association with the selected entity; and converting, when a second video frame including the selected entity is input, the voice of the selected entity into the selected voice sample and outputting the converted voice.

The entity may be the face of a person included in a video frame, and the detecting may include detecting the face of the person included in the video frame through a face detection module, using at least one of the skin color, motion, size, shape, and position of the at least one entity.

The method may further include, when the at least one entity is detected in the input video frame, displaying the detected entities as a list in one area of the display screen.

The method may further include, when the entity is selected, displaying the plurality of voice samples as a list in one area of the display screen.

The storing of the selected entity may include storing a first identifier corresponding to the selected entity in a look-up table, and the storing of the selected voice sample may include storing a second identifier corresponding to the selected voice sample in the look-up table.

The plurality of voice samples may include at least one of a voice sample built into the display apparatus, a recorded voice sample, and a voice sample input by the user, and the recorded voice sample and the user-input voice sample may be filtered by a voice sub-sampler module.

The outputting may include determining whether the selected entity is included in the second video frame.

The outputting may also include: determining whether the lips of the selected entity in the second video frame are moving; and, if lip movement is determined to be present, converting the voice of the entity into the selected voice sample and outputting the converted voice.

Meanwhile, according to an embodiment of the present invention for achieving the above object, a display apparatus includes: a detection unit that detects, when a first video frame is input, at least one entity included in the first video frame; a user interface unit for receiving a selection of an entity, from among the detected entities, on which voice conversion is to be performed, and a selection of a voice sample to be matched to the selected entity; a storage unit that stores the selected entity and the selected voice sample; and a control unit that, when a subsequent video frame including the selected entity is input, converts the voice of the selected entity into the selected voice sample and outputs the converted voice.

The entity may be the face of a person included in a video frame, and the detection unit may detect the face of the person included in the video frame through a face detection module, using at least one of the skin color, motion, size, shape, and position of the at least one entity.

The display apparatus may further include: a video processing unit that processes the input video frame; an audio processing unit that processes an audio signal corresponding to the input video frame; a display unit that outputs the video frame processed by the video processing unit on a screen; and an audio output unit that outputs the audio signal processed by the audio processing unit in synchronization with the video frame. The control unit may control the audio processing unit to convert the voice of the selected entity into the selected voice sample and provide it to the audio output unit.

When the at least one entity is detected in the input video frame, the control unit may control the display unit to display the detected entities as a list in one area of the screen.

When the entity is selected, the control unit may control the display unit to display the plurality of voice samples as a list in one area of the screen.

The storage unit may store a first identifier corresponding to the selected entity and a second identifier corresponding to the selected voice sample in a look-up table.

The storage unit may store at least one of a voice sample built into the display apparatus, a recorded voice sample, and a voice sample input by the user, and the selected voice sample may be one of the built-in voice sample, the recorded voice sample, and the user-input voice sample.

The recorded voice sample and the user-input voice sample may be filtered by a voice sub-sampler module.

The control unit may use a face detection sub-module to search the faces of the persons included in the second video frame and determine whether the selected entity is among them.

The control unit may determine whether the lips of the selected entity in the second video frame are moving and, if lip movement is determined to be present, convert the voice of the entity into the selected voice sample and output the converted voice.

According to the various embodiments of the present invention described above, the voice of an entity included in a video frame is changed to the selected voice sample, so that the user can be provided with voice-customized content.

FIG. 1 is a block diagram of a display apparatus according to an embodiment of the present invention;
FIG. 2 is a block diagram of a configuration for customizing the voice of a selected entity in a display apparatus according to an embodiment of the present invention;
FIG. 3 is a flowchart illustrating a method for customizing the voice of a selected entity in a display apparatus according to an embodiment of the present invention;
FIG. 4 is a flowchart illustrating a method of selecting and updating an entity using a first presentation module according to an embodiment of the present invention;
FIG. 5 illustrates a user interface including a look-up table for entity selection according to an embodiment of the present invention;
FIG. 6 is a flowchart illustrating a method of selecting a voice sample for voice customization using a voice sub-sampler module according to an embodiment of the present invention;
FIG. 7 illustrates a user interface including a look-up table for selecting a voice sample according to an embodiment of the present invention; and
FIG. 8 is a flowchart illustrating a voice customization method using a core processing module according to an embodiment of the present invention.

Hereinafter, the present invention will be described in more detail with reference to the drawings. FIG. 1 is a block diagram of a display apparatus 1 according to an embodiment of the present invention.

As shown in FIG. 1, the display apparatus 1 includes an image input unit 10, a detection unit 20, a video processing unit 30, an audio processing unit 40, a storage unit 50, an audio output unit 60, a display unit 70, a user interface unit 80, and a control unit 90.

The image input unit 10 receives image data composed of video frames from an external source connected by wire or wirelessly. The image input unit 10 may receive broadcast data from a broadcasting station, or video data from a video input device such as a DVD player.

The detection unit 20 detects entities from the video frames of the input image data. Here, an entity may be the face of a person included in the video frame, or an image of a specific character that includes a face. Accordingly, the detection unit 20 may use a face detection module to detect the entities included in the video frame. When detecting an entity, the detection unit 20 may use the entity's skin color, motion, size, shape, position, and the like.
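
The text names skin color as one cue the detection unit 20 may use but does not say how. As a rough illustration only, the sketch below applies a widely used RGB skin-tone rule of thumb and returns the bounding box of the matching pixels. The thresholds, the function names `is_skin` and `skin_bounding_box`, and the nested-list frame format are all assumptions made for this example, not details from the patent.

```python
from typing import List, Optional, Tuple

Pixel = Tuple[int, int, int]  # (R, G, B), each 0-255


def is_skin(pixel: Pixel) -> bool:
    """RGB skin-tone rule of thumb (assumed here; the patent gives no rule)."""
    r, g, b = pixel
    return (
        r > 95 and g > 40 and b > 20
        and max(pixel) - min(pixel) > 15
        and abs(r - g) > 15
        and r > g and r > b
    )


def skin_bounding_box(frame: List[List[Pixel]]) -> Optional[Tuple[int, int, int, int]]:
    """Return (top, left, bottom, right) enclosing all skin-colored pixels, or None."""
    rows = [y for y, row in enumerate(frame) if any(is_skin(p) for p in row)]
    cols = [x for row in frame for x, p in enumerate(row) if is_skin(p)]
    if not rows:
        return None
    return rows[0], min(cols), rows[-1], max(cols)


# Hypothetical 4x4 frame: dark background with a 2x2 skin-colored patch.
background, skin = (10, 10, 10), (200, 140, 120)
frame = [[background] * 4 for _ in range(4)]
frame[1][1] = frame[1][2] = frame[2][1] = frame[2][2] = skin
print(skin_bounding_box(frame))
```

A practical detector would combine such a color mask with the other cues the text lists (motion, size, shape, position); color alone also matches hands and skin-toned backgrounds.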

The video processing unit 30 processes the video frames of the input image data. That is, it performs image processing operations such as decoding and scaling on the input image data.

The audio processing unit 40 processes the audio signal corresponding to the input video frame. Under the control of the control unit 90, the audio processing unit 40 may process the audio so as to change the voice of an entity included in the video frame.

The storage unit 50 stores various data and multimedia data for driving the display apparatus 1. The storage unit 50 also stores the various modules used for voice conversion in the display apparatus 1.

The audio output unit 60 outputs the audio signal processed by the audio processing unit 40. The audio output unit 60 may be implemented as a speaker.

The display unit 70 displays the video frames processed by the video processing unit 30.

The user interface unit 80 receives control commands from the user for controlling the display apparatus 1. In particular, the entity whose voice is to be converted, and the voice sample to be applied to that entity, may be selected through the user interface unit 80.

The user interface unit 80 may be implemented as a graphical user interface (GUI) together with an input device such as a touch screen, a remote control, or a pointing device.

The control unit 90 controls the overall operation of the display apparatus 1 based on control commands received through the user interface unit 80. In particular, the control unit 90 may convert a voice in order to customize the voice of an entity included in a video frame.

Specifically, when a first video frame is input through the image input unit 10, the control unit 90 controls the detection unit 20 to detect at least one entity included in the first video frame.

When at least one entity is detected, the control unit 90 controls the display unit 70 to display a list of the detected entities in one area of the display screen so that one of them can be selected.

When a first entity on which voice conversion is to be performed is selected from among the detected entities through the user interface unit 80, the control unit 90 controls the storage unit 50 to store the selected first entity. The control unit 90 may control the storage unit 50 to store a first identifier corresponding to the selected first entity together with the first entity.

The control unit 90 then controls the display unit 70 to display a list of a plurality of voice samples in one area of the display screen, so that a voice sample to be matched to the selected first entity can be selected. The plurality of voice samples include at least one of a voice sample built into the display apparatus 1, a recorded voice sample, and a voice sample input by the user.

When a specific voice sample is selected from among the plurality of voice samples through the user interface unit 80, the control unit 90 controls the storage unit 50 to store the selected voice sample in association with the selected first entity. The control unit 90 may control the storage unit 50 to store a second identifier corresponding to the selected voice sample.

When a second video frame is input, the control unit 90 determines whether the second video frame contains an entity corresponding to the selected first entity. If one of the entities in the second video frame corresponds to the selected first entity, the control unit 90 may control the audio processing unit 40 to convert the voice of the selected first entity into the selected voice sample and output it to the audio output unit 60.

In particular, the control unit 90 may determine whether the lips of the selected first entity in the second video frame are moving and, if lip movement is determined to be present, control the audio processing unit 40 to convert the voice of the first entity into the selected voice sample and output it.
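
The description does not specify how lip movement is judged. One simple possibility, shown here purely as an assumed sketch, is to compare the mouth region of the selected entity across two consecutive frames and treat a large mean pixel difference as movement. The grayscale crop format, the helper names, and the threshold of 8.0 are hypothetical choices for this example.

```python
from typing import List

Gray = List[List[int]]  # grayscale crop of the mouth region, values 0-255


def mean_abs_diff(prev: Gray, curr: Gray) -> float:
    """Average absolute per-pixel difference between two equally sized crops."""
    total = sum(abs(a - b) for pr, cr in zip(prev, curr) for a, b in zip(pr, cr))
    count = sum(len(row) for row in prev)
    return total / count


def lips_moving(prev: Gray, curr: Gray, threshold: float = 8.0) -> bool:
    """Assumed heuristic: the lips are moving if the mouth crop changed enough."""
    return mean_abs_diff(prev, curr) > threshold


# Hypothetical crops: mouth closed, then opened (darker lower row).
closed = [[50, 50, 50], [50, 50, 50]]
opened = [[50, 50, 50], [20, 20, 20]]
print(lips_moving(closed, closed), lips_moving(closed, opened))
```

Gating the conversion on such a test means the replacement voice is applied only while the selected entity appears to be speaking, which matches the intent of the paragraph above.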

Here, the control unit 90 may change at least one of the timbre and the pitch among the characteristic properties of the first entity's voice.
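
To make the pitch part of this statement concrete, the following sketch raises or lowers pitch by naive resampling with linear interpolation. This is an assumption chosen for brevity, not the conversion technique of the patent: it also changes the duration of the audio and leaves timbre untouched, both of which a real voice conversion method would have to handle. The function name `resample` and the list-of-floats sample format are hypothetical.

```python
from typing import List


def resample(samples: List[float], factor: float) -> List[float]:
    """Read the waveform `factor` times faster using linear interpolation.

    factor > 1 raises the pitch and shortens the signal;
    factor < 1 lowers the pitch and lengthens it.
    """
    if factor <= 0:
        raise ValueError("factor must be positive")
    if not samples:
        return []
    out_len = int((len(samples) - 1) / factor) + 1
    out = []
    for i in range(out_len):
        pos = i * factor                      # fractional read position
        lo = int(pos)
        hi = min(lo + 1, len(samples) - 1)    # clamp at the last sample
        frac = pos - lo
        out.append(samples[lo] * (1 - frac) + samples[hi] * frac)
    return out


ramp = [0, 1, 2, 3, 4, 5, 6, 7, 8]
print(resample(ramp, 2.0))      # every second sample
print(resample([0, 10], 0.5))   # interpolated midpoint inserted
```

With `factor=2.0` a tone is played back one octave higher in half the time, which is why production systems pair pitch modification with time-scale correction.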

With the display apparatus 1 described above, the voice of the first entity included in the video frame is changed to the selected voice sample, so that the user can be provided with voice-customized content.

Hereinafter, the voice conversion method of the present invention will be described in more detail with reference to FIGS. 2 to 8.

FIG. 2 is a block diagram of a configuration for customizing the voice of a selected entity in a display apparatus according to an embodiment of the present invention. In particular, FIG. 2 shows, as another embodiment of the present invention, the constituent modules of a display apparatus 100 for voice conversion.

As shown in FIG. 2, the configuration for customizing the voice of an entity includes a first video frame 105, a face detection module 110, a first presentation module 115 for selecting an entity, a look-up table 120 for storing a first entity, a second presentation module 125 for selecting a voice sample, a second identifier 130, a first identifier 195, an input video frame 135, a core processing module 145, a user-input voice sample 165, a recorded voice 170, a voice sub-sampler module 180, and a voice database 190. The core processing module 145 includes a face search sub-module 150, a lip movement detection sub-module 155, and a voice control unit 160. The voice sub-sampler module 180 includes a voice processing module 175 and a recording module 185.

The first video frame 105 is displayed on the display apparatus 100. The display apparatus 100 may be implemented as, for example, a computer, an IPTV, a VOD, a consumer electronics (CE) device, or an Internet TV, but is not limited thereto. Examples of the first video frame 105 include, but are not limited to, scenes from a movie, a broadcast stream, live video, and a video clip. The display apparatus 100 receives the first video frame 105 over a network. Examples of the network include, but are not limited to, a wireless network, the Internet, an intranet, Bluetooth, a small area network (SAN), a metropolitan area network (MAN), and Ethernet. The first video frame 105 includes a plurality of entities. Here, the entities may be regarded as the plurality of characters displayed in the first video frame 105. For voice customization, the user may select a specific entity from among the plurality of entities displayed in the first video frame 105. Hereinafter, the specific entity selected by the user is referred to as the first entity 140.

The user may invoke a voice setting option of the display apparatus 100 for voice customization. When the voice setting option is selected, the face detection module 110 may be activated to capture the first video frame 105. The face detection module 110 extracts at least one entity included in the captured first video frame 105. To do so, it may use a plurality of characteristic properties, examples of which include, but are not limited to, skin color, motion, size, shape, and position. The face detection module 110 may also use various algorithms to extract the entities included in the first video frame 105.

The entities included in the first video frame 105 may be presented as a list by the first presentation module 115. From the entities listed by the first presentation module 115, the user may select a specific entity for voice customization. When the user selects an entity, the first entity 140 is stored in the look-up table 120. The first entity 140 is associated with a first identifier 195; that is, the first identifier 195 represents the first entity 140, and it does so uniquely. The look-up table 120 also contains a second identifier 130, which indicates the voice sample used for voice customization. A plurality of voice samples are stored in the voice database 190. From these, the user may select a specific voice sample to be used for voice customization. Hereinafter, the specific voice sample selected by the user for voice customization is referred to as the "selected voice sample". The second presentation module 125 is used to display the voice samples stored in the voice database 190 as a list, and allows the user to select a specific voice sample to be used for voice customization.
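
The look-up table 120 can be pictured as a small mapping from a first identifier (the selected entity) to a second identifier (the voice sample chosen for it). The sketch below is one assumed way to model that pairing; the class name `LookupTable`, its method names, and the string identifiers such as `"E1"` and `"V2"` are illustrative and do not come from the patent.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class LookupTable:
    """Assumed model: first identifier (entity) -> second identifier (voice sample)."""
    _rows: Dict[str, Optional[str]] = field(default_factory=dict)

    def store_entity(self, entity_id: str) -> None:
        """Record a selected entity; no voice sample is matched to it yet."""
        self._rows.setdefault(entity_id, None)

    def store_voice(self, entity_id: str, voice_id: str) -> None:
        """Match a voice sample identifier to an already stored entity."""
        if entity_id not in self._rows:
            raise KeyError(f"entity {entity_id!r} has not been selected")
        self._rows[entity_id] = voice_id

    def voice_for(self, entity_id: str) -> Optional[str]:
        """Return the matched voice identifier, or None if there is none."""
        return self._rows.get(entity_id)


table = LookupTable()
table.store_entity("E1")          # user picks an entity from the first list
table.store_voice("E1", "V2")     # user picks a voice sample from the second list
print(table.voice_for("E1"))
```

Storing identifiers rather than the face image or audio data themselves keeps the table small and lets the voice database hold the actual samples.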

The voice sub-sampler module 180 processes the voice samples selected by the user. Examples of such voice samples include, but are not limited to, a recorded voice sample 170, a built-in voice sample (not shown) stored in the database, and a voice sample input by the user. Built-in voice samples are generally provided by the service provider. Before storing a specific voice sample in the voice database 190, the voice sub-sampler module 180 passes it through a smoothing filter to improve its sound quality. The voice sub-sampler module 180 also allows voice samples to be recorded in real time using the voice recording module 185. In addition, the user may input voice samples from the web into the voice sub-sampler module 180. Some of the user-input voice samples, and the recorded voice samples, are processed by the voice processing module 175. The processed voice samples are entered into the voice database 190. When a voice sample is registered in the voice database 190, a second identifier 130 is generated. Each voice sample stored in the voice database 190 is stored in association with its corresponding second identifier 130, which uniquely identifies that voice sample. The plurality of voice samples are displayed as a list by the second presentation module 125 so that the user can choose among them. The user may select a specific voice sample from the list generated by the second presentation module 125. When a specific voice sample is selected for voice conversion, the second identifier 130 corresponding to that voice sample is stored in the look-up table 120. The second identifier 130 may be used to map the specific voice sample selected by the user from the voice database 190 to the first entity 140.
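
The registration path described above (smooth the sample, store it, generate a unique second identifier) might look roughly like the following. The patent only says "smoothing filter", so the centered moving average used here is an assumed stand-in, and the `VoiceDatabase` class, its methods, and the `"V1"`-style identifiers are hypothetical.

```python
import itertools
from typing import Dict, List


def moving_average(samples: List[float], window: int = 3) -> List[float]:
    """Centered moving average; edge samples use a shorter window."""
    half = window // 2
    out = []
    for i in range(len(samples)):
        lo, hi = max(0, i - half), min(len(samples), i + half + 1)
        out.append(sum(samples[lo:hi]) / (hi - lo))
    return out


class VoiceDatabase:
    """Assumed model of the voice database: second identifier -> smoothed sample."""

    def __init__(self) -> None:
        self._samples: Dict[str, List[float]] = {}
        self._counter = itertools.count(1)

    def register(self, samples: List[float]) -> str:
        """Smooth the sample, store it, and return its newly generated identifier."""
        voice_id = f"V{next(self._counter)}"
        self._samples[voice_id] = moving_average(samples)
        return voice_id

    def get(self, voice_id: str) -> List[float]:
        return self._samples[voice_id]


db = VoiceDatabase()
first_id = db.register([0, 3, 0])   # a spiky three-sample recording
print(first_id, db.get(first_id))
```

Generating the identifier at registration time is what guarantees that each stored sample has exactly one identifier, so the look-up table can refer to samples without copying them.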

The core processing module 145, which includes the face search sub-module 150, the lip movement detection sub-module 155, and the voice control unit 160, is connected to the voice sub-sampler module 180. The core processing module 145 is the central component of the display apparatus 100. It determines whether the voice setting option has been invoked on the display apparatus 100.

When the voice setting option is executed, the core processing module 145 receives an input video frame 135. The video frame 135 can be regarded as the video clip to which the user applies voice customization. The core processing module 145 generates a first query for the look-up table 120. The first query retrieves the first entity 140 that was selected by the user and stored in the look-up table 120. The first entity 140, identified by the first identifier 195, is provided as input to the face search submodule 150. The face search submodule 150 captures the entities in the input video frame 135 and searches for a match between those entities and the first entity 140. The core processing module 145 uses image processing techniques for this search. When an entity matching the first entity 140 is found among the entities in the input video frame 135, the core processing module 145 activates the lip motion detection submodule 155. The lip motion detection submodule 155 analyzes the input video frame 135 to determine whether the selected entity shows lip motion. When lip motion of the selected entity is detected in the input video frame 135, the lip motion detection submodule 155 generates an interrupt for the voice control unit 160. The voice control unit 160 generates a second query to retrieve the second identifier 130 stored in the look-up table 120. Using the result of the second query, the voice control unit 160 generates a third query, sent to the voice database 190, to retrieve the selected voice sample. The voice control unit 160 customizes the voice by changing voice characteristics such as timbre and pitch. The voice conversion may be performed using, for example, a voice morphing method, which may also be referred to as a voice conversion method.

The look-up table 120 is used to map the selected entity for which voice conversion is to be provided. The look-up table 120 stores the first entity 140, the first identifier 195, and the second identifier 130 for a certain period of time. When an entity is selected through the first presentation module 115, the corresponding first entity 140 is entered into the look-up table 120, and the look-up table 120 generates the first identifier 195 at that time. Likewise, when a voice sample is selected through the second presentation module 125, the second identifier 130 for the selected voice sample is entered into the look-up table 120.
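One way to picture the look-up table is as a small keyed store in which each entry holds the selected entity and, once chosen, the second identifier of its voice sample. The sketch below is an assumption-laden illustration: `LookupTable`, `Entry`, and the method names are hypothetical, and the entity is represented by a plain string in place of real face data.

```python
from dataclasses import dataclass
from itertools import count
from typing import Optional


@dataclass
class Entry:
    entity: str                      # placeholder for the stored face data
    second_id: Optional[int] = None  # identifier of the chosen voice sample, if any


class LookupTable:
    """Hypothetical look-up table keyed by first identifier."""

    def __init__(self):
        self._entries = {}
        self._ids = count(1)

    def add_entity(self, entity):
        """Store a selected entity and generate its first identifier."""
        first_id = next(self._ids)
        self._entries[first_id] = Entry(entity)
        return first_id

    def assign_voice(self, first_id, second_id):
        """Record the second identifier of the voice sample chosen for an entity."""
        self._entries[first_id].second_id = second_id

    def entity(self, first_id):
        """Plays the role of the first query: fetch the stored entity."""
        return self._entries[first_id].entity

    def voice_id(self, first_id):
        """Plays the role of the second query: fetch the second identifier."""
        return self._entries[first_id].second_id
```

Python's `dict` is itself a hash table, so this layout also corresponds to the hash-table variant mentioned elsewhere in the description.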

When lip motion of the selected entity is detected in the input video frame 135, the look-up table 120 extracts the second identifier. The second identifier is used to retrieve, from the voice database 190, the voice sample to be applied to the selected entity. The voice control unit 160 extracts voice features such as timbre and pitch in order to customize the voice of the selected entity. The voice customization is performed on the display device 100 without interrupting the user's viewing.

FIG. 3 is a flowchart illustrating a method for customizing the voice of a selected entity in the display device 100, according to an embodiment of the present invention.

In step 210, at least one entity included in the first video frame is detected. The at least one entity may be the face of a character appearing in the first video frame. Examples of the first video frame include, but are not limited to, a video clip and a broadcast video. The at least one entity may be captured by the face detection module, which may use a plurality of distinguishing properties to do so. Examples of such properties include, but are not limited to, skin color, motion, size, shape, and position. The face detection module may also use various algorithms to capture the at least one entity included in the first video frame.

In step 215, the at least one entity included in the first video frame is displayed as a list. The listing may be performed by the first presentation module, which also displays the at least one entity included in the first video frame. The user selects a particular entity from the list generated by the first presentation module.

In step 220, the user selects a first entity from among the at least one entity included in the first video frame. The entities included in the first video frame may be included in the list generated by the first presentation module. The selection may be made through a user interface. The user interface may include, but is not limited to, a graphical user interface (GUI), a touch screen, and a command line interface. For example, the user may select the first entity from among the at least one entity included in the first video frame by providing input to the first presentation module through the GUI.

In one embodiment of the present invention, the first entity selected by the user may be stored in a look-up table. The look-up table may be implemented to generate a first identifier that identifies the first entity. Similarly, a plurality of first identifiers may be generated to identify the corresponding plurality of entities presented by the first presentation module. The at least one entity may be stored in the look-up table. The look-up table may be implemented using a processor.

In another embodiment of the present invention, a hash table may be used to store the at least one entity included in the first video frame.

In step 225, a first voice sample is selected. The first voice sample may be stored in a voice database. The voice database may be built in or may be located remotely, and it contains a plurality of voice samples. In a first embodiment, the first voice sample may be represented by a second identifier, and the second identifier representing the first voice sample may be stored in the look-up table. Similarly, a plurality of second identifiers may be generated to represent the corresponding plurality of voice samples stored in the voice database.

In another embodiment, a hash table may be used to store the second identifier representing the first voice sample.

In step 230, it is determined whether any of the at least one entity included in the input video frame matches the first entity. This determination is performed using the face search submodule, which compares the at least one entity included in the input video frame with the first entity. Digital image processing techniques may be used for the comparison. The face search submodule also matches the at least one entity included in the input video frame with the first entity. After the first entity selected by the user has been matched with an entity in the input video frame, the lip motion of the selected entity among the at least one entity of the input video frame is determined.

In one embodiment of the present invention, various face recognition algorithms may be used by the face detection submodule to search for a match between the at least one entity included in the input video frame and the first entity.
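The description leaves the matching algorithm open. As one hedged illustration, suppose each entity has already been reduced to a numeric feature vector (an assumption, since the description does not say how faces are represented); matching can then be a nearest-neighbor search with a distance threshold. The function name `find_match` and the `threshold` parameter are hypothetical.

```python
import math


def find_match(first_entity, frame_entities, threshold=0.5):
    """Return the index of the closest frame entity within threshold, or None.

    Assumption: every entity is a feature vector (sequence of floats) of the
    same length; real face recognition would produce these vectors.
    """
    best_index, best_dist = None, threshold
    for index, candidate in enumerate(frame_entities):
        dist = math.dist(first_entity, candidate)  # Euclidean distance
        if dist <= best_dist:
            best_index, best_dist = index, dist
    return best_index
```

Returning `None` when nothing is close enough corresponds to the case in which no entity in the input video frame matches the first entity.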

In step 235, the first entity selected from among the at least one entity included in the input video frame is analyzed for lip motion. The lip motion of the selected entity may be detected by the lip motion detection submodule, which may use speech processing techniques to analyze whether lip motion is present.

In a first embodiment of the present invention, the lip motion detection submodule determines whether voice conversion is needed by detecting whether lip motion is present. If lip motion is detected for the selected entity, the lip motion detection submodule starts the voice conversion process. If no lip motion is detected for the selected entity, the lip motion detection submodule bypasses the voice conversion process.

In another embodiment of the present invention, various algorithms may be applied in the lip motion detection submodule to detect the lip motion of the selected entity.

In step 240, the voice of the selected entity is converted. The conversion may be performed using the voice control unit. Voice conversion includes replacing the voice of the selected entity with the first voice sample. For example, the first voice sample may be stored in the voice database, which stores a plurality of voice samples. The voice control unit may use various voice synthesis techniques to convert the voice of the selected entity into the first voice sample.

In one embodiment of the present invention, the lip motion detection submodule may trigger the voice control unit to convert the voice of the selected entity into the first voice sample. For example, the trigger may be an interrupt: the lip motion detection submodule generates the interrupt, and the interrupt causes the voice control unit to convert the voice of the selected entity into the first voice sample. The voice conversion may also be applied for a specific time, meaning the period during which the voice conversion takes place.
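In software, the interrupt can be modeled as a callback that the lip motion detector invokes only when motion is present; otherwise the audio passes through untouched. The sketch below assumes this callback model, and `LipMotionDetector`, `process`, and `fake_convert` are hypothetical names. Lip motion itself is passed in as a boolean because the detection algorithm is not specified.

```python
class LipMotionDetector:
    """Hypothetical detector whose handler plays the role of the interrupt."""

    def __init__(self, on_motion):
        self._on_motion = on_motion  # callable invoked when lip motion is seen

    def process(self, lips_moving, audio_chunk):
        """Return converted audio when lips move; otherwise pass audio through."""
        if lips_moving:
            return self._on_motion(audio_chunk)  # "interrupt" to the voice control unit
        return audio_chunk  # bypass: no conversion needed


def fake_convert(audio_chunk):
    # Placeholder for the voice control unit; it only tags the chunk.
    return "converted:" + audio_chunk
```

With `detector = LipMotionDetector(fake_convert)`, calling `detector.process(True, "a")` yields `"converted:a"`, while `detector.process(False, "a")` returns `"a"` unchanged.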

FIG. 4 is a flowchart illustrating a method for selecting and updating an entity using the first presentation module, according to an embodiment of the present invention.

In step 310, a first video frame is received as input to the face detection module. Examples of the video frame include, but are not limited to, a moving picture, a broadcast stream, live video, and a video clip. The first video frame includes a plurality of entities. An entity may be the face of a character included in the first video frame.

In step 315, the first video frame is captured using the face detection module. Examples of image capture techniques include, but are not limited to, digital image processing techniques and chroma key techniques.

In step 320, at least one entity included in the first video frame is extracted by the face detection module. The extraction may be accomplished by identifying a plurality of distinguishing properties associated with the entities in the first video frame. Examples of such properties include, but are not limited to, skin color, motion, size, shape, and position. Various algorithms may also be used to capture the at least one entity included in the first video frame.

In step 325, the at least one entity included in the first video frame is displayed as a list. The listing of the at least one entity included in the first video frame may be performed using the first presentation module. The first presentation module displays the at least one entity included in the first video frame. The user may select a particular entity from the listed entities using the first presentation module. Hereinafter, the particular entity selected by the user is referred to as the first entity.

In step 330, the user selects the first entity from among the at least one entity included in the first video frame. The entities included in the first video frame may be displayed as a list using the first presentation module. The selection of the first entity may be made through a user interface. Examples of the user interface include, but are not limited to, a GUI, a touch screen, and a command line interface.

In step 335, the first entity is stored in the look-up table. The look-up table may be implemented to generate a first identifier that identifies the first entity. Similarly, a plurality of first identifiers may be generated to identify the corresponding plurality of entities presented by the first presentation module. The at least one entity may be stored in the look-up table.
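Steps 310 to 335 can be summarized as a short pipeline: detect, list, let the user choose, then store the choice under a new first identifier. The following is a sketch under stated assumptions: face detection and the user interface are injected as callables (`detect_faces`, `choose`) because the description does not fix them, the look-up table is a plain `dict`, and all names are hypothetical.

```python
def select_entity(frame, detect_faces, choose, table):
    """Detect entities, let the user choose one, and store the choice.

    detect_faces(frame) -> list of entities (stand-in for the face detection module)
    choose(entities)    -> index picked by the user (stand-in for the UI)
    table               -> dict mapping first identifier -> entity
    Returns the new first identifier, or None if no entity was detected.
    """
    entities = detect_faces(frame)       # steps 315-320: capture and extract
    if not entities:
        return None                      # nothing to list or select
    index = choose(entities)             # steps 325-330: list and select
    first_id = len(table) + 1            # simple identifier; assumes no deletions
    table[first_id] = entities[index]    # step 335: store in the look-up table
    return first_id
```

Injecting the detector and the chooser keeps the flow independent of any particular face detection algorithm or input device, in line with the non-limiting wording of the steps.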

FIG. 5 illustrates a user interface, including a look-up table, for selecting an entity, according to an embodiment of the present invention. FIG. 5 shows a display unit 405, a video frame 410 containing at least one entity, a first entity 415, a second entity 420, and a look-up table 425 storing a first identifier 430 that identifies the selected first entity 415.

The display unit 405 displays the at least one entity included in the video frame 410. Examples of the display unit 405 include, but are not limited to, a computer, an IPTV, a VOD, and an Internet TV. The at least one entity is detected from the video frame using the face detection module. The at least one entity detected in the video frame is displayed as a list for user selection using the first presentation module. Here, the list displayed by the first presentation module includes the first entity 415 and the second entity 420. For example, the user selects the first entity 415. As another example, the user may select the second entity 420. The first entity selected by the user is stored in the look-up table 425. The look-up table 425 generates a first identifier 430 for the selected first entity 415, and the first identifier 430 represents the selected first entity 415. The look-up table may also generate another first identifier to represent the second entity. Similarly, a plurality of first identifiers may be generated to represent the corresponding plurality of entities presented by the first presentation module. The at least one entity is stored in the look-up table 425.

FIG. 6 is a diagram illustrating a method of selecting a voice sample for voice customization, according to an embodiment of the present invention. The voice subsampler module processes the voice samples input by the user. Examples of voice samples include, but are not limited to, a recorded voice sample, a sample voice, and a voice sample input by the user.

In step 510, the user may be offered the option of selecting a voice from among a plurality of pre-processed voice samples stored in the voice database. A pre-processed voice may be a built-in voice sample. Built-in voice samples are stored in the voice database and are typically provided by a service provider. If the user wants to use a pre-processed voice for voice customization, the user may select a pre-processed voice sample from among the plurality of voice samples stored in the voice database, as shown in step 525. If the user does not want to use a pre-processed voice sample, the user may use a recorded voice sample instead. In step 515, the user may choose to use a voice sample recorded with the recording module. If the user wants to use a recorded voice sample for voice customization, the recording process starts, as shown in step 530. If the user does not want a recorded voice sample, the user may input a voice sample to be used for voice customization, as shown in step 520. In step 535, the recorded voice sample is processed using the voice subsampler module, which removes various kinds of noise, such as random noise and quantization noise, from the recorded voice sample. The voice subsampler module may improve the sound quality of the recorded voice sample by applying a smoothing filter to it before it is stored in the voice database, as shown in step 540. In step 540, the recorded voice sample is stored in the voice database.

Likewise, a voice sample input by the user for voice customization in step 520 is processed using the voice subsampler module, as shown in step 535. In step 540, the voice sample input by the user is stored in the voice database.
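The branching in FIG. 6 comes down to three sources for a voice sample: built-in (selected directly from the database), recorded, or supplied by the user, with the latter two processed before storage. A minimal sketch of that decision follows, assuming a `dict` as the voice database and a caller-provided `process` function for the noise removal and smoothing; `Source` and `obtain_sample` are hypothetical names.

```python
from enum import Enum


class Source(Enum):
    BUILT_IN = "built_in"   # steps 510 and 525
    RECORDED = "recorded"   # steps 515 and 530
    SUPPLIED = "supplied"   # step 520


def obtain_sample(source, database, built_in_id=None, raw=None, process=lambda s: s):
    """Return the identifier of the voice sample to use.

    database: dict mapping integer identifier -> stored sample.
    Built-in samples are already stored; recorded or supplied audio is
    processed (step 535) and stored under a new identifier (step 540).
    """
    if source is Source.BUILT_IN:
        if built_in_id not in database:
            raise KeyError(built_in_id)
        return built_in_id
    if raw is None:
        raise ValueError("recorded and supplied samples need raw audio")
    new_id = max(database, default=0) + 1
    database[new_id] = process(raw)
    return new_id
```

Recorded and supplied samples share one code path here because the flow sends both through step 535 and then step 540.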

FIG. 7 illustrates a user interface, including a look-up table, for selecting a voice sample, according to an embodiment of the present invention. FIG. 7 shows a display unit 605, a recording module 610, and a look-up table 640.

The display unit 605 displays at least one entity included in the video frame. Examples of the display unit include, but are not limited to, a computer, an IPTV, a VOD, and an Internet TV. The user selects an entity from among the at least one entity included in the video frame. In one embodiment of the present invention, the selection may be made by dragging a cursor onto the entity, by keyboard input, or by using a touch pad. For example, the selected entity may be a character included in the video frame, as shown at 635. The selected entity 635 is stored in the look-up table 640, and the look-up table 640 generates a first identifier 645 that uniquely identifies the selected entity 635. Similarly, a plurality of first identifiers for a corresponding plurality of selected entities are stored in the look-up table 640. As an example, the user may wish to record a voice sample using the recording module 610. The voice samples available for voice customization of the selected entity 635 may include, but are not limited to, a robot voice sample 615, a celebrity voice sample 620, and a baby voice sample 625. The voice samples described above are stored in the voice database. Each voice sample stored in the voice database has a second identifier that individually identifies it. The user's selection among these voice samples is stored in the voice database for the purpose of voice conversion. When a voice sample is selected for voice customization, the second identifier 630 corresponding to the selected voice sample is stored in the look-up table 640. The second identifier 630 is used to retrieve the voice sample from the voice database. The selected voice sample may then be used to customize the voice of the selected entity.

FIG. 8 is a flowchart illustrating a method of customizing a voice using the core processing module, according to an embodiment of the present invention. In step 710, the core processing module receives an input video frame. The input video frame may include, but is not limited to, a scene of a moving picture, a video clip, or a broadcast stream. In step 715, the core processing module determines whether the user wants voice customization. If the user wants voice customization, the input video frame is analyzed, as shown in step 720. If the user does not want voice customization, the core processing module bypasses the voice customization process.

In step 720, the input video frame is analyzed. The core processing module analyzes the video frame by capturing at least one entity included in the input video frame. The capturing may be performed using the face search submodule, which can capture entities by identifying various distinguishing properties associated with the entities in the input video frame. Examples of such properties include, but are not limited to, skin color, motion, size, shape, and position. Various algorithms may also be applied to capture the at least one entity included in the input video frame.

In step 725, the core processing module generates a first query to retrieve the first entity that was selected by the user and stored in the look-up table. The retrieved first entity is provided as input to the face search submodule.

In step 730, the core processing module searches for a match between the first entity stored in the look-up table and the plurality of entities contained in the input video frame. The search may be performed using the face search submodule.

In step 735, the core processing module determines whether the first entity stored in the look-up table matches any of the plurality of entities contained in the input video frame. If a match is found, the core processing module checks the input video frame to determine whether lip motion of the selected entity is present. If no match is found, the voice customization steps are bypassed, as shown in step 765.

In step 740, the core processing module checks the input video frame to determine whether lip motion of the selected entity is present. The lip motion detection submodule is used for this determination. If lip motion of the selected entity is found in the input video frame, an interrupt is generated for the voice control unit. If it is determined that the selected entity shows no lip motion, the voice customization steps are bypassed, as shown in step 765.

In step 745, the lip movement detection submodule generates an interrupt to be transmitted to the voice control unit. The interrupt serves as a signal to carry out voice customization for the selected entity. It is generated for delivery to the voice control unit depending on whether lip movement of the selected entity is present in the input video frame.
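
The gating in steps 740 and 745 can be sketched as follows, under the assumption that the interrupt is modeled as a message placed on a queue read by the voice control unit. `VoiceInterrupt` and `signal_if_lips_move` are hypothetical names, and the lip movement decision itself is taken as a precomputed boolean because the patent does not specify how it is obtained.

```python
import queue
from dataclasses import dataclass


@dataclass(frozen=True)
class VoiceInterrupt:
    """Signal asking the voice control unit to customize one entity's voice."""
    entity_id: str


def signal_if_lips_move(
    entity_id: str,
    lips_moving: bool,
    interrupts: queue.Queue,
) -> bool:
    """Queue an interrupt only when the selected entity's lips are moving.

    Returns False when no lip movement is present, which corresponds to
    bypassing voice customization (step 765).
    """
    if not lips_moving:
        return False
    interrupts.put(VoiceInterrupt(entity_id))
    return True
```

A queue is only one possible stand-in; a direct callback into the voice control unit would express the same control flow.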

In step 750, the voice control unit generates a second query, to be sent to the look-up table, to retrieve the second identifier. The second identifier indicates the selected voice, which is used to customize the voice of the selected entity. The second identifier indicating the selected voice is returned in response to the second query.

In step 755, a third query is generated and sent to the voice database to retrieve the selected voice sample using the second identifier. The voice database stores a plurality of voice samples for voice customization. Each voice sample stored in the voice database is associated with a corresponding one of a plurality of second identifiers, and the voice samples are stored in the voice database together with these identifiers. The selected voice sample is returned from the voice database in response to the third query.
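
The two lookups of steps 750 and 755 amount to a chained key resolution: entity to second identifier, then second identifier to voice sample. The sketch below assumes both the look-up table and the voice database behave like dictionaries and that a voice sample is raw bytes; `fetch_voice_sample` and the identifier strings are hypothetical.

```python
from typing import Dict, Optional


def fetch_voice_sample(
    entity_id: str,
    lookup_table: Dict[str, str],
    voice_database: Dict[str, bytes],
) -> Optional[bytes]:
    """Resolve the voice sample matched to an entity.

    Second query: look-up table maps the entity to a second identifier.
    Third query: voice database maps the second identifier to a sample.
    Returns None if either lookup finds nothing.
    """
    voice_id = lookup_table.get(entity_id)  # second query
    if voice_id is None:
        return None
    return voice_database.get(voice_id)  # third query
```

Keeping only identifiers in the look-up table means a voice sample can be shared by several entities without being stored more than once.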

In step 760, the voice control unit changes the voice of the selected entity to the selected voice sample. The voice change performed by the voice control unit includes changing characteristics of the voice such as timbre and pitch.
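
The patent does not state how pitch is altered. Purely as an illustration of what a pitch change involves, the size of the shift between the entity's voice and the selected voice sample can be expressed in semitones from their fundamental frequencies. The function name `semitone_shift` and the assumption that both fundamental frequencies have already been estimated are not from the patent.

```python
import math


def semitone_shift(source_f0_hz: float, target_f0_hz: float) -> float:
    """Number of semitones needed to move a voice from its source
    fundamental frequency to the target one (12 semitones per octave)."""
    if source_f0_hz <= 0 or target_f0_hz <= 0:
        raise ValueError("fundamental frequencies must be positive")
    return 12.0 * math.log2(target_f0_hz / source_f0_hz)
```

A positive result means the entity's voice would be raised toward the sample, a negative result that it would be lowered. Timbre changes are not covered by this sketch.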

Although preferred embodiments of the present invention have been shown and described above, the present invention is not limited to the specific embodiments described. Various modifications may be made by those of ordinary skill in the art to which the invention pertains without departing from the gist of the invention as claimed in the claims, and such modifications should not be understood separately from the technical spirit or scope of the present invention.

10: image input unit 20: detection unit
30: video processing unit 40: audio processing unit
50: storage unit 60: audio output unit
70: display unit 80: user interface unit
90: control unit

Claims

A voice conversion method of a display device, the method comprising:
detecting, when a first video frame is input, at least one entity included in the first video frame;
storing, when one of the detected entities is selected, the selected entity;
displaying, when the entity is selected, a plurality of pre-stored voice samples in one area of the display screen;
matching, when one of the plurality of pre-stored voice samples is selected, the selected voice sample to the selected entity and storing the selected voice sample; and
converting, when a second video frame including the selected entity is input, the voice of the selected entity into the selected voice sample and outputting the converted voice.

The method according to claim 1,
wherein the entity is a face of a person included in a video frame, and
wherein the detecting comprises detecting the face of the person included in the video frame through a face detection module, using at least one of the skin color, motion, size, shape, and position of the at least one entity.

The method according to claim 1,
further comprising: displaying, when the at least one entity is detected in the input video frame, the detected entity as a list in one area of the display screen.

delete

The method according to claim 1,
wherein storing the selected entity comprises storing a first identifier corresponding to the selected entity in a look-up table, and
wherein storing the selected voice sample comprises storing a second identifier corresponding to the selected voice sample in the look-up table.

The method according to claim 1,
wherein the plurality of voice samples comprise a voice sample embedded in the display device, a recorded voice sample, and a voice sample input by a user, and
wherein the recorded voice sample and the voice sample input by the user are filtered by a voice subsampler module.

The method according to claim 2,
wherein the outputting comprises determining whether the selected entity is included in the second video frame.

The method according to claim 1,
wherein the outputting comprises:
determining whether lip movement of the selected entity included in the second video frame is present; and
converting the voice of the entity into the selected voice sample when the lip movement of the entity is determined to be present.

A display device comprising:
a detection unit for detecting, when a first video frame is input, at least one entity included in the first video frame;
a user interface unit for receiving a selection of an entity, among the detected entities, on which voice conversion is to be performed and a selection of a voice sample to be matched to the selected entity;
a storage unit for storing the selected entity and the selected voice sample; and
a control unit for converting the voice of the selected entity into the selected voice sample and outputting the converted voice when a second video frame including the selected entity is input,
wherein the control unit controls a display unit to display a plurality of pre-stored voice samples in one area of the screen when the entity is selected.

The display device according to claim 9,
wherein the entity is a face of a person included in a video frame, and
wherein the detection unit detects the face of the person included in the video frame through a face detection module, using at least one of the skin color, motion, size, shape, and position of the at least one entity.

The display device according to claim 9, further comprising:
a video processing unit for processing the input video frame;
an audio processing unit for processing an audio signal corresponding to the input video frame;
a display unit for outputting, on a screen, the video frame processed by the video processing unit; and
an audio output unit for outputting the audio signal processed by the audio processing unit in synchronization with the video frame,
wherein the control unit controls the audio processing unit to convert the voice of the selected entity into the selected voice sample and provide the converted voice to the audio output unit.

The display device according to claim 11,
wherein the control unit controls the display unit to display the detected entity as a list in one area of the screen when the at least one entity is detected in the input video frame.

delete

The display device according to claim 9,
wherein a first identifier corresponding to the selected entity and a second identifier corresponding to the selected voice sample are stored in a look-up table.

The display device according to claim 9,
Wherein,
A voice sample stored in the display device, a recorded voice sample, and a voice sample input by the user,
Wherein the selected voice sample comprises:
A voice sample recorded by the user, and a voice sample input by the user.

The display device according to claim 15,
wherein the recorded voice sample and the voice sample input by the user are filtered by a voice subsampler module.

The display device according to claim 10,
wherein the face detection submodule is used to search for and determine whether the selected entity is present in a face of a person included in the second video frame.

The display device according to claim 11,
wherein the control unit determines whether lip movement of the selected entity included in the second video frame is present and, when lip movement of the entity is determined to be present, converts the voice of the entity into the selected voice sample and outputs the converted voice.