KR20160093404A

KR20160093404A - Method and Apparatus for Multimedia Contents Service with Character Selective Audio Zoom In

Info

Publication number: KR20160093404A
Application number: KR1020150014375A
Authority: KR
Inventors: 임우택
Original assignee: 한국전자통신연구원
Priority date: 2015-01-29
Filing date: 2015-01-29
Publication date: 2016-08-08

Abstract

The present invention relates to a multimedia content service method and apparatus capable of providing audio zoom-in. In response to a user input, such as selecting or zooming in on a specific character (a person, an animal, or the like) whose face has been detected, the sound source portion of that character, such as its dialogue, is enhanced for listening, either by separating the sound source from the audio signal or by using a sub-channel audio signal.

Description

The present invention relates to a multimedia content service method and apparatus, and more particularly to a multimedia content service method and apparatus that provide audio zoom-in so that, by selecting or zooming in on a specific character whose face has been detected on a multimedia content playback screen, the user can listen to that character's dialogue in enhanced form.

When watching multimedia content such as a TV program, ambient sounds such as background music hinder the user from hearing the audio needed to follow the program, such as an actor's dialogue or narration. In particular, with the elderly population now growing rapidly, age-related hearing loss, one of the degenerative conditions, is becoming a serious problem. Human hearing has increasing difficulty perceiving sound with advancing age. The audible band deteriorates with age, and elderly viewers may therefore have difficulty recognizing dialogue when watching TV programs. One example of such difficulty is when background sounds such as BGM and sound effects are loud.

However, current multimedia services cannot set the playback volume of dialogue and ambient sound separately; only the combined volume of the two can be adjusted. Methods that adjust and play back only the dialogue through audio signal processing are therefore being studied. Conventional methods, however, apply techniques such as sound source separation even to portions that contain no actual dialogue, so the amount of computation is large and the processing of these unnecessary portions imposes a heavy computational load.

Accordingly, the present invention has been devised to solve the above problems. An object of the present invention is to provide a multimedia content service method and apparatus capable of providing audio zoom-in so that, in response to a user input such as selecting or zooming in on a specific character (a person, an animal, or the like) whose face has been detected on a multimedia content playback screen, only the sound source portion of that character, such as its dialogue, is enhanced for listening, through sound source separation on the audio signal or through a sub-channel audio signal.

To summarize the features of the present invention, a multimedia content service method in a multimedia content service apparatus according to one aspect of the present invention for achieving the above object includes: receiving a user input for a frame that includes a character in the playback video signal, from multimedia content data including video data and audio data; when the user input for setting one or more characters of interest is made, extracting from the playback audio signal a dialogue sound signal corresponding to the set character of interest; and playing back the multimedia content data on a display device and an audio device while adjusting the level of the dialogue sound signal relative to the background sound signal in the playback audio signal and synthesizing it into the playback audio signal.

The user input may consist of the user zooming in on the face of a character of interest on a playback screen that includes the character, selecting one or more characters of interest through an interface provided on the playback screen, or designating an area range that includes one or more characters of interest through an interface provided on the playback screen.

The multimedia content data includes a content file stored in the multimedia content service apparatus, data received by download or streaming over the Internet, or a broadcast data stream received via terrestrial or satellite broadcasting.

To adjust the relative level of the dialogue sound signal, the adjustment amount may be set in advance, or a user-configurable volume adjustment interface may be provided on the playback screen.

A multimedia content service apparatus according to another aspect of the present invention includes: a character detection unit that provides, on the playback screen, an interface for user input for a frame that includes a character in the playback video signal, from multimedia content data including video data and audio data; an audio processing unit that, when the user input for setting one or more characters of interest is made, extracts from the playback audio signal a dialogue sound signal corresponding to the set character, adjusts the level of the dialogue sound signal relative to the background sound signal in the playback audio signal, and synthesizes it into the playback audio signal; and a content playback unit for playing back the multimedia content data on a display device and an audio device.

According to the multimedia content service method and apparatus providing audio zoom-in according to the present invention, faces of people, animals, and the like are detected on the multimedia content playback screen. In response to an event in which the user sets the position or range of a person of interest, such as selecting or zooming in on a specific character, only the sound source portion of that character, such as its dialogue, is enhanced through sound source separation on the audio signal or through a sub-channel audio signal, and is output with priority over background sounds such as BGM and sound effects. This resolves problems such as difficulty in recognizing dialogue.

FIG. 1 is a diagram illustrating a multimedia content service apparatus according to an embodiment of the present invention.
FIG. 2 is a flowchart illustrating the operation of the multimedia content service apparatus of FIG. 1.
FIG. 3 shows an example of an interface for selecting a character of interest, displayed on a display screen by a multimedia content service apparatus according to an embodiment of the present invention.

Hereinafter, the present invention is described in detail with reference to the accompanying drawings. Wherever possible, the same components are denoted by the same reference numerals in each drawing. Detailed descriptions of well-known functions and/or configurations are omitted. The following description focuses on the parts needed to understand the operation according to the various embodiments, and omits elements that could obscure the gist of the description. Some components in the drawings may be exaggerated, omitted, or shown schematically. The size of each component does not fully reflect its actual size, so the content described here is not limited by the relative sizes or spacing of the components drawn in each figure.

FIG. 1 is a diagram illustrating a multimedia content service apparatus 100 according to an embodiment of the present invention.

Referring to FIG. 1, the multimedia content service apparatus 100 according to an embodiment of the present invention includes a controller 110, a memory 111, a user interface 112, a content receiving unit 120, a character detection unit 130, an audio processing unit 140 that includes a dialogue sound extraction unit 141 and a dialogue sound synthesis unit 142, and a content playback unit 130. Each of these components of the multimedia content service apparatus 100 may be implemented in hardware, software, or a combination of the two.

The controller 110 is responsible for overall control of the components of the multimedia content service apparatus 100. The controller 110 may be implemented as a semiconductor processor, and may also be implemented so as to include the functions of one or more of the components of the multimedia content service apparatus 100.

The memory 111 stores the various settings needed for the operation of the multimedia content service apparatus 100, settings entered by the user through the user interface 112, and the like. Where applicable, it may also store the multimedia content data to be played back through the content playback unit 130, either as a multimedia content file or as buffered data received in the form of streaming data.

The user interface 112 may include various buttons, keys, and the like as input means with which the user sets the necessary data. The user interface 112 may also include the hardware, software, and the like for driving a touch screen provided on the screen of the display device when the content playback unit 130 plays back the multimedia content data on a display device (not shown) and an audio device (not shown).

The content receiving unit 120 can receive multimedia content data including video data and audio data from an external source. The content playback unit 130 may play back a multimedia content file stored in the memory 111 according to the user's selection through the user interface 112, or it may play back, on the display device and the audio device, multimedia content data or a broadcast data stream received through the content receiving unit 120. For example, according to the user's selection through the user interface 112, the content receiving unit 120 may receive multimedia content data by download or streaming over the Internet, or may receive a broadcast data stream transmitted by a broadcasting station via terrestrial or satellite broadcasting.

When multimedia content data including video data and audio data, received by the content receiving unit 120 or read from the memory 111, is played back in this way, the character detection unit 130 extracts from the multimedia content data being played back the video frames of the playback video signal that include a character. For example, the character detection unit 130 may identify a character in the video data using a predetermined algorithm that identifies body parts, such as the face of a person or an animal, and may thereby detect the corresponding video frames. After extracting a frame that includes a character, the character detection unit 130 may provide an interface on the playback screen, such as a predetermined message or notification window, so that a character of interest can be selected by zooming in on the character's face in that frame. Alternatively, it may provide on the playback screen an interface with which the user selects a character of interest or designates an area range (for example, an indication of the relevant key or button position, a description of the setting method, or a dotted outline on the screen for designating the area range).

When the user then selects a character of interest by zooming in on the character's face, or uses the area designation interface to select one or more characters of interest or to set an area range that includes one or more characters of interest (for example, selecting by touch or resizing the dotted area), the audio processing unit 140 enhances the dialogue sound by applying a dialogue enhancement algorithm to the audio signal segments that correspond to the detected video frames.
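The correspondence between detected video frames and audio signal segments can be sketched as simple index arithmetic. The following is a minimal illustration, not the patent's implementation: the function names are hypothetical, and it assumes an integer frame rate and a constant audio sample rate with audio and video starting at the same instant.

```python
def frame_to_sample_range(frame_index, fps, sample_rate):
    """Return the half-open [start, end) audio sample range covered by one video frame.

    Assumes an integer frame rate (fps) and that audio sample 0 is aligned
    with the start of video frame 0. Both are assumptions of this sketch.
    """
    if fps <= 0 or sample_rate <= 0:
        raise ValueError("fps and sample_rate must be positive")
    # Integer arithmetic keeps adjacent frames exactly contiguous.
    start = frame_index * sample_rate // fps
    end = (frame_index + 1) * sample_rate // fps
    return start, end


def frames_to_segments(frame_indices, fps, sample_rate):
    """Merge runs of consecutive frame indices into contiguous sample segments."""
    segments = []
    for idx in sorted(set(frame_indices)):
        start, end = frame_to_sample_range(idx, fps, sample_rate)
        if segments and segments[-1][1] == start:
            # This frame directly follows the previous one: extend the segment.
            segments[-1] = (segments[-1][0], end)
        else:
            segments.append((start, end))
    return segments
```

Only the returned segments would then be passed to the enhancement step, which is what keeps portions without a detected character out of the processing path.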

That is, the dialogue sound extraction unit 141 of the audio processing unit 140 may apply a predetermined sound source separation algorithm to extract, from the playback audio signal (for example, an audio signal extracted from a content file or streaming data, or a sub-channel audio signal of a broadcast data stream), the dialogue sound signal corresponding to the set character.

The dialogue sound synthesis unit 142 of the audio processing unit 140 may then adjust the level of the dialogue sound signal relative to the background sound signal in the playback audio signal and synthesize it into the playback audio signal. To adjust the relative level of the dialogue sound signal, the adjustment amount may be set in advance, or the user may be allowed to adjust it through a user-configurable volume adjustment interface provided on the playback screen (for example, an indication of the volume key position, or volume keys displayed on the touch screen).
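The level adjustment and synthesis step can be illustrated with a short sketch. It assumes, purely for illustration, that the background sound is the playback signal minus the extracted dialogue, that samples are floats in the range [-1.0, 1.0], and that the adjustment amount is given in decibels; none of these choices, nor the function names, come from the patent.

```python
def db_to_linear(gain_db):
    """Convert a gain in decibels to a linear amplitude factor."""
    return 10.0 ** (gain_db / 20.0)


def remix(mix, dialogue, dialogue_gain):
    """Re-synthesize the playback signal with the dialogue scaled relative to the background.

    mix and dialogue are equal-length sequences of float samples.
    The background is estimated as (mix - dialogue), which is an
    assumption of this sketch rather than a method stated in the source.
    """
    if len(mix) != len(dialogue):
        raise ValueError("mix and dialogue must have the same length")
    out = []
    for m, d in zip(mix, dialogue):
        background = m - d
        sample = background + dialogue_gain * d
        # Clamp to the valid range so that boosting cannot overflow it.
        out.append(max(-1.0, min(1.0, sample)))
    return out
```

A preset adjustment amount would correspond to a fixed `dialogue_gain`, while the on-screen volume interface would simply change the value passed in, for example `remix(mix, dialogue, db_to_linear(6.0))`.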

The content playback unit 130 applies the playback audio signal synthesized in this way and plays back the multimedia content data on the display device and the audio device.

The operation of the multimedia content service apparatus 100 according to an embodiment of the present invention is described in more detail below with reference to the flowchart of FIG. 2.

First, multimedia content data may be received through the content receiving unit 120 by download or streaming over the Internet, or a broadcast data stream transmitted by a broadcasting station may be received; alternatively, a content file stored in the memory 111 may be read under the control of the controller 110 according to the user's selection. Such downloaded or streamed data, broadcast data streams, and content files stored in the memory 111 include video data and audio data. To play back the multimedia content data, the content playback unit 130 extracts the video data and the audio data, processes each of them, and plays back the multimedia content data by driving the display device with the video signal and the audio device with the audio signal.

During playback of the multimedia content data, the character detection unit 130 extracts from the multimedia content data being played back the video frames of the playback video signal that include a character, and may provide an interface on the playback screen, such as a predetermined message or notification window, so that a character of interest can be selected by zooming in on the character's face in a frame that includes the character. The character detection unit 130 may also provide on the playback screen an interface with which the user selects a character of interest or designates an area range (for example, an indication of a key or button position or of the setting method, or a dotted outline on the screen for designating the area range) (S110).

For example, the character detection unit 130 may identify a character in the video data using a predetermined algorithm that identifies body parts, such as the face of a person or an animal, and may thereby detect the corresponding video frames. For a frame that includes a character, the character detection unit 130 may provide an interface on the playback screen, such as a predetermined message or notification window, so that a character of interest can be selected by zooming in on the character's face. As shown in FIG. 3, the character detection unit 130 may also provide interfaces 310 and 320 for designating an area range on the playback screen. The interfaces 310 and 320 may be displayed semi-transparently, overlaid on the playback screen.

When an interface such as a predetermined message or notification window is provided to enable selection of a character of interest, the user can select a character of interest by zooming in on the character's face on the playback screen of a video frame that includes the character. Alternatively, when an interface for selecting a character of interest or designating an area range is displayed on the playback screen, the user can use it to select one or more characters of interest or to set an area range that includes one or more characters of interest (for example, selecting by touch or resizing the dotted area) (S120). As shown in FIG. 3, two or more interfaces may be provided so that each character can be selected on the touch screen or with key buttons, and each dotted-outline interface may be made resizable by touch or a similar method.
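The two selection modes in step S120, touching a character and designating an area that contains one or more characters, can be sketched as hit tests against the detected face regions. This is only an illustration: the `Box` type, the character identifiers, and the rule that an area must fully contain a face box are all assumptions of the sketch, not details given in the source.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    """Axis-aligned rectangle in screen coordinates (hypothetical helper type)."""
    left: float
    top: float
    right: float
    bottom: float

    def contains_point(self, x, y):
        return self.left <= x <= self.right and self.top <= y <= self.bottom

    def contains_box(self, other):
        return (self.left <= other.left and self.top <= other.top
                and self.right >= other.right and self.bottom >= other.bottom)


def select_by_touch(faces, x, y):
    """Return the IDs of characters whose detected face box contains the touch point."""
    return [cid for cid, box in faces.items() if box.contains_point(x, y)]


def select_by_region(faces, region):
    """Return the IDs of characters whose face box lies entirely inside the region."""
    return [cid for cid, box in faces.items() if region.contains_box(box)]
```

Here `faces` is a dictionary mapping a character identifier to its detected face box. Resizing the dotted area in the interface would amount to calling `select_by_region` again with the new rectangle.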

When the user has selected one or more characters of interest, or has set an area range that includes one or more characters of interest (for example, selecting by touch or resizing the dotted area), the dialogue sound extraction unit 141 of the audio processing unit 140 may apply a predetermined sound source separation algorithm to extract, from the playback audio signal (for example, an audio signal extracted from a content file or streaming data, or a sub-channel audio signal of a broadcast data stream), the dialogue sound signal corresponding to the set character (S130).

The dialogue sound synthesis unit 142 of the audio processing unit 140 may then adjust the level of the dialogue sound signal relative to the background sound signal in the playback audio signal and synthesize it into the playback audio signal (S140). To adjust the relative level of the dialogue sound signal, the adjustment amount may be set in advance, or the user may be allowed to adjust it through a user-configurable volume adjustment interface provided on the playback screen (for example, an indication of the volume key position, or volume keys displayed on the touch screen; see 311 and 312 in FIG. 3).

The content playback unit 130 applies the playback audio signal synthesized in this way and plays back the multimedia content data on the display device and the audio device (S150).
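Steps S110 to S150 can be tied together in a small control-flow sketch. The detection and separation algorithms are left unspecified in the source ("predetermined" algorithms), so they are passed in here as callables; the function name, its parameters, and the pairing of one audio chunk per video frame are hypothetical simplifications.

```python
def audio_zoom_pipeline(frames, chunks, selected, detect, separate, gain):
    """Process paired video frames and audio chunks, enhancing dialogue only where needed.

    frames:   sequence of video frames (any type accepted by detect)
    chunks:   sequence of audio chunks, one list of float samples per frame
    selected: IDs of the characters of interest chosen by the user (S120)
    detect:   callable(frame) -> iterable of character IDs visible in the frame
    separate: callable(chunk, ids) -> dialogue samples for those characters
    gain:     linear factor applied to the dialogue relative to the background
    """
    output = []
    wanted = set(selected)
    for frame, chunk in zip(frames, chunks):
        visible = set(detect(frame))              # S110: characters in this frame
        targets = visible & wanted
        if not targets:
            # No character of interest: pass through without running separation.
            output.append(list(chunk))
            continue
        dialogue = separate(chunk, targets)       # S130: extract the dialogue sound
        remixed = [(m - d) + gain * d             # S140: rescale and re-synthesize
                   for m, d in zip(chunk, dialogue)]
        output.append(remixed)                    # S150: handed to playback
    return output
```

The early `continue` reflects the motivation stated in the background: separation is run only on segments tied to a frame containing a selected character, not on the whole signal.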

As described above, the multimedia content service apparatus 100 providing audio zoom-in according to the present invention can be applied to various electronic devices equipped with a display device and an audio device, such as televisions, mobile communication terminals, notebook PCs, and desktop PCs. On the multimedia content playback screen of such a device, faces of people, animals, and the like are detected. In response to an event in which the user sets the position or range of a person of interest, such as selecting or zooming in on a specific character, only the sound source portion of that character, such as its dialogue, is enhanced through sound source separation on the audio signal or through a sub-channel audio signal, and is output with priority over background sounds such as BGM and sound effects. This resolves problems such as difficulty in recognizing dialogue.

Although the present invention has been described above with reference to specific details, such as particular components, and to limited embodiments and drawings, these are provided only to aid a more general understanding of the invention. The present invention is not limited to the above embodiments, and a person of ordinary skill in the art to which the invention pertains can make various modifications and variations without departing from its essential characteristics. The spirit of the present invention should therefore not be confined to the described embodiments, and the claims set out below, together with all technical ideas that are equal or equivalent to those claims, should be construed as falling within the scope of the present invention.

Content receiving unit (120)
Character detection unit (130)
Audio processing unit (140)
Dialogue sound extraction unit (141)
Dialogue sound synthesis unit (142)
Content playback unit (130)

Claims

A multimedia content service method in a multimedia content service apparatus, the method comprising:
receiving a user input for a frame that includes a character in a playback video signal, from multimedia content data including video data and audio data;
extracting, when the user input for setting one or more characters of interest is made, a dialogue sound signal corresponding to the set character of interest from a playback audio signal; and
playing back the multimedia content data on a display device and an audio device while adjusting the level of the dialogue sound signal relative to a background sound signal in the playback audio signal and synthesizing it into the playback audio signal.
The method of claim 1,