KR20140140916A

KR20140140916A - Device and method for displaying subject of interest during video call

Info

Publication number: KR20140140916A
Application number: KR20130061955A
Authority: KR
Inventors: 이성오; 정문식; 최성도
Original assignee: Samsung Electronics Co., Ltd. (삼성전자주식회사)
Priority date: 2013-05-30
Filing date: 2013-05-30
Publication date: 2014-12-10
Also published as: KR102078132B1

Abstract

The present invention relates to a device and method for displaying a subject of interest during a video call. To this end, a face image recognized from image data is stored. At least one of voice recognition on voice data transmitted from a transmitting device and gesture recognition on the image data is performed, and a subject of interest is determined from the recognition result. The determined subject of interest is then enlarged and displayed on the screen. A subject in which the user is interested can therefore be enlarged automatically.

Description

BACKGROUND OF THE INVENTION

1. Field of the Invention

The present invention relates to a display device and method, and more particularly, to an apparatus and method for displaying an object of interest during a one-to-many video call.

In general, a video call is a conversation carried out over a connected voice and video link, either one-to-one or one-to-many. Each user participating in a video call can talk with the other participants using a display device, an image sensor, a microphone, a speaker, and the like. Once the video call starts, the image captured by each user's image sensor is transmitted to the other users' display devices so that each user's image is displayed. When a user speaks into the microphone, the resulting voice data is output through each user's speaker.

As described above, in a conventional video call, images of a plurality of users are captured through image sensors when the call is connected and are delivered to and displayed on each user's display device, and when a specific user's voice data is received, it is output through each user's speaker.

However, in the related art, a caller who requested a video call and wanted to see the face of a particular one of several recipients in more detail had to go to the trouble of adjusting the size of the screen shown on the display device with a control device such as a remote controller.

Accordingly, the present invention provides an apparatus and method that recognize the content of the conversation and specific gestures during a video call and display the object in which the caller is interested.

To achieve the above, an apparatus for displaying an object of interest during a video call is characterized by comprising: a camera unit that acquires image data; a communication unit that receives voice data; and a control unit that stores face images recognized from the image data, performs at least one of voice recognition on voice data received from a calling device and gesture recognition on the image data, determines an object of interest according to the recognition result, enlarges the determined object of interest, and transmits the image data in which the object of interest is enlarged to the calling device.

The present invention also provides a method for displaying an object of interest during a video call, characterized by comprising: storing face images recognized from image data; and performing at least one of voice recognition on voice data received from a calling device and gesture recognition on the image data, determining an object of interest according to the recognition result, enlarging the determined object of interest, and transmitting the image data in which the object of interest is enlarged to the calling device.

In a one-to-many video call, the present invention determines the object of interest by recognizing the conversation between the caller and the recipients and specific gestures, and then enlarges and displays that object, so the object the user is interested in is automatically shown enlarged.

FIG. 1 is a block diagram of an apparatus according to an embodiment of the present invention;
FIG. 2 is a flowchart illustrating a process of enlarging and displaying a face of interest according to an embodiment of the present invention;
FIG. 3 is a flowchart illustrating a process of enlarging and displaying a face of interest by keyword recognition according to an embodiment of the present invention;
FIG. 4 is a flowchart illustrating a process of enlarging and displaying a face of interest by recognizing response voice data according to an embodiment of the present invention;
FIG. 5 is a flowchart illustrating a process of enlarging and displaying a face of interest by gesture recognition according to an embodiment of the present invention; and
FIGS. 6 to 9 are exemplary views illustrating a process of enlarging and displaying a face of interest by voice and gesture recognition according to an embodiment of the present invention.

Hereinafter, exemplary embodiments of the present invention will be described in detail with reference to the accompanying drawings. However, the present invention is not restricted or limited by the exemplary embodiments. Like reference numerals in the drawings denote members performing substantially the same function.

Terms including ordinals, such as first and second, may be used to describe various elements, but the elements are not limited by these terms. The terms are used only to distinguish one element from another. For example, without departing from the scope of the present invention, a first element may be referred to as a second element, and similarly, a second element may be referred to as a first element. The terminology used in this application is used only to describe specific embodiments and is not intended to limit the invention. Singular expressions include plural expressions unless the context clearly dictates otherwise.

FIG. 1 is a block diagram of an apparatus according to an embodiment of the present invention.

The apparatus 10 according to the embodiment of the present invention includes a control unit 100, a camera unit 110, a face recognition unit 120, a voice recognition unit 130, a gesture recognition unit 140, a communication unit 150, a display unit 160, and a storage unit 170.

The control unit 100 controls the operation of the apparatus, and in particular controls the operations of the camera unit 110, the face recognition unit 120, the voice recognition unit 130, the gesture recognition unit 140, the communication unit 150, the display unit 160, and the storage unit 170.

When a request for a one-to-many video call connection is received from a caller's calling device, the control unit 100 announces that a video call connection request has been received, and when the request is accepted, establishes the video call connection with the calling device. For example, the control unit 100 may display, on the display screen of a receiving apparatus such as a TV, a call connection request message indicating that a video call connection request has been received.

When the video call connection is accepted, the control unit 100 controls the face recognition unit 120 to recognize faces in the image input through the camera unit 110. Specifically, the control unit 100 recognizes face images in the image using a general face recognition technique, associates each recognized face image with the corresponding user information, and stores them in the storage unit 170. The control unit 100 may receive the user information for a recognized face image as input from the recipient. Alternatively, when user information and face images have been stored in association in advance, as in a phone book, the control unit 100 may compare the pre-stored face images with the recognized face image and, when they match, store the recognized face image in association with that user information.
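
The patent leaves the face recognition technique open. As a minimal sketch, assuming some recognizer already turns each detected face into a fixed-length embedding vector, the two registration paths described above (user input, or matching against faces pre-stored in a phone book) could look like this. `FaceRegistry`, its methods, and the 0.8 cosine-similarity threshold are hypothetical illustrations, not part of the disclosure.

```python
import math
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two embedding vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


@dataclass
class FaceRegistry:
    """Hypothetical store associating user information (a label) with a face embedding."""
    entries: Dict[str, List[float]] = field(default_factory=dict)

    def register_manual(self, label: str, embedding: Sequence[float]) -> None:
        # Path 1: the recipient enters the user information for a recognized face.
        self.entries[label] = list(embedding)

    def register_from_phonebook(
        self,
        embedding: Sequence[float],
        phonebook: Dict[str, Sequence[float]],
        threshold: float = 0.8,
    ) -> Optional[str]:
        # Path 2: match the recognized face against faces pre-stored with contacts.
        best_label, best_score = None, threshold
        for label, stored in phonebook.items():
            score = cosine_similarity(embedding, stored)
            if score >= best_score:
                best_label, best_score = label, score
        if best_label is not None:
            self.entries[best_label] = list(embedding)
        return best_label
```

`register_from_phonebook` returns `None` when no contact reaches the threshold; in that case the device could fall back to asking the recipient, as in path 1.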

When voice and image data are received from the calling device, the control unit 100 performs at least one of voice recognition and gesture recognition, determines an object of interest according to the recognition result, enlarges the determined object of interest, and transmits the enlarged image to the calling device. The control unit 100 may also display the image received from the calling device on the display unit 160 while showing the image received through the camera unit 110 as a preview screen at a preset position.
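
The specification does not say how the enlargement is computed. One plausible approach, sketched here under that assumption, is a digital zoom: crop a window around the face box that keeps the frame's aspect ratio and stays inside the frame, then scale the crop up to the output size. `zoom_rect` and its `scale` factor (how many face heights tall the crop is) are hypothetical.

```python
from typing import Tuple

Box = Tuple[int, int, int, int]  # (x0, y0, x1, y1); x1 and y1 are exclusive


def zoom_rect(face: Box, frame_w: int, frame_h: int, scale: float = 2.5) -> Box:
    """Crop rectangle centred on `face`, with the frame's aspect ratio, kept inside the frame."""
    fx0, fy0, fx1, fy1 = face
    cx, cy = (fx0 + fx1) / 2, (fy0 + fy1) / 2
    # Size the crop from the face height, or from the face width converted to a height.
    needed_h = max(fy1 - fy0, (fx1 - fx0) * frame_h / frame_w) * scale
    crop_h = min(frame_h, needed_h)        # never crop larger than the frame
    crop_w = crop_h * frame_w / frame_h    # keep the frame's aspect ratio
    # Centre on the face, then slide the window back inside the frame if needed.
    x0 = min(max(cx - crop_w / 2, 0), frame_w - crop_w)
    y0 = min(max(cy - crop_h / 2, 0), frame_h - crop_h)
    return (round(x0), round(y0), round(x0 + crop_w), round(y0 + crop_h))
```

The returned rectangle would then be resized to the outgoing video resolution (for example with OpenCV's `cv2.resize`) before display and transmission.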

In a first embodiment, the control unit 100 may perform voice recognition on the received voice data to extract keywords corresponding to the plurality of recipients. For example, the control unit 100 controls the voice recognition unit 130 to recognize a plurality of words in the received voice data and extracts, from the recognized words, a word that matches stored user information. If the received voice data is a sentence such as "Mom, how have you been?" (Korean "엄마, 잘 지냈어?"), the voice recognition unit 130 recognizes the words "엄마" (Mom), "잘" (well), and "지냈어" (have you been), and user information corresponding to each recognized word is searched for. When user information corresponding to "Mom" is found, the control unit 100 detects, among the faces recognized in the input image, the face that matches the face image stored for that user information, and transmits to the calling device an image zoomed in so that the detected face is enlarged.

If no user information corresponding to "Mom" is found, the control unit 100 transmits the image input through the camera unit 110 to the calling device as is.
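
The keyword step of the first embodiment, including the fallback of sending the unmodified camera image, can be sketched as a word match between the speech-recognition transcript and the stored user labels. This assumes the recognizer already returns text; `extract_keyword` and `choose_output` are hypothetical names, and splitting on non-word characters is a simplification (a Korean particle attached to a name, as in "엄마는", would need morphological analysis).

```python
import re
from typing import Dict, Iterable, Optional, Tuple

Box = Tuple[int, int, int, int]


def extract_keyword(transcript: str, known_labels: Iterable[str]) -> Optional[str]:
    """Return the first recognized word that matches stored user information, else None."""
    lookup = {label.lower(): label for label in known_labels}
    # \w matches Unicode letters in Python 3, so Hangul words such as "엄마" stay whole.
    for word in re.findall(r"\w+", transcript.lower()):
        if word in lookup:
            return lookup[word]
    return None


def choose_output(transcript: str, faces_by_label: Dict[str, Box]) -> Tuple[str, Optional[Box]]:
    """Zoom on the keyword's face if one is found; otherwise send the camera image as is."""
    label = extract_keyword(transcript, faces_by_label)
    if label is None:
        return ("full_frame", None)
    return ("zoom", faces_by_label[label])
```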

In a second embodiment, the control unit 100 determines whether response voice data is received in reply to the voice data received from the calling device and, when it is, identifies the person who uttered the response voice data. For example, the control unit 100 detects the direction from which the response voice arrives and identifies the person located in that direction. The control unit 100 then detects that person's face and transmits to the calling device an image zoomed in so that the detected face is enlarged.
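
The patent does not state how the voice direction is detected. A common technique, assumed here, is time-difference-of-arrival (TDOA) estimation with two microphones: cross-correlate the channels, convert the best lag to an angle, and pick the face whose horizontal position corresponds to that angle. The two-microphone array, its spacing, the camera's 60° horizontal field of view, and the alignment of microphone axis and camera (positive angles mapping to the right of an unmirrored image) are all assumptions, and the function names are hypothetical.

```python
from typing import Sequence, Tuple

import numpy as np

SPEED_OF_SOUND = 343.0  # m/s, roughly room temperature

Box = Tuple[int, int, int, int]


def estimate_azimuth(left: np.ndarray, right: np.ndarray, fs: int, mic_spacing: float) -> float:
    """TDOA azimuth in degrees; positive means the source is nearer the right microphone."""
    corr = np.correlate(left, right, mode="full")
    # Index i of the full correlation corresponds to lag i - (len(right) - 1);
    # a positive lag means the left channel hears the sound later.
    lag = int(np.argmax(corr)) - (len(right) - 1)
    sin_theta = np.clip(SPEED_OF_SOUND * (lag / fs) / mic_spacing, -1.0, 1.0)
    return float(np.degrees(np.arcsin(sin_theta)))


def face_at_azimuth(faces: Sequence[Box], azimuth_deg: float, frame_w: int,
                    hfov_deg: float = 60.0) -> Box:
    """Pick the face whose horizontal centre is nearest the column the azimuth projects to."""
    half = frame_w / 2
    x = half + half * np.tan(np.radians(azimuth_deg)) / np.tan(np.radians(hfov_deg / 2))
    return min(faces, key=lambda f: abs((f[0] + f[2]) / 2 - x))
```

Plain cross-correlation is used for brevity; GCC-PHAT weighting is generally more robust in reverberant rooms.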

In a third embodiment, which combines the first and second embodiments, the control unit 100 controls the voice recognition unit 130 to recognize a plurality of words in the voice data received from the calling device and extracts, from the recognized words, a word that matches stored user information. The control unit 100 determines whether response voice data is received in reply to the voice data received from the calling device and, when it is, identifies the person who uttered it. The control unit 100 then performs face recognition on the identified person, searches for the stored face image matching the recognized face, and determines whether the user information corresponding to that face image matches the extracted word. If they match, the control unit 100 transmits to the calling device an image zoomed in so that the identified person's face is enlarged.

In a fourth embodiment, the control unit 100 determines whether response voice data is received in reply to the voice data received from the calling device and, when it is, identifies the person who uttered it. The control unit 100 controls the gesture recognition unit 140 to perform gesture recognition on the identified person and determines whether a gesture by that person is recognized. If the gesture recognition unit 140 recognizes a gesture by the identified person, the control unit 100 detects that person's face and transmits to the calling device an image zoomed in so that the detected face is enlarged.

In a fifth embodiment, which combines the third and fourth embodiments, the control unit 100 controls the voice recognition unit 130 to recognize a plurality of words in the voice data received from the calling device and extracts, from the recognized words, a word that matches stored user information. When response voice data is received, the control unit 100 identifies the person who uttered it and determines whether a gesture by that person is recognized. If a gesture is recognized, the control unit 100 detects the face of the person who uttered the response voice data, enlarges the detected face, and transmits the image with the enlarged face to the calling device.

In embodiments of the present invention, the object of interest may be determined by combining the embodiments above, and it may also be determined by various combinations other than those described.
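
The five embodiments differ mainly in which recognition results must agree before the zoom is applied. A minimal sketch of that decision follows, assuming the individual recognizers have already produced their results. `Cues` and `select_target` are hypothetical, and the fifth embodiment is read here as requiring both the keyword check of the third embodiment and the gesture check of the fourth.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Cues:
    keyword_user: Optional[str] = None  # user named in the caller's words (1st embodiment)
    responder: Optional[str] = None     # person who answered, found by voice direction (2nd)
    responder_gestured: bool = False    # gesture recognized for that person (4th)


def select_target(cues: Cues, embodiment: int) -> Optional[str]:
    """Return whose face to zoom on, or None to keep sending the full camera image."""
    named_person_answered = cues.responder is not None and cues.responder == cues.keyword_user
    if embodiment == 1:
        return cues.keyword_user
    if embodiment == 2:
        return cues.responder
    if embodiment == 3:  # the person who answered must be the one the caller named
        return cues.responder if named_person_answered else None
    if embodiment == 4:  # the person who answered must also make a gesture
        return cues.responder if cues.responder_gestured else None
    if embodiment == 5:  # the conditions of both the 3rd and the 4th embodiments
        return cues.responder if named_person_answered and cues.responder_gestured else None
    raise ValueError(f"unknown embodiment: {embodiment}")
```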

The calling device that receives the enlarged image displays it on its screen, so the caller can see the face of the person of interest in more detail.

The camera unit 110 receives an optical signal and outputs an image.

The face recognition unit 120 recognizes face regions in the input image using a general face recognition technique. For example, the face recognition unit 120 may recognize, as a face region, an area of the input image whose color corresponds to a preset facial skin color. Although this technique is given as an example in the embodiments of the present invention, face regions may be recognized using various other face recognition techniques.
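
The skin-color approach mentioned above can be sketched as a per-pixel threshold in the YCbCr color space, using the JPEG (full-range BT.601) RGB-to-YCbCr conversion. The Cb range 77 to 127 and Cr range 133 to 173 are values often cited in the face-detection literature, not values given in this patent, and `skin_mask` and `face_box` are hypothetical names. For brevity the sketch returns a single box around all skin pixels; separating several recipients would need connected-component labeling (for example `scipy.ndimage.label`).

```python
from typing import Optional, Tuple

import numpy as np


def skin_mask(rgb: np.ndarray) -> np.ndarray:
    """Boolean mask of pixels whose chroma falls in an assumed skin-color range."""
    rgb = rgb.astype(np.float64)
    r, g, b = rgb[..., 0], rgb[..., 1], rgb[..., 2]
    cb = 128 - 0.168736 * r - 0.331264 * g + 0.5 * b
    cr = 128 + 0.5 * r - 0.418688 * g - 0.081312 * b
    return (cb >= 77) & (cb <= 127) & (cr >= 133) & (cr <= 173)


def face_box(rgb: np.ndarray) -> Optional[Tuple[int, int, int, int]]:
    """Bounding box (x0, y0, x1, y1), ends exclusive, around all skin pixels, or None."""
    ys, xs = np.nonzero(skin_mask(rgb))
    if ys.size == 0:
        return None
    return (int(xs.min()), int(ys.min()), int(xs.max()) + 1, int(ys.max()) + 1)
```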

The voice recognition unit 130 is connected to the control unit 100 and recognizes speech by analyzing both the voice data input from the communication unit 150 and the voice data received from a microphone. This voice recognition may use a general speech recognition technique, and various techniques may be used to perform it.

The gesture recognition unit 140 recognizes gestures in the image input from the camera unit 110. This gesture recognition may use a general gesture recognition technique, and various techniques may be used to perform it.
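
The patent does not specify which gesture counts. Purely as an illustration, assuming a pose estimator supplies named body keypoints (COCO-style names such as `nose` and `right_wrist`), a raised hand could be detected by comparing wrist and nose heights. The choice of gesture, `hand_raised`, and the 10-pixel margin are hypothetical.

```python
from typing import Dict, Optional, Tuple

Point = Tuple[float, float]  # (x, y) in image pixels; y grows downward


def hand_raised(keypoints: Dict[str, Point], margin: float = 10.0) -> bool:
    """True if either wrist is at least `margin` pixels above the nose."""
    nose: Optional[Point] = keypoints.get("nose")
    if nose is None:
        return False
    for name in ("left_wrist", "right_wrist"):
        wrist = keypoints.get(name)
        # Smaller y means higher in the image.
        if wrist is not None and wrist[1] < nose[1] - margin:
            return True
    return False
```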

The communication unit 150 is connected to the control unit 100. It converts voice data and control data into wireless signals and transmits them, and it receives wireless signals and converts them into voice data and control data for output.

The display unit 160 may be formed of a liquid crystal display (LCD) and visually provides the user with the apparatus's menus, input data, function setting information, and various other information. The display unit 160 may also be implemented with devices other than an LCD. The display unit 160 outputs the various screens of the apparatus.

Under the control of the control unit 100, the storage unit 170 may store signals or data that are input or output in correspondence with the operations of the camera unit 110, the face recognition unit 120, the voice recognition unit 130, the gesture recognition unit 140, the communication unit 150, and the display unit 160. The storage unit 170 may also store control programs and applications for controlling the apparatus or the control unit 100.

FIG. 2 is a flowchart illustrating a process of enlarging and displaying a face of interest according to an embodiment of the present invention.

In step 200, the control unit 100 establishes a video call connection in response to a one-to-many video call connection request.

In step 210, the control unit 100 performs face recognition on two or more recipients in the image input through the camera unit 110. For example, the control unit 100 detects face regions by performing face recognition on the image through the face recognition unit 120, and stores each detected face region in the storage unit 170 in association with user information. The detected face regions may be associated with user information in the same manner as described above for the control unit 100.

In step 220, the control unit 100 receives the caller's voice and image data from the calling device.

In step 230, the control unit 100 performs at least one of the following: voice recognition on the caller's voice data received from the calling device or on a recipient's voice data input through the microphone; and gesture recognition on the recipients.

In step 240, the control unit 100 determines whether at least one of voice recognition and gesture recognition has been completed. If so, the process proceeds to step 250; if not, the process proceeds to step 270, where the full screen is displayed on the display unit 160. That is, if neither voice recognition nor gesture recognition yields a result, the control unit 100 displays the image input through the camera unit 110 on the display unit 160.

In step 250, the control unit 100 determines, according to the recognition result, which of the recognized faces is the face of the object of interest.

In step 260, the control unit 100 enlarges the determined face of the object of interest, displays it through the display unit 160, and transmits the image with the enlarged face to the calling device through the communication unit 150.
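
The branch structure of steps 230 to 270 can be sketched as one function, with the recognizers, the zoom, and the transmission passed in as callables. Those callables and `video_call_step` are hypothetical placeholders, and trying voice first and gesture second is just one way to satisfy "at least one of".

```python
from typing import Callable, Dict, Optional, Tuple, TypeVar

Box = Tuple[int, int, int, int]
Frame = TypeVar("Frame")


def video_call_step(
    frame: Frame,
    faces: Dict[str, Box],
    recognize_voice: Callable[[Dict[str, Box]], Optional[str]],
    recognize_gesture: Callable[[Dict[str, Box]], Optional[str]],
    zoom: Callable[[Frame, Box], Frame],
    send: Callable[[Frame], None],
) -> Frame:
    """One pass of steps 230 to 270 of FIG. 2; returns the image to show locally."""
    # Step 230: the gesture recognizer is consulted only if voice recognition gives nothing.
    target = recognize_voice(faces) or recognize_gesture(faces)
    # Step 240: no usable result, so go to step 270 and show the full camera image.
    if target is None or target not in faces:
        return frame
    # Steps 250 and 260: enlarge the chosen face, show it, and send it to the calling device.
    zoomed = zoom(frame, faces[target])
    send(zoomed)
    return zoomed
```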

FIG. 3 is a flowchart illustrating a process of enlarging and displaying a face of interest by keyword recognition according to an embodiment of the present invention.

In step 300, the control unit 100 establishes a video call connection in response to a one-to-many video call connection request.

In step 310, the control unit 100 performs face recognition on two or more recipients in the image input through the camera unit 110. For example, the control unit 100 detects face regions by performing face recognition on the image through the face recognition unit 120, and stores each detected face region in the storage unit 170 in association with user information.

In step 320, the control unit 100 receives the caller's voice and image data from the calling device.

In step 330, the control unit 100 extracts a keyword from the received voice data. Specifically, the control unit 100 may perform voice recognition on the received voice data to extract keywords corresponding to the plurality of recipients. For example, the control unit 100 may recognize a plurality of words in the received voice data and extract, as a keyword, a recognized word that matches stored user information.

In step 340, the control unit 100 determines which of the recognized faces corresponds to the extracted keyword. Specifically, the control unit 100 searches the pre-stored user information for the entry corresponding to the extracted word, and detects, among the faces recognized in the input image, the face that matches the face image stored for that user information.

In step 350, the control unit 100 enlarges the determined face, displays the enlarged image on the display unit 160, and then transmits the image to the calling device.
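
Step 340 chains two lookups: from keyword to stored face image, then from stored face image to a face in the current frame. Assuming faces are compared as embedding vectors by cosine similarity (the patent does not name a matching method), a sketch follows. `locate_keyword_face` and the 0.7 threshold are hypothetical, and embeddings are assumed to be nonzero vectors.

```python
from typing import Dict, List, Optional, Sequence, Tuple

import numpy as np

Box = Tuple[int, int, int, int]


def locate_keyword_face(
    keyword: str,
    registry: Dict[str, Sequence[float]],
    frame_faces: List[Tuple[Box, Sequence[float]]],
    threshold: float = 0.7,
) -> Optional[Box]:
    """Find the face in the current frame that matches the keyword's stored face."""
    stored = registry.get(keyword)
    if stored is None:  # no user information for this keyword
        return None
    ref = np.asarray(stored, dtype=float)
    ref = ref / np.linalg.norm(ref)
    best_box, best_score = None, threshold
    for box, embedding in frame_faces:
        emb = np.asarray(embedding, dtype=float)
        score = float(ref @ emb / np.linalg.norm(emb))  # cosine similarity
        if score >= best_score:
            best_box, best_score = box, score
    return best_box
```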

FIG. 4 is a flowchart illustrating a process of enlarging and displaying a face of interest by recognizing response voice data according to an embodiment of the present invention.

In step 400, the control unit 100 establishes a video call connection in response to a one-to-many video call connection request.

In step 410, the control unit 100 performs face recognition on two or more recipients in the image input through the camera unit 110. For example, the control unit 100 detects face regions by performing face recognition on the image through the face recognition unit 120, and stores each detected face region in the storage unit 170 in association with user information.

In step 420, the control unit 100 receives the caller's voice and image data from the calling device.

In step 430, the control unit 100 receives response voice data in reply to the received voice data.

In step 440, the control unit 100 determines the face associated with the received response voice data. Specifically, the control unit 100 determines whether response voice data is received in reply to the voice data received from the calling device and, when it is, identifies the person who uttered the response voice data.

In step 450, the control unit 100 enlarges the determined face, displays the enlarged image on the display unit 160, and then transmits the image to the calling device.
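
Steps 430 and 440 first have to decide whether a reply was spoken at all. The patent does not describe how; a simple energy-based voice-activity check on the local microphone is assumed here. It counts the 20 ms frames above a level threshold in a window after the caller stops talking. `response_detected`, the 3-second window, the -35 dB threshold (relative to full scale, for samples in [-1, 1]), and the five-frame minimum are illustrative guesses.

```python
import numpy as np


def response_detected(
    mic: np.ndarray,
    fs: int,
    caller_end_s: float,
    window_s: float = 3.0,
    frame_ms: int = 20,
    threshold_db: float = -35.0,
    min_frames: int = 5,
) -> bool:
    """Energy-based check for a spoken reply after the caller stops talking.

    `mic` holds local microphone samples scaled to [-1, 1].
    """
    start = int(caller_end_s * fs)
    stop = min(len(mic), start + int(window_s * fs))
    segment = np.asarray(mic[start:stop], dtype=float)
    frame_len = int(fs * frame_ms / 1000)
    n_frames = len(segment) // frame_len
    if n_frames == 0:
        return False
    frames = segment[: n_frames * frame_len].reshape(n_frames, frame_len)
    rms = np.sqrt(np.mean(frames ** 2, axis=1))
    level_db = 20 * np.log10(rms + 1e-12)  # small offset avoids log10(0) on silence
    # Require several loud frames so a single click is not taken as a reply.
    return int(np.count_nonzero(level_db > threshold_db)) >= min_frames
```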

FIG. 5 is a flowchart illustrating a process of enlarging and displaying a face of interest by gesture recognition according to an embodiment of the present invention.

In step 500, the control unit 100 establishes a video call connection in response to a one-to-many video call connection request.

In step 510, the control unit 100 performs face recognition on two or more recipients in the image input through the camera unit 110. For example, the control unit 100 detects face regions by performing face recognition on the image through the face recognition unit 120, and stores each detected face region in the storage unit 170 in association with user information.

In step 520, the control unit 100 receives the caller's voice and image data from the calling device.

When the control unit 100 receives response voice data in reply to the received voice data in step 530, it determines in step 540 whether a gesture by the person who uttered the response voice data is recognized. If a gesture is recognized, the process proceeds to step 550; if not, the process proceeds to step 570.

In step 550, the control unit 100 determines the face of the object of interest according to the gesture recognition result.

Specifically, the control unit 100 determines whether response voice data is received in reply to the voice data received from the calling device and, when it is, identifies the person who uttered it. The control unit 100 performs gesture recognition on the identified person and determines whether a gesture by that person is recognized. When a gesture by the identified person is recognized through the gesture recognition unit 140, the control unit 100 can detect that person's face.

In step 560, the control unit 100 enlarges the determined face, displays the enlarged image on the display unit 160, and then transmits the image to the calling device.

If the gesture is not recognized, the control unit 100 displays the full screen on the display unit 160 in step 570.
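The branching in steps 530 to 570 can be sketched as a small decision function. This is a minimal Python illustration under stated assumptions, not the patented implementation: the names `DisplayDecision` and `decide_display` are hypothetical, and routing the no-response case to the full screen is an assumption, since the flow above only describes what happens once response voice data arrives.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DisplayDecision:
    mode: str                     # "enlarge" (step 560) or "full_screen" (step 570)
    target: Optional[str] = None  # recipient whose face is enlarged, if any


def decide_display(responder: Optional[str], gesture_recognized: bool) -> DisplayDecision:
    """Hypothetical sketch of steps 530-570.

    responder: recipient identified as the speaker of the response voice data,
               or None when no response was received (step 530).
    gesture_recognized: whether a gesture of that recipient was recognized (step 540).
    """
    if responder is not None and gesture_recognized:
        # Step 550: the responder's face is the object of interest.
        # Step 560: enlarge it, display it, and send it to the calling apparatus.
        return DisplayDecision("enlarge", responder)
    # Step 570: no recognized gesture, so the full screen is shown.
    # Assumption: the no-response case also falls through to here.
    return DisplayDecision("full_screen")


print(decide_display("mother", True))   # DisplayDecision(mode='enlarge', target='mother')
print(decide_display("mother", False))  # DisplayDecision(mode='full_screen', target=None)
```

Returning a small value object rather than performing the zoom directly keeps the decision logic testable apart from the camera and display hardware.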

FIGS. 6 to 9 are exemplary views illustrating a process of enlarging and displaying the face of an object of interest through voice and gesture recognition according to an embodiment of the present invention.

FIG. 6 is an exemplary view illustrating a one-to-many video call connection according to an embodiment of the present invention.

Referring to FIG. 6, this embodiment assumes that a daughter, as the caller, requests through the calling apparatus a one-to-many video call connection with a plurality of recipients: a mother 600, a father 610, and a son 620. The control unit 100 of the receiving apparatus 10 recognizes face images in the image captured by the camera unit 110 and stores each recognized face image in the storage unit 170 in association with the user information corresponding to that face. Here, the control unit 100 may receive the user information for a recognized face as input from the recipient. Alternatively, when user information and face images are already stored in association with each other, as in a phone book, the control unit 100 may compare the recognized face with the previously stored face images and, if they match, store the recognized face in association with the matching user information.
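The two registration paths described for FIG. 6 (manual entry by the recipient, or matching against phone-book face images) can be sketched as follows. This is a minimal sketch assuming each face has already been reduced to a numeric feature vector by some face recognizer and that faces match when their cosine similarity reaches a threshold; the class `FaceRegistry`, its methods, and the 0.9 threshold are all hypothetical.

```python
import math
from typing import Dict, List, Optional

Embedding = List[float]  # assumed face feature vector from an external recognizer


def cosine_similarity(a: Embedding, b: Embedding) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


class FaceRegistry:
    """Hypothetical stand-in for storage unit 170: face features keyed by user info."""

    def __init__(self, threshold: float = 0.9):
        self.threshold = threshold               # assumed match threshold
        self.faces: Dict[str, Embedding] = {}    # user info (e.g. "Mom") -> face feature

    def register(self, user_info: str, face: Embedding) -> None:
        # Path 1: the recipient enters the user info for a recognized face.
        self.faces[user_info] = face

    def register_from_phonebook(self, face: Embedding,
                                phonebook: Dict[str, Embedding]) -> Optional[str]:
        # Path 2: pick the best phone-book match at or above the threshold.
        best_name, best_score = None, self.threshold
        for name, stored in phonebook.items():
            score = cosine_similarity(face, stored)
            if score >= best_score:
                best_name, best_score = name, score
        if best_name is not None:
            self.faces[best_name] = face
        return best_name  # None when no stored face matches


registry = FaceRegistry()
registry.register("Dad", [0.0, 1.0])
book = {"Mom": [1.0, 0.0], "Son": [0.0, 1.0]}
print(registry.register_from_phonebook([0.99, 0.05], book))  # Mom
```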

FIG. 7 is an exemplary view illustrating a process of automatically enlarging and displaying an object of interest related to a keyword extracted from the caller's voice data according to an embodiment of the present invention.

Referring to FIG. 7(a), when the caller's voice data is received from the calling apparatus 20, the control unit 100 can perform speech recognition on the received voice data to extract a keyword corresponding to one of the plurality of recipients.

For example, when the voice data "Mom, it's me! It's been a while, hasn't it?" is received from the calling apparatus 20, the control unit 100 recognizes the individual words, such as "Mom", "it's me", and "it's been a while", compares the recognized words with previously stored user information, and extracts the words related to a recipient. If user information stored in association with "Mom" exists, the control unit 100 can extract "Mom" as the keyword.
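The keyword step above can be illustrated by a sketch that assumes the speech recognizer has already produced a text transcript and that user information is stored as plain names. The function `extract_keywords` is hypothetical, and splitting on word characters is a simplification: Korean speech would need morphological analysis to separate a name such as 엄마 from attached particles.

```python
import re
from typing import Iterable, List


def extract_keywords(transcript: str, user_infos: Iterable[str]) -> List[str]:
    """Return the stored user-info names that appear as words in the caller's
    transcribed speech, in spoken order, without duplicates."""
    # Map lower-case forms back to the stored spelling.
    known = {name.lower(): name for name in user_infos}
    keywords: List[str] = []
    # Simplification: tokenize on runs of word characters.
    for word in re.findall(r"\w+", transcript.lower()):
        name = known.get(word)
        if name is not None and name not in keywords:
            keywords.append(name)
    return keywords


contacts = ["Mom", "Dad", "Son"]
print(extract_keywords("Mom, it's me! It's been a while, hasn't it?", contacts))  # ['Mom']
print(extract_keywords("Hello, Mom and Dad", contacts))  # ['Mom', 'Dad']
```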

The control unit 100 detects, among the faces recognized in the input image, the face that matches the face image stored in association with the user information for "Mom", enlarges the detected face, and displays the enlarged image through the display unit 160. At this time, the control unit 100 may display the image with the mother's face enlarged on a preview screen 700 of a predetermined size. The control unit 100 then transmits the enlarged image of the mother's face to the calling apparatus 20 through the communication unit 150.
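One way to "enlarge the detected face" is to crop a region around the face bounding box and scale that crop to the preview or outgoing frame. The sketch below computes such a crop; the function name `zoom_box`, the `(x, y, width, height)` box format, and the 50% margin are assumptions, not details from the specification. The crop keeps the frame's aspect ratio so that scaling it up does not distort the face, and it is shifted back inside the frame when the face sits near an edge.

```python
from typing import Tuple

Box = Tuple[int, int, int, int]  # assumed format: (x, y, width, height) in pixels


def zoom_box(face: Box, frame_w: int, frame_h: int, margin: float = 0.5) -> Box:
    """Hypothetical crop region around a detected face, padded by `margin`
    on each side, with the frame's aspect ratio, clamped inside the frame."""
    x, y, w, h = face
    cx, cy = x + w / 2, y + h / 2
    # Height needed to contain the face (in both dimensions) plus margins.
    needed_h = max(h, w * frame_h / frame_w) * (1 + 2 * margin)
    crop_h = min(frame_h, needed_h)
    crop_w = crop_h * frame_w / frame_h  # never exceeds frame_w
    # Center on the face, then shift back inside the frame if needed.
    left = min(max(cx - crop_w / 2, 0), frame_w - crop_w)
    top = min(max(cy - crop_h / 2, 0), frame_h - crop_h)
    return (round(left), round(top), round(crop_w), round(crop_h))


print(zoom_box((600, 300, 100, 100), 1280, 720))  # (472, 250, 356, 200)
```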

Accordingly, as shown in FIG. 7(b), the calling apparatus displays the enlarged image of the mother's face on its screen 21, so that the face of the person in whom the caller, the daughter 630, is interested is automatically enlarged and shown.

FIG. 8 is an exemplary view illustrating a process of automatically enlarging and displaying an object of interest related to response voice data given in reply to the caller's voice data according to an embodiment of the present invention.

Referring to FIG. 8, the control unit 100 determines whether response voice data is received in reply to the voice data received from the calling apparatus and, if so, identifies the person who uttered the response voice data.

For example, as shown in FIG. 8(a), when the voice data 710 "Mom, it's me! It's been a while, hasn't it?" is received from the calling apparatus 20, the control unit 100 determines whether response voice data is received in reply. If response voice data 720 such as "Oh, my daughter!" is received, the control unit 100 identifies the person who uttered it. For example, the control unit 100 detects the direction from which the response voice arrives and identifies the person located in that direction.
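The direction-based identification can be sketched by converting each recognized face's horizontal position into an angle and choosing the face closest to the estimated voice direction. This sketch assumes a microphone array that reports an azimuth in degrees (0 on the camera axis, positive to the right), co-located and aligned with a pinhole camera of known horizontal field of view; the function names, the 70-degree field of view, and the 10-degree tolerance are all hypothetical.

```python
import math
from typing import Dict, Optional, Tuple

Box = Tuple[int, int, int, int]  # assumed format: (x, y, width, height)


def face_azimuth(face: Box, frame_w: int, hfov_deg: float) -> float:
    """Horizontal angle in degrees of a face center under a pinhole camera model."""
    x, _, w, _ = face
    focal_px = (frame_w / 2) / math.tan(math.radians(hfov_deg) / 2)
    return math.degrees(math.atan(((x + w / 2) - frame_w / 2) / focal_px))


def speaker_from_direction(voice_deg: float, faces: Dict[str, Box], frame_w: int,
                           hfov_deg: float = 70.0,
                           tolerance_deg: float = 10.0) -> Optional[str]:
    """Return the person whose face lies closest to the voice direction,
    or None if nobody is within the tolerance."""
    best, best_err = None, tolerance_deg
    for name, box in faces.items():
        err = abs(face_azimuth(box, frame_w, hfov_deg) - voice_deg)
        if err <= best_err:
            best, best_err = name, err
    return best


faces = {"Mom": (100, 300, 100, 100), "Dad": (590, 300, 100, 100),
         "Son": (1080, 300, 100, 100)}
print(speaker_from_direction(-28.0, faces, 1280))  # Mom
```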

If the person who uttered the response voice data is the mother, the control unit 100 detects the mother's face in the image input through the camera unit 110, enlarges the detected face, displays the enlarged image through the display unit 160, and then transmits the enlarged image of the mother's face to the calling apparatus 20 through the communication unit 150.

In addition, the control unit 100 may control the voice recognition unit 130 to recognize a plurality of words in the voice data received from the calling apparatus 20 and extract, from the recognized words, a word that matches stored user information. The control unit 100 determines whether response voice data is received in reply to the voice data received from the calling apparatus 20 and, if so, identifies the person who uttered it. The control unit 100 performs face recognition on the identified person, searches for a stored face image that matches the recognized face, and determines whether the user information corresponding to the retrieved face image matches the extracted word. If they match, the control unit 100 detects that person's face (the mother's, in this example) in the image input from the camera unit 110, enlarges the detected face, displays the enlarged image through the display unit 160, and then transmits it to the calling apparatus through the communication unit 150.
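The combined check above, in which the responder's face is looked up in storage and the resulting user information must match a keyword from the caller's speech, can be sketched as follows. The sketch assumes faces are numeric feature vectors compared by cosine similarity against a threshold; `identify_user`, `confirm_object_of_interest`, and the 0.9 threshold are hypothetical.

```python
import math
from typing import Dict, Iterable, List, Optional

Embedding = List[float]  # assumed face feature vector


def _cosine(a: Embedding, b: Embedding) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def identify_user(face: Embedding, registry: Dict[str, Embedding],
                  threshold: float = 0.9) -> Optional[str]:
    """Return the user info whose stored face best matches, if any reaches the threshold."""
    best, best_score = None, threshold
    for info, stored in registry.items():
        score = _cosine(face, stored)
        if score >= best_score:
            best, best_score = info, score
    return best


def confirm_object_of_interest(responder_face: Embedding,
                               registry: Dict[str, Embedding],
                               keywords: Iterable[str]) -> Optional[str]:
    """The responder becomes the object of interest only if their stored user
    info matches a keyword extracted from the caller's speech."""
    info = identify_user(responder_face, registry)
    return info if info is not None and info in set(keywords) else None


registry = {"Mom": [1.0, 0.0], "Dad": [0.0, 1.0]}
print(confirm_object_of_interest([0.98, 0.1], registry, ["Mom"]))  # Mom
print(confirm_object_of_interest([0.1, 0.98], registry, ["Mom"]))  # None
```

Requiring both signals makes a wrong zoom less likely when someone other than the addressed person happens to answer.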

For example, when the word "Mom" is detected in the voice data "Mom, it's me! It's been a while, hasn't it?" received from the calling apparatus 20 and the person who uttered the response voice data is determined to be the mother, the control unit 100 detects the mother's face in the input image, enlarges it, and transmits the enlarged image of the mother's face to the calling apparatus 20. As shown in FIG. 8(b), the face of the person in whom the caller is interested is thus automatically enlarged and shown on the screen 21.

FIG. 9 is an exemplary view illustrating a process of automatically enlarging and displaying an object of interest through gesture recognition according to an embodiment of the present invention.

Referring to FIG. 9, the control unit 100 determines whether response voice data is received in reply to the voice data received from the calling apparatus and, if so, identifies the person who uttered the response voice data. The control unit 100 controls the gesture recognition unit 140 to perform gesture recognition on the identified person and determines whether a gesture of that person is recognized.

As shown in FIG. 9(a), when the gesture recognition unit 140 recognizes a gesture in which the mother forms a V shape with her index and middle fingers, the control unit 100 detects the mother's face in the input image, enlarges it, and transmits the enlarged image of the mother's face to the calling apparatus 20. As shown in FIG. 9(b), the face of the person in whom the caller 630 is interested is thus automatically enlarged and shown on the screen 21.
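A simple rule-based test for the V-shaped gesture could look like the sketch below. It assumes a hand-landmark detector supplies 2D positions for the wrist and for the middle (PIP) joint and tip of each finger; the specification does not describe how the gesture recognition unit 140 works, so the function names, the landmark format, the "tip farther from the wrist than the middle joint" heuristic, and ignoring the thumb are all assumptions.

```python
import math
from typing import Dict, Tuple

Point = Tuple[float, float]  # assumed 2D landmark position


def finger_extended(wrist: Point, pip: Point, tip: Point) -> bool:
    # Heuristic: a finger counts as extended when its tip is farther
    # from the wrist than its middle (PIP) joint.
    return math.dist(wrist, tip) > math.dist(wrist, pip)


def is_v_sign(wrist: Point, joints: Dict[str, Tuple[Point, Point]]) -> bool:
    """joints maps finger name -> (pip, tip) for index, middle, ring, pinky.
    V sign: index and middle extended, ring and pinky folded (thumb ignored)."""
    ext = {name: finger_extended(wrist, pip, tip) for name, (pip, tip) in joints.items()}
    return ext["index"] and ext["middle"] and not ext["ring"] and not ext["pinky"]


v_hand = {"index": ((0, 5), (0, 9)), "middle": ((1, 5), (1, 9)),
          "ring": ((2, 5), (2, 3)), "pinky": ((3, 4), (3, 2))}
print(is_v_sign((0, 0), v_hand))  # True
```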

Although the embodiment describes determining a single recipient as the object of interest in a one-to-many video call, more than one recipient may be determined as objects of interest. For example, when the voice data "Hello, Mom and Dad" is received from the calling apparatus, the control unit 100 extracts "Mom" and "Dad" from the voice data as keywords, detects the user information corresponding to the mother and the father, detects the persons whose faces match the face images stored for that user information, enlarges the detected persons, displays the enlarged image on the screen, and transmits it to the calling apparatus. The calling apparatus can then display an image in which the faces of both the mother and the father are enlarged.
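For several objects of interest, one straightforward approach is to enlarge the smallest rectangle that contains all of their faces. The sketch below computes that rectangle; the name `union_box` and the `(x, y, width, height)` format are assumptions, and the specification does not say whether multiple faces are shown as one region or as separate tiles.

```python
from typing import Iterable, Tuple

Box = Tuple[int, int, int, int]  # assumed format: (x, y, width, height)


def union_box(faces: Iterable[Box]) -> Box:
    """Smallest rectangle enclosing every face of interest (hypothetical helper)."""
    boxes = list(faces)
    if not boxes:
        raise ValueError("at least one face is required")
    left = min(x for x, _, _, _ in boxes)
    top = min(y for _, y, _, _ in boxes)
    right = max(x + w for x, _, w, _ in boxes)
    bottom = max(y + h for _, y, _, h in boxes)
    return (left, top, right - left, bottom - top)


mom, dad = (100, 300, 100, 100), (590, 280, 120, 120)
print(union_box([mom, dad]))  # (100, 280, 610, 120)
```

The resulting rectangle can then be padded and scaled like a single face; if the recipients sit far apart, showing each enlarged face in its own tile may preserve more detail.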

Although the embodiment describes the control unit 100, the face recognition unit 120, the voice recognition unit 130, and the gesture recognition unit 140 as separate components, the control unit 100 may instead perform the operations of the face recognition unit 120, the voice recognition unit 130, and the gesture recognition unit 140 without those units being provided separately.

As described above, during a one-to-many video call the present invention determines an object of interest from the conversation between the caller and the recipients and from recognition of specific gestures, and enlarges and displays that object, so that the person in whom the user is interested is automatically enlarged and shown.

It will be appreciated that the embodiments of the present invention can be implemented in hardware, software, or a combination of hardware and software. Any such software may be stored, whether or not it is erasable or rewritable, in a volatile or non-volatile storage device such as a ROM; in a memory such as a RAM, a memory chip, a device, or an integrated circuit; or in an optically or magnetically recordable, machine-readable (e.g., computer-readable) storage medium such as a CD, a DVD, a magnetic disk, or a magnetic tape. The method for displaying an object of interest of the present invention may be implemented by a computer or a portable terminal including a control unit and a memory, and the memory is an example of a machine-readable storage medium suitable for storing a program or programs containing instructions that implement the embodiments of the present invention. Accordingly, the present invention includes a program containing code for implementing the apparatus or method recited in any claim of this specification, and a machine-readable (e.g., computer-readable) storage medium storing such a program. Such a program may also be transferred electronically through any medium, such as a communication signal transmitted over a wired or wireless connection, and the present invention appropriately includes equivalents thereof.

In addition, the apparatus may receive the program from a providing device connected by wire or wirelessly and store it. The providing device may include a memory for storing a program containing instructions that cause the apparatus to perform a predetermined content protection method, together with information required for the content protection method; a communication unit for performing wired or wireless communication with the apparatus; and a control unit for transmitting the program to the apparatus, either at the apparatus's request or automatically.

100: control unit
110: camera unit
120: face recognition unit
130: voice recognition unit
140: gesture recognition unit
150: communication unit
160: display unit
170: storage unit

Claims

An apparatus for displaying an object of interest during a video call, the apparatus comprising:
a camera unit configured to acquire image data;
a communication unit configured to receive voice data; and
a control unit configured to store a face image recognized from the image data, perform at least one of speech recognition on voice data received from a calling apparatus and gesture recognition on the image data, determine an object of interest according to the recognition result, enlarge the object of interest, and transmit image data in which the object of interest is enlarged to the calling apparatus.

The apparatus of claim 1,
further comprising a display unit configured to display the image data in which the object of interest is enlarged.

The apparatus of claim 1,
wherein the control unit stores the recognized face image in association with user information of a recipient corresponding to the recognized face image.

The apparatus of claim 3,
wherein the control unit performs, through a voice recognition unit, speech recognition on the voice data received from the calling apparatus to determine whether a keyword related to one or more recipients is extracted; when such a keyword is extracted, detects user information related to the extracted keyword; enlarges, in the image data, the face of a subject that matches the face image corresponding to the detected user information; and transmits an image in which the face of the subject is enlarged.

The apparatus of claim 3,
wherein the control unit determines whether response voice data is received in reply to the voice data received from the calling apparatus; when the response voice data is received, identifies the subject who transmitted the response voice data; enlarges the face of the identified subject; and displays an image in which the face of the subject is enlarged on the screen.

The apparatus of claim 3,
wherein the control unit performs, through a voice recognition unit, speech recognition on the voice data received from the calling apparatus to determine whether a keyword related to one or more recipients is extracted; when such a keyword is extracted, detects user information related to the extracted keyword; determines whether response voice data is received in reply to the voice data received from the calling apparatus; when the response voice data is received, identifies the subject who transmitted the response voice data; enlarges the face of the identified subject; and transmits an image in which the face of the subject is enlarged.

The apparatus of claim 3,
wherein the control unit determines whether response voice data is received in reply to the voice data received from the calling apparatus; when the response voice data is received, identifies the subject who transmitted the response voice data and determines whether a gesture of the identified subject is recognized; and, when the gesture of the identified subject is recognized, enlarges the face of the identified subject and transmits an image in which the face of the subject is enlarged.

The apparatus of claim 3,
wherein the control unit performs, through a voice recognition unit, speech recognition on the voice data received from the calling apparatus to determine whether a keyword related to one or more recipients is extracted; when such a keyword is extracted, detects user information related to the extracted keyword; determines whether response voice data is received in reply to the voice data received from the calling apparatus; when the response voice data is received, identifies the subject who transmitted the response voice data and determines whether a gesture of the identified subject is recognized; and, when the gesture of the identified subject is recognized, enlarges the face of the identified subject and transmits an image in which the face of the subject is enlarged.

A method for displaying an object of interest during a video call, the method comprising:
storing a face image recognized from image data; and
performing at least one of speech recognition on voice data received from a calling apparatus and gesture recognition on the image data, determining an object of interest according to the recognition result, enlarging the object of interest, and transmitting image data in which the object of interest is enlarged to the calling apparatus.

The method of claim 9,
further comprising displaying the image data in which the object of interest is enlarged.

The method of claim 9, wherein storing the recognized face image comprises:
storing the recognized face image in association with user information of a recipient corresponding to the recognized face image.

The method of claim 10, wherein enlarging the object of interest and displaying it on a screen comprises:
determining whether a keyword related to one or more recipients is extracted by performing speech recognition on the voice data received from the calling apparatus;
detecting user information related to the extracted keyword when the keyword related to the one or more recipients is extracted;
enlarging, in the image data, the face of a subject that matches the face image corresponding to the detected user information; and
transmitting an image in which the face of the subject is enlarged.

The method of claim 11, wherein enlarging the object of interest and displaying it on a screen comprises:
determining whether response voice data is received in reply to the voice data received from the calling apparatus;
when the response voice data is received, identifying the subject who transmitted the response voice data and enlarging the face of the identified subject; and
transmitting an image in which the face of the subject is enlarged.

The method of claim 11, wherein enlarging the object of interest and displaying it on a screen comprises:
determining whether a keyword related to one or more recipients is extracted by performing speech recognition on the voice data received from the calling apparatus;
detecting user information related to the extracted keyword when the keyword related to the one or more recipients is extracted;
determining whether response voice data is received in reply to the voice data received from the calling apparatus;
identifying the subject who transmitted the response voice data when the response voice data is received; and
enlarging the face of the identified subject and then transmitting an image in which the face of the subject is enlarged.

The method of claim 11, wherein enlarging the object of interest and displaying it on a screen comprises:
determining whether response voice data is received in reply to the voice data received from the calling apparatus;
when the response voice data is received, identifying the subject who transmitted the response voice data and determining whether a gesture of the identified subject is recognized; and
when the gesture of the identified subject is recognized, enlarging the face of the identified subject and transmitting an image in which the face of the subject is enlarged.

The method of claim 11, wherein enlarging the object of interest and displaying it on a screen comprises:
determining whether a keyword related to one or more recipients is extracted by performing speech recognition on the voice data received from the calling apparatus;
detecting user information related to the extracted keyword when the keyword related to the one or more recipients is extracted;
determining whether response voice data is received in reply to the voice data received from the calling apparatus;
when the response voice data is received, identifying the subject who transmitted the response voice data and determining whether a gesture of the identified subject is recognized; and
when the gesture of the identified subject is recognized, enlarging the face of the identified subject and transmitting an image in which the face of the subject is enlarged.