KR101189053B1

KR101189053B1 - Method For Video Call Based on an Avatar And System, Apparatus thereof

Info

Publication number: KR101189053B1
Application number: KR1020090083726A
Authority: KR
Inventors: 홍상우
Original assignee: 에스케이플래닛 주식회사
Priority date: 2009-09-05
Filing date: 2009-09-05
Publication date: 2012-10-10
Also published as: KR20110025720A

Abstract

The present invention relates to an avatar-based video call method and system, and to a terminal supporting the same. During a video call, a terminal displays an avatar corresponding to the user of the counterpart terminal, receives from the counterpart terminal an avatar execution code that was generated by recognizing at least one of the counterpart user's gesture and voice, and adjusts and outputs at least one of the state and motion of the avatar according to the received avatar execution code. According to the present invention, the counterpart's various emotional states can be conveyed to the user through an avatar that is output based on the counterpart's motion, voice, and the like during a video call.

Terminal, video call, avatar, gesture, voice

Description

Avatar-based video call method and system, and a terminal supporting the same {Method For Video Call Based on an Avatar And System, Apparatus}

The present invention relates to video call technology, and more particularly to an avatar-based video call method and system, and a terminal supporting the same, that recognize at least one of a speaker's gesture and voice during a video call and, based on the result, adjust at least one of the state and motion of a specific avatar.

Recently, thanks to rapid advances in technology for reducing the size of terminals while maintaining battery capacity, terminals have become able to implement various optional functions. For example, as terminals have come to be equipped with cameras, they support a function for capturing images of a specific subject in conjunction with the call function. Accordingly, a terminal user can use a video call function with another terminal user based on the camera mounted on the terminal.

Because a video call lets the parties see each other's face or background while talking, it has the advantage of conveying the user's emotional state to the counterpart better than a voice-only call.

However, current video calls are constrained by various environmental conditions, such as bandwidth, transmission speed, and the data processing capacity of the terminal, so the video often breaks up or is not transmitted properly. As a result, even during a video call it is difficult to accurately recognize the image of the counterpart.

Accordingly, an object of the present invention is to provide a video call method and system, and a terminal supporting the same, that recognize at least one of a speaker's gesture and voice during a video call, adjust at least one of the state and motion of a specific avatar accordingly, and then output the adjusted avatar.

To achieve the above object, the present invention provides an avatar-based video call system including a first terminal and a second terminal. When a video call channel is established, the first terminal outputs an avatar corresponding to the second terminal, and adjusts and outputs at least one of the state and motion of the displayed avatar based on an avatar execution code received from the second terminal. The second terminal generates the avatar execution code, which is used to adjust at least one of the state and motion of the avatar, based on at least one of a specific gesture recognized from the motion state and motion change of a subject captured by its camera and a specific voice recognized from an audio signal collected by its microphone, and transmits the avatar execution code to the first terminal.

The present invention also provides an avatar-based video call system including a communication network and an avatar providing server. The communication network controls a video call between a first terminal and a second terminal. When the avatar providing server receives an avatar request signal from the first or second terminal after a video call channel has been established through the communication network, it transmits an avatar corresponding to the counterpart to the terminal that sent the avatar request signal, receives video call data from the counterpart terminal of the terminal that received the avatar, generates an avatar execution code based on recognition of a specific gesture and voice of the counterpart in the received video call data, and transmits the avatar execution code to the terminal that received the avatar.

The present invention also provides an avatar providing server of an avatar-based video call system, the server including a transceiver and a server controller. The transceiver communicates with a first terminal or a second terminal through a communication network. When the server controller receives an avatar request signal from the first or second terminal after a video call channel has been established through the communication network, it transmits an avatar corresponding to the counterpart to the terminal that sent the avatar request signal, receives video call data from the counterpart terminal of the terminal that received the avatar, generates an avatar execution code based on recognition of a specific gesture and voice of the counterpart in the received video call data, and transmits the avatar execution code to the terminal that received the avatar.

The present invention also provides a terminal of an avatar-based video call system including a camera, a display unit, a controller, a communication unit, and a storage unit. The camera captures images to be transmitted to the counterpart terminal for a video call, and the display unit outputs the image of the counterpart received from the counterpart terminal. When a video call channel is established with the counterpart terminal, the controller outputs an avatar corresponding to the counterpart terminal on the display unit, receives from the counterpart terminal an avatar execution code generated by recognizing at least one of the counterpart's gesture and voice, and controls the display unit to output the avatar with at least one of its state and motion adjusted according to the avatar execution code. The communication unit establishes the video call channel and may support transmission of the avatar execution code over the video call channel, a message service channel, or the like. When an avatar is received from the counterpart terminal, or when a specific avatar stored in the terminal is designated as the avatar of the counterpart terminal, the storage unit may store the avatar linked to the telephone number of the counterpart terminal.

The present invention also provides an avatar-based video call method including: an avatar transmission step in which, upon receiving an avatar request signal from a first or second terminal after a video call channel has been established between the first terminal and the second terminal, an avatar providing server transmits an avatar corresponding to the counterpart to the terminal that sent the avatar request signal; a reception step in which the avatar providing server receives video call data from the counterpart terminal of the terminal that received the avatar; a generation step in which the avatar providing server generates an avatar execution code based on recognition of a specific gesture and voice of the counterpart in the received video call data; and an avatar execution code transmission step in which the avatar providing server transmits the avatar execution code to the terminal that received the avatar.
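
The four server-side steps of this method can be illustrated with a minimal sketch. Everything concrete in it is an assumption made for illustration: the class name `AvatarServer`, the code values such as `AV_GREET`, and the use of already-recognized gesture and word labels in place of real image and speech recognition. The patent itself does not define a code table or a wire format.

```python
from dataclasses import dataclass, field

# Hypothetical code tables; the patent does not specify concrete values.
GESTURE_CODES = {"wave": "AV_GREET", "thumbs_up": "AV_JOY"}
VOICE_CODES = {"hello": "AV_GREET", "wow": "AV_SURPRISE"}


@dataclass
class AvatarServer:
    avatars: dict                                       # terminal id -> avatar name
    subscriptions: dict = field(default_factory=dict)   # counterpart id -> requester id
    outbox: list = field(default_factory=list)          # (destination, kind, payload)

    def on_avatar_request(self, requester, counterpart):
        """Avatar transmission step: send the counterpart's avatar to the requester."""
        self.subscriptions[counterpart] = requester
        self.outbox.append((requester, "avatar", self.avatars[counterpart]))

    def on_call_data(self, sender, gesture=None, word=None):
        """Reception, generation, and code transmission steps.

        `gesture` and `word` stand in for the result of recognizing the
        sender's video call data; real recognition is out of scope here.
        """
        requester = self.subscriptions.get(sender)
        if requester is None:
            return None  # nobody requested this sender's avatar
        code = GESTURE_CODES.get(gesture) or VOICE_CODES.get(word)
        if code is not None:
            self.outbox.append((requester, "code", code))
        return code
```

In this sketch the server only forwards a code when the recognized label is registered in one of the tables, which mirrors the idea that only specific gestures and voices trigger an avatar change.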

The present invention also provides an avatar-based video call method including: a channel establishment step in which a first terminal and a second terminal establish a video call channel; an output step in which the first terminal outputs an avatar corresponding to the second terminal; a reception step in which the first terminal receives from the second terminal an avatar execution code capable of adjusting at least one of the state and motion of the avatar, the code being based on at least one of a specific gesture recognized from the motion state and motion change in images captured by the camera of the second terminal and a specific voice recognized from an audio signal collected by its microphone; and an adjustment and output step in which the first terminal adjusts and outputs at least one of the state and motion of the avatar according to the avatar execution code.

According to the video call method and system of the present invention and the terminal supporting the same, during a video call a terminal displays an avatar corresponding to the user of the counterpart terminal, receives from the counterpart terminal or the avatar providing server an avatar execution code generated by recognizing at least one of the counterpart user's gesture and voice, and adjusts and outputs at least one of the state and motion of the avatar according to the received code. The counterpart's emotional state can therefore be conveyed effectively to the user through the avatar. That is, the counterpart's current emotional state, such as joy, sadness, surprise, pleasure, anger, or annoyance, is recognized from specific gestures and voice of the counterpart and expressed through the avatar, so the video call system lets the user carry on the video call smoothly while perceiving the counterpart's emotional state through the avatar. Accordingly, the counterpart's various emotional states can be conveyed to the user through an avatar that is output based on the counterpart's motion, voice, and the like during a video call.

Hereinafter, preferred embodiments of the present invention are described in detail with reference to the accompanying drawings. Note that the following description covers only the parts necessary for understanding the operation of the embodiments, and that descriptions of other parts are omitted so as not to obscure the gist of the invention.

The terms and words used in this specification and the claims should not be construed as limited to their ordinary or dictionary meanings. Based on the principle that an inventor may appropriately define the concepts of terms in order to describe the invention in the best way, they should be interpreted with meanings and concepts consistent with the technical idea of the present invention. Therefore, the embodiments described in this specification and the configurations shown in the drawings are merely the most preferred embodiments of the present invention and do not represent all of its technical ideas, and it should be understood that various equivalents and modifications that could replace them may exist at the time of filing.

Here, an 'avatar' is graphic content that expresses the emotional state of a terminal user on the user's behalf, and includes animated characters, videos, still images, UCC (User Created Contents), emoticons, Flash content, haptic content combining images and vibration, and the like.

FIG. 1 is a diagram schematically illustrating the configuration of a video call system for supporting video call operation according to an embodiment of the present invention.

Before the description, note that although the terminal is described below as one that uses a mobile communication network in order to explain the video call service of the present invention, the invention is not limited to this. That is, the video call service may be applied not only to terminals using a mobile communication network but also to various other terminals equipped with a camera and capable of video calls, such as ordinary wired terminals, fixed terminals, and IP terminals. When the terminal is a wired terminal capable of video calls, the mobile communication network may be a switched network that supports wire-based video calls. When the terminal is an IP terminal capable of video calls, the mobile communication network may be replaced with an IP network that supports video calls. In other words, the communication network supporting the video call service between terminals includes mobile communication networks, switched networks, and IP networks.

Referring to FIG. 1, the video call system of the present invention may include a first terminal 100, a second terminal 200, and a mobile communication network 300.

In the video call system of the present invention configured in this way, the first terminal 100 or the second terminal 200 may send a video call request to the counterpart terminal through the mobile communication network 300. Once a video call channel is established, at least one of the first terminal 100 and the second terminal 200 may output on its screen a counterpart avatar corresponding to the counterpart terminal. The counterpart avatar may be one already stored in the storage unit of the terminal, or one received from the counterpart terminal.

After the video call channel is established, the first terminal 100 and the second terminal 200 each acquire still images or video of a subject captured through the camera, and acquire the speaker's voice. Each terminal extracts a specific motion state or change in motion from the image of the subject and checks whether it corresponds to a specific gesture, and performs voice recognition to check whether a specific voice is recognized. When the motion state and motion change correspond to a specific gesture, or the voice recognition result is determined to be a specific voice, the terminal generates a corresponding avatar execution code and may transmit it to the counterpart terminal. Each terminal may then apply the avatar execution code received from the counterpart terminal to the counterpart avatar currently displayed on its screen, thereby controlling at least one of the counterpart avatar's expression, state, and motion. Each of the first terminal 100 and the second terminal 200 may also output its own avatar on its own display unit, adjust at least one of the state and motion of its own avatar according to the avatar execution code it transmits to the counterpart terminal, and output the adjusted avatar on the display unit. Through this process, the first terminal 100 and the second terminal 200 can recognize, accurately and quickly through changes in the avatar, what the speaker intends to express through a specific motion, state, or voice of the counterpart terminal's user.
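
The terminal-side flow described above (check for a specific gesture or voice, generate a code, apply the code to the displayed avatar) might look like the following minimal sketch. The gesture labels, phrases, code names, and avatar states are all hypothetical; the inputs are assumed to be labels already produced by some recognizer, which is not modelled here.

```python
from typing import Optional

# Hypothetical mappings; the patent leaves the actual codes to the implementer.
GESTURE_TO_CODE = {"nod": "CODE_AGREE", "hands_up": "CODE_JOY"}
VOICE_TO_CODE = {"great": "CODE_JOY", "no way": "CODE_SURPRISE"}
CODE_TO_STATE = {
    "CODE_AGREE": "nodding",
    "CODE_JOY": "smiling",
    "CODE_SURPRISE": "startled",
}


def generate_code(gesture: Optional[str], phrase: Optional[str]) -> Optional[str]:
    """Sending side: return a code only if a registered gesture or phrase was seen."""
    if gesture in GESTURE_TO_CODE:
        return GESTURE_TO_CODE[gesture]
    if phrase in VOICE_TO_CODE:
        return VOICE_TO_CODE[phrase]
    return None


class CounterpartAvatar:
    """Receiving side: the avatar shown for the other party."""

    def __init__(self) -> None:
        self.state = "neutral"

    def apply(self, code: Optional[str]) -> str:
        # Unknown or missing codes leave the avatar unchanged.
        self.state = CODE_TO_STATE.get(code, self.state)
        return self.state
```

Because only a short code crosses the network, the receiving terminal can update the avatar even when the video stream itself is degraded, which is the motivation given earlier in the description.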

To support the video call service according to this embodiment, the first terminal 100 and the second terminal 200 may have the configuration shown in FIG. 2. Because the first terminal 100 and the second terminal 200 may have substantially the same configuration for performing transmission and reception, in the following description both are referred to simply as the terminal, and the reference numeral of the first terminal 100 is used.

FIG. 2 is a block diagram schematically illustrating the configuration of a terminal according to an embodiment of the present invention.

Referring to FIG. 2, the terminal 100 of the present invention may include a communication unit 110, an input unit 120, an audio processing unit 130, a display unit 140, a storage unit 150, a camera 170, and a controller 160.

The communication unit 110 transmits and receives video call data under the control of the controller 160. The communication unit 110 may include a radio frequency transmitter that up-converts and amplifies the frequency of a signal to be transmitted, and a radio frequency receiver that low-noise amplifies a received signal and down-converts its frequency. In particular, under the control of the controller 160, the communication unit 110 may establish with the counterpart terminal a communication channel for transmitting the terminal's own avatar to the counterpart terminal and a communication channel for transmitting an avatar execution code that adjusts the state or motion of that avatar. Here, the communication unit 110 may, under the control of the controller 160, create a separate data communication channel to transmit avatar-related data, that is, avatar data and avatar execution codes for adjusting the avatar's state and motion, or may transmit the avatar data and avatar execution codes over the already established video call channel. The communication unit 110 may also temporarily create a message service channel, for example an SMS (Short Message Service) or MMS (Multimedia Message Service) channel, to transmit the avatar data and the avatar execution code.
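
The text lists several possible channels but does not say how a terminal chooses among them. The sketch below shows one hypothetical selection policy, purely as an assumption: prefer the established video call channel, then a separate data channel, then SMS for payloads that fit in a single message, then MMS. The names `Channel` and `pick_channel` and the 140-byte threshold used for a single SMS payload are illustrative choices, not part of the patent.

```python
from enum import Enum


class Channel(Enum):
    VIDEO_CALL = "video_call"  # the already established call channel
    DATA = "data"              # a separate data communication channel
    SMS = "sms"
    MMS = "mms"


# Illustrative threshold: user data size of one uncompressed SMS, in bytes.
SMS_MAX_BYTES = 140


def pick_channel(payload: bytes, available: set) -> Channel:
    """Hypothetical policy for choosing how to send avatar data or a code."""
    if Channel.VIDEO_CALL in available:
        return Channel.VIDEO_CALL
    if Channel.DATA in available:
        return Channel.DATA
    if Channel.SMS in available and len(payload) <= SMS_MAX_BYTES:
        return Channel.SMS
    if Channel.MMS in available:
        return Channel.MMS
    raise RuntimeError("no usable channel for avatar payload")
```

A short avatar execution code would fit in any of these channels, whereas full avatar data (images or animation) would typically need the call channel, a data channel, or MMS.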

The input unit 120 includes a number of input keys and function keys for receiving numeric or character input and for setting various functions. The function keys may include direction keys, side keys, and shortcut keys assigned to specific functions. The input unit 120 generates key signals related to user settings and control of the terminal's functions and passes them to the controller 160. That is, in response to user requests the input unit 120 generates, and passes to the controller 160, an input signal for entering the phone number of the counterpart terminal, an input signal for setting up a video call based on the entered number, a mode selection signal for enabling avatar use during video calls, an input signal for ending the video call, and the like.

The audio processing unit 130 includes a speaker (SPK) for playing audio data transmitted and received during a video call, and a microphone (MIC) for collecting the user's voice or other audio signals during a video call. The audio processing unit 130 may further include, separately from the speaker provided for voice calls, a speaker for outputting audio data received during a video call. The audio processing unit 130 may pass the audio signal collected by the microphone (MIC) to the controller 160 for voice recognition.

The display unit 140 may be an LCD (Liquid Crystal Display), an OLED, or the like. When an LCD is used, the display unit 140 may include an LCD controller, a memory for storing data, and an LCD display element. When the LCD, OLED, or other display device is implemented as a touch screen, the screen of the display unit 140 may also operate as an input unit. In particular, during a video call the display unit 140 of the present invention may display at least one of the image data captured and processed through the camera 170 and the image data transmitted by the counterpart terminal. For example, assuming a video call between the first terminal 100 and the second terminal 200, the display unit 140 of the first terminal 100 includes a first screen area that displays the image data transmitted by the second terminal 200 and a second screen area that displays the image data captured and processed by the camera 170 mounted on the first terminal 100. The first and second screen areas may be displayed side by side by dividing the screen in the same plane, or the second screen area may be displayed overlapping the first screen area. The first screen area may output not only the image data transmitted by the second terminal 200 but also the counterpart avatar corresponding to the second terminal 200. Here, the counterpart avatar may be a preset avatar selected by the user of the first terminal 100. The second screen area may output not only the image data captured by the camera 170 of the first terminal 100 but also the terminal's own avatar corresponding to the first terminal 100. Like the first terminal 100, the second terminal 200 may output at least one of a first screen area and a second screen area that show the counterpart's image data with the counterpart avatar and its own image data with its own avatar. The screen interface of the display unit 140 is described in more detail with reference to FIG. 4.
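
The two arrangements of the screen areas (split in the same plane, or the second area overlapping the first) can be made concrete with a small layout calculation. This is a sketch under assumed conventions: a top/bottom split for the divided layout and a quarter-size inset in the bottom-right corner for the overlapping layout. The patent does not prescribe these proportions, and `Rect` and `layout` are hypothetical names.

```python
from typing import NamedTuple, Tuple


class Rect(NamedTuple):
    x: int
    y: int
    w: int
    h: int


def layout(width: int, height: int, overlap: bool) -> Tuple[Rect, Rect]:
    """Return (first_area, second_area) in pixels.

    first_area shows the counterpart's image and avatar;
    second_area shows the local camera image and the user's own avatar.
    """
    if overlap:
        # Second area is drawn on top of the first as a small inset.
        first = Rect(0, 0, width, height)
        second = Rect(width - width // 4, height - height // 4,
                      width // 4, height // 4)
    else:
        # Screen divided into two non-overlapping areas, top and bottom.
        half = height // 2
        first = Rect(0, 0, width, half)
        second = Rect(0, half, width, height - half)
    return first, second
```

For a 240 by 320 display, the split layout yields two 240 by 160 areas, while the overlapping layout keeps the full screen for the counterpart and places a 60 by 80 inset at (180, 240).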

The storage unit 150 stores the application programs needed for the functions of the embodiments of the present invention, including application programs for video calls and for operating the camera 170, as well as avatar data and avatar execution codes for adjusting avatars. The storage unit 150 may also buffer a specified amount of the video call data transmitted and received during a video call. The storage unit 150 may broadly include a program area and a data area.

The program area stores an operating system (OS) that boots the terminal 100, application programs for operating the camera 170 and collecting audio for video calls, and application programs needed for other optional functions of the terminal 100, such as sound playback and image or video playback. When one of these functions is activated in response to a user request, the terminal 100 provides it using the corresponding application program under the control of the controller 160. In particular, the program area of the present invention may include an avatar service module, a gesture recognition module, and a voice recognition module to support avatars. These modules may be application programs that are loaded into the controller 160 and activated when the avatar-based video call service mode of the present invention is set and a video call is executed. The gesture recognition module may include a routine that recognizes a specific gesture from the motion state or motion change of a subject captured by the camera 170, and a routine that generates an avatar execution code corresponding to the recognized gesture. The voice recognition module may include a routine that performs voice recognition on the audio signal collected by the microphone (MIC) of the audio processing unit 130, a routine that determines whether a recognized word or sentence corresponds to a specific voice, and a routine that, if so, generates the avatar execution code mapped to it. The avatar service module may include a routine that, on receiving an avatar execution code from the gesture recognition module or the voice recognition module, applies the received code to the terminal's own avatar, and a routine that transmits the received code to the counterpart terminal.

The data area stores data generated through use of the terminal 100: data the user records during a video call, images collected by the camera 170 that the user chooses to save, and user data related to the terminal 100's optional functions, such as videos, phone book data, audio data, and information corresponding to that content or user data. In particular, the data area of the present invention may store various avatar data for implementing avatars that can output particular motions and states as images. Using this data, the user of the terminal 100 can assign an avatar to a specific phone number when saving phone book data. If a counterpart avatar is linked to a phone number and the video call service of the present invention is performed with the terminal having that number, the controller 160 may load the counterpart avatar and output it to the display unit 140. The user's own avatar may also be output to the display unit 140, according to the user's selection. An avatar may perform animations comprising various states or motions according to the avatar execution code. For example, the avatar may change into a different form or shape according to a specific avatar execution code, or another image may be added to a given image. The avatar execution codes that drive these presentations may be deleted or added through setting operations by the avatar's designer or the user. The way an avatar changes may differ depending on each avatar's form, and this too may be changed through setting operations by the avatar designer or the user. Because an avatar execution code is transmitted to the counterpart terminal and must adjust the state and motion of the avatar that terminal is displaying, the codes are preferably agreed upon in advance between the terminals. The data area may temporarily store the first avatar execution code received from the counterpart terminal and the second avatar execution code for controlling the user's own avatar; it may retain the previous first avatar execution code until a new one is received from the counterpart terminal, or store it semi-permanently according to user settings.

The data area may also include a gesture recognition database (DB) for gesture recognition and a voice recognition DB for voice recognition. The gesture recognition DB provides the criteria by which a specific motion state or motion change of the subject is recognized as a specific gesture. For example, when the subject is a person, the gesture recognition DB may contain reference information for the various gestures a person can express in a stationary or moving state. Suppose the gesture recognition DB holds, as the criteria for a "hand waving" gesture, information on the palm, a certain angle, the left-right direction, and a certain number of repetitions. Then, if the subject in the image collected by the camera 170 shows a palm and moves it left and right within the given angle at least the given number of times, the terminal 100 can recognize that image as the "hand waving" gesture based on the gesture recognition DB. The voice recognition DB may include first information, the criteria for determining which speech a signal carried in the audio signal corresponds to, and second information, the criteria for comparing whether the recognized speech corresponds to a preset specific voice. The first information allows phonemes, syllables, words, phrases, sentences, and the like to be recognized from the signal collected by the microphone (MIC); the second information is used to check whether the recognized result matches a specific syllable, word, phrase, or sentence preset for conversion into an avatar execution code.
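The "hand waving" criteria described above (palm visible, bounded angle, left-right direction, minimum count) can be illustrated with a minimal sketch. It assumes an upstream tracker already supplies, per frame, a palm-visibility flag and a hand angle in degrees; the names `WaveCriteria` and `is_hand_wave`, the 45-degree limit, and the use of direction reversals as the "number of times" are all hypothetical choices, not taken from the specification.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WaveCriteria:
    """Hypothetical gesture-DB entry for the 'hand waving' gesture."""
    max_angle_deg: float = 45.0   # assumed bound on the swing angle
    min_reversals: int = 3        # assumed minimum left/right reversals


def is_hand_wave(palm_visible, angles_deg, criteria=WaveCriteria()):
    """Return True if the per-frame observations satisfy the wave criteria.

    palm_visible: list of bools, one per frame.
    angles_deg:   list of hand angles per frame (negative = left, positive = right).
    """
    if not angles_deg or len(palm_visible) != len(angles_deg):
        return False
    if not all(palm_visible):
        return False
    if any(abs(a) > criteria.max_angle_deg for a in angles_deg):
        return False

    reversals, prev_dir = 0, 0
    for prev, cur in zip(angles_deg, angles_deg[1:]):
        direction = (cur > prev) - (cur < prev)  # +1 right, -1 left, 0 still
        if direction == 0:
            continue
        if prev_dir != 0 and direction != prev_dir:
            reversals += 1  # the hand changed from left-going to right-going or back
        prev_dir = direction
    return reversals >= criteria.min_reversals
```

A real gesture recognition DB would hold many such entries and the matching would have to tolerate tracking noise; this sketch only shows how the four listed criteria could combine into one decision.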

The camera 170 collects images for video calls and gesture recognition. The camera 170 includes a camera sensor (not shown) that captures an image through a lens and converts the captured optical signal into an electrical signal, and a signal processor (not shown) that converts the analog image signal from the camera sensor into digital data. The camera sensor may be a charge-coupled device (CCD) sensor and the signal processor may be implemented as a digital signal processor (DSP), but they are not limited to these. The camera 170 may be activated when an input signal for using the camera function is received, and when the video call function is used. The camera 170 transmits the collected images to the counterpart terminal or passes them to the controller 160 for gesture recognition.

The controller 160 may initialize each component of the terminal 100 and perform the signal control needed to support the avatar-based video call service of the present invention. In particular, the controller 160 controls the transmission and reception of avatar execution codes and the output of avatars, based on recognition of specific gestures and speech made by the counterpart during a video call.

In the avatar setting mode, the controller 160 may set the avatar corresponding to the counterpart and the avatar execution codes that operate that avatar. Avatars and avatar execution codes may be provided in categories reflecting the user's interests or hobbies, such as general, weather, and sports (golf, basketball, baseball, and so on). In the avatar setting mode, the user can select and set an avatar or avatar execution code from a field that the user or the counterpart likes or is interested in.

For example, avatar execution codes mapped to the counterpart's specific gestures and speech may operate the avatar as shown in Table 1. The emotional states described here are limited to joy, sadness, and surprise, but the invention is not limited to these; nor are the avatar motions for each emotional state limited to those shown.

Table 1

Category | Joy | Sadness | Surprise
General | The avatar laughs | The avatar bursts into tears | The avatar opens its eyes wide
Weather | The avatar takes a walk in sunny weather | The avatar gets rained on | Lightning strikes above the avatar's head
Golf | The avatar's putt drops into the hole cup and it celebrates | The avatar's tee shot lands in the water and it bursts into tears | The tee shot is a hole-in-one and the avatar is startled
Basketball | The avatar makes a three-point shot | The avatar has the ball stolen | The avatar's shot is blocked

If no avatar execution code is set separately, the controller 160 may use an avatar execution code set as the default. For example, the general category may be set as the default.
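The mapping in Table 1, together with the default category, can be pictured as a nested lookup. The following is only a sketch: the dictionary keys, the action identifiers, and the function name `resolve_action` are hypothetical labels chosen for illustration, and the specification does not define how avatar execution codes are actually encoded.

```python
# Hypothetical encoding of Table 1: category -> emotion -> avatar action.
AVATAR_ACTIONS = {
    "general": {
        "joy": "laugh",
        "sadness": "burst_into_tears",
        "surprise": "open_eyes_wide",
    },
    "weather": {
        "joy": "walk_in_sunshine",
        "sadness": "get_rained_on",
        "surprise": "lightning_overhead",
    },
    "golf": {
        "joy": "putt_drops_in",
        "sadness": "tee_shot_in_water",
        "surprise": "hole_in_one",
    },
    "basketball": {
        "joy": "three_pointer_made",
        "sadness": "ball_stolen",
        "surprise": "shot_blocked",
    },
}

DEFAULT_CATEGORY = "general"  # used when the user has not set a category


def resolve_action(emotion, category=None):
    """Look up the avatar action for an emotion, falling back to the default category."""
    table = AVATAR_ACTIONS.get(category, AVATAR_ACTIONS[DEFAULT_CATEGORY])
    return table[emotion]
```

Calling `resolve_action("joy")` with no category falls through to the general row, which mirrors the default behavior described above.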

When an avatar-based video call service is requested, the controller 160 negotiates the video call connection with the counterpart terminal and secures a channel through the communication unit 110. When a video call is requested or a video call channel is formed, the controller 160 activates the camera 170 to collect images of the subject, outputs the collected images to the display unit 140, and processes them to conform to the video call standard. At this time the controller 160 provides a function for outputting the avatar based on avatar execution codes, which allow at least one of the avatar's state and motion to be changed according to at least one of the subject's gesture and voice.

To this end, the controller 160 may include a gesture recognition unit 165, a voice recognition unit 167, a video call module 163, and an avatar service module 161, as shown in FIG. 3.

The gesture recognition unit 165 recognizes, based on the gesture recognition DB stored in the storage unit 150, a specific motion state or motion change in the images collected by the camera 170 as a gesture, generates the avatar execution code corresponding to the recognized gesture, and passes it to the avatar service module 161. To do so, the gesture recognition unit 165 may extract feature points from the images collected by the camera 170, determine from the arrangement of the feature points, or from changes in that arrangement, what motion the subject is making, and compare the result with the information stored in the gesture recognition DB. As described above, the gesture recognition DB not only provides the criteria for whether a specific motion state or change of the subject corresponds to a specific gesture, but may also store information on which avatar execution code each gesture corresponds to. To recognize gestures by extracting feature points from the collected images, the gesture recognition unit 165 may, for example, use AdaBoost through principal component analysis (PCA). AdaBoost is used here as a learning algorithm for extracting the shape and feature points of an object; it is described in detail in Yoav Freund and Robert E. Schapire, "A decision-theoretic generalization of on-line learning and an application to boosting", in Computational Learning Theory: EuroCOLT '95, pp. 23-37, Springer-Verlag, 1995.

Alternatively, the gesture recognition unit 165 may evaluate gestures in the images collected by the camera 170 by checking object changes rather than feature points. In more detail, the video call module 163 may use various image processing schemes; for example, it may use MPEG-4 image processing. MPEG-4 can divide an image into hierarchical object units. For example, an image collected during a video call may be broadly divided into a background object and a target object. When the target object is a person, it may be further divided into a head object, arm objects, a torso object, and so on, and an arm object may in turn be divided into finger objects, a palm object, a back-of-hand object, a wrist object, and so on. A finger object may be further divided so that each finger joint is its own object. When the video call module 163 processes images in this hierarchical manner, the gesture recognition unit 165 need not evaluate the whole image; it can track only the hand object, which is likely to account for the largest share of a gesture, and determine from changes in the hand whether a specific gesture is being made. In this way the gesture recognition unit 165 may improve the processing speed of gesture recognition. When a gesture is recognized, the gesture recognition unit 165 may generate the corresponding avatar execution code and pass it to the avatar service module 161.

The voice recognition unit 167 performs speech recognition on the audio signal delivered from the audio processor 130, based on a preset speech recognition model. For example, to build the speech recognition model, the voice recognition unit 167 may sample the speech signal received by the microphone (MIC) at 16 kHz, quantize it to 16 bits, and store it. The quantized speech data may be pre-emphasized with a transfer function of a given value, multiplied by a 25 ms Hamming window, and analyzed while shifting 10 ms at a time. Speech feature parameters can be determined from this analysis; for example, a total of 39 feature parameters may be obtained from 12th-order LPC-mel spectrum coefficients and normalized log energy, together with their first- and second-order difference components. These feature parameters may be applied to the speech recognition model of the present invention. The model may use a method that generates a phoneme decision tree for each state position of the model and learns the state sequence of a context-dependent acoustic model from training speech data by successive state splitting (SSS). Because this method performs state splitting quickly, it can select and split the state chosen by SSS while also performing state splitting on all splittable states and selecting the state that maximizes the likelihood. A hidden Markov network may also be used as the acoustic model in the voice recognition unit 167. The voice recognition unit 167 may further apply methods that frequency-analyze the speech wave, based on various algorithms, to extract and separate the frequency bands that characterize vowels or features equivalent to them. The voice recognition unit 167 is not limited to the algorithms described above and may apply various other speech recognition algorithms. Having recognized speech with such algorithms, the voice recognition unit 167 can determine whether the recognized speech corresponds to a preset specific voice. Here, a specific voice is speech that corresponds to a specific avatar execution code; it may be designated by the designer of the avatar execution codes or by the designer of the avatar's motions and state changes. Using the voice recognition DB, the voice recognition unit 167 may generate the corresponding avatar execution code when the currently recognized speech matches a specific voice. Once it has generated the avatar execution code for the collected specific voice, the voice recognition unit 167 may pass it to the avatar service module 161.
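The front-end figures given above translate into concrete frame sizes: at 16 kHz a 25 ms window is 400 samples and a 10 ms shift is 160 samples, and 12 coefficients plus one energy term with first- and second-order differences give (12 + 1) x 3 = 39 parameters. The sketch below shows only the pre-emphasis, framing, and Hamming-window steps in plain Python. The pre-emphasis coefficient 0.97 is an assumed, commonly used value (the specification says only "a given value"), the function names are hypothetical, and the LPC-mel analysis itself is not implemented.

```python
import math

SAMPLE_RATE = 16_000                       # 16 kHz sampling, per the description
FRAME_LEN = SAMPLE_RATE * 25 // 1000       # 25 ms window  -> 400 samples
FRAME_SHIFT = SAMPLE_RATE * 10 // 1000     # 10 ms shift   -> 160 samples
PRE_EMPHASIS = 0.97                        # assumed coefficient, not from the source
FEATURE_DIM = (12 + 1) * 3                 # 12 coeffs + energy, with deltas and delta-deltas


def pre_emphasize(samples, coeff=PRE_EMPHASIS):
    """Apply the filter y[n] = x[n] - coeff * x[n-1]."""
    if not samples:
        return []
    return [samples[0]] + [
        samples[n] - coeff * samples[n - 1] for n in range(1, len(samples))
    ]


def hamming(n):
    """Return an n-point Hamming window."""
    return [0.54 - 0.46 * math.cos(2 * math.pi * i / (n - 1)) for i in range(n)]


def windowed_frames(samples):
    """Split pre-emphasized audio into overlapping Hamming-windowed frames."""
    emphasized = pre_emphasize(samples)
    window = hamming(FRAME_LEN)
    frames = []
    for start in range(0, len(emphasized) - FRAME_LEN + 1, FRAME_SHIFT):
        chunk = emphasized[start:start + FRAME_LEN]
        frames.append([s * w for s, w in zip(chunk, window)])
    return frames
```

Each windowed frame would then be passed to the spectral analysis stage that produces the 39-dimensional feature vector.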

When a video call channel is formed, the avatar service module 161 controls the display unit 140 to output the counterpart avatar corresponding to the counterpart terminal and, according to user settings, to output the user's own avatar, corresponding to the user's terminal, on one side of the display unit 140. It also controls the avatar execution code received from at least one of the gesture recognition unit 165 and the voice recognition unit 167 so that the code is transmitted to the counterpart terminal or applied to the user's own avatar. That is, the avatar service module 161 may pass the avatar execution code to the video call module 163 for transmission to the counterpart terminal, or transmit it to the counterpart terminal using a message service under the control of the controller 160. The avatar service module 161 may likewise receive, through the video call module 163 or a message service, a first avatar execution code from the counterpart terminal for controlling the counterpart avatar. Based on the received first avatar execution code, the avatar service module 161 may change the state or motion of the counterpart avatar being output on the display unit 140. The avatar service module 161 may also use a second avatar execution code, received from at least one of the gesture recognition unit 165 and the voice recognition unit 167, to change at least one of the state and motion of the user's own avatar. If the user has chosen not to display their own avatar, or has disabled the own-avatar control function, the avatar service module 161 may skip own-avatar control and perform only counterpart-avatar control.
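The routing of the first and second avatar execution codes can be sketched as a small class. This is an illustrative assumption about structure only: the class name `AvatarServiceSketch`, its attributes, and the choice to both send and locally apply a locally generated code are hypothetical, and the `send` callable stands in for whichever transport (video call module or message service) is in use.

```python
class AvatarServiceSketch:
    """Hypothetical routing of avatar execution codes inside one terminal."""

    def __init__(self, send, show_own_avatar=True):
        self.send = send                      # callable that delivers a code to the counterpart
        self.show_own_avatar = show_own_avatar
        self.own_avatar_code = None           # last code applied to the user's own avatar
        self.counterpart_avatar_code = None   # last code applied to the counterpart avatar

    def on_local_code(self, code):
        """Handle a second avatar execution code from the local gesture or voice recognizer."""
        self.send(code)                       # forward to the counterpart terminal
        if self.show_own_avatar:              # own-avatar control is optional
            self.own_avatar_code = code

    def on_remote_code(self, code):
        """Handle a first avatar execution code received from the counterpart terminal."""
        self.counterpart_avatar_code = code
```

With `show_own_avatar=False` the sketch still forwards local codes and still updates the counterpart avatar, matching the case where only counterpart-avatar control is performed.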

If a video call channel has been formed but no avatar corresponds to the counterpart terminal, the avatar service module 161 may ask the user to select a counterpart avatar. That is, it may output a pop-up window on the display unit 140 indicating that no counterpart avatar exists and activate a menu for selecting one. The avatar service module 161 may then output the counterpart avatar selected by the user on the display unit 140 and update the phone book data by linking that avatar to the phone number of the counterpart terminal with which the video call channel is currently formed. Alternatively, when there is no counterpart avatar, the avatar service module 161 may send the counterpart terminal a message requesting avatar data. On receiving avatar data from the counterpart terminal, the avatar service module 161 may store it in the storage unit 150 and output the counterpart avatar on one side of the display unit 140. It may also update the phone book data by linking the received avatar data to the counterpart's phone number and storing it. Furthermore, if a counterpart avatar designated by the user exists but avatar data is received from the counterpart terminal, the avatar service module 161 may output on the display unit 140 the counterpart avatar implemented from the received avatar data, and may replace the counterpart avatar previously stored under the counterpart terminal's phone number with the newly received avatar data.
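The selection rules in this paragraph amount to a priority order: avatar data received from the counterpart replaces any stored avatar, otherwise the avatar linked in the phone book is used, otherwise the user is asked to pick one. A minimal sketch follows, in which the phone book is modeled as a plain dictionary and `resolve_counterpart_avatar` and `ask_user` are hypothetical names.

```python
def resolve_counterpart_avatar(phone_book, number, received=None, ask_user=None):
    """Pick the counterpart avatar and keep the phone book link up to date.

    phone_book: dict mapping phone number -> avatar data (hypothetical model).
    received:   avatar data sent by the counterpart terminal, if any.
    ask_user:   callable that returns the avatar chosen from a selection menu.
    """
    if received is not None:
        phone_book[number] = received          # newly received data replaces the stored avatar
    elif number not in phone_book and ask_user is not None:
        phone_book[number] = ask_user()        # no avatar yet: let the user choose one
    return phone_book.get(number)              # None if nothing could be resolved
```

The sketch leaves out the alternative of requesting avatar data from the counterpart terminal, which would simply supply the `received` argument on a later call.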

Similarly, when the user's own avatar does not exist, the avatar service module 161 may optionally output a pop-up window indicating this and output a menu from which the terminal user can select their own avatar. When the terminal user changes their own avatar, the avatar service module 161 may generate data describing the change and transmit it to the counterpart terminal.

The video call module 163 converts the image data collected by the camera 170 and the audio data collected by the microphone to conform to the video call standard and transmits them to the counterpart terminal through the communication unit 110; it also receives the signal transmitted by the counterpart terminal, extracts the image data and audio data, and outputs them through the display unit 140 and the speaker. The video call module 163 includes a video codec for processing the images collected by the camera 170, for example one of H.263, JPEG, Wavelet, MPEG-2, MPEG-4, and H.264. To generate the image data, the video call module 163 may include a codec suite that incorporates the video codec described above, for example H.324M. The video call module 163 may also generate video call data using various other video codecs.

For example, the process of generating video call data with H.263 and H.324M and then transmitting it is as follows.

H.263 processes the video signal output from the camera 170 frame by frame, converts it into image data, and adapts the image data to the display characteristics and size of the display unit 140 before output. H.263 can also compress the image data: it compresses the image data displayed on the display unit 140 in a preset manner and restores compressed image data to the original image data. H.263 may be replaced with JPEG, Wavelet, MPEG-2, MPEG-4, H.264, or the like.

H.324M multiplexes (muxes) the image data generated by H.263 with other data to generate video call data and passes the video call data to the communication unit 110. To this end, H.324M may include an audio codec, for example AMR, for encoding the audio data collected by the audio processor 130. H.324M may include H.245, which generates the control signals for synchronizing and controlling the image data and audio data during a video call. H.324M may also include H.223, which receives the image data from H.263, the audio data from AMR, and the control signals from H.245 and passes them to the communication unit 110. That is, H.223 generates video call data by muxing the image data, audio data, and control signals, and passes the video call data to the communication unit 110.

In particular, the video call module 163 of the present invention may combine the avatar data sent by the avatar service module 161 with the image data generated by the video codec and transmit the result to the counterpart terminal through the communication unit 110; on reception, it may separate and extract the image data and avatar data from the received signal and output the two on the display unit 140 as distinct items. While demuxing the image data, audio data, and control signals, the video call module 163 may also extract the first avatar execution code transmitted by the counterpart terminal and pass it to the avatar service module 161. Likewise, while muxing the image data, audio data, and control signals, the video call module 163 may mux in the second avatar execution code supplied by the avatar service module 161 and transmit the resulting signal to the counterpart terminal.
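The idea of carrying an avatar execution code alongside the video, audio, and control streams can be shown with a toy multiplexer. The framing below (a one-byte tag and a two-byte length before each payload) is an assumption made purely for illustration; it is not the H.223 format, and the tag values and function names are hypothetical.

```python
import struct

# Hypothetical stream tags; real H.223 multiplexing works differently.
VIDEO, AUDIO, CONTROL, AVATAR_CODE = 1, 2, 3, 4
HEADER = ">BH"                      # 1-byte tag, 2-byte big-endian payload length
HEADER_SIZE = struct.calcsize(HEADER)


def mux(chunks):
    """Concatenate (tag, payload) pairs into one byte stream."""
    out = bytearray()
    for tag, payload in chunks:
        out += struct.pack(HEADER, tag, len(payload)) + payload
    return bytes(out)


def demux(stream):
    """Split a byte stream produced by mux() back into (tag, payload) pairs."""
    chunks, offset = [], 0
    while offset < len(stream):
        tag, length = struct.unpack_from(HEADER, stream, offset)
        offset += HEADER_SIZE
        chunks.append((tag, stream[offset:offset + length]))
        offset += length
    return chunks


def extract_avatar_codes(stream):
    """Return only the avatar execution code payloads found while demuxing."""
    return [payload for tag, payload in demux(stream) if tag == AVATAR_CODE]
```

On the receiving side, the payloads returned by `extract_avatar_codes` correspond to the first avatar execution codes that would be handed to the avatar service module 161.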

The description above takes as an example the case where the video call module 163 receives avatar data and avatar execution codes from the avatar service module 161, muxes them into the signals needed for the video call, and transmits them, but the present invention is not limited to this. The controller 160 may operate the avatar service module 161 independently of the video call module 163, transmitting avatar data and avatar execution codes to the counterpart terminal through the communication unit 110 and having the avatar service module 161 process received signals and output the result to the display unit 140. To this end, the controller 160 may create a communication channel for transmitting avatar data and avatar execution codes, for example a data communication channel or a message service channel, independently of the video call channel created by the video call module 163.

The video call module 163 may use MPEG4 to encode and decode the data required for a video call. In this case, the video call module 163 may support the data transfer required for the video call by controlling the transmission and reception of image objects related to images, audio objects related to audio, and the like. The video call module 163 may provide some of these objects to the gesture recognition unit 165, and the gesture recognition unit 165 may, as described above, recognize a specific gesture through the transformation of the objects recognized by the video call module 163.

As described above, the terminal according to an embodiment of the present invention can generate an avatar execution code that changes at least one of the state and motion of an avatar, through recognition of the gestures of a subject captured by the camera 170 and recognition of the audio collected by the microphone. Based on this code, the terminal can change the avatar's state and motion during a video call, so that the expression or mood the user wishes to convey is represented more accurately or more comically.

Meanwhile, the terminal may selectively activate the gesture recognition unit 165 and the voice recognition unit 167 according to the user's selection and support the avatar-based video call service on that basis. That is, when the user configures avatar operation to be based on gesture recognition, the terminal may deactivate the voice recognition function; conversely, when the user configures avatar operation to be based on voice recognition, the terminal may deactivate the gesture recognition function. As described above, the terminal may also activate both gesture recognition and voice recognition. In that case, two avatar execution codes may be generated at the same or nearly the same time and transmitted to the counterpart terminal. The counterpart terminal that receives them applies both avatar execution codes to the avatar simultaneously; if simultaneous application is not possible, it may output the avatar with each avatar execution code applied in turn at a fixed time interval.
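The handling of two near-simultaneous avatar execution codes can be sketched as follows. The description does not say when simultaneous application is "not possible", so this sketch assumes a hypothetical rule: two codes can be combined only if they drive different parts of the avatar. The code names, the `PART_OF` table, and the `schedule` function are illustrative assumptions.

```python
from typing import List, Tuple

# Hypothetical mapping from avatar execution code to the avatar part it drives.
PART_OF = {"WAVE_HAND": "arm", "NOD": "head", "SMILE": "face", "CRY": "face"}

Slot = Tuple[float, Tuple[str, ...]]  # (start time in seconds, codes applied)


def schedule(gesture_code: str, voice_code: str, period_s: float = 1.0) -> List[Slot]:
    """Apply both codes at once when they do not conflict; otherwise
    apply them one after another, separated by a fixed period."""
    if PART_OF[gesture_code] != PART_OF[voice_code]:
        return [(0.0, (gesture_code, voice_code))]
    return [(0.0, (gesture_code,)), (period_s, (voice_code,))]
```

A receiving terminal following this sketch would show a waving, smiling avatar in a single slot, but would show two conflicting facial codes in successive slots.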

FIG. 4 is a diagram illustrating an example of a screen interface for video call operation of a terminal according to an embodiment of the present invention.

Referring to FIG. 4, as shown in screen 401, the display unit 140 of the terminal may broadly include a first screen area 141 that outputs data received from the counterpart terminal and a second screen area 143 that outputs video captured by the user's own camera.

The first screen area 141 includes a counterpart video data output area 141a, which outputs the video captured by the counterpart terminal's camera, and a counterpart avatar output area 141b corresponding to the counterpart terminal. The counterpart avatar output area 141b may be output so as to be distinct from the counterpart video data output area 141a. That is, the first screen area 141 may be split, with the counterpart avatar output area 141b and the counterpart video data output area 141a each assigned to one of the resulting regions. Alternatively, the first screen area 141 may, without splitting the screen, output the counterpart avatar overlaid on the counterpart video data output area 141a.

The second screen area 143 includes the user's own video data output area 143a, which shows the video captured by the camera, and the user's own avatar output area 143b. The user's own avatar output area 143b may be removed depending on a size limit setting of the display unit 140 or on user selection. Like the first screen area 141, the second screen area 143 may be split so that the user's own video data output area 143a and avatar output area 143b are arranged separately. The second screen area 143 may also output the user avatar overlaid on the first screen area 141.

Meanwhile, when the terminal receives from the counterpart terminal an avatar execution code that can change at least one of the state and motion of the counterpart avatar, it may change the state and motion of the counterpart avatar and output it, as in screen 402. For example, when an avatar execution code corresponding to a “hand-waving motion” is received from the counterpart terminal over a particular channel such as the video call channel, a message service channel, or a voice call channel, the terminal manipulates the state and motion of the avatar based on that avatar execution code. How the image or animation representing the avatar's state and motion changes to correspond to the “hand-waving motion” may be adjusted according to the designer's intent or preference, user settings, and so on. Because the terminal continuously receives the counterpart's video data through the video call module, the counterpart's video data is output in the counterpart video data output area 141a. Thus, when the user of the counterpart terminal waves a hand, the corresponding real video may be output in the counterpart video data output area.

In addition, although not illustrated, the terminal user may utter a specific voice input. When the terminal recognizes that voice input and generates a specific avatar execution code based on it, the terminal and the counterpart terminal may each output an avatar that changes according to the avatar execution code generated from that voice recognition.

The foregoing has described the system for video call operation according to an embodiment of the present invention, together with the terminals that make up the system and the screen interface. The avatar-based video call operating method of the present invention is described below in more detail with reference to the drawings.

FIG. 5 is a flowchart illustrating the operation of the transmitting terminal in the video call method according to an embodiment of the present invention. For convenience, the transmitting terminal is referred to as the second terminal 200 and the receiving terminal as the first terminal 100.

Referring to FIG. 5, in the video call operating method according to an embodiment of the present invention, when power is supplied the second terminal 200 first initializes each of its components and may output a preset idle screen in step S101.

Thereafter, when the second terminal 200 receives from the input unit 120 an input signal for the user's video call connection request, it attempts to connect the video call in step S103. More specifically, the user of the second terminal 200 may enter the telephone number of a first terminal 100 capable of video calls and use the input unit 120 to generate an input signal instructing the video call connection. The second terminal 200 then negotiates with the first terminal 100 corresponding to the entered telephone number to determine the environment for the video call.

Before step S103 is performed, the first and second terminals 100 and 200 may, in an avatar setting mode and according to a user selection signal from the input unit 120, set the avatar corresponding to the counterpart and the avatar execution codes that animate the set avatar. In the avatar setting mode, the user may select and set avatars or avatar execution codes from fields that the user or the counterpart likes or is interested in.

At this point, the second terminal 200 may check in step S105 whether the video call is to proceed in avatar mode. To this end, the second terminal 200 may provide a menu for selecting between a normal video call mode and an avatar-based video call mode. Alternatively, when the user requests a video call connection in step S103, the second terminal 200 may output a pop-up window asking whether to proceed in avatar mode. When proceeding in avatar mode, the second terminal 200 may, during the video call connection process, transmit to the first terminal 100 information that identifies the message as a request for an avatar-based video call connection.

If avatar mode is not selected, the second terminal 200 may branch to step S107 and support the normal video call function. That is, without any avatar output, the second terminal 200 transmits the video captured by the camera 170 and the audio signal collected by the microphone (MIC) to the first terminal 100, and may optionally output the captured video on its own display unit 140. The second terminal 200 may also output the video and audio signals received from the first terminal 100.

If avatar mode is set in step S105, the second terminal 200 may output the avatar corresponding to the first terminal 100 on one side of the display unit 140 and, depending on user settings, output the avatar corresponding to its own terminal 200 on one side of the display unit 140. If there is no avatar corresponding to the first terminal 100, the second terminal 200 may request and receive the corresponding avatar data from the first terminal 100, or may set a specific avatar stored in its own storage unit 150 as the avatar corresponding to the first terminal 100.

Thereafter, the second terminal 200 may branch to step S109 and perform at least one of gesture recognition of the subject captured by the camera 170 and voice recognition. As described above, gesture recognition includes determining, when a specific part or the whole of the subject takes a specific motion state or undergoes a specific change in motion, whether that state or change corresponds to a preset specific gesture. Voice recognition includes recognizing the audio signal collected by the microphone (MIC) as at least one of phonemes, syllables, words, phrases, and sentences based on a voice recognition DB, and determining whether the recognized information corresponds to a preset specific voice input.

In step S111, the second terminal 200 generates a specific avatar execution code based on the gesture recognition and voice recognition. To this end, the second terminal 200 may store a table that maps various avatar execution codes to at least one of a specific gesture and a specific voice input, and by referring to this table it generates a specific avatar execution code when a specific gesture or specific voice input is recognized.
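The mapping table used in step S111 can be pictured as a pair of lookup tables. The gesture labels, voice keywords, and code names below are hypothetical; the description states only that such a table exists, not what it contains.

```python
from typing import List, Optional

# Hypothetical table contents: recognized gesture or voice input -> avatar execution code.
GESTURE_TABLE = {"wave_hand": "AV_WAVE", "thumbs_up": "AV_CHEER"}
VOICE_TABLE = {"hello": "AV_WAVE", "congratulations": "AV_CHEER", "sorry": "AV_BOW"}


def avatar_codes(gesture: Optional[str], voice: Optional[str]) -> List[str]:
    """Return the avatar execution codes triggered by the recognized
    gesture and/or voice input, without duplicates."""
    codes: List[str] = []
    for table, key in ((GESTURE_TABLE, gesture), (VOICE_TABLE, voice)):
        code = table.get(key) if key is not None else None
        if code is not None and code not in codes:
            codes.append(code)
    return codes
```

Inputs that match no table entry produce no code, which corresponds to the case where the recognized state does not match any preset gesture or voice input.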

Next, in step S113, the second terminal 200 transmits the generated avatar execution code to the first terminal 100. The second terminal 200 may transmit the avatar execution code to the first terminal 100 over the video call channel, or may create a new data communication channel, message service channel, or the like independently of the video call channel and transmit the code over it.

In addition, in step S115 the second terminal 200 may apply the avatar execution code to change the state or motion of its own avatar. This step applies when the user has configured the terminal to output the user's own avatar on the display unit 140; without that setting, the step may be omitted. The degree to which an avatar's state and motion change in response to an avatar execution code may vary with the intent of the avatar's designer, and different avatars may take different forms of state and motion. Even so, it is preferable that the specific emotion, expression, or motion that a given avatar execution code is meant to convey be designed to be the same across avatars.

Next, in step S117 the second terminal 200 checks whether the video call has ended. If it has not, the terminal branches back to before step S109, continues to support the avatar-based video call service, and repeats the subsequent steps.
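The loop formed by steps S109 to S117 on the transmitting terminal can be sketched as follows. This is a simplified illustration: recognition results arrive as a plain sequence of labels, `"END_CALL"` is a hypothetical marker for call termination, and `CODE_TABLE` and `sender_loop` are assumed names.

```python
from typing import Callable, Iterable, List, Optional

# Hypothetical table: recognized gesture or voice label -> avatar execution code.
CODE_TABLE = {"wave_hand": "AV_WAVE", "hello": "AV_WAVE", "laugh": "AV_LAUGH"}


def sender_loop(
    recognized_inputs: Iterable[Optional[str]],
    send: Callable[[str], None],
    show_own_avatar: bool = True,
) -> List[str]:
    """Run the S109-S117 loop and return the codes applied to the user's own avatar."""
    own_avatar_codes: List[str] = []
    for recognized in recognized_inputs:   # S109: gesture / voice recognition result
        if recognized == "END_CALL":       # S117: stop once the call has ended
            break
        code = CODE_TABLE.get(recognized)  # S111: look up the avatar execution code
        if code is None:
            continue                       # nothing recognized on this pass
        send(code)                         # S113: transmit to the counterpart terminal
        if show_own_avatar:
            own_avatar_codes.append(code)  # S115: optional, depends on user setting
    return own_avatar_codes
```

Passing `show_own_avatar=False` models the case in which the user has not chosen to display their own avatar, so step S115 is skipped while transmission still takes place.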

In the description above, an avatar that takes a specific state or motion in response to an avatar execution code may maintain that state and motion until a new avatar execution code is generated or received. Alternatively, to reflect the real-time nature of a video call, the avatar may be set to take the specific state or motion for a preset period and then return to a default state and motion.
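The two persistence policies described above (hold until the next code, or revert to a default after a preset period) can be modelled with a small state holder. The class name `AvatarState`, the `"IDLE"` default, and the explicit `now` argument are assumptions chosen to keep the sketch deterministic.

```python
from typing import Optional


class AvatarState:
    """Tracks which avatar execution code is currently in effect."""

    def __init__(self, default: str = "IDLE", hold_s: Optional[float] = None) -> None:
        self.default = default
        self.hold_s = hold_s  # None: keep the state until the next code arrives
        self._code = default
        self._since = 0.0

    def apply(self, code: str, now: float) -> None:
        """Apply a newly generated or received avatar execution code."""
        self._code = code
        self._since = now

    def current(self, now: float) -> str:
        """Return the code in effect at time `now`, reverting to the default
        once the preset hold period has elapsed."""
        if self.hold_s is not None and now - self._since >= self.hold_s:
            self._code = self.default
        return self._code
```

With `hold_s=None` the avatar keeps its last state indefinitely; with a finite `hold_s` it returns to the default state once that many seconds have passed since the last code was applied.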

FIG. 6 is a flowchart illustrating the operation of the receiving terminal in the avatar-based video call operating method according to an embodiment of the present invention. For convenience, the transmitting terminal is referred to as the second terminal 200 and the receiving terminal as the first terminal 100.

Referring to FIG. 6, when power is supplied, the first terminal 100 of the present invention performs a boot process using the supplied power and, once booting is complete, may control output of a preset idle screen in step S201.

Thereafter, the first terminal 100 checks in step S203 whether a signal requesting a video call has been received. If no call request signal has been received, it branches to step S205 and may perform a specific function of the first terminal 100 at the user's request, for example a file playback function, a file search function, a broadcast reception function, an image capture function, or a game function. While performing step S205, the first terminal 100 may branch back to before step S203 and continuously monitor for the condition of step S203. In practice, the first terminal 100 waits on step S203 while performing step S205; because an incoming video call request signal from outside can be scheduled as an interrupt, the terminal can support user functions in step S205 until a video call request signal is received.

When a video call request signal is received in step S203, the first terminal 100 may check whether the requested video call is to be performed in avatar mode. To this end, the video call request signal preferably includes a message requesting that the call be performed in avatar mode, and on identifying that message the first terminal 100 may output on the display unit a pop-up window asking whether to perform the video call service in avatar mode. Alternatively, the first terminal 100 may support the avatar-mode video call service by default. In that case, step S207 and step S209, which performs the normal video call mode, may be omitted.

When avatar mode is set in step S207, the first terminal 100 may proceed to step S211 and output the counterpart avatar, that is, the avatar corresponding to the second terminal 200, on one side of the display unit 140. The avatar corresponding to the second terminal 200 may be stored in the phone book and, when an avatar-based video call request is received from the second terminal 200, activated and output on the display unit 140. If there is no avatar corresponding to the second terminal 200, the first terminal 100 may send the second terminal 200 a message requesting the avatar data and receive the corresponding avatar data from the second terminal 200. Alternatively, without performing this exchange, the first terminal 100 may set a specific avatar stored in its own storage unit 150 as the avatar corresponding to the user of the second terminal 200.
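The avatar selection in step S211 can be sketched as a lookup with fallbacks. The description presents requesting the avatar from the counterpart and using a locally stored avatar as alternatives; chaining them in one function, caching the received avatar in the phone book, and the names used here are assumptions for illustration.

```python
from typing import Callable, Dict, Optional


def resolve_counterpart_avatar(
    phone_number: str,
    phone_book: Dict[str, str],
    request_from_peer: Optional[Callable[[str], Optional[str]]] = None,
    local_default: str = "stored_default_avatar",
) -> str:
    """Pick the avatar to display for the calling counterpart."""
    avatar = phone_book.get(phone_number)  # avatar already stored in the phone book
    if avatar is not None:
        return avatar
    if request_from_peer is not None:      # ask the counterpart terminal for avatar data
        avatar = request_from_peer(phone_number)
        if avatar is not None:
            phone_book[phone_number] = avatar  # keep it for later calls
            return avatar
    return local_default                   # fall back to a locally stored avatar
```

Omitting `request_from_peer` models the variant in which the terminal skips the request and directly uses an avatar from its own storage unit.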

Thereafter, in step S213 the first terminal 100 checks whether it has received an avatar execution code instructing a change to the state and motion of the avatar corresponding to the second terminal 200. If no avatar execution code has been received, it may branch back to before step S211 and repeat the subsequent steps. When an input signal for ending the video call occurs, the first terminal 100 may end the process regardless of whether an avatar execution code has been received.

When an avatar execution code is received in step S213, the first terminal 100 may branch to step S215 and change at least one of the state and motion of the second terminal 200's avatar based on the received code. In step S217 the first terminal 100 checks whether the video call has ended; if the call is still in progress, it may branch back to before step S211 and repeat the subsequent steps.

In addition, like the second terminal 200, the first terminal 100 may output its own avatar on one side of the display unit 140, may perform at least one of the gesture recognition and voice recognition that the second terminal 200 performs, and may change the state and motion of its own avatar based on the avatar execution code obtained in this way.

In summary, in the avatar-based video call method and system of the present invention and the terminal supporting them, a terminal either has at least one of the avatar corresponding to the first terminal 100 and the avatar corresponding to the second terminal 200 stored in advance, or receives and stores the counterpart avatar's data from the counterpart, and can output the avatar together with the video reproduced from the video data. The first terminal 100 and the second terminal 200 recognize the user's gesture or specific voice input, generate an avatar execution code that can change at least one of the avatar's state and motion, and exchange that code. In this way a terminal can change at least one of the state and motion of its own avatar as displayed on the counterpart terminal, and can also change at least one of the state and motion of its own avatar as displayed on its own terminal based on the generated code.

The video call system according to the embodiment above was described with an example in which the first terminal 100 or the second terminal 200 retrieves and outputs the counterpart avatar stored in the storage unit 150, and receives from the counterpart terminal an avatar execution code generated from recognition of the counterpart's gesture or voice, adjusting the state or motion of the counterpart avatar accordingly; however, the invention is not limited to this. That is, as shown in FIG. 7, the first terminal 100 or the second terminal 200 may receive the counterpart avatar and avatar execution code corresponding to the counterpart through an avatar providing server 400 and output them.

As shown in FIG. 7, a video call system according to another embodiment of the present invention includes a first terminal 100, a second terminal 200, and an avatar providing server 400 connected via a mobile communication network 300.

The first terminal 100 and the second terminal 200 conduct a video call with each other via the mobile communication network 300. The first terminal 100 or the second terminal 200 outputs the counterpart avatar received from the avatar providing server 400 through the mobile communication network 300, receives an avatar execution code corresponding to the counterpart's emotional state as determined through recognition of specific gestures and voice of the counterpart, and adjusts and outputs at least one of the state and motion of the counterpart avatar.

The mobile communication network 300 performs the series of data transmission and reception operations needed for data transfer and information exchange among the first terminal 100, the avatar providing server 400, and the called terminal 200. In particular, on receiving a video call connection request from the first terminal 100, the mobile communication network 300 establishes a video call channel for the video call between the first terminal 100 and the second terminal 200. Once the video call starts over that channel, the mobile communication network 300 receives the counterpart avatar and avatar execution code from the avatar providing server 400 and forwards them to the first or second terminal 100, 200. The mobile communication network 300 may send the counterpart avatar and avatar execution code over the established video call channel, or over a data communication channel or message service channel established separately from the video call channel.

After the video call channel has been established via the mobile communication network 300, when the avatar providing server 400 receives an avatar request signal from the first or second terminal 100, 200, it transmits the counterpart avatar corresponding to the counterpart to the terminal that sent the request. The avatar providing server 400 receives video call data from the counterpart of the terminal that received the counterpart avatar, and generates an avatar execution code based on recognition of specific gestures and voice of the counterpart in that data. The avatar providing server 400 then transmits the generated avatar execution code to the terminal that received the counterpart avatar, which adjusts and outputs at least one of the state and motion of the counterpart avatar according to the received code.

In particular, the avatar providing server 400 according to another embodiment of the present invention includes a transceiver 410, a database unit 420, and a server controller 430.

The transceiver 410 communicates with the first terminal 100 and the second terminal 200 via the mobile communication network 300. Under the control of the server controller 430, the transceiver 410 receives an avatar request from the mobile communication network 300 and transmits the requested counterpart avatar to the mobile communication network 300. Likewise under the control of the server controller 430, the transceiver 410 transmits the avatar execution code to the mobile communication network 300.

The database unit 420 stores counterpart avatars, each linked to the telephone number of a counterpart terminal, a gesture recognition database for gesture recognition, a voice recognition database for voice recognition, and the avatar execution codes mapped to recognized gestures and specific voices. In other words, the database unit 420 stores the counterpart avatars set by the users of the first and second terminals 100, 200 and the avatar execution codes that animate those avatars. A user can set a counterpart avatar for each counterpart. Counterpart avatars include avatars mapped to a counterpart's telephone number according to the user's selection signal, and a general avatar set as the default. The default general avatar may be used for video calls with anyone for whom the user has not set an avatar.
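The contents of the database unit 420 can be pictured as two lookup tables plus a default. The sketch below is only an illustration under stated assumptions: the class name `AvatarStore`, its method names, the string identifiers, and the in-memory dictionaries are all hypothetical, and the patent does not specify any storage format.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class AvatarStore:
    """Hypothetical in-memory stand-in for the database unit 420."""

    # General avatar used when no avatar is set for a caller (assumed identifier).
    default_avatar: str = "generic_avatar"
    # Counterpart telephone number -> avatar identifier chosen by the user.
    avatars_by_number: Dict[str, str] = field(default_factory=dict)
    # Recognized gesture or voice label -> avatar execution code.
    codes_by_trigger: Dict[str, str] = field(default_factory=dict)

    def set_avatar(self, phone_number: str, avatar_id: str) -> None:
        """Link an avatar to a counterpart's telephone number."""
        self.avatars_by_number[phone_number] = avatar_id

    def set_code(self, trigger: str, execution_code: str) -> None:
        """Map a gesture or voice label to an avatar execution code."""
        self.codes_by_trigger[trigger] = execution_code

    def avatar_for(self, phone_number: str) -> str:
        """Return the configured avatar, or the default general avatar."""
        return self.avatars_by_number.get(phone_number, self.default_avatar)

    def code_for(self, trigger: str) -> Optional[str]:
        """Return the execution code mapped to a trigger, if any."""
        return self.codes_by_trigger.get(trigger)
```

Keying the avatars by telephone number mirrors the description; everything else, including the use of plain strings for avatars and codes, is a simplification.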

The server controller 430 performs overall control of the avatar providing server 400. In particular, the server controller 430 controls the avatar output of the user's terminal based on recognition of the specific gestures made by the other party and of the other party's voice during a video call.

In the avatar setting mode, the server controller 430 can set a counterpart avatar corresponding to the other party, and an avatar execution code that animates the set counterpart avatar, according to the user's selection signal received through the first and second terminals 100, 200. Because the server controller 430 can set the counterpart avatar and the avatar execution code in the same way as the control unit 160 of FIG. 2, a detailed description is omitted.

After a video call channel has been formed between the first and second terminals 100, 200 via the mobile communication network 300, the server controller 430, upon receiving an avatar request signal from the first or second terminal 100, 200 through the mobile communication network 300, transmits the counterpart avatar to the terminal that sent the request. The server controller 430 receives video call data from the counterpart of the terminal that received the avatar, and generates an avatar execution code by recognizing, in that data, a specific gesture made by the other party and the other party's voice. The server controller 430 then transmits the generated avatar execution code to the terminal that received the counterpart avatar. That terminal adjusts at least one of the state and the motion of the counterpart avatar according to the received avatar execution code and outputs the result.

Here, the server controller 430 either retrieves the counterpart avatar from the database unit 420 or receives it from the counterpart terminal, and transmits it to the user's terminal. That is, when the video call channel is formed, the server controller 430 checks whether a counterpart avatar corresponding to the other party has been set in the database unit 420. If one has been set, the server controller 430 retrieves it. If none has been set, the server controller 430 may retrieve the general avatar stored in the database unit 420, or may request the counterpart avatar from the counterpart terminal and receive it. The server controller 430 then transmits the retrieved or received counterpart avatar to the user's terminal. When requesting and receiving the counterpart avatar from the counterpart terminal, the server controller 430 may use the existing video call channel, or may form a data communication channel or message service channel separate from the video call channel. The server controller 430 may store the counterpart avatar received from the counterpart terminal in the database unit 420, linked to the counterpart's telephone number.
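The lookup order described above can be sketched as a small function. This is a sketch under assumptions, not the patented implementation: the function name, the `request_from_terminal` callback, and the choice to try the counterpart terminal before falling back to the general avatar are hypothetical, since the description leaves the choice between those two options open.

```python
from typing import Callable, Dict, Optional


def resolve_counterpart_avatar(
    phone_number: str,
    stored: Dict[str, str],
    default_avatar: str,
    request_from_terminal: Optional[Callable[[str], Optional[str]]] = None,
) -> str:
    """Pick the avatar to send to the user's terminal.

    Order assumed here: avatar already set for the number, then an avatar
    requested from the counterpart terminal, then the general avatar.
    """
    avatar = stored.get(phone_number)
    if avatar is not None:
        return avatar

    if request_from_terminal is not None:
        received = request_from_terminal(phone_number)
        if received is not None:
            # Store the received avatar linked to the counterpart's number.
            stored[phone_number] = received
            return received

    return default_avatar
```

Storing the received avatar against the telephone number follows the last sentence of the paragraph above, so a later call with the same counterpart finds it in the database.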

The server controller 430 recognizes the other party's gesture from the video in the received video call data, and recognizes the other party's voice from the audio in the received video call data. That is, the server controller 430 extracts the image of the other party from the received video and recognizes the gesture of the subject by tracking a region of interest in the extracted image. The region of interest includes at least one of the face, the hands, and the arms. The server controller 430 recognizes the other party's voice from whether the received audio contains words or phrases that express the other party's emotion, and from the pitch of the voice tone.
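The two voice cues named above, emotion words and voice pitch, can be illustrated with a toy classifier. Everything concrete in it is an assumption made for illustration: the word list, the label names, the pitch ratio threshold, and the premise that a transcript and a mean pitch estimate are already available from some upstream recognizer. The patent does not describe how either cue is computed.

```python
from typing import Iterable, Optional

# Hypothetical mapping from emotion words to voice labels.
EMOTION_WORDS = {"love": "happy", "great": "happy", "hate": "angry"}


def classify_voice(
    words: Iterable[str],
    mean_pitch_hz: float,
    baseline_pitch_hz: float,
    ratio_threshold: float = 1.3,
) -> Optional[str]:
    """Return a voice label from emotion words or raised pitch, else None."""
    # Cue 1: a word that expresses emotion appears in the transcript.
    for word in words:
        label = EMOTION_WORDS.get(word.lower())
        if label is not None:
            return label

    # Cue 2: the pitch is well above the speaker's baseline (assumed rule).
    if baseline_pitch_hz > 0 and mean_pitch_hz / baseline_pitch_hz >= ratio_threshold:
        return "excited"

    return None
```

Checking words before pitch is an arbitrary ordering chosen for the sketch; a real system could weigh both cues together.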

The server controller 430 can generate an avatar execution code through gesture recognition as follows. It extracts a specific state and motion change from the video in the received video call data, and compares whether the extracted state and motion change correspond to a preset specific gesture. If they do, the server controller 430 generates the avatar execution code by extracting the code mapped in advance to that gesture.

The server controller 430 can generate an avatar execution code through voice recognition as follows. It recognizes the voice in the received video call data as described above, and determines whether the recognized voice contains a preset specific voice. If it does, the server controller 430 generates the avatar execution code by extracting the code mapped in advance to that specific voice.

The server controller 430 may also generate an avatar execution code by extracting a code mapped in advance to both a specific gesture and a specific voice.
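The three generation paths just described (gesture only, voice only, and gesture plus voice) amount to table lookups once recognition has produced labels. The sketch below assumes hypothetical label names, code strings, and a precedence order (combined mapping first, then gesture, then voice) that the description does not state.

```python
from typing import Dict, Optional, Tuple

# Hypothetical pre-mapped tables; real labels and codes are not given in the source.
GESTURE_CODES: Dict[str, str] = {"wave": "AVATAR_WAVE", "heart_hands": "AVATAR_LOVE"}
VOICE_CODES: Dict[str, str] = {"happy": "AVATAR_SMILE", "angry": "AVATAR_FROWN"}
COMBINED_CODES: Dict[Tuple[str, str], str] = {
    ("heart_hands", "happy"): "AVATAR_LOVE_SMILE",
}


def generate_execution_code(
    gesture: Optional[str], voice: Optional[str]
) -> Optional[str]:
    """Return the execution code mapped to the recognized gesture and/or voice."""
    # A code mapped to both a gesture and a voice takes precedence (assumption).
    if gesture is not None and voice is not None:
        combined = COMBINED_CODES.get((gesture, voice))
        if combined is not None:
            return combined

    if gesture in GESTURE_CODES:
        return GESTURE_CODES[gesture]
    if voice in VOICE_CODES:
        return VOICE_CODES[voice]

    # Nothing recognized matches a preset gesture or voice: no code is generated.
    return None
```

Returning `None` when nothing matches reflects the description, in which a code is generated only when the comparison succeeds.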

The server controller 430 transmits the generated avatar execution code to the user's terminal over the currently established video call channel, a separate data communication channel, or a message service channel.

As described above, when a video call channel is formed, the avatar providing server 400 of the video call system according to another embodiment of the present invention transmits the counterpart avatar corresponding to the other party to the first and second terminals 100, 200, and transmits the avatar execution code, generated from recognition of the other party's gesture and voice, to the terminal that received the counterpart avatar. Because that terminal adjusts at least one of the state and the motion of the counterpart avatar according to the avatar execution code and outputs the result, the other party's emotional state can be conveyed effectively to the user through the counterpart avatar.

A video call method for conveying the other party's emotion in the video call system according to another embodiment of the present invention is described below with reference to FIGS. 7 and 8. FIG. 8 is a flowchart illustrating the video call method according to another embodiment of the present invention.

First, when the first terminal 100 sends a request for a video call connection with the second terminal 200 in step S201, the mobile communication network 300 receives the request and, in step S203, asks the second terminal 200 to accept the incoming video call.

Next, when the second terminal 200 transmits a signal accepting the incoming video call to the mobile communication network 300 in step S205, the mobile communication network 300 forms a video call channel between the first terminal 100 and the second terminal 200 in step S207. The first terminal 100 and the second terminal 200 carry out the video call by exchanging, over the formed channel, the video call data each has acquired.

Before step S201 is performed, the avatar providing server 400 can, in the avatar setting mode, set an avatar corresponding to the other party and an avatar execution code that animates the set avatar, according to the user's selection signal received through the first terminal 100. In the avatar setting mode, the user can select and set an avatar or avatar execution code from a field that the user or the other party likes or is interested in.

Next, in step S209, the first terminal 100 determines whether the user has selected the avatar mode. The user can select the avatar mode through the input unit 120.

If the determination in step S209 is that the avatar mode has not been selected, the first terminal 100 remains in the normal video call mode.

If the determination in step S209 is that the avatar mode has been selected, the first terminal 100 transmits, in step S211, an avatar request signal for the avatar corresponding to the user of the second terminal 200 (the other party) to the mobile communication network 300. In step S213, the mobile communication network 300 forwards the received avatar request signal to the avatar providing server 400.

Next, in step S215, the avatar providing server 400 retrieves from the database unit 420 the counterpart avatar set by the user of the first or second terminal 100, 200 and transmits it to the mobile communication network 300. The counterpart avatar may be an avatar set to correspond to the user of the second terminal 200. The avatar providing server 400 may transmit the counterpart avatar over the existing video call channel, a separate data communication channel, or a message service channel.

Next, in step S217, the mobile communication network 300 transmits the received video call data of the other party and the counterpart avatar to the first terminal 100. In step S218, the first terminal 100 outputs the received counterpart avatar and video call data. Specifically, the first terminal 100 outputs the audio in the received video call data through the speaker SPK via the audio processing unit 130, and displays the other party's video from the received video call data together with the counterpart avatar on the display unit 140. The first terminal 100 may display the other party's video and the counterpart avatar in separate areas, or may overlay the counterpart avatar on the area where the other party's video is output.

Next, in step S219, the mobile communication network 300 transmits the video call data received from the second terminal 200 to the avatar providing server 400.

Next, in step S221, the avatar providing server 400 analyzes the received video call data and recognizes the specific gesture made by the other party and the other party's voice. In step S223, the avatar providing server 400 generates an avatar execution code based on the recognized gesture and voice.

That is, the avatar providing server 400 extracts the image of the other party from the received video and recognizes the gesture of the subject by tracking a region of interest in the extracted image. The region of interest includes at least one of the face, the hands, and the arms. The avatar providing server 400 recognizes the other party's voice from whether the received audio contains words or phrases that express the other party's emotion, and from the pitch of the voice tone.

The avatar providing server 400 can generate an avatar execution code through gesture recognition as follows. It extracts a specific state and motion change from the video in the received video call data, and compares whether the extracted state and motion change correspond to a preset specific gesture. If they do, the avatar providing server 400 generates the avatar execution code by extracting the code mapped in advance to that gesture.

The avatar providing server 400 can generate an avatar execution code through voice recognition as follows. It recognizes the voice in the received video call data as described above, and determines whether the recognized voice contains a preset specific voice. If it does, the avatar providing server 400 generates the avatar execution code by extracting the code mapped in advance to that specific voice.

The avatar providing server 400 may also generate an avatar execution code by extracting a code mapped in advance to both a specific gesture and a specific voice.

Next, in step S225, the avatar providing server 400 transmits the generated avatar execution code to the mobile communication network 300. In step S227, the mobile communication network 300 forwards the received avatar execution code to the first terminal 100. The avatar providing server 400 may transmit the avatar execution code over the existing video call channel, a separate data communication channel, or a message service channel.

Then, in step S229, the first terminal 100 modifies the counterpart avatar according to the received avatar execution code and outputs it. That is, the first terminal 100 adjusts at least one of the state and the motion of the counterpart avatar according to the received avatar execution code and outputs the result. For example, when displaying the counterpart avatar, the first terminal 100 may display the other party's video and the counterpart avatar in separate areas, or may overlay the counterpart avatar on the area where the other party's video is output.
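Step S229 can be pictured as the terminal applying a received code to the avatar it is currently showing. The following is a minimal sketch under assumptions: the `AvatarView` fields, the code strings, and the table of changes are hypothetical, and rendering (split screen or overlay) is left out entirely.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class AvatarView:
    """Hypothetical snapshot of the avatar shown on the display unit."""

    expression: str = "neutral"  # stands in for the avatar's state
    motion: str = "idle"         # stands in for the avatar's motion


# Hypothetical execution codes and the fields each one changes.
EXECUTION_CODES: Dict[str, Dict[str, str]] = {
    "AVATAR_SMILE": {"expression": "smile"},
    "AVATAR_WAVE": {"motion": "wave"},
    "AVATAR_LOVE_SMILE": {"expression": "smile", "motion": "heart"},
}


def apply_execution_code(view: AvatarView, code: str) -> AvatarView:
    """Return the avatar view after adjusting state and/or motion for a code."""
    changes = EXECUTION_CODES.get(code, {})  # unknown codes leave the view as is
    return AvatarView(
        expression=changes.get("expression", view.expression),
        motion=changes.get("motion", view.motion),
    )
```

A code may touch only the state, only the motion, or both, which matches the phrase "at least one of the state and the motion" in the description.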

The video call method according to another embodiment of the present invention has been described with an example in which the avatar providing server 400 provides the first terminal 100 with a counterpart avatar expressing the emotional state of the user of the second terminal 200. Conversely, the server may provide the second terminal 200 with a counterpart avatar expressing the emotional state of the user of the first terminal 100, or may provide each of the first and second terminals 100, 200 with a counterpart avatar expressing the emotional state of its respective counterpart.
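For reference, steps S201 to S229 for the case in which the first terminal 100 receives the avatar can be laid out as an ordered list. The step identifiers and actors come from the description; the function name, the tuple layout, and the shortened action wording are assumptions made for this sketch.

```python
from typing import List, Tuple

Step = Tuple[str, str, str]  # (step id, actor, action)

CALL_SETUP: List[Step] = [
    ("S201", "first terminal 100", "send video call connection request"),
    ("S203", "mobile communication network 300", "ask second terminal 200 to accept the call"),
    ("S205", "second terminal 200", "send acceptance signal"),
    ("S207", "mobile communication network 300", "form video call channel"),
    ("S209", "first terminal 100", "check whether avatar mode is selected"),
]

AVATAR_MODE: List[Step] = [
    ("S211", "first terminal 100", "send avatar request signal"),
    ("S213", "mobile communication network 300", "forward avatar request to server 400"),
    ("S215", "avatar providing server 400", "retrieve and send counterpart avatar"),
    ("S217", "mobile communication network 300", "deliver avatar and video call data"),
    ("S218", "first terminal 100", "output avatar and video call data"),
    ("S219", "mobile communication network 300", "send second terminal's call data to server 400"),
    ("S221", "avatar providing server 400", "recognize gesture and voice"),
    ("S223", "avatar providing server 400", "generate avatar execution code"),
    ("S225", "avatar providing server 400", "send avatar execution code"),
    ("S227", "mobile communication network 300", "deliver avatar execution code"),
    ("S229", "first terminal 100", "adjust and output avatar"),
]


def call_flow(avatar_mode_selected: bool) -> List[Step]:
    """Return the steps performed, depending on the decision in S209."""
    steps = list(CALL_SETUP)
    if avatar_mode_selected:
        steps.extend(AVATAR_MODE)
    return steps
```

When the avatar mode is not selected, the flow stops after S209 and the call continues as an ordinary video call.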

The present invention has been described above using several preferred embodiments, but these embodiments are illustrative and not limiting. Those of ordinary skill in the art to which the present invention pertains will understand that various changes and modifications can be made, in accordance with the doctrine of equivalents, without departing from the spirit of the present invention and the scope of rights set forth in the appended claims.

The present invention relates to an avatar-based video call method and system, and a terminal supporting the same. During a video call, the expression, motion, and the like of the counterpart avatar being output on the display unit of the counterpart terminal or of the user's own terminal are controlled based on the user's gesture and voice, so that terminal users can quickly and accurately recognize, through the counterpart avatar, the emotion or expression the other party wishes to convey while carrying out the video call.

FIG. 1 is a diagram schematically showing the configuration of a video call system according to an embodiment of the present invention.

FIG. 2 is a diagram schematically showing the configuration of a terminal according to an embodiment of the present invention.

FIG. 3 is a diagram showing the configuration of the control unit of the terminal of FIG. 2 in more detail.

FIG. 4 is a diagram showing an example of a screen interface making up the display unit of FIG. 2.

FIG. 5 is a flowchart illustrating the operation of the transmitting terminal in a video call method according to an embodiment of the present invention.

FIG. 6 is a flowchart illustrating the operation of the receiving terminal in a video call method according to an embodiment of the present invention.

FIG. 7 is a diagram schematically showing the configuration of a video call system according to another embodiment of the present invention.

FIG. 8 is a flowchart illustrating a video call method according to another embodiment of the present invention.

* Description of the main parts of the drawings *

100: first terminal 110: communication unit

120: input unit 130: audio processing unit

140: display unit 150: storage unit

160: control unit 161: avatar service module

163: video call module 165: gesture recognition unit

167: voice recognition unit 170: camera

200: second terminal 300: mobile communication network

400: avatar providing server 410: transceiver

420: database unit 430: server controller

Claims

An avatar-based video call system comprising:

a first terminal which, when a video call channel is formed, outputs an avatar corresponding to a second terminal, and adjusts and outputs at least one of a state and a motion of the avatar based on an avatar execution code received from the second terminal; and

the second terminal, which recognizes a specific gesture from the motion state and changes of a subject captured by a camera, recognizes, from an audio signal collected by a microphone, speech including the pitch of the voice and any one of a specific word or a specific phrase expressing emotion, generates from the recognized information the avatar execution code for adjusting at least one of the state and the motion of the avatar, and transmits the avatar execution code to the first terminal.

An avatar-based video call system comprising:

a communication network which controls a video call between a first terminal and a second terminal; and

an avatar providing server which, upon receiving an avatar request signal from the first or second terminal after a video call channel is formed via the communication network, transmits an avatar corresponding to the other party to the terminal that sent the avatar request signal, receives video call data from the counterpart terminal of the terminal that received the avatar, generates an avatar execution code by recognizing, in the received video call data, a specific gesture made by the other party and speech including the pitch of the voice and any one of a specific word or a specific phrase expressing emotion, and transmits the avatar execution code to the terminal that received the avatar.

An avatar providing server of an avatar-based video call system, comprising:

a transceiver which communicates with a first terminal or a second terminal via a communication network; and

a server controller which, upon receiving an avatar request signal from the first or second terminal after a video call channel is formed via the communication network, transmits an avatar corresponding to the other party to the terminal that sent the avatar request signal, receives video call data from the counterpart terminal of the terminal that received the avatar, generates an avatar execution code by recognizing, in the received video call data, a specific gesture made by the other party and speech including the pitch of the voice and any one of a specific word or a specific phrase expressing emotion, and transmits the avatar execution code to the terminal that received the avatar.

The avatar providing server of claim 3, further comprising:

a database unit which stores the avatar linked to the telephone number of the counterpart terminal, a gesture recognition database for the gesture recognition, a voice recognition database for the voice recognition, and the avatar execution code mapped to the recognized gesture and a specific voice.

A terminal of an avatar-based video call system, comprising:

a camera which collects images to be transmitted to a counterpart terminal for a video call;

a display unit which outputs the image of the other party received from the counterpart terminal; and

a control unit which, when a video call channel is formed with the counterpart terminal, outputs an avatar corresponding to the counterpart terminal on the display unit, receives from the counterpart terminal an avatar execution code generated by recognizing the other party's gesture and speech including the pitch of the voice and any one of a specific word or a specific phrase expressing emotion, and adjusts at least one of a state and a motion of the avatar according to the avatar execution code for output on the display unit.

The terminal of claim 5, further comprising:

a storage unit which stores the avatar linked to the telephone number of the counterpart terminal, a gesture recognition database for gesture recognition, a voice recognition database for voice recognition, and the avatar execution code mapped to the recognized gesture and a specific voice.

The terminal of claim 5, wherein the control unit comprises:

a gesture recognition unit which generates the avatar execution code through the gesture recognition;

a voice recognition unit which generates the avatar execution code through the voice recognition;

an avatar service module which outputs the avatar and adjusts at least one of its state and motion; and

a video call module which supports the video call.

An avatar-based video call method comprising:

an avatar transmitting step in which, upon receiving an avatar request signal from a first or second terminal after a video call channel is formed between the first terminal and the second terminal, an avatar providing server transmits an avatar corresponding to the other party to the terminal that sent the avatar request signal;

a receiving step in which the avatar providing server receives video call data from the counterpart terminal of the terminal that received the avatar;

a generating step in which the avatar providing server generates an avatar execution code by recognizing, in the received video call data, a specific gesture made by the other party and speech including the pitch of the voice and any one of a specific word or a specific phrase expressing emotion; and

an avatar execution code transmitting step in which the avatar providing server transmits the avatar execution code to the terminal that received the avatar.

The method of claim 8, wherein, in the avatar transmitting step or the avatar execution code transmitting step, the avatar providing server transmits the avatar or the avatar execution code through at least one of the video call channel, a data communication channel, and a message service channel.

The method of claim 8, wherein the generating step comprises:

extracting, by the avatar providing server, a specific state and motion change from the video of the received video call data;

comparing, by the avatar providing server, whether the extracted specific state and motion change correspond to a preset specific gesture; and

when the comparison shows that they correspond to the specific gesture, detecting, by the avatar providing server, the avatar execution code mapped in advance to the specific gesture.

The method of claim 8, wherein the generating step comprises:

recognizing speech, by the avatar providing server, by extracting from the audio of the received video call data the pitch of the voice and any one of a specific word or a specific phrase expressing emotion;

determining, by the avatar providing server, whether the recognized speech contains a preset specific voice; and

when the determination shows that the specific voice is present, detecting, by the avatar providing server, the avatar execution code mapped in advance to the specific voice.

The method of claim 8, further comprising, before the avatar receiving step:

selecting and setting, by the avatar providing server in an avatar setting mode, an avatar or an avatar execution code from a field that the user or the other party likes or is interested in.

A channel forming step of forming a video call channel between the first terminal and the second terminal;

Outputting, by the first terminal, an avatar corresponding to the second terminal;

A receiving step in which the first terminal receives an avatar execution code from the second terminal, the second terminal having generated the avatar execution code, which is capable of adjusting at least one of the state and motion of the avatar, by recognizing a specific gesture from the motion state and changes of an image collected by the camera of the second terminal and by recognizing, from an audio signal collected by a microphone, a voice including any one of a specific word or a specific phrase expressing the pitch and emotion of the voice;

An adjustment output step in which the first terminal adjusts and outputs at least one of the state and motion of the avatar according to the avatar execution code.

An avatar-based video call method comprising the above steps.
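The adjustment output step on the first terminal can be sketched as applying a received code to the displayed avatar. This is an illustration under stated assumptions: the `Avatar` class, the `CODE_EFFECTS` table, and the code names are hypothetical, and the claims do not define the format of an avatar execution code.

```python
from dataclasses import dataclass


@dataclass
class Avatar:
    state: str = "neutral"   # e.g. an emotional state shown by the avatar
    motion: str = "idle"     # e.g. an animation currently played


# Hypothetical mapping from execution code to the avatar attributes it changes.
CODE_EFFECTS = {
    "AVT_GREET": {"motion": "wave"},
    "AVT_SAD": {"state": "sad"},
    "AVT_CHEER": {"state": "happy", "motion": "jump"},
}


def apply_execution_code(avatar: Avatar, code: str) -> Avatar:
    """Return a new Avatar with its state and/or motion adjusted by the code.

    Unknown codes leave the avatar unchanged.
    """
    effects = CODE_EFFECTS.get(code, {})
    return Avatar(
        state=effects.get("state", avatar.state),
        motion=effects.get("motion", avatar.motion),
    )
```

A code may change the state, the motion, or both, which mirrors the "at least one of" wording of the claim.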

The method of claim 13,

wherein generating the avatar execution code in the receiving step comprises:

Extracting, by the second terminal, a specific state and motion change from the image collected by the camera;

Comparing, by the second terminal, the extracted specific state and motion change against a preset specific gesture;

When the comparison shows a match with the specific gesture, detecting, by the second terminal, the avatar execution code previously mapped to the specific gesture.

The method of claim 13,

wherein generating the avatar execution code in the receiving step comprises:

Recognizing, by the second terminal, a voice by extracting a specific word or a specific phrase that expresses the pitch and emotion of the voice collected by the microphone;

Determining, by the second terminal, whether a preset specific voice is present in the recognized voice;

If the determination shows that the specific voice is present, detecting, by the second terminal, the avatar execution code previously mapped to the specific voice.

The method of claim 13,

wherein the output step or the adjustment output step comprises any one of:

Dividing, by the first terminal, the screen and outputting the avatar and the image received from the second terminal in separate divided areas; or

Overlaying and outputting, by the first terminal, the avatar on the area in which the image received from the second terminal is output.
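The two display alternatives, split screen and overlay, differ only in where the avatar rectangle is placed. The following sketch computes both layouts. The left/right split, the quarter-size overlay in the bottom-right corner, and the function name `layout` are assumptions made for illustration; the claims do not fix any geometry.

```python
from typing import Dict, Tuple

Rect = Tuple[int, int, int, int]  # (x, y, width, height) in pixels


def layout(screen_w: int, screen_h: int, mode: str) -> Dict[str, Rect]:
    """Return rectangles for the remote video and the avatar.

    mode "split":   video in the left half, avatar in the right half.
    mode "overlay": video fills the screen, avatar drawn on top of it in the
                    bottom-right corner at one quarter of each dimension.
    """
    if mode == "split":
        half = screen_w // 2
        return {
            "video": (0, 0, half, screen_h),
            "avatar": (half, 0, screen_w - half, screen_h),
        }
    if mode == "overlay":
        w, h = screen_w // 4, screen_h // 4
        return {
            "video": (0, 0, screen_w, screen_h),
            "avatar": (screen_w - w, screen_h - h, w, h),
        }
    raise ValueError(f"unknown layout mode: {mode}")
```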

The method of claim 13,

wherein the output step comprises:

Detecting and outputting, by the first terminal, a stored avatar linked to the telephone number of the second terminal.

The method of claim 13,

wherein the output step comprises any one of:

Receiving, by the first terminal, avatar data corresponding to the avatar from the second terminal, and outputting an avatar based on the received avatar data; or

Designating, by the first terminal, a specific avatar previously stored in a storage unit as the avatar of the second terminal, and outputting the designated avatar.

The method of claim 18,

wherein the output step further comprises:

Storing, by the first terminal, the received avatar or the designated avatar in association with the telephone number of the second terminal.
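Storing an avatar against a telephone number, and later detecting it when a call is formed with that number, amounts to a keyed store. The sketch below is a minimal in-memory illustration. The class `AvatarStore`, the digit-only normalization, and the sample number and avatar names are hypothetical and are not taken from the claims.

```python
from typing import Dict, Optional


class AvatarStore:
    """In-memory stand-in for the first terminal's storage unit."""

    def __init__(self) -> None:
        self._by_number: Dict[str, str] = {}

    @staticmethod
    def _normalize(phone_number: str) -> str:
        # Assumed normalization: keep digits only so "010-1234-5678" and
        # "01012345678" refer to the same entry.
        return "".join(ch for ch in phone_number if ch.isdigit())

    def store(self, phone_number: str, avatar_id: str) -> None:
        """Link a received or designated avatar to the caller's number."""
        self._by_number[self._normalize(phone_number)] = avatar_id

    def lookup(self, phone_number: str) -> Optional[str]:
        """Return the avatar linked to the number, or None if none is stored."""
        return self._by_number.get(self._normalize(phone_number))
```

A lookup that returns `None` would correspond to the case where the terminal must receive avatar data from the other terminal or designate a stored avatar.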

The method of claim 13,

wherein, in the receiving step, the first terminal receives the avatar or the avatar execution code through at least one of the video call channel, a data communication channel, and a message service channel.

The method of claim 13, further comprising, before the channel forming step:

Selecting and setting, by the first terminal in an avatar setting mode, an avatar or an avatar execution code from a field that the user or the other party likes or is interested in.

The method according to any one of claims 8 to 21,

wherein the avatar is graphic content that represents the emotional state of the other party and includes one of an animated character, a video, a still image, user created content (UCC), an emoticon, Flash content, and haptic content in which video and vibration are combined.