KR20090054609A

KR20090054609A - VoIP telephone communication system and method for providing users with telephone communication service comprising emotional contents effect

Info

Publication number: KR20090054609A
Application number: KR1020070121365A
Authority: KR
Inventors: 이지현
Original assignee: (주)씨앤에스 테크놀로지
Priority date: 2007-11-27
Filing date: 2007-11-27
Publication date: 2009-06-01
Also published as: KR100941598B1

Abstract

A VoIP telephone communication system, and a corresponding method, provide a telephone communication service that includes an emotional content effect, so that a user can express emotion more richly during a voice/video call. A plurality of VoIP phones each have an emote message data generation function and a voice/video call function. A VoIP server (103) provides communication between the VoIP phones over an Internet communication network by relaying data packets between them. A content database (101) stores emote content data and, at the request of the VoIP server, provides the emote content data to the VoIP server.

Description

VoIP telephone communication system and method for providing users with telephone communication service comprising emotional contents effect

The present invention relates to a telephone communication system and method and, more particularly, to a voice over internet protocol (VoIP) telephone communication system and method.

In general, a VoIP telephone communication system provides each user with voice and video call services over an internet protocol (IP) network. For a voice and video call, each user uses a VoIP phone to connect to the other party's VoIP phone through the IP network provided by a VoIP server. However, a conventional VoIP telephone communication system provides only voice and video call services between users, and is therefore limited in its ability to satisfy users' varied needs.

Accordingly, a technical object of the present invention is to provide a VoIP telephone communication system in which a VoIP server receives emote message data, generated by the originating VoIP phone as a result of recognizing the user's voice or recognizing an emote icon selected by the user, and provides emote content data corresponding to that emote message data to the called VoIP phone. The called VoIP phone then outputs at least one of an emotional image, an emotional text, and an emotional sound together with the user's voice and video, so that the user can express emotions more richly during voice and video calls.

Another technical object of the present invention is to provide a VoIP telephone communication method in which a VoIP server receives emote message data, generated by the originating VoIP phone as a result of recognizing the user's voice or recognizing an emote icon selected by the user, and provides emote content data corresponding to that emote message data to the called VoIP phone. The called VoIP phone then outputs at least one of an emotional image, an emotional text, and an emotional sound together with the user's voice and video, so that the user can express emotions more richly during voice and video calls.

To achieve the above technical object, a VoIP telephone communication system according to the present invention includes a plurality of VoIP phones, a VoIP server, and a content DB. The plurality of VoIP phones have a function of generating emote message data, each including at least one of set vocabulary data or at least one of icon graphic data, and a voice and video call function. The VoIP server provides communication between the plurality of VoIP phones over an Internet communication network and relays the transmission of data packets between them. The content DB stores emote content data that respectively correspond to the emote message data and that represent at least one of an emotional image, an emotional text, and an emotional sound, and provides the emote content data to the VoIP server at the request of the VoIP server.

A data packet includes either encoded audio data corresponding to the user's voice and encoded video data corresponding to a captured image of the user, or at least one of the emote message data together with the encoded audio data and the encoded video data. While providing the user with a voice and video call with the other party, each of the VoIP phones recognizes the user's voice and generates at least one of the emote message data whenever the user's voice includes content corresponding to at least one of the set vocabulary data, or whenever the user selects one of the emote icons that the VoIP phone displays based on the icon graphic data. Whenever a data packet from an originating VoIP phone among the plurality of VoIP phones includes at least one emote message data, the VoIP server reads at least one emote content data corresponding to it from the content DB, generates a modified data packet including the at least one emote content data, the encoded audio data, and the encoded video data, and transmits the modified data packet to the called VoIP phone among the plurality of VoIP phones.

To achieve the other technical object, a VoIP telephone communication method according to the present invention includes: connecting, by a VoIP server, a call between at least two VoIP phones; generating, by an originating VoIP phone of the at least two VoIP phones, a data packet that includes either emote message data, encoded audio data, and encoded video data, or only encoded audio data and encoded video data; when the data packet received from the originating VoIP phone includes the emote message data, generating, by the VoIP server, a modified data packet including emote content data corresponding to the emote message data, the encoded audio data, and the encoded video data; transmitting, by the VoIP server, the modified data packet to a called VoIP phone of the at least two VoIP phones; when the data packet received from the originating VoIP phone does not include the emote message data, transmitting, by the VoIP server, the data packet unchanged to the called VoIP phone; and repeating the steps from generating the data packet through transmitting the modified data packet or the data packet to the called VoIP phone until the call between the at least two VoIP phones is released. The emote message data includes at least one of icon graphic data, each representing an emote icon that conveys an emotion or feeling of the user, or at least one of set vocabulary data, and the emote content data represents at least one of an emotional image, an emotional text, and an emotional sound expressing the user's emotion or feeling.
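The server-side branch of the method (forward unchanged, or substitute emote content for the emote message) can be sketched as follows. This is only an illustration under stated assumptions: the `DataPacket` fields, the dictionary used as the content DB, and the fallback for an unknown key are hypothetical and are not specified in the patent text.

```python
from dataclasses import dataclass, replace
from typing import Dict, Optional


@dataclass(frozen=True)
class DataPacket:
    encoded_audio: bytes                    # encoded audio data
    encoded_video: bytes                    # encoded video data
    emote_message: Optional[str] = None     # hypothetical key of an emote message
    emote_content: Optional[bytes] = None   # filled in only by the server


def relay(packet: DataPacket, content_db: Dict[str, bytes]) -> DataPacket:
    """Return the packet the server forwards to the called phone."""
    if packet.emote_message is None:
        return packet                       # no emote message: forward unchanged
    content = content_db.get(packet.emote_message)
    if content is None:
        return packet                       # assumption: unknown key, forward as-is
    # Modified packet: same audio and video, emote content replaces the message.
    return replace(packet, emote_message=None, emote_content=content)
```

In a running system this function would be called once per received packet until the call is released, which corresponds to the repeating step of the method.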

As described above, in the VoIP telephone communication system and method according to the present invention, the VoIP server receives emote message data that the originating VoIP phone generates as a result of recognizing the user's voice or an emote icon selected by the user, and provides emote content data corresponding to the emote message data to the called VoIP phone. The called VoIP phone can therefore output at least one of an emotional image, an emotional text, and an emotional sound together with the user's voice and video. As a result, the user can express emotions more richly during voice and video calls.

Hereinafter, preferred embodiments of the present invention are described with reference to the accompanying drawings. The present invention is not, however, limited to the embodiments disclosed below and may be implemented in various different forms; these embodiments are provided only so that the disclosure of the present invention is complete and so that those of ordinary skill in the art are fully informed of the scope of the invention.

FIG. 1 is a schematic block diagram of a voice over internet protocol (VoIP) telephone communication system according to an embodiment of the present invention. The VoIP telephone communication system 100 includes a plurality of VoIP phones VP1 to VPK (K is an integer), a content database (content DB) 101, a terminal DB 102, a VoIP server 103, and a management DB 104. Each of the VoIP phones VP1 to VPK provides a user with voice and video call functions. Each of the VoIP phones VP1 to VPK also has a function of generating emote message data EMSG1 to EMSGJ (J is an integer; see FIG. 2). Each of the emote message data EMSG1 to EMSGJ includes at least one of set vocabulary data SVDT1 to SVDTP (P is an integer; see FIG. 2) or at least one of icon graphic data IGDT1 to IGDTQ (Q is an integer; see FIG. 2). The set vocabulary data SVDT1 to SVDTP respectively represent some or all of a plurality of words, a plurality of phrases, and a plurality of sentences. The icon graphic data IGDT1 to IGDTQ respectively represent emote icons EICON1 to EICONQ (see FIGS. 2 and 4), which convey an emotion or feeling of the user.

Meanwhile, the set vocabulary data SVDT1 to SVDTP may respectively include mutually different first identifiers (not shown), and the icon graphic data IGDT1 to IGDTQ may likewise respectively include mutually different second identifiers (not shown). In this case, each of the emote message data EMSG1 to EMSGJ may include at least one of the first identifiers or at least one of the second identifiers. When each of the emote message data EMSG1 to EMSGJ carries a first or second identifier, its size can be smaller than when it carries at least one of the set vocabulary data SVDT1 to SVDTP or at least one of the icon graphic data IGDT1 to IGDTQ itself.
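The size saving from sending an identifier rather than the vocabulary or icon data itself can be illustrated with a small sketch. The three-byte layout (a one-byte kind tag plus a two-byte index) and all names below are assumptions made for illustration; the patent does not define an identifier format.

```python
from typing import Tuple

VOCABULARY = "V"   # tag for a first identifier (refers to a set vocabulary item)
ICON = "I"         # tag for a second identifier (refers to an icon graphic)


def encode_identifier(kind: str, index: int) -> bytes:
    """Pack a one-byte kind tag and a two-byte index into a 3-byte message."""
    if kind not in (VOCABULARY, ICON):
        raise ValueError(f"unknown identifier kind: {kind!r}")
    return kind.encode("ascii") + index.to_bytes(2, "big")


def decode_identifier(message: bytes) -> Tuple[str, int]:
    """Recover (kind, index) from a message built by encode_identifier."""
    return message[:1].decode("ascii"), int.from_bytes(message[1:3], "big")


# Hypothetical stand-in for one icon graphic data item: 2 KiB of pixel bytes.
icon_graphic = bytes(2048)
message = encode_identifier(ICON, 4)
print(len(message), "bytes instead of", len(icon_graphic))
```

The receiver of such a message needs a table that maps identifiers back to content, which is the role the content DB plays in this system.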

The content DB 101 stores emote content data ECNT1 to ECNTJ respectively corresponding to the emote message data EMSG1 to EMSGJ. The emote content data ECNT1 to ECNTJ each represent at least one of an emotional image, an emotional text, and an emotional sound expressing an emotion or feeling of the user. The terminal DB 102 stores terminal information VPIF1 to VPIFK of the VoIP phones VP1 to VPK. Each of the terminal information VPIF1 to VPIFK may include any one of identification (ID) information, user information, and communication service subscription information for the corresponding VoIP phone.

The VoIP server 103 provides communication between the VoIP phones VP1 to VPK over the Internet communication network NT. The VoIP server 103 and each of the VoIP phones VP1 to VPK communicate with each other using the session initiation protocol (SIP). The VoIP server 103 also relays the transmission of data packets between the VoIP phones VP1 to VPK. Whenever a data packet from an originating VoIP phone (for example, VP1) among the VoIP phones VP1 to VPK includes at least one emote message data (at least one of EMSG1 to EMSGJ), the VoIP server 103 reads at least one corresponding emote content data (at least one of ECNT1 to ECNTJ) from the content DB 101. The VoIP server 103 then generates a modified data packet DPCK21, which includes the encoded audio data EAUD and the encoded video data EVID contained in the originating phone's data packet together with the emote content data that was read, and transmits it to the called VoIP phone (for example, VPK) among the VoIP phones VP1 to VPK.

Here, the terms "originating VoIP phone" and "called VoIP phone" are not limited to the VoIP phone that requested the call connection and the VoIP phone that received that request. The term "originating VoIP phone" denotes the VoIP phone that transmits a data packet, and the term "called VoIP phone" denotes the VoIP phone that receives it. A single VoIP phone is therefore the originating VoIP phone when it transmits a data packet and the called VoIP phone when it receives one. Meanwhile, the management DB 104 stores a control program CTLPGM related to the operation of the server controller 160 (see FIG. 3) included in the VoIP server 103.

The configuration and operation of the VoIP phones VP1 to VPK are described in more detail with reference to FIG. 2, which is a detailed block diagram of the VoIP phone shown in FIG. 1. Because the VoIP phones VP1 to VPK are similar in configuration and operation, the description focuses on the VoIP phone VP1 for brevity. The VoIP phone VP1 includes a user interface unit 110, a terminal controller 120, an emote message generator 130, and a communication unit 140. The user interface unit 110 includes an input unit 111, a microphone 112, a photographing unit 113, a media processor 114, a display unit 115, an audio signal processor 116, and a speaker 117.

The input unit 111 may include a plurality of input keys (not shown) or may be implemented as an input device such as a touch pad or a touch screen. According to the user's input, the input unit 111 outputs any one of a voice recognition selection signal SRSL, a voice recognition release signal SCSL, an icon recognition selection signal IRSL, an icon recognition release signal ICSL, an emote icon selection signal EISL, a call signal CALL, and a release signal STOP to the terminal controller 120. Although not shown in FIG. 2, the input unit 111 may also output additional signals for controlling the operation of the VoIP phone VP1 according to the user's input. The microphone 112 converts the user's voice into audio data AUD and outputs it. The photographing unit 113 captures an image of the user, converts the captured image into video data VID, and outputs it.

The media processor 114 encodes the audio data AUD received from the microphone 112 and the video data VID received from the photographing unit 113, and outputs encoded audio data EAUD and encoded video data EVID to the terminal controller 120. The media processor 114 also decodes the encoded audio data REAUD and the encoded video data REVID of the originating VoIP phone (one of VP2 to VPK) received from the terminal controller 120, and outputs decoded audio data DEAUD and decoded video data DEVID. In addition, the media processor 114 receives emote content data (at least one of ECNT1 to ECNTJ) from the terminal controller 120 and outputs it to the display unit 115 or the audio signal processor 116. For example, when the emote content data represents an emotional image or an emotional text, the media processor 114 outputs it to the display unit 115. When the emote content data represents an emotional sound, the media processor 114 outputs it to the audio signal processor 116.
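The routing rule just described (images and text go to the display, sounds go to the audio path) can be sketched as below. The `Kind` enumeration, the `EmoteContent` type, and the use of plain lists as output queues are hypothetical illustration devices, not structures named in the patent.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List


class Kind(Enum):
    IMAGE = "image"   # emotional image
    TEXT = "text"     # emotional text
    SOUND = "sound"   # emotional sound


@dataclass
class EmoteContent:
    kind: Kind
    payload: bytes


def route(content: EmoteContent, display: List[bytes], audio: List[bytes]) -> None:
    """Append the payload to the display queue or the audio queue by kind."""
    if content.kind in (Kind.IMAGE, Kind.TEXT):
        display.append(content.payload)   # stands in for the display unit
    else:
        audio.append(content.payload)     # stands in for the audio signal processor
```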

Based on the decoded video data DEVID received from the media processor 114, the display unit 115 displays an image of the user of the originating VoIP phone (one of VP2 to VPK), that is, the other party to the call. When the display unit 115 receives emote content data (at least one of ECNT1 to ECNTJ) from the media processor 114, it displays an emotional image or emotional text, based on that emote content data, together with the image of the other party. Here, the emotional image or emotional text includes various forms of image or text that express the other party's feeling or emotion. For example, it may include an image or text in the form of a flashcon or an emoticon. The emotional image may also include a captured image of natural scenery or of a specific subject (a person, an object, an animal, a plant, and so on), and the emotional text may include text rendered in various typefaces. FIGS. 5A to 5F show examples of emotional images for the case where the emote content data ECNT1 to ECNTJ represent emoticon-style emotional images.

Referring again to FIG. 2, the display unit 115 displays the emote icons EICON1 to EICONQ based on the icon graphic data IGDT1 to IGDTQ received from the emote message generator 130. FIG. 4 shows an example of emote icons EICON1 to EICON6 displayed on the display screen 115a of the display unit 115; for simplicity, only six emote icons are shown. As shown in FIG. 4, the emote icons EICON1 to EICONQ may be displayed as text, or may be rendered as particular shapes. Referring again to FIG. 2, the display unit 115 stops displaying the emote icons EICON1 to EICONQ in response to a display control signal DCTL received from the emote message generator 130.

Based on the decoded audio data DEAUD received from the media processor 114, the audio signal processor 116 outputs the other party's voice to the speaker 117. The audio signal processor 116 also outputs an emotional sound to the speaker 117 based on the emote content data (at least one of ECNT1 to ECNTJ). Under the control of the audio signal processor 116, the speaker 117 may output the emotional sound as background sound while outputting the other party's voice. Alternatively, the speaker 117 may output the other party's voice first and then the emotional sound, or the emotional sound first and then the other party's voice. Here, the emotional sound includes various sounds that express the other party's feeling or emotion.

In response to the voice recognition selection signal SRSL received from the input unit 111, the terminal controller 120 outputs the encoded audio data EAUD received from the media processor 114 to the emote message generator 130. As a result, the voice recognition function of the VoIP phone VP1 is selected. In response to the icon recognition selection signal IRSL received from the input unit 111, the terminal controller 120 outputs an icon display signal ICDP to the emote message generator 130. As a result, the emote icon selection function of the VoIP phone VP1 is selected. The terminal controller 120 receives the emote icon selection signal EISL from the input unit 111 and outputs it to the emote message generator 130. The terminal controller 120 then receives emote message data (at least one of EMSG1 to EMSGJ) from the emote message generator 130. In response to the voice recognition release signal SCSL received from the input unit 111, the terminal controller 120 stops outputting the encoded audio data EAUD to the emote message generator 130. In response to the icon recognition release signal ICSL received from the input unit 111, the terminal controller 120 outputs an icon display stop signal ICDPS. In short, by operating the input unit 111 to select the voice recognition function or the emote icon selection function of the VoIP phone VP1, the user can enable or disable the generation of emote message data.
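How the terminal controller might react to the selection and release signals can be sketched as a small state holder. The class, its attributes, and the rule that EISL is forwarded only while icons are shown are assumptions for illustration; the text names the signals (SRSL, SCSL, IRSL, ICSL, EISL, ICDP, ICDPS) but does not describe an implementation, and it does not state the destination of ICDPS.

```python
from typing import Optional


class EmoteModeController:
    """Tracks which emote message generation functions are currently selected."""

    def __init__(self) -> None:
        self.forward_audio = False   # True while EAUD is passed to the generator
        self.icons_shown = False     # True while emote icons are displayed

    def on_input(self, signal: str) -> Optional[str]:
        """Apply one input-unit signal; return the signal passed on, if any."""
        if signal == "SRSL":
            self.forward_audio = True
        elif signal == "SCSL":
            self.forward_audio = False
        elif signal == "IRSL":
            self.icons_shown = True
            return "ICDP"
        elif signal == "ICSL":
            self.icons_shown = False
            return "ICDPS"
        elif signal == "EISL":
            # Assumption: a selection is only forwarded while icons are shown.
            return "EISL" if self.icons_shown else None
        return None

    @property
    def generation_enabled(self) -> bool:
        return self.forward_audio or self.icons_shown
```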

Meanwhile, when the terminal controller 120 receives emote message data (at least one of EMSG1 to EMSGJ) from the emote message generator 130, it outputs the encoded audio data EAUD, the encoded video data EVID, and the emote message data to the communication unit 140. When the terminal controller 120 does not receive emote message data from the emote message generator 130 (that is, when it receives a message absence signal MABSS from the emote message generator 130), it outputs only the encoded audio data EAUD and the encoded video data EVID to the communication unit 140. The terminal controller 120 may receive no emote message data in any of the following cases: the user has operated the input unit 111 to disable the emote message data generation function of the VoIP phone VP1; the emote message generator 130 has recognized the user's voice based on the encoded audio data EAUD and found that the vocabulary data in the user's voice does not include content corresponding to any of the set vocabulary data SVDT1 to SVDTP; or the user has not selected any of the emote icons EICON1 to EICONQ.
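The two outgoing packet shapes can be shown in a few lines. The dictionary layout and the sentinel object standing in for the message absence signal MABSS are hypothetical; the patent does not define a packet format.

```python
from typing import Any, Dict

MABSS = object()   # stand-in for the message absence signal


def build_outgoing(eaud: bytes, evid: bytes, generator_output: Any) -> Dict[str, Any]:
    """Build the data handed to the communication unit for one packet."""
    packet: Dict[str, Any] = {"EAUD": eaud, "EVID": evid}
    if generator_output is not MABSS:
        packet["EMSG"] = generator_output   # emote message present: include it
    return packet
```

For example, `build_outgoing(b"a", b"v", MABSS)` yields a packet with only the audio and video fields, which the server would relay unchanged.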

The terminal controller 120 outputs, to the media processor 114, the encoded audio data (REAUD) and the encoded video data (REVID) of the calling VoIP phone (one of VP2 to VPK) received from the communication unit 140. The terminal controller 120 also receives emote content data (at least one of ECNT1 to ECNTJ) from the communication unit 140 and outputs it to the media processor 114. In response to the call signal (CALL) received from the input unit 111, the terminal controller 120 outputs its own terminal information (VPIF1), the terminal information (one of VPIF2 to VPIFK) of the called VoIP phone (one of VP2 to VPK), and the call connection request signal (CREQ1) to the communication unit 140. Here, the own terminal information (VPIF1) may be stored in the terminal controller 120, and the terminal information (one of VPIF2 to VPIFK) of the called VoIP phone (one of VP2 to VPK) may be entered by the user operating the input unit 111. After outputting the call connection request signal (CREQ1) to the communication unit 140, the terminal controller 120 recognizes that the call between the VoIP phone VP1 and the VoIP phone (one of VP2 to VPK) is connected when it receives the call connection confirmation signal (CACK2) from the communication unit 140. The call connection confirmation signal (CACK2) is generated by the VoIP phone (one of VP2 to VPK). In response to the release signal (STOP) received from the input unit 111, the terminal controller 120 outputs its own terminal information (VPIF1), the terminal information (one of VPIF2 to VPIFK) of the VoIP phone (one of VP2 to VPK), and the call disconnection signal (CCNL1) to the communication unit 140. When the terminal controller 120 receives the call connection request signal (CREQ2) from the communication unit 140 and the call between the VoIP phone VP1 and the VoIP phone (one of VP2 to VPK) is connected, it outputs the call connection confirmation signal (CACK1) to the communication unit 140. When the terminal controller 120 receives the call disconnection signal (CCNL2) from the communication unit 140, it recognizes that the call connection between the VoIP phone VP1 and the VoIP phone (one of VP2 to VPK) has been released. Although not shown in FIG. 2, the terminal controller 120 may further output additional signals required for the call connection between the VoIP phone VP1 and the VoIP phone (one of VP2 to VPK).
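
The call-related behaviour of the terminal controller 120 described above can be read as a small state machine. The following Python sketch is only an illustration under stated assumptions: the class name, the three states, and the string encoding of the signals are hypothetical, and an incoming CREQ2 is assumed to be answered immediately. The signal names (CALL, STOP, CREQ1, CACK1, CACK2, CCNL1, CCNL2) are taken from the description.

```python
from enum import Enum, auto
from typing import Optional, Tuple


class CallState(Enum):
    IDLE = auto()
    REQUESTED = auto()   # CREQ1 has been sent, CACK2 not yet received
    CONNECTED = auto()


class TerminalCallControl:
    """Hypothetical model of the call handling in terminal controller 120."""

    def __init__(self, own_info: str) -> None:
        self.own_info = own_info              # VPIF1
        self.peer_info: Optional[str] = None  # one of VPIF2..VPIFK
        self.state = CallState.IDLE

    def on_input(self, signal: str,
                 peer_info: Optional[str] = None) -> Optional[Tuple]:
        """Handle CALL/STOP from the input unit 111.

        Returns what would be handed to the communication unit 140.
        """
        if signal == "CALL":
            self.peer_info = peer_info
            self.state = CallState.REQUESTED
            return (self.own_info, self.peer_info, "CREQ1")
        if signal == "STOP":
            out = (self.own_info, self.peer_info, "CCNL1")
            self.state = CallState.IDLE
            return out
        return None

    def on_network(self, signal: str) -> Optional[str]:
        """Handle signals delivered by the communication unit 140."""
        if signal == "CACK2" and self.state is CallState.REQUESTED:
            self.state = CallState.CONNECTED   # call recognized as connected
        elif signal == "CREQ2":
            # Assumption: the incoming call is answered right away.
            self.state = CallState.CONNECTED
            return "CACK1"
        elif signal == "CCNL2":
            self.state = CallState.IDLE        # call recognized as released
        return None
```

In this sketch, the tuple returned by `on_input` corresponds to the content that the packet generator 141 would place in a communication packet.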

The emote message generator 130 includes a voice recognition unit 131, a vocabulary checker 132, an icon storage unit 133, a display controller 134, an icon selector 135, and a message output unit 136. The voice recognition unit 131 extracts vocabulary data (VDAT1 to VDATM, where M is an integer) from the encoded audio data (EAUD) received from the terminal controller 120 and outputs the vocabulary data to the vocabulary checker 132. Set vocabulary data (SVDT1 to SVDTP), each representing some or all of a plurality of words, a plurality of phrases, and a plurality of sentences, are stored in advance in the vocabulary checker 132. The vocabulary checker 132 compares the vocabulary data (VDAT1 to VDATM) received from the voice recognition unit 131 with the set vocabulary data (SVDT1 to SVDTP) and determines whether there is at least one set vocabulary data item (at least one of SVDT1 to SVDTP) that matches at least one of the vocabulary data (VDAT1 to VDATM). When such a set vocabulary data item exists, the vocabulary checker 132 outputs that set vocabulary data (at least one of SVDT1 to SVDTP) to the message output unit 136. When no set vocabulary data item (at least one of SVDT1 to SVDTP) matches at least one of the vocabulary data (VDAT1 to VDATM), the vocabulary checker 132 outputs the check completion signal (CHEKEND) to the message output unit 136. As a result, the message output unit 136 receives the check completion signal (CHEKEND) and recognizes that none of the vocabulary data (VDAT1 to VDATM) extracted from the encoded audio data (EAUD) matches the set vocabulary data (SVDT1 to SVDTP).
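
The comparison performed by the vocabulary checker 132 can be sketched as a simple membership test. The sketch below assumes, for illustration only, that both the recognized vocabulary data and the set vocabulary data are plain strings compared by exact equality; the function name `check_vocabulary` is hypothetical, and the description does not specify how a match is decided.

```python
from typing import Iterable, List, Union

CHEKEND = "CHEKEND"  # check completion signal named in the description


def check_vocabulary(extracted: Iterable[str],
                     preset: Iterable[str]) -> Union[List[str], str]:
    """Return the set vocabulary items found among the extracted ones.

    extracted: stands in for VDAT1..VDATM from the voice recognition unit 131.
    preset:    stands in for SVDT1..SVDTP stored in the vocabulary checker 132.
    When nothing matches, the check completion signal is returned instead.
    """
    extracted_set = set(extracted)
    matches = [item for item in preset if item in extracted_set]
    return matches if matches else CHEKEND
```

For example, `check_vocabulary(["today", "i love you"], ["i love you", "good luck"])` yields the matching set vocabulary item, while a call with no overlap yields `CHEKEND`.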

Meanwhile, icon graphic data (IGDT1 to IGDTQ) respectively corresponding to emote icons (EICON1 to EICONQ) that convey the user's emotions or feelings are stored in advance in the icon storage unit 133. In response to the icon display signal (ICDP) received from the terminal controller 120, the display controller 134 reads the icon graphic data (IGDT1 to IGDTQ) from the icon storage unit 133 and outputs them to the display unit 115 of the user interface unit 110. As a result, the display unit 115 displays the emote icons (EICON1 to EICONQ) on the display screen 115a based on the icon graphic data (IGDT1 to IGDTQ). The display unit 115 may have a separate display screen that displays only the emote icons (EICON1 to EICONQ), or may display the emote icons (EICON1 to EICONQ) in a partial area of the full display screen that displays the captured image of the user. In response to the icon display stop signal (ICDPS) received from the terminal controller 120, the display controller 134 outputs a display control signal (DCTL) to the display unit 115. As a result, the display unit 115 stops displaying the emote icons (EICON1 to EICONQ). Thereafter, in response to the emote icon selection signal (EISL) received from the terminal controller 120, the icon selector 135 reads from the icon storage unit 133 at least one icon graphic data item (at least one of IGDT1 to IGDTQ) corresponding to the selected at least one emote icon (at least one of EICON1 to EICONQ) and outputs it to the message output unit 136. When the emote icon selection signal (EISL) is not received from the terminal controller 120, the icon selector 135 does not output icon graphic data (at least one of IGDT1 to IGDTQ) to the message output unit 136.

The message output unit 136 generates at least one of the emote message data (EMSG1 to EMSGJ) based on at least one vocabulary data item (at least one of VDAT1 to VDATM) received from the vocabulary checker 132, or on at least one icon graphic data item (at least one of IGDT1 to IGDTQ) received from the icon selector 135, and outputs it to the terminal controller 120. When the message output unit 136 receives the check completion signal (CHEKEND) from the vocabulary checker 132, or when no icon graphic data (at least one of IGDT1 to IGDTQ) is received from the icon selector 135, it outputs no emote message data to the terminal controller 120 and instead outputs the message absence signal (MABSS).
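
One possible reading of the message output unit 136 is sketched below in Python. It assumes that the message absence signal (MABSS) is produced only when neither the vocabulary checker 132 nor the icon selector 135 supplies data; the text does not state how the two conditions combine, so this is an interpretation rather than the described implementation. The dictionary layout of an emote message and the function name are hypothetical.

```python
from typing import Dict, List, Optional, Union

CHEKEND = "CHEKEND"  # check completion signal from the vocabulary checker 132
MABSS = "MABSS"      # message absence signal sent to the terminal controller 120


def message_output(vocab_result: Union[List[str], str],
                   icon_data: Optional[List[str]]
                   ) -> Union[List[Dict[str, str]], str]:
    """Build emote message data from vocabulary and/or icon graphic data.

    vocab_result: matched vocabulary items, or CHEKEND when nothing matched.
    icon_data:    selected icon graphic data (e.g. ["IGDT3"]), or None/empty
                  when the icon selector 135 delivered nothing.
    """
    messages: List[Dict[str, str]] = []
    if vocab_result != CHEKEND:
        messages += [{"kind": "vocabulary", "value": v} for v in vocab_result]
    if icon_data:
        messages += [{"kind": "icon", "value": i} for i in icon_data]
    # Assumption: MABSS is emitted only when no emote message could be built.
    return messages if messages else MABSS
```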

The communication unit 140 includes a packet generator 141, an IP (internet protocol) transceiver 142, and a packet analyzer 143. The packet generator 141 generates a data packet (DPCK11) based on the encoded audio data (EAUD), the encoded video data (EVID), and the emote message data (at least one of EMSG1 to EMSGJ) received from the terminal controller 120. The packet generator 141 also generates a data packet (DPCK12) based on the encoded audio data (EAUD) and the encoded video data (EVID) received from the terminal controller 120. In addition, the packet generator 141 generates a communication packet (TPCK11) based on the terminal information (VPIF1), the terminal information (one of VPIF2 to VPIFK), and the call connection request signal (CREQ1) received from the terminal controller 120. The packet generator 141 generates a communication packet (TPCK12) based on the call connection confirmation signal (CACK1) received from the terminal controller 120. The packet generator 141 also generates a communication packet (TPCK13) based on the terminal information (VPIF1), the terminal information (one of VPIF2 to VPIFK), and the call disconnection signal (CCNL1) received from the terminal controller 120. The packet generator 141 may further generate additional communication packets (not shown) based on additional signals received from the terminal controller 120.
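
The packet types produced by the packet generator 141 can be summarized in a short sketch. The dictionary representation of a packet and the function names below are assumptions made for illustration; only the packet labels (DPCK11, DPCK12, TPCK11 to TPCK13) and their inputs come from the description, and no real wire format is implied.

```python
from typing import Dict, Optional, Sequence

# Which communication packet carries which call control signal.
_SIGNAL_TO_PACKET = {"CREQ1": "TPCK11", "CACK1": "TPCK12", "CCNL1": "TPCK13"}


def make_data_packet(eaud: bytes, evid: bytes,
                     emsg: Optional[Sequence[str]] = None) -> Dict:
    """DPCK11 when emote message data is present, otherwise DPCK12."""
    if emsg:
        return {"type": "DPCK11", "EAUD": eaud, "EVID": evid,
                "EMSG": list(emsg)}
    return {"type": "DPCK12", "EAUD": eaud, "EVID": evid}


def make_comm_packet(signal: str, own_info: Optional[str] = None,
                     peer_info: Optional[str] = None) -> Dict:
    """TPCK11/TPCK13 carry both terminal infos; TPCK12 carries only CACK1."""
    packet = {"type": _SIGNAL_TO_PACKET[signal], "signal": signal}
    if signal in ("CREQ1", "CCNL1"):
        packet["terminals"] = (own_info, peer_info)
    return packet
```

Keeping the choice between DPCK11 and DPCK12 inside one function mirrors the description, in which the same encoded audio and video data are packetized with or without emote message data.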

The IP transceiver 142 transmits the data packet (DPCK11 or DPCK12) or the communication packet (one of TPCK11 to TPCK13) received from the packet generator 141 to the VoIP server 103 through the Internet communication network (NT, see FIG. 1). The IP transceiver 142 also receives a modified data packet (DPCK20 or DPCK21), a data packet (DPCK22), or a communication packet (one of TPCK21 to TPCK23) from the VoIP server 103 through the Internet communication network (NT) and outputs it to the packet analyzer 143. Here, the modified data packet (DPCK20) and the data packet (DPCK22) each contain only the encoded audio data (REAUD) and the encoded video data (REVID) from the VoIP phone (one of VP2 to VPK). The modified data packet (DPCK21) contains emote content data (at least one of ECNT1 to ECNTJ) together with the encoded audio data (REAUD) and the encoded video data (REVID) from the VoIP phone (one of VP2 to VPK). The communication packet (TPCK21) contains the terminal information (one of VPIF2 to VPIFK), the terminal information (VPIF1), and the call connection request signal (CREQ2), and the communication packet (TPCK22) contains the call connection confirmation signal (CACK2) from the VoIP phone (one of VP2 to VPK). The communication packet (TPCK23) contains the terminal information (one of VPIF2 to VPIFK), the terminal information (VPIF1), and the call disconnection signal (CCNL2).

The packet analyzer 143 analyzes the data packet (DPCK21) received from the IP transceiver 142 and outputs the encoded audio data (REAUD), the encoded video data (REVID), and the emote content data (at least one of ECNT1 to ECNTJ) to the terminal controller 120. The packet analyzer 143 also analyzes the data packet (DPCK20 or DPCK22) received from the IP transceiver 142 and outputs the encoded audio data (REAUD) and the encoded video data (REVID) to the terminal controller 120. In addition, the packet analyzer 143 analyzes the communication packet (TPCK21) received from the IP transceiver 142 and outputs the call connection request signal (CREQ2) to the terminal controller 120. The packet analyzer 143 analyzes the communication packet (TPCK22) received from the IP transceiver 142 and outputs the call connection confirmation signal (CACK2) to the terminal controller 120. The packet analyzer 143 analyzes the communication packet (TPCK23) received from the IP transceiver 142 and outputs the call disconnection signal (CCNL2) to the terminal controller 120.

The configuration and specific operation of the VoIP server 103 will now be described in more detail with reference to FIG. 3. FIG. 3 is a detailed block diagram of the VoIP server shown in FIG. 1. The VoIP server 103 includes a communication unit 150, a server controller 160, a content selection unit 170, and a terminal manager 180. The communication unit 150 includes an IP transceiver 151, a packet analyzer 152, and a packet generator 153. The IP transceiver 151 receives, through the Internet communication network (NT), a data packet (one of DPCK11, DPCK12, and DPCK22) or a communication packet (one of TPCK11 to TPCK13, or one of TPCK21 to TPCK23) from the calling VoIP phone (one of VP1 to VPK) and outputs it to the packet analyzer 152. In response to the transmission control signal (TCTL2) received from the server controller 160, the IP transceiver 151 transmits the data packet (DPCK22) unchanged to the called VoIP phone (another one of VP1 to VPK). The IP transceiver 151 transmits the modified data packet (DPCK20 or DPCK21) received from the packet generator 153 to the called VoIP phone (another one of VP1 to VPK). The packet analyzer 152 analyzes the data packet (one of DPCK11, DPCK12, and DPCK22) received from the IP transceiver 151 and outputs the encoded audio data (EAUD) and the encoded video data (EVID), or outputs the emote message data (one of EMSG1 to EMSGJ), the encoded audio data (EAUD), and the encoded video data (EVID). The packet analyzer 152 also analyzes the communication packet (one of TPCK11 to TPCK13, or one of TPCK21 to TPCK23) received from the IP transceiver 151 and outputs the terminal information (two of VPIF1 to VPIFK) and the call connection request signal (CREQ1 or CREQ2). In response to the transmission control signal (TCTL1) received from the server controller 160, the packet generator 153 generates the modified data packet (DPCK21) based on the emote content data (at least one of ECNT1 to ECNTJ), the encoded audio data (EAUD), and the encoded video data (EVID) received from the server controller 160, and outputs it to the IP transceiver 151. In response to the transmission control signal (TCTL3) received from the server controller 160, the packet generator 153 generates the modified data packet (DPCK20) based on the encoded audio data (EAUD) and the encoded video data (EVID) received from the server controller 160, and outputs it to the IP transceiver 151.

The server controller 160 outputs the terminal information (one of VPIF1 to VPIFK) received from the packet analyzer 152 to the terminal manager 180. Thereafter, the server controller 160 outputs a transmission control signal (one of TCTL1 to TCTL3) according to the approval signal (ADM) or the invalid signal (INVLD) received from the terminal manager 180 and the result of the analysis of the data packet by the packet analyzer 152. More specifically, when the server controller 160 receives the approval signal (ADM) from the terminal manager 180, it outputs the transmission control signal (TCTL1 or TCTL2) according to the result of the analysis of the data packet by the packet analyzer 152. On receiving the approval signal (ADM), the server controller 160 determines that the user of the calling VoIP phone (one of VP1 to VPK) is subscribed to the content effect service that transmits emotional images, emotional text, and emotional sounds during a call. When the server controller 160 receives the invalid signal (INVLD) from the terminal manager 180, it outputs the transmission control signal (TCTL2 or TCTL3) according to the result of the analysis of the data packet by the packet analyzer 152. On receiving the invalid signal (INVLD), the server controller 160 determines that the user of the calling VoIP phone (one of VP1 to VPK) is not subscribed to the content effect service that transmits emotional images, emotional text, and emotional sounds during a call. When the server controller 160 receives at least one emote message data item (at least one of EMSG1 to EMSGJ) from the packet analyzer 152, it outputs the content request signal (CNTREQ) and the at least one emote message data item (at least one of EMSG1 to EMSGJ) to the content selection unit 170. Thereafter, the server controller 160 outputs the at least one emote content data item (at least one of ECNT1 to ECNTJ) received from the content selection unit 170 to the packet generator 153. The terminal manager 180 determines whether the terminal information (one of VPIF1 to VPIFK) of the VoIP phone received from the server controller 160 is stored in the terminal DB 102, and outputs the approval signal (ADM) or the invalid signal (INVLD) according to the result of that determination.
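
The choice of transmission control signal can be illustrated with the following Python sketch. The text states only that ADM leads to TCTL1 or TCTL2 and that INVLD leads to TCTL2 or TCTL3, depending on the packet analysis. The specific mapping used here (TCTL1 when a subscribed terminal sends emote message data, TCTL3 when an unsubscribed terminal sends it, TCTL2 when there is no emote message data) is an inference from how the packet generator 153 and the IP transceiver 151 respond to each signal, and should be treated as an assumption. The function names and the use of a Python set for the terminal DB 102 are hypothetical.

```python
from typing import Set


def terminal_manager(terminal_info: str, terminal_db: Set[str]) -> str:
    """Return ADM when the terminal information is registered, else INVLD."""
    return "ADM" if terminal_info in terminal_db else "INVLD"


def select_transmission_control(admission: str,
                                has_emote_message: bool) -> str:
    """Pick the transmission control signal for one received data packet.

    Assumed mapping (inferred, not stated verbatim in the description):
      no emote message data       -> TCTL2 (forward the packet unchanged)
      ADM   + emote message data  -> TCTL1 (rebuild with emote content)
      INVLD + emote message data  -> TCTL3 (rebuild with audio/video only)
    """
    if not has_emote_message:
        return "TCTL2"
    return "TCTL1" if admission == "ADM" else "TCTL3"
```

Under this reading, TCTL3 is what prevents an unsubscribed terminal's emote message data from reaching the called VoIP phone.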

Next, the operation of the VoIP telephone communication system 100 will be described in more detail with reference to FIG. 6. FIG. 6 is a flowchart showing the operation of the VoIP telephone communication system shown in FIG. 1. For convenience of description, this embodiment focuses on telephone communication between the VoIP phones (VP1, VPK). First, the VoIP server 103 determines whether there is a call connection request from either of the VoIP phones (VP1, VPK) (step 1001). When it receives the communication packet (TPCK11) from the VoIP phone VP1 or the communication packet (TPCK21) from the VoIP phone VPK, the VoIP server 103 determines that there is a call connection request. For example, when the VoIP server 103 receives the communication packet (TPCK11) containing the terminal information (VPIF1, VPIFK) and the call connection request signal (CREQ1), it transmits the communication packet (TPCK11) to the VoIP phone VPK together with a ring signal (not shown), and when the user of the VoIP phone VPK answers the call, it connects the call between the VoIP phones (VP1, VPK) (step 1002). With the call between the VoIP phones (VP1, VPK) connected, each of the VoIP phones (VP1, VPK) generates a data packet (step 1003) to provide voice and video calls to each user, and transmits it to the VoIP server 103 through the Internet communication network (NT). This embodiment focuses on the process by which the data packet (DPCK11 or DPCK12) generated by the VoIP phone VP1 (that is, the calling VoIP phone) is transmitted to the VoIP phone VPK (that is, the called VoIP phone). The VoIP server 103 receives the data packet (DPCK11 or DPCK12) from the VoIP phone VP1 (step 1004). Here, the data packet (DPCK11) contains the emote message data (at least one of EMSG1 to EMSGJ), the encoded audio data (EAUD), and the encoded video data (EVID), and the data packet (DPCK12) contains the encoded audio data (EAUD) and the encoded video data (EVID). The emote message data (at least one of EMSG1 to EMSGJ) contains at least one set vocabulary data item (at least one of SVDT1 to SVDTP) or at least one icon graphic data item (at least one of IGDT1 to IGDTQ).

Meanwhile, when the call between the VoIP phones (VP1, VPK) is connected, the VoIP server 103 receives the terminal information (VPIF1, VPIFK) (that is, including that of the calling VoIP phone transmitting the data packet) and determines whether the terminal information (VPIF1, VPIFK) is stored in the terminal DB 102 (step 1005). More specifically, the server controller 160 of the VoIP server 103 outputs the terminal information (VPIF1, VPIFK) received from the packet analyzer 152 of the VoIP server 103 to the terminal manager 180 of the VoIP server 103. The terminal manager 180 determines whether the terminal information (VPIF1, VPIFK) is stored in the terminal DB 102. When the terminal information (VPIF1, VPIFK) is stored in the terminal DB 102, the terminal manager 180 outputs the approval signal (ADM) to the server controller 160, and when it is not stored in the terminal DB 102, the terminal manager 180 outputs the invalid signal (INVLD) to the server controller 160. On receiving the approval signal (ADM) from the terminal manager 180, the server controller 160 determines that the terminal information (VPIF1, VPIFK) is stored in the terminal DB 102, and on receiving the invalid signal (INVLD) from the terminal manager 180, it determines that the terminal information (VPIF1, VPIFK) is not stored in the terminal DB 102. When the terminal information (VPIF1) is found in step 1005 to be stored in the terminal DB 102, the VoIP server 103 determines whether the data packet (DPCK11 or DPCK12) contains emote message data (at least one of EMSG1 to EMSGJ) (step 1006). When, in step 1006, the data packet (DPCK11 or DPCK12) contains emote message data (at least one of EMSG1 to EMSGJ) (that is, when the VoIP server 103 has received the data packet (DPCK11)), the VoIP server 103 reads from the content DB 101 the emote content data (at least one of ECNT1 to ECNTJ) corresponding to the emote message data (at least one of EMSG1 to EMSGJ) (step 1007). More specifically, when the server controller 160 of the VoIP server 103 receives the emote message data (at least one of EMSG1 to EMSGJ) from the packet analyzer 152, it outputs the content request signal (CNTREQ) and the emote message data (at least one of EMSG1 to EMSGJ) to the content selection unit 170. In response to the content request signal (CNTREQ), the content selection unit 170 reads from the content DB 101 the emote content data (at least one of ECNT1 to ECNTJ) corresponding to the emote message data (at least one of EMSG1 to EMSGJ) and outputs it to the server controller 160.

Here, the operation of the content selection unit 170 is described on the assumption that the emote message data (at least one of EMSG1 to EMSGJ) contains at least one set vocabulary data item (at least one of SVDT1 to SVDTP) and that the emote content data (at least one of ECNT1 to ECNTJ) represents an emotional image. For example, when the emote message data (at least one of EMSG1 to EMSGJ) contains set vocabulary data (one of SVDT1 to SVDTP) representing the phrase "I love you", the content selection unit 170 reads from the content DB 101 the emote content data (one of ECNT1 to ECNTJ) representing the heart-shaped emotional image shown in FIG. 5a or FIG. 5b. When the emote message data contains set vocabulary data (one of SVDT1 to SVDTP) representing the phrase "Happy birthday", the content selection unit 170 reads from the content DB 101 the emote content data (one of ECNT1 to ECNTJ) representing the cake-shaped emotional image shown in FIG. 5c. When the emote message data contains set vocabulary data (one of SVDT1 to SVDTP) representing the word "good luck", the content selection unit 170 reads from the content DB 101 the emote content data (one of ECNT1 to ECNTJ) representing the four-leaf-clover-shaped emotional image shown in FIG. 5d. When the emote message data contains set vocabulary data (one of SVDT1 to SVDTP) representing the phrase "Congratulations", the content selection unit 170 reads from the content DB 101 the emote content data (one of ECNT1 to ECNTJ) representing the bouquet-shaped emotional image shown in FIG. 5e. When the emote message data contains set vocabulary data (one of SVDT1 to SVDTP) representing the word "hello", the content selection unit 170 reads from the content DB 101 the emote content data (one of ECNT1 to ECNTJ) representing the hand-shaped emotional image shown in FIG. 5f.

Next, the operation of the content selection unit 170 is described on the assumption that the emote message data (at least one of EMSG1 to EMSGJ) includes at least one icon graphic data (at least one of IGDT1 to IGDTQ) and that the emote content data (at least one of ECNT1 to ECNTJ) represents an emotional image. For example, when the emote message data (at least one of EMSG1 to EMSGJ) includes the icon graphic data (IGDT1 or IGDT2) corresponding to the emote icon (EICON1 or EICON2) shown in FIG. 4, the content selection unit 170 reads, from the content DB 101, the emote content data (ECNT1 or ECNT2) representing the heart-shaped emotional image shown in FIG. 5A or FIG. 5B. When the emote message data includes the icon graphic data IGDT3 corresponding to the emote icon EICON3 shown in FIG. 4, the content selection unit 170 reads the emote content data ECNT3 representing the cake-shaped emotional image shown in FIG. 5C. When the emote message data includes the icon graphic data IGDT4 corresponding to the emote icon EICON4, the content selection unit 170 reads the emote content data ECNT4 representing the four-leaf-clover-shaped emotional image shown in FIG. 5D. When the emote message data includes the icon graphic data IGDT5 corresponding to the emote icon EICON5, the content selection unit 170 reads the emote content data ECNT5 representing the bouquet-shaped emotional image shown in FIG. 5E. Finally, when the emote message data includes the icon graphic data IGDT6 corresponding to the emote icon EICON6, the content selection unit 170 reads the emote content data ECNT6 representing the hand-shaped emotional image shown in FIG. 5F.
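As a non-limiting sketch, the icon-based selection can be expressed in Python as a mapping from icon graphic data to emote content data. The one-to-one pairing of IGDTn with ECNTn for n = 1 to 6 is an assumption made for the example (the text pairs IGDT1 or IGDT2 with ECNT1 or ECNT2 without fixing which goes with which), and the function name is hypothetical.

```python
# Assumed one-to-one pairing IGDTn -> ECNTn for the six icons of FIG. 4.
ICON_TO_CONTENT = {f"IGDT{n}": f"ECNT{n}" for n in range(1, 7)}


def select_content_by_icon(icon_ids):
    """Return the emote content identifiers for the icon graphic data in an emote message."""
    return [ICON_TO_CONTENT[i] for i in icon_ids if i in ICON_TO_CONTENT]
```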

Thereafter, the VoIP server 103 generates a modified data packet DPCK21 including the read emote content data (at least one of ECNT1 to ECNTJ), the encoded audio data EAUD, and the encoded video data EVID (step 1008). In more detail, the server control unit 160 outputs, to the packet generation unit 153, the encoded audio data EAUD and the encoded video data EVID received from the packet analysis unit 152, the emote content data (at least one of ECNT1 to ECNTJ) received from the content selection unit 170, and a transmission control signal TCTL1. In response to the transmission control signal TCTL1, the packet generation unit 153 generates the modified data packet DPCK21 based on the emote content data, the encoded audio data EAUD, and the encoded video data EVID, and outputs it to the IP transceiver 151 of the VoIP server 103. The IP transceiver 151 then transmits the modified data packet DPCK21 to the VoIP phone VPK via the Internet communication network NT (step 1009). As a result, the VoIP phone VPK receives the modified data packet DPCK21, outputs the voice of the user of the VoIP phone VP1 based on the encoded audio data EAUD, displays a captured image of the user of the VoIP phone VP1 based on the encoded video data EVID, and at the same time outputs at least one of an emotional image, an emotional text, and an emotional sound based on the emote content data (at least one of ECNT1 to ECNTJ).

On the other hand, when the data packet (DPCK11 or DPCK12) is found in step 1006 not to include emote message data (at least one of EMSG1 to EMSGJ), that is, when the VoIP server 103 has received the data packet DPCK12, the VoIP server 103 transmits the data packet DPCK12 as it is to the VoIP phone VPK (step 1010). In more detail, the server control unit 160 outputs a transmission control signal TCTL2 to the IP transceiver 151, and the IP transceiver 151 transmits the data packet DPCK12 to the VoIP phone VPK in response to the transmission control signal TCTL2. The VoIP phone VPK receives the data packet DPCK12, outputs the voice of the user of the VoIP phone VP1 based on the encoded audio data EAUD, and displays a captured image of the user of the VoIP phone VP1 based on the encoded video data EVID.
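The relay behaviour of steps 1006 to 1010 for a registered terminal can be sketched in Python as follows. This is an illustration under assumptions, not the disclosed packet format: the `Packet` class, its field names, and the dictionary standing in for the content DB 101 are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Packet:
    """Hypothetical stand-in for the data packets DPCK11, DPCK12 and DPCK21."""
    eaud: bytes                                               # encoded audio data EAUD
    evid: bytes                                               # encoded video data EVID
    emote_messages: List[str] = field(default_factory=list)   # non-empty only in DPCK11
    emote_contents: List[str] = field(default_factory=list)   # non-empty only in DPCK21


def relay_registered(packet: Packet, content_db: Dict[str, str]) -> Packet:
    """Return the packet the server would send to the called phone."""
    if not packet.emote_messages:
        return packet  # step 1010: forward DPCK12 unchanged
    # Step 1007: read the content matching each emote message from the content DB.
    contents = [content_db[m] for m in packet.emote_messages if m in content_db]
    # Step 1008: build the modified packet from content, audio and video only.
    return Packet(packet.eaud, packet.evid, emote_contents=contents)
```

Note that the modified packet in this sketch carries the content in place of the emote message, which mirrors the composition of DPCK21 given in the text.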

On the other hand, when the terminal information VPIF1 is found in step 1005 not to be stored in the terminal DB 102, the VoIP server 103 determines whether the data packet (DPCK11 or DPCK12) includes emote message data (at least one of EMSG1 to EMSGJ) (step 1011). When the data packet is found in step 1011 not to include emote message data (that is, when the VoIP server 103 has received the data packet DPCK12), the VoIP server 103 performs the operation of step 1010. When the data packet is found in step 1011 to include emote message data (that is, when the VoIP server 103 has received the data packet DPCK11), the VoIP server 103 removes the emote message data (at least one of EMSG1 to EMSGJ) from the data packet DPCK11 and generates an additional modified data packet DPCK20 (step 1012). In more detail, the server control unit 160 outputs, to the packet generation unit 153, the encoded audio data EAUD and the encoded video data EVID received from the packet analysis unit 152, together with a transmission control signal TCTL3. In response to the transmission control signal TCTL3, the packet generation unit 153 generates the additional modified data packet DPCK20 based on the encoded audio data EAUD and the encoded video data EVID, and outputs it to the IP transceiver 151. The IP transceiver 151 transmits the additional modified data packet DPCK20 received from the packet generation unit 153 to the called VoIP phone VPK via the Internet communication network NT (step 1013).
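The branching of steps 1005, 1006 and 1011 amounts to a small decision table over two conditions. A minimal Python sketch follows; the function name and the use of strings for the transmission control signals are assumptions made for illustration.

```python
def choose_transmission_control(terminal_registered: bool, has_emote_message: bool) -> str:
    """Pick the transmission control signal the server control unit would output."""
    if not has_emote_message:
        # Steps 1006 / 1011, "no" branch: forward DPCK12 unchanged (step 1010).
        return "TCTL2"
    if terminal_registered:
        # Steps 1007 to 1009: attach emote content and send DPCK21.
        return "TCTL1"
    # Steps 1012 to 1013: strip the emote message and send DPCK20.
    return "TCTL3"
```

Under this reading, an unregistered originating terminal still completes the voice and video call, but its emote messages never produce emote content at the called side.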

Thereafter, the VoIP server 103 determines whether there is a request to release the call connection between the VoIP phones VP1 and VPK (step 1014). When the VoIP server 103 receives the communication packet TPCK13 from the VoIP phone VP1 or the communication packet TPCK23 from the VoIP phone VPK, it determines that a release request has been made and releases the call connection between the VoIP phones VP1 and VPK (step 1015). When there is no release request in step 1014, the VoIP server 103 and the VoIP phones VP1 and VPK repeat the operations of steps 1003 to 1014.

Next, referring to FIG. 7, the process by which the VoIP phone VP1 (that is, the originating VoIP phone) generates the data packet (DPCK11 or DPCK12) (step 1003) is described in more detail. First, the microphone 112 of the VoIP phone VP1 converts the user's voice into audio data AUD (step 1101). The photographing unit 113 of the VoIP phone VP1 captures an image of the user and converts the captured image into video data VID (step 1102). The media processor 114 of the VoIP phone VP1 encodes the audio data AUD and the video data VID and outputs the encoded audio data EAUD and the encoded video data EVID (step 1103). The terminal control unit 120 determines whether the voice recognition function is selected according to whether a voice recognition selection signal SRSL is received from the input unit 111 (step 1104).

When the voice recognition function is selected in step 1104, the terminal control unit 120 outputs the encoded audio data EAUD to the speech recognition unit 131 of the emote message generation unit 130. The speech recognition unit 131 extracts vocabulary data VDAT1 to VDATM from the encoded audio data EAUD (step 1105). The vocabulary checking unit 132 determines whether there is at least one set vocabulary data (at least one of SVDT1 to SVDTP) that matches at least one of the vocabulary data VDAT1 to VDATM received from the speech recognition unit 131 (step 1106). When such set vocabulary data exists in step 1106, the vocabulary checking unit 132 outputs the at least one matching set vocabulary data to the message output unit 136 of the emote message generation unit 130. The message output unit 136 then generates emote message data (at least one of EMSG1 to EMSGJ) including the at least one set vocabulary data and outputs it to the terminal control unit 120 (step 1107). The terminal control unit 120 outputs the encoded audio data EAUD and the encoded video data EVID, together with the emote message data received from the message output unit 136, to the packet generation unit 141 of the communication unit 140. The packet generation unit 141 generates the data packet DPCK11 based on the emote message data, the encoded audio data EAUD, and the encoded video data EVID (step 1108).

When no set vocabulary data matching at least one of the vocabulary data VDAT1 to VDATM exists in step 1106, the VoIP phone VP1 generates the data packet DPCK12 based on the encoded audio data EAUD and the encoded video data EVID (step 1109). In more detail, the vocabulary checking unit 132 outputs a check completion signal CHEKEND to the message output unit 136. On receiving the check completion signal CHEKEND, the message output unit 136 outputs a message absence signal MABSS to the terminal control unit 120 without outputting any emote message data. In response to the message absence signal MABSS, the terminal control unit 120 outputs only the encoded audio data EAUD and the encoded video data EVID to the packet generation unit 141. As a result, the packet generation unit 141 generates the data packet DPCK12 based on the encoded audio data EAUD and the encoded video data EVID.
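For illustration only, the vocabulary check of steps 1106 to 1109 can be sketched in Python. Speech recognition itself (step 1105) is outside the sketch: the function takes already-extracted vocabulary items as input. The set contents, the dictionary keys, and the function name are hypothetical.

```python
# Hypothetical set vocabulary data (SVDT1 to SVDTP) stored by the vocabulary checking unit.
SET_VOCAB = {"I love you", "Happy birthday", "good luck", "congratulations", "hello"}


def build_data_packet(eaud: bytes, evid: bytes, recognized_vocab):
    """Build DPCK11 (with an emote message) or DPCK12 (audio and video only)."""
    # Step 1106: keep only recognized items that match the set vocabulary data.
    matches = [v for v in recognized_vocab if v in SET_VOCAB]
    packet = {"EAUD": eaud, "EVID": evid}
    if matches:
        packet["EMSG"] = matches  # steps 1107 to 1108: packet becomes DPCK11
    return packet                 # without "EMSG" the packet is DPCK12 (step 1109)
```

In this sketch the absence of the `EMSG` key plays the role of the message absence signal MABSS.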

On the other hand, when the voice recognition function is not selected in step 1104, the terminal control unit 120 determines whether the emote icon selection function is selected according to whether an icon recognition selection signal IRSL is received from the input unit 111 (step 1110). When the emote icon selection function is not selected in step 1110 (that is, when neither the voice recognition function nor the emote icon selection function is selected), the VoIP phone VP1 performs the operation of step 1109. When the emote icon selection function is selected in step 1110, the VoIP phone VP1 displays the emote icons EICON1 to EICONQ on the display screen (step 1111). In more detail, the terminal control unit 120 outputs an icon display signal ICDP to the display control unit 134 of the emote message generation unit 130 in response to the icon recognition selection signal IRSL. In response to the icon display signal ICDP, the display control unit 134 reads the icon graphic data IGDT1 to IGDTQ from the icon storage unit 133 and outputs them to the display unit 115 of the user interface unit 110. The display unit 115 displays the emote icons EICON1 to EICONQ based on the icon graphic data IGDT1 to IGDTQ.

The terminal control unit 120 determines whether at least one emote icon has been selected by the user according to whether an emote icon selection signal EISL is received from the input unit 111 (step 1112). When no emote icon has been selected by the user in step 1112, the VoIP phone VP1 performs the operation of step 1109. When at least one emote icon has been selected by the user in step 1112, the emote message generation unit 130 generates emote message data (at least one of EMSG1 to EMSGJ) including at least one icon graphic data (at least one of IGDT1 to IGDTQ) corresponding to the selected emote icon or icons (step 1113). Thereafter, the VoIP phone VP1 performs the operation of step 1108.
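The choice between the voice recognition path and the icon selection path (steps 1104 and 1110 to 1113) can be summarized in a short Python sketch. The function and parameter names are hypothetical, and the two boolean flags stand in for the selection signals SRSL and IRSL.

```python
def generate_emote_message(voice_mode, icon_mode, matched_vocab, selected_icons):
    """Return the emote message payload, or None when the packet should be DPCK12."""
    if voice_mode:
        # Step 1104 "yes": only the vocabulary path is considered (steps 1105 to 1107).
        return list(matched_vocab) or None
    if icon_mode and selected_icons:
        # Steps 1110 to 1113: the message carries the selected icon graphic data.
        return list(selected_icons)
    # Neither function selected, or no icon chosen: fall through to step 1109.
    return None
```

As in the flow of FIG. 7, the icon path is consulted only when the voice recognition function is not selected.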

As described above, according to the VoIP telephone communication system 100 and its method, the originating VoIP phone generates emote message data by recognizing the user's voice or the emote icon selected by the user, and the VoIP server provides the emote content data corresponding to the emote message data to the called VoIP phone. The called VoIP phone can therefore output at least one of an emotional image, an emotional text, and an emotional sound together with the user's voice and image, so that the user can express his or her emotions more richly to the other party during a voice and video call. In addition, the content DB 101 stores the relatively large emote content data, and each time the VoIP server receives the relatively small emote message data from the originating VoIP phone, it reads the corresponding emote content data from the content DB 101 and provides it to the called VoIP phone. Each VoIP phone therefore does not need to store the emote content data itself, and the memory capacity required of the VoIP phone can be reduced.

The above embodiments are intended to illustrate the present invention; the present invention is not limited to these embodiments, and various embodiments are possible within its scope. Equivalent means, although not described, are also to be regarded as incorporated in the present invention. Accordingly, the true scope of protection of the present invention is to be determined by the claims below.

FIG. 1 is a schematic block diagram of a VoIP telephone communication system according to an embodiment of the present invention.

FIG. 2 is a detailed block diagram of the VoIP phone shown in FIG. 1.

FIG. 3 is a detailed block diagram of the VoIP server shown in FIG. 1.

FIG. 4 is a diagram showing an example of emote icons displayed on the display unit shown in FIG. 2.

FIGS. 5A to 5F are diagrams showing examples of emotional images displayed on the display unit shown in FIG. 2.

FIG. 6 is a flowchart showing the operation of the VoIP telephone communication system shown in FIG. 1.

FIG. 7 is a detailed flowchart showing the process by which the originating VoIP phone shown in FIG. 6 generates a data packet.

<Explanation of reference numerals for main parts of the drawings>

100: VoIP telephone communication system; VP1 to VPK: VoIP phones

101: content DB; 102: terminal DB

103: VoIP server; 104: management DB

110: user interface unit; 120: terminal control unit

130: emote message generation unit; 131: speech recognition unit

132: vocabulary checking unit; 133: icon storage unit

134: display control unit; 135: icon selection unit

136: message output unit; 140, 150: communication units

160: server control unit; 170: content selection unit

180: terminal management unit

Claims

A VoIP telephone communication system comprising:

a plurality of VoIP phones each having a function of generating emote message data including at least one of set vocabulary data or at least one of icon graphic data, and a voice and video call function;

a VoIP server that provides communication between the plurality of VoIP phones via an Internet communication network and relays transmission of data packets between the plurality of VoIP phones; and

a content DB that stores emote content data, each corresponding to the emote message data and representing at least one of an emotional image, an emotional text, and an emotional sound, and that provides the emote content data to the VoIP server at the request of the VoIP server,

wherein the data packet includes encoded audio data corresponding to the voice of a user and encoded video data corresponding to an image of the user, or includes at least one of the emote message data, the encoded audio data, and the encoded video data,

wherein each of the plurality of VoIP phones recognizes the voice of the user while providing the user with the voice and video call function with a call counterpart, and generates at least one of the emote message data each time the voice of the user corresponds to at least one of the set vocabulary data or each time the user selects one of the emote icons displayed by that VoIP phone based on the icon graphic data, and

wherein, whenever a data packet from an originating VoIP phone among the plurality of VoIP phones includes at least one emote message data, the VoIP server reads at least one emote content data corresponding to the at least one emote message data from the content DB, generates a modified data packet including the at least one emote content data, the encoded audio data, and the encoded video data, and transmits the modified data packet to a called VoIP phone among the plurality of VoIP phones.

The system of claim 1,

wherein the VoIP server transmits the data packet as it is to the called VoIP phone when the data packet from the originating VoIP phone includes only the encoded audio data and the encoded video data.

The system of claim 1, wherein the VoIP server comprises:

a communication unit that interprets the data packet received from the originating VoIP phone and outputs the encoded audio data and the encoded video data, or outputs the at least one emote message data, the encoded audio data, and the encoded video data, and that generates the modified data packet in response to a first transmission control signal and transmits it to the called VoIP phone, or transmits the data packet as it is to the called VoIP phone in response to a second transmission control signal;

a content selection unit that reads the at least one emote content data corresponding to the at least one emote message data from the content DB in response to a content request signal; and

a server control unit that outputs the first or second transmission control signal according to a result of the interpretation of the data packet by the communication unit, that outputs the content request signal and the at least one emote message data to the content selection unit when it receives the at least one emote message data from the communication unit, and that outputs the at least one emote content data received from the content selection unit to the communication unit.

The system of claim 3, further comprising:

a management DB for storing a control program related to the operation of the server control unit; and

a terminal DB for storing terminal information for each of the plurality of VoIP phones,

wherein the communication unit further interprets a communication packet received from the originating VoIP phone and outputs terminal information of the originating and called VoIP phones and a call connection request signal.

The system of claim 4,

wherein the VoIP server further comprises a terminal management unit that determines whether the terminal information of the originating VoIP phone is stored in the terminal DB and outputs an approval signal or an invalid signal according to the determination result,

wherein the server control unit outputs the terminal information of the originating VoIP phone received from the communication unit to the terminal management unit, and outputs the first or second transmission control signal according to the approval signal and the result of the interpretation of the data packet by the communication unit, or outputs the second transmission control signal or a third transmission control signal according to the invalid signal and the result of the interpretation of the data packet by the communication unit, and

wherein the communication unit, in response to the third transmission control signal, further generates an additional modified data packet including the encoded audio data and the encoded video data and transmits it to the called VoIP phone.

The system of claim 2, wherein each of the plurality of VoIP phones comprises:

a user interface unit that outputs one of a voice recognition selection signal, an icon recognition selection signal, and an emote icon selection signal according to the user's input, converts the voice of the user into the encoded audio data and outputs it, converts a captured image of the user into the encoded video data and outputs it, and visually displays, based on the icon graphic data, the emote icons that suggest the emotion or feeling of the user;

an emote message generation unit that recognizes the voice of the user based on the encoded audio data and, when the voice of the user includes content corresponding to at least one of the set vocabulary data, generates at least one emote message data including the at least one of the set vocabulary data, or that outputs the icon graphic data to the user interface unit in response to an icon display signal and, in response to the emote icon selection signal, generates at least one emote message data including at least one of the icon graphic data corresponding to the selected emote icon;

a communication unit that generates the data packet including the encoded audio data and the encoded video data, or including the at least one emote message data, the encoded audio data, and the encoded video data, and transmits the data packet to the VoIP server through the Internet communication network; and

a terminal control unit that, in response to the voice recognition selection signal, outputs the encoded audio data received from the user interface unit to the emote message generation unit, that, in response to the icon recognition selection signal, outputs the icon display signal and outputs the emote icon selection signal received from the user interface unit to the emote message generation unit, and that outputs to the communication unit either the encoded audio data and the encoded video data, or the at least one emote message data received from the emote message generation unit together with the encoded audio data and the encoded video data.

The system of claim 6, wherein the emote message generation unit comprises:

a speech recognition unit that extracts vocabulary data from the encoded audio data received from the terminal control unit and outputs the vocabulary data;

a vocabulary checking unit that stores in advance the set vocabulary data, each representing some or all of a plurality of words, a plurality of phrases, and a plurality of sentences, compares the vocabulary data received from the speech recognition unit with the set vocabulary data, and outputs at least one set vocabulary data that matches at least one of the vocabulary data;

an icon storage unit that stores the icon graphic data representing each of the emote icons;

a display control unit that reads the icon graphic data from the icon storage unit and outputs the icon graphic data in response to the icon display signal;

an icon selection unit that, in response to the emote icon selection signal received from the terminal control unit, reads from the icon storage unit and outputs at least one icon graphic data representing at least one emote icon selected by the user; and

a message output unit that generates the at least one emote message data based on the at least one set vocabulary data received from the vocabulary checking unit or the at least one icon graphic data received from the icon selection unit, and outputs the generated emote message data to the terminal control unit.

The system of claim 6,

wherein the communication unit interprets the data packet or the modified data packet received from the VoIP server through the Internet communication network, and outputs the encoded audio data and encoded video data of the originating VoIP phone, or outputs the encoded audio data and encoded video data of the originating VoIP phone together with the at least one emote content data,

wherein the terminal control unit further outputs to the user interface unit the encoded audio data and encoded video data of the originating VoIP phone, or the encoded audio data and encoded video data of the originating VoIP phone together with the at least one emote content data, as received from the communication unit, and

wherein the user interface unit outputs the voice of the user of the originating VoIP phone based on the encoded audio data of the originating VoIP phone, displays an image of the user of the originating VoIP phone based on the encoded video data of the originating VoIP phone, and outputs at least one of the emotional image, the emotional text, and the emotional sound based on the at least one emote content data.

The system of claim 1,

Wherein the VoIP server and the plurality of VoIP phones communicate with each other using the session initiation protocol (SIP).

Connecting, by the VoIP server, a call between at least two VoIP phones;

Generating, by an originating VoIP phone of the at least two VoIP phones, a data packet comprising emote message data, encoded audio data, and encoded video data, or comprising encoded audio data and encoded video data;

When the data packet received from the originating VoIP phone includes the emote message data, extracting, by the VoIP server, emote content data corresponding to the emote message data, and generating a change data packet comprising the emote content data, the encoded audio data, and the encoded video data;

Sending, by the VoIP server, the change data packet to a called VoIP phone of the at least two VoIP phones;

When the data packet received from the originating VoIP phone does not include the emote message data, transmitting, by the VoIP server, the data packet as it is to the called party VoIP phone; And

Repeating the steps from generating the data packet through transmitting the change data packet or the data packet to the called VoIP phone until the call connection between the at least two VoIP phones is released;

Wherein the emote message data includes at least one of icon graphic data, each representing an emote icon that implies an emotion or feeling of a user, or at least one of set vocabulary data, and the emote content data represents at least one of an emotional image, an emotional text, and an emotional sound that expresses the emotion or feeling of the user.
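The server-side branch of the method above (forward as is, or substitute emote content data) can be sketched as follows. This is a minimal sketch under assumptions: `DataPacket`, `CONTENTS_DB`, and `relay` are hypothetical names, the emote message data is reduced to a single string key, and the behavior when no matching content exists (here, an empty content field) is not stated in the claims.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical contents database: emote message key -> emote content data.
CONTENTS_DB = {"congratulations": "fireworks.gif"}


@dataclass(frozen=True)
class DataPacket:
    """Hypothetical packet layout; the claims define no wire format."""
    audio: bytes
    video: bytes
    emote_message: Optional[str] = None   # set by the originating phone
    emote_content: Optional[str] = None   # set by the server


def relay(packet, contents_db=CONTENTS_DB):
    """Return the packet to send to the called VoIP phone."""
    if packet.emote_message is None:
        # No emote message data: transmit the data packet as it is.
        return packet
    # Emote message data present: look up the corresponding emote content
    # and build a change data packet carrying content, audio, and video.
    content = contents_db.get(packet.emote_message)  # None if unknown (assumption)
    return DataPacket(packet.audio, packet.video, emote_content=content)
```

A server loop would call `relay` on each received packet until the call is released; the call setup and release themselves are outside this sketch.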

The method of claim 10, further comprising:

Receiving, by the VoIP server, terminal information of the at least two VoIP phones when the call is connected between the at least two VoIP phones;

Generating, by the VoIP server, an additional change data packet by removing the emote message data from the data packet when the terminal information of the originating VoIP phone is not stored in the terminal DB and the data packet includes the emote message data;

Sending, by the VoIP server, the additional change data packet to the called VoIP phone; and

Transmitting, by the VoIP server, the data packet as it is to the called VoIP phone when the terminal information of the originating VoIP phone is not stored in the terminal DB and the data packet does not include the emote message data,

Wherein the generating of the change data packet and the sending of the change data packet to the called VoIP phone are executed only when the terminal information of the originating VoIP phone is stored in the terminal DB.
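The terminal DB check in this claim adds one more branch to the server's decision. A minimal sketch, assuming packets are plain dictionaries and the terminal DB is a set of terminal identifiers; `route`, the dictionary keys, and the identifiers are hypothetical and chosen only for illustration.

```python
def route(packet, origin_terminal_id, terminal_db, contents_db):
    """Decide what the server sends to the called VoIP phone.

    packet: dict with "audio", "video", and optionally "emote_message".
    terminal_db: set of terminal IDs whose terminal information is stored.
    contents_db: dict mapping emote message keys to emote content data.
    """
    emote = packet.get("emote_message")
    if emote is None:
        # No emote message data: transmit as it is, registered or not.
        return packet

    # Copy the packet without the emote message data.
    stripped = {k: v for k, v in packet.items() if k != "emote_message"}

    if origin_terminal_id not in terminal_db:
        # Unregistered originator: additional change data packet
        # (emote message data removed, nothing substituted).
        return stripped

    # Registered originator: change data packet with emote content data.
    return {**stripped, "emote_content": contents_db.get(emote)}
```

Under these assumptions, only a registered originating phone can trigger the emote content lookup; an unregistered phone still gets an ordinary voice/video call.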

The method of claim 10, wherein the generating of the data packet comprises:

Converting the user's voice into audio data;

Converting the photographed image of the user into video data;

Encoding the audio data and the video data, respectively;

Extracting lexical data from the encoded audio data when a speech recognition function is selected;

Generating the emote message data including the at least one set vocabulary data when there is at least one set vocabulary data corresponding to at least one of the vocabulary data among the set vocabulary data;

When the emote icon selection function is selected, displaying the emote icons on a display screen based on the icon graphic data;

Generating the emote message data comprising at least one icon graphic data corresponding to the at least one emote icon selected by the user;

Generating the data packet based on the emote message data, the encoded audio data, and the encoded video data; And

Generating the data packet based on the encoded audio data and the encoded video data when no set vocabulary data corresponds to any of the vocabulary data, when no emote icon is selected by the user, or when neither the speech recognition function nor the emote icon selection function is selected.
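The packet generation steps on the originating VoIP phone can be sketched as below. This is an illustrative sketch only: `generate_data_packet` and its parameters are hypothetical, audio/video capture and encoding are assumed to have already happened, and speech recognition is assumed to have produced a list of recognized words or phrases.

```python
def generate_data_packet(encoded_audio, encoded_video, *,
                         speech_recognition_on=False,
                         icon_selection_on=False,
                         recognized_vocabulary=(),
                         selected_icons=(),
                         set_vocabulary=frozenset()):
    """Build the data packet sent by the originating VoIP phone."""
    emote_message = {}

    # Speech recognition path: keep only vocabulary found in the set vocabulary.
    if speech_recognition_on:
        matches = [w for w in recognized_vocabulary if w in set_vocabulary]
        if matches:
            emote_message["set_vocabulary"] = matches

    # Emote icon path: keep the icons the user selected, if any.
    if icon_selection_on and selected_icons:
        emote_message["icons"] = list(selected_icons)

    packet = {"audio": encoded_audio, "video": encoded_video}
    if emote_message:
        # Emote message data exists: include it alongside audio and video.
        packet["emote_message"] = emote_message
    # Otherwise the packet carries only the encoded audio and video data.
    return packet
```

When neither function is selected, or when nothing matches and no icon is chosen, the function falls through to the audio/video-only packet, mirroring the final step of the claim.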