KR101326651B1

KR101326651B1 - Apparatus and method for image communication inserting emoticon

Info

Publication number: KR101326651B1
Application number: KR1020060130335A
Authority: KR
Inventors: 강래훈
Original assignee: LG Electronics Inc. (엘지전자 주식회사)
Priority date: 2006-12-19
Filing date: 2006-12-19
Publication date: 2013-11-08
Also published as: KR20080057030A

Abstract

The present invention relates to a video call apparatus and method. The apparatus receives voice and/or video as input, uses voice recognition and facial recognition to recognize the emotional state expressed through speech and facial expression, searches for the emoticon corresponding to the recognized emotional state, inserts that emoticon into the input image to form a composite image, and transmits the composite image to the video call terminal on the receiving side. Because the emoticon corresponding to the user's emotional state is composited with the input image and transmitted without any operation by the user, the invention can satisfy the user's desire for visual effects.

Video call, Emoticon, Voice recognition, Facial recognition

Description

Video call device and method using emoticon {APPARATUS AND METHOD FOR IMAGE COMMUNICATION INSERTING EMOTICON}

FIG. 1 is a block diagram of a video call apparatus using emoticons according to the present invention.

FIG. 2 is an example of an emoticon management screen of the video call apparatus using emoticons according to the present invention.

FIG. 3 is a flowchart illustrating a video call method using emoticons according to the present invention.

FIGS. 4 and 5 are examples of video call screens using emoticons according to the present invention.

<Description of reference numerals for the main parts of the drawings>

100: microphone 110: camera

120: detection unit 122: voice recognition unit

124: facial recognition unit 130: processing unit

140: storage unit 150: synthesis unit

160: communication unit 170: display unit

The present invention relates to a video call apparatus, and more particularly to a video call apparatus and method using emoticons, in which the user's emotional state is recognized during a video call and the corresponding emoticon is inserted into the image and transmitted.

In general, a video call apparatus such as a mobile communication terminal performs a video call by transmitting and receiving the user's video and audio acquired through a camera and a microphone. Because such an apparatus transmits the image acquired through the camera as it is, it has not satisfied users who value visual effects.

To address this problem, conventional apparatuses allow various emoticons to be inserted into the image acquired through the camera before transmission, thereby satisfying users' visual desires. That is, when the user wants to insert an emoticon during a video call, the user selects the desired emoticon through key operations, uses the direction keys to decide where the selected emoticon should go, and places the emoticon at that position. Once the emoticon has been placed, the video call apparatus inserts it into the image received through the camera and transmits the result.

However, in the prior art described above, the user has the inconvenience of selecting the desired emoticon through key operations and placing it at the desired position every time an emoticon is inserted.

Accordingly, an object of the present invention is to provide a video call apparatus and method using emoticons that recognizes the user's emotional state and inserts and transmits the corresponding emoticon.

To achieve the above object, the present invention includes: a detection unit that detects the emotional state of the speaker; and a processing unit that inserts an emoticon corresponding to the detected emotional state into an image acquired through a camera and transmits the image.

Preferably, the detection unit comprises at least one of a voice recognition unit that extracts language information from an externally input voice signal and a facial recognition unit that extracts facial information from the input image.

Preferably, the facial information includes feature information on the face region and on local regions such as the eyes, nose, and mouth, and expression information generated from that feature information.

Preferably, the apparatus further includes a storage unit that stores the emoticon, the language information and expression information corresponding to the emoticon, and position information indicating where the emoticon is to be inserted.

Preferably, the apparatus further includes: a synthesis unit that inserts the corresponding emoticon into the image based on the facial information and the language information; and a communication unit for transmitting the image composited by the synthesis unit.

According to another aspect of the present invention, a video call method according to the present invention includes: detecting the emotional state of the user; and inserting the emoticon corresponding to the emotional state into an image acquired through a camera and transmitting the image.

Preferably, the emotional state detection step is performed by voice recognition, which extracts language information indicating the emotional state from the input voice signal.

Preferably, the emotional state detection step is performed by facial recognition, which extracts facial information from the image.

Preferably, the emotional state detection step includes: extracting facial feature information from the image; and generating expression information based on the extracted feature information.

Preferably, the emotional state detection step is performed by both voice recognition, which extracts language information from the input voice signal, and facial recognition, which extracts facial information from the image.

Preferably, the facial information includes feature information on the face region and on local regions such as the eyes, nose, and mouth, and expression information based on that feature information.

Preferably, the step of inserting and transmitting the emoticon includes: searching for an emoticon corresponding to the emotional state; and placing the retrieved emoticon on the input image and compositing the two.

Hereinafter, preferred embodiments of the present invention are described in detail with reference to the accompanying drawings.

FIG. 1 is a block diagram of a video call apparatus using emoticons according to the present invention.

As shown in FIG. 1, the video call apparatus according to the present invention includes a microphone 100, a camera 110, a detection unit 120, and a processing unit 130.

The microphone 100 is an input device for receiving a voice signal from the outside, and the camera 110 is a means for receiving an image from the outside.

The detection unit 120 is implemented using at least one of a voice recognition unit 122, which extracts language information from the voice signal delivered from the microphone 100, and a facial recognition unit 124, which extracts facial information from the image acquired through the camera 110. Here, the facial information includes feature information on the face region and on local regions such as the eyes, nose, and mouth, together with expression information recognized from that feature information. The detection unit 120 uses the voice recognition unit 122 and the facial recognition unit 124 to detect the user's emotional state as expressed through speech and facial expression.

The voice recognition unit 122 extracts language information indicating an emotional state from the voice signal. Language information extraction consists of a feature extraction process, which removes noise from the voice signal and extracts feature parameters, and a recognition process, which recognizes words and sentences using those feature parameters. For example, when the user says "happiness", the voice recognition unit 122 measures the similarity between the pattern (feature parameters) of the input voice signal and the reference patterns stored in a database, and outputs the recognition result as the text data "happiness". Models such as the HMM (Hidden Markov Model), DTW (Dynamic Time Warping), and neural networks are used to measure the similarity.
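The similarity measurement between an input pattern and stored reference patterns can be sketched with DTW, one of the models named above. This is a minimal illustration only: the function names `dtw_distance` and `recognize`, the one-dimensional feature sequences, and the sample values are all assumptions, not details taken from the patent.

```python
from math import inf


def dtw_distance(a, b):
    """Dynamic time warping distance between two 1-D feature sequences."""
    n, m = len(a), len(b)
    # cost[i][j] = minimal accumulated cost of aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j],      # a[i-1] repeats against b
                                 cost[i][j - 1],      # b[j-1] repeats against a
                                 cost[i - 1][j - 1])  # both sequences advance
    return cost[n][m]


def recognize(input_pattern, reference_patterns):
    """Return the label whose reference pattern is closest to the input."""
    return min(reference_patterns,
               key=lambda label: dtw_distance(input_pattern,
                                              reference_patterns[label]))


# Hypothetical reference patterns keyed by the text they stand for.
references = {"happiness": [1, 2, 3], "tears": [5, 5, 5]}
print(recognize([1, 2, 3, 3], references))
```

DTW tolerates differences in speaking rate because one sample may align with several samples of the other sequence; a real recognizer would compare multi-dimensional feature vectors rather than single numbers.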

The facial recognition unit 124 includes a preprocessing unit (not shown), which performs image processing and segmentation to remove noise from the acquired image, and a detection unit (not shown), which detects feature information on the face region and on local regions such as the eyes, nose, and mouth from the image. Based on the detected result, the facial recognition unit 124 outputs expression information representing the user's emotional state as text data.

The processing unit 130 searches for and outputs the emoticon corresponding to the emotional state detected by the detection unit 120, that is, to the language information and expression information, and has the retrieved emoticon composited with the image and transmitted. The processing unit 130 also controls the operation of each component and thereby carries out the overall operation of the video call apparatus.

The video call apparatus using emoticons according to the present invention also includes a storage unit 140 that stores the emoticons, the emoticon names, and position information indicating where each emoticon is to be inserted. The emoticon name is used when the processing unit 130 searches for the emoticon corresponding to the detected emotional state, and emoticons can be added or deleted by the user. For example, to add an emoticon, as shown in FIG. 2, the user selects an image to use as the emoticon and enters an emoticon name for the selected image (in the example, "메롱", the Korean interjection that accompanies sticking out the tongue). Once the position at which the emoticon is to be placed on output has also been entered, the processing unit 130 stores the entered information in the storage unit 140.
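As a rough sketch of what the storage unit holds, each record can pair an emoticon name with its image and insertion position, with add and delete operations exposed to the user. The class names, field names, and the use of a file path and pixel coordinates are assumptions made for illustration; the patent does not specify a storage layout.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass(frozen=True)
class EmoticonEntry:
    name: str                  # text compared against the recognizer output
    image_path: str            # hypothetical location of the emoticon image
    position: Tuple[int, int]  # hypothetical (x, y) insertion position


class EmoticonStore:
    """In-memory stand-in for the storage unit, keyed by emoticon name."""

    def __init__(self) -> None:
        self._entries: Dict[str, EmoticonEntry] = {}

    def add(self, entry: EmoticonEntry) -> None:
        self._entries[entry.name] = entry

    def remove(self, name: str) -> None:
        self._entries.pop(name, None)

    def find(self, name: str) -> Optional[EmoticonEntry]:
        return self._entries.get(name)


store = EmoticonStore()
store.add(EmoticonEntry("메롱", "merong.png", (10, 20)))
print(store.find("메롱"))
```

Keying the records by name keeps the later search step a direct lookup on the text produced by the recognizers.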

The video call apparatus according to the present invention further includes a synthesis unit 150 that composites the image acquired through the camera 110 with the retrieved emoticon. Based on the position information read from the storage unit 140, the synthesis unit 150 inserts the emoticon retrieved by the processing unit 130 into the image at the corresponding position and, once placement is complete, composites the emoticon with the image.
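The compositing step can be illustrated with a small overlay routine. This is a sketch under simplifying assumptions: the frame and emoticon are plain 2-D lists of pixel values, `None` marks a transparent emoticon pixel, and the function name `composite` is hypothetical.

```python
def composite(frame, emoticon, top, left):
    """Return a copy of frame with emoticon pasted at row top, column left.

    Transparent emoticon pixels (None) leave the frame pixel unchanged,
    and any part of the emoticon outside the frame is clipped.
    """
    out = [row[:] for row in frame]  # do not modify the camera frame
    for dy, row in enumerate(emoticon):
        for dx, pixel in enumerate(row):
            y, x = top + dy, left + dx
            if pixel is not None and 0 <= y < len(out) and 0 <= x < len(out[y]):
                out[y][x] = pixel
    return out


frame = [[0, 0, 0], [0, 0, 0], [0, 0, 0]]
emoticon = [[9, None], [None, 9]]
print(composite(frame, emoticon, 1, 1))
```

Returning a new frame rather than editing in place keeps the original camera image available, for example if the same frame must be shown locally without the emoticon.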

The video call apparatus according to the present invention further includes a communication unit 160 for transmitting the image composited by the synthesis unit 150, and a display unit 170 that displays the status and results of the apparatus's operation. The communication unit 160 is implemented either as a wireless communication module that sends and receives the composite image over a wireless link, or as a wired communication module that sends and receives the composite image according to a network communication protocol.

FIG. 3 is a flowchart illustrating a video call method using emoticons according to the present invention.

As shown in FIG. 3, during a video call the detection unit 120 recognizes the emotional state from the user's voice and/or facial expression (S300, S302).

First, when the emotional state is recognized from the voice, the voice recognition unit 122 of the detection unit 120 extracts language information from the voice signal input through the microphone 100 and outputs it. For example, when the user says "반사" (literally "reflection"), the voice recognition unit 122 measures the similarity between the input pattern (feature parameters) extracted from that voice signal and the reference patterns, recognizes the language information, and outputs it as the text data "반사". In other words, the voice recognition unit 122 recognizes the emotional state expressed in speech and outputs it as text.

Next, when the emotional state is recognized from the user's facial expression, the facial recognition unit 124 of the detection unit 120 detects facial information from the image input through the camera 110, recognizes the expression information based on the detected facial information, and outputs the result as text data. Here, the facial information includes facial feature information on the face region and on local regions such as the eyes, nose, and mouth, together with expression information. For example, when the user sticks out the tongue, the facial recognition unit 124 detects the facial features of the user with the tongue out from the image input through the camera 110 and outputs facial information that includes the expression information "메롱" generated from the detected facial features.

Finally, when the emotional state is recognized from a combination of the user's voice and facial expression, the outputs of the voice recognition unit 122 and the facial recognition unit 124 are combined to determine the user's emotional state, which is then output.

The processing unit 130, which receives the output of the detection unit 120 (that is, the language information and expression information), searches the storage unit 140 for the emoticon corresponding to the emotional state output by the detection unit 120 (S305). To perform this search, the processing unit 130 compares the output of the voice recognition unit 122 and/or the facial recognition unit 124 with the emoticon names stored in the storage unit 140.
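The name comparison in the search step can be sketched as an exact match of the recognizer outputs against the stored emoticon names. The function name `search_emoticon`, the exact-match rule, and the choice to try the voice output before the facial output are assumptions for illustration; the patent only states that the outputs are compared with the stored names.

```python
from typing import Iterable, Optional


def search_emoticon(stored_names: Iterable[str],
                    language_info: Optional[str] = None,
                    expression_info: Optional[str] = None) -> Optional[str]:
    """Return the stored emoticon name matching a recognizer output, if any.

    Either recognizer output may be missing (None), mirroring the
    "voice and/or face" wording of the description.
    """
    names = set(stored_names)
    # Assumed priority: voice recognition output first, then expression.
    for candidate in (language_info, expression_info):
        if candidate is not None and candidate in names:
            return candidate
    return None


names = ["반사", "메롱", "눈물"]
print(search_emoticon(names, language_info="반사"))
print(search_emoticon(names, expression_info="메롱"))
print(search_emoticon(names, language_info="unknown"))
```

A `None` result corresponds to the case where no emoticon matches, in which the image would presumably be transmitted without an emoticon.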

When the search finds an emoticon that matches the output of the voice recognition unit 122 and the facial recognition unit 124, the processing unit 130 reads the emoticon and its related information, that is, the position information, from the storage unit 140 and passes them to the synthesis unit 150. The synthesis unit 150 then inserts the emoticon into the image acquired through the camera 110 based on the emoticon's position information stored in the storage unit 140 (S306). When compositing of the image and the emoticon is complete, the synthesis unit 150 sends the composite image to the communication unit 160.

For example, when an emoticon whose name matches "반사" is found, the emoticon is placed on the image and composited as shown in FIG. 4, based on the emoticon's position information stored in the storage unit 140 and the facial information from the facial recognition unit 124. The processing unit 130, on receiving the composite image from the synthesis unit 150, outputs it to the display unit 170 and at the same time transmits it through the communication unit 160 to the video call apparatus on the receiving side. The receiving video call apparatus displays the composite image on the screen of its display unit 170'.

Alternatively, when the word "tears" or a crying sound is input through the microphone 100, the video call apparatus searches for the emoticon corresponding to the recognized voice and, using the facial information received from the facial recognition unit 124, inserts the tear-shaped emoticon below the eye as shown in FIG. 5 and transmits the result.
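Placing an emoticon relative to a detected facial feature, as in the tear example, can be sketched as a coordinate calculation on the eye's bounding box. The box format `(left, top, width, height)`, the `gap` parameter, and the function name `place_below_eye` are assumptions; the patent does not describe how the position is computed from the facial information.

```python
def place_below_eye(eye_box, emoticon_size, gap=2):
    """Return the (x, y) top-left corner for an emoticon placed under an eye.

    eye_box is (left, top, width, height) of the detected eye region and
    emoticon_size is (width, height) of the emoticon, all in pixels.
    """
    left, top, width, height = eye_box
    emo_w, _emo_h = emoticon_size
    x = left + (width - emo_w) // 2  # center horizontally under the eye
    y = top + height + gap           # start just below the eye region
    return x, y


# Hypothetical eye region and tear emoticon size.
print(place_below_eye((40, 30, 20, 10), (8, 12)))
```

Computing the position from the detected feature, rather than using a fixed screen coordinate, keeps the emoticon attached to the face when the user moves within the frame.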

As described above, the present invention detects the user's emotional state from the voice and image input through the microphone and camera, and inserts the corresponding emoticon into the image and transmits it. No key operation by the user is required, and the visual effect is improved.

As described in detail above, the present invention recognizes the user's emotional state from the user's voice and facial expression, inserts the corresponding emoticon into the input image, and transmits it. The invention can therefore satisfy the user's visual desires, and it offers convenience because emoticons are inserted easily without any operation by the user.

Claims

1. A video call apparatus using emoticons, comprising:

a detection unit that detects an emotional state of a speaker;

a processing unit that inserts an emoticon corresponding to the detected emotional state into an image acquired through a camera and transmits the image; and

a synthesis unit that inserts the emoticon into the image based on facial information extracted from the image and position information indicating where the emoticon is to be inserted.

2. The apparatus of claim 1,

wherein the detection unit comprises at least one of a voice recognition unit that extracts language information from an externally input voice signal and a facial recognition unit that extracts the facial information from the input image.

3. The apparatus of claim 2,

wherein the facial information includes feature information on the face region and on local regions such as the eyes, nose, and mouth, and expression information generated based on the feature information.

4. The apparatus of claim 1,

further comprising a storage unit that stores the emoticon, language information and expression information corresponding to the emoticon, and position information indicating where the emoticon is to be inserted.

5. The apparatus of claim 1,

further comprising a communication unit for transmitting the image composited by the synthesis unit.

6. A video call method using emoticons, comprising:

detecting an emotional state of a user; and

inserting the emoticon corresponding to the emotional state into an image acquired through a camera and transmitting the image,

wherein the transmitting comprises:

inserting the emoticon into the image based on facial information extracted from the image and position information indicating where the emoticon is to be inserted.

7. The method of claim 6,

wherein the emotional state detection step is performed by voice recognition that extracts language information indicating the emotional state from an input voice signal.

8. The method of claim 6,

wherein the emotional state detection step is performed by facial recognition that extracts the facial information from the image.

9. The method of claim 8, wherein the emotional state detection step comprises:

extracting facial feature information from the image; and

generating expression information based on the extracted feature information.

10. The method of claim 6,

wherein the emotional state detection step is performed by voice recognition that extracts language information from an input voice signal and facial recognition that extracts the facial information from the image.

11. The method of claim 10,

wherein the facial information includes feature information on the face region and on local regions such as the eyes, nose, and mouth, and expression information based on the feature information.

12. The method of claim 6, wherein the transmitting step comprises:

searching for an emoticon corresponding to the emotional state; and

placing the retrieved emoticon on the input image and compositing the two.