KR100853122B1

KR100853122B1 - Method and system for providing Real-time Subsititutive Communications using mobile telecommunications network

Info

Publication number: KR100853122B1
Application number: KR1020070014756A
Authority: KR
Inventors: 허재회
Original assignee: 주식회사 인스프리트
Priority date: 2007-02-13
Filing date: 2007-02-13
Publication date: 2008-08-20
Also published as: KR20080075625A

Abstract

The present invention relates to an alternative video service method and system using a mobile communication network that can provide a higher-quality video call service by analyzing the voice and video of a subject in real time during a video call and replacing them with a character.

The alternative video service system using a mobile communication network of the present invention is characterized by an RSC server that includes: a video processor that performs a real-time face tracking function, analyzing the received video signal to extract changes in the feature point information of the image; an audio processor that performs a real-time lip-sync function, analyzing the received audio signal to extract voice information; and an RSC encoding module that combines the video feature information and the voice information with character content to generate, in real time, a substitute video in the form of a moving picture.

Mobile network, real time, alternative video

Description

Method and system for real-time alternative video service using mobile communication network {Method and system for providing Real-time Subsititutive Communications using mobile telecommunications network}

FIG. 1 is a block diagram of a video production system according to the prior art.

FIG. 2 is a block diagram of an alternative video service system according to a first embodiment of the present invention.

FIG. 3 is a block diagram of an alternative video call service platform according to the first embodiment of the present invention.

FIG. 4 is a block diagram of an alternative video call server according to the first embodiment of the present invention.

FIGS. 5A and 5B are examples of an input image and a character image, each marked with the feature points used for the alternative video service according to the present invention.

FIG. 6 is a conceptual diagram of the alternative video service according to the first embodiment of the present invention.

FIG. 7 is a call processing flowchart of the alternative video service method according to the first embodiment of the present invention.

FIGS. 8A and 8B illustrate an alternative video service method between a 2G terminal and a 3G terminal according to the first embodiment of the present invention.

FIG. 9 shows the display configuration of a mobile communication terminal for the alternative video service according to the first embodiment of the present invention.

FIG. 10 is a block diagram of an alternative video call service system according to a second embodiment of the present invention.

FIG. 11 is a block diagram of a mobile communication terminal for the alternative video service according to the second embodiment of the present invention.

FIG. 12 is a call processing flowchart of the alternative video service method according to the second embodiment of the present invention.

FIG. 13 is a call processing flowchart of the alternative video service method according to a third embodiment of the present invention.

The present invention relates to an alternative video service method and system using a mobile communication network that can provide a higher-quality video call service by analyzing the voice and video of a subject in real time during a video call and replacing them with a character.

A substitute video is a video, such as a character (or avatar), that takes the place of the actual video of a subject. Substitute videos are now widely used in fields such as computer animation, mobile communication, and broadcasting, and many attempts have been made to provide substitute videos that resemble a person's actual video.

As one example of such prior art, Korean Patent Application No. 10-2000-0055309, filed September 20, 2000, "System and method for producing a three-dimensional video authoring tool supporting motion, facial expression, lip sync, and lip-synced speech synthesis of a three-dimensional character," relates to a video production system with a lip-sync function that matches the lip movement of the substitute video to the output voice, and has the configuration shown in FIG. 1.

The video production system 100 of FIG. 1 comprises: a voice information conversion engine 110 that performs a TTS (Text To Speech) function, converting text entered by the user into voice information; a voice library 130 consisting of a database of the character's mouth shapes corresponding to each phoneme contained in the voice and of a plurality of face shapes set according to predetermined facial expression models; a lip-sync generation engine 120 that extracts the phoneme information of the initial, medial, and final sounds input from the voice information conversion engine 110 and matches it against the mouth shapes and face shapes stored in the voice library 130 to change the mouth shape and face shape of the character displayed on screen; a motion library 150 consisting of a database of character motion information describing the movements the character's body can make with its head, arms, legs, and so on; an animation generation engine 140 that sets the motion of the character's body according to the motion information stored in the motion library 150; and a synthesis engine 160 that combines the mouth shape and face shape generated by the lip-sync generation engine 120 with the body motion generated by the animation generation engine 140 and outputs the result as a moving picture.

However, although the prior-art invention of FIG. 1 can change the character's mouth shape according to the phonemes of the input voice, the character's face shape and body motion are not linked to those phonemes. A face shape and motion that the user selects from the various pre-stored forms are simply combined with the mouth shape, so the character's overall facial expression and movement are incongruous. Moreover, the system relies on a sequential method in which the actual video is input as a single file, the character video is generated as a single file, and the two files are merged to produce the moving picture. It is therefore difficult to generate a natural animated character whose facial expression and motion change in real time to follow the movement of actual video input as a video stream and an audio stream, and the approach is unsuitable as a substitute video standing in for a caller on a mobile communication network, where real-time communication is essential.

A number of prior arts apply techniques similar to that of FIG. 1 to mobile communication networks, but all of them go no further than asserting that an offline character generation method like that of FIG. 1 can simply be applied to a mobile communication network. No method has actually been implemented that generates a character and provides a substitute video in real time by combining the video stream and audio stream input in real time over a mobile communication network with a character video.

In fact, offline (i.e., non-real-time) computer graphics programs such as 3D MAX and MAYA basically target VGA-class video with a resolution of 640*480. Displaying a full-color image requires 3 bytes (= 24 bits) of data per pixel, so a single frame amounts to 640*480*24 = 7,372,800 bits, or about 921 Kbytes. Processing this data offline on a workstation or PC poses no particular difficulty, but a problem arises when the same video data is to be exchanged over a mobile communication network.

For example, if this video data is transmitted at the rate (64 Kbps) supported by the MPEG4 compression method applied to WCDMA, a 3G mobile communication network that supports video calls, the transfer takes t = 640*480*24/64000 = 115.2 s, or just under 2 minutes, which makes the approach practically impossible to apply to a mobile communication network.
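The arithmetic in the two preceding paragraphs can be checked with a few lines of Python. This is only a sketch using the figures stated in the text (640*480 resolution, 24 bits per pixel, 64 Kbps); the variable names are illustrative, and 1 Kbyte is taken as 1000 bytes.

```python
# Figures taken from the text: one uncompressed VGA frame sent over a 64 Kbps channel.
WIDTH, HEIGHT = 640, 480
BITS_PER_PIXEL = 24        # 3 bytes per pixel for full colour
CHANNEL_BPS = 64_000       # 64 Kbps

frame_bits = WIDTH * HEIGHT * BITS_PER_PIXEL
frame_kbytes = frame_bits / 8 / 1000          # assuming 1 Kbyte = 1000 bytes
seconds_per_frame = frame_bits / CHANNEL_BPS

print(f"{frame_bits} bits, {frame_kbytes:.1f} Kbytes, {seconds_per_frame:.1f} s per frame")
```

At close to two minutes per uncompressed frame, the need for heavy compression and for transmitting compact feature data rather than raw frames follows directly.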

Consequently, because of the limits on transmission speed and bandwidth, mobile communication requires both the video signal and the audio signal to be compressed at a high ratio within the range each network permits. To provide a substitute video in real time, a fast and simple system and method is needed that extracts feature points from the images contained in a continuous video signal input as a video stream, simultaneously extracts only the caller's voice from the compressed audio signal, and combines the extracted data with the substitute-video character to generate the substitute video in real time (that is, with a total delay of less than one second).

In addition, in mobile communication the data transmission state often changes abruptly even during a call because of user movement, the appearance of radio obstacles, the presence of radio shadow areas, and so on, so an alternative video service method and system that can cope with such cases is required.

Furthermore, to make a real-time alternative video service practical, a method and system must be devised that makes full use of the characteristics unique to mobile communication to generate and transmit substitute video more simply and quickly.

In view of the above problems, an object of the present invention is to provide an alternative video service method and system using a mobile communication network that, in order to offer a service optimized for mobile communication networks where real-time communication matters, can provide a higher-quality video call service by analyzing the voice and video of a subject in real time during a video call and replacing them with a character.

Another object of the present invention is to provide an alternative video service method and system using a mobile communication network that, in order to offer a service optimized for mobile communication networks where the data transmission state changes abruptly with the communication environment, lets the audio signal and the video signal complement each other according to their respective quality, so that substitute video can be generated and transmitted continuously despite changes in the communication environment.

Another object of the present invention is to provide an alternative video service method and system using a mobile communication network that generates the substitute video primarily from the video signal and supplements its mouth shape with a standardized mouth shape derived from the audio signal, thereby generating and transmitting a substitute video that conveys speech more effectively.

Another object of the present invention is to provide an alternative video service method and system using a mobile communication network that generates and transmits a substitute video centered on the mouth shape corresponding to pronunciation, so that hearing-impaired users can also make alternative video calls.

Another object of the present invention is to provide an alternative video service method and system using a mobile communication network that, taking into account the characteristic of mobile communication that one terminal is assigned to and used almost exclusively by one user, generates learned personal metadata for each mobile communication user so that substitute video can be generated and provided more simply and quickly.

Another object of the present invention is to provide an alternative video service method and system using a mobile communication network that lets a user check, during an alternative video call, the transmission state of his or her own substitute video as delivered to the other party.

Another object of the present invention is to provide an alternative video service method and system using a mobile communication network that can provide the alternative video service even in a call with an existing 2G terminal that has no video communication function.

A further object is to provide a real-time alternative video service method and system whose structure is applicable to third-generation communication environments, which can interwork various networks such as WCDMA, HSDPA, IP-TV, the wired telephone network, and the Internet, and to more advanced communication environments beyond them.

The above and further objects and effects of the present invention will be understood more clearly from the detailed description of the configuration of the invention given below.

To achieve the above objects, the alternative video service system using a mobile communication network of the present invention is characterized in that an RSC server is connected to the mobile communication network, the RSC server including: a video processor that performs a real-time face tracking function, analyzing the received video signal to extract changes in the feature point information of the image; an audio processor that performs a real-time lip-sync function, analyzing the received audio signal to extract voice information; and an RSC encoding module that combines the video feature information and the voice information with character content to generate, in real time, a substitute video in the form of a moving picture.

The video processor includes a video decoder that decodes the received video signal, and an RFT engine that extracts changes in the contour feature points and individual feature points of the reconstructed image contained in the decoded video signal. The RFT engine includes: a face analysis module that compares the feature point information of the reconstructed image with pre-stored feature point information for the basic contour of a human face to determine whether the reconstructed image is an image of a human face; a face tracking module that, when the reconstructed image is determined to be a human face, tracks changes in the individual feature points of the face in real time; and a motion tracking module that, when the reconstructed image is determined to be a human face, tracks the feature point information of the face contour in real time. The feature point changes tracked by the face tracking module and the motion tracking module are sent to the RSC encoding module as the feature information of the character for the alternative video service.
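The division of work inside the RFT engine can be sketched as follows. The text does not specify how feature points are compared or represented, so everything here is an assumption for illustration: feature points are named 2-D coordinates, the face test is a mean-distance threshold against a hypothetical stored template, and tracking is the per-point displacement between consecutive frames.

```python
from math import hypot

# Hypothetical stored template of basic face-contour feature points (normalised coordinates).
FACE_TEMPLATE = {
    "forehead": (0.5, 0.10),
    "left_jaw": (0.2, 0.70),
    "right_jaw": (0.8, 0.70),
    "chin": (0.5, 0.95),
}

def is_face(contour, template=FACE_TEMPLATE, threshold=0.1):
    """Face analysis (assumed metric): mean distance to the template must be below a threshold."""
    if set(contour) != set(template):
        return False
    total = sum(
        hypot(contour[k][0] - template[k][0], contour[k][1] - template[k][1])
        for k in template
    )
    return total / len(template) < threshold

def point_deltas(prev, curr):
    """Tracking: displacement of each feature point present in both frames."""
    return {
        k: (curr[k][0] - prev[k][0], curr[k][1] - prev[k][1])
        for k in curr
        if k in prev
    }
```

In this sketch the same `point_deltas` helper would serve both the face tracking module (individual points such as mouth corners) and the motion tracking module (contour points); only frames that pass `is_face` would be tracked.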

The audio processor includes an audio decoder that decodes the received audio signal, and an RLS engine that extracts the voice signal from the decoded audio signal and performs real-time lip sync. The RLS engine includes a voice analysis module that filters the input audio signal and extracts only the signal within a preset frequency range as the voice signal, and an RTLS module that matches the voice signal extracted by the voice analysis module against a mouth shape information database to extract lip-sync variables in real time. The lip-sync variables are sent to the RSC encoding module as character voice information specifying the voice of the character for the alternative video service.
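The filtering step of the voice analysis module, keeping only a preset frequency range, might look like the sketch below. The text does not give the range or the filter design; the 300 to 3400 Hz band and the naive DFT-based approach are assumptions chosen for clarity, not for real-time performance.

```python
import cmath
import math

def band_pass(samples, sample_rate, low_hz=300.0, high_hz=3400.0):
    """Keep only spectral components inside [low_hz, high_hz] (assumed voice band)."""
    n = len(samples)
    # Forward DFT (O(n^2); a real implementation would use an FFT or a time-domain filter).
    spectrum = [
        sum(samples[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]
    # Zero every bin whose frequency lies outside the band (bins k and n-k are mirror images).
    for k in range(n):
        freq = min(k, n - k) * sample_rate / n
        if not (low_hz <= freq <= high_hz):
            spectrum[k] = 0
    # Inverse DFT; the result is real because the spectrum stays conjugate-symmetric.
    return [
        (sum(spectrum[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)) / n).real
        for t in range(n)
    ]
```

For instance, a 1000 Hz tone mixed with 100 Hz hum and sampled at 8000 Hz would come out with the hum removed and the tone intact.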

The RSC encoding module includes an RSC animation engine that receives the character feature information and the character voice information and combines them with the character for the alternative video service to generate the substitute video in real time, a renderer that renders the generated substitute video, and a video encoder that encodes and transmits the rendered substitute video.

The mouth shape information database matched by the RTLS module contains the mouth shapes for every kind of vowel, the modified mouth shapes of vowels that result from differences in the initial and final consonants, data on how these mouth shapes change according to the composition of each syllable, and mouth shape data that varies with the frequency range of the voice signal.

The mouth shape information database matched by the RTLS module stores mouth shape data standardized for voice in a reference frequency range set by the system, together with mouth shape data modified for voice in frequency ranges higher or lower than the reference. When the frequency range of the voice signal extracted by the voice analysis module differs from the reference frequency range, the RTLS module extracts the lip-sync variables according to the frequency range of the input voice signal.
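A lookup of this kind can be sketched as a table keyed by vowel and frequency band. The reference range, the table contents, and the field names (`open`, `width`) are all hypothetical; the text only states that standardized and frequency-adjusted mouth shapes are both stored.

```python
REFERENCE_RANGE_HZ = (100.0, 250.0)   # hypothetical reference frequency range

# Hypothetical mouth-shape database: (vowel, band) -> lip-sync variables.
MOUTH_SHAPES = {
    ("a", "reference"): {"open": 0.8, "width": 0.5},
    ("a", "low"):       {"open": 0.7, "width": 0.5},
    ("a", "high"):      {"open": 0.9, "width": 0.6},
}

def frequency_band(pitch_hz, reference=REFERENCE_RANGE_HZ):
    """Classify the voice as below, inside, or above the reference range."""
    low, high = reference
    if pitch_hz < low:
        return "low"
    if pitch_hz > high:
        return "high"
    return "reference"

def lip_sync_variable(vowel, pitch_hz):
    """Return the standardized shape inside the reference range, else the adjusted one."""
    return MOUTH_SHAPES[(vowel, frequency_band(pitch_hz))]
```

With these assumed values, `lip_sync_variable("a", 180.0)` returns the standardized shape, while a 300 Hz voice selects the variant stored for the higher band.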

The RTLS module also classifies the lip-sync variables generated according to the voice parameter DNA contained in the character voice information by user terminal, repeatedly stores them as personalized metadata, and learns from them. When an audio signal is later received from the same user terminal, the module uses the learned personalized metadata as the default voice model for extracting the lip-sync variables from the received audio signal.
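One simple reading of "repeatedly store and learn" is a per-terminal running average of the observed lip-sync variables. The text does not describe the learning method, so the averaging rule, the class name, and the field names below are assumptions made for illustration.

```python
from collections import defaultdict

class PersonalMetadataStore:
    """Hypothetical per-terminal store that averages observed lip-sync variables."""

    def __init__(self):
        self._sums = defaultdict(lambda: defaultdict(float))
        self._counts = defaultdict(int)

    def record(self, terminal_id, lip_sync):
        """Accumulate one observation for the given terminal."""
        for key, value in lip_sync.items():
            self._sums[terminal_id][key] += value
        self._counts[terminal_id] += 1

    def default_model(self, terminal_id, fallback):
        """Return the learned average, or the generic fallback for an unseen terminal."""
        count = self._counts.get(terminal_id, 0)
        if count == 0:
            return dict(fallback)
        return {key: total / count for key, total in self._sums[terminal_id].items()}
```

Because one terminal is used almost exclusively by one person, the learned model is effectively a model of that person's voice, which is why it can serve as the starting point for later calls.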

In one configuration, the RSC encoding module generates the contour and face of the substitute video on the basis of the video feature information, and corrects the mouth shape of the substitute video using the lip-sync variables contained in the voice information.

In another configuration, the RSC encoding module generates the mouth shape of the substitute video on the basis of the voice information, and generates the parts of the face other than the mouth on the basis of the video feature information.
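The two composition modes described above differ only in which source owns the mouth. A minimal sketch, assuming character parameters are flat dictionaries whose mouth-related keys start with `mouth_`; the blending weight used for "correction" is an assumption, since the text does not say how the correction is computed.

```python
def compose_video_primary(video_features, lip_sync, weight=0.5):
    """Video drives the whole face; mouth parameters are corrected toward the lip-sync values."""
    frame = dict(video_features)
    for key, target in lip_sync.items():
        if key.startswith("mouth_") and key in frame:
            # Assumed correction rule: linear blend between video and audio estimates.
            frame[key] = (1 - weight) * frame[key] + weight * target
    return frame

def compose_audio_primary(video_features, lip_sync):
    """Audio drives the mouth; video drives everything except the mouth."""
    frame = {k: v for k, v in video_features.items() if not k.startswith("mouth_")}
    frame.update({k: v for k, v in lip_sync.items() if k.startswith("mouth_")})
    return frame
```

The audio-primary mode keeps the mouth moving even when the video signal degrades, which matches the stated goal of letting the two signals complement each other.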

The system further includes an SGN server that interworks with the RSC server to perform call processing for alternative video calls and management of the alternative video service, and that holds alternative video service information including service subscriber information and per-subscriber character information.

The system further includes a CMS server, which is connected to user terminals through a WEB/WAP server, performs the registration of characters for the alternative video service and the service subscriptions requested from the user terminals, and forwards the subscription and character registration information to the SGN server; and a storage device for storing the registered characters.

The system further includes an MMSC that generates, stores, and transmits multimedia messages; and an LSS server equipped with a message audio processor, which performs a non-real-time lip-sync function by analyzing the audio signal contained in a message stored in the MMSC and extracting message voice information, and a substitute video message encoding module, which combines the message voice information with character content to generate an animated character in non-real time and sends the generated animated character to the MMSC.

In the system, the RSC server, the LSS server, the SGN server, the CMS server, and the storage device constitute an RSC service platform, and the RSC service platform further includes a media gateway for interworking with the mobile communication network.

Meanwhile, the alternative video service method according to the present invention using the above system includes the steps of: determining, at the location register, whether the caller is subscribed to the alternative video service in response to an alternative video call setup request from an originating mobile terminal; if the originating terminal is determined to belong to a service subscriber, setting up the alternative video call in the core network and, at the RSC server, activating the character registered for the originating terminal to prepare the alternative video service; generating the substitute video in real time at the RSC server by combining the video feature information and voice information, extracted respectively from the video signal and audio signal input from the originating terminal, with the activated character; and conducting a video call using the real-time substitute video by transmitting the substitute video from the RSC server to the receiving terminal through the core network.
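The four steps of the method can be sketched as a small control flow. The function names, the dictionary-based subscriber and character registries, and the behavior when the caller is not subscribed (here, no substitute video is sent) are assumptions for illustration; `generate` and `send` stand in for the RSC server's encoding and transmission stages.

```python
def setup_substitute_call(caller_id, subscribers, characters):
    """Steps 1-2: check the subscription, then activate the caller's registered character."""
    if caller_id not in subscribers:       # subscription check (location register in the text)
        return None
    return characters[caller_id]           # character activation (RSC server in the text)

def run_call(caller_id, frames, subscribers, characters, generate, send):
    """Steps 3-4: generate a substitute frame per input frame and send it to the callee."""
    character = setup_substitute_call(caller_id, subscribers, characters)
    if character is None:
        return 0                           # assumed behavior: no substitute video is produced
    sent = 0
    for video, audio in frames:
        send(generate(character, video, audio))
        sent += 1
    return sent
```

A caller could exercise this with stub callables, for example `generate=lambda c, v, a: (c, v, a)` and `send=outbox.append`, to confirm that one substitute frame is emitted per input frame.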

Meanwhile, a mobile communication terminal for the alternative video service according to the present invention includes: a camera unit for capturing the user's actual video; an audio unit, including a microphone and a speaker, that handles audio signal input and output; a key input unit provided with at least one of a soft key and a hot key for requesting the alternative video service; a display unit for displaying the actual video and the substitute video; an RSC module that generates the substitute video; a data memory that stores data including the character for the substitute video and standard mouth shape information; a program memory that stores the terminal operating program, including the software program for performing the substitute video generation function; a transceiver that transmits and receives signals; and a control unit, composed of at least one microprocessor, that is connected to these components and controls their functions.

The alternative video service method using this mobile communication terminal includes the steps of: generating the substitute video in the mobile communication terminal in response to an alternative video service request made with at least one of the soft key and the hot key of the terminal; determining, at the location register of the mobile communication network, whether the user is subscribed to the alternative video service in response to the request; and, if the terminal is determined to belong to a service subscriber, setting up the alternative video call in the mobile communication network and conducting a video call using the real-time substitute video by transmitting the substitute video to the receiving terminal.

Meanwhile, according to another feature of the present invention, an alternative video service system in which part of the video processing is implemented in the terminal includes: a mobile communication terminal configured with an RSC module that performs character feature information extraction, the module including a video decoder for processing the video signal captured by the camera and an RFT engine that generates character feature information for the substitute video by extracting the feature points of the actual video contained in the video signal; and an RSC server connected to the core network of the mobile communication network, the server including an audio processor that performs a real-time lip-sync function by restoring and analyzing the audio signal received in compressed form from the mobile communication terminal and extracting voice information, and an RSC encoding module that combines the voice information and the character feature information from the mobile communication terminal with the character content for the substitute video to generate, in real time, a substitute video in the form of a moving picture.

In addition, a method of performing the alternative video service using the alternative video service system in which part of the video processing is implemented in the terminal comprises: extracting, in the mobile communication terminal, the character feature information for the actual image in response to a substitute video call setup request from the mobile communication terminal; determining, in the RSC server, alternative video service subscription information in response to the substitute video call setup request; activating, in the RSC server, the character registered for the mobile communication terminal according to the alternative video service subscription information, and combining the voice information extracted by the audio processing unit and the character feature information from the mobile communication terminal with the registered character to generate a substitute video in real time; and transmitting the substitute video from the RSC server to the called terminal through the core network, thereby carrying out a video call using a real-time substitute video.

Hereinafter, the present invention is described in more detail with reference to an embodiment of the present invention shown in the accompanying drawings.

The alternative video service system of FIG. 2 uses a WCDMA (Wideband Code Division Multiple Access) network, one of the 3G networks, as its core network. It can nevertheless support, in an integrated manner, the various protocols and codec technologies used in different networks, for example the SIP (Session Initiation Protocol) transport protocol and H.323 codec technology used in IP (Internet Protocol) based video communication, and the SS7 protocol, the H.324M protocol, and the H.263 codec used in HSDPA (High Speed Downlink Packet Access), the so-called 3.5G mobile communication service. Because it interworks with a media gateway 205 that can connect TDM (Time Division Multiplexing) traffic of the circuit network with RTP (Real-time Transport Protocol) traffic of the packet network used for real-time data transmission, it is configured to support the substitute video call service across heterogeneous networks, both between 3G mobile users and wired video phone users and between 3G mobile users and web video phone users. Its structure is also applicable to the third-generation communication environment, in which various networks such as WCDMA, HSDPA, IP-TV, the wired telephone network, and the Internet can interwork, and to more advanced communication environments.

Accordingly, it is noted in advance that even when a core network of a different type from the one shown in FIG. 2 is used, or when the system interworks with a different type of network, an alternative video service system can be configured using components identical or equivalent to the RSC service platform 300 that performs the alternative video service function of the present invention.

In FIG. 2, the components of an ordinary WCDMA network include a plurality of mobile communication terminals 201 connected to the mobile communication network, a Node-B 202 serving as a base station, an RNC 203 serving as a controller, an MSC 204 serving as a switching center, an HLR/HSS 206 serving as a location register, an MMSC 207 serving as a multimedia message center, an AuC 208 serving as an authentication server, and a G/W 205 serving as a media gateway for interworking with heterogeneous networks.

Meanwhile, to provide the alternative video service according to the present invention, an RSC service platform 300 is added, the G/W 205 is configured to include the protocols, codecs, and so on needed to support interworking between the RSC service platform 300 and heterogeneous networks, and a content provider (CP) 404 and user terminals 401 and 402 are connected through a heterogeneous network such as a WEB/WAP server 408.

Here, the mobile communication terminal 201 is a concept that covers not only 2G and 3G terminals but also 4G terminals that may be released in the future. The Node-B 202 is a base station that connects a call channel to the mobile communication terminal 201. The RNC (Radio Network Controller) 203 is a controller that controls a plurality of Node-Bs 202 and interworks with the switching center of the core network. The MSC (Mobile Switching Center) 204 is a switching center that connects the RNC 203 to the core network and switches calls.

The G/W (Media Gateway) 205 is equipment that converts the traffic format according to the network. In a packet voice delivery network it performs a transcoding function that converts TDM traffic of the circuit network into ATM (Asynchronous Transfer Mode) or IP traffic of the packet network, and a signal switching function for call control.
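
As an illustration only, the circuit-to-packet conversion performed by such a gateway can be sketched as wrapping fixed-size chunks of a TDM voice timeslot in RTP headers. The sketch assumes G.711 mu-law audio at 8,000 samples per second and 20 ms packets; the description does not specify these values, and the function name `packetize_timeslot` is hypothetical.

```python
import struct

RTP_VERSION = 2
PAYLOAD_TYPE_PCMU = 0     # static RTP payload type for G.711 mu-law (assumed codec)
SAMPLES_PER_PACKET = 160  # 20 ms of audio at 8000 Hz, one byte per sample


def packetize_timeslot(tdm_bytes, ssrc, first_seq=0, first_ts=0):
    """Split a TDM byte stream into RTP packets (12-byte fixed header + payload)."""
    packets = []
    seq, ts = first_seq, first_ts
    for offset in range(0, len(tdm_bytes), SAMPLES_PER_PACKET):
        payload = tdm_bytes[offset:offset + SAMPLES_PER_PACKET]
        header = struct.pack(
            "!BBHII",
            RTP_VERSION << 6,      # version 2, no padding, no extension, no CSRC
            PAYLOAD_TYPE_PCMU,     # marker bit 0, payload type 0
            seq & 0xFFFF,          # 16-bit sequence number
            ts & 0xFFFFFFFF,       # 32-bit timestamp in sample units
            ssrc,                  # stream identifier
        )
        packets.append(header + payload)
        seq += 1
        ts += len(payload)
    return packets
```

The timestamp advances by the number of samples carried, which lets the receiving side rebuild the original timing of the circuit-switched stream.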

The HLR/HSS (Home Location Register/Home Subscriber Server) 206 manages mobile subscriber information (that is, location information, authentication information, service information, permissions, supplementary information, and so on) in real time. Through interworking with the MSC/VLR 204, the AuC 208, the MMSC 207, and the RSC service platform 300, it manages the setting status of supplementary services such as call origination and termination, authentication, multimedia messaging, packet transmission, location information, and the alternative video service, and provides other intelligent network services.

The AuC (Authentication Center, AC) 208 provides service subscriber authentication and encryption over the radio link using an authentication key and an algorithm, and operates in conjunction with the HLR/HSS 206.

The MMSC (Multimedia Message Service Center) 207 is configured to include the functions of a short message service center (SMSC) and provides voice message, short message, and multimedia message services. It is connected to the core network and provides a message service that can accommodate any form of multimedia message, including Internet e-mail.

The RSC (Real-time Substitutive Communications, real-time substitute video call) service platform 300 is the core component provided to implement the substitute video call of the present invention. It is configured to include an SGN (Signal Gateway Network) server 304 for call processing and service management for substitute video calls, an RSC server 306 for generating substitute video in real time, an LSS (Lip-Sync Service) server 308 for the non-real-time alternative video service, a storage device (Storage) 307 that stores various data including character content for the substitute video, the service details of each service subscriber, and learned personal metadata, and a CMS (Contents Management Service) server 406 for character content management such as character registration and character template conversion. The configuration and operation of the RSC service platform 300 are described in detail later with reference to other drawings.

To provide the alternative video service, the content providers (CP) 404 or users register in advance, through the WEB/WAP server 408, various character content that can serve as the basic model of the substitute video with the CMS server 406, which supplies it to the storage device 307. Users then carry out procedures such as subscribing to the alternative video service and setting a character through the WEB/WAP server 408, using a mobile communication terminal 201, a wireless terminal 401 such as a wireless Internet terminal, or a wired terminal 402 such as a PC.

FIG. 3 is a block diagram of an RSC (substitute video call) service platform according to a first embodiment of the present invention.

First, the RSC server 306 is configured to include a video processing unit that performs a real-time face tracking (RFT) function of restoring and analyzing the received video signal to extract changes in the feature point information of the image, an audio processing unit that performs a real-time lip-sync (RLS) function of restoring and analyzing the received audio signal to extract voice information, and an RSC encoding module that generates a moving-picture character by combining the video feature information and the voice information from the RSC engine module with a character selected from the content DB of the storage device 307.

The SGN server 304 is the component that manages and operates the alternative video service application. The SGN server 304 includes a device driver and controls the other components of the RSC service platform 300, such as the RSC server 306, the Contents Management Service (CMS) server 406, the storage device 307, and the Lip-Sync Service (LSS) server 308. It manages the information that the service subscriber has set for the alternative video service (the type of character for the substitute video, that is, the character ID, and the transmission time, method, mode, and so on of the character), and controls the other components so that the alternative video service is carried out according to the configured subscriber information.

Whether the caller is subscribed to the alternative video service, together with information about the terminal (for example, whether it is a type of terminal capable of the alternative video service), is stored in the HLR/HSS 206 and used for call processing toward the RSC service platform 300. Once the subscription and the terminal type have been confirmed, control of the alternative video service is performed by the SGN server 304. The ID of the character content to be used for the alternative video service is also transmitted by the SGN server 304 to the RSC server 306, which allows the RSC server 306 to identify each character.
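
The routing decision described above can be sketched as follows. This is a minimal illustration under stated assumptions: the HLR/HSS and the SGN server settings are modeled as plain dictionaries, and the names `HlrRecord`, `route_call`, and `"default_character"` are hypothetical rather than taken from the description.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass
class HlrRecord:
    """Per-subscriber data assumed to be held in the HLR/HSS."""
    rsc_subscribed: bool          # subscribed to the alternative video service
    terminal_supports_rsc: bool   # terminal type capable of the service


def route_call(hlr: Dict[str, HlrRecord],
               sgn_settings: Dict[str, str],
               caller: str) -> Tuple[str, Optional[str]]:
    """Return (destination, character_id) for a video call from `caller`."""
    record = hlr.get(caller)
    if record is None or not (record.rsc_subscribed and record.terminal_supports_rsc):
        # Not a subscriber or unsupported terminal: ordinary video call.
        return ("ordinary_video_call", None)
    # After the check, control passes to the SGN server, which supplies
    # the character ID that the RSC server uses to identify the character.
    character_id = sgn_settings.get(caller, "default_character")
    return ("rsc_platform", character_id)
```

Keeping the subscription check separate from the character lookup mirrors the split of responsibilities between the location register and the SGN server.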

The CMS server 406 registers the character content provided by content providers or by users of the alternative video service, as well as the service environment information set by the service subscriber (for example, the type of character for the substitute video and the transmission time, method, mode, and so on of the character), and passes the registered service environment information to the SGN server 304 so that the SGN server 304 can control the alternative video service. The CMS server 406 also edits the template of the character content supplied by a content provider, converts it into a form that can be applied in real time in the RSC server 306, and stores the converted character content in the storage device 307.

The LSS server 308 performs a non-real-time alternative video service function that provides a substitute video multimedia message (MMS) by combining a character with a text or voice message. The message is either generated and transmitted by the MMSC 207 at the request of a user terminal, or generated and transmitted by the MMSC 207 through another supplementary service that automatically creates a message when a real-time call cannot be established between the calling terminal and the called terminal (for example, when the terminal is powered off, out of the service area, or busy). Because the LSS server 308 combines the character with a voice file that was recorded in advance or generated by TTS, it does not perform the real-time substitute video generation function.

The LSS server 308 includes a message audio processing unit (not shown) and an MMS encoding module (not shown), which correspond respectively to the audio signal processing unit and the RSC encoding module 350 among the components of the RSC server 306 of FIG. 4 described later, and is configured to include a TTS engine (not shown) for converting an input message into a voice message when the input message is a text message.

The message audio signal processing unit of the LSS server 308 performs a non-real-time lip-sync function of analyzing the message audio signal contained in a message stored in the MMSC 207 to extract message voice information. The MMS encoding module works with the SGN server 304 and the storage device 307 to load the character image set by the user or configured in the system, combines it with the message voice information from the message audio signal processing unit to generate a substitute video whose mouth shape is determined by the message voice information, and transmits it to the MMSC 207.

The storage device 307 holds all the character content registered with the CMS server 406 and transmits character content to the RSC server 306 and the LSS server 308 under the control of the SGN server 304.

FIG. 4 is a block diagram of a substitute video call (RSC) server according to a first embodiment of the present invention.

First, the RSC server 306 consists of a video signal processing unit including a video decoder 310 and an RFT engine 320, an audio signal processing unit including an audio decoder 330 and an RLS engine 340, and an RSC encoding module 350.

The video decoder 310 decodes the video signal received through the mobile communication network and passes it to the RFT engine 320.

The RFT engine 320 is configured to include a face analysis module (F.A.) 322, a face tracking module (F.T.) 324, and a motion tracking module (M.T.) 326.

The face analysis module 322 stores in advance, in its own memory (not shown), feature point information for the basic contour of a human face. It extracts the feature points that make up the contour of the image received from the video decoder 310 and compares them with the stored basic contour feature point information to check whether the input image is an image of a human face.
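
One way such a comparison could work is sketched below. The description does not state the comparison metric, so this sketch assumes that both point sets are normalized for position and size and that the mean point-to-point distance is checked against a threshold; the names `normalize` and `looks_like_face` and the tolerance value are hypothetical.

```python
import math


def normalize(points):
    """Centre the points on their centroid and scale them to unit RMS radius."""
    n = len(points)
    cx = sum(x for x, _ in points) / n
    cy = sum(y for _, y in points) / n
    centred = [(x - cx, y - cy) for x, y in points]
    scale = math.sqrt(sum(x * x + y * y for x, y in centred) / n) or 1.0
    return [(x / scale, y / scale) for x, y in centred]


def looks_like_face(contour, template, tolerance=0.25):
    """Compare extracted contour points with the stored basic face contour."""
    if len(contour) != len(template):
        return False
    a, b = normalize(contour), normalize(template)
    mean_error = sum(math.dist(p, q) for p, q in zip(a, b)) / len(a)
    return mean_error <= tolerance
```

Normalizing first makes the check independent of where the face sits in the frame and how large it appears, so only the relative layout of the contour points matters.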

In this regard, FIG. 5A shows an input image and a character image on which contour feature points are marked.

The left image of FIG. 5A is the actual image contained in the video signal transmitted by the mobile communication terminal, and the right image is the character image generated by the RSC server 306. Both images are marked with feature points (and connecting lines) for the basic contour of a human face (that is, the relative positions of the head, eyebrows, eyes, nose, mouth, and so on). These contour feature points (and the lines connecting them) are used to check quickly whether the input image is an image of a human face, in order to decide whether to perform the alternative video service.

If the contour feature point check in the face analysis module 322 shows that the input image is not a human face, the system may be configured either to provide a substitute video using a default character or not to provide the alternative video service.

Returning to FIG. 4, when the face analysis module 322 determines that the input image is a human face, the face tracking module 324 tracks in real time the changes, that is, the movement, of the individual feature points of the face portion of the input image, and the motion tracking module 326 tracks in real time the changes of the contour feature points of the input image.

In this regard, FIG. 5B shows an input image and a character image on which individual feature points are additionally marked.

In addition to the contour feature points, the left image is marked with individual feature points that represent the features of the human face contained in the input image (that is, the specific shapes of the head, eyebrows, eyes, nose, mouth, ears, and so on), and the right side shows the character image generated by directly reflecting the relative positions of the contour feature points and the individual feature points. Head movement, changes of facial expression, and the like in the input image cause the positions of the contour feature points and the individual feature points to change, and these positional changes are each tracked in real time by the face tracking module 324 and the motion tracking module 326 and reflected in the character image.

In the present invention, a number of individual feature points are added for each part so that the specific shapes of the eyebrows, eyes, nose, mouth, ears, and so on of the input image can be expressed, which makes it possible to generate a character that faithfully reproduces the features of the input image.

Returning to FIG. 4, the changes in the feature points tracked and supplied by the face tracking module 324 and the motion tracking module 326 are passed to the RSC animation engine 352 of the RSC encoding module 350 as character feature information, which is used to determine the motion and expression of the character for the substitute video.
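
The idea of passing feature point changes rather than the image itself can be sketched as follows. The sketch assumes that feature points are named 2-D coordinates and that the character has points with the same names; the functions `feature_deltas` and `apply_to_character` and the `scale` parameter are hypothetical.

```python
def feature_deltas(prev, curr):
    """Displacement of each feature point between two consecutive frames."""
    return {
        name: (curr[name][0] - prev[name][0], curr[name][1] - prev[name][1])
        for name in curr
        if name in prev
    }


def apply_to_character(character_points, deltas, scale=1.0):
    """Move the character's matching points by the tracked displacements."""
    moved = dict(character_points)
    for name, (dx, dy) in deltas.items():
        if name in moved:
            x, y = moved[name]
            # `scale` maps camera-image distances to character-model distances.
            moved[name] = (x + scale * dx, y + scale * dy)
    return moved
```

For example, if the tracked left mouth corner moves by (2, 1) pixels and the character is drawn at twice the size of the face in the camera image, calling `apply_to_character` with `scale=2.0` moves the character's left mouth corner by (4, 2).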

Meanwhile, the audio decoder 330 decodes the input audio signal and supplies it to the RLS engine 340.

The RLS engine 340 is configured to include a voice analysis module 342 that extracts the voice signal from the input audio signal, a real-time lip-sync module 344 that performs real-time lip sync, and a voice disguise module 346 that modulates the lip-synced voice signal.

First, to extract only the voice from the input audio signal, the voice analysis module 342 filters out the signals in all bands other than the frequencies of the predetermined voice band set in the present invention, and then amplifies the voice band signal.

That is, the human audible frequency range is typically about 20 to 20,000 Hz and the frequency of the voice used in ordinary conversation is about 100 to 5,000 Hz, but in view of the phonetic finding that at least 83% of test subjects clearly recognize the other party's voice even when only voice signals in a band of about 300 to 3,400 Hz are processed in mobile communication, only the signals in the 300 to 3,400 Hz band are passed by the filter.
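
The band limiting and amplification described above can be illustrated with a small sketch. It assumes an idealized brick-wall filter applied in the frequency domain with a plain discrete Fourier transform, which is practical only for short frames; the description does not say how the filter is realized, and the function name `bandpass` and the `gain` parameter are hypothetical.

```python
import cmath
import math


def bandpass(samples, sample_rate, low_hz=300.0, high_hz=3400.0, gain=1.0):
    """Keep only the low_hz..high_hz band of a short frame, then apply a gain."""
    n = len(samples)
    # Forward DFT (O(n^2), acceptable for a short illustrative frame).
    spectrum = [
        sum(samples[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]
    for k in range(n):
        # Bins above n/2 mirror the negative frequencies of a real signal.
        freq = min(k, n - k) * sample_rate / n
        if not (low_hz <= freq <= high_hz):
            spectrum[k] = 0
    # Inverse DFT; the result is real up to rounding error.
    return [
        gain * sum(spectrum[k] * cmath.exp(2j * math.pi * k * t / n)
                   for k in range(n)).real / n
        for t in range(n)
    ]
```

With an 8,000 Hz sample rate and an 80-sample frame, each bin is 100 Hz wide, so a 1,000 Hz component passes unchanged while a 100 Hz hum or a 3,800 Hz tone is removed.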

The 300 to 3,400 Hz frequency band may also contain audio signals other than voice, but their audio characteristics, such as timbre, pitch, and regularity of pattern, differ from those of voice. Using this, the voice analysis module 342 separates out only the voice signal, excluding the frequencies whose audio characteristics differ from a preset voice pattern.

This voice extraction approach exploits the fact that mobile communication mostly handles voice signals whose amplitude and frequency fall within a limited range, so the voice signal alone can be extracted from the audio signal simply and effectively without resorting to the complex voice extraction methods used offline in the prior art.

Next, with the mouth shapes for most of the pronunciations that people use in telephone calls stored in the storage device 307 as a mouth shape database, the real-time lip-sync module 344 extracts the voice parameter DNA unique to each voice contained in the voice signal obtained by the voice analysis module 342, and thereby extracts a lip-sync variable that follows the individual's voice pattern. The extracted lip-sync variable is matched to one of the mouth shapes stored in the mouth shape database.

To extract accurate lip-sync variables, the mouth shapes stored in the mouth shape database must take into account the mouth shapes for every kind of vowel, the way the mouth shape of a vowel changes with the initial and final consonants (for example, even for the vowel "ㅏ" (a), the way the mouth shape changes differs between "가" (ga) and "파" (pa)), and the way the mouth shape changes with characteristics such as linking of sounds. Ultimately, the mouth shapes and their transitions need to be obtained experimentally for all Korean characters and pronunciations (including pronunciations that are difficult to write down but are used in conversation) and for key words, and stored as the mouth shape database.

In addition to the mouth shape data standardized to correspond to each pronunciation within a predetermined reference frequency range set by the system, the mouth shape database also contains mouth shape data varied to correspond to pronunciations higher or lower than the reference frequency.

Accordingly, when differences in individual voice parameter DNA cause, for example, the frequency range of the extracted voice signal to fall outside the reference frequency range, the real-time lip-sync module 344 matches the varied mouth shape data that corresponds to the frequency range of the extracted voice signal, and can thereby extract a lip-sync variable that expresses the mouth shape more accurately.
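
The lookup described above, with standard entries for the reference range and varied entries outside it, can be sketched as a keyed table. This sketch assumes that a single voice frequency value classifies the speaker as below, within, or above the reference range; the range limits, the key layout, and the names `frequency_class` and `match_mouth_shape` are hypothetical.

```python
REFERENCE_RANGE_HZ = (100.0, 250.0)  # assumed limits; not given in the description


def frequency_class(voice_hz, reference=REFERENCE_RANGE_HZ):
    """Classify a voice frequency relative to the reference range."""
    if voice_hz < reference[0]:
        return "low"
    if voice_hz > reference[1]:
        return "high"
    return "reference"


def match_mouth_shape(mouth_db, pronunciation, voice_hz):
    """Pick the varied mouth shape if one exists, else the standard one."""
    key = (pronunciation, frequency_class(voice_hz))
    standard = mouth_db.get((pronunciation, "reference"))
    return mouth_db.get(key, standard)
```

Falling back to the standard entry keeps the lookup usable even when the database has no varied mouth shape for a particular pronunciation.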

Meanwhile, the lip-sync variables extracted and supplied by the real-time lip-sync module 344 are sent to the storage device 307 under the control of the SGN server 304, stored in the data field corresponding to the mobile communication terminal 201 concerned, and accumulated as personalized metadata. In this way, learning data on the voice characteristics of each user is built up. In later substitute video call services using the same mobile communication terminal 201, the lip-sync variables based on the personalized metadata, rather than those corresponding to a standard voice, are used directly as the default voice model, which allows more accurate and faster mouth shape matching.

That is, the voice parameter DNA representing the unique characteristics of the input voice signal is extracted and stored separately as the individual's voice pattern metadata, and the unique characteristics of that voice signal are learned through repeated storage over a predetermined period or a predetermined number of times. Using this, animation sync suited to the individual's voice characteristics can be applied at animation generation time by using the learned metadata instead of standardized morphing.

For example, the mouth shape data applied when the alternative video service is used for the first time is matched against a voice in the reference frequency range. However, when a user whose voice frequency range falls outside the reference frequency range makes substitute video calls for several seconds or more, or several times, lip-sync variables representing the varied mouth shape data that corresponds to the frequency range of the extracted voice signal are extracted and accumulated as personalized metadata. In later substitute video calls by the same user, the varied mouth shape data that fits that user's voice frequency range, rather than the reference frequency range, can therefore be matched immediately.
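
The accumulation of personalized metadata can be illustrated with a small in-memory store. This is only a sketch: it assumes that the learned quantity is a running average of a single voice frequency value per terminal and that a fixed number of observations is required before the personal value replaces the standard one; the class name, the threshold, and the standard value are hypothetical.

```python
class PersonalMetadataStore:
    """Accumulates per-terminal voice observations as personalized metadata."""

    def __init__(self, min_samples=3):
        self.min_samples = min_samples  # observations needed before personalizing
        self._totals = {}               # terminal_id -> (sum_hz, count)

    def record(self, terminal_id, voice_hz):
        """Store one observed voice frequency for the given terminal."""
        total, count = self._totals.get(terminal_id, (0.0, 0))
        self._totals[terminal_id] = (total + voice_hz, count + 1)

    def default_voice_hz(self, terminal_id, standard_hz=180.0):
        """Return the learned value if enough data exists, else the standard."""
        total, count = self._totals.get(terminal_id, (0.0, 0))
        if count < self.min_samples:
            return standard_hz
        return total / count
```

A first-time caller is matched against the standard voice, and once enough observations have been recorded for the same terminal, later calls start directly from the learned value.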

The voice disguise module 346 stores, as a character voice database, various kinds of voices that differ in frequency band, timbre, intonation, and so on (for example, male, female, celebrity, bright voice, gloomy voice, or any other freely designated voice), and allows one of them to be used as the character's voice in place of the caller's actual voice.

The lip-sync variables extracted by the real-time lip-sync module 344 and, optionally, the character voice modulation variables supplied by the voice disguise module 346 are transmitted to the RSC encoding module 350 as character voice information for determining the mouth shape and voice of the character.

The RSC encoding module 350 generates a video by combining the character feature information and character voice information from the RFT engine 320 and the RLS engine 340 with the character content and mouth shape data selected from the content DB of the storage device 307. To this end, it comprises an RSC animation engine (Real-time Substitutive Communications Animation Engine) 352, an image renderer 354, and a video encoder 356.

Because pronunciation characteristics differ from person to person, different callers tend to have slightly different mouth shapes even for the same pronunciation. Even the same person may pronounce the same sound with different mouth shapes at different times, and the mouth shapes for different sounds may be nearly identical. This means that the voice cannot be accurately predicted from the mouth shape alone in an image extracted by face tracking.

Accordingly, the RSC animation engine 352 of the present invention combines, in real time, the character feature information input from the face tracking module 324 and the motion tracking module 326 with the character voice information input from the real-time lip sync module 344 and the voice modulation module 346, thereby generating a character video that includes voice.

One method of generating a character video according to the present invention is as follows. The RSC animation engine 352 loads, from among the characters in the character database of the storage device 307, the character set by the user (or a predetermined default character if the user has set none), and generates an alternative video for transmission by reflecting the feature points included in the character image information received from the real-time face tracking engine 320. It then loads from the storage device 307 the mouth shape data corresponding to the character voice information received from the real-time lip sync engine 340 and corrects the mouth shape of the character in the alternative video for transmission. This correction is performed by adjusting the feature points of the character's mouth in the alternative video to match the mouth shape data.

Alternatively, another method of generating a character video is as follows. The set character and the mouth shape data corresponding to the character voice information received from the real-time lip sync engine 340 are loaded from the storage device 307 and synthesized in the RSC animation engine 352 to generate an alternative video for transmission. The character feature points included in the character image information received from the real-time face tracking engine 320, excluding the mouth portion, are then reflected to supplement the facial expression of the character in the alternative video for transmission. The character feature points for the mouth portion included in the character image information are discarded without being used to generate the alternative video.

With the above methods, the mouth portion of the character image included in the character image information is replaced by the mouth shape provided according to the character voice information. For the sound "ㅏ", for example, the character therefore shows the exact mouth shape corresponding to "ㅏ".
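The effect common to both methods, namely that mouth feature points from face tracking give way to the mouth shape derived from the voice, can be sketched as below. This is a minimal illustration, not the engine's actual implementation: the feature point names in `MOUTH_KEYS`, the dictionary representation, and the function `compose_frame` are hypothetical.

```python
# Hypothetical names for the mouth-region feature points.
MOUTH_KEYS = {"mouth_left", "mouth_right", "upper_lip", "lower_lip"}


def compose_frame(tracked_points: dict, mouth_shape: dict) -> dict:
    """Combine face-tracking feature points with voice-derived mouth shape data.

    Mouth-region points from face tracking are discarded; every other tracked
    point (eyes, brows, head pose, ...) is kept for the character's expression.
    """
    frame = {k: v for k, v in tracked_points.items() if k not in MOUTH_KEYS}
    frame.update(mouth_shape)  # the mouth always comes from the lip sync data
    return frame
```

For instance, if face tracking reports a nearly closed mouth while the lip sync data for the current sound specifies an open one, the composed frame keeps the tracked eye position but uses the open mouth.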

In connection with this mouth shape supplementation, the RLS engine 340 may also apply to the alternative video a standard mouth shape generated, for each phoneme, from standardized mouth shape data corresponding to the reference frequency range. In that case the mouth shape in the alternative video is displayed in a form optimized for the content of the voice information regardless of differences between individual users' mouth shapes. In particular, this allows people who rely on visual information such as mouth shapes, for example the hearing impaired, to use the service of the present invention.

The above description assumes a good mobile communication environment. In an actual mobile communication environment, however, and particularly in urban areas, the transmission conditions for communication signals change from moment to moment owing to movement of the caller, the appearance of obstacles, and the like. It is therefore preferable to generate the character video primarily from the character image information when the quality of the video signal input to the video decoder 310 is good, and primarily from the character voice information when the quality of the audio signal input to the audio decoder 330 is relatively better.

To this end, the RSC animation engine 352 may be provided with means (not shown) that receive signal quality information (QoS) from the video decoder 310 and the audio decoder 330, determine whether the video signal and the audio signal each exceed their respective quality thresholds, and determine which of the two signals is relatively better. Of course, if QoS data is provided by another component in the mobile communication network, this signal determination means need not be provided separately.

The signal determination means may classify a degradation in signal quality as either a temporary loss or a long-term loss. A temporary loss is a case in which some frames of the video signal or the audio signal are lost; a long-term loss is a case in which, owing to a change in the communication environment, the call is cut off or signal reception remains poor for a long time.

When a temporary loss occurs in some frames of the video signal or the audio signal, the RSC animation engine 352 generates the character image and mouth shape based on the signal in which no loss occurred, so that the alternative video can be provided continuously regardless of the temporary loss. For example, if data for some interval of the video signal is lost, the mouth shape of the character image for that interval is supplemented using the character voice information; if data for some interval of the audio signal is lost, the mouth shape of the character image for that interval is supplemented using the character image information.

When signal reception remains poor for a long time, the character image and mouth shape are generated from the video signal alone if the audio signal stays below its quality threshold, and the character mouth shape and character image are generated from the audio signal if the video signal stays below its quality threshold. To generate a character image from the audio signal in this way, the distribution of facial expression feature points appropriate to each pronunciation must be prepared in advance as a database.
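The quality-based choice between the video signal and the audio signal described above might be sketched as follows. This is only an illustration under stated assumptions: the quality scores normalized to the range 0 to 1, the default thresholds of 0.5, and the names `Basis` and `choose_basis` do not appear in the specification, which leaves the form of the QoS information open.

```python
from enum import Enum


class Basis(Enum):
    """Signal that primarily drives character generation (assumed labels)."""
    VIDEO = "video"
    AUDIO = "audio"
    NONE = "none"


def choose_basis(video_q: float, audio_q: float,
                 video_min: float = 0.5, audio_min: float = 0.5) -> Basis:
    """Pick the signal on which the character image and mouth shape are based.

    video_q and audio_q are hypothetical quality scores; video_min and
    audio_min are hypothetical per-signal quality thresholds.
    """
    video_ok = video_q > video_min
    audio_ok = audio_q > audio_min
    if video_ok and audio_ok:
        # Both signals are usable: follow the relatively better one.
        return Basis.VIDEO if video_q >= audio_q else Basis.AUDIO
    if video_ok:
        return Basis.VIDEO  # audio below threshold: video signal only
    if audio_ok:
        return Basis.AUDIO  # video below threshold: audio signal only
    return Basis.NONE       # neither signal is usable
```

The same function covers both a temporary loss (called per interval) and a long-term loss (called with sustained quality values); how the two cases are distinguished in time is left out of this sketch.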

The RSC animation engine 352 also includes synchronization means (not shown) for synchronizing the video signal and the audio signal in order to generate the alternative video. The synchronization means of the RSC animation engine 352 periodically synchronizes the character image and the character voice, and also generates a synchronization signal to synchronize them when image supplementation is performed because of fluctuations in video and audio signal quality as described above.

Next, the image renderer 354 compiles the character image data to reduce the amount of data and, at the same time, synchronizes the character image and the character voice once more so as to optimize the movement of the character image into the most natural pattern. The image renderer 354 may also apply various known image processing methods, such as image interpolation that smoothly interpolates abrupt changes in the character image information.
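The specification does not say which interpolation method the image renderer uses. As one simple possibility, linear interpolation of feature points between two frames is sketched below; the function name `inbetween_frames` and the representation of a frame as a dictionary of 2-D points are assumptions for this sketch.

```python
def inbetween_frames(prev: dict, curr: dict, count: int) -> list:
    """Return `count` evenly spaced intermediate frames between prev and curr.

    Each frame maps a feature point name to an (x, y) tuple. Only points
    present in both frames are interpolated.
    """
    frames = []
    for i in range(1, count + 1):
        frame = {}
        for name, (px, py) in prev.items():
            if name not in curr:
                continue
            cx, cy = curr[name]
            # Linear blend: fraction i / (count + 1) of the way from prev to curr.
            frame[name] = (px + (cx - px) * i / (count + 1),
                           py + (cy - py) * i / (count + 1))
        frames.append(frame)
    return frames
```

Inserting such frames when a feature point jumps abruptly between two consecutive frames spreads the movement over several rendered frames.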

Finally, the video encoder 356 encodes the character alternative video in an appropriate video format supported by the mobile communication network and transmits it.

FIG. 6 shows the sequence of the alternative video service according to the first embodiment of the present invention.

First, in (1) the content upload step, CPs 404 register various character content in advance on the CMS server 406. Users who have subscribed to the alternative video service may also register character content they have created themselves on the CMS server 406 using their wired or wireless terminals 201, 401, 402.

Next, in (2) the content purchase and setting step, as a user subscribes to the alternative video service using his or her wired or wireless terminal 201, 401, 402, the service environment set by the user is registered on the CMS server 406.

Then, in (3) the content ID registration step, the CMS server 406 notifies the HLR/HSS 206 of whether the user has subscribed to the alternative video service, the subscriber's terminal information, and the like, notifies the SGN server 304 of the service registration details, and registers the ID of the content registered by the user on the RSC server 306, thereby preparing to provide the alternative video service.

Now, when an alternative video service subscriber requests an alternative video call using his or her terminal in (4) the call attempt step, the originating MSC 204, the HLR/HSS 206, and the SGN server 304 cooperate in (5) the call setup request/response step to confirm the service subscription and then set up a call for the alternative video service. In (6) the alternative video call processing step, the RSC server 306 generates the alternative video and transmits it to the terminating MSC 204, so that in (7) the alternative video call connection step an alternative video call is carried out in real time between the caller 201, 401 and the called party 201.

If a call with the called party is not possible, or if the caller wishes to create a voice message or text message and send it as an alternative video message, an MMSC (not shown) cooperates with the SGN server 304 and the LSS server 308 to generate an alternative video MMS using the character image set by the caller, who is an alternative video service subscriber, and transmits it to the called party's mobile communication terminal.

The alternative video call via the RSC server 306 is a real-time service that provides the alternative video by analyzing, in real time, the video and audio signals input during the call. The alternative video service using the LSS server 308 differs in that it is a non-real-time service that provides the alternative video by analyzing, file by file, a voice message that was created and stored in advance (or a voice message converted from a text message).

FIG. 7 shows the call processing sequence in the alternative video service method according to the first embodiment of the present invention.

For the alternative video service of the present invention, the user subscribes to the service in advance and sets the character to be used for alternative video calls (S310). At this step the user may also upload a character that he or she has created and set it for use in alternative video calls.

Next, the service subscriber calls the other party from the originating terminal and makes an alternative video call setup request (S320). The request is made by entering the called terminal's telephone number together with a specific service code, or by entering the called terminal's telephone number together with a softkey or hotkey provided on the originating terminal.

The alternative video call setup request is then transmitted via the originating MSC, and an HLR/HSS (not shown) connected to the originating MSC checks whether each terminal is subscribed to the alternative video service and whether it is capable of video calls. If the alternative video service is available, the call setup request is forwarded to the RSC platform via the originating G/W. After preparing for the alternative video call, including user authentication and activation of the user-set character (S322), the RSC platform transmits the call setup request to the mobile communication network used by the called terminal. The called terminal receives the alternative video call setup request via the terminating G/W and terminating MSC (S324), and, by the called terminal's alternative video call connection response (S328), call setup for the alternative video call is completed (S328).

Once call setup is complete, a real-time alternative video call is carried out between the originating and called terminals. When the originating terminal transmits the actual video via the originating MSC (S360), the originating G/W converts it into a format that the RSC platform can process (S330) and forwards it to the RSC platform. The RSC platform generates the alternative video corresponding to the actual video (S340) and delivers it to the G/W of the terminating network. The terminating G/W converts it into media of a format appropriate to the communication scheme of that network (S350) and transmits the alternative video to the called terminal via the terminating MSC (S370). The called terminal responds by transmitting the user's actual video (S380), which is converted into an alternative video by the RSC platform and transmitted to the originating terminal (S390), and the alternative video call proceeds.

The so-called media gateways shown in FIG. 7, such as the originating G/W and terminating G/W, are components for interworking heterogeneous communication networks. They include the transmission protocols, media codecs, and the like applied to each network, and convert media data transmitted from the originating network into media data of a form supported by the terminating network.

Accordingly, when the alternative video system of the present invention is applied to a communication network whose core network has no such media gateway, the RSC platform must be provided with means for converting the alternative video it generates into a format suitable for the network used by the called terminal.

An alternative video call may also be requested through a softkey or hotkey on the terminal during an ordinary call between the originating and called terminals. In that case, the alternative video call preparation step (S322) of the alternative video call setup phase of FIG. 7 and all steps of the real-time alternative video call phase are executed immediately.

The present invention may also be applied to terminals of a type that does not support video calls (for example, 2G terminals). FIGS. 8A and 8B illustrate, by way of example, alternative video service methods between a 3G terminal capable of video calls and a 2G terminal that does not support them.

FIG. 8A shows an alternative video service method between the 3G terminal of a caller who has subscribed to the alternative video service and the 2G terminal of a called party who has not. When the originating terminal requests an alternative video call, the HLR 206 managing the originating terminal cooperates with the called terminal's HLR to check whether the called terminal is capable of video calls. Because the called terminal is found to be a 2G terminal, the service is set to provide the caller-designated alternative video to the originating terminal, and the alternative video is provided to the originating terminal in accordance with the voice input from the called terminal.

FIG. 8B shows an alternative video service method between the 2G terminal of a caller who has subscribed to the alternative video service and the 3G terminal of a called party. The caller cannot receive an alternative video on his or her own terminal, but can designate the alternative video of himself or herself that is to be transmitted to the other party's terminal. When the originating terminal (2G) requests an alternative video call, the HLR 206 managing the originating terminal confirms that the called terminal is a 3G terminal; the caller's voice information is then combined with the alternative video character designated by the caller to generate an alternative video, which is transmitted to the called terminal.

If the called party is also an alternative video service subscriber, the called party may specify that the character image he or she has set be used as the other party's alternative video, regardless of the alternative video designated by the other party. If a caller-designated alternative video and a called-party-designated alternative video conflict within a single call, it is preferable to give priority to the called-party-designated alternative video.

In summary, an alternative video service subscriber whose terminal supports video calls can designate both his or her own alternative video and the other party's alternative video, whereas a subscriber whose terminal does not support video calls can designate only the alternative video of himself or herself that is transmitted to the other party.
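The designation rules summarized above, together with the preference for the called party's designation on conflict, can be expressed compactly. The sketch below is illustrative only; the function names `designation_rights` and `character_for_caller` and the labels "own" and "peer" are hypothetical.

```python
from typing import Optional


def designation_rights(is_subscriber: bool, supports_video_call: bool) -> frozenset:
    """Return which alternative videos a user may designate.

    "own"  = the user's own alternative video sent to the other party.
    "peer" = the alternative video representing the other party on the
             user's own screen.
    """
    if not is_subscriber:
        return frozenset()
    if supports_video_call:
        return frozenset({"own", "peer"})
    # Without video call support the user cannot view any video,
    # so only the outgoing alternative video can be designated.
    return frozenset({"own"})


def character_for_caller(caller_choice: Optional[str],
                         callee_choice: Optional[str]) -> Optional[str]:
    """Character shown on the called party's screen to represent the caller.

    When both parties have designated one, the called party's choice wins.
    """
    return callee_choice if callee_choice is not None else caller_choice
```

For example, a subscriber with a 2G terminal gets `frozenset({"own"})`, matching the case of FIG. 8B.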

FIG. 9 shows the display configuration of a terminal for alternative video calls according to the first embodiment of the present invention.

During an alternative video call, the display 710 of the terminal shows, alongside the usual softkeys, a softkey 720 for requesting alternative video call setup. In addition to the usual buttons 740, the keypad of the terminal may be provided with a separate hotkey 730 for requesting alternative video call setup.

The terminal screens in FIG. 9, beginning with the first on the left, show the process in which an alternative video call setup request is made during a video call using the softkey 720, so that the alternative video is transmitted and displayed as the main image, and the process in which the actual video is again displayed as the main image on returning to an ordinary video call.

Note that in FIG. 9 the other party's video 712 (which may be an actual video or an alternative video), the user's own actual video 713, and the user's own alternative video 714 are displayed on the terminal screen at the same time. Since ordinary terminals are equipped with image input means such as a camera, displaying the user's own actual video is commonplace. To additionally display the other party's video and the user's own alternative video at the same time as in FIG. 9, however, a 3G or later terminal is required that implements, in addition to a technique for splitting the display screen, either a technique for generating the alternative video on the terminal itself or a technique for allocating two transmit and receive channels to a single terminal.

First, consider the case in which a single terminal has two or more transmit and receive channels. A terminal used in a WCDMA network, which is a type of 3G network, supports dual-channel technology. During an alternative video call, the RSC platform can therefore transmit the caller's alternative video and the called party's actual video (or alternative video) to the originating terminal over two channels, namely a voice traffic channel and a data traffic channel. As shown in FIG. 9, both the other party's video and the user's own alternative video can thus be displayed on a single terminal.

Even when the alternative video is generated and transmitted by the terminal itself rather than by the RSC platform, the other party's video, the user's own alternative video, and the user's actual video can all be displayed on a single terminal as in FIG. 9. This is described in detail in the second embodiment of the present invention.

FIG. 10 is a block diagram of an alternative video service system according to a second embodiment of the present invention.

FIG. 10 differs from the first embodiment in that the RSC service platform is separated from the mobile communication network and implemented in the mobile communication user's terminal 1000 (although, in the second embodiment as well, the CMS server 406 for content registration and management must be provided within the mobile communication network). The remaining configuration is the same or similar.

FIG. 11 shows the configuration of a mobile communication terminal equipped with an RSC module, used in the alternative video service system according to the second embodiment of the present invention.

The RSC-module-equipped mobile communication terminal 1000 of FIG. 11 comprises: a camera unit 1120 for capturing the user's actual video; an audio unit 1140 including a microphone and a speaker for audio signal input and output; a key input unit 1160 provided with a softkey and a hotkey for requesting the alternative video service; a display unit 1180 for displaying the actual video and the alternative video; an RSC module 1300 that generates the alternative video; a memory 1140 consisting of a data memory that stores data including the alternative video characters and a program memory that stores the terminal operating program, including the software program for performing the alternative video generation function; a transmitter (Tx) 1220 and a receiver (Rx) 1240 responsible for transmitting and receiving signals, respectively; a duplexer 1260 connected to the antenna to separate the transmit and receive frequencies; and a control unit 1100 consisting of at least one microprocessor that is connected to the other components and controls their functions.

The RSC module 1300 corresponds to the RSC server of the first embodiment. It comprises a video decoder 1310 and an RFT engine 1320 for video signal processing; an audio decoder 1330 and an RLS engine 1340 for audio signal processing; and an RSC encoding module 1350 that combines the character feature information from the RFT engine 1320 and the character voice information from the RLS engine 1340 with the set character stored in the memory 1140 to generate the alternative video, renders it, and converts it into a format suited to the transmission bandwidth and transmission rate of the mobile communication network.

FIG. 12 shows the call processing sequence in the alternative video service method according to the second embodiment of the present invention.

The alternative video service of the second embodiment must be preceded by an alternative video service subscription phase (not shown), in which the user subscribes to the service in advance and either creates the character to be used for alternative video calls or downloads it from the CMS server 406 (or a storage device connected to the CMS server), and stores it on his or her terminal.

Next, a real-time alternative video call is carried out between the calling terminal and the called terminal in response to an alternative video call setup request from the calling terminal (S1322). Because the calling terminal is equipped with an RSC module, it generates the substitute image itself (S1320) and then requests call setup. The calling-side G/W converts the substitute image into a format suitable for communication between G/Ws (S1324) and forwards it to the called-side G/W, which converts the received substitute image into a format suitable for its own network (S1326) and transmits it to the called terminal via the called MSC.

The called terminal then receives the substitute image from the calling terminal (S1328) and, in response, generates a substitute image of the called party itself (S1330). It transmits an alternative video call connection response signal containing that image to the calling terminal by way of the called MSC, called G/W, calling G/W, and calling MSC, with conversion to an appropriate media format along the way (S1332, S1334). The calling terminal receives the image and plays it back (S1338), and the alternative video call begins.

In the process above, the RSC service platform, apart from the CMS server, plays no part in connecting the alternative video call. However, if one of the two terminals is a 2G or 3G terminal without a substitute image generation function, the RSC service platform steps in to generate and transmit the substitute image, so the alternative video service is ultimately carried out as a combination of the first and second embodiments.
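The choice of where each party's substitute image is generated can be sketched as a small decision function. The names below are hypothetical, and the case in which neither terminal has an RSC module is assumed here to fall back to the server-based first embodiment; the text above only states the mixed case explicitly.

```python
from enum import Enum


class Generator(Enum):
    TERMINAL = "terminal RSC module"
    PLATFORM = "RSC service platform"


def generator_for(has_rsc_module: bool) -> Generator:
    """Pick where one party's substitute image is produced."""
    return Generator.TERMINAL if has_rsc_module else Generator.PLATFORM


def call_mode(caller_has_rsc: bool, callee_has_rsc: bool) -> str:
    """Classify the call by which embodiment(s) it exercises."""
    if caller_has_rsc and callee_has_rsc:
        return "second embodiment"
    if not caller_has_rsc and not callee_has_rsc:
        # Assumption: with no terminal-side module, the platform does all the work.
        return "first embodiment"
    return "combined first and second embodiments"
```

For a call between an RSC-equipped terminal and a legacy terminal, `call_mode(True, False)` reports the combined form, and `generator_for(False)` shows that the legacy side's image comes from the platform.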

When the RSC module is implemented in the mobile communication terminal, as in the second embodiment, substitute image generation proceeds independently of changes in the mobile communication environment, so no assistance from signal quality determination means or the like is needed when generating the substitute image.

In addition, in the second embodiment, as in FIG. 8, the substitute image generated by the terminal's own RSC module can be displayed at the same time as the user's actual image input to the mobile communication terminal. The user can therefore observe not only his or her actual image but also how the substitute image delivered to the other party is being rendered, and can switch the transmitted image between the actual image and the substitute image as desired. Through repeated use of the service, the user can also learn which forms of actual image work well with the alternative video service and which do not (for example, the head rotation angle, speed of movement, and distance between the terminal and the face for which substitute image generation is possible), and can thus use the service more effectively by supplying an actual image of a suitable form.

In the second embodiment, the components for video signal processing, audio signal processing, and substitute image generation are all mounted in the RSC module of the mobile communication terminal. As a configuration intermediate between the first and second embodiments, however, only the video processing unit consisting of the video decoder 1310 and the RFT engine 1320 of FIG. 11 may be configured in the user terminal as the RSC module 1300, while the audio decoder 1330, the RLS engine 1340, and the RSC encoding module are implemented in an RSC server within the mobile communication network. The third embodiment concerns this case, in which the substitute image generation function is suitably divided between the terminal and the server.

FIG. 13 shows the call processing sequence in the alternative video service method according to the third embodiment of the present invention.

Needless to say, preliminary steps such as subscribing to the alternative video service must also be carried out in the third embodiment.

Next, when the mobile communication terminal requests alternative video call setup (S1210) and transmits the character feature information extracted using its own video decoder and RFT engine (S1205), the RSC server decodes the audio signal sent by the mobile communication terminal, extracts the character voice information, and combines it with the received character feature information and the configured character to generate a substitute image. The generated substitute image is received by the called terminal via the called G/W and the called MSC (S1250).

The called terminal then generates character feature information in the same manner (S1260) and transmits an alternative video call connection response signal (S1270). Substitute image generation (S1290) and the necessary media conversion (S1280, S1295) take place within the mobile communication network, and the result is received back at the calling terminal. The calling terminal plays the received substitute image (S1230), and the alternative video call is thereby carried out.

In the third embodiment, the mobile communication terminal processes the video signal and sends only the character feature information to the mobile communication network, so the bandwidth and transmission rate constraints that come with transmitting the video signal itself are greatly relaxed. Because the bandwidth needed for the audio signal is much smaller than that needed for the video signal, the audio signal can be sent to the mobile communication network unprocessed and handled by the RSC server without causing bandwidth or transmission rate problems.
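A rough back-of-the-envelope calculation illustrates the bandwidth argument. Every number below (feature point count, coordinate precision, frame rate, and the video bit rate used for comparison) is an illustrative assumption and does not come from this document.

```python
def stream_bps(values_per_frame: int, bytes_per_value: int, fps: int) -> int:
    """Bit rate of an uncompressed stream of fixed-size values."""
    return values_per_frame * bytes_per_value * fps * 8


# Hypothetical feature stream: 30 tracked points, (x, y) each,
# quantized to one byte per coordinate, sent 10 times per second.
FEATURE_POINTS = 30
feature_bps = stream_bps(FEATURE_POINTS * 2, bytes_per_value=1, fps=10)

# Hypothetical bit rate budget for the compressed video signal itself.
ASSUMED_VIDEO_BPS = 48_000

# How many times smaller the feature stream is under these assumptions.
ratio = ASSUMED_VIDEO_BPS / feature_bps
```

Under these assumed figures the feature stream needs 4,800 bit/s, a tenth of the assumed video budget. The actual saving depends entirely on how many feature points are tracked and how they are coded, which the description does not specify.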

As a fourth embodiment, one can envisage a case in which both the video signal processing unit and the audio signal processing unit are implemented as an RSC module in the mobile communication terminal. In this case the RSC server would consist only of an RSC encoding module comprising an RSC animation engine, a renderer, and a video encoder. A person skilled in the art will be able to understand the features, advantages, and disadvantages of the fourth embodiment by comparison with the first, second, and third embodiments described above, so its illustration and description are omitted.

Although the present invention has been described in detail with reference to preferred embodiments, those skilled in the art to which the invention pertains should understand that it can be practiced in other specific forms without changing its technical idea or essential features, and that the embodiments described above are illustrative in all respects and not restrictive.

The scope of the present invention is defined by the claims that follow rather than by the detailed description above, and all changes or modifications derived from the meaning and scope of the claims and their equivalents should be construed as falling within the scope of the present invention.

According to the present invention, there is provided an alternative video service method and system using a mobile communication network which, in order to offer a service optimized for mobile networks where real-time behavior is important, analyzes the subject's voice and image in real time during a video call and replaces them with a character, thereby providing a higher-quality video call service.

Also according to the present invention, there is provided an alternative video service method and system using a mobile communication network which, in order to offer a service optimized for mobile networks where the data transmission state changes rapidly with the communication environment, lets the audio signal and the video signal complement each other according to their respective quality, so that substitute images can be generated and transmitted continuously despite changes in the communication environment.

Also according to the present invention, there is provided an alternative video service method and system using a mobile communication network which generates the substitute image primarily from the video signal and supplements its mouth shape with a standardized mouth shape based on the audio signal, so that a more expressive substitute image can be generated and transmitted.
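One way to picture how the video and audio signals complement each other is a merge in which video-derived features take priority for the face as a whole, the audio-derived standardized mouth shape overrides the mouth, and stored defaults fill in whatever is missing. The sketch below is an assumption about how such a merge could look; the dictionary keys and the function name are hypothetical.

```python
from typing import Dict, Optional


def compose_face(
    default_face: Dict[str, str],
    video_features: Optional[Dict[str, str]],
    audio_mouth: Optional[str],
) -> Dict[str, str]:
    """Merge face parameters for one frame of the substitute image.

    video_features is None when the video signal is unusable;
    audio_mouth is None when the audio signal is unusable.
    """
    face = dict(default_face)        # start from the character's defaults
    if video_features is not None:
        face.update(video_features)  # video drives contour and face
    if audio_mouth is not None:
        face["mouth"] = audio_mouth  # audio supplies the standardized mouth shape
    return face
```

With both signals present, the mouth comes from audio and everything else from video. If the video is lost, the defaults stand in for the rest of the face, and if the audio is lost, the video-derived mouth is kept, so a frame can always be produced.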

Also according to the present invention, there is provided an alternative video service method and system using a mobile communication network which generates and transmits the substitute image with emphasis on the mouth shape corresponding to pronunciation, so that even hearing-impaired users can take part in alternative video calls.

Also according to the present invention, there is provided an alternative video service method and system using a mobile communication network which, taking into account that a mobile communication terminal is assigned almost exclusively to a single mobile user, generates personal metadata learned for each mobile user, so that substitute images can be generated and provided more simply and quickly.
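Per-user learning of this kind can be illustrated with a store that accumulates one voice parameter per user and returns it as the starting model on later calls. The document does not specify the learning rule or the parameters involved; the running mean and the use of pitch in hertz below are assumptions chosen only to keep the sketch small.

```python
from typing import Dict, Tuple


class PersonalVoiceStore:
    """Keeps a running mean of one voice parameter per user (hypothetical design)."""

    def __init__(self) -> None:
        # user_id -> (number of samples, running mean)
        self._stats: Dict[str, Tuple[int, float]] = {}

    def learn(self, user_id: str, pitch_hz: float) -> None:
        """Fold one more observation into the user's personalized metadata."""
        count, mean = self._stats.get(user_id, (0, 0.0))
        count += 1
        mean += (pitch_hz - mean) / count  # incremental mean update
        self._stats[user_id] = (count, mean)

    def default_model(self, user_id: str, fallback_hz: float) -> float:
        """Return the learned value, or the system default for unknown users."""
        stats = self._stats.get(user_id)
        return stats[1] if stats else fallback_hz
```

Because a terminal is used almost exclusively by one person, keying the store by terminal or subscriber is enough to let later calls skip most of the initial adaptation.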

Also according to the present invention, there is provided an alternative video service method and system using a mobile communication network which lets users check, during an alternative video call, the transmission state of their own substitute image as delivered to the other party.

Also according to the present invention, there is provided an alternative video service method and system using a mobile communication network which can provide the alternative video service even in a call with an existing 2G terminal that offers no video communication function.

Also according to the present invention, there is provided a real-time alternative video service method and system with a structure applicable to third-generation communication environments, in which diverse networks such as WCDMA, HSDPA, IP-TV, wired telephone networks, and the Internet can interwork, and to more advanced communication environments beyond them.

Claims

In the alternative video service system using a mobile communication network,

An RSC server is connected to the mobile communication network, the RSC server including: a video processor that performs a real-time face tracking function, extracting image feature information representing changes in the feature points of an image by analyzing a received video signal; an audio processor that performs a real-time lip sync function, extracting voice information by analyzing a received audio signal; and an RSC encoding module that combines the image feature information and the voice information with character content to generate, in real time, a substitute image in the form of a moving picture,

The audio processor includes an audio decoder that decodes the received audio signal, and an RLS engine that extracts a voice signal from the decoded audio signal and performs a real-time lip sync.

The RLS engine includes a voice analysis module that filters the input audio signal and extracts only signals within a predetermined frequency range as the voice signal, and an RTLS module that matches the voice signal extracted by the voice analysis module against a mouth shape information database to extract a lip sync variable in real time,

The lip sync variable is transmitted to the RSC encoding module as voice information of the character for the substitute video service.

The RTLS module classifies, for each user terminal, the lip sync variables generated according to the voice parameter DNA included in the character voice information and repeatedly stores them as personalized metadata, and, when an audio signal is later received from the same user terminal, extracts the lip sync variable from the received audio signal using the learned personalized metadata as a default voice model.

The method of claim 1,

The video processor includes a video decoder for decoding the received video signal, and an RFT engine for extracting a change of contour feature points and individual feature point information of a reconstructed image included in the decoded video signal.

The RFT engine includes: a face analysis module that compares the feature point information of the reconstructed image with feature point information for the basic contour of a human face to determine whether the reconstructed image is a human face image; a face tracking module that, when the reconstructed image is determined to be a human face image, tracks changes in the individual feature points of the face portion in real time; and a motion tracking module that, when the reconstructed image is determined to be a human face image, tracks the feature point information of the contour of the face portion in real time,

The change of the feature points tracked by the face tracking module and the motion tracking module is transmitted to the RSC encoding module as feature information of a character for a substitute video service.

delete

The method of claim 1,

The RSC encoding module may include: an RSC animation engine configured to receive the character feature information and the character voice information and combine them with the substitute image service character to generate a substitute image in real time, and a renderer to render the generated substitute image; And a video encoder for encoding and transmitting the rendered substitute image.

The method of claim 4, wherein

The mouth shape information database matched in the RTLS module includes mouth shapes for all kinds of vowels, variations of the vowel mouth shapes according to the initial and final consonants, data on how the mouth shape changes as each character is composed, and mouth shape data that varies with the frequency range of the audio signal; alternative video service system using a mobile communication network.

The method of claim 5, wherein

The mouth shape information database matched by the RTLS module stores mouth shape data normalized to voice in the reference frequency range set by the system, together with mouth shape data modified for voice in frequency ranges higher or lower than the reference frequency range, and, if the frequency range of the voice signal extracted by the voice analysis module differs from the reference frequency range, the lip sync variable is extracted according to the frequency range of the input voice signal; alternative video service system using a mobile communication network.

delete

The method of claim 1,

The RLS engine further includes a voice modulation module for modulating the voice signal lip-synced in the RTLS module, wherein the voice modulation module generates a character voice modulation variable corresponding to various voices having different frequency bands and tones, respectively. And providing the character voice information to the RSC encoding module together with the lip sync parameters extracted from the RTLS.

The method of claim 8,

The RSC encoding module generates a contour and a face of the substitute image based on the image feature information, and corrects a mouth shape of the substitute image by using a lip sync parameter included in the voice information. Alternative video service system using communication network.

The method of claim 8,

The RSC encoding module generates a mouth shape of the substitute image based on the voice information, and generates a face portion of the substitute image except the mouth shape based on the image feature information. Alternative video service system.

The method according to claim 9 or 10,

And an SGN server interworking with the RSC server to perform call processing and replacement video service management for a replacement video call, and having replacement video service information including replacement video service subscriber information and character information for each subscriber. Characterized in that, alternative video service system using a mobile communication network.

The method of claim 11,

Connected to user terminals using a WEB / WAP server, and performs registration of a substitute video service character and subscription of a substitute video service requested from the user terminal, and registers the substitute video service subscription and a substitute video service character. A CMS server for transmitting information about the SGN server; And

A storage device for storing the registered character; alternative video service system using a mobile communication network, characterized by further comprising the above.

The method of claim 12,

MMSC for generating, storing and transmitting multimedia messages; And

An LSS server having a message audio processing unit that performs a non-real-time lip sync function, analyzing the audio signal included in a message stored in the MMSC to extract message voice information, and an alternative video message encoding module that combines the message voice information with character content to generate a video character in non-real time and transmits the generated video character to the MMSC.

The method of claim 13,

The RSC server, the LSS server, the SGN server, the CMS server and the storage device constitute an RSC service platform, and the RSC service platform further includes a media gateway for interworking with the mobile communication network. , Alternative video service system using mobile communication network.

Using a mobile communication network that provides an alternative video service to an alternative video service system including an alternative video service platform connected to a core network of a mobile communication network including a base station, a controller, a switch, a location register, and a multimedia message center through a gateway. In the alternative video service method,

The alternative video service platform,

An RSC server including: a video processor that performs a real-time face tracking function, restoring and analyzing a video signal received in a compressed state to extract changes in the feature point information of the image; an audio processor that performs a real-time lip sync function, restoring and analyzing an audio signal received in a compressed state to extract voice information; and an RSC encoding module that combines the image feature information and the voice information with character content to generate, in real time, a substitute image in the form of a moving picture;

Interworking with the RSC server to perform a call processing and replacement video service management for a replacement video call call, and has a replacement video service information including the replacement video service subscriber information and character information for each subscriber, and the alternative video service subscriber list An SGN server notifying the location register;

A storage device for storing the registered character,

The alternative video service method,

Determining, by the location register, whether to subscribe to an alternative video service according to a request for establishing an alternative video call from a mobile terminal;

If the determination shows that the calling terminal belongs to an alternative video service subscriber, establishing an alternative video call in the core network and having the RSC server activate the character registered for the calling terminal in preparation for the alternative video service;

Generating a substitute image in real time by combining the image characteristic information and the audio information extracted from the video signal and the audio signal input from the calling terminal from the RSC server with the activated character; And

And performing a video call using a real-time substitute image by transmitting the substitute image to the called terminal through the core network in the RSC server.

The method of claim 15,

The alternative video service platform may include a message audio processor configured to perform a non-real time lip sync function by analyzing an audio signal stored in a multimedia message and extract message voice information, and combine the message voice information with character content to display a video character in real time. It further comprises an LSS server having an alternative video message encoding module for generating and transmitting the generated video character to the MMSC,

The alternative video service method,

Transmitting a multimedia message from the MMSC to the LSS server;

Extracting message voice information included in the multimedia message from the LSS server and combining the message voice information with a character registered for the calling terminal to generate a substitute image which is a video character;

Transmitting the substitute image to the MMSC at the LSS server; And

The MMSC further comprises the step of transmitting a replacement video message combined with the replacement video to the called terminal, alternative video service using a mobile communication network.

The method of claim 16,

The calling terminal has a hot key for the alternative video service, the hot key being implemented as at least one of a soft key and a terminal button, and the request to establish an alternative video call from the calling terminal is made by pressing the hot key provided on the calling terminal; alternative video service method using a mobile communication network.

The method of claim 17,

The storage device stores standardized mouth shapes for the registered character and default information on character facial features according to each mouth shape.

The calling terminal is a terminal of an alternative video service subscriber and does not support a video call, and the called terminal is a terminal supporting a video call.

The generating of the substitute image in real time by the RSC server may include generating the substitute image in real time by combining the audio information extracted from the audio signal input from the calling terminal with the default information. Alternative video service method using a mobile communication network.

The method of claim 18,

The called terminal is a terminal of an alternative video service subscriber and supports a video call.

The alternative video service method,

Generating a substitute video in real time by combining the video feature information and the audio information extracted from the video signal and the audio signal input from the calling terminal with the character registered for the called terminal in the RSC server; And

In the RSC server, in real time by ignoring the replacement image generated by the character registered for the calling terminal, and transmitting the replacement image generated by the character registered for the called terminal to the called terminal through the core network. And performing a video call using the replacement video.

The method according to any one of claims 15 to 19,

Classifying the voice information used in generating the substitute image by the RSC server for each calling terminal and repeatedly storing and learning the voice parameter DNA included in the voice information as personalized metadata in the storage device; And

And using the learned personalized metadata as a default voice model when generating a replacement video for the same originating terminal in the future.

A camera unit for acquiring an actual image of a user, an audio unit including a microphone and a speaker in charge of input / output of an audio signal, a key input unit including at least one of a softkey and a hotkey for requesting an alternative image service, and the actual A display unit for displaying an image and a substitute image, an RSC module for generating the substitute image, a data memory storing data including the character for the substitute image and standard mouth shape information, and performing the substitute image generating function A program memory for storing a terminal operating program including a software program for performing the operation, a transceiver for transmitting and receiving transmission signals, and at least one microprocessor connected to the components to control the functions of the components. It includes a control unit is configured,

The RSC animation engine generates an outline and a face of the substitute image based on character feature information from the RFT engine when data of a part or the whole of an audio signal is lost, and character voice information from the RLS engine. Using to correct the shape of the mouth of the substitute image,

The RSC animation engine generates a mouth shape of the substitute image based on the voice information when data of a part or all sections of a video signal is lost, and the mouth shape based on character feature information from the RFT engine. A mobile communication terminal for a replacement video service, characterized in that for generating a face portion of the replacement video except for.

The method of claim 21,

The RSC module,

A video decoder for processing the video signal;

An RFT engine generating character feature information for the substitute image by extracting feature points of the real image included in the video signal;

An audio decoder for processing the audio signal;

An RLS engine configured to generate character voice information for determining a mouth shape of the character for the substitute image by extracting voice information included in the audio signal and matching the standard mouth shape information;

An RSC animation engine for combining the character feature information from the RFT engine and the character voice information from the RLS engine with the character for the substitute image stored in the data memory to generate the substitute image;

A renderer for rendering a substitute image generated by the RSC animation engine; And

And a video encoder for encoding the substitute image rendered by the renderer.

The method of claim 22,

And the control unit simultaneously displays the actual image of the user acquired by the camera unit and the substitute image of the user generated by the RSC module on the display unit.

The method of claim 23,

And the control unit additionally displays at least one of an actual image and a substitute image of a call counterpart received through the transceiver, on the display unit.

The method of claim 24,

The RLS engine repeatedly stores the voice parameter DNA included in the extracted voice information in the data memory as the user's metadata, and uses the personalized metadata learned through this repeated storage as a default voice model when voice information is subsequently extracted; mobile communication terminal for an alternative video service.

delete

In the alternative video service system using a mobile communication network,

A mobile communication terminal including an RSC module that performs character feature information extraction, the RSC module including a video decoder for processing a video signal obtained from a camera and an RFT engine that generates character feature information for a substitute image by extracting feature points of the actual image contained in the video signal; and

An RSC server connected to a core network of the mobile communication network, the RSC server including an audio processing unit that performs a real-time lip sync function, restoring and analyzing audio signals received in a compressed state from the mobile communication terminal to extract voice information, and an RSC encoding module that generates, in real time, a substitute image in the form of a moving picture by combining the voice information with the character feature information for the substitute image from the mobile communication terminal,

And repeatedly storing the voice parameter DNA included in the extracted voice information in the data memory as user metadata, and using personalized metadata learned by the storage as a default voice model when extracting voice information in the future. Alternative video service system using mobile communication network.

The system of claim 29,

wherein the RFT engine includes a face tracking module that compares the feature point information of the reconstructed image with pre-stored feature point information of the basic contour of a human face to determine whether the reconstructed image is a human face image and, when the reconstructed image is determined to be a human face image, tracks the changes of the individual feature points of the face in real time, and a motion tracking module that tracks the feature point information of the contour of the face in real time when the reconstructed image is determined to be a human face image, and

wherein the changes of the feature points tracked by the face tracking module and the motion tracking module are transmitted to the RSC encoding module of the RSC server as feature information of the character for the alternative video service.
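The two steps named in the claim above, checking feature points against a stored face contour and then tracking how the points change, can be sketched in Python as follows. This is only an illustration under stated assumptions: feature points are 2D coordinates in a fixed order, the comparison is a mean point-to-point distance after removing position and scale, and the threshold value is arbitrary. The patent does not disclose the comparison metric, and the function names `normalize`, `is_face`, and `feature_changes` are hypothetical.

```python
import math


def normalize(points):
    """Center points on their centroid and scale them to unit RMS radius."""
    cx = sum(x for x, _ in points) / len(points)
    cy = sum(y for _, y in points) / len(points)
    centered = [(x - cx, y - cy) for x, y in points]
    rms = math.sqrt(sum(x * x + y * y for x, y in centered) / len(centered))
    scale = rms or 1.0  # avoid division by zero for degenerate input
    return [(x / scale, y / scale) for x, y in centered]


def is_face(points, template, threshold=0.2):
    """Assumed test: mean distance to the stored contour is within a threshold."""
    if not points or len(points) != len(template):
        return False
    a, b = normalize(points), normalize(template)
    mean_error = sum(math.dist(p, q) for p, q in zip(a, b)) / len(a)
    return mean_error <= threshold


def feature_changes(previous, current):
    """Per-point displacement between two frames (the tracked 'change')."""
    return [(cx - px, cy - py) for (px, py), (cx, cy) in zip(previous, current)]


# Hypothetical four-point contour template and two candidate frames.
template = [(0, 0), (2, 0), (1, 1), (1, 2)]
shifted_scaled = [(10 + 3 * x, 20 + 3 * y) for x, y in template]
collinear = [(0, 0), (0, 1), (0, 2), (0, 3)]

face_ok = is_face(shifted_scaled, template)
face_rejected = is_face(collinear, template)
deltas = feature_changes([(0, 0), (1, 1)], [(1, 0), (1, 3)])
```

Only the displacements would need to be sent to the server in such a design, which is consistent with the claim transmitting the change of the feature points rather than the image itself.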

The system of claim 30,

wherein the audio processor includes an audio decoder for restoring the audio signal received in the compressed state, and an RLS engine for receiving the audio signal from the audio decoder and extracting a voice signal to perform real-time lip syncing, and

wherein the lip sync variable is transmitted to the RSC encoding module as voice information of the character for the alternative video service.
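The claim above does not say how a lip sync variable is computed from the decoded audio. One simple possibility, shown here purely as an assumption, is to map the loudness of each short audio frame to a mouth-openness value between 0 and 1. The function name `lip_sync_variables` and the parameters `frame_size` and `full_open_rms` are hypothetical.

```python
import math


def lip_sync_variables(samples, frame_size=160, full_open_rms=0.5):
    """Map each complete audio frame to a mouth-openness value in [0, 1].

    Assumption: samples are floats in [-1, 1] produced by the audio decoder,
    and openness is proportional to the frame's RMS level, clipped at 1.0.
    """
    variables = []
    for start in range(0, len(samples) - frame_size + 1, frame_size):
        frame = samples[start:start + frame_size]
        rms = math.sqrt(sum(s * s for s in frame) / frame_size)
        variables.append(min(1.0, rms / full_open_rms))
    return variables


# Three frames: silence, moderate level, and full-scale signal.
decoded = [0.0] * 160 + [0.25] * 160 + [1.0] * 160
openness = lip_sync_variables(decoded)
```

A real lip sync engine would more likely classify phonemes or visemes; the energy mapping is used here only because it is short and self-contained.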

delete

32. A method of performing an alternative video service using the alternative video service system according to any one of claims 29 to 31, the method comprising:

extracting, by the mobile communication terminal, the character feature information of the actual video according to a request from the mobile communication terminal for setting up an alternative video call;

determining, in the RSC server, alternative video service subscription information according to the alternative video call setup request;

activating, in the RSC server, the registered character according to the alternative video service subscription information, and generating a substitute image in real time by combining the voice information extracted by the audio processor and the character feature information from the mobile communication terminal with the registered character; and