KR20070019295A

KR20070019295A - Method and system for providing lip-sync service for mobile communication subscriber

Info

Publication number: KR20070019295A
Application number: KR1020050074112A
Authority: KR
Inventors: 허재회
Original assignee: 주식회사 인프라밸리
Priority date: 2005-08-12
Filing date: 2005-08-12
Publication date: 2007-02-15
Also published as: KR100733772B1

Abstract

A method for providing a lip-sync service to mobile communication subscribers, and a system therefor, are disclosed. The method comprises: generating and registering a lip-sync character; analyzing an audio signal to extract voice data; mapping the mouth shape of the lip-sync character in synchronization with the voice data; generating continuously animated lip-sync media using the lip-sync character to which the mouth shape has been mapped; and providing the lip-sync media to a terminal. According to the invention, a three-dimensional character is lip-synced to a voice message and provided to mobile communication subscribers, which improves the effectiveness of message delivery and lets each subscriber express his or her individuality.

Description

Method and system for providing lip-sync service for mobile communication subscriber

A brief description of each drawing is provided so that the drawings cited in the detailed description of the invention can be more fully understood.

FIG. 1 is a schematic network diagram for providing a lip-sync service according to an embodiment of the present invention.

FIG. 2 is a block diagram of the internal configuration of the LSS system shown in FIG. 1.

FIG. 3A is a flowchart illustrating an example of the lip-sync character generation method performed by the lip-sync character generation module.

FIG. 3B is a flowchart illustrating an example of the voice extraction and processing method performed by the voice module.

FIG. 4 is a flowchart illustrating an example of a method of providing a lip-sync service according to the present invention.

FIG. 5 is a flowchart illustrating a method of providing information using the lip-sync service of the present invention.

FIG. 6 is a flowchart illustrating a method of transmitting a multimedia message using the lip-sync service of the present invention.

FIG. 7 is a diagram illustrating an example in which the face analysis module recognizes the components of a face and extracts their coordinate values.

FIG. 8 is a diagram illustrating an example of photo mixing performed by the character template editing engine.

The present invention relates to a service for mobile communication network subscribers and, more particularly, to a method of providing a lip-synced three-dimensional (hereinafter, 3D) character video to a service subscriber by generating a 3D character and lip-syncing the movement of the generated 3D character with a voice message, and to a server and a system therefor.

With the development of networks and cyberspace, technologies that use a character representing a person, rather than the person's actual appearance, are becoming increasingly common. One example is the avatar, an animated character that stands in for a network user. Avatars are used in e-mail, chat, and the like, and their use is expanding to cyber shopping malls, virtual education, virtual offices, and so on. However, merely displaying an avatar or character on the screen does not satisfy the needs of diverse users, so lip sync, which animates a character in time with a voice, is being actively researched.

Meanwhile, services provided over mobile communication networks have been predominantly voice-based or text-based because of constraints inherent to such networks, such as limited radio resources. Voice message services and short message services are representative examples. Recently, as mobile communication networks have evolved, a variety of multimedia services, typified by character message services and multimedia message services, have been studied and offered. In conventional character-based services on mobile communication networks, however, the character is not properly linked to the voice message, so the service is monotonous or delivers the message less effectively. In addition, because the available characters are limited, such services tend to fall short in expressing the individuality of each subscriber and fail to satisfy subscribers who want to express their own unique personality.

Accordingly, there is a need for a way to provide a lip-synced 3D character video to a service subscriber by generating a 3D character at the subscriber's request and lip-syncing the movement of the generated 3D character with a voice message.

The technical problem to be solved by the present invention is to provide a method of automatically generating a natural-looking video by linking (lip-syncing) a 3D character with a voice message and providing the video to a mobile communication terminal, as well as a server and a system therefor.

To achieve the above technical object, according to one aspect of the present invention, there are provided a method for providing a lip-sync service to mobile communication subscribers and a recording medium storing a program for executing the method, the method comprising: generating and registering a lip-sync character; analyzing an audio signal to extract voice data; mapping the mouth shape of the lip-sync character in synchronization with the voice data; generating continuously animated lip-sync media using the lip-sync character to which the mouth shape has been mapped; and providing the lip-sync media to a terminal.

Preferably, extracting the voice data comprises: analyzing the frequency of the audio signal; and extracting the voice data by removing noise and non-voice signals from the audio signal.

Also preferably, the method further comprises: analyzing the extracted voice data to recognize an emotional state; and setting an expression value according to the recognized emotional state, wherein the set expression value is used in generating the lip-sync media.

Also preferably, generating and registering the lip-sync character comprises: producing a base character through 3D modeling; recognizing the components of a face from a still image; extracting coordinate values corresponding to the components; and generating the lip-sync character using the extracted coordinate values and the base character.

To achieve the above technical object, according to another aspect of the present invention, there is provided a lip-sync service system for mobile communication subscribers, comprising: a lip-sync character generation module that generates and registers a lip-sync character; a voice module that analyzes an audio signal to extract voice data; a lip-sync engine that maps the mouth shape of the lip-sync character in synchronization with the voice data; an animation engine that generates continuously animated lip-sync media using the lip-sync character to which the mouth shape has been mapped; and a network interworking unit that interworks with other devices to provide the lip-sync media to a terminal.

Preferably, the lip-sync service system further comprises: a content storage unit that stores the lip-sync character; a content management unit that manages the lip-sync content stored in the content storage unit; a media storage unit that stores the lip-sync media; and a media management unit that manages the lip-sync media stored in the media storage unit.

Also preferably, the lip-sync character generation module comprises: a face analysis module that recognizes the components of a face from a still image and extracts coordinate values corresponding to the components; and a character template editing engine that produces a base character through 3D modeling and generates the lip-sync character using the extracted coordinate values and the base character. The voice module comprises: a voice analysis module that analyzes the frequency of the audio signal and extracts the voice data by removing noise and non-voice signals from the audio signal; and a voice template editing engine that analyzes the extracted voice data to recognize an emotional state and sets an expression value according to the recognized emotional state.

Hereinafter, the present invention will be described in more detail with reference to the accompanying drawings.

FIG. 1 is a schematic network diagram for providing a lip-sync service according to an embodiment of the present invention. Referring to FIG. 1, the network for providing the lip-sync service includes a Lip-Sync Service (LSS) system 110, an Interactive Voice Response (IVR)/Automatic Response System (ARS) server 130, a Multimedia Messaging Services Center (MMSC) 140, a Home Location Register (HLR) 150, a terminal information DB 160, and a customer management system (CS) 170. A billing system (not shown), a gateway (not shown) for interworking with other networks, and the like may also be provided. In addition, to provide a user interface environment in which a user can access the LSS system 110 and perform tasks such as configuring content, a web server 122, a Wireless Application Protocol (WAP) server 124, and a Virtual Machine (VM) server 126 connected to the LSS system are preferably also provided.

The mobile telephones 191a and 191b are terminals that allow calls while on the move, for example mobile phones or PDAs. Although not shown in detail, the mobile network 195 includes base stations (not shown), base station controllers (not shown), switching centers (not shown), and the like. These components are well known to those skilled in the art, so a detailed description is omitted.

The LSS system 110 generates and registers a lip-sync character, analyzes an audio signal to extract voice data, and maps the mouth shape of the lip-sync character in synchronization with the extracted voice data, thereby generating a continuously animated lip-synced character, that is, LSS media.

FIG. 2 is a block diagram of the internal configuration of the LSS system 110 shown in FIG. 1.

Referring to FIG. 2, the LSS system 110 includes a lip-sync character generation module 21, a voice module 23, a user interface unit 245, a content management unit 250, a content storage unit 255, a lip-sync engine 260, an animation engine 270, a media management unit 290, a media storage unit 295, and a network interworking unit 280.

The lip-sync character generation module 21 generates and registers lip-sync characters. FIG. 3A is a flowchart illustrating an example of the lip-sync character generation method performed by the lip-sync character generation module 21. The operation of the lip-sync character generation module 21 is described below with reference to FIG. 3A.

The components of a face (for example, the facial contour and the shapes of the eyes, nose, and mouth) are recognized from a two-dimensional still image such as a photograph (step 311). The user can upload the photograph needed to generate the lip-sync character to the LSS system 110. The user can connect to the web server 122 using a PC 192a, or to the Wireless Application Protocol (WAP) server 124 or the Virtual Machine (VM) server 126 using a terminal 192b, in order to upload photographs and to view, set, or change lip-sync characters.

Once the facial components have been recognized automatically, coordinate values for the facial components are extracted (step 313). The lip-sync character is then generated using the extracted coordinate values (step 315).

Specifically, the lip-sync character generation module 21 includes a face analysis module 210 and a character template editing engine 220.

The face analysis module 210 automatically recognizes the components of a face in a still image (for example, a photograph), such as the facial contour, eyes, nose, and mouth, and extracts systematically organized coordinate values. FIG. 7 is a diagram illustrating an example in which the face analysis module 210 recognizes the components of a face and extracts their coordinate values. As FIG. 7 shows, the facial contour, eyes, eyebrows, nose, mouth, ears, and so on are recognized from a face photograph and expressed as coordinate values.

The character template editing engine 220 generates the lip-sync character using the coordinate values of the facial components extracted by the face analysis module 210. The character template editing engine 220 also produces the base character that is needed as a starting point for generating a lip-sync character. Specifically, the character template editing engine 220 models the human form (in particular, the shape of the face) in three dimensions to create a mesh-type base character. It then applies the coordinate values of the facial components extracted by the face analysis module 210 to the base character to generate the lip-sync character.
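The step of applying extracted coordinates to a base character can be sketched as follows. This is a minimal illustration, not the patented implementation: the patent does not specify a data model, so `BaseCharacter`, `normalize`, and `make_lip_sync_character` are hypothetical names, the character is reduced to named 2D control points, and the photo coordinates are assumed to be rescaled to a unit square before being applied.

```python
from dataclasses import dataclass

Point = tuple[float, float]


@dataclass
class BaseCharacter:
    """Hypothetical base character: named control points per facial component."""
    control_points: dict[str, list[Point]]


def normalize(landmarks: dict[str, list[Point]]) -> dict[str, list[Point]]:
    """Rescale photo coordinates into the unit square spanned by all landmarks."""
    xs = [x for pts in landmarks.values() for x, _ in pts]
    ys = [y for pts in landmarks.values() for _, y in pts]
    min_x, min_y = min(xs), min(ys)
    width = (max(xs) - min_x) or 1.0   # avoid division by zero for degenerate input
    height = (max(ys) - min_y) or 1.0
    return {
        name: [((x - min_x) / width, (y - min_y) / height) for x, y in pts]
        for name, pts in landmarks.items()
    }


def make_lip_sync_character(base: BaseCharacter,
                            landmarks: dict[str, list[Point]]) -> BaseCharacter:
    """Overwrite the base character's components with the subject's landmarks.

    Components missing from the photo keep the base character's default points.
    """
    fitted = dict(base.control_points)
    fitted.update(normalize(landmarks))
    return BaseCharacter(fitted)


base = BaseCharacter({"nose": [(0.5, 0.5)], "mouth": [(0.4, 0.8), (0.6, 0.8)]})
photo = {"mouth": [(100.0, 200.0), (140.0, 200.0)], "eye": [(100.0, 100.0)]}
character = make_lip_sync_character(base, photo)
```

Keeping the base character's defaults for unrecognized components is one way to tolerate a partially analyzed photograph; a real mesh would deform 3D vertices rather than replace 2D points.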

The character template editing engine 220 can also modify the lip-sync character in various ways. For example, it can produce a modified character by adding accessories, changing the outfit, or changing the hairstyle. The character template editing engine 220 preferably also performs a photo mixing function. Photo mixing is a technique for compositing different images or characters; an example is shown in FIG. 8.

As FIG. 8 shows, variously modified characters are produced by compositing a particular face photograph (or character) with another photograph (or character or avatar). The photographs and avatars needed for compositing are preferably stored and managed in a separate DB.

The character template editing engine 220 performs photo mixing by applying the coordinate values of the facial components extracted by the face analysis module 210 to another photograph or avatar. The photo mixing function may also be provided by a module other than the character template editing engine 220.

The voice module 23 analyzes an audio signal to extract voice data and processes the data when necessary. It includes a voice analysis module 230 and a voice template editing engine 235.

The voice analysis module 230 analyzes properties of the sound (audio signal) such as its frequency, removes noise and non-voice signals from the audio signal, and extracts only the voice. The voice template editing engine 235 analyzes the extracted voice data to recognize an emotional state and sets an expression value according to the recognized emotional state. The voice template editing engine 235 can also modulate or convert the extracted voice before output. In addition to the voice analysis module 230 and the voice template editing engine 235, the voice module 23 may further include a text-to-speech module (not shown), which converts an input text message into voice data.

FIG. 3B is a flowchart illustrating an example of the voice extraction and processing method performed by the voice analysis module 230 and the voice template editing engine 235. The functions of the voice analysis module 230 and the voice template editing engine 235 are described in more detail below with reference to FIG. 3B.

When an audio signal is received, the voice analysis module 230 first obtains the frequency of the received signal and analyzes its range, persistence, repetition, and so on to determine whether voice is present (steps 321 and 323). Non-voice sound and noise are removed from the audio signal, and only the voice is separated and extracted (step 325). When voice data is mixed with non-voice data (for example, music or background sound), extracting only the voice data makes it possible to generate LSS media that is synchronized precisely with the voice alone. If the voice data were not separated, the character could also move in sync with the non-voice data; that is, the mouth could move even though no human voice is actually heard.
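Steps 321 to 325 can be sketched with a simple frame-based detector. This is only an illustration under stated assumptions: the patent does not name an algorithm, so a zero-crossing frequency estimate stands in for the frequency analysis, and the band limits, energy threshold, frame length, and the function names `frame_features`, `detect_voice`, and `extract_voice` are all hypothetical.

```python
import math


def frame_features(frame, sample_rate):
    """Return (mean energy, rough dominant frequency from zero crossings)."""
    energy = sum(s * s for s in frame) / len(frame)
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a < 0) != (b < 0))
    frequency = crossings * sample_rate / (2 * len(frame))
    return energy, frequency


def detect_voice(samples, sample_rate, frame_len=400,
                 band=(80.0, 1000.0), min_energy=1e-4, min_run=2):
    """Flag each full frame as voice (True) or non-voice (False).

    A frame counts as voice when it is loud enough and its estimated frequency
    lies in `band` (frequency range); runs shorter than `min_run` frames are
    discarded (persistence). Trailing samples that do not fill a frame are ignored.
    """
    flags = []
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        energy, frequency = frame_features(samples[start:start + frame_len], sample_rate)
        flags.append(energy >= min_energy and band[0] <= frequency <= band[1])

    result = [False] * len(flags)
    i = 0
    while i < len(flags):
        if not flags[i]:
            i += 1
            continue
        j = i
        while j < len(flags) and flags[j]:
            j += 1
        if j - i >= min_run:          # keep only sufficiently long voiced runs
            result[i:j] = [True] * (j - i)
        i = j
    return result


def extract_voice(samples, sample_rate, frame_len=400, **kwargs):
    """Zero out non-voice frames so later stages only react to voice."""
    flags = detect_voice(samples, sample_rate, frame_len=frame_len, **kwargs)
    out = []
    for index, keep in enumerate(flags):
        frame = samples[index * frame_len:(index + 1) * frame_len]
        out.extend(frame if keep else [0.0] * len(frame))
    return out


rate = 8000
voiced = [0.5 * math.sin(2 * math.pi * 200 * n / rate) for n in range(1200)]
mixed = voiced + [0.0] * 800
voice_only = extract_voice(mixed, rate)
```

Silencing non-voice frames rather than deleting them preserves the timeline, which matters because the mouth shapes are later synchronized to positions in the audio. Separating voice from overlapping music would need real source separation, which this sketch does not attempt.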

Once the voice data has been separated and extracted, the voice template editing engine 235 analyzes the extracted voice (step 327) and recognizes the emotional state, for example anger, joy, surprise, or sadness (step 329). The features from which an emotional state can be recognized in a voice signal include tone, pitch, formant frequency, speaking rate, and voice quality. The emotional state is therefore recognized by analyzing the spectrum, tone, pitch, and so on of the voice signal (steps 327 and 329). Once the emotional state has been recognized, the expression value corresponding to that state is set (step 331). The voice module 23 can thus generate voice content that includes both the voice data and the expression value. The expression value is used when the LSS media is generated. The voice template editing engine 235 can also modulate or synthesize the extracted voice (step 333). Voice modulation is a technique that converts the pattern of the extracted voice into a synthesized sound or another voice pattern for output, for example converting the extracted voice into the voice of the other sex (female or male), a child's voice, or the voice of a particular celebrity.
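Steps 327 to 331 might look like the sketch below. The patent names the features (tone, pitch, speaking rate, and so on) but gives no rules or numeric values, so every threshold here is an invented placeholder, and `VoiceFeatures`, `recognize_emotion`, `EXPRESSION_VALUES`, and `build_voice_content` are hypothetical names. Features are assumed to be expressed relative to the speaker's own baseline.

```python
from dataclasses import dataclass
from enum import Enum


class Emotion(Enum):
    NEUTRAL = "neutral"
    ANGER = "anger"
    JOY = "joy"
    SURPRISE = "surprise"
    SADNESS = "sadness"


@dataclass(frozen=True)
class VoiceFeatures:
    """Features relative to the speaker's baseline (1.0 means typical)."""
    pitch_ratio: float
    rate_ratio: float
    energy_ratio: float


# Hypothetical expression values: weights an animation engine could apply.
EXPRESSION_VALUES = {
    Emotion.NEUTRAL: {"brow_raise": 0.0, "mouth_corner": 0.0},
    Emotion.ANGER: {"brow_raise": -0.8, "mouth_corner": -0.4},
    Emotion.JOY: {"brow_raise": 0.3, "mouth_corner": 0.9},
    Emotion.SURPRISE: {"brow_raise": 1.0, "mouth_corner": 0.1},
    Emotion.SADNESS: {"brow_raise": -0.3, "mouth_corner": -0.8},
}


def recognize_emotion(f: VoiceFeatures) -> Emotion:
    """Toy rule-based classifier; the thresholds are illustrative only."""
    if f.energy_ratio > 1.3 and f.rate_ratio > 1.1:
        # Loud and fast: treat raised pitch as joy, otherwise anger.
        return Emotion.JOY if f.pitch_ratio >= 1.2 else Emotion.ANGER
    if f.pitch_ratio > 1.3:
        return Emotion.SURPRISE
    if f.pitch_ratio < 0.9 and f.rate_ratio < 0.9:
        return Emotion.SADNESS
    return Emotion.NEUTRAL


@dataclass(frozen=True)
class VoiceContent:
    """Voice data bundled with the expression value set from the emotion."""
    voice_data: bytes
    emotion: Emotion
    expression_value: dict


def build_voice_content(voice_data: bytes, features: VoiceFeatures) -> VoiceContent:
    emotion = recognize_emotion(features)
    return VoiceContent(voice_data, emotion, EXPRESSION_VALUES[emotion])


content = build_voice_content(b"\x00\x01", VoiceFeatures(1.5, 1.0, 1.0))
```

A production system would more plausibly train a classifier on labeled speech; the lookup table simply shows how a recognized state could be turned into a value that travels with the voice data.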

The user interface unit 245 provides user interface functions in conjunction with the web server 122, the WAP server 124, the VM server 126, and the like. As shown in FIG. 1, a user can use a PC 192a, a mobile terminal 192b, or the like to connect to the LSS system 110 through the web server 122, the WAP server 124, or the VM server 126 and view, change, or set his or her lip-sync characters and voice content.

Lip-sync characters and voice content are stored in the content storage unit 255 by the content management unit 250. To make content easier to manage, the content management unit 250 preferably assigns an identifying ID to each item of content that is stored or generated.
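The ID assignment described above could be as simple as the following sketch. The ID format and the in-memory store are assumptions made for illustration; `ContentManager`, `register`, and `lookup` are hypothetical names, and the patent does not describe how IDs are formed.

```python
import itertools
from dataclasses import dataclass, field


@dataclass
class ContentManager:
    """Hypothetical content management unit backed by an in-memory store."""
    _store: dict = field(default_factory=dict)
    _counter: itertools.count = field(default_factory=lambda: itertools.count(1))

    def register(self, kind: str, content: object) -> str:
        """Store content and return its newly assigned, unique ID."""
        content_id = f"{kind}-{next(self._counter):06d}"
        self._store[content_id] = content
        return content_id

    def lookup(self, content_id: str) -> object:
        """Return stored content; raises KeyError for an unknown ID."""
        return self._store[content_id]


manager = ContentManager()
character_id = manager.register("character", {"name": "my avatar"})
voice_id = manager.register("voice", b"...voice data...")
```

The same pattern would apply to the media management unit, which assigns IDs to generated LSS media.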

The lip-sync engine 260 maps the mouth shape of the lip-sync character in synchronization with the voice data. To do so, the lip-sync engine 260 analyzes human speech into vowel phonemes and stores mouth-shape data corresponding to each vowel phoneme. For example, Korean speech is analyzed into vowels such as ㅏ (a), ㅑ (ya), ㅓ (eo), ㅕ (yeo), ㅗ (o), ㅛ (yo), ㅜ (u), ㅠ (yu), ㅡ (eu), and ㅣ (i), and English speech into vowels such as a (ah), e (eh), i (ee), o (oh), and u (oo); mouth-shape data corresponding to each vowel is stored.

The lip-sync engine 260 then maps the mouth shape of the lip-sync character to the mouth shape of the vowel that occurs at each point in the voice data. In this way, the lip-sync engine 260 produces a lip-sync character carrying mouth-shape data mapped to the vowels of the voice data.
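The vowel-to-mouth-shape mapping can be sketched as a lookup table applied to timed vowel segments. The sketch assumes a vowel recognizer (not shown) has already produced `(start, end, vowel)` segments; the two-parameter `MouthShape`, its numeric values, and the function name `map_mouth_shapes` are invented for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MouthShape:
    openness: float  # 0.0 closed .. 1.0 fully open
    width: float     # 0.0 rounded .. 1.0 fully spread


CLOSED = MouthShape(0.0, 0.5)

# Stored mouth-shape data per vowel; the values are illustrative placeholders.
MOUTH_SHAPES = {
    "ㅏ": MouthShape(1.0, 0.6), "ㅑ": MouthShape(0.9, 0.7),
    "ㅓ": MouthShape(0.7, 0.5), "ㅕ": MouthShape(0.6, 0.6),
    "ㅗ": MouthShape(0.5, 0.2), "ㅛ": MouthShape(0.4, 0.2),
    "ㅜ": MouthShape(0.3, 0.1), "ㅠ": MouthShape(0.3, 0.2),
    "ㅡ": MouthShape(0.2, 0.8), "ㅣ": MouthShape(0.2, 1.0),
    "a": MouthShape(1.0, 0.6), "e": MouthShape(0.5, 0.8),
    "i": MouthShape(0.2, 1.0), "o": MouthShape(0.5, 0.2),
    "u": MouthShape(0.3, 0.1),
}


def map_mouth_shapes(segments):
    """Map (start_sec, end_sec, vowel) segments to mouth-shape keyframes.

    Segments with no vowel (None) or an unknown symbol get a closed mouth,
    so the character does not move when nothing is being voiced.
    """
    return [(start, end, MOUTH_SHAPES.get(vowel, CLOSED))
            for start, end, vowel in segments]


segments = [(0.0, 0.2, "ㅏ"), (0.2, 0.3, None), (0.3, 0.5, "ㅗ")]
keyframes = map_mouth_shapes(segments)
```

Falling back to a closed mouth for silent segments mirrors the purpose of the voice extraction stage: mouth movement should follow the voice only.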

The animation engine 270 generates continuously animated lip-sync media using the lip-sync character to which the mouth shape has been mapped. That is, in addition to the mouth, the animation engine 270 moves other components of the lip-sync character (for example, the eyes, eyebrows, and head) to produce a character that moves continuously over time, that is, an animated character. In doing so, the animation engine 270 can use the expression value set by the voice module 23 to generate the LSS media: by animating the lip-sync character differently according to the expression value, it makes expressions such as anger, surprise, or joy appear during the animation. The generated LSS media is stored in the media storage unit 295 by the media management unit 290. To make media easier to manage, the media management unit 290 preferably assigns an identifying ID to each item of media that is stored or generated.
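One way to turn discrete mouth shapes into continuous motion is to interpolate between keyframes at a fixed frame rate, as in the sketch below. Linear interpolation, the frame dictionary layout, and the names `interpolate` and `render_frames` are assumptions for illustration; the patent does not state how frames are produced. Keyframe times are assumed to be strictly increasing.

```python
def interpolate(keyframes, t):
    """Linearly interpolate mouth openness at time t.

    `keyframes` is a non-empty list of (time_sec, openness) pairs with
    strictly increasing times; values are clamped outside the covered range.
    """
    if t <= keyframes[0][0]:
        return keyframes[0][1]
    for (t0, v0), (t1, v1) in zip(keyframes, keyframes[1:]):
        if t0 <= t <= t1:
            return v0 + (v1 - v0) * (t - t0) / (t1 - t0)
    return keyframes[-1][1]


def render_frames(keyframes, duration, fps=10, expression="neutral"):
    """Produce one frame description per tick, tagged with the expression."""
    frame_count = int(round(duration * fps))
    frames = []
    for index in range(frame_count + 1):
        t = index / fps
        frames.append({
            "time": t,
            "mouth_open": interpolate(keyframes, t),
            "expression": expression,  # e.g. set from the voice module's value
        })
    return frames


frames = render_frames([(0.0, 0.0), (0.2, 1.0), (0.4, 0.0)], duration=0.4,
                       expression="joy")
```

A fuller engine would interpolate every animated component (eyes, eyebrows, head) and blend the expression over time rather than attach a single label to each frame.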

The LSS media generated by the animation engine 270 is provided to the terminal through the network interworking unit 280.

The network interworking unit 280 provides interworking with the IVR/ARS server 130, the MMSC 140, the HLR 150, the terminal information DB 160, the customer center 170, and the like. In a preferred embodiment, the network interworking unit 280 transmits the generated LSS media to the MMSC 140 and requests the MMSC 140 to create a multimedia message containing the LSS media and send it to the terminal.

The network interworking unit 280 preferably also checks the terminal's attribute information, including its media format and codec information, through the terminal information DB 160 and converts the LSS media to suit those attributes.
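The attribute lookup and conversion decision can be sketched as follows. The terminal identifiers, format and codec strings, and the names `TerminalAttributes`, `LssMedia`, `transcode`, and `adapt_for_terminal` are hypothetical, and `transcode` is only a placeholder that relabels the media: real conversion would re-encode the payload with an actual media library.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class TerminalAttributes:
    media_format: str
    codec: str


@dataclass(frozen=True)
class LssMedia:
    media_id: str
    media_format: str
    codec: str
    payload: bytes


def transcode(media: LssMedia, target: TerminalAttributes) -> LssMedia:
    """Placeholder conversion: relabels the media without re-encoding it."""
    return replace(media, media_format=target.media_format, codec=target.codec)


def adapt_for_terminal(media: LssMedia, terminal_id: str,
                       terminal_db: dict) -> LssMedia:
    """Look up the terminal's attributes and convert the media only if needed."""
    attributes = terminal_db.get(terminal_id)
    if attributes is None:
        raise LookupError(f"no attribute information for terminal {terminal_id!r}")
    if (media.media_format, media.codec) == (attributes.media_format, attributes.codec):
        return media  # already playable on this terminal
    return transcode(media, attributes)


# Illustrative stand-in for the terminal information DB.
terminal_db = {
    "terminal-A": TerminalAttributes("3gp", "h263"),
    "terminal-B": TerminalAttributes("mp4", "mpeg4"),
}
media = LssMedia("media-000001", "mp4", "mpeg4", b"...")
for_a = adapt_for_terminal(media, "terminal-A", terminal_db)
for_b = adapt_for_terminal(media, "terminal-B", terminal_db)
```

Skipping conversion when the formats already match avoids needless re-encoding, which also avoids a generation of quality loss.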

Referring again to FIG. 1, the HLR 150 receives the location information of the called terminal in real time from a mobile switching center (not shown) in the mobile network 195 and connects the call between the calling and called parties. The HLR 150 also checks the subscriber's status, such as whether the subscriber terminal is powered on or off, and performs other checks such as permission verification.

The terminal information DB 160 stores subscriber terminal attribute information and provides the attribute information of a given terminal in response to a request from the LSS server 120.

The customer center 170 is used by the operator of the mobile network 11 to manage information about subscribers; the subscriber information is stored in a DB (not shown) inside the customer center. That is, the customer center 170 stores and manages, in the internal DB (not shown), the subscriber information resulting from service registration, cancellation, or changes to subscriber details.

FIG. 4 is a flowchart illustrating a method of providing a lip sync service according to an embodiment of the present invention. Preferably, the steps of generating the lip sync character shown in FIG. 3A and the steps of extracting and processing the voice data shown in FIG. 3B are performed before the steps shown in FIG. 4.

When a user requests the lip sync service, the LSS system first retrieves the lip sync character that was generated in advance and registered in the storage unit, and also separates and extracts the voice data.

Next, as shown in FIG. 4, the mouth shape of the lip sync character is mapped according to the voice data, in particular according to the vowels in the voice data, to the mouth shape that matches each vowel, thereby synchronizing the lip sync character with the voice data (step 431).
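The vowel-based mapping of step 431 can be illustrated with a small sketch. The description does not specify the mouth-shape set or the data layout, so the `VOWEL_TO_MOUTH` table, the `REST` shape used for non-vowel segments, and the `(start, end, phoneme)` segment format below are assumptions made purely for illustration.

```python
# Hypothetical table: vowel -> mouth-shape label (not taken from the description).
VOWEL_TO_MOUTH = {
    "a": "open_wide",
    "e": "spread",
    "i": "narrow_spread",
    "o": "round",
    "u": "pucker",
}

# Assumed fallback shape for segments that are not vowels.
REST = "closed"


def map_mouth_shapes(segments):
    """Map timed phoneme segments to timed mouth shapes.

    segments: list of (start_ms, end_ms, phoneme) tuples from the voice data.
    Returns a list of (start_ms, end_ms, mouth_shape) tuples with the same timing,
    so the character's mouth stays synchronized with the voice data.
    """
    return [(start, end, VOWEL_TO_MOUTH.get(ph, REST)) for start, end, ph in segments]
```

Keeping the original start and end times on each mapped entry is what carries the synchronization: the animation stage only needs to look up which shape is active at a given time.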

Next, continuously moving LSS media is generated using the lip sync character to which the mouth shapes have been mapped (step 433), and the generated LSS media is provided to the terminal (step 435). In step 435, it is preferable to refer to the attribute information of the terminal, convert the format and codec of the LSS media to match the media format and codec of the terminal, and provide the converted LSS media to the terminal.
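One way to picture step 433 is as sampling the mapped mouth shapes at a fixed frame rate to obtain a continuous frame sequence. The sketch below assumes a timeline of `(start_ms, end_ms, mouth_shape)` entries sorted by time and a hypothetical default of 10 frames per second; the description does not state a frame rate or a rendering method, and actual rendering of the three-dimensional character is outside the scope of this example.

```python
def frames_from_timeline(timeline_ms, fps=10):
    """Sample a sorted (start_ms, end_ms, mouth_shape) timeline at a fixed frame rate.

    Returns one mouth-shape label per frame, covering the timeline from 0 ms
    up to the end of its last entry.
    """
    if not timeline_ms:
        return []
    step = 1000 // fps          # frame interval in milliseconds
    end = timeline_ms[-1][1]    # end time of the last entry
    frames, idx = [], 0
    for t in range(0, end, step):
        # Advance to the entry that is active at time t.
        while idx < len(timeline_ms) - 1 and t >= timeline_ms[idx][1]:
            idx += 1
        frames.append(timeline_ms[idx][2])
    return frames
```

Integer milliseconds are used here instead of floating-point seconds so that frame boundaries fall exactly on segment boundaries.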

The lip sync service providing method according to the present invention can be applied to various services that use a mobile communication network. For example, a user can transmit a multimedia message using the lip sync service providing method of the present invention. The present invention can also be applied to information providing services that have conventionally been delivered by voice or text, and to video communication or conference calls.

FIGS. 5 and 6 show application examples of the lip sync service providing method according to the present invention. First, FIG. 5 is a flowchart illustrating an information providing method using the lip sync service of the present invention.

Referring to FIG. 5, an information providing apparatus provides voice and/or text information to the LSS system 110 through a communication network such as the Internet or a LAN (step 510). Here, the information providing apparatus may be an ARS apparatus that provides voice information, a web server that provides text information, or the like. The LSS system 110 retrieves a lip sync character and generates LSS media by synchronizing the retrieved lip sync character with the voice data (steps 512, 515, and 520).

In step 512, if the received data is not voice data, the LSS system 110 first converts the received data into voice data. The lip sync character used for providing information is preferably set in advance by the user, the information provider, or the lip sync service provider and stored in the media storage unit of the LSS system 110. The lip sync character may be set differently depending on the type of information, for example whether the information is news, stock information, or weather.
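The two decisions described for step 512 and the character selection can be sketched as below. The function names, the character identifiers, the `kind` labels, and the `tts` callable are hypothetical; in particular, the description does not name a text-to-speech component, and falling back to `DEFAULT_CHARACTER` for an unknown information type is an assumption of this sketch rather than something the description states.

```python
# Hypothetical per-type character settings; identifiers are illustrative only.
CHARACTER_BY_INFO_TYPE = {
    "news": "news_anchor",
    "stock": "analyst",
    "weather": "forecaster",
}

# Assumed fallback when no character is configured for an information type.
DEFAULT_CHARACTER = "default_character"


def select_character(info_type):
    """Pick the lip sync character configured for the given type of information."""
    return CHARACTER_BY_INFO_TYPE.get(info_type, DEFAULT_CHARACTER)


def to_voice(payload, kind, tts):
    """Return voice data: pass voice through, convert anything else with `tts`.

    `tts` is any callable that turns the received data into voice data.
    """
    return payload if kind == "voice" else tts(payload)
```

Passing the converter in as a parameter keeps the sketch independent of any particular speech synthesis engine.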

When the LSS media has been generated, the LSS system 110 transmits the LSS media to the MMSC (step 525). The MMSC then generates a multimedia message containing the LSS media (step 527) and transmits the message to the terminal (step 530). The terminal receives the multimedia message and plays the LSS media (step 535). The terminal user thus receives the information together with a lip-synced video.

FIG. 6 is a flowchart illustrating a multimedia message transmission method using the lip sync service of the present invention. In the method shown in FIG. 6, when a caller inputs a voice message, a video lip-synced to the input voice message, that is, LSS media, is generated, and the lip-synced video and the voice message are provided to the called party.

Each step of the multimedia message transmission method using the lip sync service of the present invention will now be described in detail with reference to FIG. 6. First, the user enters a predetermined feature code (for example, * followed by digits) and the called number at the calling terminal and presses the call key to request a call. In response to the user's request, the calling terminal transmits an outgoing signal containing the feature code and the called number to the exchange (step 610).

The exchange interprets the feature code contained in the outgoing signal (step 615) and, if the code corresponds to the lip sync service, connects the call to the LSS system (step 620). Once the call is connected to the LSS system, the caller inputs the voice message to be delivered to the called party, and the calling terminal transmits the voice message to the LSS system (step 625).
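The feature-code interpretation of steps 615 and 620 amounts to a prefix lookup on the dialed string. The sketch below is illustrative only: the description gives the code merely as "* followed by digits", so the concrete code `*77`, the route labels, and the representation of the outgoing signal as a single dialed string are assumptions.

```python
# Hypothetical feature-code table; "*77" is an invented example value.
FEATURE_ROUTES = {"*77": "LSS"}


def route_call(dialed):
    """Split a dialed string into (route, called_number).

    If the string starts with a known feature code, the call is routed to the
    matching system and the code is stripped; otherwise it is a normal call.
    """
    for code, destination in FEATURE_ROUTES.items():
        if dialed.startswith(code):
            return destination, dialed[len(code):]
    return "NORMAL", dialed
```

For example, `route_call("*7701012345678")` yields the route `"LSS"` with called number `"01012345678"`, while a string without a feature code is returned unchanged as a normal call.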

When the call is connected to the LSS system, the LSS system may play a preset announcement to the calling terminal. For this purpose, the LSS system preferably provides an ARS function. Upon receiving the voice data, the LSS system 110 retrieves the lip sync character corresponding to the caller identification number (step 635). If no lip sync character has been set in advance for the caller identification number, a default character may be used.
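The character lookup of step 635, including the default-character fallback, can be sketched as a dictionary lookup. The registry contents, the identifier strings, and the function name are hypothetical and stand in for whatever storage the LSS system actually uses.

```python
# Assumed identifier of the default character.
DEFAULT_CHARACTER_ID = "default"


def character_for_caller(caller_id, registry):
    """Return the lip sync character registered for a caller, or the default.

    registry: mapping from caller identification number to character identifier.
    """
    return registry.get(caller_id, DEFAULT_CHARACTER_ID)
```

A caller with a registered character gets that character, and any other caller gets the default, so media generation in the following step never has to handle a missing character.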

The LSS system 110 generates LSS media by synchronizing the lip sync character with the voice data (step 640) and transmits the generated LSS media to the MMSC (step 645). The MMSC then generates a multimedia message containing the LSS media (step 647) and transmits the message to the called terminal (step 650). The called terminal receives the multimedia message and plays the LSS media (step 655). The user of the called terminal thus receives the voice message sent by the caller together with a lip-synced video.

Unlike the procedure shown in FIG. 6, the caller may also request the lip sync service by sending a text message. As with ordinary text messaging, when the caller composes a text message on the terminal and then presses a predetermined key, the text message is transmitted to the LSS system and the lip sync service is performed. When the text message reaches the LSS system, the LSS system converts the text message into voice, generates a video lip-synced to that voice, that is, LSS media, and provides the lip-synced video and the text message to the called party through the MMSC.

The application examples above describe the present invention as applied to an information providing service and a multimedia message transmission service. However, the present invention can be applied to various other services as well. For example, it may be applied to an existing message call service that delivers voice messages, so that voice and LSS media are provided together. The present invention may also be applied to video call or conference call services.

The present invention can also be embodied as computer-readable code on a computer-readable recording medium. The computer-readable recording medium includes every kind of recording device that stores data readable by a computer system. Examples of the computer-readable recording medium include ROM, RAM, CD-ROM, magnetic tape, floppy disks, and optical data storage devices, and also include media implemented in the form of a carrier wave (for example, transmission over the Internet). The computer-readable recording medium can also be distributed over computer systems connected by a network, so that the computer-readable code is stored and executed in a distributed manner. Functional programs, code, and code segments for implementing the present invention can be readily inferred by programmers skilled in the art to which the present invention pertains.

Although the present invention has been described with reference to the above embodiments, these are merely exemplary, and those of ordinary skill in the art to which the present invention pertains will understand that various modifications and other equivalent embodiments are possible. Therefore, the true technical scope of protection of the present invention should be determined by the technical spirit of the appended claims.

According to the present invention, a three-dimensional character is lip-synced with a voice message and provided to a mobile subscriber, which can improve the effectiveness of message delivery. In addition, a subscriber can set a character that represents himself or herself and use it in various mobile communication services, such as a multimedia message transmission service, an information providing service, and a video call service, and can thereby better express his or her individuality.

Claims

A lip sync service providing method for a mobile subscriber, the method comprising:

generating and registering a lip sync character;

analyzing an audio signal to extract voice data;

mapping a mouth shape of the lip sync character in synchronization with the voice data;

generating continuously moving lip sync media using the lip sync character to which the mouth shape has been mapped; and

providing the lip sync media to a terminal.

The method of claim 1, wherein extracting the voice data comprises:

analyzing the frequency of the audio signal; and

extracting the voice data by removing noise and non-voice signals from the audio signal.

The method of claim 2, further comprising:

analyzing the extracted voice data to recognize an emotional state; and

setting a facial expression value according to the recognized emotional state,

wherein the set facial expression value is used to generate the lip sync media.

The method of claim 1, wherein generating and registering the lip sync character comprises:

producing a basic character through three-dimensional modeling;

recognizing a component of a face from a still image;

extracting coordinate values corresponding to the component; and

generating the lip sync character using the extracted coordinate values and the basic character.

The method of claim 4, wherein generating and registering the lip sync character further comprises

generating a modified lip sync character by applying the lip sync character.

assigning an identifier to the lip sync character; and

storing the lip sync character according to the identifier.

The method of claim 1, further comprising:

identifying attribute information of the terminal, including media format information and codec information; and

converting the lip sync media according to the terminal attribute information.

The method of claim 1, wherein providing the lip sync media to the terminal comprises

transmitting the lip sync media to a multimedia message center,

wherein the multimedia message center generates a multimedia message including the lip sync media and transmits the generated multimedia message to the terminal.

A recording medium on which a program for performing the method of any one of claims 1 to 8 is recorded, the program being readable and executable by a digital signal processing apparatus.

A lip sync service providing system for a mobile subscriber, the system comprising:

a lip sync character generation module for generating and registering a lip sync character;

a voice module for analyzing an audio signal and extracting voice data;

a lip sync engine for mapping a mouth shape of the lip sync character in synchronization with the voice data;

an animation engine for generating continuously moving lip sync media using the lip sync character to which the mouth shape has been mapped; and

a network interworking unit for interworking with other devices to provide the lip sync media to a terminal.

The lip sync service system of claim 10, further comprising:

a content storage unit for storing the lip sync character;

a content management unit for managing the lip sync content stored in the content storage unit;

a media storage unit for storing the lip sync media; and

a media management unit for managing the lip sync media stored in the media storage unit.

The lip sync service system of claim 10, wherein the lip sync character generation module comprises:

a face analysis module for recognizing a component of a face from a still image and extracting coordinate values corresponding to the component; and

a character template editing engine for producing a basic character through three-dimensional modeling and generating the lip sync character using the extracted coordinate values and the basic character.

The lip sync service system of claim 12, wherein the character template editing engine

generates a modified lip sync character by applying the lip sync character.

The lip sync service system of claim 10, wherein the voice module comprises

a voice analysis module for analyzing the frequency of the audio signal and removing noise and non-voice signals from the audio signal to extract the voice data.

The lip sync service system of claim 14, wherein the voice module further comprises

a voice template editing engine for analyzing the extracted voice data to recognize an emotional state and setting a facial expression value according to the recognized emotional state.

The lip sync service system of claim 15, wherein the animation engine

generates the lip sync media by animating the lip sync character differently according to the facial expression value.

The lip sync service system of claim 10, wherein the network interworking unit

identifies attribute information of the terminal, including media format information and codec information, and converts the lip sync media according to the terminal attribute information.

The lip sync service system of claim 10, wherein the network interworking unit

transmits the lip sync media to a multimedia message center,