WO2006011295A1

WO2006011295A1 - Communication device

Info

Publication number: WO2006011295A1
Application number: PCT/JP2005/010024
Authority: WO
Inventors: Katsunori Orimoto; Toshiki Hijiri; Akira Uesaki; Yoshiyuki Mochizuki
Original assignee: Matsushita Electric Industrial Co., Ltd.
Priority date: 2004-07-23
Filing date: 2005-06-01
Publication date: 2006-02-02
Also published as: JP2007279776A

Abstract

A communication device for communication using a CG character agent while automatically creating/updating a personal CG agent from image information and speech information sent during conversation through a video telephone. The communication device comprises a communication processing section (210) for communicating video telephone packet data with another terminal, a video telephone processing section (220) for creating image information and speech information on the conversation party from the video telephone packet data, and an agent data creating section (230) for creating agent data on the conversation party from the image information and speech information, an address book data storage unit (250) storing the agent data after associating the agent data with the personal information on the conversation party, an address book data management section (240) for performing data management of the address book data such as data search, and an agent output unit (270) for creating a CG character agent from the agent data.

Description

Specification

Communication device

Technical field

The present invention relates to a communication device that transmits a message using a CG character agent that imitates a specific individual, and more particularly, to a communication device having a videophone function.

Background art

Conventionally, information terminals have conveyed various messages to users. Examples include messages reporting the state of the device, such as “The battery is about to run out”, and messages reporting an event, such as “E-mail from Mr. OO has arrived”. The content of a received e-mail can itself be regarded as a message from the mail sender. The simplest interface for conveying these messages is to display text on the screen. However, a text-only display has limitations: (1) the number of characters that can be displayed simultaneously is limited, and (2) information that is difficult to express in text (for example, emotions) cannot be conveyed.

[0003] Therefore, easy-to-understand and friendly interfaces have been proposed that combine various modalities, such as displaying an image of the message sender and reading out the message by voice. One such multi-modal interface is an interface in which a CG character in human form reads out messages by voice. Such an interface using CG characters is called a CG character agent.

[0004] CG character agents can be broadly divided into two types: those using a general CG character and those using a CG character based on personal information. The former is an agent that can be created without depending on the personal information of the user, for example an anime character or a character in the form of an animal. The latter is generated from the user's personal information, for example a character onto which the user's face photo is pasted or a character using a portrait that reflects the user's characteristics.

[0005] As an example of an application using the latter, it has been proposed that, when an e-mail is received, a CG character in the form of the e-mail sender is displayed and the e-mail is read out in the voice of the sender. By reflecting personal information in this way, messages can be read out in a familiar voice and the sender can be recognized visually at a glance, which improves comprehension. Below, such a CG character agent is called a personal CG agent.

[0006] As a technique for realizing a personal CG agent, a technique for creating a CG character imitating a target individual in both voice and image has been proposed. With regard to images, there is a technology that displays an arbitrary animated image of a specific person by creating a shape model of the person from a photograph (or video) and pasting the person's photograph onto the shape model. As for speech, there is a technology called speech synthesis, which stores speech units such as vowels and consonants, or frequently used words, as speech dictionary data and combines those speech dictionary data at playback time to pronounce an arbitrary character string.

[0007] In the past, there has been proposed a technique for generating a CG character imitating a specific individual from both the image and the sound by using these techniques (see, for example, Patent Document 1).

[0008] In addition, there has been proposed a technique for controlling a CG character agent as intended by a message creator by attaching parameters for controlling the CG character to a message such as an electronic mail (see, for example, Patent Document 2).

Patent Document 1: Japanese Unexamined Patent Publication No. 2003-141564

Patent Document 2: Japanese Patent Laid-Open No. 06-162167

Disclosure of the invention

Problems to be solved by the invention

However, the above-described conventional technology has the following problems. (1) For the user to use a personal CG agent in a communication device, the user is assumed to deliberately create data for each communication partner, for example by specifying an image for each partner, which requires user effort. (2) Regarding data management in the communication device, a method of attaching control parameters to a message has been disclosed, but it increases the amount of data in message management and cannot be applied to arbitrary message data.

[0010] The present invention aims to solve the above-described problems. Its first object is to provide a communication device that, by using information transmitted in the course of communication such as a videophone call, can automatically generate and update a personal CG character agent for each communication partner without requiring user effort.

[0011] A second object is to provide a communication device that facilitates data management of the CG character agent created for each communication partner, and that allows the agent data to be used for conveying arbitrary messages without requiring special attached data.

Means for solving the problem

[0012] In order to solve the above-described conventional problems, the communication device of the present invention is a communication device that conveys a message using a CG character agent, and comprises: communication means for communicating communication data with another terminal; agent data creation means for creating agent data including personal feature data of the communication partner based on the communication data; address book data creation means for creating address book data in which the agent data is associated with personal information of the communication partner; address book data storage means for storing the address book data; agent creation means for creating the CG character agent of the communication partner by referring to the agent data included in the address book data of the communication partner; and agent output means for outputting the CG character agent.

[0013] With this configuration, the communication apparatus according to the present invention automatically creates agent data of a communication partner using communication data received during communication with another terminal and registers it in the address book data, and the agent creation means can create a CG character agent by referring to the agent data of the communication partner registered in the address book.

[0014] Further, in the communication apparatus according to the present invention, the agent data creation means includes at least an image feature extraction unit that automatically extracts image feature data from the communication data and a voice feature extraction unit that extracts voice feature data, and the agent data includes the image feature data and the voice feature data.

With this configuration, in the communication device according to the present invention, it is possible to extract image feature data and audio feature data included in the communication data and automatically manage and update them as agent data.

[0016] Further, the agent data creation means of the communication device according to the present invention includes: a reliability assigning unit that assigns a reliability to the created agent data; a temporary storage data storage unit that temporarily stores the agent data to which the reliability has been assigned; and a reliability determination unit that compares the reliability of agent data newly created during communication with the reliability of the agent data already stored in the temporary storage data storage unit, and automatically updates the stored agent data to the agent data having the higher reliability. The agent creation means further creates the CG character agent using the agent data stored in the temporary storage data storage unit.

[0017] With this configuration, the communication device according to the present invention assigns reliability to the image feature data and the sound feature data, which are personal feature data, and is automatically stored in the temporary storage data storage unit. Because agent data can be updated to highly reliable agent data, it is always possible to communicate messages using the latest reliable image and voice CG character agent of the communication partner.
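The reliability-based update described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation of the embodiment: the names `FeatureEntry`, `TemporaryStore`, and `offer` are hypothetical, and the 0 to 100 reliability scale is the example scale given later in the description.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class FeatureEntry:
    """Feature data for one state plus its reliability (assumed range 0 to 100)."""
    data: object
    reliability: int


class TemporaryStore:
    """Hypothetical stand-in for the temporary storage data storage unit."""

    def __init__(self) -> None:
        self._entries: Dict[str, FeatureEntry] = {}

    def offer(self, state: str, candidate: FeatureEntry) -> bool:
        """Keep the candidate only if it is more reliable than the stored entry.

        Returns True when the stored entry was created or replaced.
        """
        current = self._entries.get(state)
        if current is None or candidate.reliability > current.reliability:
            self._entries[state] = candidate
            return True
        return False

    def get(self, state: str) -> Optional[FeatureEntry]:
        """Return the stored entry for a state, or None if nothing is stored."""
        return self._entries.get(state)
```

Keeping one entry per state means a poor frame received late in a call can never overwrite a better one captured earlier.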

[0018] The present invention can be realized not only as such a communication device but also as a CG character agent creating apparatus having the characteristic means of the communication device, as a communication method having those means as steps, or as a program that causes a computer to execute each step. It goes without saying that such a program can be distributed through a recording medium such as a CD-ROM or a transmission medium such as the Internet.

The invention's effect

[0019] The communication apparatus according to the present invention can automatically generate and update a CG character agent that reflects the characteristics of a communication partner without requiring user effort. Therefore, a familiar and easy-to-use interface using CG character agents can be realized.

[0020] In addition, the communication device according to the present invention stores image feature data and sound feature data of a communication partner. This makes it easy to manage agent data, and can use this agent data for various applications other than videophone and mail.

Brief Description of Drawings

FIG. 1 is an external view of a mobile phone terminal for explaining an information terminal according to the present invention.

FIG. 2 shows a configuration of a communication device according to an embodiment of the present invention.

FIG. 3 is a block diagram showing a configuration of an agent data creation unit in an embodiment of the present invention.

FIG. 4 is a conceptual diagram for explaining address book data in an embodiment of the present invention.

FIG. 5 is a conceptual diagram for explaining feature extraction of an image in an embodiment of the present invention.

FIG. 6 is a conceptual diagram for explaining a method for generating an animation image using feature points of an image in an embodiment of the present invention.

FIG. 7 is a flowchart for explaining generation processing of agent data in one embodiment of the present invention.

FIG. 8 is a flowchart for explaining processing of an application using a personal CG agent in one embodiment of the present invention.

FIG. 9 is a conceptual diagram for explaining animation control in an agent output unit in one embodiment of the present invention.

Explanation of symbols

[0022] 10 mobile phone terminals

20 Communication equipment

101 keys

102 Speaker

103 display

104 camera

105 microphone

210 Communication processing unit

220 Videophone processing unit

230 Agent data creation unit

231 Image feature extraction unit

232 Voice feature extraction unit

233 Reliability judgment unit

234 Temporary storage data storage

240 Address book data management unit

250 Address book data storage

260 Agent setting section

261 Personal identifier setting section

262 Status setting section

263 Message setting section

270 Agent output section

271 CG character drawing unit

272 Speech synthesis unit

273 Basic data storage

280 output section

281 Image output section

282 Audio output section

290 input section

291 Image input section

292 Audio input section

293 Key input section

300 Application processing section

400 address book data

401 Agent data

BEST MODE FOR CARRYING OUT THE INVENTION

Hereinafter, as an embodiment of the present invention, a communication device that conveys messages using a CG character agent will be described with reference to the drawings.

[0024] (Appearance of mobile phone terminal)

FIG. 1 is an external view of the mobile phone terminal 10. In FIG. 1, the mobile phone terminal 10 is a mobile phone having a videophone function, and includes a key 101, a speaker 102, a display 103, a camera 104, and a microphone 105.

[0025] The key 101 is composed of a number key for making a call and a plurality of keys for a camera function and a mail function. The speaker 102 outputs sound when receiving a call, or outputs a ring tone for a telephone or e-mail. The display 103 displays images and characters, and specifically may be a liquid crystal display or an organic EL display. The camera 104 acquires a still image or a moving image, and acquires a user image when using a videophone. As an example of the camera 104, a CCD camera or a CMOS camera may be used. The microphone 105 is for inputting voice, and acquires the voice of the user when using the telephone.

[0026] In the description of the present embodiment, the communication device is described as the mobile phone terminal 10, but it may be any communication device having a videophone function, such as a stationary telephone, a PDA (Personal Digital Assistant), or a personal computer.

[0027] (Outline of character CG agent device)

FIG. 2 is a functional block diagram of the communication device 20 in the mobile phone terminal 10. The communication device 20 includes a communication processing unit 210, a videophone processing unit 220, an agent data creation unit 230, an address book data management unit 240, an address book data storage unit 250, an agent setting unit 260, an agent output unit 270, an output unit 280, an input unit 290, and an application processing unit 300. Needless to say, the communication device may be provided with a CG character agent creation device that constitutes the functional blocks shown in FIG. 2.

[0028] (Input / output)

The input unit 290 includes an image input unit 291, an audio input unit 292, and a key input unit 293.

[0029] The image input unit 291 captures image data from the camera 104 and acquires it as bitmap data. Examples of bitmap data formats include the RGB format and the YUV format.

The voice input unit 292 acquires voice data from the microphone 105. An example of audio data may be PCM (Pulse Code Modulation) data.

The key input unit 293 acquires the pressed state when the key 101 is pressed.

The output unit 280 includes an image output unit 281 and an audio output unit 282.

The image output unit 281 receives bitmap data or a memory address where the bitmap data is stored, and displays it on the display 103.

The audio output unit 282 receives audio data and outputs sound through the speaker 102.

[0034] (Videophone processing)

The communication processing unit 210 shown in FIG. 2 transmits and receives videophone communication. When receiving videophone data, the communication processing unit 210 passes the received packet data, received from the information terminal of the other party, to the videophone processing unit 220. When transmitting videophone data, it receives the transmission packet data generated by the videophone processing unit 220 and transmits it to the information terminal of the other party.

[0035] As an example of packet data for this videophone, a data format of MPEG4 (Moving Picture Experts Group phase 4) may be used. The videophone packet data may have any data format as long as voice and moving images can be transmitted. The present invention can be applied to any data format.

[0036] (Videophone processor)

The videophone processing unit 220 performs videophone transmission processing and reception processing. In the reception process, it generates image data (bitmap data) as image information and audio data as audio information from the received packet data passed from the communication processing unit 210. The generated bitmap data is passed to the image output unit 281 and the audio data to the audio output unit 282, so that the transmitted image is displayed on the display 103 and the transmitted audio is output from the speaker 102. In the transmission process, videophone transmission packet data is created from the image data acquired from the image input unit 291 and the audio data acquired from the audio input unit 292, and is passed to the communication processing unit 210. By repeating the reception process and transmission process at regular time intervals (for example, 15 times per second), the videophone function is realized.

[0037] In addition, when a videophone call starts, the videophone processing unit 220 notifies the address book data management unit 240 of the telephone number of the other party (videophone partner) and acquires the data update setting of the agent data in the corresponding address book data. When the update setting is ON, the generated image data and audio data are passed to the agent data creation unit 230 during the reception process. The data may be passed each time the reception process is executed. To reduce the processing load, however, the data may instead be passed at regular intervals that are less frequent than the reception process.
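The hand-off timing described above can be illustrated with a short sketch. It assumes one callback per reception cycle and forwards only every `interval`-th frame when the update setting is ON. The names `FrameForwarder`, `on_receive`, and `interval` are hypothetical and do not appear in the specification.

```python
from typing import Callable, Tuple

Frame = Tuple[bytes, bytes]  # (image data, audio data) for one reception cycle


class FrameForwarder:
    """Forwards every `interval`-th received frame while updating is enabled."""

    def __init__(self, sink: Callable[[Frame], None], update_on: bool,
                 interval: int = 1) -> None:
        if interval < 1:
            raise ValueError("interval must be at least 1")
        self._sink = sink            # stands in for the agent data creation unit
        self._update_on = update_on  # the data update setting of the partner
        self._interval = interval
        self._count = 0

    def on_receive(self, frame: Frame) -> None:
        """Called once per reception cycle (for example 15 times per second)."""
        if not self._update_on:
            return  # setting is OFF: only normal videophone processing runs
        if self._count % self._interval == 0:
            self._sink(frame)
        self._count += 1
```

With `interval=1` every received frame is forwarded, which corresponds to passing data each time the reception process runs; larger values trade update frequency for lower processing load.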

[0038] (Create agent data)

The agent data creation unit 230 creates agent data from the image information and voice information provided by the videophone processing unit 220 and stores it in the address book data storage unit 250. A detailed description is given below using the configuration diagram of the agent data creation unit 230 shown in FIG. 3.

The agent data creation unit 230 includes an image feature extraction unit 231, an audio feature extraction unit 232, a reliability determination unit 233, and a temporarily saved data storage unit 234.

The image feature extraction unit 231 generates image feature data from the image data received from the videophone processing unit 220. An example of image feature data is the position coordinates of feature points indicating parts of the face, together with the image data (still image) from which the feature points were extracted.

FIG. 5 is a conceptual diagram for explaining the facial feature points extracted by the image feature extraction unit 231. The face image in the upper part of FIG. 5 shows the image data passed from the videophone processing unit 220, in which the face of the other party appears on a 240 × 320 bitmap. The lower part shows the result of recognizing six points: P1 (upper lip upper end point), P2 (upper lip lower end point), P3 (lower lip upper end point), P4 (lower lip lower end point), P5 (lip left end point), and P6 (lip right end point). The numerical values in the box next to each feature point name indicate the position coordinates of that feature point. Based on the bitmap size, a coordinate system is used in which the upper left point is the origin and the lower right point is (240, 320).
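As an illustration of how such lip feature points might be held and used, the following sketch stores the six points as pixel coordinates and derives two simple measurements. The coordinate values are hypothetical and are not the values shown in FIG. 5, and the helper names are assumptions introduced only for this example.

```python
from typing import Dict, Tuple

Point = Tuple[int, int]  # (x, y) in bitmap pixels; y grows downward

# Hypothetical coordinates for illustration only; not values from FIG. 5.
lip_points: Dict[str, Point] = {
    "P1": (120, 200),  # upper lip, upper end point
    "P2": (120, 208),  # upper lip, lower end point
    "P3": (120, 214),  # lower lip, upper end point
    "P4": (120, 224),  # lower lip, lower end point
    "P5": (98, 211),   # lip, left end point
    "P6": (142, 211),  # lip, right end point
}


def in_bitmap(p: Point, width: int = 240, height: int = 320) -> bool:
    """Check that a point lies inside the 240 x 320 bitmap."""
    x, y = p
    return 0 <= x <= width and 0 <= y <= height


def mouth_opening(points: Dict[str, Point]) -> int:
    """Vertical gap between the lips in pixels (P3.y minus P2.y)."""
    return points["P3"][1] - points["P2"][1]


def mouth_width(points: Dict[str, Point]) -> int:
    """Horizontal distance between the lip end points in pixels."""
    return abs(points["P6"][0] - points["P5"][0])
```

Measurements of this kind could, for example, help decide which phoneme state a frame belongs to, although the specification does not prescribe such a method.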

The voice feature extraction unit 232 generates voice feature data from the voice data received from the videophone processing unit 220. As an example of the voice feature data, voice control parameters for controlling the pitch, accent, and the like at the time of voice synthesis may be used. The voice feature data may also be voice data obtained by extracting specific phonemes or words used at the time of voice synthesis, or data such as a difference from the voice dictionary data stored in the basic data storage unit 273.

[0043] (Address book data management)

The address book data management unit 240 manages the address book data recorded in the address book data storage unit 250. This address book data includes personal information of multiple persons. The address book data management unit 240 has a function of searching using one element of the address book data, such as a telephone number, as a key and returning the memory address of the matching address book data as the search result. For example, by passing the numerical value “13” as the data ID or “09012345678” as the telephone number, the memory address for accessing the address book data shown in FIG. 4 is obtained.
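The key-based search can be sketched as below. This is only an illustration under assumptions: records are plain dictionaries, the field names `data_id` and `telephone` are hypothetical, and the function returns a reference to the record rather than a memory address.

```python
from typing import Dict, List, Optional

Record = Dict[str, object]


class AddressBook:
    """Hypothetical stand-in for the address book data management unit."""

    def __init__(self, records: List[Record]) -> None:
        self._records = records

    def find(self, key: str, value: object) -> Optional[Record]:
        """Return the first record whose `key` element equals `value`."""
        for record in self._records:
            if record.get(key) == value:
                return record
        return None


book = AddressBook([
    {"data_id": 12, "telephone": "09000000000", "name": "Example A"},
    {"data_id": 13, "telephone": "09012345678", "name": "Example B"},
])
```

Either `book.find("data_id", 13)` or `book.find("telephone", "09012345678")` then reaches the same record, mirroring the two example keys in the text.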

Here, the data configuration of the address book data 400 used in the communication apparatus according to the present embodiment will be described with reference to FIG. 4. The example of the address book data 400 shown in FIG. 4 is address book data for one person. It records a data ID for uniquely identifying the data and personal information such as a name, telephone number, e-mail address, group number, and icon. These data are used for various purposes, such as displaying the icon when a call is received and sorting mail into a directory by group when mail is received.

Further, the present invention is characterized in that the address book data 400 includes agent data 401. An example of the agent data 401 in FIG. 4 includes image feature data, voice feature data, data update settings, and an agent type.
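One possible in-memory layout for this record is sketched below. The field list follows the items named for FIG. 4, but every class name, field name, type, and default value is an assumption made for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class ImageFeature:
    points: List[Tuple[int, int]]  # feature point coordinates for one state
    reliability: int               # certainty of the extraction
    bitmap_file: str               # file name of the corresponding bitmap


@dataclass
class VoiceFeature:
    parameters: Dict[str, float]   # e.g. pitch or accent control values
    reliability: int


@dataclass
class AgentData:
    # Both dictionaries are keyed by state, e.g. "A", "angry", "laughing".
    image_features: Dict[str, ImageFeature] = field(default_factory=dict)
    voice_features: Dict[str, VoiceFeature] = field(default_factory=dict)
    update_enabled: bool = False   # data update setting; default is an assumption
    agent_type: str = "photo"      # agent type; the value is hypothetical


@dataclass
class AddressBookEntry:
    data_id: int
    name: str
    telephone: str
    email: str
    group: int
    icon: str
    agent: AgentData = field(default_factory=AgentData)
```

Nesting the agent data inside the address book entry reflects the point made above: the agent travels with the person's record instead of being attached to individual messages.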

[0046] The image feature data consists of the image data itself or of some kind of feature data obtained by extracting features from the image data. In this example, the image feature data consists of the position coordinates of the feature points by state generated by the image feature extraction unit 231, a reliability indicating the certainty of the information about the feature points by state, and a file name indicating the bitmap of the image data corresponding to the feature points.

The position coordinates of the feature points of the image data indicate the positions of facial parts such as the eyes and mouth in the image data, as shown in the lower diagram of FIG. 5 and the conceptual diagram of FIG. 6. For example, they can be defined as position coordinates of points on the contours of the eyes, nose, and mouth. The feature points by state are the position coordinates of the feature points in states where a phoneme such as “A”, “I”, “U”, “E”, or “O” is being uttered, and in emotional states such as “angry”, “laughing”, and “sad”. By holding feature points for each state, personal CG agents with various facial expressions can be generated.

[0048] The bitmap of the image data corresponding to the feature points is the image data that serves as the base when the image is divided and displayed as shown in the upper diagram of FIG. 6, and may be image data in a specific state. In the present embodiment a single piece of image data is used, but image data may instead be provided for each of a plurality of states. For example, expressive power can be improved by providing a plurality of pieces of image data corresponding to the feature points by emotion.

[0049] (Reliability)

The reliability is a numerical value representing the certainty of image recognition and voice recognition when extracting feature data. For example, if the other party is not clearly shown in the image data of a videophone call, it is highly likely that image feature data cannot be extracted accurately. Even if a CG character agent is generated using image feature data extracted in such a case, a clear image cannot be generated. Therefore, in the present invention, a reliability index is attached to the image feature data and voice feature data, and feature data with high reliability is selected and recorded, which makes it possible to create a realistic CG character agent that reflects the features of the other party.

[0050] As an example of the reliability, a numerical value may be generated that is 100 when the certainty of recognition is high and 0 when it is low.

[0051] The reliability of the image feature data may be generated from the value of the evaluation function used when recognizing (detecting) a face (or a part of a face) in the image data. As an example of the evaluation function in face detection, the evaluation function g for face/non-face identification disclosed in Japanese Patent Laid-Open No. 2003-44853 may be used. With this evaluation function g, face/non-face discrimination can be performed by treating what should be determined to be a face as negative and what should be determined to be a non-face as positive.
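The specification does not state how an evaluation value is converted into the 0 to 100 reliability. The sketch below shows one possible mapping, purely as an assumption: it takes a score that is negative for face-like input and positive otherwise, and squashes it with a logistic curve. The function name, the `scale` parameter, and the choice of a logistic curve are all hypothetical and are not taken from the cited publication.

```python
import math


def reliability_from_score(g: float, scale: float = 1.0) -> int:
    """Map an evaluation value to a reliability between 0 and 100.

    Assumption: more negative g means more face-like, so reliability rises
    as g falls. `scale` controls how quickly the value saturates.
    """
    if scale <= 0:
        raise ValueError("scale must be positive")
    # Clamp the exponent so math.exp cannot overflow for extreme scores.
    z = max(-50.0, min(50.0, g / scale))
    return round(100.0 / (1.0 + math.exp(z)))
```

A score on the decision boundary maps to the middle of the range, and confident detections approach 100, which is compatible with the example scale given in paragraph [0050].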

[0052] The reliability of the speech feature data can be generated from the value of the recognition evaluation function used during speech recognition. As an example of the evaluation function in speech recognition, the value of the cost function V disclosed in JP 2004-198656 may be used. Note that the reliability of face recognition can also be used to determine the reliability of speech recognition. In this case, the reliability of the corresponding image feature data is used when determining the reliability of the audio feature data.

In the description of the communication device 20 according to the present embodiment, the reliability of the image feature data is created in the image feature extraction unit 231 and the reliability of the audio feature data is created in the audio feature extraction unit 232. However, the agent data creation unit 230 may instead be provided with a separate reliability assigning unit that assigns the reliability.

[0054] (voice feature data)

The voice feature data consists of the voice feature data for each state generated by the voice feature extraction unit 232 and the reliability of the voice feature data for each state. As with the image feature data, the voice feature data may be, for example, voice control parameters for phonemes such as “A”, “I”, “U”, “E”, and “O” and for emotional states such as “angry”, “laughing”, and “sad”. The reliability indicates the certainty of recognition and classification, in the same way as the reliability of the image feature data.

[0055] The data update setting indicates whether or not agent data is to be generated. When ON, it indicates that agent data is generated and updated during a videophone call. When OFF, it indicates that agent data is not updated during a call. This setting can be specified by the user using an address book editing application.

[0056] The agent type sets what type of agent is used when generating a personal CG agent. In the present invention, the agent data comprises image feature data and audio feature data, so these data can be used to generate various types of agents. For example, on the image side, it is possible to generate a realistic agent using a face photo of the person, or to generate a portrait from the image feature data. On the audio side, it is possible to generate a voice similar to that person's from the voice feature data, or to generate robotic speech that keeps only the person's accent.

[0057] (Agent setting)

The agent setting unit 260 receives the agent settings from the application processing unit 300 when the message transmission function of the personal CG agent is used. The details will be described below.

The agent setting unit 260 includes a personal identifier setting unit 261, a state setting unit 262, and a message setting unit 263.

[0059] The personal identifier setting unit 261 sets a personal identifier that is identification data for specifying a personal CG agent to be used. The personal identifier is a key for searching the address book data by the address book data management unit 240. For example, a data ID included in the address book data or a telephone number may be used.

[0060] The state setting unit 262 designates the state in which a message is conveyed. An example of the state is an emotion such as “angry” or “laughing”. The state setting unit 262 also sets the animation state and the like, including, for example, whether the same message is conveyed repeatedly and the speed at which the message is conveyed.

[0061] Message setting section 263 sets a character string of a message to be transmitted. For example, the character string “Denwa Dayo” is set as the message string.

The application processing unit 300 represents an arbitrary application that uses a CG character agent, and makes agent settings in the agent setting unit 260. When setting a personal identifier, a specific personal identifier may be acquired from various information such as a telephone number or a name by using the data search function of the address book data management unit 240.
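The three settings (personal identifier, state, and message) can be pictured as one small record that an application fills in. The sketch below is an assumption-laden illustration: `AgentSettings`, its field names, and `settings_for_incoming_call` are hypothetical, and the lookup table stands in for the data search function of the address book data management unit.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class AgentSettings:
    personal_identifier: int  # e.g. a data ID from the address book
    state: str = "neutral"    # e.g. "angry" or "laughing"
    message: str = ""         # character string to be conveyed
    repeat: bool = False      # whether the same message is repeated
    speed: float = 1.0        # relative message speed (assumed unit)


def settings_for_incoming_call(ids_by_phone: Dict[str, int],
                               caller: str) -> Optional[AgentSettings]:
    """Build settings for an incoming-call announcement, if the caller is known."""
    data_id = ids_by_phone.get(caller)
    if data_id is None:
        return None  # no address book entry, so no personal CG agent
    return AgentSettings(personal_identifier=data_id,
                         state="laughing",
                         message="Denwa Dayo",
                         repeat=True)
```

Because the settings carry only an identifier, a state, and a string, any application can drive the agent without attaching control data to the message itself.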

[0063] (Agent output)

The agent output unit 270 acquires the personal CG agent data stored in the address book data storage unit 250 from the address book data management unit 240, and performs image display and sound output of the personal CG agent based on the data. Details will be described below.

The agent output unit 270 includes a CG character drawing unit 271, a voice synthesis unit 272, and a basic data storage unit 273.

[0065] The CG character drawing unit 271 draws a CG character onto a bitmap in memory from the data stored in the basic data storage unit 273 and the image feature data stored in the address book data storage unit 250, and passes the bitmap data, or the memory address of the bitmap data, to the image output unit 281.

[0066] An example of a method of drawing a CG character from image feature data will be described with reference to FIG. 6. The upper figure in FIG. 6 shows the facial feature points and the facial image data from which the feature points were extracted, and the lower figure in FIG. 6 shows the facial image data divided into meshes using the coordinates of those feature points. These meshes can be deformed by moving the feature points, and the face image data is deformed together with the mesh. Using these deformations, various animated images, such as eyes opening and closing, a mouth opening and closing, an angry state, and a laughing state, can be generated by animating the movement of the feature points. FIG. 6 illustrates a two-dimensional animation technique, but a three-dimensional variation is also possible in which 3DCG technology is used to paste the face image data onto the face of a 3DCG character and animate it.
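The feature point movement that drives the mesh deformation can be sketched as a simple interpolation between two states. This covers only the point motion, not the texture warping, and linear interpolation is an assumption chosen for brevity; the specification does not name an interpolation method. The function names are hypothetical.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def interpolate_points(start: List[Point], end: List[Point],
                       t: float) -> List[Point]:
    """Linearly blend two feature point sets; t=0 gives start, t=1 gives end."""
    if len(start) != len(end):
        raise ValueError("point sets must have the same length")
    if not 0.0 <= t <= 1.0:
        raise ValueError("t must be between 0 and 1")
    return [(sx + (ex - sx) * t, sy + (ey - sy) * t)
            for (sx, sy), (ex, ey) in zip(start, end)]


def animation_frames(start: List[Point], end: List[Point],
                     steps: int) -> List[List[Point]]:
    """Produce `steps` point sets moving evenly from start to end."""
    if steps < 2:
        raise ValueError("steps must be at least 2")
    return [interpolate_points(start, end, i / (steps - 1))
            for i in range(steps)]
```

Each generated point set would then be used to deform the mesh for one frame, so that a mouth stored in a closed state and an “A” state can open gradually rather than jump.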

[0067] The speech synthesis unit 272 generates speech data reflecting personal features from the speech dictionary data stored in the basic data storage unit 273 and the speech feature data stored in the address book data storage unit 250, and passes it to the audio output unit 282.

[0068] The basic data storage unit 273 stores basic data necessary for image generation and speech synthesis of the personal CG agent. Examples of basic data include bitmap data and shape data for displaying character images, and speech dictionary data required for speech synthesis. The agent data stored in the address book data storage unit 250 is data for reflecting the characteristics of the individual. When it is used together with the basic data in the basic data storage unit 273, the personal CG agent can be output.

[0069] Note that if the agent data in the address book data storage unit 250 includes all of the personal CG agent data, the basic data storage unit 273 can be eliminated.

[0070] (Description of processing flow)

With respect to the communication apparatus configured as described above, the flow of processing will be described using a flowchart.

First, the agent data generation process in the communication apparatus according to the present invention will be described with reference to FIG. 7. FIG. 7 is a flowchart showing agent data generation processing performed during a videophone call. In the agent data generation process, during a videophone call, image feature data and voice feature data are generated from the image data and voice data of the other party of the call, and are stored in the address book data in which the personal information of the other party is recorded. This will be described in detail below.

[0072] (Start generating agent data)

First, the videophone call is started when a call is received from the other terminal, or when the terminal calls the other terminal (step S101).

[0073] Then, at the start of the videophone call, the videophone processing unit 220 notifies the address book data management unit 240 of the telephone number of the other party and requests the data update setting value of the agent data. The address book data management unit 240 searches the address book data stored in the address book data storage unit 250 using the passed telephone number, acquires the “data update setting” value of the address book data (see FIG. 4), and returns it to the videophone processing unit 220 (step S102).

If the set value is OFF, the videophone processing unit 220 ends the agent data generation process (step S103), and only the videophone call process is executed. On the other hand, when the set value is ON, the videophone processing unit 220 performs processing for passing image data and audio data to the agent data creation unit 230 simultaneously with the videophone call processing (step S104).
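The branch in steps S102 to S104 can be sketched as follows. This is a minimal illustration in Python; the names `AddressBookEntry` and `should_generate_agent_data` are hypothetical, and the choice to skip agent data generation when the telephone number is not found in the address book is an assumption made for the example, not something stated in the embodiment.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class AddressBookEntry:
    name: str
    phone_number: str
    data_update_setting: bool  # the "data update setting" value (ON/OFF)


def should_generate_agent_data(address_book: Dict[str, AddressBookEntry],
                               phone_number: str) -> bool:
    """Look up the other party by telephone number (step S102) and
    report whether agent data generation should run (steps S103/S104)."""
    entry = address_book.get(phone_number)
    # Assumption: an unknown number is treated like a setting of OFF.
    return entry is not None and entry.data_update_setting


# Hypothetical address book keyed by telephone number.
address_book = {
    "0001": AddressBookEntry("A", "0001", True),
    "0002": AddressBookEntry("B", "0002", False),
}
```

When the function returns False, only the videophone call processing runs; when it returns True, image data and audio data are also passed on for feature extraction.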

[0075] (Agent data generation)

Next, the agent data creation unit 230 generates agent data from the image data and the audio data. The image feature extraction unit 231 generates image feature data from the image data using image recognition technology, and the audio feature extraction unit 232 generates audio feature data from the audio data using audio recognition technology (step S105). The details will be described below.

[0076] The image data is the image data transmitted from the other party of the videophone call, and mainly shows the face image of the sender. The image feature extraction unit 231 extracts facial features from this image data. As an example of a face image feature extraction method, a pattern matching technique may be used in which the color values of the image data are compared with patterns prepared in advance. Examples of what is recognized include whether the entire face is shown and the positions of the eyes, nose, and mouth. Color information may also be recognized. An example of the image feature data acquired here is position coordinates indicating specific positions, such as the position coordinates of both ends of the lips, as shown in the drawings. Any data, such as color information and face contour information, can be used as the image feature data as long as it can be acquired from the image data.

[0077] Regarding voice data, voice recognition technology is used to recognize the type of "sound" uttered (for example, "O", "Ha", "Yo", "U", etc.) and the emotional state of the speaker, and to generate control parameters for use in speech synthesis. Examples of these control parameters include the voice data itself for specific vowels and consonants and for frequently used words, and parameters such as voice pitch, loudness, and speed that change according to the emotional state. As an example of the speech recognition method, the types of sound and the types of emotion are defined in advance, a typical speech feature amount is defined for each defined type, and the input speech is classified by comparing its feature amount with these typical feature amounts to determine which type it belongs to.
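The classification against predefined typical feature amounts can be sketched as follows. This is a minimal illustration in Python that assigns the input to the type whose typical feature vector is nearest in Euclidean distance; the distance measure, the two-element feature vector (pitch and loudness), and the numeric values are assumptions made for the example and are not specified in the embodiment.

```python
import math
from typing import Dict, Sequence


def classify(feature: Sequence[float],
             prototypes: Dict[str, Sequence[float]]) -> str:
    """Return the label of the predefined type whose typical feature
    amount is closest to the observed feature amount."""
    return min(prototypes,
               key=lambda label: math.dist(feature, prototypes[label]))


# Hypothetical typical feature amounts: (pitch in Hz, relative loudness).
emotion_prototypes = {
    "neutral": (120.0, 0.5),
    "angry": (180.0, 0.9),
    "laughing": (220.0, 0.7),
}

emotion = classify((175.0, 0.8), emotion_prototypes)
```

In practice the features would need to be scaled to comparable ranges before distances are compared, since here the pitch value dominates the loudness value.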

[0078] (Temporary save of agent data)

Next, the generated image feature data and audio feature data are passed to the reliability determination unit 233, which compares their reliability, state by state, with the reliability of the data stored in the temporary storage data storage unit 234 (step S106). The feature data with the higher reliability is then stored in the temporary storage data storage unit 234 as an update (step S107).

At this time, as an example of a comparison method used by the reliability determination unit 233, the reliability of the image feature data and the reliability of the audio feature data may be compared separately for the determination. Also, when the voice is transmitted correctly but the person is not properly shown in the image, or when the person is shown but the voice is not transmitted correctly, there is a high possibility that the characteristics of the other party are not reflected. For this reason, the reliability of the image data may be reflected in the determination of the audio data, and the reliability of the audio data may be reflected in the determination of the image data. In the above description, the method of storing the data with the higher reliability has been described. However, the created image feature data and audio feature data may instead be merged, using the reliability, with the feature data stored in the temporary storage data storage unit 234. An example of a merge method is to linearly interpolate the two values using the reliability as a weight.
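The merge by linear interpolation mentioned above can be sketched as follows. This is a minimal illustration in Python for a single scalar feature value, assuming that reliability is a non-negative number and that the interpolation weight is the new value's share of the total reliability; the function name and this particular weighting are assumptions made for the example.

```python
def merge_by_reliability(stored_value: float, stored_reliability: float,
                         new_value: float, new_reliability: float) -> float:
    """Linearly interpolate between the stored and the new feature value,
    weighting each by its reliability."""
    total = stored_reliability + new_reliability
    if total == 0:
        # Neither value is trusted; keep what is already stored.
        return stored_value
    weight = new_reliability / total
    return stored_value + (new_value - stored_value) * weight


# A stored coordinate of 10.0 (reliability 0.25) merged with a newly
# extracted coordinate of 20.0 (reliability 0.75).
merged = merge_by_reliability(10.0, 0.25, 20.0, 0.75)
```

For position coordinates the same function would be applied to each coordinate of each feature point.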

[0080] Next, the agent data creation unit 230 asks the videophone processing unit 220 whether or not the videophone call has ended (step S108). If it has not ended (NO in step S108), the process returns to step S104 and the agent data generation processing is repeated.

[0081] (Save to address book data)

Next, when the videophone call is finished (YES in step S108), agent data creation unit 230 performs the processing in step S109 and subsequent steps for storing the generated agent data in address book data storage unit 250. Details will be described below.

First, the videophone processing unit 220 notifies the address book data management unit 240 that the videophone call has ended, and passes the telephone number of the other party. The address book data management unit 240 receives the agent data stored in the temporary storage data storage unit 234 from the agent data creation unit 230. In addition, it searches the address book data storage unit 250 for the address book data of the other party based on the passed telephone number, and obtains the agent data stored there. For these two sets of data, the reliability determination unit 233 compares the reliability for each state (step S109). When the reliability of the temporarily stored data is higher (YES in step S110), the agent data of the address book data is updated (step S111). On the other hand, when the reliability is lower (NO in step S110), the process ends without updating the agent data of the address book data. In the above description, the method of storing the data with the higher reliability has been described. However, the two sets of data may be merged using the reliability.
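The per-state comparison and update in steps S109 to S111 can be sketched as follows. This is a minimal illustration in Python; the `StateFeatures` structure, the function name, and the choice to store data for a state that has no existing entry in the address book are assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class StateFeatures:
    features: dict      # feature data for one state, e.g. "laughing"
    reliability: float  # reliability attached to that feature data


def update_agent_data(address_book_agent: Dict[str, StateFeatures],
                      temporary: Dict[str, StateFeatures]) -> List[str]:
    """Compare reliability state by state and overwrite the address book
    agent data where the temporarily stored data is more reliable.
    Returns the list of states that were updated."""
    updated = []
    for state, candidate in temporary.items():
        current = address_book_agent.get(state)
        # Assumption: a state with no stored data is always filled in.
        if current is None or candidate.reliability > current.reliability:
            address_book_agent[state] = candidate
            updated.append(state)
    return updated


# Hypothetical data for one communication partner.
stored = {
    "neutral": StateFeatures({"lip_left": (40.0, 60.0)}, 0.6),
    "laughing": StateFeatures({"lip_left": (38.0, 58.0)}, 0.9),
}
from_call = {
    "neutral": StateFeatures({"lip_left": (41.0, 61.0)}, 0.8),
    "laughing": StateFeatures({"lip_left": (37.0, 57.0)}, 0.5),
    "angry": StateFeatures({"lip_left": (42.0, 62.0)}, 0.4),
}
updated_states = update_agent_data(stored, from_call)
```

Replacing the assignment with a reliability-weighted merge would give the merge variant mentioned at the end of the paragraph.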

The above is the flow of the agent data generation process.

[0084] Although the above embodiment has described the data generation processing of the present invention for one-to-one videophone, the present invention can be applied to a TV conference system in which a plurality of persons can participate. Needless to say, the present invention can also be applied to a system that communicates various images and sounds.

[0085] (Processing flow of application using personal CG agent)

Next, the processing flow of an application that uses a personal CG agent will be described using the flowchart of FIG. 8. In particular, this section describes the flow of application processing in which, at the time of an incoming call, the personal CG agent of the other party conveys the message “Denwa Dayo” (“You have a call”) to the user of the communication device. Here, the application processing unit 300 refers to the processing unit of a telephone application that handles a call when an incoming call is received.

First, the application processing unit 300 sets the personal CG agent to be used in the agent setting unit 260 (step S201). Examples of this setting are the personal identifier setting, the state setting, and the message setting, which are described in detail below.

At the time of an incoming call, the application processing unit 300 sets an identifier for identifying an individual in the individual identifier setting unit 261. The personal identifier is an identifier for retrieving data in the address book data management unit 240, and may be a telephone number or an address book data ID, for example. When the identifier is the address book data ID, the application processing unit 300 passes the telephone number of the incoming call to the address book data management unit 240, which searches the address book data in the address book data storage unit 250, and obtains the corresponding data ID.

Next, a message character string to be transmitted is set in the message setting unit 263. As an example of the message, a fixed character string such as “Denwa Dayo” may be used. Alternatively, a personal message may be set in the address book and that character string may be read out. In addition, the name of the other party may be obtained from the address book data management unit 240 so that the message includes a character string unique to the individual, such as “From Mr. OO”. In the present embodiment, the message consists of a character string only, but parameters used in speech synthesis, such as accent, loudness, and interval, may be added to the message.

[0089] Further, state parameters for controlling the personal CG agent are set in the state setting unit 262. Examples of the state parameters include a parameter that indicates the state of the agent and settings related to animation, such as repeated actions. The state parameter may be an emotion parameter for changing the way of reading according to an emotion such as “angry” or “laughing” when the character string “Denwa Dayo” is read out. The repeated action setting controls repetition, for example whether the message “Denwa Dayo” is read out just once or read out repeatedly as “Denwa Dayo, Denwa Dayo, ...”. For repetition, a change in the way of reading may also be specified, such as “Repeat gradually”. From these settings, the type of CG agent operation and the type of animation can be determined.
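The three kinds of settings described above (personal identifier, message, and state) can be sketched as a single settings record. This is a minimal illustration in Python; the class name, the field names, the default values, and the comma-separated form of the repeated message are assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class AgentSettings:
    personal_identifier: str    # telephone number or address book data ID
    message: str                # character string to be read out
    emotion: str = "neutral"    # state parameter, e.g. "angry", "laughing"
    repeat_count: int = 1       # how many times the message is read out


def utterance_text(settings: AgentSettings) -> str:
    """Build the character string to synthesize, repeating the message
    according to the repeated action setting."""
    if settings.repeat_count < 1:
        raise ValueError("repeat_count must be at least 1")
    return ", ".join([settings.message] * settings.repeat_count)


# Hypothetical settings for an incoming call.
settings = AgentSettings("ID-001", "Denwa Dayo",
                         emotion="laughing", repeat_count=3)
text = utterance_text(settings)
```

A record of this kind is what would be handed from the agent setting unit 260 to the agent output unit 270 once the settings are complete.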

Next, when the agent setting is completed, the agent setting unit 260 passes the setting values to the agent output unit 270. The agent output unit 270 first passes the personal identifier to the address book data management unit 240 and receives the corresponding agent data (step S202). The data may be transferred either by copying the data itself to memory or by passing the address of the memory where the data is stored. Further, the image feature data included in the agent data is passed to the CG character drawing unit 271, and the voice feature data is passed to the speech synthesis unit 272.

[0091] Next, the CG character drawing unit 271 draws a CG character reflecting the personal features based on the personal feature data (step S203). Here, an example of a CG character drawing method will be described. Depending on the agent type setting included in the personal feature data, either the method of using the face photo directly or the method of displaying a portrait is selected. As an example of the method of displaying a portrait, bitmaps of facial parts (“eyes”, “nose”, “mouth”, etc.) stored in the basic data storage unit 273 are selected based on the bitmap of the face photo and the coordinates of the facial feature points, scaled and positioned according to the position coordinates, and combined to generate a portrait bitmap. The generated portrait bitmap can be handled in the same way as the face photo bitmap. Next, using the facial feature point data, animation images of opening and closing of the mouth and of the eyes are generated. As an example of this animation technology, as shown in FIG. 6, a mesh is defined based on the facial feature points, the bitmap data of the face (face photo or portrait data) is texture-mapped onto the mesh, and animations such as opening and closing of the mouth and of the eyes are drawn by moving the vertices constituting the mesh. By using these technologies, animated images with lip sync or blinking can be drawn as bitmap data in time with the sound. Although this embodiment shows only the method of animating the facial image data, a full-body model of the character may be created using 3DCG, and the above facial image may be pasted onto the face mesh to display a full-body character. In addition, the speech synthesis unit 272 utters speech corresponding to the given character string (step S204).
As an example of the speech synthesis method, the phoneme data of each character to be uttered is acquired from the default phoneme database stored in the basic data storage unit 273, the phoneme data is modified using the phoneme control data included in the personal voice feature data, and the phoneme data of the individual characters is concatenated to generate the voice data of the character string.

[0093] The bitmap data generated by the agent output unit 270 is sent to the screen output unit 281 (step S203), and the voice data is sent to the voice output unit 282 (step S204).

Further, the agent output unit 270 controls animation according to the settings of the agent setting unit 260. For example, if a message with the character string “Good morning” (“Ohayou” in Japanese) is conveyed as voice in 4 seconds, a 4-second animation is also drawn for the screen display. As an example of how to create the animation images, as shown in FIG. 9, the animation is controlled so that the mouth takes the shapes for “O”, “Ha”, “Yo”, and “U” at one-second intervals, and multiple frames are drawn per second so that the mouth shape changes continuously.
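The timing control in this example can be sketched as follows. This is a minimal illustration in Python that divides the message duration evenly among the mouth shapes and lists, for each drawn frame, its time and the mouth shape in effect; the even division, the frame rate, and the function name are assumptions made for the example, and the blending between consecutive shapes is omitted.

```python
from typing import List, Tuple


def mouth_schedule(shapes: List[str], duration_s: float,
                   fps: int) -> List[Tuple[float, str]]:
    """Return (time in seconds, mouth shape) for every frame of an
    animation that lasts as long as the synthesized speech."""
    seconds_per_shape = duration_s / len(shapes)
    frame_total = int(duration_s * fps)
    schedule = []
    for i in range(frame_total):
        t = i / fps
        # Clamp so that rounding never indexes past the last shape.
        index = min(int(t / seconds_per_shape), len(shapes) - 1)
        schedule.append((t, shapes[index]))
    return schedule


# The 4-second message from the text, drawn at a hypothetical 2 frames
# per second to keep the example short.
schedule = mouth_schedule(["O", "Ha", "Yo", "U"], 4.0, 2)
```

A real frame rate would be higher, and each frame would interpolate the feature points toward the next mouth shape rather than switching abruptly.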

[0095] When the animation ends (YES in step S205), the personal CG agent output processing ends. On the other hand, if the animation has not ended (NO in step S205), the position coordinates used for drawing, the control parameters for voice processing, and the like are changed according to the elapsed time (step S206), and the personal CG agent output processing (step S203 to step S204) is repeated.

[0096] According to the processing flow of FIG. 8 described above, when a call arrives, a CG character created from the caller's face photo or portrait is displayed, a voice reflecting the characteristics of the caller announces “Denwa Dayo”, and an animated image that includes opening and closing of the mouth is displayed.

The above is the processing flow of the application that uses the personal CG agent.

[0098] As described above, the communication device 20 of the present invention uses the image information and audio information sent in a videophone call, so that a CG character agent reflecting the characteristics of the other party can be automatically generated. Therefore, it is possible to realize a communication device that transmits a message using a CG character agent while reducing the time and effort required of the user.

[0099] Further, by storing the agent data created in the agent data creation unit 230 in correspondence with the address book data of the other party of the call, all kinds of messages from persons registered in the address book can be conveyed by a personalized CG character agent.

[0100] Furthermore, in the communication device 20 of the present invention, the agent data 401 including the image feature data and the voice feature data is attached to the address book data 400 and managed for each communication partner, so that management is facilitated, and the address book data 400 can be used in various applications other than message transmission applications such as videophone and mail.

[0101] (Deploy to other applications)

In the above embodiment, the processing of an application using the present invention has been explained for the incoming call message. However, needless to say, the present invention is equally applicable to various applications that handle personal information, such as an application in which the personal CG agent of an e-mail sender announces the incoming e-mail or reads out the content of the e-mail.

[0102] Furthermore, the present invention can be applied to entertainment applications such as games.

For example, the present invention may be applied to an adventure game or role-playing game in which multiple people appear, using as game characters the personal CG agents of persons chosen at random from the address book or specified by the user. CG characters that reflect the characteristics of people the user knows can then appear in the virtual world and convey information (messages), providing new entertainment with the user's own characters and stories.

Industrial applicability

[0103] The communication apparatus according to the present invention can realize a familiar and easily understandable interface using a CG character agent, and is useful for information terminals having a videophone function, such as a mobile phone terminal, a PDA, and a PC.

Claims

The scope of the claims

[1] A communication device that transmits a message using a CG character agent, comprising:

Communication means for communicating communication data with other terminals;

Agent data creating means for creating agent data including personal characteristic data of a communication partner based on the communication data;

Address book data creating means for creating address book data including the agent data in correspondence with personal information of a communication partner;

Address book data storage means for storing the address book data;

Agent creating means for creating the CG character agent of the communication partner with reference to the agent data included in the address book data of the communication partner; and

Agent output means for outputting the CG character agent.

[2] The agent data creation means includes an image feature extraction unit that automatically extracts image feature data from the communication data, and an audio feature extraction unit that extracts audio feature data.

The agent data includes at least the image feature data and the sound feature data.

The communication device according to claim 1, wherein:

[3] The image feature data includes at least the position coordinates of the feature points for each emotional state of the image,

The voice feature data includes at least sound quality and control parameters for each emotional state of the voice.

The communication device according to claim 2, wherein:

[4] The communication data is videophone packet data.

The communication apparatus according to claim 1 or 2, wherein

[5] The agent data creation means includes:

A reliability level assigning section for giving a reliability level to the created agent data;

A temporary storage data storage unit for temporarily storing the agent data to which the reliability level is given; and

A reliability judgment unit that, during communication, compares the reliability of newly created agent data with the reliability of agent data already stored in the temporary storage data storage unit, and automatically updates the agent data stored in the temporary storage data storage unit to the agent data with the higher reliability.

The agent creating means further creates a CG character agent using the agent data stored in the temporarily stored data storage unit.

The communication device according to claim 1, wherein:

[6] The reliability determination unit further compares, after communication, the reliability of the agent data of the communication partner stored in the address book data storage means with the reliability of the agent data of the communication partner stored in the temporary storage data storage unit, and automatically updates the agent data stored in the address book data storage means to the agent data with the higher reliability.

The communication device according to claim 5, wherein

[7] The agent output means includes:

A basic data storage unit for storing basic data for creating the CG character agent;

A CG character drawing unit for creating image data of the CG character agent from the basic data and the image feature data;

A speech synthesis unit that creates speech data of the CG character agent from the basic data and the speech feature data.

The communication device according to claim 2, wherein:

[8] The communication device further includes:

Application processing means for processing an application using message transmission by the CG character agent;

The communication apparatus according to claim 1, further comprising agent setting means for inputting settings of the CG character agent.

[9] The application processing means processes an application that displays an incoming call message using the CG character agent, or an application that displays an incoming mail message using the CG character agent.

The communication apparatus according to claim 8, wherein

[10] A CG character agent creation device provided in a communication device that transmits a message using a CG character agent, comprising:

Communication means for communicating communication data with other terminals;

Address book data storage means for storing the address book data;

Agent creating means for creating the CG character agent of the communication partner with reference to the agent data included in the address book data of the communication partner; and

Agent output means for outputting the CG character agent.

[11] The agent data creation means includes an image feature extraction unit that automatically extracts image feature data from the communication data, and an audio feature extraction unit that extracts audio feature data.

The CG character agent creation device according to claim 10.

[12] The agent data creation means includes:

A reliability level granting unit for giving a reliability level to the created agent data;

A temporary storage data storage unit for temporarily storing the agent data to which the reliability level is given; and

A reliability judgment unit that, during communication, compares the reliability of newly created agent data with the reliability of agent data already stored in the temporary storage data storage unit, and automatically updates the agent data stored in the temporary storage data storage unit to the agent data with the higher reliability.

The CG character agent creation device according to claim 10.

[13] A communication method used for a communication device that transmits a message using a CG character agent, comprising:

A communication step for communicating communication data with another terminal;

An agent data creation step of creating, based on the communication data, agent data including personal characteristic data of the communication partner;

An address book data creating step for creating address book data including the agent data in correspondence with personal information of a communication partner;

An address book data storage step for storing the address book data;

An agent creating step of creating the CG character agent of the communication partner with reference to the agent data included in the address book data of the communication partner; and

An agent output step of outputting the CG character agent.

[14] A program used in a communication device that transmits a message using a CG character agent, the program causing a computer to execute:

An address book data storage step for storing the address book data;

An agent creation step of creating the CG character agent of the communication partner with reference to the agent data included in the address book data of the communication partner; and

An agent output step of outputting the CG character agent.