CN104835190A - 3D instant messaging system and messaging method - Google Patents


Info

Publication number
CN104835190A
Authority
CN
China
Prior art keywords
face
human face
module
client
text
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201510215785.9A
Other languages
Chinese (zh)
Inventor
陆远刚
盛蕴
张桂戌
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
East China Normal University
Original Assignee
East China Normal University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by East China Normal University
Priority to CN201510215785.9A
Publication of CN104835190A
Legal status: Pending

Landscapes

  • Processing Or Creating Images (AREA)

Abstract

The invention provides a 3D instant messaging system comprising clients and a server. A face synthesis unit and a speech synthesis unit are disposed inside each client. The face synthesis unit comprises a face feature extraction device, a model fitting device and a texture mapping device. The face feature extraction device extracts facial features from a 2D face photo. The model fitting device projects a 3D face mesh model onto the 2D face photo according to the extracted features, obtaining the texture coordinates of the 3D face mesh model. The texture mapping device maps the 2D face photo back onto the 3D face mesh to form a 3D face. The speech synthesis unit generates a voice stream and 3D face animation from the 3D face and the input text and outputs them to the server. The server realizes the information exchange between the clients. By introducing 3D technology into the messaging system, the invention lets users design customized 3D face animation and chat with it on the internet, making the experience more interesting and vivid.

Description

3D instant messaging system and messaging method
Technical field
The present invention relates to a communication system, and in particular to a 3D instant messaging system and a messaging method thereof.
Background
Instant messaging (IM) systems are internet-based real-time communication systems. They let users talk online with others without worrying about mail delays or telephone costs. In 1996, ICQ, the first instant messaging tool, was born and quickly became the instant messaging system with the largest user base in the world at that time. Afterwards, IM tools of all kinds sprang up like mushrooms after rain. Today, instant messaging systems have become an indispensable internet communication tool for many people.
Instant messaging systems mainly support remote dialogue and exchange through text, audio and video. Text chat is rather monotonous, while video chat is only possible when a camera (webcam) is installed and both parties are willing to be seen. To achieve personalization, instant messaging systems such as Microsoft MSN Messenger mainly let the user load a 2D photo; this is static processing. Tencent QQ instead adopts user interaction, letting users personalize by buying ornaments for an avatar picture, but this is still 2D technology.
Therefore, the market needs a compromise: an instant messaging system that avoids monotonous text chat yet spares users the awkwardness of facing a stranger on camera.
Summary of the invention
The object of the present invention is to provide a personalized 3D instant messaging system. With this system, a user can synthesize a realistic 3D face from a single 2D face photo and personalize the generated 3D face. When chatting on the internet, the text the user types is converted by a text decomposition module into corresponding phonemes and visemes, which drive the system to generate the corresponding voice and animation.
The present invention proposes a 3D instant messaging system comprising clients and a server. Each client handles user login and the input and output of information, and contains a face synthesis unit and a speech synthesis unit. The face synthesis unit comprises a face feature extraction device, a model fitting device and a texture mapping device. The face feature extraction device extracts facial features from a 2D face photo. The model fitting device projects a 3D face mesh model onto the 2D face photo according to the extracted features, obtaining the texture coordinates of the 3D face mesh model. The texture mapping device maps the 2D face photo back onto the 3D face mesh according to these texture coordinates, forming a 3D face. The speech synthesis unit generates a voice stream and 3D face animation from the 3D face and the input text and outputs them to the server. The server realizes the information exchange between the clients.
In the proposed 3D instant messaging system, the face feature extraction device extracts the facial features interactively.
In the proposed 3D instant messaging system, the model fitting device comprises a pose estimation module, a global calibration module, a local alignment module and a boundary alignment module. The pose estimation module estimates 3D information from the 2D face photo according to the facial features. The global calibration module projects the 3D face mesh model onto the 2D face photo according to this 3D information. The local alignment module fits the facial features of the 3D face model to those in the 2D face photo. The boundary alignment module draws the boundary of the 3D face model toward the boundary of the 2D face photo using a spring model algorithm.
In the proposed 3D instant messaging system, the speech synthesis unit comprises a text decomposition module, a visual text-to-speech module and an animation synthesis module. The text decomposition module decomposes the input text into phonemes. The visual text-to-speech module converts the phonemes into a voice stream and a viseme sequence synchronized with the voice stream. The animation synthesis module generates the 3D face animation from the viseme sequence and outputs it in synchrony with the voice stream.
In the proposed 3D instant messaging system, the client further comprises a 3D face personalization module for decorating the 3D face.
The invention also proposes a 3D instant messaging method comprising the following steps:
Step 1: log in to the client, and input the user information and a 2D face photo into the client;
Step 2: extract facial features from the 2D face photo with the face feature extraction device;
Step 3: project the 3D face mesh model onto the 2D face photo with the model fitting device according to the extracted facial features, obtaining the texture coordinates of the 3D face mesh model;
Step 4: map the 2D face photo back onto the 3D face mesh with the texture mapping device according to the texture coordinates, forming the 3D face;
Step 5: select the friend to communicate with and input text in the client; the speech synthesis unit generates a voice stream and 3D face animation from the 3D face and the input text and outputs them to the server;
Step 6: the server sends the voice stream and the 3D face animation to the corresponding client according to the selected friend, realizing 3D instant messaging.
With the proposed 3D instant messaging system, a user can synthesize a realistic 3D face from a single 2D face photo and personalize the synthesized face. During internet chat, the typed text drives the speech synthesis unit to produce the 3D face animation and voice. By introducing 3D technology into network instant messaging, the invention lets users design and chat with personalized 3D face animation, adding interest and vividness to the experience and promoting innovation in instant messaging systems.
With the proposed 3D instant messaging system, a user only needs to input a single 2D face photo of himself to generate a 3D face animation and chat with the other party. A 3D face is a three-dimensional avatar that can represent the user in 3D space. In the system, the user synthesizes a 3D face from a 2D face photo with a single-view face synthesis technique; the face can show different expressions and emotions, giving an immersive, face-to-face feeling. The expressions are synthesized by a text-to-visual-speech engine driven by a set of predefined phonemes and visemes.
Description of the drawings
Fig. 1 is a flowchart of the 3D face synthesis in the present invention.
Fig. 2 shows the positions of the facial feature points in the present invention.
Fig. 3 illustrates the estimation of the head rotation θ_Z in the present invention.
Fig. 4a is a view of the mid-section position used in the estimation of the head rotation θ_Y.
Fig. 4b is a front view used in the estimation of the head rotation θ_Y.
Fig. 5a is a schematic diagram of a 2D face in an embodiment of the present invention.
Fig. 5b shows the face boundary before the spring model is applied in an embodiment of the present invention.
Fig. 5c shows the face boundary after the spring model is applied in an embodiment of the present invention.
Fig. 6 is the workflow diagram of the speech synthesis unit in the present invention.
Fig. 7 shows the viseme corresponding to each phoneme of Table 1 in the present invention.
Fig. 8 shows the fitting of the mouth-shape animation curves in the present invention.
Fig. 9 is a flowchart of the 3D instant messaging method of the present invention.
Fig. 10 is a schematic diagram of the client login interface in an embodiment of the present invention.
Fig. 11 shows the client chat window interface in an embodiment of the present invention.
Figs. 12a-12d are schematic diagrams of personalized 3D faces in embodiments of the present invention.
Fig. 13 shows the server interface in an embodiment of the present invention.
Figs. 14a-14c are schematic diagrams of the client login interface in an embodiment of the present invention.
Fig. 14d shows one's own client interface in an embodiment of the present invention.
Fig. 14e shows the other party's client interface in an embodiment of the present invention.
Fig. 15 shows the structure of the 3D instant messaging system of the present invention.
Embodiments
The invention is described in further detail below with reference to specific embodiments and the drawings. Except for the contents specifically mentioned below, the processes, conditions and experimental methods for implementing the invention are common knowledge in the art, and the invention is not particularly limited in this respect.
As shown in Fig. 15, the present invention proposes a 3D instant messaging system comprising clients and a server. The client handles user login and the input and output of information, and is provided with a face synthesis unit 1 and a speech synthesis unit 2.
In the proposed system, the face synthesis unit 1 comprises a face feature extraction device 11, a model fitting device 12 and a texture mapping device 13. The face feature extraction device 11 extracts facial features from the 2D face photo. The model fitting device 12 projects the 3D face mesh model onto the 2D face photo according to the extracted features, obtaining the texture coordinates of the 3D face mesh model. The texture mapping device 13 maps the 2D face photo back onto the 3D face mesh according to the texture coordinates, forming the 3D face.
In the proposed system, the speech synthesis unit 2 generates a voice stream and 3D face animation from the 3D face and the input text and outputs them to the server.
In the proposed system, the server realizes the information exchange between the clients.
In the proposed system, the face feature extraction device 11 extracts the facial features interactively.
In the proposed system, the model fitting device 12 comprises a pose estimation module 121, a global calibration module 122, a local alignment module 123 and a boundary alignment module 124. The pose estimation module 121 estimates 3D information from the 2D face photo according to the facial features. The global calibration module 122 projects the 3D face mesh model onto the 2D face photo according to this 3D information. The local alignment module 123 fits the facial features of the 3D face model to those in the 2D face photo. The boundary alignment module 124 draws the boundary of the 3D face model toward the boundary of the 2D face photo using a spring model algorithm.
In the present invention, the spring model is a mathematical model based on the principle of conservation of energy in physics. Using the balance between internal and external forces, it moves the 2D projection of the 3D face mesh to the position of the face boundary in the photo. In the spring model, every edge of the 3D face mesh is regarded as a spring, so the 3D face mesh model is a 3D model formed by springs and their intersection points.
An external force is applied to the spring model to move the mesh to the boundary position of the face in the photo. Because the gradient differs markedly between boundary and non-boundary regions of the photo, the external force is defined through the gradient:
F_i^ext = τ · ∇(G_σ * I(n_i));
where I(n_i) is the intensity of image I at vertex n_i, τ is a weight constant controlling the balance between internal and external forces, G_σ is a 2-D Gaussian filter with standard deviation σ, and ∇ is the nabla (Hamiltonian) operator.
Since each vertex of the 3D mesh is connected to several edges, when one vertex moves, the connected vertices move with it, and these vertices apply a reaction force that resists the deformation of the mesh; this is the internal force. When the internal and external forces are equal, the vertex moves to its equilibrium position, namely the boundary position of the 2D face, completing the boundary alignment. As shown in Fig. 5c, the grey broken line along the face outline is the face boundary after the spring model is applied.
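As a concrete illustration (not part of the patent text), the external-force computation can be sketched in Python with NumPy: the photo is smoothed by a 2-D Gaussian G_σ, and the force at a mesh vertex is τ times the image gradient sampled there. The function names and the separable-filter implementation are our own assumptions.

```python
import numpy as np

def gaussian_kernel(sigma, radius):
    # 1-D Gaussian kernel, normalised to sum to 1
    x = np.arange(-radius, radius + 1)
    k = np.exp(-x**2 / (2 * sigma**2))
    return k / k.sum()

def smooth(image, sigma=1.0):
    # Separable 2-D Gaussian filtering: G_sigma * I
    k = gaussian_kernel(sigma, radius=int(3 * sigma))
    tmp = np.apply_along_axis(lambda r: np.convolve(r, k, mode="same"), 1, image)
    return np.apply_along_axis(lambda c: np.convolve(c, k, mode="same"), 0, tmp)

def external_force(image, vertex, tau=1.0, sigma=1.0):
    """F_i^ext = tau * grad(G_sigma * I) sampled at vertex (row, col)."""
    g = smooth(image.astype(float), sigma)
    gy, gx = np.gradient(g)  # gradients along rows and columns
    r, c = int(round(vertex[0])), int(round(vertex[1]))
    return tau * np.array([gy[r, c], gx[r, c]])
```

For a photo with a strong vertical edge, the force at a vertex near the edge points toward it, while far from any boundary the force vanishes, which is exactly what pulls the mesh projection onto the face contour.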
In the proposed system, the speech synthesis unit 2 comprises a text decomposition module 21, a visual text-to-speech module 22 and an animation synthesis module 23. The text decomposition module 21 decomposes the input text into phonemes. The visual text-to-speech module 22 converts the phonemes into a voice stream and a viseme sequence synchronized with the voice stream. The animation synthesis module 23 generates the 3D face animation from the viseme sequence and outputs it in synchrony with the voice stream.
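The module chain (text → phonemes → synchronized visemes) can be sketched as follows. The toy lexicon and the viseme labels are illustrative assumptions for this sketch (the labels follow Table 1 below), not the patent's actual engine.

```python
# Hypothetical two-word lexicon mapping words to ARPAbet-style phonemes
LEXICON = {"hi": ["hh", "ay"], "tea": ["t", "iy"]}
# Phoneme-to-viseme labels taken from Table 1 (classes 1-9, a-g)
VISEME_OF = {"hh": "9", "ay": "2", "t": "b", "iy": "5"}

def decompose(text):
    """Text decomposition module: split the input text into a phoneme sequence."""
    return [p for w in text.lower().split() for p in LEXICON.get(w, [])]

def to_visemes(phonemes):
    """Visual TTS module (viseme half): map each phoneme to its viseme class,
    keeping the viseme sequence aligned one-to-one with the phoneme sequence."""
    return [VISEME_OF[p] for p in phonemes]
```

Feeding "hi tea" through both stages yields the viseme sequence that the animation synthesis module would then render frame by frame.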
In the proposed system, the client further comprises a 3D face personalization module 3 for decorating the 3D face.
Based on the above 3D communication system, the present invention proposes a 3D instant messaging method which, as shown in Fig. 9, comprises the following steps:
Step 1: log in to the client, and input the user information and a 2D face photo into the client;
Step 2: extract facial features from the 2D face photo with the face feature extraction device 11;
Step 3: project the 3D face mesh model onto the 2D face photo with the model fitting device 12 according to the extracted facial features, obtaining the texture coordinates of the 3D face mesh model;
Step 4: map the 2D face photo back onto the 3D face mesh with the texture mapping device 13 according to the texture coordinates, forming the 3D face;
Step 5: select the friend to communicate with and input text in the client; the speech synthesis unit 2 generates a voice stream and 3D face animation from the 3D face and the input text and outputs them to the server;
Step 6: the server sends the voice stream and the 3D face animation to the corresponding client according to the selected friend, realizing 3D instant messaging.
As shown in Fig. 1, the face synthesis of the present invention goes through the following stages: facial feature extraction, model fitting and texture mapping.
In the present invention, facial feature extraction extracts facial features such as the eyebrows, eyes, nose, mouth and jaw from the 2D face photo. In single-view face synthesis it serves two purposes: first, it provides geometric information for 3D face pose estimation; second, it provides positional information for model fitting.
The model fitting device 12 is composed of the pose estimation module 121, the global calibration module 122, the local alignment module 123 and the boundary alignment module 124. 3D face pose estimation infers 3D information from the 2D face photo; because the depth information is missing, the pose must be estimated from the inherent geometric information in the facial features, and the accuracy of the estimate depends on how precisely the features were extracted. According to the pose estimation result, global calibration projects a predefined 3D face mesh model onto the 2D face photo through rotation, scaling and translation. Since faces differ from one another, the projection of a generic 3D face model cannot match the features and contour of every face, so local alignment and boundary alignment are needed. Local alignment calibrates local features such as the eyes and mouth shape after global calibration, so that the eyes and mouth of the 3D face model fit those in the 2D face photo. Boundary alignment then draws the boundary of the 3D face model toward the boundary of the 2D face photo with the spring model algorithm. Through these steps, the texture coordinates of the 3D face mesh model can be calculated; mapping the texture elements back onto the 3D face mesh model yields a realistic 3D face, which is then mapped to screen space, completing the texture mapping.
In the present invention, the speech synthesis unit 2 supports text input in different languages and converts the input text into voice and 3D face animation output, improving the realism of the synthesized face animation.
In the present invention, because the facial feature points lie on different parts of the face, and both the face pose and the complex background of the 2D face photo affect extraction accuracy, this embodiment extracts the feature points on the input 2D face photo manually through user interaction. The ten facial feature points marked by the user are shown as black dots in Fig. 2, and the 3D face pose is estimated from the geometric relationships between the feature points.
In the present invention, 3D face pose estimation infers 3D information from the 2D image and is the key to the single-view face synthesis technique. Because the depth information is missing, the pose must be estimated from the inherent geometric information in the facial features. Pose estimation can be regarded as the estimation of θ_X, θ_Y and θ_Z in the following formula:
[X']   [ 1    -θ_Z   θ_Y ]   [S_X  0    0  ]   [X]   [T_X]
[Y'] = [ θ_Z   1    -θ_X ] × [ 0   S_Y  0  ] × [Y] + [T_Y]
[Z']   [-θ_Y   θ_X   1   ]   [ 0   0    S_Z]   [Z]   [T_Z]
This formula is a 3D affine transformation under the orthographic projection model; for very small Euler angles it maps any point P = (X, Y, Z) on the 3D face model to a target point P' = (X', Y', Z'). Here θ_X, θ_Y and θ_Z are the rotation angles around the X, Y and Z axes of 3D space, S_X, S_Y and S_Z are the scaling factors, and (T_X, T_Y, T_Z)^T is the corresponding translation vector.
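The small-angle affine transform above can be sketched directly (our own Python with NumPy; the function name is an assumption):

```python
import numpy as np

def project_point(p, theta, scale, t):
    """Small-angle 3D affine transform under the orthographic projection model:
    P' = R(theta) @ diag(S) @ P + T, with the rotation linearised for small angles."""
    tx, ty, tz = theta
    R = np.array([[1.0, -tz,  ty],
                  [ tz, 1.0, -tx],
                  [-ty,  tx, 1.0]])
    return R @ (np.asarray(scale, float) * np.asarray(p, float)) + np.asarray(t, float)
```

With all angles zero and unit scale the transform is the identity; a small θ_Z mixes X and Y exactly as in the matrix above.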
θ_Z can be estimated by computing the angle between the line segment P_L P_R in the 2D face picture and the 2-D horizontal axis, as in Fig. 3. To estimate the head rotation θ_Y, the present invention adopts a method based on a circular cross-section, under the assumption that the horizontal cross-section of the head through the eyes is circular, as shown in Figs. 4a and 4b. The value of θ_Y can then be estimated as:
θ_Y = ∠P_C O P_3 = arcsin( 2·|P_CX − P_3X| / |P_2X − P_1X| );
where P_1 and P_2 are the two eye points along the horizontal eye direction in the face picture, P_3 is the midpoint of P_1 P_2, and the subscript X denotes the horizontal axis. θ_X describes the head tilting in the vertical direction, i.e. the rotation about the axis perpendicular to this direction.
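The two angle estimates can be sketched directly from the formulas above (our own Python; the function names and the interpretation of P_C as the projected face-center point are assumptions):

```python
import math

def estimate_theta_z(p_left, p_right):
    """theta_Z: angle between the eye line P_L P_R and the horizontal axis."""
    return math.atan2(p_right[1] - p_left[1], p_right[0] - p_left[0])

def estimate_theta_y(p1x, p2x, pcx, p3x):
    """theta_Y under the circular-cross-section assumption:
    theta_Y = arcsin(2 * |P_Cx - P_3x| / |P_2x - P_1x|)."""
    return math.asin(2 * abs(pcx - p3x) / abs(p2x - p1x))
```

For a level pair of eyes θ_Z is zero, and for a frontal face (P_C coinciding with the eye-line midpoint P_3) θ_Y is zero, matching the geometric intuition of Figs. 3 and 4.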
As shown in Fig. 2, the left-eye center, the right-eye center and the midpoint between the two eyes are mainly used to estimate the face angles; with the estimated angles, the 3D face model can be rotated and orthographically projected onto the 2D face view.
As shown in Fig. 2, the four points at the left-eye center, the right-eye center, the midpoint between the eyes and the mouth center are used for the global calibration of the model; the two points at the left-eye and right-eye centers calibrate the width of the face model; the midpoint between the eyes and the mouth center calibrate the height of the face model. The four points at the left mouth corner, the right mouth corner, the lower midpoint of the upper lip and the upper midpoint of the lower lip are used for the mouth-shape calibration in local alignment.
Figs. 5a-5c illustrate the effect of the spring model on the 3D face boundary. Fig. 5a is the 2D face photo used to synthesize the 3D face model; the grey broken line inside the facial contour in Fig. 5b is the 3D face boundary before the spring model is applied, and the grey broken line along the facial contour in Fig. 5c is the face boundary after it is applied. The comparison clearly shows that, after the spring model is applied, the 3D face boundary coincides with the original face boundary.
The present invention adopts the CANDIDE-3 model and the OpenGL graphics library: the texture mapping functions of OpenGL are called to map the 2D face view onto the calibrated CANDIDE-3 model, generating the 3D face. Finally the generated 3D face is mapped to screen space, completing the texture mapping.
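The texture-coordinate step can be illustrated as follows. Normalising projected pixel positions into [0, 1] with a flipped v axis is a common OpenGL convention; this sketch is our own assumption about one plausible implementation, not code from the patent.

```python
import numpy as np

def texture_coords(projected_xy, img_w, img_h):
    """Turn projected mesh-vertex positions (in photo pixels) into
    OpenGL-style texture coordinates in [0, 1]. v is flipped because
    image rows grow downward while texture v grows upward."""
    uv = np.asarray(projected_xy, dtype=float).copy()
    uv[:, 0] /= img_w            # u: column / width
    uv[:, 1] = 1.0 - uv[:, 1] / img_h  # v: flipped row / height
    return uv
```

Each mesh vertex thus receives a (u, v) pair that tells the texture-mapping call which patch of the photo to paste onto the corresponding triangle of the calibrated model.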
In the present invention, the speech synthesis unit 2 takes English text as input. The text decomposition module 21 decomposes the input text into a timed phoneme sequence; the generated phonemes are used both for speech synthesis and for viseme generation, and the generated viseme sequence completes the synthesis of the 3D face animation. The visual text-to-speech module 22 performs the text-to-speech function and obtains the phonemes corresponding to the text.
Table 1: phoneme-to-viseme mapping table

Viseme   Phoneme set                  Examples
1        (silence, default)           -
2        ay, ah                       bite, but
3        ey, eh, ae                   bait, bet, bat
4        er                           bird
5        ix, iy, ih, ax, axr, y       debit, beet, bit, about, butter, yacht
6        uw, uh, w                    boot, book, way
7        ao, aa, oy, ow               bought, bott, boy, boat
8        aw                           bout
9        g, hh, k, ng                 gay, hay, key, sing
a        r                            ray
b        l, d, n, en, el, t           lay, day, noon, button, bottle, tea
c        s, z                         sea, zone
d        ch, sh, jh, zh               choke, she, joke, azure
e        th, dh                       thin, then
f        f, v                         fin, van
g        m, em, b, p                  mom, bottom, bee, pea
Once the phonemes corresponding to the text have been obtained, a self-defined phoneme-to-viseme table converts each phoneme into the corresponding viseme. As shown in the first branch of Fig. 6, when the user inputs English text, the speech synthesis unit 2 first decomposes it into phonemes and then generates the corresponding voice stream, completing the speech synthesis.
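Table 1 can be transcribed as a lookup dictionary; this sketch is ours (viseme class 1, the silence default, is omitted from the map):

```python
# Phoneme-to-viseme mapping transcribed from Table 1 (viseme classes 2-9, a-g)
_TABLE = {
    "2": "ay ah",            "3": "ey eh ae",   "4": "er",
    "5": "ix iy ih ax axr y", "6": "uw uh w",   "7": "ao aa oy ow",
    "8": "aw",               "9": "g hh k ng",  "a": "r",
    "b": "l d n en el t",    "c": "s z",        "d": "ch sh jh zh",
    "e": "th dh",            "f": "f v",        "g": "m em b p",
}
PHONEME_TO_VISEME = {p: viseme
                     for viseme, phonemes in _TABLE.items()
                     for p in phonemes.split()}
```

Converting a phoneme sequence to visemes is then a direct dictionary lookup, e.g. the phonemes of "tea" (t, iy) map to viseme classes b and 5.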
As shown in Table 1, the present invention adopts 16 clearly distinguishable basic viseme classes, corresponding to the 16 basic visemes shown in Fig. 7. Based on these 16 basic visemes, the phoneme-to-viseme translation table defined herein was refined through careful expert tuning.
Table 2: rules of phoneme-to-viseme mapping
As shown in Table 2, the translation table contains three mapping forms: a single phoneme mapped to a single viseme, multiple phonemes mapped to multiple visemes, and multiple phonemes mapped to a single viseme. As shown in the lower branch of Fig. 6, the system maps the phonemes to their corresponding visemes, obtaining a viseme sequence synchronized with the phonemes and completing the face synthesis.
Since visemes describe articulation, the present invention mainly implements mouth-shape animation. Based on the 16 different visemes, different mouth-shape animation rules are defined herein. Fig. 8 illustrates the mouth-shape animation principle. When the upper lip moves upward (the upward arrow in the figure), a sine curve C1 is used to fit the change of the upper lip: the height of the mouth midpoint is first computed as the height of the sine curve, and the positions of the other upper-lip points are then derived from the sine curve. The lower lip moves in the direction of the downward arrow and follows the same principle. When the mouth closes, curves C1 and C2 converge to the line segment C3, and the height of the sine curve is 0.
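The sine-curve fitting of the upper lip can be sketched as follows: a minimal illustration under the stated rule that the mouth-midpoint height sets the amplitude and the mouth corners stay fixed. The half-period-sine form and the function name are our assumptions.

```python
import math

def lip_offsets(mouth_points_x, x_left, x_right, mid_height):
    """Vertical offsets of upper-lip points fitted with a sine curve:
    zero at both mouth corners, mid_height at the mouth midpoint,
    other points following sin over the mouth span."""
    span = x_right - x_left
    return [mid_height * math.sin(math.pi * (x - x_left) / span)
            for x in mouth_points_x]
```

Sampling the corners and the midpoint of a mouth spanning x = 0..10 with amplitude 2 gives offsets 0, 2, 0, i.e. the curve C1 of Fig. 8 at its open position.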
For a convenient chat experience, the client of the present invention consists of two interfaces: the login interface and the dialog box. Fig. 10 shows the client login interface, which lets the user input a user name and a 2D face photo and interactively mark the ten feature points. After these operations, the client sends the user's information to the server. The user receives the online-user list passed back by the server, selects a user to chat with, and enters the client dialog box shown in Fig. 11. As the figure shows, the dialog box consists of a text input box, a chat message display box, 3D face frames for oneself and the other party, and controls such as a send button, a chat record button and a personalization button. After the user inputs text, the client sends it to the other party's client. At the same time, the speech synthesis unit 2 of this client parses the text into phonemes and then generates the voice stream and the corresponding viseme sequence. One thread controls the voice stream and another thread controls the animation generation; this multi-thread mechanism keeps the phonemes and visemes synchronized, so the animation sequence is output together with the voice, producing instant chat with text, voice and animation synchronized in real time.
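The two-thread voice/animation scheme can be illustrated with Python threads; the barrier start, the logging and the timing are our own simplifications of the synchronization mechanism described above.

```python
import threading
import time

def run_synchronised(visemes, frame_time=0.001, log=None):
    """Sketch of the two-thread scheme: one thread 'plays' the voice stream
    while another consumes the viseme sequence; a barrier makes both start
    together so audio and animation stay aligned."""
    if log is None:
        log = []
    start = threading.Barrier(2)

    def play_audio():
        start.wait()               # wait until both threads are ready
        log.append("audio:start")  # stand-in for streaming the voice

    def play_animation():
        start.wait()
        for v in visemes:          # render one viseme frame per tick
            log.append(f"frame:{v}")
            time.sleep(frame_time)

    threads = [threading.Thread(target=play_audio),
               threading.Thread(target=play_animation)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return log
```

A real client would replace the log entries with audio playback and OpenGL frame rendering, but the start-together/join structure is the same.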
To make chat more interesting and vivid, the present invention also provides a 3D face personalization module 3. As shown in Fig. 11, four personalized accessories are offered for the user to choose from; the four accessory icons at the top right of the interface represent a yellow checked cap, a red checked cap, black glasses and yellow glasses. When the user has not yet put on an accessory, clicking its icon on the client puts the corresponding accessory on the 3D face in both one's own and the other party's client. Clicking the icon of an accessory already worn takes it off the 3D faces in both clients. Fig. 12a shows the front view after putting on a cap, Fig. 12b the 3D face rotated by 30 degrees, Fig. 12c the front view after putting on glasses, and Fig. 12d the 3D face rotated by 30 degrees.
The server interface, shown in Fig. 13, consists of four controls: an online-user list, a user chat text box, an open-server button and a close-server button. The online-user list displays the users currently online, and the chat text box records the chat messages of all users. In the present invention, text communication between users is relayed through the server: text sent from one user to another first goes to the server, which then forwards the information to the intended recipient. The server also stores other user information, such as user names, 2D face photos and facial feature points. Because the server must handle various messages, the client adds a message type tag to the header of every message it sends, indicating whether the message is a connection request to the server, a message to another user, or another type; according to the tag, the server applies the corresponding processing and completes the corresponding operation. Clicking the open-server button makes the server open a port to wait for user connections and start its various services; clicking the close-server button makes the server close the port, disconnect, and exit the server interface.
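The message-header tagging can be sketched as a toy framing scheme; the tag names, the separator and the dispatch logic are hypothetical illustrations, not taken from the patent.

```python
def pack(msg_type, sender, body):
    """Frame a message with a type tag at the head, so the server can
    tell a connection request from a chat message without parsing the body."""
    return f"{msg_type}|{sender}|{body}"

def dispatch(raw):
    """Server-side routing keyed on the header tag."""
    msg_type, sender, body = raw.split("|", 2)
    if msg_type == "CONN":
        return ("register", sender)        # connection request: register the user
    if msg_type == "CHAT":
        return ("relay", sender, body)     # chat text: forward to the recipient
    return ("unknown", sender)
```

Splitting on at most two separators keeps the chat body intact even if it contains the separator character itself.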
The 3D communication system of the present invention is used as follows. The user first opens the client, and a login dialog box is displayed, as shown in Figure 14a. The user enters a user name and a 2D face photo in the login interface, as shown in Figure 14b, and connects to the server end, at which point the client sends the user's information to the server end. The server end receives and stores the user information and returns the online user list, from which the user selects a chat partner, as shown in Figure 14c. The user can then enter the client's dialog box and chat with the online user, as shown in Figures 14d and 14e. As shown in Figure 14d, the user first enters chat text in the input box and clicks the send button; the client sends the chat text to the other party's client and updates the chat record (chat text, sender's user name, sending time) in the chat message display box. At the same time, the client generates the voice and face animation corresponding to the text through the speech synthesis unit 2. When the user receives chat text from the other party, the client updates the other party's chat text in the chat message display box and generates the voice and the other party's face animation through the speech synthesis unit 2. Figure 14d shows the state of the user's own client during the chat, and Figure 14e shows the state of the other party's client. The invention provides 4 personalized accessories, a yellow plaid cap, a red plaid cap, black glasses and yellow glasses, for the user to select; the user may click any one or more accessory icons to dress up the 3D face, which is displayed simultaneously in both clients. To remove an accessory, the user only needs to click the icon of the accessory already worn. Figures 14d and 14e show a chat in which one side wears the yellow plaid cap and black glasses while the other side wears the red plaid cap and yellow glasses.
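The speech-synthesis flow used in the chat step above (input text decomposed into phonemes, each phoneme yielding an audio frame and a synchronized viseme driving the 3D face animation) can be sketched minimally. The phoneme-to-viseme table, the per-phoneme frame duration and all names here are tiny illustrative stand-ins, not the patent's actual data or algorithm.

```python
# Minimal sketch: text -> phonemes -> (voice stream, synchronized viseme track).
PHONEME_TO_VISEME = {"h": "closed", "e": "wide", "l": "tongue_up", "o": "round"}

def text_to_phonemes(text):
    # Stand-in decomposition: one phoneme per letter known to the table.
    return [ch for ch in text.lower() if ch in PHONEME_TO_VISEME]

def synthesize(text, frame_ms=80):
    phonemes = text_to_phonemes(text)
    voice_stream = []   # (start_ms, phoneme) pairs standing in for audio frames
    viseme_track = []   # (start_ms, viseme) pairs driving the face animation
    for i, ph in enumerate(phonemes):
        t = i * frame_ms
        voice_stream.append((t, ph))
        viseme_track.append((t, PHONEME_TO_VISEME[ph]))
    return voice_stream, viseme_track
```

Emitting the two tracks from the same timestamp loop is what keeps the voice stream and the face animation synchronized, as claim 4's viseme sequence requires.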
The protection scope of the present invention is not limited to the above embodiments. Changes and advantages that those skilled in the art can conceive without departing from the spirit and scope of the inventive concept are all included in the present invention, and the protection scope is defined by the appended claims.

Claims (6)

1. A 3D instant messaging system, characterized in that it comprises: a client and a server end;
the client is configured to handle user login and the input and output of information; a face synthesis unit (1) and a speech synthesis unit (2) are disposed in the client; wherein,
the face synthesis unit (1) comprises: a face feature extraction device (11), a model nesting device (12) and a texture mapping device (13); the face feature extraction device (11) is configured to extract face features from a 2D face photo; the model nesting device (12) is configured to calibrate a 3D face mesh model according to the extracted face features, obtaining the texture coordinates of the 3D face mesh model; the texture mapping device (13) maps the 2D face photo back onto the 3D face mesh according to the texture coordinates to form a 3D face;
the speech synthesis unit (2) is configured to generate a voice stream and a 3D face animation from the 3D face and the input text, and to output them to the server end;
the server end is configured to implement the information interaction between the clients.
2. The 3D instant messaging system as claimed in claim 1, characterized in that the face feature extraction device (11) extracts the face features interactively.
3. The 3D instant messaging system as claimed in claim 1, characterized in that the model nesting device (12) comprises: a pose estimation module (121), a global calibration module (122), a local alignment module (123) and a boundary alignment module (124); wherein,
the pose estimation module (121) estimates 3D information from the 2D face photo according to the face features;
the global calibration module (122) projects the 3D face mesh model onto the 2D face photo according to the 3D information;
the local alignment module (123) is configured to match the facial features of the 3D face model to those in the 2D face photo;
the boundary alignment module (124) pulls the boundary of the 3D face model to the boundary of the 2D face photo using a spring model algorithm.
4. The 3D instant messaging system as claimed in claim 1, characterized in that the speech synthesis unit (2) comprises: a text decomposition module (21), a visual text-to-speech module (22) and an animation synthesis module (23); wherein,
the text decomposition module (21) decomposes the input text into phonemes;
the visual text-to-speech module (22) converts the phonemes into a voice stream and a viseme sequence synchronized with the voice stream;
the animation synthesis module (23) generates the 3D face animation according to the viseme sequence and outputs it synchronously with the voice stream.
5. The 3D instant messaging system as claimed in claim 1, characterized in that the client further comprises a 3D face personalization module (3), the 3D face personalization module (3) being configured to decorate the 3D face.
6. A 3D instant messaging method, characterized in that it is applied to the 3D instant messaging system as claimed in claim 1 and comprises the following steps:
Step 1: logging in to the client and inputting the user information and a 2D face photo into the client;
Step 2: extracting, by the face feature extraction device (11), face features from the 2D face photo;
Step 3: projecting, by the model nesting device (12), the 3D face mesh model onto the 2D face photo according to the extracted face features, obtaining the texture coordinates of the 3D face mesh model;
Step 4: mapping, by the texture mapping device (13), the 2D face photo back onto the 3D face mesh according to the texture coordinates to form a 3D face;
Step 5: selecting, in the client, the friend to communicate with and inputting text, the speech synthesis unit (2) generating a voice stream and a 3D face animation from the 3D face and the input text and outputting them to the server end;
Step 6: sending, by the server end, the voice stream and the 3D face animation to the corresponding client according to the selected friend, thereby realizing 3D instant messaging.
CN201510215785.9A 2015-04-29 2015-04-29 3D instant messaging system and messaging method Pending CN104835190A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201510215785.9A CN104835190A (en) 2015-04-29 2015-04-29 3D instant messaging system and messaging method


Publications (1)

Publication Number Publication Date
CN104835190A true CN104835190A (en) 2015-08-12

Family

ID=53813055

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201510215785.9A Pending CN104835190A (en) 2015-04-29 2015-04-29 3D instant messaging system and messaging method

Country Status (1)

Country Link
CN (1) CN104835190A (en)



Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1427626A (en) * 2001-12-20 2003-07-02 松下电器产业株式会社 Virtual television telephone device
CN101120348A (en) * 2005-02-15 2008-02-06 Sk电信有限公司 Method and system for providing news information by using three dimensional character for use in wireless communication network
KR20120137826A (en) * 2011-06-13 2012-12-24 한국과학기술원 Retargeting method for characteristic facial and recording medium for the same
CN102426712A (en) * 2011-11-03 2012-04-25 中国科学院自动化研究所 Three-dimensional head modeling method based on two images
CN102999942A (en) * 2012-12-13 2013-03-27 清华大学 Three-dimensional face reconstruction method
CN103235943A (en) * 2013-05-13 2013-08-07 苏州福丰科技有限公司 Principal component analysis-based (PCA-based) three-dimensional (3D) face recognition system

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
G. C. Feng et al.: "Virtual View Face Image Synthesis Using a 3D Spring-Based Face Model from a Single Image", Proceedings of the Fourth IEEE International Conference on Automatic Face and Gesture Recognition *
Mario Rincon-Nigro et al.: "A Text-Driven Conversational Avatar Interface for Instant Messaging on Mobile Devices", IEEE Transactions on Human-Machine Systems *

Cited By (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106411695A (en) * 2016-08-29 2017-02-15 广州华多网络科技有限公司 User characteristic information area pendant dynamic updating method and device and smart terminal
WO2018049979A1 (en) * 2016-09-14 2018-03-22 厦门幻世网络科技有限公司 Animation synthesis method and device
CN107205083A (en) * 2017-05-11 2017-09-26 腾讯科技(深圳)有限公司 Information displaying method and device
CN108880975A (en) * 2017-05-16 2018-11-23 腾讯科技(深圳)有限公司 Information display method, apparatus and system
CN108880975B (en) * 2017-05-16 2020-11-10 腾讯科技(深圳)有限公司 Information display method, device and system
CN108022172A (en) * 2017-11-30 2018-05-11 广州星天空信息科技有限公司 Virtual social method and system based on threedimensional model
CN111865771B (en) * 2018-08-08 2023-01-20 创新先进技术有限公司 Message sending method and device and electronic equipment
CN109274575A (en) * 2018-08-08 2019-01-25 阿里巴巴集团控股有限公司 Message method and device and electronic equipment
CN111865771A (en) * 2018-08-08 2020-10-30 创新先进技术有限公司 Message sending method and device and electronic equipment
CN111294665A (en) * 2020-02-12 2020-06-16 百度在线网络技术(北京)有限公司 Video generation method and device, electronic equipment and readable storage medium
CN112541957A (en) * 2020-12-09 2021-03-23 北京百度网讯科技有限公司 Animation generation method, animation generation device, electronic equipment and computer readable medium
US11948236B2 (en) 2020-12-09 2024-04-02 Beijing Baidu Netcom Science Technology Co., Ltd. Method and apparatus for generating animation, electronic device, and computer readable medium
CN112541957B (en) * 2020-12-09 2024-05-21 北京百度网讯科技有限公司 Animation generation method, device, electronic equipment and computer readable medium
TWI824883B (en) * 2022-12-14 2023-12-01 輔仁大學學校財團法人輔仁大學 A virtual reality interactive system that uses virtual reality to simulate expressions and emotions for training

Similar Documents

Publication Publication Date Title
CN104835190A (en) 3D instant messaging system and messaging method
US11670033B1 (en) Generating a background that allows a first avatar to take part in an activity with a second avatar
EP3370208B1 (en) Virtual reality-based apparatus and method to generate a three dimensional (3d) human face model using image and depth data
US11736756B2 (en) Producing realistic body movement using body images
JP4449723B2 (en) Image processing apparatus, image processing method, and program
CN110503703A (en) Method and apparatus for generating image
KR101743763B1 (en) Method for providng smart learning education based on sensitivity avatar emoticon, and smart learning education device for the same
CN110286756A (en) Method for processing video frequency, device, system, terminal device and storage medium
CN110163054A (en) A kind of face three-dimensional image generating method and device
US20140085293A1 (en) Method of creating avatar from user submitted image
KR20220005424A (en) Method and apparatus for creating a virtual character, electronic equipment, computer readable storage medium and computer program
KR101743764B1 (en) Method for providing ultra light-weight data animation type based on sensitivity avatar emoticon
US20020194006A1 (en) Text to visual speech system and method incorporating facial emotions
US11798238B2 (en) Blending body mesh into external mesh
KR102491140B1 (en) Method and apparatus for generating virtual avatar
CN108090940A (en) Text based video generates
US20160004905A1 (en) Method and system for facial expression transfer
CN107333086A (en) A kind of method and device that video communication is carried out in virtual scene
CN115049016B (en) Model driving method and device based on emotion recognition
KR102334705B1 (en) Apparatus and method for drawing webtoon
CN107146275B (en) Method and device for setting virtual image
KR20220006022A (en) Slider block processing method and apparatus for virtual characters, electronic equipment, computer readable storage medium and computer program
KR20160010810A (en) Realistic character creation method and creating system capable of providing real voice
EP4152269B1 (en) Method and apparatus of training model, device, and medium
KR102388773B1 (en) Method for three dimensions modeling service and Apparatus therefor

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
EXSB Decision made by sipo to initiate substantive examination
SE01 Entry into force of request for substantive examination
WD01 Invention patent application deemed withdrawn after publication
WD01 Invention patent application deemed withdrawn after publication

Application publication date: 20150812