CN102497513A - Video virtual sign language system for digital television - Google Patents


Info

Publication number
CN102497513A
Authority
CN
China
Prior art keywords
sign language
frame
gesture
expression
video
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN2011103804082A
Other languages
Chinese (zh)
Inventor
曾金龙
林谋广
罗笑南
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Sun Yat Sen University
Original Assignee
Sun Yat Sen University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Sun Yat Sen University filed Critical Sun Yat Sen University
Priority to CN2011103804082A priority Critical patent/CN102497513A/en
Publication of CN102497513A publication Critical patent/CN102497513A/en
Pending legal-status Critical Current

Landscapes

  • Processing Or Creating Images (AREA)

Abstract

The invention discloses a video virtual sign language system for digital television. The system demultiplexes the program source stream and decodes the audio, video, and auxiliary data, the auxiliary data including caption text. The caption text is fed to a virtual sign language generation module, which retrieves the corresponding sign language data from a sign language database for each text entry, renders the graphics to produce sign language frames, and applies appropriate smoothing between different gestures. The sign language frames are then synchronized with the program audio, overlaid, and output. The system saves manpower and material resources and produces standardized signing; at the same time, its content-based smoothing makes gesture transitions natural, and the coordination of facial expression with gesture that it introduces makes the signed expression more accurate and true to life.

Description

A video virtual-human sign language system for digital television
Technical field
The present invention relates to the field of digital television technology, and in particular to a video virtual-human sign language system for digital television.
Background technology
China has nearly 30 million deaf-mute people, and enabling them to watch television is a major undertaking of broad public concern. Yet most television stations currently offer no channel suitable for deaf-mute viewers. A few news programs use manually recorded interpreters to provide sign language, which not only consumes considerable manpower and material resources but also falls short in timeliness, accuracy, and standardization of the signing.
Sign language recognition and sign language synthesis have both advanced considerably in recent years, and preliminary applications have appeared: virtual-human sign language systems are being trialed in many public venues. Vcom3D, for example, has developed software that lets people communicate over the Internet through sign language and facial expression; Europe's ViSiCAST system uses motion capture to convert speech into British Sign Language and is deployed in post offices, web services, and other public settings. Domestically, the Chinese Academy of Sciences developed a "video virtual-human sign language compilation system" in 2009 and applied it in a broadcasting system; Harbin Institute of Technology has proposed its own virtual-human method for broadcasting news in sign language; and Konka has developed a television set with a closely hardware-integrated sign language interpretation function.
At present " video frequency virtual humance sign language compiling system " of the Chinese Academy of Sciences comprises program source input, the department of computer science output module of unifying in the prior art.Wherein computer system is a core, can be industrial computer system, mainly comprises 5 modules: (1) sign language synthesis module, the program text of input is translated into the sign language data; (2) visual human's synthesis module is expressed described sign language data through the visual human; (3) the non-linear editing integrated circuit board of support overplay; Voice duration synchronizing information acquisition module writes down the corresponding initial sum termination time of each text; (5) primary module is responsible for the communication for coordination between above-mentioned each module.Primary module; The text sentence corresponding according to the voice duration information synchronization call of being obtained; Translate into the sign language data by the sign language synthesis module, the visual human who is generated by visual human's synthesis module again expresses, and through the non-linear editing integrated circuit board sign language frame is added in the program image.
The drawback of the prior art is that this system obtains the synchronization information between speech and sign language captions by manual "beat marking": a staff member watches the program and taps a key at the required moments so that the software records the start and end time of each text segment, thereby obtaining the duration of every text sentence in the video. This method is not only labor-intensive but also subjective and inaccurate. Second, the system applies no smoothing to the generated virtual human; it simply performs a one-to-one lookup and rendering according to the mapping between text entries and gesture data. Because different gestures may differ considerably in position and orientation, appropriate smoothing should be applied between them. Finally, the system concentrates on gesture generation and ignores the character's facial expression, which is also a very important element of sign language.
Summary of the invention
To overcome the defects of the prior art, the invention provides a video virtual-human sign language system for digital television. The system saves manpower and material resources and produces standardized signing; it adopts content-based smoothing so that transitions between gestures are natural; and it introduces the coordination of facial expression with gesture, making the signed expression more accurate and true to life.
The video virtual-human sign language system for digital television first demultiplexes the program source stream and decodes the audio, video, and auxiliary data, the auxiliary data including caption text. The caption text is fed to the virtual-human sign language generation module, which retrieves the corresponding sign language data from the sign language library according to each text entry, renders the graphics to produce sign language frames, and applies appropriate smoothing between different gestures. The sign language frames are then synchronized with the program audio, overlaid, and output.
The sign language generation module is the core of the system. It comprises a text parsing module, a gesture generation module, an expression generation module, a gesture-and-expression synthesis module, a frame sequence smoothing and reduction module, and a synchronization module. The text parsing module takes the caption text sequence as input, segments each caption sentence into words, and looks the resulting words up in the gesture library to obtain the corresponding gesture and expression data. Its functions comprise text editing of the input, word segmentation, and conversion from Chinese words to sign language codes. Text editing preprocesses the input Chinese sentence so that it is suitable for the subsequent segmentation; segmentation divides the sentence into words, each punctuation mark becoming a word of its own. The segmentation process first applies maximum matching, then uses the first-pass result to invoke word rules via the ambiguity flag of each dictionary entry, and finally performs ambiguity correction. The basic dictionary contains the Chinese words corresponding to the sign words that the synthesis system can produce; the gesture library contains the hand graphics data of those sign words; and the mapping between facial expression data and sign words is kept in the facial expression library.
The sign language frames are generated by the following concrete steps:
Step 1: the text parsing module obtains the caption text sequence from the caption channel and parses the current caption, directly obtaining the caption's start and end time for synchronization; it generates gesture data and expression data by matching against the gesture library; go to Step 2.
Step 2: render the gesture and expression data with OpenGL to generate the sign language frame sequence; go to Step 3.
Step 3: insert a number of smoothing frames proportional to the gesture difference between frames, i.e. perform smoothing, and at the same time exploit the information redundancy between gestures to reduce the sequence; go to Step 4.
Step 4: synchronize the sign language frames with the program according to the timing information and adjust their frame rate, while also feeding this timing information back to adjust the smoothing and reduction; go to Step 5.
Step 5: output the sign language frame sequence as the input of the video overlay stage; end.
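The five steps above can be sketched as a minimal pipeline. All function names and data shapes below are illustrative assumptions, not the patent's implementation: the real system parses broadcast captions and renders gestures with OpenGL, whereas here each word simply becomes a placeholder frame.

```python
# Illustrative sketch of the five-step frame-generation flow.

def parse_caption(caption):
    # Step 1: obtain the caption's timing and match words against the
    # gesture library (here just whitespace-split as a stand-in).
    return caption["start"], caption["end"], caption["text"].split()

def render_frames(words):
    # Step 2: one placeholder frame per word (the real system draws
    # gesture and expression data with OpenGL).
    return [{"word": w} for w in words]

def smooth(frames):
    # Step 3: insert a transition frame between every pair of gestures.
    if len(frames) < 2:
        return frames
    out = []
    for a, b in zip(frames, frames[1:]):
        out += [a, {"word": a["word"] + "->" + b["word"], "smoothed": True}]
    out.append(frames[-1])
    return out

def synchronize(frames, start, end):
    # Step 4: pick a frame rate that spreads the frames over the
    # caption's time span; in the full system this also feeds back
    # into the smoothing and reduction stage.
    fps = len(frames) / max(end - start, 1e-6)
    return frames, fps

caption = {"text": "hello world", "start": 0.0, "end": 2.0}
start, end, words = parse_caption(caption)
frames, fps = synchronize(smooth(render_frames(words)), start, end)
# Step 5: `frames` is now the input of the video-overlay stage.
```

Note how the caption's start and end time, obtained for free in Step 1, drive the frame rate in Step 4; this is what replaces the manual "beat marking" of the prior art.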
Synchronization of the sign language frames with the program adopts a context-based frame selection strategy: the display interval of a frame is determined by the degree of gesture change. When the change between two frames is large, the interval between them is large; conversely, when the change is small, the interval between the two frames is small. In addition, smoothing is performed across large changes by inserting an appropriate number of smoothing frames so that the motion stays coherent.
The smoothness of the virtual human's gesture motion is handled by inserting frames according to the size of the difference between two poses. The inserted frames can be generated by applying Hermite interpolation to the joint-angle vectors. The number of inserted frames depends on the gap between the two gestures: the larger the gap, the more frames are inserted; conversely, the smaller the gap, the fewer frames are needed.
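A minimal sketch of this smoothing step follows, assuming joint angles in degrees and zero tangents at both end poses; the patent names Hermite interpolation but fixes neither the tangents nor the frame-count rule, so both are assumptions here.

```python
def hermite(p0, p1, m0, m1, t):
    # Cubic Hermite basis applied component-wise to joint-angle vectors.
    h00 = 2 * t**3 - 3 * t**2 + 1
    h10 = t**3 - 2 * t**2 + t
    h01 = -2 * t**3 + 3 * t**2
    h11 = t**3 - t**2
    return [h00 * a + h10 * c + h01 * b + h11 * d
            for a, b, c, d in zip(p0, p1, m0, m1)]

def transition_frames(pose_a, pose_b, degrees_per_frame=10.0):
    # More inserted frames for a larger gap between the two gestures.
    gap = max(abs(a - b) for a, b in zip(pose_a, pose_b))
    n = int(gap / degrees_per_frame)
    zero = [0.0] * len(pose_a)  # zero tangents give ease-in/ease-out
    return [hermite(pose_a, pose_b, zero, zero, (i + 1) / (n + 1))
            for i in range(n)]

mid = transition_frames([0.0, 30.0], [90.0, 30.0])  # 90-degree gap
```

With zero tangents the interpolant eases in and out of each key pose, which is why Hermite interpolation gives visibly smoother transitions than linear blending.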
Generating facial expression involves setting the facial definition parameters (FDP); the Xface tool is used to set the FDP of the three-dimensional face model. Once the influence regions and deformation functions are defined, the MPEG-4 animation driving method computes, for an input FAP parameter stream, the displacement of every vertex of the three-dimensional face model at each moment, and the final rendering produces the facial animation. Expression generation also includes the extraction of facial animation parameters (FAP): to drive the three-dimensional virtual human with natural expressions, the FAP parameters of the basic expressions (joy, sadness, anger, fear, disgust, and surprise) must be obtained; in theory every facial expression can be synthesized from these basic expressions. Through the setting of the facial definition and animation parameters, combined with the sign language data, the expression appropriate to the current gesture is selected, further improving the accuracy of the signing.
Video overlay is implemented with a superposition algorithm based on the RGB values of pixels. The process can be described as follows: scan the main video image and position the pointer at the overlay location; then scan the pixels of the overlaid image one by one, skipping each pixel of the background color (black) and otherwise replacing the pixel at the corresponding preset position in the main video with that pixel value, until the entire image has been scanned. Repeating this process for every image in the video achieves real-time video overlay.
The sign language system is modularized and packaged as middleware for easy porting, so it can run on different system platforms. The rendering capability of the hardware is also taken into account, and the system adjusts accordingly: when hardware performance is low, the number of triangle patches representing the virtual human is reduced appropriately, trading image quality for speed; when the platform hardware allows, the number of triangle patches can be increased to obtain higher image quality.
In addition, while this system uses an MPEG-4-based facial animation method to generate expressions, there are alternatives such as interpolation, parameterization, free-form deformation, muscle models, elastic meshes, and finite element methods.
Similarly, besides overlaying by RGB value, video overlay can also be based on luminance, alpha value, hue, and so on.
From the above technical scheme it can be seen that the present invention has the following beneficial effects:
1) compared with manual recording, the virtual-human sign language system saves manpower and material resources and produces accurate, standardized signing;
2) content-based smoothing makes transitions between gestures natural, and the coordination of facial expression with gesture makes the signing more accurate and true to life;
3) the number of the virtual human's triangle patches is adjusted intelligently according to platform capability, balancing image quality against running efficiency;
4) the modular, middleware-based design makes the whole system easy to port.
Description of drawings
To illustrate the embodiments of the invention or the prior-art technical schemes more clearly, the accompanying drawings needed in their description are briefly introduced below. The drawings described below are obviously only some embodiments of the invention; those of ordinary skill in the art can derive other drawings from them without creative effort.
Fig. 1 is the system diagram of the virtual-human sign language system of the invention;
Fig. 2 is an abstract structural sketch of the hand and arm in the invention;
Fig. 3 is the generation flowchart of the sign language frames in the invention;
Fig. 4 is the mapping diagram from Chinese words to sign language in the invention.
Embodiment
The technical schemes in the embodiments of the invention are described below clearly and completely with reference to the accompanying drawings. The described embodiments are obviously only some, not all, of the embodiments of the invention; all other embodiments obtained by those of ordinary skill in the art without creative effort, based on the embodiments of the invention, fall within the scope of protection of the invention.
The embodiment of the invention provides a video virtual-human sign language system for digital television that saves manpower and material resources and produces accurate, standardized signing; each part is detailed below.
The object of the invention is to remedy the defects of the prior art described above and to provide a more effective virtual-human-based sign language system. The main problems to be solved are: (1) synchronization of the sign language frames with the program; (2) smoothing of gesture motion; (3) rendering of facial expression coordinated with gesture; and (4) system integration and modularization.
The technical scheme adopted by the invention is as follows: first demultiplex the program source stream and decode the audio, video, and auxiliary data, the auxiliary data including caption text; feed the caption text to the virtual-human sign language generation module, which retrieves the corresponding sign language data from the sign language library according to each text entry, renders the graphics to produce sign language frames, and applies appropriate smoothing between different gestures; then synchronize the sign language frames with the program audio, overlay, and output. The overall system diagram is shown in Fig. 1.
The sign language generation module is the core of the system. It comprises a text parsing module, a gesture generation module, an expression generation module, a gesture-and-expression synthesis module, a frame sequence smoothing and reduction module, and a synchronization module. The text parsing module takes the caption text sequence as input, segments each caption sentence into words, and looks the resulting words up in the gesture library to obtain the corresponding gesture and expression data. The invention models the virtual human with the H-Anim (Humanoid Animation) standard: a gesture can be represented as a 56-element vector, the abstract structure of the hand and arm being shown in Fig. 2, and a sign language motion can then be represented as a vector function from time to the set of gestures. A human face can be represented by a three-dimensional mesh model, described mainly by the facial definition parameters (FDP), which capture characteristics such as the face's shape and texture, and by the facial animation parameters (FAP), which capture its motion. Both gesture rendering and expression rendering are based on the OpenGL library, which is convenient to use, algorithmically mature, and highly portable. The frame sequence obtained from rendering is not the final result: because different gestures differ in position and orientation, some considerably, direct output would produce visibly jerky motion and errors of meaning, so inter-frame smoothing must be performed. Moreover, since correlations exist among the 56 components of the gesture vector, the dimensionality can be further reduced adaptively, which decreases the data volume and speeds up rendering. The sign language frame sequence must be overlaid onto the program video frames, so matching their rates, i.e. synchronization, is essential; the start and end time of each caption parsed out by the text parsing module allow the sign language frames to be adjusted and synchronized accordingly. At the same time, the synchronization between the program video sequence and the sign language frames serves as feedback that also influences the smoothing and reduction of the sign language frame sequence.
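The 56-element gesture representation can be illustrated with a toy sketch: a gesture is a joint-angle vector, and a sign-language motion is a function from time to such vectors. The linear key-pose interpolation below is an assumption for illustration only; it is not the patent's rendering scheme.

```python
NUM_JOINTS = 56  # one angle per articulated joint in the H-Anim-style model

def gesture_motion(key_poses, duration):
    # Returns a function time -> 56-d pose vector, interpolating
    # linearly between consecutive key poses.
    def pose_at(t):
        u = min(max(t / duration, 0.0), 1.0) * (len(key_poses) - 1)
        i = min(int(u), len(key_poses) - 2)
        f = u - i
        return [(1 - f) * a + f * b
                for a, b in zip(key_poses[i], key_poses[i + 1])]
    return pose_at

rest = [0.0] * NUM_JOINTS
raised = [45.0] * NUM_JOINTS
motion = gesture_motion([rest, raised], duration=1.0)
```

Representing a motion as a time-to-vector function is what makes the later smoothing, reduction, and frame-rate adjustment stages composable: each stage transforms vectors without caring how they were produced.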
The generation flow of the sign language frames is shown in Fig. 3; the concrete steps are as follows:
Step 1: the text parsing module obtains the caption text sequence from the caption channel and parses the current caption, directly obtaining the caption's start and end time for synchronization; it generates gesture data and expression data by matching against the gesture library; go to Step 2.
Step 2: render the gesture and expression data with OpenGL to generate the sign language frame sequence; go to Step 3.
Step 3: insert a number of smoothing frames proportional to the gesture difference between frames, i.e. perform smoothing, and at the same time exploit the information redundancy between gestures to reduce the sequence; go to Step 4.
Step 4: synchronize the sign language frames with the program according to the timing information and adjust their frame rate, while also feeding this timing information back to adjust the smoothing and reduction; go to Step 5.
Step 5: output the sign language frame sequence as the input of the video overlay stage; end.
The functions of the text parsing module comprise text editing of the input, word segmentation, and conversion from Chinese words to sign language codes. Text editing preprocesses the input Chinese sentence so that it is suitable for the subsequent segmentation. Segmentation divides the sentence into words, each punctuation mark becoming a word of its own; the segmentation process first applies maximum matching, then uses the first-pass result to invoke word rules via the ambiguity flag of each dictionary entry, and finally performs ambiguity correction. The basic dictionary contains the Chinese words corresponding to the sign words that the synthesis system can produce; the gesture library contains the hand graphics data of those sign words; and the mapping between facial expression data and sign words is kept in the facial expression library. In general, the gesture library and the facial expression library are referred to jointly as the gesture library, except where they must be distinguished. The mapping from Chinese words and sign words to gestures and expressions is shown in Fig. 4.
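The first-pass segmentation can be sketched as forward maximum matching; the second-pass ambiguity-flag rules are omitted here, and the tiny dictionary is illustrative only.

```python
# A minimal forward maximum-matching segmenter of the kind the
# first pass uses.

def max_match(sentence, dictionary, max_len=4):
    # Repeatedly take the longest dictionary word that prefixes the
    # remaining text; fall back to a single character.
    words, i = [], 0
    while i < len(sentence):
        for l in range(min(max_len, len(sentence) - i), 0, -1):
            if sentence[i:i + l] in dictionary or l == 1:
                words.append(sentence[i:i + l])
                i += l
                break
    return words

vocab = {"研究", "研究生", "生命", "命"}
segmented = max_match("研究生命", vocab)  # → ['研究生', '命']
```

Greedy forward matching segments 研究生命 as 研究生/命 rather than 研究/生命, which illustrates exactly the kind of ambiguity the rule-based correction pass must repair.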
One problem the invention must solve is the synchronization of the sign language frames with the program. Inserting the start and end time of each caption into the caption sequence is a convenient and feasible method; compared with manual "beat marking" it saves time and manpower and is also more accurate. In fact caption production is already part of many program recording workflows and includes the start and end time of each sequence, so this part of the problem is comparatively easy to solve. The other aspect of synchronization is determined by the nature of sign language itself: sign language is a body language that expresses meaning through movements of the hands and arms and through facial expression, and it is slower than natural speech, with a considerable difference in rate, so mechanically overlaying the sign language frame sequence onto the program video sequence would inevitably make the signing incoherent. A context-based frame selection strategy is therefore used, in which the display interval of a frame is determined by the degree of gesture change: when the change between two frames is large, the interval between them should be large; conversely, when the change is small, the interval between the two frames should be small. In addition, smoothing is performed across large changes by inserting an appropriate number of smoothing frames so that the motion stays coherent.
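The context-based interval rule can be sketched as follows, assuming a simple absolute-difference measure over joint angles; the base interval and gain constants are illustrative, not from the patent.

```python
def frame_intervals(frames, base=0.04, gain=0.002):
    # Display interval (seconds) of each frame grows with the gesture
    # change to the next frame: big changes get more screen time.
    intervals = []
    for a, b in zip(frames, frames[1:]):
        change = sum(abs(x - y) for x, y in zip(a, b))
        intervals.append(base + gain * change)
    return intervals

poses = [[0.0, 0.0], [0.0, 5.0], [40.0, 5.0]]
gaps = frame_intervals(poses)  # a small change, then a large one
```

Stretching the interval where the gesture changes most gives the viewer time to follow large movements, while the smoothing frames inserted elsewhere keep those movements continuous.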
The smoothness of the virtual human's gesture motion directly affects its intelligibility. The particularity of this motion is that the animation sequence is spliced together from elementary animation data, so large differences in gesture occur between adjacent sign words and between the components of a single sign word. Without smoothing, the span between some poses would be too large and the motion too fast to follow. The solution is to insert frames according to the size of the difference between two poses; the inserted frames can be generated by applying Hermite interpolation to the joint-angle vectors. The number of inserted frames depends on the gap between the two gestures: the larger the gap, the more frames are inserted; conversely, the smaller the gap, the fewer frames are needed.
Sign language is a relatively stable symbolic system constituted by gestures assisted by expression and posture, so gestures alone inevitably leave the signing incomplete. The invention therefore generates not only the gesture motion of the sign language but also the facial expression, using an MPEG-4-based facial animation method. MPEG-4 is an object-based multimedia compression standard, and since people occupy a key position in multimedia, MPEG-4 defines an international standard for three-dimensional facial animation. It defines the facial definition parameters (FDP) and the facial animation parameters (FAP): the FDP define characteristics such as the shape and texture of the face, while the FAP describe its motion. The FDP definition requires 84 facial feature points (FP), which describe the position and shape of the major parts of the face, including the eyes, eyebrows, mouth, tongue, and teeth. MPEG-4 also defines 68 FAP, among them two high-level FAP: the viseme FAP and the expression FAP. For the viseme FAP, a set of basic, distinct mouth shapes can be defined in advance, and other mouth shapes can be formed as linear combinations of these basic ones. The expression FAP follows the same principle: a rich variety of expressions can be combined linearly from several basic expressions. Apart from the high-level FAP, each ordinary FAP defines the motion of a particular small region of the face. FAP values are expressed in facial animation parameter units (FAPU); the purpose of using FAPU is that the same FAP parameters, applied to different models, produce the same mouth movements and expressions without distortion.
Generating facial expression involves setting the facial definition parameters (FDP); the invention uses the Xface tool to set the FDP of the three-dimensional face model. Once the influence regions and deformation functions are defined, the MPEG-4 animation driving method computes, for an input FAP parameter stream, the displacement of every vertex of the three-dimensional face model at each moment, and the final rendering produces the facial animation.
Expression generation also includes the extraction of the facial animation parameters (FAP). To drive the three-dimensional virtual human with natural expressions, the FAP parameters of the basic expressions (joy, sadness, anger, fear, disgust, and surprise) must be obtained; in theory every facial expression can be synthesized from these basic expressions.
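Composing an expression as a linear combination of the six basic expressions can be sketched as below. The 3-component vectors stand in for full 68-value MPEG-4 FAP streams, and all numbers are made up for illustration.

```python
# Hypothetical FAP vectors for the six basic expressions.
BASIC_FAP = {
    "joy":      [0.8, 0.2, 0.0],
    "sadness":  [-0.5, 0.0, 0.3],
    "anger":    [0.1, -0.7, 0.2],
    "fear":     [0.0, 0.4, 0.6],
    "disgust":  [-0.2, -0.3, 0.1],
    "surprise": [0.3, 0.9, 0.5],
}

def blend(weights):
    # A compound expression as a weighted sum of basic-expression FAPs.
    n = len(next(iter(BASIC_FAP.values())))
    out = [0.0] * n
    for name, w in weights.items():
        for i, v in enumerate(BASIC_FAP[name]):
            out[i] += w * v
    return out

bittersweet = blend({"joy": 0.5, "sadness": 0.5})
```

Because the same FAP vector drives any FDP-conforming face model, a blended vector produced this way can be applied to different virtual humans and yield the same expression.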
Through the setting of the facial definition and animation parameters, combined with the sign language data, the expression appropriate to the current gesture can be selected, further improving the accuracy of the signing.
In addition, the video overlay stage adopts a superposition algorithm based on the RGB values of pixels. The process can be described as follows: scan the main video image and position the pointer at the overlay location; then scan the pixels of the overlaid image one by one, skipping each pixel of the background color (black serving as the background color) and otherwise replacing the pixel at the corresponding preset position in the main video with that pixel value, until the entire image has been scanned. Repeating this process for every image in the video achieves real-time video overlay.
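The overlay loop in miniature, with frames represented as nested lists of (R, G, B) tuples; this is a sketch of the described chroma-key-style rule, not the patent's implementation.

```python
def overlay(main_frame, sign_frame, x0, y0, bg=(0, 0, 0)):
    # Copy every non-background pixel of the sign-language frame onto
    # the main frame at offset (x0, y0); black pixels are skipped, so
    # the black background acts as a transparent color.
    out = [row[:] for row in main_frame]
    for y, row in enumerate(sign_frame):
        for x, px in enumerate(row):
            if px != bg:
                out[y0 + y][x0 + x] = px
    return out

main = [[(9, 9, 9)] * 4 for _ in range(4)]
sign = [[(0, 0, 0), (255, 0, 0)],
        [(0, 255, 0), (0, 0, 0)]]
mixed = overlay(main, sign, 1, 1)
```

Treating one exact color as transparent keeps the per-pixel test cheap, which is what makes the scheme viable in real time; the luminance, alpha, and hue variants mentioned later change only this test.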
The invention modularizes the sign language system and packages it as middleware for easy porting, so it can run on different system platforms. The rendering capability of the hardware is also taken into account, and the invention adjusts accordingly: when hardware performance is low, the number of triangle patches representing the virtual human is reduced appropriately, trading image quality for speed; when the platform hardware allows, the number of triangle patches can be increased to obtain higher image quality.
In summary, the invention generates a sign language frame sequence from the caption text and overlays it onto the program video sequence. The generation of the frame sequence considers not only the gestures but also the facial expression, making the signing more accurate and expressive; appropriate smoothing is applied within the frame sequence so that frames differing greatly in pose transition seamlessly; at the same time the correlations within the gesture vector are exploited for reduction, and the patch count is adjusted to improve running efficiency; finally, the modular, middleware-based design makes the system easy to port.
The beneficial effects brought by the technical scheme of the present invention are:
1) Compared with manually recorded sign language, the virtual human sign language system saves manpower and material resources and produces accurate, standardized signing;
2) Content-based smoothing makes the motion between gestures natural, and the coordination of facial expression with gesture makes the sign language expression more accurate and realistic;
3) The number of the virtual human's triangle patches is adjusted intelligently according to platform performance, balancing image quality against running efficiency;
4) The modular, middleware-based design facilitates porting of the whole system.
The method adopted by the present invention for generating facial expressions is the facial animation method based on MPEG-4; other methods, such as interpolation, parameterization, free-form deformation, muscle models, elastic meshes and finite elements, each have their own advantages and may also be tried.
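As one illustration of the interpolation family of methods mentioned above, a facial expression can be formed as a weighted blend of basic-expression parameter vectors; the expression names follow the disclosure, but the parameter values used below are invented for the example.

```python
def blend_expressions(basic_faps, weights):
    """Combine basic-expression parameter vectors into one expression.

    basic_faps: dict mapping expression name -> list of parameter values
    weights:    dict mapping expression name -> blend weight
    """
    length = len(next(iter(basic_faps.values())))
    out = [0.0] * length
    for name, w in weights.items():
        for i, v in enumerate(basic_faps[name]):
            out[i] += w * v  # weighted sum per parameter channel
    return out
```

With the six basic expressions (joy, sadness, anger, fear, disgust, surprise) as the basis, intermediate expressions are obtained by varying the weights.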
In addition, there are multiple ways to implement video superposition: besides superposition based on RGB values, superposition based on luminance, alpha value, hue and so on can also be adopted.
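The alpha-value variant mentioned here replaces the hard pixel substitution with a per-pixel weighted blend; the 0-255 alpha convention in this sketch is an assumption of the example, not part of the disclosure.

```python
def alpha_blend(main_px, sign_px, alpha):
    """Blend one overlay pixel into a main-video pixel.

    alpha in [0, 255]: 0 keeps the main video, 255 fully replaces it,
    values in between mix the two proportionally.
    """
    a = alpha / 255.0
    return tuple(round((1 - a) * m + a * s)
                 for m, s in zip(main_px, sign_px))
```

Unlike the black-key method, this allows the virtual human layer to be rendered semi-transparently over the program.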
It should be noted that the information interaction between the above apparatus and each unit in the system, and the manner of its implementation, are based on the same concept as the method embodiments of the present invention; for details, refer to the description in the method embodiments, which is not repeated here.
Those of ordinary skill in the art will appreciate that all or part of the steps in the various methods of the above embodiments can be accomplished by a program instructing the relevant hardware; the program can be stored in a computer-readable storage medium, which may include: read-only memory (ROM), random access memory (RAM), magnetic disk, optical disc, and the like.
The video virtual human sign language system for digital television provided by the embodiments of the present invention has been introduced in detail above. Specific examples have been used herein to set forth the principles and implementations of the present invention; the above description of the embodiments is only intended to help understand the method of the present invention and its core idea. Meanwhile, those of ordinary skill in the art may, according to the idea of the present invention, make changes to the specific implementations and the scope of application. In summary, this description should not be construed as limiting the present invention.

Claims (10)

1. A video virtual human sign language system for digital television, characterized in that the system first demultiplexes the program source code stream and decodes the audio, video and other data information, the other data information including caption text information; the caption text is input to a virtual human sign language generation module, which retrieves the corresponding sign language data from a sign language database according to the text entry, then performs graphic rendering to generate sign language frames, applying appropriate smoothing between different gestures; the sign language frames are then superimposed synchronously with the audio information of the program and output.
2. The system according to claim 1, characterized in that the sign language generation module is the core module of the system; it comprises a text parsing module, a gesture generation module, an expression generation module, a gesture and expression synthesis module, a frame sequence smoothing and simplification module, and a synchronization module. The input of the text parsing module is the text sequence of the captions; text parsing segments the caption sentences into words, and the resulting words are looked up in the sign language library to obtain the corresponding gesture data and expression data. The functions of the text parsing module include text editing of the input, text segmentation, and conversion of Chinese words into sign language codes. Text editing preprocesses the input Chinese sentences so that they conform to the next step of text segmentation; text segmentation divides a sentence into words, with punctuation marks becoming separate words. The segmentation process of the system first applies the maximum matching method, then uses the first-pass segmentation result to invoke word rules by looking up the ambiguity flag of each entry, and performs ambiguity correction. The basic dictionary contains the Chinese words corresponding to the sign words that the synthesis system can synthesize; the gesture library contains the hand graphics data of the sign words that the synthesis system can synthesize, and the mapping between facial expression data and sign words is kept in the facial expression library.
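The maximum-matching cut described in this claim can be sketched as a forward greedy matcher. The dictionary contents and the maximum word length are hypothetical, and the claim's ambiguity-correction pass is omitted for brevity.

```python
def max_match_segment(sentence, dictionary, max_len=4):
    """Greedy forward maximum-matching word segmentation.

    At each position, take the longest dictionary word that matches;
    fall back to a single character when nothing matches.
    """
    words, i = [], 0
    while i < len(sentence):
        for j in range(min(max_len, len(sentence) - i), 0, -1):
            cand = sentence[i:i + j]
            if j == 1 or cand in dictionary:
                words.append(cand)
                i += j
                break
    return words
```

The resulting word list is then used as the retrieval key into the gesture and expression libraries, with a second pass correcting ambiguous cuts as the claim describes.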
3. The system according to claim 1 or 2, characterized in that the specific steps of the sign language frame generation flow are as follows:
Step 1: the text parsing module obtains the caption text sequence from the caption text channel and parses the current caption text, directly obtaining the start time and end time of the caption, which are used for synchronization; gesture data and expression data are generated by matching in the sign language library; go to Step 2;
Step 2: rendering is performed with OpenGL according to the gesture data and expression data to generate the sign language frame sequence; go to Step 3;
Step 3: a corresponding number of smoothing frames is inserted according to the magnitude of the inter-frame gesture difference, i.e. smoothing is performed, while the information redundancy between gestures is exploited for simplification; go to Step 4;
Step 4: the sign language frames are synchronized with the program information according to the timing information and the frame rate of the sign language frames is adjusted; the timing information is also used as feedback to adjust the smoothing and simplification processing;
Step 5: the sign language frame sequence is output as the input of the video superposition; end.
4. The system according to claim 1 or 3, characterized in that a context-based frame selection strategy is adopted during the synchronization of the sign language frames with the program information, in which the time interval between frames is determined by the degree of change of the gesture: when the change between two frames is large, the time interval between them is also large; conversely, if the action change between two frames is small, the time between the two frames is small. In addition, smoothing is performed between frames with large changes by inserting an appropriate number of smoothing frames so that the action is coherent.
5. The system according to claim 4, characterized in that the smoothing of the virtual human's gesture motion is achieved by inserting a number of frames according to the magnitude of the difference between two actions; the inserted frames can be generated by interpolating the joint angle vectors with a Hermite interpolation algorithm; the number of inserted frames depends on the size of the gap between the two gestures: the larger the gap, the more frames are inserted; conversely, the smaller the gap, the fewer frames need to be inserted.
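The cubic Hermite interpolation named in this claim blends two joint-angle values p0, p1 and their tangents m0, m1 using the standard Hermite basis polynomials; the sketch below shows a single scalar channel (a joint-angle vector would apply it per component).

```python
def hermite(p0, p1, m0, m1, t):
    """Cubic Hermite interpolation of a joint angle at parameter t in [0, 1].

    p0, p1 are the endpoint values; m0, m1 the endpoint tangents.
    """
    h00 = 2 * t**3 - 3 * t**2 + 1   # weight of p0
    h10 = t**3 - 2 * t**2 + t       # weight of m0
    h01 = -2 * t**3 + 3 * t**2      # weight of p1
    h11 = t**3 - t**2               # weight of m1
    return h00 * p0 + h10 * m0 + h01 * p1 + h11 * m1
```

Non-zero tangents let the inserted frames ease in and out of each gesture instead of moving at constant speed.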
6. The system according to claim 2, characterized in that the generation of facial expressions involves setting the facial definition parameters (FDP), with the Xface tool used to set the FDP on the three-dimensional face model. After the influence regions and deformation functions have been defined, for a group of input FAP parameter streams the displacement of each vertex on the three-dimensional face model at a given moment is computed according to the MPEG-4 animation driving method, and the final rendering produces the facial animation. The generation of facial expressions also includes the extraction of facial animation parameters (FAP): in order to drive the three-dimensional virtual human to sign with natural expressions, the FAP parameters of the basic expressions, joy, sadness, anger, fear, disgust and surprise, must be obtained; in theory, all facial expressions can be synthesized from these basic expressions. Through the setting of the facial definition parameters and facial animation parameters, an expression suited to the current gesture is selected in combination with the sign language data, further enhancing the accuracy of the expression of meaning.
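The MPEG-4 driving step of this claim, computing vertex displacements from a FAP value, can be sketched as follows. The per-vertex weights (standing in for the deformation function), the FAP unit (FAPU) value and the single-axis displacement model are simplifying assumptions for illustration.

```python
def apply_fap(vertices, weights, fap_value, fapu, axis):
    """Displace mesh vertices in an influence region by a FAP-driven offset.

    Each vertex moves along `axis` by weight * fap_value * fapu, where
    the per-vertex weight plays the role of the deformation function.
    """
    moved = []
    for v, w in zip(vertices, weights):
        v = list(v)
        v[axis] += w * fap_value * fapu
        moved.append(v)
    return moved
```

Applying this per FAP in the input stream, frame by frame, yields the animated face mesh that is then rendered alongside the gestures.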
7. The system according to claim 3, characterized in that the video superposition adopts a superposition algorithm based on the RGB values of pixels; the process of video superposition can be described as follows: scan the main video image and position a pointer at the location where the overlay is required; scan the pixels of the overlay image one by one, skipping a pixel if it is the background color (black), and otherwise replacing the pixel value at the corresponding preset position in the main video with that pixel value, until the entire image has been scanned; repeating the above superposition process for every image in the video achieves real-time video superposition.
8. The system according to claim 1, characterized in that the sign language system is modularized in the form of middleware, making it easy to port and suitable for running on different system platforms; taking the rendering performance of different hardware platforms into account, the system adjusts according to hardware performance: when hardware performance is low, the number of triangle patches representing the virtual human is reduced appropriately, sacrificing image quality for speed; conversely, when the platform hardware allows, the number of triangle patches can be increased to obtain higher image quality.
9. The system according to claim 1 or 6, characterized in that, in addition to the MPEG-4-based facial animation method adopted by the system for generating facial expressions, there are also methods such as interpolation, parameterization, free-form deformation, muscle models, elastic meshes and finite elements.
10. The system according to claim 7, characterized in that among the methods for implementing video superposition, besides superposition based on RGB values, superposition based on luminance, alpha value, hue and so on can also be adopted.
CN2011103804082A 2011-11-25 2011-11-25 Video virtual hand language system facing digital television Pending CN102497513A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN2011103804082A CN102497513A (en) 2011-11-25 2011-11-25 Video virtual hand language system facing digital television

Publications (1)

Publication Number Publication Date
CN102497513A true CN102497513A (en) 2012-06-13

Family

ID=46189297

Family Applications (1)

Application Number Title Priority Date Filing Date
CN2011103804082A Pending CN102497513A (en) 2011-11-25 2011-11-25 Video virtual hand language system facing digital television

Country Status (1)

Country Link
CN (1) CN102497513A (en)

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101005574A (en) * 2006-01-17 2007-07-25 上海中科计算技术研究所 Video frequency virtual humance sign language compiling system
CN101727766A (en) * 2009-12-04 2010-06-09 哈尔滨工业大学深圳研究生院 Sign language news broadcasting method based on visual human

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Yan Jie, Song Yibo, Gao Wen: "An Assisted Teaching System for Deaf-Mute People", Journal of Computer-Aided Design & Computer Graphics *
Wang Zhaoqi, Gao Wen: "A Method of Chinese Sign Language Synthesis Based on Virtual Human Synthesis Technology", Journal of Software *

Cited By (27)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104732590A (en) * 2015-03-09 2015-06-24 北京工业大学 Sign language animation synthesis method
CN104732590B (en) * 2015-03-09 2018-06-22 北京工业大学 A kind of synthetic method of sign language animation
CN105868282A (en) * 2016-03-23 2016-08-17 乐视致新电子科技(天津)有限公司 Method and apparatus used by deaf-mute to perform information communication, and intelligent terminal
CN109982757A (en) * 2016-06-30 2019-07-05 阿巴卡达巴广告和出版有限公司 Digital multi-media platform
CN106056994A (en) * 2016-08-16 2016-10-26 安徽渔之蓝教育软件技术有限公司 Assisted learning system for gesture language vocational education
CN106653051A (en) * 2016-12-09 2017-05-10 天脉聚源(北京)传媒科技有限公司 Video deaf-mute mode expression method and apparatus
CN109166409A (en) * 2018-10-10 2019-01-08 长沙千博信息技术有限公司 A kind of sign language conversion method and device
CN111985268B (en) * 2019-05-21 2024-08-06 北京搜狗科技发展有限公司 Method and device for driving animation by face
CN111985268A (en) * 2019-05-21 2020-11-24 搜狗(杭州)智能科技有限公司 Method and device for driving animation by human face
CN110413841A (en) * 2019-06-13 2019-11-05 深圳追一科技有限公司 Polymorphic exchange method, device, system, electronic equipment and storage medium
CN110570877A (en) * 2019-07-25 2019-12-13 咪咕文化科技有限公司 Sign language video generation method, electronic device and computer readable storage medium
CN110570877B (en) * 2019-07-25 2022-03-22 咪咕文化科技有限公司 Sign language video generation method, electronic device and computer readable storage medium
CN110826441A (en) * 2019-10-25 2020-02-21 深圳追一科技有限公司 Interaction method, interaction device, terminal equipment and storage medium
CN111369652A (en) * 2020-02-28 2020-07-03 长沙千博信息技术有限公司 Method for generating continuous sign language action based on multiple independent sign language actions
CN111369652B (en) * 2020-02-28 2024-04-05 长沙千博信息技术有限公司 Method for generating continuous sign language actions based on multiple independent sign language actions
CN113689530A (en) * 2020-05-18 2021-11-23 北京搜狗科技发展有限公司 Method and device for driving digital person and electronic equipment
WO2021232875A1 (en) * 2020-05-18 2021-11-25 北京搜狗科技发展有限公司 Method and apparatus for driving digital person, and electronic device
CN113689879A (en) * 2020-05-18 2021-11-23 北京搜狗科技发展有限公司 Method, device, electronic equipment and medium for driving virtual human in real time
CN113689530B (en) * 2020-05-18 2023-10-20 北京搜狗科技发展有限公司 Method and device for driving digital person and electronic equipment
CN113689879B (en) * 2020-05-18 2024-05-14 北京搜狗科技发展有限公司 Method, device, electronic equipment and medium for driving virtual person in real time
CN115223428A (en) * 2021-04-20 2022-10-21 美光科技公司 Converting sign language
CN113326746A (en) * 2021-05-13 2021-08-31 中国工商银行股份有限公司 Sign language broadcasting method and device for human body model
CN115239855A (en) * 2022-06-23 2022-10-25 安徽福斯特信息技术有限公司 Virtual sign language anchor generation method, device and system based on mobile terminal
CN115497499A (en) * 2022-08-30 2022-12-20 阿里巴巴(中国)有限公司 Method for synchronizing voice and action time
CN115497499B (en) * 2022-08-30 2024-09-17 阿里巴巴(中国)有限公司 Method for synchronizing voice and action time
CN115484493A (en) * 2022-09-09 2022-12-16 深圳市小溪流科技有限公司 Real-time intelligent streaming media system for converting IPTV audio and video into virtual sign language video in real time
CN116959119A (en) * 2023-09-12 2023-10-27 北京智谱华章科技有限公司 Sign language digital person driving method and system based on large language model

Similar Documents

Publication Publication Date Title
CN102497513A (en) Video virtual hand language system facing digital television
McDonald et al. An automated technique for real-time production of lifelike animations of American Sign Language
CN108447474B (en) Modeling and control method for synchronizing virtual character voice and mouth shape
US12017145B2 (en) Method and system of automatic animation generation
CN110751708B (en) Method and system for driving face animation in real time through voice
KR102035596B1 (en) System and method for automatically generating virtual character's facial animation based on artificial intelligence
CN103561277B (en) Transmission method and system for network teaching
CN115209180B (en) Video generation method and device
KR20230098068A (en) Moving picture processing method, apparatus, electronic device and computer storage medium
CN118138855A (en) Method and system for generating realistic video based on multiple artificial intelligence technologies
CN113221840B (en) Portrait video processing method
CN117557695A (en) Method and device for generating video by driving single photo through audio
Wei et al. A practical model for live speech-driven lip-sync
CN117315102A (en) Virtual anchor processing method, device, computing equipment and storage medium
CN112002005A (en) Cloud-based remote virtual collaborative host method
CN102855652B (en) Method for redirecting and cartooning face expression on basis of radial basis function for geodesic distance
Papadogiorgaki et al. Synthesis of virtual reality animations from SWML using MPEG-4 body animation parameters
Papadogiorgaki et al. Text-to-sign language synthesis tool
Lee Transforming Text into Video: A Proposed Methodology for Video Production Using the VQGAN-CLIP Image Generative AI Model
Papadogiorgaki et al. VSigns–a virtual sign synthesis web tool
ten Hagen et al. CharToon: a system to animate 2D cartoons faces.
You RETRACTED: Design of Double-effect Propulsion System for News Broadcast Based on Artificial Intelligence and Virtual Host Technology
CN114374867B (en) Method, device and medium for processing multimedia data
CN206574139U (en) A kind of three-dimensional animation automatically generating device of data-driven
CN207051979U (en) A kind of audio-visual converting system of word

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C02 Deemed withdrawal of patent application after publication (patent law 2001)
WD01 Invention patent application deemed withdrawn after publication

Application publication date: 20120613