CN102497513A - Video virtual-human sign language system for digital television - Google Patents
Video virtual-human sign language system for digital television
- Publication number
- CN102497513A CN102497513A CN2011103804082A CN201110380408A CN102497513A CN 102497513 A CN102497513 A CN 102497513A CN 2011103804082 A CN2011103804082 A CN 2011103804082A CN 201110380408 A CN201110380408 A CN 201110380408A CN 102497513 A CN102497513 A CN 102497513A
- Authority
- CN
- China
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Landscapes
- Processing Or Creating Images (AREA)
Abstract
The invention discloses a video virtual-human sign language system for digital television. The system demultiplexes the program source stream and decodes the audio, video and other data, the latter including caption text. The caption text is fed to a virtual-human sign language generation module, which retrieves the corresponding sign language data from a sign language database according to each text entry, renders graphics to generate sign language frames, and applies appropriate smoothing between different gestures. The sign language frames are then synchronously overlaid on the program together with its audio and output. Compared with manual recording, the system saves manpower and material resources and produces standardized signing; content-based smoothing makes gesture motion natural, and the coordination of facial expression with gesture makes the signed expression more accurate and lifelike.
Description
Technical field
The present invention relates to the field of digital television technology, and in particular to a video virtual-human sign language system for digital television.
Background art
There are nearly 30 million deaf people in China, and giving deaf viewers access to television is a major public-welfare undertaking. At present most television stations do not provide channels suitable for deaf viewers; a few news programs use manually recorded sign language interpretation, which not only consumes manpower and material resources but also falls short in timeliness, accuracy and standardization of the signing.
In recent years both sign language recognition and sign language synthesis have made great progress and seen preliminary application, and virtual-human sign language systems have begun trial use in many public settings. For example, Vcom3D has developed software that lets people communicate over the Internet through sign language and facial expression; the European ViSiCAST system used motion capture to realize conversion from speech to British Sign Language and has been deployed in post offices, on the Internet and in other public settings. Domestically, in 2009 the Chinese Academy of Sciences developed a "video virtual-human sign language broadcasting system" and applied it in radio and television systems; Harbin Institute of Technology proposed its own virtual-human news broadcasting method for sign language; and Konka has developed a television set with a built-in, hardware-coupled sign language interpretation function.
The existing "video virtual-human sign language broadcasting system" of the Chinese Academy of Sciences comprises a program source input, a computer system and an output module. The computer system is the core — it may be an industrial computer — and mainly comprises five modules: (1) a sign language synthesis module, which translates the input program text into sign language data; (2) a virtual-human synthesis module, which expresses the sign language data through a virtual human; (3) a nonlinear editing board supporting overlay; (4) a speech-duration synchronization acquisition module, which records the start and end times corresponding to each piece of text; and (5) a main module responsible for coordinating communication among the above modules. According to the acquired speech-duration information, the main module synchronously calls the corresponding text sentence, has the sign language synthesis module translate it into sign language data, has the virtual-human synthesis module express it, and adds the sign language frames into the program image through the nonlinear editing board.
The shortcoming of this prior art is that the synchronization information between speech and sign language captions is obtained by manual "beat tapping": a staff member watches the program and taps a key at the required moments so that the software records the start and end time of each piece of text, yielding the duration of every sentence in the video. This method is labor-intensive, highly subjective and inaccurate. Second, the system performs no smoothing on the generated virtual human; it simply performs a one-to-one lookup and drawing according to the mapping between text entries and gesture data. Because different gestures may differ considerably in position and direction, appropriate smoothing should be applied between them. In addition, the system concentrates on gesture generation and ignores the character's facial expression, which is also a very important factor in sign language.
Summary of the invention
To overcome the defects of the prior art, the invention provides a video virtual-human sign language system for digital television. The system saves manpower and material resources and produces standardized signing; it adopts content-based smoothing so that motion between gestures is natural, and introduces the coordination of facial expression with gesture so that the signed expression is more accurate and lifelike.
The video virtual-human sign language system for digital television first demultiplexes the program source stream and decodes the audio, video and other data, the latter including caption text. The caption text is fed to a virtual-human sign language generation module, which retrieves the corresponding sign language data from a sign language library according to each text entry, renders graphics to generate sign language frames, and applies appropriate smoothing between different gestures. The sign language frames are then synchronously overlaid on the program together with its audio and output.
The sign language generation module is the core of the system. It comprises a text parsing module, a gesture generation module, an expression generation module, a gesture-and-expression synthesis module, a frame-sequence smoothing and simplification module, and a synchronization module. The input of the text parsing module is the caption text sequence; it segments each caption sentence into words, and the resulting words are looked up in the gesture library to obtain the corresponding gesture data and expression data. The functions of the text parsing module include text editing of the input, word segmentation, and conversion from Chinese words to sign language codes. Text editing preprocesses the input Chinese sentence so that it is suitable for the next segmentation step. Segmentation divides a sentence into words, with each punctuation mark treated as a separate word. The segmentation process first applies maximum-matching cutting, then uses the first-pass result to invoke word rules via the ambiguity flags of dictionary entries, and finally performs ambiguity correction. The basic dictionary contains the Chinese words corresponding to the sign words the synthesis system can synthesize; the gesture library contains the hand graphics data of those sign words, and the mapping between facial expression data and sign words is kept in a facial expression library.
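The maximum-matching cutting step can be sketched as follows. This is a minimal greedy forward maximum-matching segmenter under assumed parameters (the dictionary entries and maximum word length are illustrative, not from the patent), without the ambiguity-correction pass:

```python
def max_match_segment(sentence, dictionary, max_len=4):
    """Greedy forward maximum-matching word segmentation.

    At each position, try the longest dictionary word first; fall back
    to a single character when nothing longer matches.
    """
    words = []
    i = 0
    while i < len(sentence):
        for j in range(min(max_len, len(sentence) - i), 0, -1):
            candidate = sentence[i:i + j]
            if j == 1 or candidate in dictionary:
                words.append(candidate)
                i += j
                break
    return words

# Toy dictionary of sign words (hypothetical entries).
lexicon = {"手语", "电视", "数字电视"}
print(max_match_segment("数字电视手语", lexicon))  # ['数字电视', '手语']
```

The longest match wins: "数字电视" is chosen over the shorter dictionary word "电视", which is exactly the ambiguity the subsequent rule-based correction pass is meant to double-check.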
The generation flow of the sign language frames comprises the following concrete steps:
Step 1: The text parsing module obtains the caption text sequence from the caption channel and parses the current caption, from which the start and end times of the caption, used for synchronization, are obtained directly; gesture data and expression data are generated by matching against the gesture library; go to Step 2.
Step 2: Draw with OpenGL according to the gesture data and expression data to generate the sign language frame sequence; go to Step 3.
Step 3: Insert a number of smoothing frames according to the magnitude of the inter-frame gesture difference, i.e. perform smoothing, and at the same time exploit the information redundancy between gestures to simplify the sequence; go to Step 4.
Step 4: Synchronize the sign language frames with the program by the timing information and adjust the frame rate of the sign language frames; the timing information is also fed back to adjust the smoothing and simplification.
Step 5: Output the sign language frame sequence as the input of the video overlay; end.
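The five steps above can be sketched as a simple pipeline. The stage functions here are placeholders (assumptions — the patent does not define an API), so the sketch only shows the order of data flow:

```python
def generate_sign_frames(caption_text, parse, render, smooth, sync):
    """Hypothetical orchestration of Steps 1-5: parse the caption,
    render frames, smooth/simplify, synchronize, and output."""
    (start, end), gestures, expressions = parse(caption_text)  # Step 1
    frames = render(gestures, expressions)                     # Step 2
    frames = smooth(frames)                                    # Step 3
    frames = sync(frames, start, end)                          # Step 4
    return frames                                              # Step 5: overlay input

# Toy stages: one "frame" per word, no real drawing or timing.
parse = lambda text: ((0.0, 2.0), text.split(), ["neutral"] * len(text.split()))
render = lambda gestures, expressions: [f"frame:{w}" for w in gestures]
smooth = lambda frames: frames        # no-op smoothing
sync = lambda frames, start, end: frames  # no-op synchronization

print(generate_sign_frames("hello world", parse, render, smooth, sync))
# ['frame:hello', 'frame:world']
```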
During synchronization of the sign language frames with the program, a context-based frame selection strategy is adopted: the time interval between frames is determined by the degree of gesture change. When the change between two frames is large, the interval between them is large; conversely, when the change is small, the interval is small. In addition, smoothing is performed between frames with large changes by inserting an appropriate number of smoothing frames so that the motion is coherent.
The smoothness of the virtual human's gesture motion is handled by inserting frames according to the size of the difference between two actions. The inserted frames can be generated by Hermite interpolation of the joint-angle vectors. The number of inserted frames depends on the gap between the two gestures: the larger the gap, the more frames are inserted; conversely, the smaller the gap, the fewer frames are needed.
Facial expression generation involves setting the facial definition parameters (FDP); the Xface tool is used to set the FDP of the three-dimensional face model. Once the influence regions and warping functions are defined, for a given input FAP parameter stream the displacement of each vertex of the three-dimensional face model at a given moment is computed according to the MPEG-4 animation driving method, and the facial animation is finally rendered. Expression generation also includes extraction of the facial animation parameters (FAP): to drive natural expressions of the three-dimensional virtual human, the FAP parameters of the basic expressions — joy, sadness, anger, fear, disgust and surprise — must be obtained; in theory all facial expressions can be synthesized from these basic ones. Through the setting of the facial definition and animation parameters, combined with the sign language data, the expression fitting the current gesture is selected, further enhancing the accuracy of the signing.
The video overlay adopts an algorithm based on the RGB values of pixels. The process can be described as follows: scan the main video image and position the pointer at the overlay location; scan the pixels of the overlay image one by one — a pixel with the background color (black) is skipped, otherwise it replaces the pixel at the corresponding preset position in the main video — until the entire image has been scanned. Repeating this process for every image in the video realizes real-time video overlay.
The sign language system is modularized and packaged as middleware for easy porting, so that it can run on different system platforms. Considering the rendering performance of different hardware platforms, the system adjusts according to hardware capability: when performance is low, the number of triangle patches representing the virtual human is reduced appropriately, trading image quality for speed; when the platform hardware allows, the number of triangle patches can be increased to obtain higher image quality.
In addition, besides the MPEG-4-based facial animation method adopted by this system for expression generation, there are also methods such as interpolation, parameterization, free-form deformation, muscle models, elastic meshes and finite elements.
For video overlay, besides RGB-value keying, overlay based on luminance, alpha value, hue and so on can also be used.
From the above technical scheme it can be seen that the invention has the following beneficial effects:
1) Compared with manual recording, the virtual-human sign language system saves manpower and material resources and is accurate and standardized;
2) Content-based smoothing makes motion between gestures natural, and the coordination of facial expression with gesture makes the signing more accurate and lifelike;
3) The number of the virtual human's triangle patches is adjusted intelligently according to platform capability, balancing image quality against running efficiency;
4) Modular design and middleware packaging make the whole system easy to port.
Description of drawings
To explain the embodiments of the invention or the technical schemes of the prior art more clearly, the drawings needed in their description are briefly introduced below. Obviously, the drawings described below show only some embodiments of the invention; those of ordinary skill in the art can obtain other drawings from them without creative effort.
Fig. 1 is the system diagram of the virtual-human sign language system of the invention;
Fig. 2 is an abstract structural sketch of the hand and arm in the invention;
Fig. 3 is the generation flowchart of the sign language frames in the invention;
Fig. 4 is the mapping diagram from Chinese words to sign language in the invention.
Embodiment
The technical scheme in the embodiments of the invention is described clearly and completely below with reference to the accompanying drawings. Obviously, the described embodiments are only some, not all, of the embodiments of the invention. All other embodiments obtained by those of ordinary skill in the art from these embodiments without creative effort fall within the scope of protection of the invention.
The embodiment of the invention provides a video virtual-human sign language system for digital television, which saves manpower and material resources and is accurate and standardized; it is described in detail below.
The object of the invention is to overcome the defects of the prior art described above and provide a more effective virtual-human-based sign language system. The main problems solved are: (1) synchronization of sign language frames with the program; (2) smoothing of gesture motion; (3) rendering of facial expression coordinated with gesture; (4) system integration and modularization.
The technical scheme adopted is: first demultiplex the program source stream and decode the audio, video and other data, the latter including caption text; feed the caption text to the virtual-human sign language generation module, which retrieves the corresponding sign language data from the sign language library according to each text entry, renders graphics to generate sign language frames, and applies appropriate smoothing between different gestures; then synchronously overlay the sign language frames with the program's audio and output. The system diagram is shown in Fig. 1.
The sign language generation module is the core of the system. It comprises a text parsing module, a gesture generation module, an expression generation module, a gesture-and-expression synthesis module, a frame-sequence smoothing and simplification module, and a synchronization module. The input of the text parsing module is the caption text sequence; it segments each caption sentence, and the resulting words are looked up in the gesture library to obtain the corresponding gesture and expression data. The invention models the virtual human according to the H-Anim (Humanoid Animation) standard: a gesture can be represented by a 56-element vector, with the abstract structure of the hand and arm shown in Fig. 2, and a sign language motion can then be represented by a vector function from the time set to the set of gestures. A face object can be represented by a three-dimensional mesh model, with the facial definition parameters (FDP) describing characteristics such as the shape and texture of the face and the facial animation parameters (FAP) describing its motion. Both gesture drawing and expression drawing are based on the OpenGL library, which is convenient to implement, algorithmically mature and portable. The frame sequence produced by drawing is not the final result: different gestures differ in position and direction — some very greatly — so direct output would yield jerky motion and errors of meaning, and inter-frame smoothing must be performed. Moreover, since correlations exist among the 56 components of the gesture vector, the dimensionality can be further reduced adaptively, which decreases the data volume and speeds up drawing. The sign language frame sequence must be overlaid and fused with the program video frames, so rate matching and synchronization between them are necessary; the start and end times of each caption parsed by the text parsing module allow the sign language frames to be adjusted and synchronized accordingly. At the same time, the synchronization between the program video and the sign language frames also acts as feedback that influences the smoothing and simplification of the sign language frame sequence.
The generation flow of the sign language frames is shown in Fig. 3; the concrete steps are as follows:
Step 1: The text parsing module obtains the caption text sequence from the caption channel and parses the current caption, from which the start and end times of the caption, used for synchronization, are obtained directly; gesture data and expression data are generated by matching against the gesture library; go to Step 2.
Step 2: Draw with OpenGL according to the gesture data and expression data to generate the sign language frame sequence; go to Step 3.
Step 3: Insert a number of smoothing frames according to the magnitude of the inter-frame gesture difference, i.e. perform smoothing, and at the same time exploit the information redundancy between gestures to simplify the sequence; go to Step 4.
Step 4: Synchronize the sign language frames with the program by the timing information and adjust the frame rate of the sign language frames; the timing information is also fed back to adjust the smoothing and simplification.
Step 5: Output the sign language frame sequence as the input of the video overlay; end.
The functions of the text parsing module include text editing of the input, word segmentation, and conversion from Chinese words to sign language codes. Text editing preprocesses the input Chinese sentence so that it is suitable for the next segmentation step. Segmentation divides a sentence into words, with each punctuation mark treated as a separate word; the segmentation process first applies maximum-matching cutting, then uses the first-pass result to invoke word rules via the ambiguity flags of dictionary entries, and finally performs ambiguity correction. The basic dictionary contains the Chinese words corresponding to the sign words the synthesis system can synthesize. The gesture library contains the hand graphics data of those sign words, and the mapping between facial expression data and sign words is kept in a facial expression library; unless they need to be distinguished, the gesture library and the facial expression library are referred to collectively as the gesture library. The mapping from Chinese words and sign words to gestures and expressions is shown in Fig. 4.
A problem the invention must solve is the synchronization of sign language frames with the program. Inserting the start and end times of each caption into the caption sequence is a convenient and feasible method; compared with "beat tapping", it saves time and manpower and is more accurate. In fact, captions are already produced during the recording of many programs, together with the start and end time of each sequence, so this part of the problem is relatively easy to solve. The other synchronization issue stems from the nature of sign language itself: sign language expresses meaning through the motion of the hands and arms together with changes of expression, and its rate of expression is slower than that of natural language, with considerable variation in speed; mechanically overlaying the sign language frame sequence on the program video sequence would therefore produce incoherent signing. A context-based frame selection strategy is adopted, in which the time interval between frames is determined by the degree of gesture change. When the change between two frames is large, the interval between them is large; conversely, when the change between two frames is small, the interval should be small. In addition, smoothing is performed between frames with large changes by inserting an appropriate number of smoothing frames so that the motion is coherent.
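The context-based interval rule can be sketched as follows. The base interval and scale factor are assumptions for illustration (the patent gives no concrete values), and the gesture change is measured as the L2 distance between consecutive joint-angle vectors:

```python
import numpy as np

def frame_intervals(frames, base_dt=0.04, scale=0.02):
    """Display interval per transition, growing with the inter-frame
    gesture change (L2 distance between joint-angle vectors)."""
    return [base_dt + scale * float(np.linalg.norm(b - a))
            for a, b in zip(frames, frames[1:])]

small_change = [np.zeros(56), np.full(56, 0.01)]
large_change = [np.zeros(56), np.full(56, 1.0)]
print(frame_intervals(small_change)[0] < frame_intervals(large_change)[0])  # True
```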
The smoothness of the virtual human's gesture motion directly affects its intelligibility. The particularity of this motion is that the animation sequence is spliced from elementary animation data, so large motion differences can exist between two adjacent sign words and between different root gestures of the same sign word. Without smoothing, the span between some actions would be too large and the motion too fast to follow. The solution is to insert frames according to the size of the difference between two actions; the inserted frames can be generated by Hermite interpolation of the joint-angle vectors. The number of inserted frames depends on the gap between the two gestures: the larger the gap, the more frames are inserted; conversely, the smaller the gap, the fewer frames are needed.
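A minimal sketch of this interpolation step, assuming zero tangents at the keyframes (the patent does not specify how tangents are chosen) and a frame count proportional to the gesture gap:

```python
import math
import numpy as np

def hermite_smooth(p0, p1, m0, m1, frames_per_unit=2.0):
    """Insert cubic-Hermite interpolated joint-angle vectors between
    gestures p0 and p1; larger gaps get more inserted frames."""
    n = math.ceil(frames_per_unit * float(np.linalg.norm(p1 - p0)))
    out = []
    for k in range(1, n + 1):
        t = k / (n + 1)
        h00 = 2*t**3 - 3*t**2 + 1   # cubic Hermite basis functions
        h10 = t**3 - 2*t**2 + t
        h01 = -2*t**3 + 3*t**2
        h11 = t**3 - t**2
        out.append(h00*p0 + h10*m0 + h01*p1 + h11*m1)
    return out

p0, p1 = np.zeros(3), np.ones(3)
inserted = hermite_smooth(p0, p1, np.zeros(3), np.zeros(3))
print(len(inserted))  # 4
```

With zero tangents the inserted vectors rise monotonically from p0 toward p1, giving the gradual transition the smoothing step requires.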
Sign language is a relatively stable symbol system constituted by gestures supplemented by expression and posture, so gesture alone inevitably leaves the signing incomplete. The invention therefore generates not only the gesture motion of sign language but also the facial expression, using the MPEG-4-based facial animation method. MPEG-4 is an object-based multimedia compression standard, and since people occupy a key position in multimedia, MPEG-4 defines an international standard form for three-dimensional facial animation. MPEG-4 defines the facial definition parameters (FDP) and the facial animation parameters (FAP): the FDP define characteristics such as the shape and texture of the face, while the FAP describe its motion. The FDP definition requires 84 facial feature points (FP), which describe the position and shape of the main parts of the face, including the eyes, eyebrows, mouth, tongue and teeth. MPEG-4 also defines 68 FAPs, of which two are high-level: the viseme FAP and the expression FAP. For the viseme FAP, several basic, distinct visemes can be defined in advance, and other visemes can be formed as linear combinations of them; the expression FAP works on the same principle, with rich expressions combined linearly from several basic expressions. Apart from the high-level FAPs, the ordinary FAPs each define the motion of a small region of the face. FAP values are expressed in facial animation parameter units (FAPU); the purpose of using FAPU is that applying the same FAP parameters to different models produces the same lip motion and expression without distortion.
Expression generation involves setting the facial definition parameters (FDP); the invention uses the Xface tool to set the FDP of the three-dimensional face model. Once the influence regions and warping functions are defined, for a given input FAP parameter stream the displacement of each vertex of the three-dimensional face model at a given moment can be computed according to the MPEG-4 animation driving method, and the facial animation finally rendered.
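A sketch of driving one FAP under stated assumptions: vertices inside the influence region of a feature point move along the FAP's direction, attenuated by a warping function — a cosine falloff here, chosen for illustration since the patent does not name one:

```python
import numpy as np

def displace_vertices(vertices, feature_point, fap_value, fapu, direction, radius):
    """Displace mesh vertices for one FAP: amplitude fap_value * fapu
    along `direction`, weighted by a cosine falloff inside `radius`."""
    d = np.linalg.norm(vertices - feature_point, axis=1)
    weight = np.where(d < radius, 0.5 * (1.0 + np.cos(np.pi * d / radius)), 0.0)
    return vertices + (fap_value * fapu) * weight[:, None] * direction

verts = np.array([[0.0, 0.0, 0.0],   # at the feature point: full motion
                  [5.0, 0.0, 0.0]])  # outside the region: unchanged
moved = displace_vertices(verts, np.zeros(3), 2.0, 0.1,
                          np.array([0.0, 1.0, 0.0]), 1.0)
print(moved[0])  # y displaced by 2.0 * 0.1 = 0.2
```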
The generation of facial expressions also includes the extraction of the facial animation parameters (FAP). To drive the three-dimensional virtual human with natural expressions, the FAP parameters of the six basic facial expressions must be obtained: joy, sadness, anger, fear, disgust, and surprise. In theory, all facial expressions can be synthesized from these basic expressions.
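The FAP-driven vertex displacement mentioned above can be sketched as follows. This is a minimal sketch under stated assumptions: the influence-region representation, the FAPU value, and the parameter names are illustrative, not taken from Xface or the patent.

```python
import numpy as np

# Minimal sketch of MPEG-4-style animation driving: each FAP moves the
# vertices in its influence region along a direction, scaled by a per-vertex
# deformation weight and the FAPU.

def apply_faps(vertices, influences, fap_values, fapu=0.01):
    """Return displaced vertex positions for one animation frame.

    vertices:   (N, 3) array of rest positions
    influences: list of (fap_id, vertex_index, weight, direction) tuples,
                i.e. the precomputed influence regions / deformation weights
    fap_values: dict fap_id -> value for this frame
    """
    out = vertices.copy()
    for fap_id, vi, weight, direction in influences:
        value = fap_values.get(fap_id, 0.0)   # absent FAPs cause no motion
        out[vi] += weight * value * fapu * np.asarray(direction)
    return out
```

Rendering then draws the displaced mesh for each frame of the FAP stream.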
Through the setting of the facial definition parameters and facial animation parameters, combined with the sign language data, an expression suited to the current gesture can be selected, further enhancing the accuracy of expression.
In addition, the video overlay part adopts an overlay algorithm based on the RGB values of pixels. The video overlay process can be described as follows: scan the main video image and move a pointer to the position where the overlay is required; scan the pixel values of the overlay image one by one, skipping any background-color pixel (black is used as the background color) and otherwise replacing the pixel value at the corresponding preset position in the main video with this pixel value; continue until the entire image has been scanned. Repeating the above overlay process for each image in the video achieves real-time video overlay.
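The overlay step above can be sketched with NumPy. This is an illustrative sketch of the described algorithm, assuming 8-bit RGB frames; the function name and offsets are made up.

```python
import numpy as np

# Sketch of the RGB-keyed overlay described above: pixels of the sign
# language frame that are pure black (the background color) are skipped;
# all others replace the main video pixels at the preset position.

def overlay_frame(main, sign, x, y):
    """Overlay `sign` (H, W, 3) onto `main` at top-left offset (x, y)."""
    h, w = sign.shape[:2]
    region = main[y:y + h, x:x + w]
    mask = np.any(sign != 0, axis=2)   # True where the pixel is not black
    region[mask] = sign[mask]          # replace non-background pixels only
    return main
```

Calling this per frame of the program video realizes the real-time overlay; the vectorized mask replaces the pixel-by-pixel scan without changing the result.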
The present invention modularizes the sign language system in the form of middleware, which makes it easy to port and suitable for running on different system platforms. Taking into account the rendering performance of different hardware platforms, the present invention adjusts accordingly: when hardware performance is low, the number of triangle patches representing the virtual human is reduced appropriately, sacrificing image quality for speed; conversely, when the platform hardware allows, the number of triangle patches can be increased to obtain higher image quality.
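One simple way to realize this adjustment is to scale the triangle-patch budget by the measured frame rate. This is a hypothetical sketch, not the patent's policy; the linear-cost assumption, target frame rate, and clamp bounds are all illustrative.

```python
def choose_patch_count(measured_fps, current_patches, target_fps=25,
                       min_patches=2000, max_patches=20000):
    """Scale the patch budget assuming render time grows roughly linearly
    with the number of patches: budget = current * measured/target,
    clamped to the platform's minimum and maximum detail levels."""
    budget = int(current_patches * measured_fps / target_fps)
    return max(min_patches, min(max_patches, budget))
```

A renderer would call this periodically, trading image quality for speed when the frame rate drops and restoring detail when headroom returns.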
In summary, the present invention generates a sign language frame sequence from caption text and overlays it onto the program video sequence. The generation of the sign language frame sequence considers not only gestures but also facial expressions, making the sign language expression more accurate and richer. Appropriate smoothing is applied to the sign language frame sequence so that frames whose actions differ greatly can transition seamlessly. At the same time, the correlation within the gesture vectors is exploited for simplification, and the number of patches is kept adjustable, to improve running efficiency. Finally, the modular design and middleware form of the system facilitate porting.
The beneficial effects brought by the technical scheme of the present invention are:
1) Compared with manual recording, a virtual human sign language system saves manpower and material resources and has the advantage of accuracy and standardization;
2) Content-based smoothing is adopted, making transitions between gestures natural, and the coordination of facial expressions with gestures is introduced, making sign language expression more accurate and realistic;
3) The number of the virtual human's triangle patches is intelligently adjusted according to platform performance, balancing image quality against running efficiency;
4) The modular design and middleware form facilitate porting of the whole system.
The facial animation method adopted by the present invention for generating facial expressions is based on MPEG-4; in addition, methods such as interpolation, parameterization, free-form deformation, muscle models, elastic meshes, and finite elements, each with its own advantages, may also be tried.
There are likewise many methods of realizing video overlay: besides overlaying by RGB value, overlay based on luminance, alpha value, hue, and so on can also be adopted.
It should be noted that, since the information interaction between the above devices and the units within the system, and their implementation, are based on the same conception as the method embodiments of the present invention, the details can be found in the description of the method embodiments and are not repeated here.
Those of ordinary skill in the art will appreciate that all or part of the steps in the various methods of the above embodiments can be accomplished by a program instructing the relevant hardware; the program can be stored in a computer-readable storage medium, which may include read-only memory (ROM, Read Only Memory), random access memory (RAM, Random Access Memory), a magnetic disk, an optical disc, and so on.
The video virtual human sign language system for digital television provided by the embodiments of the present invention has been described in detail above. Specific examples have been used herein to illustrate the principles and implementation of the present invention, and the description of the above embodiments is intended only to help understand the method and core idea of the present invention. Meanwhile, those of ordinary skill in the art may, according to the idea of the present invention, make changes in both the specific implementation and the scope of application. In summary, this description should not be construed as limiting the present invention.
Claims (10)
1. A video virtual human sign language system for digital television, characterized in that the system first demultiplexes the program source stream, decoding the audio, video, and other data information, where the other data information includes caption text information; the caption text is input to a virtual human sign language generation module, which retrieves the corresponding sign language data from a sign language library according to the text entry, then performs graphic rendering to generate sign language frames, applying appropriate smoothing between different gestures; the sign language frames are then synchronously overlaid with the audio information of the program and output.
2. The system according to claim 1, characterized in that the sign language generation module is the core module of the system; it comprises a text parsing module, a gesture generation module, an expression generation module, a gesture and expression synthesis module, a frame sequence smoothing and simplification processing module, and a synchronization processing module; the input of the text parsing module is the caption text sequence; the text parser segments the caption sentences into words, and the resulting words are used to retrieve the corresponding gesture data and expression data from the sign language library; the functions of the text parsing module include text editing input, text segmentation, and conversion of Chinese words into sign language codes; the text editing input preprocesses and edits the input Chinese sentences so that they are suitable for the next step of segmentation; text segmentation divides a sentence into words, with punctuation marks treated as separate words; the system's segmentation process first applies the maximum matching method, then uses the first-pass segmentation result to look up the ambiguity flags of dictionary entries and invoke word rules, and then performs ambiguity correction; the basic dictionary contains the Chinese words corresponding to the sign words that the synthesis system can synthesize; the gesture library contains the hand shape data of the sign words that the synthesis system can synthesize, while the mapping between facial expression data and sign words is kept in the facial expression library.
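The first-pass maximum matching segmentation named in claim 2 can be sketched as follows. This is an illustrative sketch only; the tiny dictionary is a made-up stand-in for the system's basic dictionary, and the ambiguity-correction pass is omitted.

```python
# Forward maximum matching: at each position, greedily cut the longest
# dictionary word; unknown single characters are accepted as-is.

def max_match(sentence, dictionary, max_len=4):
    """Segment `sentence` by forward maximum matching against `dictionary`."""
    words, i = [], 0
    while i < len(sentence):
        for j in range(min(max_len, len(sentence) - i), 0, -1):
            cand = sentence[i:i + j]
            if j == 1 or cand in dictionary:   # single chars always accepted
                words.append(cand)
                i += j
                break
    return words
```

The result of this pass is then checked against the ambiguity flags and word rules for correction, as the claim describes.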
3. The system according to claim 1 or 2, characterized in that the specific steps of the sign language frame generation flow are as follows:
Step 1: The text parsing module obtains the caption text sequence from the caption text channel and parses the current caption text, directly obtaining the start time and end time of the caption used for synchronization; gesture data and expression data are generated by matching in the sign language library; go to Step 2;
Step 2: Render according to the gesture data and expression data using OpenGL to generate the sign language frame sequence; go to Step 3;
Step 3: Insert a corresponding number of smoothing frames according to the magnitude of the inter-frame gesture difference, i.e. perform smoothing, and at the same time exploit the information redundancy between gestures to perform simplification; go to Step 4;
Step 4: Synchronize the sign language frames with the program information according to the time information, adjusting the frame rate of the sign language frames; at the same time use this time information as feedback to adjust the smoothing and simplification processing;
Step 5: Output the sign language frame sequence as the input of the video overlay, and finish.
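The five steps above can be sketched as a single pipeline function. All the module functions passed in (parse_caption, render_frames, and so on) are hypothetical placeholders for the modules named in claim 2, not real APIs.

```python
# Sketch of the five-step sign language frame generation flow.

def sign_language_pipeline(caption, sign_library,
                           parse_caption, render_frames,
                           smooth_and_simplify, synchronize):
    # Step 1: parse the caption, obtaining its timing and the matching
    # gesture/expression data from the sign language library
    start, end, gesture_data, expression_data = parse_caption(caption,
                                                              sign_library)
    # Step 2: render the sign language frame sequence (OpenGL in the system)
    frames = render_frames(gesture_data, expression_data)
    # Step 3: smoothing and redundancy-based simplification
    frames = smooth_and_simplify(frames)
    # Step 4: synchronize with program time, adjusting the frame rate
    frames = synchronize(frames, start, end)
    # Step 5: return the sequence as input to the video overlay
    return frames
```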
4. The system according to claim 1 or 3, characterized in that a context-based frame selection strategy is adopted during the synchronization of the sign language frames with the program information, in which the time interval between frames is determined by the degree of change of the gesture: when the change between two frames is large, the time interval between them is also large; conversely, when the change in action between two frames is small, the time interval between them is small; in addition, smoothing is performed between frames with large changes by inserting an appropriate number of smoothing frames so that the action is continuous.
5. The system according to claim 4, characterized in that the smoothing of the virtual human's gesture motion is achieved by inserting a number of frames according to the magnitude of the difference between two actions; the inserted frames can be generated by applying the Hermite interpolation algorithm to interpolate the joint angle vectors; the number of inserted frames depends on the size of the gap between the two gestures: the larger the gap, the more frames are inserted; conversely, the smaller the gap, the fewer frames need to be inserted.
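The gap-dependent Hermite smoothing of claims 4 and 5 can be sketched as follows. This is an illustrative sketch: zero endpoint tangents (giving ease-in/ease-out) and the frames-per-radian scale are simplifying assumptions, not the patent's values.

```python
import numpy as np

def hermite(p0, p1, t, m0=0.0, m1=0.0):
    """Cubic Hermite interpolation; with zero tangents this reduces to a
    smoothstep blend between p0 and p1."""
    h00 = 2 * t**3 - 3 * t**2 + 1
    h10 = t**3 - 2 * t**2 + t
    h01 = -2 * t**3 + 3 * t**2
    h11 = t**3 - t**2
    return h00 * p0 + h10 * m0 + h01 * p1 + h11 * m1

def smoothing_frames(angles_a, angles_b, frames_per_radian=4.0):
    """Insert more frames the larger the gap between two gesture poses."""
    a, b = np.asarray(angles_a), np.asarray(angles_b)
    gap = np.linalg.norm(b - a)                  # joint-angle distance
    n = int(np.ceil(gap * frames_per_radian))    # frame count scales with gap
    ts = [(k + 1) / (n + 1) for k in range(n)]   # evenly spaced parameters
    return [hermite(a, b, t) for t in ts]
```

Nonzero tangents could instead be estimated from neighbouring keyframes to preserve velocity across the transition.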
6. The system according to claim 2, characterized in that the generation of facial expressions involves the setting of the facial definition parameters FDP, and the Xface tool is used to set the FDP on the three-dimensional face model; once the influence regions and deformation functions have been defined, then for an input FAP parameter stream, the displacement of each vertex of the three-dimensional face model at a given moment is computed according to the MPEG-4 animation driving method, and the facial animation is finally rendered; the generation of facial expressions also includes the extraction of the facial animation parameters FAP; to drive the three-dimensional virtual human with natural expressions, the FAP parameters of the basic facial expressions, joy, sadness, anger, fear, disgust, and surprise, must be obtained; in theory all facial expressions can be synthesized from these basic expressions; through the setting of the facial definition parameters and facial animation parameters, combined with the sign language data, an expression suited to the current gesture is selected, further enhancing the accuracy of expression.
7. The system according to claim 3, characterized in that the video overlay adopts an overlay algorithm based on the RGB values of pixels; the video overlay process can be described as follows: scan the main video image and move a pointer to the position where the overlay is required; scan the pixel values of the overlay image one by one, skipping a pixel if it is the background color black and otherwise replacing the pixel value at the corresponding preset position in the main video with this pixel value, until the entire image has been scanned; repeating the above overlay process for each image in the video achieves real-time video overlay.
8. The system according to claim 1, characterized in that the sign language system is modularized in the form of middleware, which makes it easy to port and suitable for running on different system platforms; taking into account the rendering performance of different hardware platforms, the system adjusts according to the hardware performance: when hardware performance is low, the number of triangle patches representing the virtual human is reduced appropriately, sacrificing image quality for speed; conversely, when the platform hardware allows, the number of triangle patches can be increased to obtain higher image quality.
9. The system according to claim 1 or 6, characterized in that the facial animation method adopted by the system for generating facial expressions is based on MPEG-4; in addition, there are also methods such as interpolation, parameterization, free-form deformation, muscle models, elastic meshes, and finite elements.
10. The system according to claim 7, characterized in that, among the methods of realizing video overlay, besides overlaying by RGB value, overlay based on luminance, alpha value, hue, and so on can also be adopted.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN2011103804082A CN102497513A (en) | 2011-11-25 | 2011-11-25 | Video virtual hand language system facing digital television |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN2011103804082A CN102497513A (en) | 2011-11-25 | 2011-11-25 | Video virtual hand language system facing digital television |
Publications (1)
Publication Number | Publication Date |
---|---|
CN102497513A true CN102497513A (en) | 2012-06-13 |
Family
ID=46189297
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN2011103804082A Pending CN102497513A (en) | 2011-11-25 | 2011-11-25 | Video virtual hand language system facing digital television |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN102497513A (en) |
Cited By (19)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN104732590A (en) * | 2015-03-09 | 2015-06-24 | 北京工业大学 | Sign language animation synthesis method |
CN105868282A (en) * | 2016-03-23 | 2016-08-17 | 乐视致新电子科技(天津)有限公司 | Method and apparatus used by deaf-mute to perform information communication, and intelligent terminal |
CN106056994A (en) * | 2016-08-16 | 2016-10-26 | 安徽渔之蓝教育软件技术有限公司 | Assisted learning system for gesture language vocational education |
CN106653051A (en) * | 2016-12-09 | 2017-05-10 | 天脉聚源(北京)传媒科技有限公司 | Video deaf-mute mode expression method and apparatus |
CN109166409A (en) * | 2018-10-10 | 2019-01-08 | 长沙千博信息技术有限公司 | A kind of sign language conversion method and device |
CN109982757A (en) * | 2016-06-30 | 2019-07-05 | 阿巴卡达巴广告和出版有限公司 | Digital multi-media platform |
CN110413841A (en) * | 2019-06-13 | 2019-11-05 | 深圳追一科技有限公司 | Polymorphic exchange method, device, system, electronic equipment and storage medium |
CN110570877A (en) * | 2019-07-25 | 2019-12-13 | 咪咕文化科技有限公司 | Sign language video generation method, electronic device and computer readable storage medium |
CN110826441A (en) * | 2019-10-25 | 2020-02-21 | 深圳追一科技有限公司 | Interaction method, interaction device, terminal equipment and storage medium |
CN111369652A (en) * | 2020-02-28 | 2020-07-03 | 长沙千博信息技术有限公司 | Method for generating continuous sign language action based on multiple independent sign language actions |
CN111985268A (en) * | 2019-05-21 | 2020-11-24 | 搜狗(杭州)智能科技有限公司 | Method and device for driving animation by human face |
CN113326746A (en) * | 2021-05-13 | 2021-08-31 | 中国工商银行股份有限公司 | Sign language broadcasting method and device for human body model |
CN113689879A (en) * | 2020-05-18 | 2021-11-23 | 北京搜狗科技发展有限公司 | Method, device, electronic equipment and medium for driving virtual human in real time |
CN113689530A (en) * | 2020-05-18 | 2021-11-23 | 北京搜狗科技发展有限公司 | Method and device for driving digital person and electronic equipment |
CN115223428A (en) * | 2021-04-20 | 2022-10-21 | 美光科技公司 | Converting sign language |
CN115239855A (en) * | 2022-06-23 | 2022-10-25 | 安徽福斯特信息技术有限公司 | Virtual sign language anchor generation method, device and system based on mobile terminal |
CN115484493A (en) * | 2022-09-09 | 2022-12-16 | 深圳市小溪流科技有限公司 | Real-time intelligent streaming media system for converting IPTV audio and video into virtual sign language video in real time |
CN115497499A (en) * | 2022-08-30 | 2022-12-20 | 阿里巴巴(中国)有限公司 | Method for synchronizing voice and action time |
CN116959119A (en) * | 2023-09-12 | 2023-10-27 | 北京智谱华章科技有限公司 | Sign language digital person driving method and system based on large language model |
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101005574A (en) * | 2006-01-17 | 2007-07-25 | 上海中科计算技术研究所 | Video frequency virtual humance sign language compiling system |
CN101727766A (en) * | 2009-12-04 | 2010-06-09 | 哈尔滨工业大学深圳研究生院 | Sign language news broadcasting method based on visual human |
- 2011-11-25: CN application CN2011103804082A filed, published as CN102497513A (en), status Pending
Patent Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101005574A (en) * | 2006-01-17 | 2007-07-25 | 上海中科计算技术研究所 | Video frequency virtual humance sign language compiling system |
CN101727766A (en) * | 2009-12-04 | 2010-06-09 | 哈尔滨工业大学深圳研究生院 | Sign language news broadcasting method based on visual human |
Non-Patent Citations (2)
Title |
---|
Yan Jie, Song Yibo, Gao Wen: "An Assisted Teaching System for Deaf-Mute People", Journal of Computer-Aided Design & Computer Graphics * |
Wang Zhaoqi, Gao Wen: "A Chinese Sign Language Synthesis Method Based on Virtual Human Synthesis Technology", Journal of Software * |
Cited By (27)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN104732590A (en) * | 2015-03-09 | 2015-06-24 | 北京工业大学 | Sign language animation synthesis method |
CN104732590B (en) * | 2015-03-09 | 2018-06-22 | 北京工业大学 | A kind of synthetic method of sign language animation |
CN105868282A (en) * | 2016-03-23 | 2016-08-17 | 乐视致新电子科技(天津)有限公司 | Method and apparatus used by deaf-mute to perform information communication, and intelligent terminal |
CN109982757A (en) * | 2016-06-30 | 2019-07-05 | 阿巴卡达巴广告和出版有限公司 | Digital multi-media platform |
CN106056994A (en) * | 2016-08-16 | 2016-10-26 | 安徽渔之蓝教育软件技术有限公司 | Assisted learning system for gesture language vocational education |
CN106653051A (en) * | 2016-12-09 | 2017-05-10 | 天脉聚源(北京)传媒科技有限公司 | Video deaf-mute mode expression method and apparatus |
CN109166409A (en) * | 2018-10-10 | 2019-01-08 | 长沙千博信息技术有限公司 | A kind of sign language conversion method and device |
CN111985268B (en) * | 2019-05-21 | 2024-08-06 | 北京搜狗科技发展有限公司 | Method and device for driving animation by face |
CN111985268A (en) * | 2019-05-21 | 2020-11-24 | 搜狗(杭州)智能科技有限公司 | Method and device for driving animation by human face |
CN110413841A (en) * | 2019-06-13 | 2019-11-05 | 深圳追一科技有限公司 | Polymorphic exchange method, device, system, electronic equipment and storage medium |
CN110570877A (en) * | 2019-07-25 | 2019-12-13 | 咪咕文化科技有限公司 | Sign language video generation method, electronic device and computer readable storage medium |
CN110570877B (en) * | 2019-07-25 | 2022-03-22 | 咪咕文化科技有限公司 | Sign language video generation method, electronic device and computer readable storage medium |
CN110826441A (en) * | 2019-10-25 | 2020-02-21 | 深圳追一科技有限公司 | Interaction method, interaction device, terminal equipment and storage medium |
CN111369652A (en) * | 2020-02-28 | 2020-07-03 | 长沙千博信息技术有限公司 | Method for generating continuous sign language action based on multiple independent sign language actions |
CN111369652B (en) * | 2020-02-28 | 2024-04-05 | 长沙千博信息技术有限公司 | Method for generating continuous sign language actions based on multiple independent sign language actions |
CN113689530A (en) * | 2020-05-18 | 2021-11-23 | 北京搜狗科技发展有限公司 | Method and device for driving digital person and electronic equipment |
WO2021232875A1 (en) * | 2020-05-18 | 2021-11-25 | 北京搜狗科技发展有限公司 | Method and apparatus for driving digital person, and electronic device |
CN113689879A (en) * | 2020-05-18 | 2021-11-23 | 北京搜狗科技发展有限公司 | Method, device, electronic equipment and medium for driving virtual human in real time |
CN113689530B (en) * | 2020-05-18 | 2023-10-20 | 北京搜狗科技发展有限公司 | Method and device for driving digital person and electronic equipment |
CN113689879B (en) * | 2020-05-18 | 2024-05-14 | 北京搜狗科技发展有限公司 | Method, device, electronic equipment and medium for driving virtual person in real time |
CN115223428A (en) * | 2021-04-20 | 2022-10-21 | 美光科技公司 | Converting sign language |
CN113326746A (en) * | 2021-05-13 | 2021-08-31 | 中国工商银行股份有限公司 | Sign language broadcasting method and device for human body model |
CN115239855A (en) * | 2022-06-23 | 2022-10-25 | 安徽福斯特信息技术有限公司 | Virtual sign language anchor generation method, device and system based on mobile terminal |
CN115497499A (en) * | 2022-08-30 | 2022-12-20 | 阿里巴巴(中国)有限公司 | Method for synchronizing voice and action time |
CN115497499B (en) * | 2022-08-30 | 2024-09-17 | 阿里巴巴(中国)有限公司 | Method for synchronizing voice and action time |
CN115484493A (en) * | 2022-09-09 | 2022-12-16 | 深圳市小溪流科技有限公司 | Real-time intelligent streaming media system for converting IPTV audio and video into virtual sign language video in real time |
CN116959119A (en) * | 2023-09-12 | 2023-10-27 | 北京智谱华章科技有限公司 | Sign language digital person driving method and system based on large language model |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN102497513A (en) | Video virtual hand language system facing digital television | |
McDonald et al. | An automated technique for real-time production of lifelike animations of American Sign Language | |
CN108447474B (en) | Modeling and control method for synchronizing virtual character voice and mouth shape | |
US12017145B2 (en) | Method and system of automatic animation generation | |
CN110751708B (en) | Method and system for driving face animation in real time through voice | |
KR102035596B1 (en) | System and method for automatically generating virtual character's facial animation based on artificial intelligence | |
CN103561277B (en) | Transmission method and system for network teaching | |
CN115209180B (en) | Video generation method and device | |
KR20230098068A (en) | Moving picture processing method, apparatus, electronic device and computer storage medium | |
CN118138855A (en) | Method and system for generating realistic video based on multiple artificial intelligence technologies | |
CN113221840B (en) | Portrait video processing method | |
CN117557695A (en) | Method and device for generating video by driving single photo through audio | |
Wei et al. | A practical model for live speech-driven lip-sync | |
CN117315102A (en) | Virtual anchor processing method, device, computing equipment and storage medium | |
CN112002005A (en) | Cloud-based remote virtual collaborative host method | |
CN102855652B (en) | Method for redirecting and cartooning face expression on basis of radial basis function for geodesic distance | |
Papadogiorgaki et al. | Synthesis of virtual reality animations from SWML using MPEG-4 body animation parameters | |
Papadogiorgaki et al. | Text-to-sign language synthesis tool | |
Lee | Transforming Text into Video: A Proposed Methodology for Video Production Using the VQGAN-CLIP Image Generative AI Model | |
Papadogiorgaki et al. | VSigns–a virtual sign synthesis web tool | |
ten Hagen et al. | CharToon: a system to animate 2D cartoons faces. | |
You | RETRACTED: Design of Double-effect Propulsion System for News Broadcast Based on Artificial Intelligence and Virtual Host Technology | |
CN114374867B (en) | Method, device and medium for processing multimedia data | |
CN206574139U (en) | A kind of three-dimensional animation automatically generating device of data-driven | |
CN207051979U (en) | A kind of audio-visual converting system of word |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
C02 | Deemed withdrawal of patent application after publication (patent law 2001) | ||
WD01 | Invention patent application deemed withdrawn after publication |
Application publication date: 20120613 |