CN103745462B - Human mouth-shape video reconstruction system and reconstruction method - Google Patents

Human mouth-shape video reconstruction system and reconstruction method

Info

Publication number
CN103745462B
CN103745462B (application CN201310745441.XA)
Authority
CN
China
Prior art keywords
mouth shape
video
human body
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201310745441.XA
Other languages
Chinese (zh)
Other versions
CN103745462A (en)
Inventor
孟濬
黄吉羊
刘琼
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhejiang University ZJU
Original Assignee
Zhejiang University ZJU
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhejiang University ZJU filed Critical Zhejiang University ZJU
Priority to CN201310745441.XA priority Critical patent/CN103745462B/en
Publication of CN103745462A publication Critical patent/CN103745462A/en
Application granted granted Critical
Publication of CN103745462B publication Critical patent/CN103745462B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical


Landscapes

  • Processing Or Creating Images (AREA)

Abstract

The present invention provides a human mouth-shape video reconstruction system, and a corresponding method, based on the time evolution of annular-elastic-space dynamics. The method comprises four steps — information reading, preprocessing, mouth-shape reconstruction and video output — and has two implementations, the association inversion method and the logic correction method. The reconstruction method and system provided by the present invention can invert the mouth-shape information read in onto a single-frame image to generate a reconstructed human mouth-shape video, and can also correct the mouth-shape information read in on a video composed of multiple frames to generate a reconstructed human mouth-shape video. Compared with traditional mouth-shape reconstruction methods and systems, the method and system of the present invention are accurate and efficient and require no database, which saves space while increasing the flexibility of mouth-shape conversion. More preferably, all units of the system of the present invention can be integrated into a single intelligent terminal, which may be a smartphone, a tablet computer (such as an iPad), a palmtop computer, an intelligent handheld device, or the like.

Description

Human mouth-shape video reconstruction system and reconstruction method
Technical field
The present invention relates to the field of video and image processing, and in particular to a human mouth-shape video reconstruction system and reconstruction method based on the time evolution of annular-elastic-space dynamics.
Background technology
With the development and continuous improvement of computer technology, facial modeling and animation, as a distinctive branch of computer graphics, have attracted increasing attention; transformation of the human mouth shape in videos and images is an especially widespread application. Many occasions require the mouth shape of a person in an existing video or image to be reconstructed, i.e. a series of mouth-shape actions generated from a single static image, or the mouth shape in an existing video corrected. To achieve this, existing technical methods typically analyze and process a large amount of existing video and image information to build a mouth-shape database, and then call the relevant information from that database for the particular problem. Although such technical means can transform the human mouth shape in videos and images fairly accurately, their limitations are also obvious. On the one hand, their implementation depends on a huge mouth-shape database built in advance, which requires an enormous data sample and has poor portability; on the other hand, the algorithms involve a large amount of computational analysis and are highly complex, which also limits their range of application.
Summary of the invention
In view of the deficiencies of the prior art, the technical problem to be solved by the present invention is to provide a human mouth-shape video reconstruction method and system with high accuracy and good portability, so as to realize the evolution of a single-frame image of a target object into a video according to a required mouth shape, or to realize the modification and inversion of a video of the target object composed of multiple frames. Traditional mouth-shape conversion techniques rely on a huge mouth-shape database containing a voice library and corresponding mouth-shape images to be called during conversion; on the one hand this occupies a large amount of space, and on the other hand the database itself cannot independently construct new mouth shapes, so in practice it cannot handle conversion problems involving mouth shapes not contained in the database. The system of the present invention differs from traditional mouth-shape conversion systems in that no such mouth-shape database is needed, and the video reconstruction of the human mouth shape can be completed quickly and accurately.
The technical solution adopted by the present invention is as follows:
A human mouth-shape video reconstruction method, specifically comprising the following four steps:
(1) Information reading: human body information and mouth-shape information are read in from an input port; the human body information is a single-frame image of the target object or a video composed of multiple frames, and the mouth-shape information is any one or more of text, sound, image and video;
(2) Preprocessing: the mouth-shape information read in from the input port is recognized and converted, the converted mouth-shape information is displayed in real time on a display module, and the human body information read in from the input port is analyzed and the position of the mouth is located and locked;
(3) Mouth-shape reconstruction: based on the time-evolution method of annular-elastic-space dynamics, human mouth-shape video reconstruction is carried out according to the preprocessed mouth-shape information and human body information;
(4) Video output: the reconstructed human mouth-shape video is output from an output port.
The flow chart of the technical solution of the present invention is shown in Figure 1.
In step (3), the mouth-shape reconstruction is based on the time evolution of annular-elastic-space dynamics. The annular elastic space is a planar space in which the order of points and the distances between points are defined, and it has the following four properties:
1. For any two points P1 and P2 in the annular elastic space, the distance between them is variable.
2. For any two points P1 and P2 in the annular elastic space, their order is always invariant; that is, for any point P3 in the annular elastic space distinct from P1 and P2, the clockwise (or counterclockwise) order of these three points never changes under any transformation.
3. Any point P in the annular elastic space can be acted on by a force F of magnitude f at an angle α to the horizontal axis, and as a result its position changes, namely it is displaced from its original position along the direction at angle α to the horizontal axis by a certain amount.
4. When a point P in the annular elastic space is acted on by a force F, the force F also affects other points in the annular elastic space while affecting P, making each such point equivalent to being acted on by a force of magnitude f′ at an angle α′ to the horizontal axis, referred to as an associated force. The spatial position of such a point relative to P determines α′, and its distance from P determines f′; when the distance of a point from P exceeds the radius of influence R, the point is considered not to be affected by the associated force of F.
A schematic diagram of the annular elastic space is shown in Figure 2.
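A minimal Python sketch of how properties 3 and 4 might be realized is given below. The linear falloff of the associated force with distance, and letting α′ follow the driving angle α, are illustrative assumptions: the patent only states that the distance to P determines f′, the relative position determines α′, and the influence vanishes beyond radius R.

```python
import math
from dataclasses import dataclass

@dataclass
class Point:
    x: float
    y: float

def apply_force(points, idx, f, alpha, R):
    """Apply a force of magnitude f at angle alpha (radians, w.r.t. the
    horizontal axis) to points[idx] in the annular elastic space, and
    propagate an associated force to every point within radius R.
    The falloff f' = f * (1 - d/R) is an illustrative assumption."""
    px, py = points[idx].x, points[idx].y
    for i, p in enumerate(points):
        d = math.hypot(p.x - px, p.y - py)
        if i == idx:
            f_eff, a_eff = f, alpha            # property 3: direct effect on P
        elif d <= R:
            f_eff = f * (1.0 - d / R)          # property 4: associated force f'
            a_eff = alpha                      # assumption: alpha' follows the driving force
        else:
            continue                           # beyond R: no influence
        p.x += f_eff * math.cos(a_eff)
        p.y += f_eff * math.sin(a_eff)
    return points
```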
The conversion of the mouth shape is produced by the action on the orbicularis oris muscle of the lips, which is innervated by the buccal branch of the facial nerve; the annular elastic space model described above can therefore be established to study the mouth shape. When the mouth shape changes at time t, it is considered that n points P1, P2, …, Pn in this annular elastic space are acted on by forces F1, F2, …, Fn respectively, and the joint action of these n forces causes local displacement, rotation or stretching of the annular elastic space, i.e. produces the conversion of the mouth shape. In step (3), the processing module of the system can identify the position of the mouth shape in the video or image and its change over the time series, establish the corresponding annular elastic space model, and extract the effect of the forces produced at each time t on each region of the model. Meanwhile, a new annular elastic space model is established from the human body information described above, and the extracted forces are applied, at the corresponding times, to the corresponding positions on the new annular elastic space model, thereby completing the human mouth-shape video reconstruction. The corresponding positions can be determined from the four contour lines of the mouth shape and the feature points on the contour lines; to guarantee conversion accuracy, in practice there should be at least three feature points on each contour line, as shown in Figure 3. The process of determining the corresponding positions is an association based on the annular elastic space.
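The sketch below illustrates one possible way to pair feature points between the source and target mouth shapes by contour line and by position along each contour. The equal-count assumption (the same number of feature points on corresponding contour lines) is mine; the patent only requires at least three points per contour line.

```python
def match_feature_points(src_contours, dst_contours):
    """src_contours / dst_contours: dicts mapping contour names
    ('L1'..'L4') to ordered lists of (x, y) feature points.
    Returns a list of (src_point, dst_point) correspondences.
    Assumes corresponding contours carry the same number of points."""
    pairs = []
    for name in ("L1", "L2", "L3", "L4"):          # the four mouth contour lines
        src, dst = src_contours[name], dst_contours[name]
        if len(src) < 3 or len(dst) < 3:
            raise ValueError("each contour line needs at least 3 feature points")
        for s, d in zip(src, dst):                 # pair points in contour order
            pairs.append((s, d))
    return pairs
```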
Preferably, step (3) uses the association inversion method, one of the time-evolution methods based on annular-elastic-space dynamics: a live person acts as the synchronization object and simulates the mouth-shape information demonstrated by the display module, a real-time acquisition module then captures the simulated video, and matching based on the annular elastic space is carried out against the human body information read in, thereby completing the reconstruction of the human mouth-shape video. As shown in Figure 4, in this method the synchronization object simulates on-site the mouth-shape information to be reconstructed; this process is captured as a simulated video, an annular elastic space model is established on the basis of this simulated video and then analyzed and processed, so that the mouth-shape information to be reconstructed can be reproduced accurately and efficiently on the human body information of the target object, realizing the reconstruction of this mouth shape at the mouth of the target object. The flow diagram of this method is shown in Figure 6. Specifically, the synchronization object simulates the mouth shape according to the mouth-shape information demonstrated by the display module, for example reading aloud a displayed passage of text or imitating several displayed mouth-shape pictures; the processing module then controls the real-time acquisition module to capture the simulated video of the synchronization object as the basis for mouth-shape reconstruction. After capture is complete, the processing module evenly segments the captured simulated video into n frames at a certain frame rate N (when the simulated video has a duration of T seconds, n = TN), corresponding to times t1, t2, …, tn, locates the mouth shape in each frame, and establishes a linkage correspondence between the contour and feature points of the mouth shape in each frame and the contour and feature points of the mouth shape in the human body information read in. The frame rate N can be determined according to the actual situation; it must satisfy the sampling theorem so that the segmented images can reflect the mouth-shape information to be reconstructed. The higher the segmentation rate, the higher the complexity of mouth-shape reconstruction and the higher the reconstruction accuracy; the lower the segmentation rate, the lower the complexity and the lower the accuracy. When the human body information read in in step (1) is a single-frame image, the linkage correspondence means that the mouth-shape feature points in each frame of the simulated video are all mapped onto the single-frame human body image; when the human body information read in in step (1) is a video composed of multiple frames, the linkage correspondence means that the mouth-shape feature points in each frame of the simulated video are mapped onto the corresponding frame of the human body information video. The corresponding frame can be determined as follows: the frames segmented from the human body information video and the frames segmented from the simulated video are all numbered; if the human body information video and the simulated video have equal frame counts, the corresponding frame is the frame with the same number; if the frame counts are unequal, the corresponding frame is the frame occupying the same proportional position in the total. When the simulated video has more frames than the human body information video, the excess frames are discarded proportionally; when the simulated video has fewer frames than the human body information video, the missing frames are filled in proportionally by interpolation, the intermediate mouth shapes being constructed by the time evolution of annular-elastic-space dynamics. After the linkage correspondence of the mouth shape is completed, the change of the mouth shape from the i-th frame to the (i+1)-th frame of the simulated video is analyzed to obtain the forces acting on each feature point in the annular elastic space model at the corresponding time t = i/N seconds; applying the obtained forces to the annular elastic space model corresponding to the human body information completes the reconstruction of the mouth-shape information at time t = i/N seconds. Once every frame of the new video has been reconstructed, the reconstructed human mouth-shape video is obtained.
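A small sketch of the proportional frame correspondence described above — equal counts map index to index, unequal counts map by proportional position, with excess frames effectively discarded — might look like the following. Rounding to the nearest index is an assumption; the patent only specifies "the same proportional position in the total".

```python
def corresponding_frame(i, n_sim, n_target):
    """Map frame index i of the simulated video (0-based, n_sim frames)
    onto a frame index of the human-body-information video (n_target frames)."""
    if n_sim == n_target:
        return i                                   # equal counts: same number
    # unequal counts: same proportional position in the total
    return min(n_target - 1, round(i * n_target / n_sim))

def link_frames(n_sim, n_target):
    """Build the linkage correspondence for all simulated-video frames.
    When n_sim > n_target several simulated frames share a target frame,
    which discards the excess proportionally."""
    return [(i, corresponding_frame(i, n_sim, n_target)) for i in range(n_sim)]
```

When n_sim < n_target, the missing target frames would instead be filled by the interpolation step described above (constructing intermediate mouth shapes by the time evolution of annular-elastic-space dynamics), which this mapping alone does not perform.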
Preferably, step (3) uses the logic correction method, another time-evolution method based on annular-elastic-space dynamics: no live on-site performance is relied on; instead, according to the required mouth-shape information, mouth-shape primitives are called from a primitive module to manually construct a mouth-shape state template, and the missing transition states are then generated by the time evolution of annular-elastic-space dynamics to complete the video reconstruction. As shown in Figure 5, this method requires no on-site simulation by a synchronization object; on the basis of the human body information and the mouth-shape information, a mouth-shape state template is generated by manually calling mouth-shape primitives, and the evolution of the annular elastic space model is then established to generate the mouth-shape video of the target object, realizing the video reconstruction at the mouth of the target object. The flow diagram of this method is shown in Figure 7. A mouth-shape primitive is a model of one of the most basic configurations of the human mouth, such as the "a" mouth shape in pinyin (mouth open), the "o" mouth shape (lips pursed) or the "i" mouth shape (grinning); all transition mouth-shape states can be generated from the primitives by the time evolution of annular-elastic-space dynamics. A transition mouth-shape state is a mouth-shape state produced in the course of transforming from one mouth-shape primitive to another; for example, from the closed-mouth primitive to the primitive for pronouncing the pinyin "a", the transition states are the mouth shapes as the mouth slowly opens. Specifically, when the display module demonstrates the mouth-shape information to be reconstructed, n basic mouth shapes meeting the requirement can be manually selected from the mouth-shape library and used, through association, to correct the mouth shape in specific frames, simulating and constructing a time-series-based mouth-shape state template. When the human body information read in in step (1) is a single-frame image, the information outside the mouth shape in the mouth-shape state template is all extended from the single-frame image; when the human body information read in in step (1) is a video composed of multiple frames, the information outside the mouth shape in the mouth-shape state template stays consistent with that video. The information outside the mouth shape is all the information in the image or video other than the mouth, including the other parts of the body besides the mouth (such as the nose, eyes, cheeks, trunk and limbs) and the environment the person is in; for example, blinking of the eyes, swaying of the body, or someone passing by behind are all regarded as changes in the information outside the mouth. After the mouth-shape state template has been constructed, the positions around the mouth in the human body information are associated on the basis of the annular elastic space, so that changes of the mouth cause corresponding effects on the region around the mouth; this constitutes the corresponding annular elastic space model. Then, by analyzing the change from the i-th mouth-shape primitive to the (i+1)-th mouth-shape primitive, the forces acting on each point in the annular elastic space model corresponding to the i-th stage are obtained; extending the action of these forces over a longer time series yields all the mouth-shape transition states in that stage. When all (n−1) sets of transition states have been reconstructed, the reconstruction of the human mouth-shape video is realized.
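The following sketch illustrates one way the missing transition states between two primitives could be generated by spreading the inferred force over the intermediate time steps. The even division of the force into k equal increments is an assumption borrowed from Embodiment 2 below, where the force is divided into 30 parts; the `apply_force` helper sketched earlier is passed in as a callable.

```python
def generate_transitions(model_points, forces, k, R, apply_force):
    """Generate k transition mouth-shape states between two primitives.
    `forces` is a list of (point_index, magnitude, angle) tuples describing
    the total force taking primitive i to primitive i+1; each transition
    frame applies 1/k of every force to the annular elastic space model."""
    frames = []
    for _ in range(k):
        for idx, f, alpha in forces:
            model_points = apply_force(model_points, idx, f / k, alpha, R)
        # snapshot the current point positions as one transition frame
        frames.append([(p.x, p.y) for p in model_points])
    return frames
```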
For the association inversion method, the present invention also provides a human mouth-shape video reconstruction system comprising an input port, an output port, a processing module, a display module and a real-time acquisition module, wherein:
the input port is used to read in human body information and mouth-shape information, the human body information being a single-frame image of the target object or a video composed of multiple frames, and the mouth-shape information being any one or more of text, sound, image and video;
the output port is used to output the reconstructed human mouth-shape video;
the display module is used to display in real time the mouth-shape information read in from the input port;
the processing module is used to convert and process the mouth-shape information read in from the input port, and then to realize the reconstruction of the human mouth-shape video on the basis of the human body information;
the real-time acquisition module is used to capture in real time the video of the synchronization object during reconstruction with the association inversion method.
The connections among the modules are shown in Figure 8. The input port and the processing module, the processing module and the output port, the processing module and the real-time acquisition module, and the processing module and the display module can each be connected, in part or entirely, by wired or wireless means, so as to guarantee effective transmission of data. According to actual needs, all connections may be wired, all may be wireless, or some may be wired and the rest wireless.
The processing module is a terminal with image processing and information analysis capability, selected from a digital chip or an intelligent terminal. An intelligent terminal is a device that can capture external information, perform computation, analysis and processing, and transmit information between different terminals, including but not limited to desktop computers, notebook computers and mobile intelligent terminals. A mobile intelligent terminal is a portable intelligent terminal, including but not limited to smartphones, tablet computers (such as the iPad), palmtop computers and intelligent handheld game consoles. A digital chip is a chip designed with integrated-circuit technology that can perform computation, analysis and processing and can control other devices through its interfaces, including but not limited to microcontrollers, ARM, DSP and FPGA devices.
The real-time acquisition module is any one or more of a video camera, a still camera, a webcam, digital imaging equipment, and an intelligent terminal with a camera function.
The display module is any one or more of a monitor, a display screen, a projector, and an intelligent terminal.
Specifically, the synchronization object simulates the mouth shape according to the mouth-shape information demonstrated by the display module, for example reading aloud a displayed passage of text or imitating several displayed mouth-shape pictures; the processing module then controls the real-time acquisition module to capture the simulated video of the synchronization object as the basis for mouth-shape reconstruction. After capture is complete, the processing module evenly segments the captured simulated video into n frames at a certain frame rate N (when the simulated video has a duration of T seconds, n = TN), corresponding to times t1, t2, …, tn, locates the mouth shape in each frame, and establishes a linkage correspondence between the contour and feature points of the mouth shape in each frame and those of the mouth shape in the human body information read in. The segmentation rate can be determined according to the actual situation; it must satisfy the sampling theorem so that the segmented images reflect the mouth-shape information to be reconstructed. The higher the segmentation rate, the higher the complexity of mouth-shape reconstruction and the higher the reconstruction accuracy; the lower the segmentation rate, the lower the complexity and the lower the accuracy. When the human body information read in from the input port is a single-frame image, the linkage correspondence means that the mouth-shape feature points in each frame of the simulated video are all mapped onto the single-frame human body image; when the human body information read in from the input port is a video composed of multiple frames, the linkage correspondence means that the mouth-shape feature points in each frame of the simulated video are mapped onto the corresponding frame of the human body information video. The corresponding frame can be determined as follows: the frames segmented from the human body information video and from the simulated video are all numbered; if the two videos have equal frame counts, the corresponding frame is the frame with the same number; if the frame counts are unequal, the corresponding frame is the frame occupying the same proportional position in the total. When the simulated video has more frames than the human body information video, the excess frames are discarded proportionally; when it has fewer, the missing frames are filled in proportionally by interpolation, the intermediate mouth shapes being constructed by the time evolution of annular-elastic-space dynamics. After the linkage correspondence of the mouth shape is completed, the change of the mouth shape from the i-th frame to the (i+1)-th frame of the simulated video is analyzed to obtain the forces acting on each feature point in the annular elastic space model at the corresponding time t = i/N seconds; applying the obtained forces to the annular elastic space model corresponding to the human body information completes the reconstruction of the mouth-shape information at time t = i/N seconds. Once every frame of the new video has been reconstructed, the reconstructed human mouth-shape video is obtained.
For the logic correction method, the present invention also provides a human mouth-shape video reconstruction system comprising an input port, an output port, a processing module, a display module and a mouth-shape primitive module, wherein:
the input port is used to read in human body information and mouth-shape information, the human body information being a single-frame image of the target object or a video composed of multiple frames, and the mouth-shape information being any one or more of text, sound, image and video;
the output port is used to output the reconstructed human mouth-shape video;
the display module is used to display in real time the mouth-shape information read in from the input port;
the processing module is used to convert and process the mouth-shape information read in from the input port, and then to realize the reconstruction of the human mouth-shape video on the basis of the human body information;
the mouth-shape primitive module is used to store basic mouth-shape primitives, so that they can be called when reconstruction is performed with the logic correction method to manually construct a mouth-shape state template.
A mouth-shape primitive is a model of one of the most basic configurations of the human mouth, such as the "a" mouth shape in pinyin (mouth open), the "o" mouth shape (lips pursed) or the "i" mouth shape (grinning); all transition mouth-shape states can be generated from the primitives by the time evolution of annular-elastic-space dynamics. A transition mouth-shape state is a mouth-shape state produced in the course of transforming from one mouth-shape primitive to another; for example, from the closed-mouth primitive to the primitive for pronouncing the pinyin "a", the transition states are the mouth shapes as the mouth slowly opens.
The connections among the modules are shown in Figure 9. The input port and the processing module, the processing module and the output port, the processing module and the mouth-shape primitive module, and the processing module and the display module can each be connected, in part or entirely, by wired or wireless means, so as to guarantee effective transmission of data. According to actual needs, all connections may be wired, all may be wireless, or some may be wired and the rest wireless.
The processing module is a terminal with image processing and information analysis capability, selected from a digital chip or an intelligent terminal. An intelligent terminal is a device that can capture external information, perform computation, analysis and processing, and transmit information between different terminals, including but not limited to desktop computers, notebook computers and mobile intelligent terminals. A mobile intelligent terminal is a portable intelligent terminal, including but not limited to smartphones, tablet computers (such as the iPad), palmtop computers and intelligent handheld game consoles. A digital chip is a chip designed with integrated-circuit technology that can perform computation, analysis and processing and can control other devices through its interfaces, including but not limited to microcontrollers, ARM, DSP and FPGA devices.
The display module is any one or more of a monitor, a display screen, a projector, and an intelligent terminal.
The mouth-shape primitive module is used to store basic mouth-shape models, so that they can be called when reconstruction is performed with the logic correction method to manually construct a mouth-shape state template. Traditional mouth-shape conversion techniques rely on a huge mouth-shape database containing a voice library and corresponding mouth-shape images to be called during conversion; on the one hand this occupies a large amount of space, and on the other hand the database itself cannot independently construct new mouth shapes, so in practice it cannot handle conversion problems involving mouth shapes not contained in the database. The system of the present invention differs from traditional mouth-shape conversion systems in that no such mouth-shape database is needed, and the video reconstruction of the human mouth shape can be completed quickly and accurately.
Preferably, the mouth-shape video reconstruction system of the present invention may be a desktop computer, notebook computer or mobile intelligent terminal with a camera function. A mobile intelligent terminal is a portable intelligent terminal, including but not limited to smartphones, tablet computers (such as the iPad), palmtop computers and intelligent handheld game consoles. Specifically, the mouth-shape video reconstruction system of the present invention may be a single desktop computer with a camera function, a single notebook computer with a camera function, or a single mobile intelligent terminal with a camera function. In that case, the device's communication and data-transmission module serves as the input and output ports of the system, its processing core serves as the processing module, its camera serves as the real-time acquisition module, its display screen serves as the display module, and its storage unit serves as the mouth-shape primitive module. The mouth-shape video reconstruction system of the present invention may also be a combination of a desktop computer, notebook computer or mobile intelligent terminal with a camera function; for example, the camera and display screen of a mobile intelligent terminal with a camera function may serve as the real-time acquisition module and display module respectively, while the communication module, processing core and storage unit of a notebook computer serve as the input and output ports, the processing module and the mouth-shape primitive module of the system, and so on.
Preferably, step (3) uses the logic correction method based on the time evolution of annular-elastic-space dynamics: no live on-site performance is relied on; instead, according to the required mouth-shape information, mouth-shape primitives are called from the primitive module to manually construct a mouth-shape state template, and the missing transition states are then generated by the time evolution of annular-elastic-space dynamics to complete the video reconstruction; the flow diagram is shown in Figure 7. A mouth-shape primitive is a model of one of the most basic configurations of the human mouth, such as the "a" mouth shape in pinyin (mouth open), the "o" mouth shape (lips pursed) or the "i" mouth shape (grinning); all transition mouth-shape states can be generated from the primitives by the time evolution of annular-elastic-space dynamics. A transition mouth-shape state is a mouth-shape state produced in the course of transforming from one mouth-shape primitive to another; for example, from the closed-mouth primitive to the primitive for pronouncing the pinyin "a", the transition states are the mouth shapes as the mouth slowly opens. Specifically, when the display module demonstrates the mouth-shape information to be reconstructed, n basic mouth shapes meeting the requirement can be manually selected from the mouth-shape library and used, through association, to correct the mouth shape in specific frames, simulating and constructing a time-series-based mouth-shape state template. When the human body information read in from the input port is a single-frame image, the information outside the mouth shape in the mouth-shape state template is all extended from the single-frame image; when the human body information read in from the input port is a video composed of multiple frames, the information outside the mouth shape in the mouth-shape state template stays consistent with that video. The information outside the mouth shape is all the information in the image or video other than the mouth, including the other parts of the body besides the mouth (such as the nose, eyes, cheeks, trunk and limbs) and the environment the person is in; for example, blinking of the eyes, swaying of the body, or someone passing by behind are all regarded as changes in the information outside the mouth. After the mouth-shape state template has been constructed, the positions around the mouth in the human body information are associated on the basis of the annular elastic space, so that changes of the mouth cause corresponding effects on the region around the mouth, constituting the corresponding annular elastic space model. Then, by analyzing the change from the i-th mouth-shape primitive to the (i+1)-th mouth-shape primitive, the forces acting on each point in the annular elastic space model corresponding to the i-th stage are obtained; extending the action of these forces over a longer time series yields all the mouth-shape transition states in that stage. When all (n−1) sets of transition states have been reconstructed, the reconstruction of the human mouth-shape video is realized.
The beneficial effects of the invention are as follows:
(1) The present invention can invert the mouth-shape information read in onto a single-frame image to generate a reconstructed human mouth-shape video, and can also correct the mouth-shape information read in on a video composed of multiple frames to generate a reconstructed human mouth-shape video, and therefore has very strong applicability.
(2) The present invention has two specific embodiments, association inversion and logic correction; the former completes the reconstruction of the human mouth-shape video quickly and efficiently through the synchronized performance of a live person on site, while the latter requires mouth-shape primitives to be called manually but does not depend on an on-site performance, enabling offline correction. Together the two methods meet the demand for mouth-shape video reconstruction in different situations.
(3) The present invention has a simple hardware configuration and low cost; on the software side it requires only common video and image processing software and a small set of mouth-shape primitives, with no additional software deployment. Compared with traditional mouth-shape reconstruction systems, the system of the present invention needs no database, which saves space while increasing the flexibility of mouth-shape conversion.
(4) More preferably, all units of the system of the present invention can be integrated into a single intelligent terminal, which may be a smartphone, tablet computer, palmtop computer or intelligent handheld device, so the system is highly portable.
Brief description of the drawings
Figure 1 is a flow chart of the method of the present invention.
Figure 2 is a schematic diagram of the annular elastic space.
Figure 3 is a schematic diagram of the contour lines and feature points used for mouth-shape position correspondence in the method of the present invention; in the figure, L1 to L4 and L1′ to L4′ are the contour lines of the two mouth shapes, and P1 to P6 and P1′ to P6′ are key points on the two mouth-shape contour lines; at least three corresponding points per contour line are needed to guarantee conversion accuracy.
Figure 4 is an information-conversion sketch of the association inversion method of the present invention.
Figure 5 is an information-conversion sketch of the logic correction method of the present invention.
Figure 6 is a flow diagram of the association inversion method of the present invention.
Figure 7 is a flow diagram of the logic correction method of the present invention.
Figure 8 is a structural diagram of the system corresponding to the association inversion method of the present invention.
Figure 9 is a structural diagram of the system corresponding to the logic correction method of the present invention.
Detailed description of the invention
In order to explain the human mouth-shape video reconstruction method of the present invention in more detail, the present invention is described below with reference to the accompanying drawings.
Embodiment 1
As shown in Figure 6, the mouth-shape reconstruction method of the present invention is illustrated by taking B as the synchronization object and using the association inversion method to reconstruct, from a photograph of target object A, a video of A reading a lecture script aloud. Here a desktop computer equipped with a camera serves as the reconstruction system, wherein: the USB interface serves as the input and output ports of the system, the processor serves as the processing module, the camera serves as the real-time acquisition module, and the monitor serves as the display module.
(1) Information reading: the system reads in the photograph of A from the USB interface as the human body information to be processed, and reads in the lecture script document as the mouth-shape information to be processed.
(2) Preprocessing: the processor recognizes that the mouth-shape information is in text format; for convenient use of the association inversion method, the text-format mouth-shape information is passed directly to the monitor for display. Meanwhile, the processor performs image analysis on the photograph of A, identifies and locks the position of A's mouth in the photograph, and selects the feature points of the mouth shape, for example the two corners of the mouth and the centers of the four lip lines.
(3) Mouth-shape reconstruction: synchronization object B simulates the mouth shape according to the text information demonstrated on the monitor, i.e. reads the content of the lecture script aloud. Meanwhile, the camera captures the video of B reading this lecture script (duration 1000 seconds), i.e. the simulated video, which serves as the basis for mouth-shape reconstruction. After capture is complete, the processor segments the captured simulated video of B into 30000 frames at a frame rate of 30 frames per second, corresponding to times t1, t2, …, t30000, locates the mouth shape in each frame, and selects the same mouth-shape feature points, i.e. the two corners of the mouth and the centers of the four lip lines. Because the human body information, the photograph of A, is a single-frame image, each of the 30000 frames segmented from the simulated video of B is put into correspondence with the corresponding feature points in the photograph of A, and the surrounding positions are linked, establishing a time-series-based annular elastic space model. Thereafter, the change of the mouth shape from the 1st frame to the 2nd frame of the simulated video of B is analyzed to obtain the forces acting on each feature point in the annular elastic space model at the corresponding time t = 1/30 seconds; applying the obtained forces to the annular elastic space model corresponding to A's photograph completes the reconstruction of A's mouth-shape information at time t = 1/30 seconds (a sketch of this per-frame loop is given after step (4) below). When the reconstruction of all 30000 frames is complete, the reconstructed video of A reading the lecture script aloud is obtained.
(4) Video output: the reconstructed video of A reading the lecture script aloud is output from the USB interface.
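A compact sketch of the per-frame reconstruction loop described in this embodiment might look like the following. The helper names `extract_forces` and `apply_forces_to_model`, and the `render()` step, are hypothetical placeholders for the force extraction, force application and image generation described above, not functions defined by the patent.

```python
FPS = 30                    # frame rate N used in this embodiment
DURATION = 1000             # simulated-video length T in seconds
N_FRAMES = FPS * DURATION   # n = T * N = 30000 frames

def reconstruct_video(sim_frames, target_model, extract_forces, apply_forces_to_model):
    """sim_frames: the 30000 segmented frames of B's simulated video.
    target_model: annular elastic space model built from A's photograph.
    Returns the list of reconstructed frames of A's mouth shape."""
    output_frames = []
    for i in range(len(sim_frames) - 1):
        # forces taking the mouth shape of frame i to frame i+1 (time t = i/FPS s)
        forces = extract_forces(sim_frames[i], sim_frames[i + 1])
        target_model = apply_forces_to_model(target_model, forces)
        output_frames.append(target_model.render())   # assumed rendering step
    return output_frames
```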
In this embodiment, a smartphone may also be used as the reconstruction system, wherein: the WIFI interface serves as the input and output ports of the system, the phone's processor serves as the processing module, the phone's camera serves as the real-time acquisition module, and the phone's display screen serves as the display module.
(1) Information reading: the system reads in the photograph of A from the WIFI interface as the human body information to be processed, and reads in the lecture script document as the mouth-shape information to be processed.
(2) Preprocessing: the phone's processor recognizes that the mouth-shape information is in text format; for convenient use of the association inversion method, the text-format mouth-shape information is passed directly to the display screen for display. Meanwhile, the processor performs image analysis on the photograph of A, identifies and locks the position of A's mouth in the photograph, and selects the feature points of the mouth shape, for example the two corners of the mouth and the centers of the four lip lines.
(3) Mouth-shape reconstruction: synchronization object B simulates the mouth shape according to the text information demonstrated on the display screen, i.e. reads the content of the lecture script aloud. Meanwhile, the simulated video of B captured by the phone's camera is segmented into 30000 frames at a frame rate of 30 frames per second, corresponding to times t1, t2, …, t30000, the mouth shape is located in each frame, and the same mouth-shape feature points are selected, i.e. the two corners of the mouth and the centers of the four lip lines. Because the human body information, the photograph of A, is a single-frame image, each of the 30000 frames segmented from the simulated video of B is put into correspondence with the corresponding feature points in the photograph of A, and the surrounding positions are linked, establishing a time-series-based annular elastic space model. Thereafter, the change of the mouth shape from the 1st frame to the 2nd frame of the simulated video of B is analyzed to obtain the forces acting on each feature point in the annular elastic space model at the corresponding time t = 1/30 seconds; applying the obtained forces to the annular elastic space model corresponding to A's photograph completes the reconstruction of A's mouth-shape information at time t = 1/30 seconds. When the reconstruction of all 30000 frames is complete, the reconstructed video of A reading the lecture script aloud is obtained.
(4) Video output: the reconstructed video of A reading the lecture script aloud is output from the WIFI interface.
Embodiment 2
As shown in Figure 7, the mouth-shape reconstruction method of the present invention is illustrated below by taking as an example the use of the logic correction method to correct the mouth shape of a certain segment in a video of announcer C; in this embodiment, C is the target object. Here a smartphone serves as the reconstruction system, wherein: the WIFI interface serves as the input and output ports of the system, the phone's processor serves as the processing module, the phone's display screen serves as the display module, and the phone's storage unit serves as the mouth-shape primitive module.
(1) Information reading: the system reads in the video of announcer C from the WIFI interface and clips out the part to be corrected as the human body information to be processed; at the same time, it reads in the voice correction content as the mouth-shape information to be processed.
(2) Preprocessing: the processor recognizes that the mouth-shape information is in voice format; for convenient use of the logic correction method, the voice-format mouth-shape information is converted into the corresponding mouth-shape video and passed to the display screen for display.
(3) Mouth-shape reconstruction: when the display screen demonstrates the mouth-shape information to be reconstructed, the basic mouth shapes meeting the requirement can be manually called from the mouth-shape primitive module and used, through association, to correct the mouth shape in specific frames, simulating and constructing a time-series-based mouth-shape state template; the information outside the mouth shape in the template, here for example the swaying of the person's limbs and changes in the surrounding environment, must stay consistent with the video. For example, if the segment to be reconstructed is a passage in which the sound "a" is produced from a closed mouth and the mouth then returns to a closed state, it suffices to write the initial closed mouth shape, the fully open mouth shape while pronouncing "a", and the closed mouth shape after the pronunciation ends into the frames at the corresponding times; these three mouth shapes serve as the state template for this mouth-shape segment, and the corresponding annular elastic space model is established. Analyzing the two changes between the three stages of this model yields the forces acting on each feature point in the annular elastic space model for the two stages; extending the action of these forces over a longer time series yields all the transition mouth-shape states in the two stages, i.e. the frames in which the mouth slowly opens and the frames in which it slowly closes. For example, if 30 frames need to be constructed between two mouth-shape primitives to complete the video reconstruction, the extracted force is divided into 30 parts and applied to the annular elastic space model in succession, producing 30 transition mouth-shape states (a sketch of this step follows the embodiment's steps below).
(4) Video output: the part clipped out of the original video is covered with the generated video, and the reconstructed video of announcer C is output from the WIFI interface.
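A sketch specialized to this embodiment — three primitives (closed, open "a", closed) with 30 transition frames per stage — might reuse the `generate_transitions` and `apply_force` helpers sketched earlier, passed in as callables. The representation of primitives as ordered feature-point lists and the `extract_forces_between` helper are assumptions for illustration.

```python
def correct_segment(primitives, model_points, extract_forces_between,
                    generate_transitions, apply_force, R, k=30):
    """Rebuild a clipped segment from a short list of mouth-shape primitives.
    In this embodiment `primitives` holds three entries (closed, open "a",
    closed) and k = 30 transition frames are generated per stage."""
    frames = []
    for i in range(len(primitives) - 1):
        # total force taking primitive i to primitive i+1 (hypothetical helper)
        forces = extract_forces_between(primitives[i], primitives[i + 1])
        # divide the force into k parts and apply successively
        frames.extend(generate_transitions(model_points, forces, k, R, apply_force))
    return frames
```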
It should be understood by those skilled in the art that various modifications, combinations, sub-combinations and changes may be made according to design requirements and other factors, as long as they fall within the scope defined by the appended claims and their equivalents.

Claims (14)

1. A human mouth-shape video reconstruction method, characterized by comprising the following four steps:
(1) information reading: human body information and mouth-shape information are read in from an input port, the human body information being a single-frame image of a target object or a video composed of multiple frames, and the mouth-shape information being any one or more of text, sound, image and video;
(2) preprocessing: the mouth-shape information read in from the input port is recognized and converted, the converted mouth-shape information is displayed in real time on a display module, and the human body information read in from the input port is analyzed and the position of the mouth is locked;
(3) mouth-shape reconstruction: based on the time-evolution method of annular-elastic-space dynamics, human mouth-shape video reconstruction is carried out according to the preprocessed mouth-shape information and human body information;
(4) video output: the reconstructed human mouth-shape video is output from an output port;
wherein the annular elastic space is a planar space in which the order of points and the distances between points are defined, and it has the following four properties:
1) for any two points P1 and P2 in the annular elastic space, the distance between them is variable;
2) for any two points P1 and P2 in the annular elastic space, their order is always invariant, that is, for any point P3 in the annular elastic space distinct from P1 and P2, the clockwise or counterclockwise order of these three points never changes under any transformation;
3) any point P in the annular elastic space can be acted on by a force F of magnitude f at an angle α to the horizontal axis, and as a result its position changes, namely it is displaced from its original position along the direction at angle α to the horizontal axis by a certain amount;
4) when a point P in the annular elastic space is acted on by a force F, the force F also affects other points in the annular elastic space while affecting P, making each such point equivalent to being acted on by a force of magnitude f′ at an angle α′ to the horizontal axis, referred to as an associated force; the spatial position of the other point relative to P determines α′, the distance of the other point from P determines f′, and when the distance of the other point from P exceeds the radius of influence R, the point is considered not to be affected by the associated force of F.
2. The human mouth-shape video reconstruction method according to claim 1, characterized in that: step (3) uses the association inversion method, a time-evolution method based on annular-elastic-space dynamics, namely a live person acts as the synchronization object and simulates the mouth-shape information demonstrated by the display module, a real-time acquisition module then captures the simulated video, and matching based on the annular elastic space is carried out against the human body information read in, thereby completing the reconstruction of the human mouth-shape video.
3. The human mouth-shape video reconstruction method according to claim 1, characterized in that: step (3) uses the logic correction method, a time-evolution method based on annular-elastic-space dynamics, namely no live on-site performance is relied on; instead, according to the required mouth-shape information, a mouth-shape primitive module is called to manually construct a mouth-shape state template, and the missing transition states are generated to complete the video reconstruction.
4. A human mouth-shape video reconstruction system for the reconstruction method according to claim 2, characterized in that the video reconstruction system comprises an input port, an output port, a processing module, a display module and a real-time acquisition module, wherein:
the input port is used to read in human body information and mouth-shape information, the human body information being a single-frame image of a target object or a video composed of multiple frames, and the mouth-shape information being any one or more of text, sound, image and video;
the output port is used to output the reconstructed human mouth-shape video;
the display module is used to display in real time the mouth-shape information read in from the input port;
the processing module is used to convert and process the mouth-shape information read in from the input port, and then to realize the reconstruction of the human mouth-shape video on the basis of the human body information;
the real-time acquisition module is used to capture in real time the video of the synchronization object during reconstruction with the association inversion method.
5. The human mouth-shape video reconstruction system according to claim 4, characterized in that: the real-time acquisition module is any one or more of digital imaging equipment and an intelligent terminal with a camera function.
6. The human mouth-shape video reconstruction system according to claim 4, characterized in that: the real-time acquisition module is any one or more of a video camera and a still camera.
7. The human mouth-shape video reconstruction system according to claim 4, characterized in that: the real-time acquisition module is a webcam.
8. A human body mouth shape video reconstruction system for the reconstruction method according to claim 3, characterized in that: said video reconstruction system comprises an input port, an output port, a processing module, a display module and a mouth shape primitive model, wherein:
said input port is used for reading in human body information and mouth shape information, said human body information being selected from a single-frame image of the target object or a video composed of multiple frames, and said mouth shape information being selected from any one or any combination of text, sound, image and video;
said output port is used for outputting the reconstructed human mouth shape video;
said display module is used for displaying in real time the mouth shape information read in through the input port;
said processing module is used for converting the mouth shape information read in through the input port and then realizing the reconstruction of the human mouth shape video on the basis of the human body information;
said mouth shape primitive model is used for storing basic mouth shape primitives so that they can be called during reconstruction with the logic correction method to build mouth shape state templates manually.
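Claim 8 differs from claim 4 only in replacing the real-time acquisition module with a mouth shape primitive model. A minimal sketch of such a store is shown below, assuming (hypothetically) that each primitive is a set of mouth landmarks and that a state template is simply an ordered sequence of stored primitives.

```python
from typing import Any, Dict, List, Sequence


class MouthShapePrimitiveModel:
    """Minimal sketch of the mouth shape primitive model of claim 8: it stores
    basic mouth shape primitives and returns them on demand so that mouth shape
    state templates can be assembled during logic-correction reconstruction.
    The primitive names and landmark coordinates below are invented purely for
    illustration."""

    def __init__(self) -> None:
        self._primitives: Dict[str, Any] = {}

    def store(self, name: str, primitive: Any) -> None:
        self._primitives[name] = primitive          # keep a basic mouth shape primitive

    def call(self, name: str) -> Any:
        return self._primitives[name]               # called during logic-correction reconstruction

    def build_state_template(self, names: Sequence[str]) -> List[Any]:
        """Assemble a mouth shape state template as an ordered sequence of primitives."""
        return [self.call(n) for n in names]


# Example: a template for a hypothetical closed-open-closed mouth movement
model = MouthShapePrimitiveModel()
model.store("closed", [[0, 0], [10, 0], [5, 1]])
model.store("open", [[0, 0], [10, 0], [5, 6]])
template = model.build_state_template(["closed", "open", "closed"])
print(len(template))   # -> 3
```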
9. The human body mouth shape video reconstruction system according to any one of claims 4-8, characterized in that: said processing module is a terminal with computer vision and information analysis capabilities.
10. The human body mouth shape video reconstruction system according to any one of claims 4-8, characterized in that: said display module is selected from any one or any combination of monitors, display screens and projectors.
11. The human body mouth shape video reconstruction system according to any one of claims 4-8, characterized in that: said display module is an intelligent terminal.
12. The human body mouth shape video reconstruction system according to any one of claims 4-8, characterized in that: said mouth shape video reconstruction system is a desktop computer or a notebook computer with a camera function.
13. The human body mouth shape video reconstruction system according to any one of claims 4-8, characterized in that: said mouth shape video reconstruction system is a mobile intelligent terminal.
14. The human body mouth shape video reconstruction system according to any one of claims 4-8, characterized in that: said mouth shape video reconstruction system is a smartphone, a tablet computer, a palmtop computer or an intelligent handheld device.
CN201310745441.XA 2013-12-27 2013-12-27 A kind of human body mouth shape video reconfiguration system and reconstructing method Active CN103745462B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201310745441.XA CN103745462B (en) 2013-12-27 2013-12-27 A kind of human body mouth shape video reconfiguration system and reconstructing method

Publications (2)

Publication Number Publication Date
CN103745462A CN103745462A (en) 2014-04-23
CN103745462B true CN103745462B (en) 2016-11-02

Family

ID=50502477

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201310745441.XA Active CN103745462B (en) 2013-12-27 2013-12-27 A kind of human body mouth shape video reconfiguration system and reconstructing method

Country Status (1)

Country Link
CN (1) CN103745462B (en)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104298961B (en) * 2014-06-30 2018-02-16 中国传媒大学 Video method of combination based on Mouth-Shape Recognition
CN108831463B (en) * 2018-06-28 2021-11-12 广州方硅信息技术有限公司 Lip language synthesis method and device, electronic equipment and storage medium
CN109168067B (en) * 2018-11-02 2022-04-22 深圳Tcl新技术有限公司 Video time sequence correction method, correction terminal and computer readable storage medium
CN114554267B (en) * 2022-02-22 2024-04-02 上海艾融软件股份有限公司 Audio and video synchronization method and device based on digital twin technology

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101101752A (en) * 2007-07-19 2008-01-09 华中科技大学 Monosyllabic language lip-reading recognition system based on vision character

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20100332229A1 (en) * 2009-06-30 2010-12-30 Sony Corporation Apparatus control based on visual lip share recognition

Patent Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101101752A (en) * 2007-07-19 2008-01-09 华中科技大学 Monosyllabic language lip-reading recognition system based on vision character

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
"Extraction of Visual Features for Lipreading"; Iain Matthews et al.; IEEE Transactions on Pattern Analysis and Machine Intelligence; 2002-02-28; Vol. 24, No. 2; pp. 198-213 *
"Orthogonal Transform Description of Lip Contours in a Vision-Driven Speech Synthesis System"; Li Gang et al.; Optics and Precision Engineering; 2007-07-31; Vol. 15, No. 7; pp. 1117-1123 *

Also Published As

Publication number Publication date
CN103745462A (en) 2014-04-23

Similar Documents

Publication Publication Date Title
CN103745423B (en) A kind of shape of the mouth as one speaks teaching system and teaching method
CN103745462B (en) A kind of human body mouth shape video reconfiguration system and reconstructing method
CN108615009A (en) A kind of sign language interpreter AC system based on dynamic hand gesture recognition
CN113901894A (en) Video generation method, device, server and storage medium
CN103544724A (en) System and method for realizing fictional cartoon character on mobile intelligent terminal by augmented reality and card recognition technology
WO2020134436A1 (en) Method for generating animated expression and electronic device
CN101271591A (en) Interactive multi-vision point three-dimensional model reconstruction method
CN109801349A (en) A kind of real-time expression generation method of the three-dimensional animation role of sound driver and system
CN110751708A (en) Method and system for driving face animation in real time through voice
CN111724458B (en) Voice-driven three-dimensional face animation generation method and network structure
CN110415701A (en) The recognition methods of lip reading and its device
CN103778661B (en) A kind of method, system and computer for generating speaker's three-dimensional motion model
CN104376309A (en) Method for structuring gesture movement basic element models on basis of gesture recognition
CN110008961A (en) Text real-time identification method, device, computer equipment and storage medium
CN107704817A (en) A kind of detection algorithm of animal face key point
CN100487732C (en) Method for generating cartoon portrait based on photo of human face
CN108810561A (en) A kind of three-dimensional idol live broadcasting method and device based on artificial intelligence
CN112419334A (en) Micro surface material reconstruction method and system based on deep learning
CN114697759B (en) Virtual image video generation method and system, electronic device and storage medium
CN111105487B (en) Face synthesis method and device in virtual teacher system
CN116416386A (en) Digital twin L5-level simulation-based high-definition rendering and restoring system
CN116563923A (en) RGBD-based facial acupoint positioning method, digital twin system and device
CN110322545A (en) Campus three-dimensional digital modeling method, system, device and storage medium
CN114494542A (en) Character driving animation method and system based on convolutional neural network
CN111582067B (en) Facial expression recognition method, system, storage medium, computer program and terminal

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant