CN107623622A - Method and electronic device for sending a speech animation - Google Patents
Method and electronic device for sending a speech animation

- Publication number: CN107623622A (application CN201610560255.2A)
- Authority: CN (China)
- Prior art keywords: animation, expression, voice information, image, voice fragment
- Legal status: Pending (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
- Landscapes: Processing Or Creating Images (AREA)
Abstract
Embodiments of the invention provide a method and an electronic device for sending a speech animation. The method includes: obtaining an image; obtaining voice information; animating the image according to the voice information to generate a speech animation; and sending the speech animation to a recipient. Because the image is animated according to the voice information to generate the speech animation, and the speech animation is sent to the recipient, the recipient can watch the speech animation. Compared with a plain voice message, a speech animation is more entertaining and adds a visual effect, which improves the user experience.
Description
Technical field
The present invention relates to the field of communications, and in particular to a method and an electronic device for sending a speech animation.
Background art
With the development of the Internet, people communicate by voice message more and more. However, current voice messages are rather dull and lack entertainment value.
Summary of the invention
Embodiments of the invention provide a method and an electronic device for sending a speech animation, so as to make voice messages more entertaining and to improve the user experience.
According to a first aspect of the invention, a method for sending a speech animation is provided. The method includes:
obtaining an image;
obtaining voice information;
animating the image according to the voice information to generate a speech animation; and
sending the speech animation to a recipient.
In a first possible implementation of the first aspect, obtaining the image includes:
obtaining a preset image or an image input by a user.
In a second possible implementation of the first aspect, obtaining the voice information includes:
obtaining voice information input by the user.
In a third possible implementation of the first aspect, animating the image according to the voice information to generate the speech animation includes:
dividing the voice information into multiple voice fragments;
obtaining a feature of each of the multiple voice fragments;
selecting, according to the feature of each voice fragment, a corresponding mouth expression;
generating, for each voice fragment, a corresponding expression frame from the corresponding mouth expression and the image; and
generating the speech animation from the expression frames corresponding to the multiple voice fragments and the voice information.
In a fourth possible implementation, based on the third possible implementation of the first aspect, dividing the voice information into multiple voice fragments includes:
dividing the voice information into multiple voice fragments according to a preset animation frame rate, where each voice fragment corresponds to one frame of the animation.
In a fifth possible implementation, based on the third possible implementation of the first aspect, selecting the mouth expression corresponding to the feature of each voice fragment includes:
selecting, from a preset expression library, the mouth expression corresponding to the feature according to the feature of the voice fragment and a preset model.
In a sixth possible implementation, based on the third possible implementation of the first aspect, selecting the mouth expression corresponding to the feature of each voice fragment includes:
selecting, from a preset expression library, the mouth expression corresponding to the feature according to the feature of the voice fragment, a preset model, and the mouth expression corresponding to the previous voice fragment.
In a seventh possible implementation, based on the third possible implementation of the first aspect, generating the expression frame corresponding to each voice fragment from the corresponding mouth expression and the image includes:
combining the mouth expression corresponding to the voice fragment with the image to generate the corresponding expression frame.
In an eighth possible implementation, based on the third possible implementation of the first aspect, generating the speech animation from the expression frames corresponding to the multiple voice fragments and the voice information includes:
arranging the expression frames corresponding to the multiple voice fragments in order to generate an expression animation; and
packaging the expression animation and the voice information into the speech animation.
According to a second aspect of the invention, an electronic device is provided. The electronic device includes:
an image obtaining module, configured to obtain an image;
a voice obtaining module, configured to obtain voice information;
a speech animation generation module, configured to animate the image according to the voice information to generate a speech animation; and
a sending module, configured to send the speech animation to another electronic device.
In a first possible implementation of the second aspect, the image obtaining module is specifically configured to:
obtain a preset image or an image input by a user.
In a second possible implementation of the second aspect, the voice obtaining module is specifically configured to:
obtain voice information input by the user.
In a third possible implementation of the second aspect, the speech animation generation module includes:
a voice splitting submodule, configured to divide the voice information into multiple voice fragments;
a feature obtaining submodule, configured to obtain a feature of each of the multiple voice fragments;
a mouth expression selection submodule, configured to select, according to the feature of each voice fragment, a corresponding mouth expression;
an expression frame generation submodule, configured to generate, for each voice fragment, a corresponding expression frame from the corresponding mouth expression and the image; and
a speech animation generation submodule, configured to generate the speech animation from the expression frames corresponding to the multiple voice fragments and the voice information.
In a fourth possible implementation, based on the third possible implementation of the second aspect, the voice splitting submodule is specifically configured to:
divide the voice information into multiple voice fragments according to a preset animation frame rate, where each voice fragment corresponds to one frame of the animation.
In a fifth possible implementation, based on the third possible implementation of the second aspect, the mouth expression selection submodule is specifically configured to:
select, from a preset expression library, the mouth expression corresponding to the feature according to the feature of the voice fragment and a preset model.
In a sixth possible implementation, based on the third possible implementation of the second aspect, the mouth expression selection submodule is specifically configured to:
select, from a preset expression library, the mouth expression corresponding to the feature according to the feature of the voice fragment, a preset model, and the mouth expression corresponding to the previous voice fragment.
In a seventh possible implementation, based on the third possible implementation of the second aspect, the expression frame generation submodule is specifically configured to:
combine the mouth expression corresponding to each voice fragment with the image to generate the corresponding expression frame.
In an eighth possible implementation, based on the third possible implementation of the second aspect, the speech animation generation submodule is specifically configured to:
arrange the expression frames corresponding to the multiple voice fragments in order to generate an expression animation; and
package the expression animation and the voice information into the speech animation.
According to a third aspect of the invention, an electronic device is provided. The electronic device includes:
a memory, an audio capture module, a network interface module, and a processor connected to the memory, the audio capture module, and the network interface module, where the memory is configured to store a set of program code, and the processor calls the program code stored in the memory to perform the following operations:
obtaining an image;
obtaining voice information;
animating the image according to the voice information to generate a speech animation; and
sending the speech animation to another electronic device.
In a first possible implementation of the third aspect, the processor calls the program code stored in the memory to perform the following operation:
obtaining a preset image or an image input by a user.
In a second possible implementation of the third aspect, the processor calls the program code stored in the memory to perform the following operation:
obtaining voice information input by the user.
In a third possible implementation of the third aspect, the processor calls the program code stored in the memory to perform the following operations:
dividing the voice information into multiple voice fragments;
obtaining a feature of each of the multiple voice fragments;
selecting, according to the feature of each voice fragment, a corresponding mouth expression;
generating, for each voice fragment, a corresponding expression frame from the corresponding mouth expression and the image; and
generating the speech animation from the expression frames corresponding to the multiple voice fragments and the voice information.
In a fourth possible implementation, based on the third possible implementation of the third aspect, the processor calls the program code stored in the memory to perform the following operation:
dividing the voice information into multiple voice fragments according to a preset animation frame rate, where each voice fragment corresponds to one frame of the animation.
In a fifth possible implementation, based on the third possible implementation of the third aspect, the processor calls the program code stored in the memory to perform the following operation:
selecting, from a preset expression library, the mouth expression corresponding to the feature according to the feature of the voice fragment and a preset model.
In a sixth possible implementation, based on the third possible implementation of the third aspect, the processor calls the program code stored in the memory to perform the following operation:
selecting, from a preset expression library, the mouth expression corresponding to the feature according to the feature of the voice fragment, a preset model, and the mouth expression corresponding to the previous voice fragment.
In a seventh possible implementation, based on the third possible implementation of the third aspect, the processor calls the program code stored in the memory to perform the following operation:
combining the mouth expression corresponding to each voice fragment with the image to generate the corresponding expression frame.
In an eighth possible implementation, based on the third possible implementation of the third aspect, the processor calls the program code stored in the memory to perform the following operations:
arranging the expression frames corresponding to the multiple voice fragments in order to generate an expression animation; and
packaging the expression animation and the voice information into the speech animation.
Embodiments of the invention provide a method and an electronic device for sending a speech animation. By animating an image according to voice information to generate a speech animation and sending the speech animation to a recipient, the recipient is enabled to watch the speech animation. Compared with a plain voice message, a speech animation is more entertaining and adds a visual effect, which improves the user experience. Furthermore, the method generates the corresponding speech animation from the voice in real time, without obtaining any facial information, and therefore has the advantages of high efficiency, high speed, few restrictions, and low resource consumption.
Brief description of the drawings
To describe the technical solutions in the embodiments of the present invention more clearly, the following briefly introduces the accompanying drawings required for describing the embodiments. Apparently, the accompanying drawings in the following description show merely some embodiments of the present invention, and a person of ordinary skill in the art may still derive other drawings from these accompanying drawings without creative effort.
Fig. 1 shows a flowchart of a method for sending a speech animation according to an embodiment of the present invention;
Fig. 2 shows a flowchart of the step of animating the image according to the voice information to generate the speech animation, according to an embodiment of the present invention;
Fig. 3 shows a block diagram of an electronic device according to an embodiment of the present invention;
Fig. 4 shows a block diagram of an electronic device according to an embodiment of the present invention;
Fig. 5 is a schematic diagram of switching a preset image according to an embodiment of the present invention;
Fig. 6 is a schematic diagram of switching a preset image according to an embodiment of the present invention;
Fig. 7 is a schematic diagram of switching a preset image according to an embodiment of the present invention.
Detailed description of embodiments
To make the objectives, technical solutions, and advantages of the present invention clearer, the following clearly and completely describes the technical solutions in the embodiments of the present invention with reference to the accompanying drawings in the embodiments. Apparently, the described embodiments are merely some rather than all of the embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments of the present invention without creative effort shall fall within the protection scope of the present invention.
Embodiments of the invention provide a method and an electronic device for sending a speech animation. By animating an image according to voice information to generate a speech animation and sending the speech animation to a recipient, the recipient is enabled to watch the speech animation. Compared with a plain voice message, a speech animation is more entertaining and adds a visual effect, which improves the user experience. Furthermore, the method generates the corresponding speech animation from the voice in real time, without obtaining any facial information, and therefore has the advantages of high efficiency, high speed, few restrictions, and low resource consumption.
Fig. 1 shows a flowchart of a method for sending a speech animation according to an embodiment of the present invention. The method may be performed in a terminal. The terminal may include, but is not limited to, a mobile phone, a laptop or notebook computer, a desktop computer, a personal digital assistant (PDA), or a game console. As shown in Fig. 1, the method may include the following steps:
Step 101: Obtain an image.
In one embodiment, obtaining the image includes obtaining a preset image. In one example, the preset image is a default image, which is obtained in every operation. In another example, the preset image is the image last input by the user, so that the preset image is updated each time the user inputs a new image.

In another embodiment, obtaining the image includes obtaining an image input by the user. In one example, the image input by the user is an image the user selects from a local image library, so that the user can generate an animation from any local image. In another example, the image input by the user is a photo the user takes with a camera, so that the user can generate an animation from a freshly taken photo; the photo may or may not be a selfie. In yet another example, the image input by the user is an image the user selects from a displayed group of preset images. In a further example, the user may switch between several preset images while the speech animation is being generated. As shown in Figs. 5 to 7, there are several preset images, and switching between them is possible while the expression frames are being generated: while a certain preset image is selected, the expression frames are generated from that preset image; after the user switches to another preset image, subsequent expression frames are generated from the preset image after the switch. For instance, the preset image shown in Fig. 5 is selected initially; after a while the user switches in real time to the preset image shown in Fig. 6, and after another while to the preset image shown in Fig. 7.
Step 102: Obtain voice information.
In one embodiment, obtaining the voice information includes obtaining voice information input by the user. In one example, the voice information input by the user is captured live through an audio capture module (such as a microphone) included in or coupled to the terminal. Specifically, this example may include capturing audio information through the microphone in real time and separating the voice information from the audio information. Generally, the frequency of the human voice ranges from about 300 Hz to 4000 Hz, so the audio information can be filtered to separate out the 300 Hz to 4000 Hz band as the human voice information. Optionally, the voice information can be further separated by sound intensity: because human speech typically lies between 40 dB and 60 dB, the audio information can be filtered by level to keep only the portion whose intensity is between 40 dB and 60 dB. Optionally, the separated voice information can further be denoised to obtain more accurate voice information. In another example, the voice information input by the user is selected by the user from a local voice library.
Step 103: Animate the image according to the voice information to generate the speech animation.

Specifically, step 103 may include the following steps:
dividing the voice information into multiple voice fragments;
obtaining a feature of each of the multiple voice fragments;
selecting, according to the feature of each voice fragment, a corresponding mouth expression;
generating, for each voice fragment, a corresponding expression frame from the corresponding mouth expression and the image; and
generating the speech animation from the expression frames corresponding to the multiple voice fragments and the voice information.

These steps are described in detail below with reference to Fig. 2.
Step 104: Send the speech animation to a recipient.

There may be one or more recipients. For example, the speech animation may be sent to one or more terminals associated with one or more users of a social application. As another example, the speech animation may be sent to multiple terminals associated with one or more groups of a social application. The sending may go through a relay server or be direct, without a relay server.
In one embodiment, before step 101, the method further includes: obtaining a start-speech-animation instruction. For example, the user may trigger the start-speech-animation instruction by tapping a preset button in the interface of an instant messaging application. The embodiments of the present invention do not limit the way the start-speech-animation instruction is issued.

In one embodiment, the method further includes a step of displaying the speech animation, so that the user can directly experience the speech animation being recorded.
The embodiments of the invention provide a method for sending a speech animation. By animating an image according to voice information to generate a speech animation and sending the speech animation to a recipient, the recipient is enabled to watch the speech animation. Compared with a plain voice message, a speech animation is more entertaining and adds a visual effect, which improves the user experience.
Fig. 2 shows a flowchart of the step of animating the image according to the voice information to generate the speech animation, according to an embodiment of the present invention. As shown in Fig. 2, the step includes the following steps:

Step 201: Divide the voice information into multiple voice fragments.

Specifically, this step includes: dividing the voice information into multiple voice fragments according to a preset animation frame rate, where each voice fragment corresponds to one frame of the animation. For example, when the preset animation frame rate is 30 frames per second, the length of each voice fragment is 1/30 second; when the preset animation frame rate is 60 frames per second, the length of each voice fragment is 1/60 second. The embodiments of the present invention are not limited to a specific preset frame rate or splitting scheme.
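The splitting rule can be sketched as follows, assuming the voice information is a flat list of PCM samples; the function name and signature are illustrative, not from the patent.

```python
def split_voice(samples, sample_rate, fps):
    # one voice fragment per animation frame: fragment length = sample_rate / fps
    frag_len = sample_rate // fps
    return [samples[i:i + frag_len]
            for i in range(0, len(samples) - frag_len + 1, frag_len)]
```

At 30 frames per second and a 16 kHz sampling rate, each fragment holds 533 samples, i.e. roughly 1/30 second of audio; a trailing remainder shorter than one frame is simply dropped in this sketch.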
Step 202: Obtain a feature of each of the multiple voice fragments.

For example, the feature may be an MFCC (Mel Frequency Cepstral Coefficients) feature. The embodiments of the present invention are not limited to a specific feature.
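As a rough stand-in for MFCC extraction (real MFCCs apply a mel filterbank, log compression, and a DCT, typically via a signal-processing library), the idea of turning a voice fragment into a small feature vector can be illustrated with log band energies computed by a naive DFT. The band edges and function name here are assumptions for illustration only.

```python
import math

def band_energies(frame, sample_rate, bands=((300, 1000), (1000, 2000), (2000, 4000))):
    # crude spectral feature: log energy in a few frequency bands, via a naive O(n^2) DFT
    n = len(frame)
    feats = []
    for lo, hi in bands:
        energy = 0.0
        for k in range(int(lo * n / sample_rate), int(hi * n / sample_rate)):
            re = sum(frame[t] * math.cos(2 * math.pi * k * t / n) for t in range(n))
            im = sum(frame[t] * math.sin(2 * math.pi * k * t / n) for t in range(n))
            energy += re * re + im * im
        feats.append(math.log(energy + 1e-9))
    return feats
```

A 500 Hz tone, for instance, concentrates its energy in the first band, so the feature vector distinguishes fragments by their spectral content, which is all the downstream model needs.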
Step 203: Select, according to the feature of each voice fragment, a corresponding mouth expression.

In one embodiment, this process may include:
selecting, from a preset expression library and according to the feature and a preset model, the mouth expression corresponding to the feature.
The preset model can be trained by supervised learning. The training can follow any one of the following schemes:

Scheme 1:

a. Collect training data.
Collect a large amount of data containing correspondences between voice and mouth shape, such as film and television clips.
b. Preprocess the collected data.
Pick out the video frames in the collected data that contain a human mouth.
Extract the mouth shapes in these video frames and the MFCC features of the corresponding voice information.
c. Train a random forest (Random Forest) on these mouth shapes and the corresponding MFCC features; the trained random forest serves as the preset model.
When selecting the mouth expression corresponding to the feature, the feature is input into the trained random forest, which determines the mouth shape corresponding to the feature; the mouth expression corresponding to that mouth shape is then chosen from the preset expression library as the mouth expression corresponding to the feature.
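To make the classifier's role concrete, the following toy sketch trains a "forest" of depth-1 threshold trees and classifies a feature vector by majority vote. It is a drastic simplification (a binary mouth-shape label and random stumps, where a real random forest uses bagging and deep trees from an ML library such as scikit-learn); all names are illustrative.

```python
import random

def train_stump(feats, labels, rng):
    # depth-1 "tree": threshold on one randomly chosen feature dimension
    d = rng.randrange(len(feats[0]))
    thr = rng.choice(feats)[d]
    pred = [1 if f[d] >= thr else 0 for f in feats]
    agree = sum(p == y for p, y in zip(pred, labels))
    flip = agree < len(labels) - agree   # orient the stump to match the labels
    return d, thr, flip

def train_forest(feats, labels, n_trees=50, seed=0):
    rng = random.Random(seed)
    return [train_stump(feats, labels, rng) for _ in range(n_trees)]

def forest_predict(forest, f):
    # majority vote over all stumps
    votes = 0
    for d, thr, flip in forest:
        p = 1 if f[d] >= thr else 0
        votes += (1 - p) if flip else p
    return 1 if 2 * votes >= len(forest) else 0
```

In the patent's pipeline the predicted class would index into the preset expression library to pick the matching mouth expression.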
Scheme 2:

a. Collect training data.
Collect a large amount of data containing correspondences between voice and mouth open/closed state, such as film and television clips.
b. Preprocess the collected data.
Pick out the video frames in the collected data that contain a human mouth.
Extract the open/closed states of the mouths in these video frames and the MFCC features of the corresponding voice information.
c. Train an SVM (Support Vector Machine) on these open/closed states and the corresponding MFCC features; the trained SVM serves as the preset model.
When selecting the mouth expression corresponding to the feature, the feature is input into the trained SVM, which determines whether the mouth state corresponding to the feature is open or closed. If the state is open, an expression whose mouth is open is chosen from the preset expression library as the mouth expression corresponding to the feature; if the state is closed, an expression whose mouth is closed is chosen instead.
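A linear open/closed decision rule of the kind the SVM provides can be sketched as follows. The perceptron-style update below is a stand-in for actual SVM training (which maximizes the margin and would normally use a library such as libsvm or scikit-learn); labels are 1 for open and 0 for closed, and all names are illustrative.

```python
def train_linear(feats, labels, epochs=100, lr=0.1):
    # perceptron-style training of a linear rule: w.x + b > 0 means "open"
    w = [0.0] * len(feats[0])
    b = 0.0
    for _ in range(epochs):
        for x, y in zip(feats, labels):
            t = 1 if y else -1
            if t * (sum(wi * xi for wi, xi in zip(w, x)) + b) <= 0:
                # misclassified: nudge the hyperplane toward this sample
                w = [wi + lr * t * xi for wi, xi in zip(w, x)]
                b += lr * t
    return w, b

def mouth_open(model, x):
    w, b = model
    return sum(wi * xi for wi, xi in zip(w, x)) + b > 0
```

Because the decision is a single dot product, classifying one fragment per animation frame is cheap enough for real-time use, which matches the speed advantage claimed for the SVM below.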
Scheme 3:

a. Collect training data.
Collect a large amount of data containing correspondences between voice and mouth shape, such as film and television clips.
b. Preprocess the collected data.
Pick out the video frames in the collected data that contain a human mouth.
Extract the facial feature points corresponding to the mouth shapes in these video frames and the MFCC features of the corresponding voice information.
c. Train a GMM (Gaussian Mixture Model) on these facial feature points and the corresponding MFCC features; the trained GMM serves as the preset model.
When selecting the mouth expression corresponding to the feature, the feature is input into the trained GMM, which determines the facial feature points corresponding to the feature; the mouth expression corresponding to those facial feature points is then chosen from the preset expression library as the mouth expression corresponding to the feature.
Scheme 4:

a. Collect training data.
Collect a large amount of data containing correspondences between voice and mouth shape, such as film and television clips.
b. Preprocess the collected data.
Pick out the video frames in the collected data that contain a human mouth.
Extract the facial feature points corresponding to the mouth shapes in these video frames and the MFCC features of the corresponding voice information.
c. Train a 3-layer neural network (Neural Network) on these facial feature points and the corresponding MFCC features; the trained 3-layer neural network serves as the preset model.
When selecting the mouth expression corresponding to the feature, the feature is input into the trained 3-layer neural network, which determines the facial feature points corresponding to the feature; the mouth expression corresponding to those facial feature points is then chosen from the preset expression library as the mouth expression corresponding to the feature.
In another embodiment, the process may include:
selecting, from a preset expression library, the mouth expression corresponding to the feature according to the feature of the voice fragment, a preset model, and the mouth expression corresponding to the previous voice fragment.
In this embodiment, the preset model is trained as described above for the SVM, and the details are not repeated here.
When selecting the mouth expression corresponding to the feature, the feature is input into the trained SVM, which outputs the probability that the mouth state corresponding to the feature is open, denoted p; the probability that it is closed is then 1 - p. If p exceeds a preset threshold, the mouth state is judged to be open; otherwise it is judged to be closed. The initial value of the threshold is 0.5, and the threshold is dynamically adjusted according to the mouth state of the expression corresponding to the previous voice fragment.
For example, when the mouth state of the expression corresponding to the previous voice fragment is open, the threshold is adjusted to 0.3, so that the mouth state corresponding to the feature is judged to be open as soon as p exceeds 0.3.
If the SVM judges the state corresponding to the feature to be open, an expression whose mouth is open is selected from the preset expression library as the expression corresponding to the feature; if it judges the state to be closed, an expression whose mouth is closed is selected instead.
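The dynamic threshold amounts to a simple hysteresis rule, sketched below; the 0.5 and 0.3 values come from the example above, while the function name is illustrative.

```python
def decide_open(p_open, prev_open, base_thr=0.5, biased_thr=0.3):
    # hysteresis: if the previous fragment's mouth was open, a lower
    # probability suffices to keep it open, which smooths open/closed flicker
    thr = biased_thr if prev_open else base_thr
    return p_open > thr
```

Applied over successive fragments, this keeps the animated mouth from flickering between open and closed when p hovers near the base threshold.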
Step 204: Generate, for each voice fragment, the corresponding expression frame from the corresponding mouth expression and the image.

Specifically, the process may be as follows.

Identify the mouth region of the face in the image.
For example, the feature points of the mouth region can be obtained from the image using an Active Appearance Model, an Active Shape Model, or other methods.
It should be noted that there may be several preset images, and the user may switch between them in real time while the expression frames are being generated.
As shown in Figs. 5 to 7, there are several images, and switching between them is possible during frame generation: while a certain image is selected, the expression frames are generated from that image; after switching to another image, subsequent expression frames are generated from the image after the switch. For instance, the image shown in Fig. 5 is selected initially; after a while the image shown in Fig. 6 is switched to in real time, and after another while the image shown in Fig. 7.

Drive the mouth region of the face in the image according to the mouth expression.
For example, compute the position deviation between the mouth feature points in the mouth expression and the corresponding feature points of the mouth region in the image; from the deviation, generate a movement parameter for each feature point of the mouth region in the image; and drive the mouth region in the image according to the movement parameters.

Generate the corresponding expression frame from the image and the driven mouth region.
For example, replace the mouth region in the image before driving with the driven mouth region, generate a new image, and generate the corresponding expression frame from the new image.
In the identification of the mouth region described above, if no face is detected, a default face mouth region is used as the basis for generating the expression frames.

Drive the default mouth region according to the mouth expression.
For example, compute the position deviation between the mouth feature points in the mouth expression and the corresponding feature points of the default mouth region; from the deviation, generate a movement parameter for each feature point of the default mouth region; and drive the default mouth region according to the movement parameters.

Generate the corresponding expression frame from the default mouth region and the driven default mouth region.
For example, replace the default mouth region before driving with the driven default mouth region, generate a new image, and generate the corresponding expression frame from the new image.
Step 205: Generate the speech animation from the expression frames corresponding to the multiple speech segments and the voice information.
Specifically, this process may include the following steps:
arranging the expression frames corresponding to the multiple speech segments in chronological order to generate an expression animation; and
packaging the expression animation and the voice information into the speech animation.
Any suitable audio/video container format may be used for packaging, such as MP4, TS, MKV, or FLV.
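The arranging-and-packaging step can be sketched as building a timestamped frame timeline and bundling it with the voice track. This is only a structural model, under the assumption of a fixed animation frame rate; real packaging would hand the frames and audio to an MP4/TS/MKV/FLV muxer (e.g. via FFmpeg), and the function name is illustrative:

```python
def package_animation(frames, voice, fps=25):
    """Arrange expression frames in order with timestamps derived from
    the animation frame rate, and bundle them with the voice track into
    a container-like structure."""
    timeline = [(i / fps, frame) for i, frame in enumerate(frames)]
    return {"video": timeline, "audio": voice, "fps": fps}

anim = package_animation(["f0", "f1", "f2"], voice=b"pcm-bytes", fps=25)
```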
The embodiment of the present invention can generate a corresponding speech animation from voice in real time without acquiring facial information, and therefore has the advantages of high efficiency, high speed, few restrictions, and low resource consumption. The open/closed state of the mouth can be judged quickly by an SVM, effectively improving recognition speed. The shape of the mouth can likewise be recognized quickly by a random forest, an SVM, a GMM model, or a neural network, each of which effectively improves recognition speed. In addition, judging the mouth state of the current frame according to the mouth state of the previous frame effectively improves recognition accuracy.
Fig. 3 shows a block diagram of an electronic device according to an embodiment of the present invention. As shown in Fig. 3, the electronic device includes: an image acquisition module 301 for obtaining an image; a voice acquisition module 302 for obtaining voice information; a speech animation generation module 303 for animating the image according to the voice information to generate a speech animation; and a sending module 304 for sending the speech animation to another electronic device.
Specifically, the image acquisition module 301 is configured to obtain a preset image or an image input by the user.
Specifically, the voice acquisition module 302 is configured to obtain the voice information input by the user.
Specifically, the speech animation generation module 303 includes:
a voice splitting submodule 3031 for dividing the voice information into multiple speech segments;
a feature acquisition submodule 3032 for obtaining the feature of each speech segment among the multiple speech segments;
a mouth expression selection submodule 3033 for selecting the mouth expression corresponding to the feature of each speech segment;
an expression frame generation submodule 3034 for generating the corresponding expression frame from the mouth expression corresponding to each speech segment and the image; and
a speech animation generation submodule 3035 for generating the speech animation from the expression frames corresponding to the multiple speech segments and the voice information.
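The feature acquisition step above can be sketched with short-time energy as the per-segment feature. This is only one illustrative choice — the patent does not fix a particular feature, and spectral features such as MFCCs would be common alternatives; the function name is an assumption:

```python
def segment_feature(samples):
    """Compute a simple per-segment feature: short-time energy,
    i.e. the mean of the squared samples."""
    return sum(s * s for s in samples) / len(samples)

feat = segment_feature([0.0, 0.5, -0.5, 0.0])
```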
Specifically, the voice splitting submodule 3031 is configured to divide the voice information into multiple speech segments according to a preset animation frame rate, where each speech segment corresponds to one frame of the animation.
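Splitting at a preset animation frame rate means each segment holds `sample_rate / fps` audio samples, so that segments and animation frames line up one-to-one. A minimal sketch, with the sample rate and frame rate chosen as illustrative assumptions:

```python
def split_voice(samples, sample_rate=16000, fps=25):
    """Divide the voice samples into segments so that each segment
    corresponds to exactly one animation frame."""
    seg_len = sample_rate // fps  # samples per animation frame
    return [samples[i:i + seg_len]
            for i in range(0, len(samples), seg_len)]

# One second of hypothetical 16 kHz audio at 25 fps -> 25 segments of 640 samples.
segments = split_voice(list(range(16000)), sample_rate=16000, fps=25)
```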
Optionally, the mouth expression selection submodule 3033 is configured to select, from a preset expression library, the mouth expression corresponding to the feature of each speech segment according to that feature and a preset model.
Optionally, the mouth expression selection submodule 3033 is configured to select, from the preset expression library, the mouth expression corresponding to the feature of each speech segment according to that feature, the preset model, and the mouth expression corresponding to the previous speech segment.
Specifically, the expression frame generation submodule 3034 is configured to combine the mouth expression corresponding to each speech segment with the image to generate the corresponding expression frame.
Specifically, the speech animation generation submodule 3035 is configured to arrange the expression frames corresponding to the multiple speech segments in chronological order to generate an expression animation, and to package the expression animation and the voice information into the speech animation.
Optionally, the electronic device further includes a start-speech-animation instruction acquisition module for obtaining an instruction to start speech animation. For example, the user may trigger the instruction by clicking a preset button in an instant messaging application interface; the embodiment of the present invention does not limit the manner in which the instruction is triggered.
Optionally, the electronic device further includes a speech animation display module for displaying the speech animation, so that the user can intuitively experience the recorded speech animation.
The embodiments of the invention provide an electronic device for sending speech animation. By animating an image according to voice information to generate a speech animation, and sending the speech animation to a recipient, the recipient is enabled to watch the speech animation. Compared with a plain voice message, a speech animation is more entertaining and more visually appealing, thereby improving the user experience.
Fig. 4 shows an electronic device according to an embodiment of the present invention. As shown in Fig. 4, the electronic device includes a memory 401, an audio acquisition module 402, a network interface module 403, and a processor 404 connected to the memory 401, the audio acquisition module 402, and the network interface module 403, where the memory 401 stores a set of program code, and the processor 404 calls the program code stored in the memory 401 to perform the following operations:
Obtain an image;
Obtain voice information;
Animate the image according to the voice information to generate a speech animation; and
Send the speech animation to another electronic device.
Specifically, the processor 404 calls the program code stored in the memory 401 to perform the following operation:
Obtain a preset image or an image input by the user.
Specifically, the processor 404 calls the program code stored in the memory 401 to perform the following operation:
Obtain the voice information input by the user.
Specifically, the processor 404 calls the program code stored in the memory 401 to perform the following operations:
Divide the voice information into multiple speech segments;
Obtain the feature of each speech segment among the multiple speech segments;
Select the mouth expression corresponding to the feature of each speech segment;
Generate the corresponding expression frame from the mouth expression corresponding to each speech segment and the image; and
Generate the speech animation from the expression frames corresponding to the multiple speech segments and the voice information.
Specifically, the processor 404 calls the program code stored in the memory 401 to perform the following operation:
Divide the voice information into multiple speech segments according to a preset animation frame rate, where each speech segment corresponds to one frame of the animation.
Specifically, the processor 404 calls the program code stored in the memory 401 to perform the following operation:
Select, from a preset expression library, the mouth expression corresponding to the feature of each speech segment according to that feature and a preset model.
Specifically, the processor 404 calls the program code stored in the memory 401 to perform the following operation:
Select, from the preset expression library, the mouth expression corresponding to the feature of each speech segment according to that feature, the preset model, and the mouth expression corresponding to the previous speech segment.
Specifically, the processor 404 calls the program code stored in the memory 401 to perform the following operation:
Combine the mouth expression corresponding to each speech segment with the image to generate the corresponding expression frame.
Specifically, the processor 404 calls the program code stored in the memory 401 to perform the following operations:
Arrange the expression frames corresponding to the multiple speech segments in chronological order to generate an expression animation; and
Package the expression animation and the voice information into the speech animation.
The embodiments of the invention provide an electronic device for sending speech animation. By animating an image according to voice information to generate a speech animation, and sending the speech animation to a recipient, the recipient is enabled to watch the speech animation. Compared with a plain voice message, a speech animation is more entertaining and more visually appealing, thereby improving the user experience.
All of the optional technical solutions above may be combined in any manner to form alternative embodiments of the present invention, which are not described one by one here.
It should be noted that when the electronic device provided by the above embodiment performs the method for sending speech animation, the division into the functional modules above is only an example; in practical applications, the functions may be assigned to different functional modules as needed, i.e., the internal structure of the device may be divided into different functional modules to complete all or part of the functions described above. In addition, the electronic device provided by the above embodiment and the method embodiment for sending speech animation belong to the same concept; for the specific implementation process, refer to the method embodiment, which is not repeated here.
One of ordinary skill in the art will appreciate that all or part of the steps of the above embodiments may be implemented in hardware, or by a program instructing the relevant hardware; the program may be stored in a computer-readable storage medium, and the storage medium may be a read-only memory, a magnetic disk, an optical disc, or the like.
The foregoing is only preferred embodiments of the present invention and is not intended to limit the invention; any modification, equivalent substitution, improvement, etc. made within the spirit and principles of the present invention shall be included in the scope of protection of the present invention.
Claims (10)
- 1. A method for sending speech animation, characterized in that the method includes: obtaining an image; obtaining voice information; animating the image according to the voice information to generate a speech animation; and sending the speech animation to a recipient.
- 2. The method according to claim 1, characterized in that obtaining an image includes: obtaining a preset image or an image input by the user.
- 3. The method according to claim 1, characterized in that obtaining voice information includes: obtaining the voice information input by the user.
- 4. The method according to claim 1, characterized in that animating the image according to the voice information to generate a speech animation includes: dividing the voice information into multiple speech segments; obtaining the feature of each speech segment among the multiple speech segments; selecting the mouth expression corresponding to the feature of each speech segment; generating the corresponding expression frame from the mouth expression corresponding to each speech segment and the image; and generating the speech animation from the expression frames corresponding to the multiple speech segments and the voice information.
- 5. The method according to claim 4, characterized in that dividing the voice information into multiple speech segments includes: dividing the voice information into multiple speech segments according to a preset animation frame rate, where each speech segment corresponds to one frame of the animation.
- 6. An electronic device, characterized in that the device includes: an image acquisition module for obtaining an image; a voice acquisition module for obtaining voice information; a speech animation generation module for animating the image according to the voice information to generate a speech animation; and a sending module for sending the speech animation to another electronic device.
- 7. The electronic device according to claim 6, characterized in that the image acquisition module is specifically configured to obtain a preset image or an image input by the user.
- 8. The electronic device according to claim 6, characterized in that the voice acquisition module is specifically configured to obtain the voice information input by the user.
- 9. The electronic device according to claim 6, characterized in that the speech animation generation module includes: a voice splitting submodule for dividing the voice information into multiple speech segments; a feature acquisition submodule for obtaining the feature of each speech segment among the multiple speech segments; a mouth expression selection submodule for selecting the mouth expression corresponding to the feature of each speech segment; an expression frame generation submodule for generating the corresponding expression frame from the mouth expression corresponding to each speech segment and the image; and a speech animation generation submodule for generating the speech animation from the expression frames corresponding to the multiple speech segments and the voice information.
- 10. The electronic device according to claim 9, characterized in that the voice splitting submodule is specifically configured to divide the voice information into multiple speech segments according to a preset animation frame rate, where each speech segment corresponds to one frame of the animation.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201610560255.2A CN107623622A (en) | 2016-07-15 | 2016-07-15 | A kind of method and electronic equipment for sending speech animation |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201610560255.2A CN107623622A (en) | 2016-07-15 | 2016-07-15 | A kind of method and electronic equipment for sending speech animation |
Publications (1)
Publication Number | Publication Date |
---|---|
CN107623622A true CN107623622A (en) | 2018-01-23 |
Family
ID=61087410
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201610560255.2A Pending CN107623622A (en) | 2016-07-15 | 2016-07-15 | A kind of method and electronic equipment for sending speech animation |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN107623622A (en) |
Cited By (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110288680A (en) * | 2019-05-30 | 2019-09-27 | 盎锐(上海)信息科技有限公司 | Image generating method and mobile terminal |
CN110971502A (en) * | 2018-09-30 | 2020-04-07 | 腾讯科技(深圳)有限公司 | Method, device, equipment and storage medium for displaying sound message in application program |
CN111200555A (en) * | 2019-12-30 | 2020-05-26 | 咪咕视讯科技有限公司 | Chat message display method, electronic device and storage medium |
CN111383642A (en) * | 2018-12-27 | 2020-07-07 | Tcl集团股份有限公司 | Voice response method based on neural network, storage medium and terminal equipment |
CN112188304A (en) * | 2020-09-28 | 2021-01-05 | 广州酷狗计算机科技有限公司 | Video generation method, device, terminal and storage medium |
CN116312612A (en) * | 2023-02-02 | 2023-06-23 | 北京甲板智慧科技有限公司 | Audio processing method and device based on deep learning |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN1731833A (en) * | 2005-08-23 | 2006-02-08 | 孙丹 | Method for composing audio/video file by voice driving head image |
US7168953B1 (en) * | 2003-01-27 | 2007-01-30 | Massachusetts Institute Of Technology | Trainable videorealistic speech animation |
CN103218842A (en) * | 2013-03-12 | 2013-07-24 | 西南交通大学 | Voice synchronous-drive three-dimensional face mouth shape and face posture animation method |
CN105551071A (en) * | 2015-12-02 | 2016-05-04 | 中国科学院计算技术研究所 | Method and system of face animation generation driven by text voice |
- 2016-07-15 CN CN201610560255.2A patent/CN107623622A/en active Pending
Patent Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US7168953B1 (en) * | 2003-01-27 | 2007-01-30 | Massachusetts Institute Of Technology | Trainable videorealistic speech animation |
CN1731833A (en) * | 2005-08-23 | 2006-02-08 | 孙丹 | Method for composing audio/video file by voice driving head image |
CN103218842A (en) * | 2013-03-12 | 2013-07-24 | 西南交通大学 | Voice synchronous-drive three-dimensional face mouth shape and face posture animation method |
CN105551071A (en) * | 2015-12-02 | 2016-05-04 | 中国科学院计算技术研究所 | Method and system of face animation generation driven by text voice |
Cited By (10)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110971502A (en) * | 2018-09-30 | 2020-04-07 | 腾讯科技(深圳)有限公司 | Method, device, equipment and storage medium for displaying sound message in application program |
CN110971502B (en) * | 2018-09-30 | 2021-09-28 | 腾讯科技(深圳)有限公司 | Method, device, equipment and storage medium for displaying sound message in application program |
US11895273B2 (en) | 2018-09-30 | 2024-02-06 | Tencent Technology (Shenzhen) Company Limited | Voice message display method and apparatus in application, computer device, and computer-readable storage medium |
CN111383642A (en) * | 2018-12-27 | 2020-07-07 | Tcl集团股份有限公司 | Voice response method based on neural network, storage medium and terminal equipment |
CN111383642B (en) * | 2018-12-27 | 2024-01-02 | Tcl科技集团股份有限公司 | Voice response method based on neural network, storage medium and terminal equipment |
CN110288680A (en) * | 2019-05-30 | 2019-09-27 | 盎锐(上海)信息科技有限公司 | Image generating method and mobile terminal |
CN111200555A (en) * | 2019-12-30 | 2020-05-26 | 咪咕视讯科技有限公司 | Chat message display method, electronic device and storage medium |
CN112188304A (en) * | 2020-09-28 | 2021-01-05 | 广州酷狗计算机科技有限公司 | Video generation method, device, terminal and storage medium |
CN116312612A (en) * | 2023-02-02 | 2023-06-23 | 北京甲板智慧科技有限公司 | Audio processing method and device based on deep learning |
CN116312612B (en) * | 2023-02-02 | 2024-04-16 | 北京甲板智慧科技有限公司 | Audio processing method and device based on deep learning |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN107623622A (en) | A kind of method and electronic equipment for sending speech animation | |
CN107203953B (en) | Teaching system based on internet, expression recognition and voice recognition and implementation method thereof | |
TWI778477B (en) | Interaction methods, apparatuses thereof, electronic devices and computer readable storage media | |
CN109637518B (en) | Virtual anchor implementation method and device | |
CN111833418B (en) | Animation interaction method, device, equipment and storage medium | |
CN106339680B (en) | Face key independent positioning method and device | |
CN104780093B (en) | Expression information processing method and processing device during instant messaging | |
US11670015B2 (en) | Method and apparatus for generating video | |
CN109951654A (en) | A kind of method of Video Composition, the method for model training and relevant apparatus | |
CN109618184A (en) | Method for processing video frequency and device, electronic equipment and storage medium | |
CN108874114B (en) | Method and device for realizing emotion expression of virtual object, computer equipment and storage medium | |
CN108197185A (en) | A kind of music recommends method, terminal and computer readable storage medium | |
WO2020253128A1 (en) | Voice recognition-based communication service method, apparatus, computer device, and storage medium | |
CN110503942A (en) | A kind of voice driven animation method and device based on artificial intelligence | |
CN107623830B (en) | A kind of video call method and electronic equipment | |
CN107153496A (en) | Method and apparatus for inputting emotion icons | |
CN111459454B (en) | Interactive object driving method, device, equipment and storage medium | |
CN106296690A (en) | The method for evaluating quality of picture material and device | |
CN105989165A (en) | Method, apparatus and system for playing facial expression information in instant chat tool | |
CN109032470A (en) | Screenshot method, device, terminal and computer readable storage medium | |
CN113971828B (en) | Virtual object lip driving method, model training method, related device and electronic equipment | |
CN107291704A (en) | Treating method and apparatus, the device for processing | |
US20230368461A1 (en) | Method and apparatus for processing action of virtual object, and storage medium | |
CN107480766A (en) | The method and system of the content generation of multi-modal virtual robot | |
CN112911192A (en) | Video processing method and device and electronic equipment |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
WD01 | Invention patent application deemed withdrawn after publication | Application publication date: 20180123
WD01 | Invention patent application deemed withdrawn after publication |