CN104869326A - Image display method coordinated with audio and device thereof - Google Patents

Image display method coordinated with audio and device thereof

Info

Publication number
CN104869326A
CN104869326A (Application CN201510279742.7A)
Authority
CN
China
Prior art keywords
mouth shape
speaking
dialogue scene
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201510279742.7A
Other languages
Chinese (zh)
Other versions
CN104869326B (en)
Inventor
周有凯
蒋心怡
徐晓然
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Netease Hangzhou Network Co Ltd
Original Assignee
Netease Hangzhou Network Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Netease Hangzhou Network Co Ltd filed Critical Netease Hangzhou Network Co Ltd
Priority to CN201510279742.7A priority Critical patent/CN104869326B/en
Publication of CN104869326A publication Critical patent/CN104869326A/en
Application granted granted Critical
Publication of CN104869326B publication Critical patent/CN104869326B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Landscapes

  • User Interface Of Digital Computer (AREA)

Abstract

An embodiment of the invention provides an image display method coordinated with audio. The method comprises: running a dialogue scene; dynamically displaying the mouth shapes of scene characters when the dialogue scene runs within a speech time period; and statically displaying the mouth shapes of the scene characters when the dialogue scene runs within a silence time period. The speech time period and the silence time period are obtained by dividing the dialogue scene according to the audio waveform information of the dialogue scene: the waveform amplitude within the speech time period is greater than a first amplitude threshold, the waveform amplitude within the silence time period is less than a second amplitude threshold, and the first amplitude threshold is not less than the second amplitude threshold. By dividing speech and silence time periods from the waveform information, the method keeps the dialogue audio and the mouth-shape images of the scene characters in step, giving the user a more lifelike dialogue display. An embodiment of the invention also provides an image display device coordinated with audio.

Description

Image display method and device coordinated with audio
Technical field
Embodiments of the present invention relate to the field of image data processing, and more specifically to an image display method and device coordinated with audio.
Background art
This section is intended to provide background or context for the embodiments of the present invention recited in the claims. The description here is not admitted to be prior art merely by its inclusion in this section.
Game applications, animated videos and computer simulation programs frequently contain dialogue scenes in which the displayed images must be coordinated with audio. In such dialogue scenes the scene characters speak in turn; for example, a game application typically contains plot dialogue scenes in which the game characters converse with one another. A dialogue scene therefore not only has to play the dialogue audio of the scene, it also has to present mouth shapes of the scene characters that match that audio, i.e. the mouth shape of a character must change dynamically while the character is speaking.
To make a character's mouth shape change dynamically while it speaks, the prior art pre-configures pictures of the character's different mouth shapes for the dialogue scene. When the application runs the dialogue scene it switches dynamically among these pictures, so that the character's mouth shape in the displayed image changes over time and thereby matches the character's dialogue in the scene's audio.
Summary of the invention
It should be noted that, in a dialogue scene, a scene character is not speaking all the time. In many cases the character's speech contains pauses of some length: even within a single dialogue scene the character is speaking during some stages and is silent during other stages or time slots. During the stages or time slots in which the character is silent, its mouth shape should be presented as unchanged; only then is the dialogue display lifelike. In the prior art, however, the switching of the character's mouth-shape pictures is performed for the dialogue scene as a whole, so the character's mouth shape in the displayed image changes continuously throughout the scene. Even in the time slots in which the character is silent, its mouth shape keeps changing, and the dialogue audio of the scene therefore fails to match the mouth-shape images.
Thus, in the prior art, for part of a dialogue scene the character's mouth shape keeps changing in the displayed image even though the character is silent during that part, so the dialogue audio cannot match the mouth-shape images during that part of the scene, which makes for an unsatisfactory experience.
An improved image display method and device coordinated with audio are therefore highly desirable, such that the displayed mouth shape of a scene character changes dynamically during the stages of the dialogue scene in which the character is speaking and remains unchanged during the stages in which the character is silent, so that the dialogue audio of the character matches the mouth-shape images at every stage of the dialogue scene.
In this context, embodiments of the present invention provide an image display method and device coordinated with audio.
In a first aspect of the embodiments of the present invention, an image display method coordinated with audio is provided, comprising: running a dialogue scene; dynamically displaying the mouth shape of a scene character when the dialogue scene runs within a speech time period; and statically displaying the mouth shape of the scene character when the dialogue scene runs within a silence time period. The speech time period and the silence time period are obtained by dividing the dialogue scene according to the waveform information of the audio corresponding to the dialogue scene, wherein within the speech time period the waveform amplitude of the waveform information is greater than a first amplitude threshold, within the silence time period the waveform amplitude of the waveform information is less than a second amplitude threshold, and the first amplitude threshold is not less than the second amplitude threshold.
In a second aspect of the embodiments of the present invention, an image display device coordinated with audio is provided, comprising: a running module for running a dialogue scene; a dynamic display module for dynamically displaying the mouth shape of a scene character when the dialogue scene runs within a speech time period; and a static display module for statically displaying the mouth shape of the scene character when the dialogue scene runs within a silence time period. The speech time period and the silence time period are obtained by dividing the dialogue scene according to the waveform information of the audio corresponding to the dialogue scene, wherein within the speech time period the waveform amplitude of the waveform information is greater than a first amplitude threshold, within the silence time period the waveform amplitude of the waveform information is less than a second amplitude threshold, and the first amplitude threshold is not less than the second amplitude threshold.
According to the embodiments of the present invention, the image display method and device coordinated with audio divide the dialogue scene according to its audio waveform information, taking the time periods in which the waveform amplitude is larger as speech time periods and the time periods in which it is smaller as silence time periods; while the dialogue scene runs, the mouth shape of the scene character is displayed dynamically during the speech time periods and statically during the silence time periods. Because a larger waveform amplitude indicates that the character is speaking and a smaller amplitude indicates that it is not, dynamically displaying the mouth shape during the speech periods and statically displaying it during the silence periods means that the displayed mouth shape changes only while the character is actually speaking and remains unchanged while the character is silent. The dialogue audio of the character therefore matches the mouth-shape images at every stage of the dialogue scene, producing a more lifelike dialogue display and a better experience for the user.
Overview of the invention
The inventor has found that, in a dialogue scene, a scene character is not speaking all the time; in many cases its speech contains pauses of some length, i.e. even within a single dialogue scene the character is speaking during part of the scene and silent during another part. In the prior art, however, the switching of the character's mouth-shape pictures is performed for the whole dialogue scene as a unit, so the mouth shape of the character in the displayed image is always changing; even in the time slots in which the character is silent its mouth shape keeps changing, with the result that the dialogue audio of the scene cannot match the mouth-shape images.
Based on this observation, the general principle of the present invention is as follows. Since the waveform amplitude of the audio in a dialogue scene reflects whether a scene character is speaking, the dialogue scene can be divided according to its audio waveform information. Because a larger waveform amplitude indicates that the character is speaking, the time periods with larger waveform amplitude can be taken as speech time periods, in which the character's mouth shape is displayed dynamically while the scene runs, so that a changing mouth-shape image is shown while the character speaks. Because a smaller waveform amplitude indicates that the character is not speaking, the time periods with smaller waveform amplitude can be taken as silence time periods, in which the character's mouth shape is displayed statically while the scene runs, so that an unchanging mouth-shape image is shown while the character is silent. The dialogue audio of the character thus matches the mouth-shape images at every stage of the dialogue scene, producing a more lifelike dialogue display and a better experience for the user.
Having described the general principle of the present invention, various non-limiting embodiments of the present invention are introduced below.
Application scenario overview
Reference is first made to Fig. 1, a schematic block diagram of an exemplary application scenario of the embodiments of the present invention. A user interacts with a client 102 on a user device to run a dialogue scene; the application that runs the dialogue scene may be supplied to the client 102 by a server 101 of the application. Those skilled in the art will understand that the diagram in Fig. 1 is only one example in which the embodiments of the present invention may be implemented; the scope of application of the embodiments is not limited by any aspect of this framework.
It should be noted that the user device here may be any user device, existing, under development or developed in the future, on which the client 102 can interact with the server 101 over any form of wired and/or wireless connection (for example Wi-Fi, LAN, cellular or coaxial cable), including but not limited to existing, in-development or future smartphones, feature phones, tablet computers, laptop personal computers, desktop personal computers, minicomputers, midrange computers, mainframe computers and so on.
It should also be noted that the server 101 here is merely one example of an existing, in-development or future device capable of providing the application; the embodiments of the present invention are not limited in this respect.
Based on the framework shown in Fig. 1, the client 102 can run a dialogue scene. When the dialogue scene runs within a speech time period, the client 102 dynamically displays the mouth shape of the scene character; when the dialogue scene runs within a silence time period, the client 102 statically displays the mouth shape of the scene character. The speech time period and the silence time period are obtained by dividing the dialogue scene according to the waveform information of the audio corresponding to the dialogue scene, wherein within the speech time period the waveform amplitude of the waveform information is greater than a first amplitude threshold, within the silence time period the waveform amplitude is less than a second amplitude threshold, and the first amplitude threshold is not less than the second amplitude threshold.
It will be understood that, although in this application scenario and below the actions of the embodiments of the present invention are described as being performed by the client 102, these actions may also be performed partly by the client 102 and partly by the server 101, or entirely by the server 101. The present invention places no restriction on the executing entity, provided the actions disclosed in the embodiments of the present invention are performed.
Illustrative methods
An image display method coordinated with audio according to an exemplary embodiment of the invention is described below with reference to Figs. 2 and 3 in conjunction with the application scenario of Fig. 1. It should be noted that the above application scenario is given only to aid understanding of the spirit and principle of the present invention; the embodiments are not limited in this respect and may be applied to any suitable scenario.
Fig. 2 shows a flow chart of one embodiment of the image display method coordinated with audio of the present invention. In this embodiment the method may, for example, comprise the following steps:
Step 201: run a dialogue scene.
Step 202: when the dialogue scene runs within a speech time period, dynamically display the mouth shape of the scene character.
Dynamically displaying the mouth shape of the scene character may specifically consist of switching dynamically among several pictures of the character's different mouth shapes, so that the mouth shape presented in the displayed image is in a continuously changing state.
Step 203: when the dialogue scene runs within a silence time period, statically display the mouth shape of the scene character.
Statically displaying the mouth shape of the scene character may, for example, consist of keeping the same mouth-shape picture of the character on display, so that the mouth shape presented in the displayed image is static and unchanged. Alternatively, static display may consist of switching among several pictures showing the same mouth shape of the character, so that the mouth shape presented in the displayed image still appears static.
It will be understood that a dialogue scene is made up of speech time periods and silence time periods, which are obtained by dividing the dialogue scene according to the waveform information of the audio corresponding to the dialogue scene. Specifically, within a speech time period the waveform amplitude of the waveform information is greater than the first amplitude threshold, and within a silence time period it is less than the second amplitude threshold; that is, for any moment of the dialogue scene, if the waveform amplitude of the audio at that moment is greater than the first amplitude threshold, the moment belongs to a speech time period, and if it is less than the second amplitude threshold, the moment belongs to a silence time period. The first amplitude threshold is not less than the second amplitude threshold; in other words, when the thresholds are chosen, the first amplitude threshold may be the same value as the second amplitude threshold, or may be greater than it. For example, both the first amplitude threshold and the second amplitude threshold may be set to 0.2 dB.
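As an aid to the reader only (the patent itself contains no code), the following minimal Python sketch illustrates the per-moment classification described above; the function name, the threshold constants and the handling of amplitudes between the two thresholds are assumptions, and the 0.2 value simply mirrors the example threshold mentioned in this paragraph.

```python
# Illustrative sketch: a moment whose waveform amplitude exceeds the first
# threshold belongs to a speech time period, one below the second threshold
# to a silence time period.  All names and values are assumptions.
FIRST_AMPLITUDE_THRESHOLD = 0.2   # speech if the amplitude is greater than this
SECOND_AMPLITUDE_THRESHOLD = 0.2  # silence if the amplitude is less than this

def classify_moment(amplitude: float) -> str:
    """Classify one audio moment of the dialogue scene by its waveform amplitude."""
    if amplitude > FIRST_AMPLITUDE_THRESHOLD:
        return "speech"
    if amplitude < SECOND_AMPLITUDE_THRESHOLD:
        return "silence"
    # With distinct thresholds an amplitude may fall between them; the text does
    # not prescribe what to do here, so the sketch just reports that case.
    return "indeterminate"
```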
It should be noted that, in this embodiment, the speech time periods and silence time periods of the dialogue scene may be divided in a number of different ways, and under the different division schemes a number of different ways may be used to identify speech and silence time periods while the dialogue scene is running.
For example, in some implementations of this embodiment the speech and silence time periods are identified and divided in real time while the dialogue scene is running. Specifically, while the dialogue scene runs, the current waveform information of the scene's audio is obtained in real time, and its waveform amplitude determines whether the current moment belongs to a speech time period or a silence time period: if the waveform amplitude of the current waveform information is greater than the first amplitude threshold, the current moment belongs to a speech time period and the mouth shape of the scene character is displayed dynamically; if it is less than the second amplitude threshold, the current moment belongs to a silence time period and the mouth shape of the scene character is displayed statically.
As another example, in other implementations of this embodiment the speech and silence time periods are divided in advance, before the dialogue scene is run, and the periods divided in advance are recorded before the scene runs so that they can be identified from the record while the scene is running. Specifically, the speech time periods and silence time periods of the dialogue scene may be recorded in time period information configured in advance for the dialogue scene. Step 202 may then consist of dynamically displaying the mouth shape of the scene character in response to determining, from the time period information while the dialogue scene is running, that the scene is currently running within a speech time period; and step 203 may consist of statically displaying the mouth shape of the scene character in response to determining, from the time period information while the scene is running, that the scene is currently running within a silence time period. More specifically, before the dialogue scene is run, the waveform information of the entire scene's audio can be obtained in advance, the scene divided into speech and silence time periods according to the relationship between the waveform amplitude at each moment and the first and second amplitude thresholds, and the resulting periods recorded as the scene's time period information. While the dialogue scene is running, this time period information is consulted to determine in real time whether the current moment belongs to a speech time period, in which case step 202 is performed, or to a silence time period, in which case step 203 is performed.
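Purely as an illustration of this pre-division variant (not taken from the patent), the sketch below turns sampled waveform amplitudes into the time period information and looks up the current period at playback time; the function names, the sampling model and the single shared threshold (standing in for both thresholds, as in the equal 0.2 dB example) are assumptions.

```python
from bisect import bisect_right

def divide_into_periods(samples, dt, amplitude_threshold=0.2):
    """Divide a dialogue scene's audio into speech/silence time periods.

    samples: waveform amplitudes sampled every `dt` seconds.
    Returns the "time period information": (start, end, kind) tuples.
    """
    periods = []
    for i, amp in enumerate(samples):
        kind = "speech" if amp > amplitude_threshold else "silence"
        start, end = i * dt, (i + 1) * dt
        if periods and periods[-1][2] == kind:
            periods[-1] = (periods[-1][0], end, kind)   # extend the current period
        else:
            periods.append((start, end, kind))
    return periods

def period_kind_at(periods, now):
    """While the scene runs, look up whether time `now` falls in speech or silence."""
    starts = [start for start, _, _ in periods]
    index = max(bisect_right(starts, now) - 1, 0)
    return periods[index][2]
```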
As a further example, in still other implementations of this embodiment the speech and silence time periods are divided in advance, and a video image file is generated for the dialogue scene from the pre-divided periods before the scene is run, such that in the video image file the mouth shape of the scene character changes dynamically during the speech time periods and is displayed statically during the silence time periods; when the dialogue scene runs, the display of the character's mouth shape is controlled simply by the video image file. Specifically, both the dynamic display and the static display of the character's mouth shape may be achieved by playing, while the dialogue scene is running, a video image file configured in advance for the dialogue scene, in whose images the character's mouth shape changes dynamically during the speech time periods and remains statically unchanged during the silence time periods. More specifically, before the dialogue scene is run, the waveform information of the entire scene's audio can be obtained in advance, the scene divided into speech and silence time periods according to the relationship between the waveform amplitude at each moment and the first and second amplitude thresholds, and a video image file created for the scene from the divided periods, in which the character's mouth shape is displayed dynamically during the speech periods and statically during the silence periods. While the dialogue scene is running, it is then sufficient to play this video image file: the mouth shape of the character changes dynamically in the speech periods and is displayed statically in the silence periods, with no need to identify in real time whether the current moment belongs to a speech or a silence period.
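The following sketch, again illustrative only, shows how the mouth-shape track of such a video image file might be pre-baked from the divided periods; the frame-list representation and all names are assumptions rather than anything specified by the patent.

```python
def build_mouth_frame_sequence(periods, fps, speaking_shapes, static_shape):
    """Pre-bake the mouth-shape frames of the scene's video image file.

    periods: (start, end, kind) tuples from the division step.
    speaking_shapes: cycle of mouth-shape frames switched during speech periods.
    static_shape: the single frame held during silence periods.
    """
    frames = []
    for start, end, kind in periods:
        count = int(round((end - start) * fps))
        if kind == "speech":
            # Dynamic display: switch among the different mouth-shape pictures.
            frames.extend(speaking_shapes[i % len(speaking_shapes)] for i in range(count))
        else:
            # Static display: hold the same mouth-shape picture.
            frames.extend([static_shape] * count)
    return frames
```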
It will be understood that, in the implementations described above, some steps are performed while the dialogue scene is running and others are performed in advance, before the scene is run. The steps performed while the scene is running may be performed by the application installed on the terminal device that runs the dialogue scene, i.e. by the application in the course of running the scene. The steps performed in advance may, in some implementations, be performed when the application installed on the terminal device is updated, in which case they are performed by the installed application as it updates itself; in other implementations they may be performed during the compilation of the application before it is installed on the terminal device, in which case they are performed by the equipment on which the developer writes the application.
It should be noted that, on the one hand, a scene character may pause briefly while it is speaking, for example between sentences or between phrases or words, and at these brief pauses the waveform amplitude of the audio waveform information may fall below the second amplitude threshold. This would cause several short stretches of statically displayed mouth shape to appear while the character is still speaking, so that the character's dialogue audio and mouth-shape images briefly fail to match. On the other hand, brief noises may occur while the character is silent, and at these brief noises the waveform amplitude may exceed the first amplitude threshold, causing several short stretches of dynamically displayed mouth shape to appear while the character is silent, again briefly mismatching audio and mouth-shape images. To avoid both kinds of brief mismatch, some implementations of this embodiment preset a minimum interval, such that every speech time period and every silence time period of the dialogue scene is no shorter than the preset minimum interval. This minimum interval may, for example, be set to 0.1 second.
In implementations with a preset minimum interval, before the dialogue scene is run the scene may first be divided into speech and silence time periods according to the relationship between the waveform amplitude of the audio waveform information and the first and second amplitude thresholds, and each resulting period then examined: a speech time period that lies between two silence time periods and is shorter than the minimum interval is merged with the silence periods before and after it into a single silence time period, and a silence time period that lies between two speech time periods and is shorter than the minimum interval is merged with the speech periods before and after it into a single speech time period. The speech and silence time periods finally obtained are then all no shorter than the minimum interval.
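As a minimal sketch of this merging step (assuming the alternating period list produced by the earlier division and the 0.1 s example value; names are illustrative), a short interior period sandwiched between two periods of the opposite kind is folded into its neighbours:

```python
MIN_INTERVAL = 0.1  # seconds, the example minimum interval mentioned above

def merge_short_periods(periods, min_interval=MIN_INTERVAL):
    """Fold interior periods shorter than min_interval into their neighbours.

    A short speech period between two silence periods becomes silence, and a
    short silence period between two speech periods becomes speech, so every
    remaining period is at least min_interval long (illustrative sketch).
    """
    changed = True
    while changed:
        changed = False
        for i in range(1, len(periods) - 1):
            start, end, kind = periods[i]
            prev_kind, next_kind = periods[i - 1][2], periods[i + 1][2]
            if end - start < min_interval and prev_kind == next_kind != kind:
                # Merge the short period and both neighbours into one period.
                merged = (periods[i - 1][0], periods[i + 1][1], prev_kind)
                periods = periods[:i - 1] + [merged] + periods[i + 2:]
                changed = True
                break
    return periods
```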
It will be understood that, because every speech and silence time period is no shorter than the minimum interval, brief silence periods between speech periods are absorbed into the speech periods, so brief pauses while the character is speaking do not cause its mouth shape to be displayed statically; the mouth-shape image keeps changing throughout the character's speech and brief mismatches between the character's audio and mouth-shape images are avoided. Likewise, brief speech periods between silence periods are absorbed into the silence periods, so brief noises while the character is silent do not cause its mouth shape to be displayed dynamically; the mouth-shape image stays unchanged throughout the silence and brief mismatches between audio and mouth-shape images are again avoided.
It should be noted that, to present an even more lifelike dialogue display to the user, and considering that a scene character uses different mouth shapes for different syllables, some implementations of this embodiment divide the speech time periods further so that each speech time period corresponds to a different syllable spoken by the character; while the dialogue scene runs, the character's mouth shape can then be displayed dynamically in each speech time period using the particular mouth shape matching the corresponding syllable. Specifically, taking a character that utters two different syllables as an example, step 202 may comprise: when the dialogue scene runs within a first speech time period, dynamically displaying the character's mouth shape using a first mouth shape; and when the dialogue scene runs within a second speech time period, dynamically displaying the character's mouth shape using a second mouth shape. The first speech time period and the second speech time period are obtained by dividing the speech time period according to the spoken syllables of the audio corresponding to the speech time period, the spoken syllable within the first speech time period being a first syllable and the spoken syllable within the second speech time period being a second syllable, wherein the first syllable and the second syllable are different syllables and the first mouth shape differs in shape from the second mouth shape.
In implementations that present different mouth shapes for different syllables, a corresponding mouth-shape image may, for example, be configured in advance for each syllable. When the dialogue scene is divided into speech and silence time periods, the syllables spoken by the character can be recognised from the waveform information of the audio of each speech time period, and the speech time periods divided further according to the different syllables so that each further-divided speech time period corresponds to a different syllable; when the dialogue scene runs within each speech time period, the character's mouth shape is then displayed dynamically using the mouth-shape image of the syllable corresponding to that period. More specifically, one possible implementation obtains the current waveform information of the scene's audio in real time while the dialogue scene is running and determines from it whether the current moment belongs to a silence time period or to the speech time period of a particular syllable; if the moment belongs to a silence period, the character's mouth shape is displayed statically, and if it belongs to the speech period of a syllable, the mouth-shape image of that syllable is called up to display the character's mouth shape dynamically. Another possible implementation obtains the waveform information of the scene's audio in advance, before the scene is run, divides the scene into silence time periods and speech time periods corresponding to the different syllables according to the waveform information at each moment, and records these periods as the scene's time period information; while the scene runs, this time period information is consulted to determine in real time whether the current moment belongs to a silence period, in which case the mouth shape is displayed statically, or to the speech period of a syllable, in which case the mouth-shape image of that syllable is called up to display the mouth shape dynamically. A further possible implementation likewise obtains the waveform information in advance and divides the scene into silence time periods and speech time periods corresponding to the different syllables, then generates a video image file for the dialogue scene from these periods, in which the character's mouth shape is displayed statically during the silence periods and displayed dynamically during each syllable's speech period using the mouth-shape image of that syllable; when the dialogue scene runs, it is then sufficient to play this video image file, without identifying in real time whether the current moment belongs to a silence period or to the speech period of a particular syllable.
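The per-syllable variant can be pictured with the following illustrative sketch; the syllable labels, image file names and lookup function are hypothetical and only show how a recognised syllable might select its pre-configured mouth-shape images while a silence period holds a static frame.

```python
# Hypothetical mapping from recognised syllables to pre-configured mouth-shape
# image sequences; syllable labels and image names are illustrative only.
SYLLABLE_MOUTH_SHAPES = {
    "a": ["mouth_a_1.png", "mouth_a_2.png"],
    "o": ["mouth_o_1.png", "mouth_o_2.png"],
}
STATIC_MOUTH_SHAPE = "mouth_closed.png"

def mouth_frame_for(period_kind, syllable, frame_index):
    """Choose the mouth-shape frame for the current moment of the dialogue scene."""
    if period_kind == "silence":
        return STATIC_MOUTH_SHAPE                      # static display
    shapes = SYLLABLE_MOUTH_SHAPES.get(syllable, [STATIC_MOUTH_SHAPE])
    return shapes[frame_index % len(shapes)]           # dynamic display per syllable
```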
With the technical solution of this embodiment, because a larger audio waveform amplitude indicates that the scene character is speaking and a smaller amplitude indicates that it is not, dynamically displaying the character's mouth shape during the speech time periods and statically displaying it during the silence time periods means that the displayed mouth shape changes only while the character is actually speaking and remains unchanged while the character is silent. The dialogue audio of the character therefore matches the mouth-shape images at every stage of the dialogue scene, producing a more lifelike dialogue display.
To give those skilled in the art a clearer understanding of how the present invention is implemented in concrete application scenarios, the embodiments of the present invention are introduced below by way of two application scenario examples.
In Application Scenario Example 1, the speech and silence time periods of the dialogue scene are divided in advance by a first device and recorded in time period information, and a second device then determines from the time period information how the scene character's mouth shape is displayed while the dialogue scene runs. The first device may, for example, be the terminal equipment on which the developer writes the application, the server that provides the application, or the terminal device on which the user has installed the application client; the second device may, for example, be the terminal device on which the user has installed the application client. The implementation of Application Scenario Example 1 is shown in the flow chart of Fig. 3, a further embodiment of the image display method coordinated with audio of the present invention, which may, for example, comprise the following steps:
Step 301: in response to a time-period division instruction for a dialogue scene in the application, the first device obtains the audio waveform information of the dialogue scene.
Specifically, having obtained the audio of the dialogue scene, the first device can obtain the waveform information by parsing that audio.
Step 302: the first device divides the dialogue scene into speech time periods and silence time periods according to the audio waveform information.
Specifically, for any moment of the dialogue scene, if the waveform amplitude of the waveform information is greater than the first amplitude threshold the moment is assigned to a speech time period, and if it is less than the second amplitude threshold the moment is assigned to a silence time period. In addition, the speech time periods whose waveform amplitude exceeds the first amplitude threshold may be divided again according to the syllables corresponding to the waveform information, yielding speech time periods that each correspond to a different syllable. Furthermore, a speech time period that lies between two silence time periods and is shorter than the minimum interval may be merged with the silence periods before and after it into a single silence time period, and a silence time period that lies between two speech time periods and is shorter than the minimum interval may be merged with the speech periods before and after it into a single speech time period, so that all speech and silence time periods finally obtained are no shorter than the minimum interval.
Step 303: the first device records the speech and silence time periods of the dialogue scene in the time period information of the scene, and saves the time period information in the application in association with the dialogue scene.
Step 304: in response to a trigger instruction to run the dialogue scene, the second device calls up the time period information of the dialogue scene.
Step 305: while the dialogue scene is running, the second device determines in real time, from the time period information of the scene, whether the current moment belongs to a speech time period or a silence time period.
Step 306: in response to the current moment belonging to a speech time period, the second device dynamically displays the mouth shape of the scene character.
Specifically, if the syllable to which the speech time period corresponds can be determined from the time period information, the second device can display the character's mouth shape dynamically using the mouth-shape image configured in advance for that syllable.
Step 307: in response to the current moment belonging to a silence time period, the second device statically displays the mouth shape of the scene character.
With the technical solution of this embodiment, the displayed mouth shape of the scene character changes only while the character is speaking in the dialogue scene and remains unchanged while the character is silent, so the dialogue audio of the character matches the mouth-shape images at every stage of the dialogue scene and a more lifelike dialogue display is obtained.
In Application Scenario Example 2, a first device divides the dialogue scene into time periods in advance and generates from the division a video image file in which the display mode of the scene character's mouth shape is already configured for each time period; when the dialogue scene runs, a second device only needs to play the video image file and does not need to identify the speech and silence time periods. The first device may, for example, be the terminal equipment on which the developer writes the application, the server that provides the application, or the terminal device on which the user has installed the application client; the second device may, for example, be the terminal device on which the user has installed the application client. The implementation of Application Scenario Example 2 is shown in the flow chart of Fig. 4, another embodiment of the image display method coordinated with audio of the present invention, which may, for example, comprise the following steps:
Step 401: in response to a time-period division instruction for a dialogue scene in the application, the first device obtains the audio waveform information of the dialogue scene.
Specifically, having obtained the audio of the dialogue scene, the first device can obtain the waveform information by parsing that audio.
Step 402: the first device divides the dialogue scene into speech time periods and silence time periods according to the audio waveform information.
Specifically, for any moment of the dialogue scene, if the waveform amplitude of the waveform information is greater than the first amplitude threshold the moment is assigned to a speech time period, and if it is less than the second amplitude threshold the moment is assigned to a silence time period. In addition, the speech time periods whose waveform amplitude exceeds the first amplitude threshold may be divided again according to the syllables corresponding to the waveform information, yielding speech time periods that each correspond to a different syllable. Furthermore, a speech time period that lies between two silence time periods and is shorter than the minimum interval may be merged with the silence periods before and after it into a single silence time period, and a silence time period that lies between two speech time periods and is shorter than the minimum interval may be merged with the speech periods before and after it into a single speech time period, so that all speech and silence time periods finally obtained are no shorter than the minimum interval.
Step 403: the first device generates the video image file of the dialogue scene from the scene's speech and silence time periods, and saves the video image file in the application in association with the dialogue scene.
In the video image file, the mouth shape of the scene character is displayed statically during the silence time periods and dynamically during the speech time periods. Furthermore, if the speech time periods have been further divided into periods each corresponding to a different syllable, then within the speech time period of each syllable the video image file displays the character's mouth shape dynamically using the mouth-shape image configured for that syllable.
Step 404: in response to a trigger instruction to run the dialogue scene, the second device calls up and plays the video image file of the dialogue scene.
With the technical solution of this embodiment, the displayed mouth shape of the scene character changes only while the character is speaking in the dialogue scene and remains unchanged while the character is silent, so the dialogue audio of the character matches the mouth-shape images at every stage of the dialogue scene and a more lifelike dialogue display is obtained.
Example devices
Having described the method of an exemplary embodiment of the invention, the image display device coordinated with audio of an exemplary embodiment of the invention is next introduced with reference to Fig. 5.
Fig. 5 shows a structural diagram of one embodiment of the image display device coordinated with audio of the present invention. In this embodiment the device may, for example, comprise:
a running module 501 for running a dialogue scene;
a dynamic display module 502 for dynamically displaying the mouth shape of a scene character when the dialogue scene runs within a speech time period;
a static display module 503 for statically displaying the mouth shape of the scene character when the dialogue scene runs within a silence time period;
wherein the speech time period and the silence time period are obtained by dividing the dialogue scene according to the waveform information of the audio corresponding to the dialogue scene, within the speech time period the waveform amplitude of the waveform information is greater than a first amplitude threshold, within the silence time period the waveform amplitude of the waveform information is less than a second amplitude threshold, and the first amplitude threshold is not less than the second amplitude threshold.
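For illustration only, the three modules might be organised as in the sketch below; the class, the `renderer` collaborator and its two methods are assumptions, not something the patent defines.

```python
class AudioCoordinatedImageDisplay:
    """Sketch of the device's three modules described above (names illustrative)."""

    def __init__(self, periods, renderer):
        self.periods = periods    # pre-divided speech/silence time periods
        self.renderer = renderer  # assumed to expose the two calls used below

    def run(self, now):
        """Running module: drive the dialogue scene at playback time `now`."""
        kind = next((k for s, e, k in self.periods if s <= now < e), "silence")
        if kind == "speech":
            self.display_dynamically(now)
        else:
            self.display_statically()

    def display_dynamically(self, now):
        """Dynamic display module: switch among the mouth-shape pictures."""
        self.renderer.show_next_mouth_frame(now)

    def display_statically(self):
        """Static display module: hold a single mouth-shape picture."""
        self.renderer.hold_mouth_frame()
```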
Optionally, in some implementations of this embodiment, the speech time periods and silence time periods of the dialogue scene may be recorded in time period information configured in advance for the dialogue scene;
the dynamic display module 502 is then specifically configured to dynamically display the mouth shape of the scene character in response to determining, from the time period information while the dialogue scene is running, that the scene is currently running within a speech time period;
and the static display module 503 is specifically configured to statically display the mouth shape of the scene character in response to determining, from the time period information while the dialogue scene is running, that the scene is currently running within a silence time period.
Optionally, in other implementations of this embodiment, both the dynamic display and the static display of the scene character's mouth shape may be achieved by playing, while the dialogue scene is running, a video image file configured in advance for the dialogue scene; in the images of the video image file the character's mouth shape changes dynamically during the speech time periods and remains statically unchanged during the silence time periods.
Optionally, in still other implementations of this embodiment, the dynamic display module 502 may, for example, comprise:
a first dynamic display sub-module for dynamically displaying the mouth shape of the scene character using a first mouth shape when the dialogue scene runs within a first speech time period; and
a second dynamic display sub-module for dynamically displaying the mouth shape of the scene character using a second mouth shape when the dialogue scene runs within a second speech time period;
wherein the first speech time period and the second speech time period are obtained by dividing the speech time period according to the spoken syllables of the audio of the speech time period, the spoken syllable within the first speech time period being a first syllable and the spoken syllable within the second speech time period being a second syllable;
and wherein the first mouth shape differs in shape from the second mouth shape.
Optionally, in yet other implementations of this embodiment, every speech time period and every silence time period of the dialogue scene may be no shorter than a preset minimum interval.
With the technical solution of this embodiment, because a larger audio waveform amplitude indicates that the scene character is speaking and a smaller amplitude indicates that it is not, dynamically displaying the character's mouth shape during the speech time periods and statically displaying it during the silence time periods means that the displayed mouth shape changes only while the character is actually speaking and remains unchanged while the character is silent. The dialogue audio of the character therefore matches the mouth-shape images at every stage of the dialogue scene, producing a more lifelike dialogue display.
It should be noted that, although several modules or sub-modules of the image display device coordinated with audio have been described in detail above, this division is merely exemplary and not mandatory. Indeed, according to embodiments of the present invention, the features and functions of two or more of the modules described above may be embodied in a single module, and conversely the features and functions of one module described above may be further divided and embodied in multiple modules.
Furthermore, although the operations of the method of the invention are depicted in the drawings in a particular order, this does not require or imply that the operations must be performed in that particular order, or that all of the operations shown must be performed to achieve the desired result. Additionally or alternatively, certain steps may be omitted, multiple steps may be combined into one, and/or one step may be decomposed into multiple steps.
Although the spirit and principle of the present invention have been described with reference to several embodiments, it should be understood that the invention is not limited to the disclosed embodiments, and the division into aspects does not mean that features in those aspects cannot be combined to advantage; that division is made only for convenience of presentation. The present invention is intended to cover the various modifications and equivalent arrangements included within the spirit and scope of the appended claims.
Brief description of the drawings
By reading the following detailed description with reference to the accompanying drawings, the above and other objects, features and advantages of the exemplary embodiments of the present invention will become readily understood. In the drawings, several embodiments of the present invention are shown by way of example and not by way of limitation, in which:
Fig. 1 schematically shows a block diagram of an exemplary application scenario of an embodiment of the present invention;
Fig. 2 schematically shows a flowchart of one embodiment of the image display method cooperating with audio according to the present invention;
Fig. 3 schematically shows a flowchart of another embodiment of the image display method cooperating with audio according to the present invention;
Fig. 4 schematically shows a flowchart of a further embodiment of the image display method cooperating with audio according to the present invention;
Fig. 5 schematically shows a structural diagram of one embodiment of the image display device cooperating with audio according to the present invention.
In the drawings, identical or corresponding reference numerals denote identical or corresponding parts.
Detailed description of embodiments
The principles and spirit of the present invention are described below with reference to several illustrative embodiments. It should be understood that these embodiments are provided only so that those skilled in the art can better understand and thereby implement the present invention, and are not intended to limit the scope of the present invention in any way. Rather, these embodiments are provided so that this disclosure will be thorough and complete and will fully convey the scope of the disclosure to those skilled in the art.
Those skilled in the art will appreciate that embodiments of the present invention may be implemented as a system, apparatus, device, method or computer program product. Accordingly, the present disclosure may be embodied entirely in hardware, entirely in software (including firmware, resident software, microcode, etc.), or in a combination of hardware and software.
According to embodiments of the present invention, an image display method and device cooperating with audio are proposed.
Herein, it is to be understood that the term "dialogue scene" refers to a plot scene fragment in an application program that contains character dialogue. A "dialogue scene" may be realized by one file or by a group of files, and the application program runs the "dialogue scene" by calling the corresponding file(s). An application program containing a "dialogue scene" may be, for example, a game application or a computer simulation application, which the present invention does not limit. In addition, any number of elements in the drawings is illustrative rather than restrictive, and any naming is used only for distinction and carries no limitation whatsoever.
The principles and spirit of the present invention are explained in detail below with reference to several representative embodiments of the present invention.

Claims (10)

1. A method, comprising:
running a dialogue scene;
when the dialogue scene runs in a voice time period, dynamically displaying the mouth shape of a scene role;
when the dialogue scene runs in a mute time period, statically displaying the mouth shape of the scene role;
wherein the voice time period and the mute time period are obtained by dividing the dialogue scene according to waveform information of the audio corresponding to the dialogue scene, the waveform amplitude of the waveform information within the voice time period being greater than a first amplitude threshold and the waveform amplitude of the waveform information within the mute time period being less than a second amplitude threshold, and wherein the first amplitude threshold is not less than the second amplitude threshold.
2. The method according to claim 1, wherein the voice time period and the mute time period of the dialogue scene are recorded in time period information configured in advance for the dialogue scene;
the dynamically displaying the mouth shape of the scene role when the dialogue scene runs in a voice time period is specifically: in the course of running the dialogue scene, in response to determining from the time period information that the dialogue scene is currently running in a voice time period, dynamically displaying the mouth shape of the scene role;
and the statically displaying the mouth shape of the scene role when the dialogue scene runs in a mute time period is specifically: in the course of running the dialogue scene, in response to determining from the time period information that the dialogue scene is currently running in a mute time period, statically displaying the mouth shape of the scene role.
3. The method according to claim 1, wherein both the dynamic display of the scene role's mouth shape and the static display of the scene role's mouth shape are realized by playing, in the course of running the dialogue scene, a video image file configured in advance for the dialogue scene; in the images of the video image file within the voice time period, the mouth shape of the scene role changes dynamically; and in the images of the video image file within the mute time period, the mouth shape of the scene role remains static.
4. The method according to claim 1, wherein dynamically displaying the mouth shape of the scene role when the dialogue scene runs in a voice time period comprises:
when the dialogue scene runs in a first voice time period, dynamically displaying the mouth shape of the scene role using a first mouth shape;
when the dialogue scene runs in a second voice time period, dynamically displaying the mouth shape of the scene role using a second mouth shape;
wherein the first voice time period and the second voice time period are obtained by dividing the voice time period according to the pronunciation syllables of the audio corresponding to the voice time period, the pronunciation syllable within the first voice time period being a first pronunciation syllable and the pronunciation syllable within the second voice time period being a second pronunciation syllable;
and wherein the first mouth shape differs in shape from the second mouth shape.
5. The method according to claim 1, wherein every voice time period and every mute time period of the dialogue scene is no shorter than a preset minimum time interval.
6. A device, comprising:
a running module, configured to run a dialogue scene;
a dynamic display module, configured to dynamically display the mouth shape of a scene role when the dialogue scene runs in a voice time period;
a static display module, configured to statically display the mouth shape of the scene role when the dialogue scene runs in a mute time period;
wherein the voice time period and the mute time period are obtained by dividing the dialogue scene according to waveform information of the audio corresponding to the dialogue scene, the waveform amplitude of the waveform information within the voice time period being greater than a first amplitude threshold and the waveform amplitude of the waveform information within the mute time period being less than a second amplitude threshold, and wherein the first amplitude threshold is not less than the second amplitude threshold.
7. The device according to claim 6, wherein the voice time period and the mute time period of the dialogue scene are recorded in time period information configured in advance for the dialogue scene;
the dynamic display module is specifically configured to, in the course of running the dialogue scene, dynamically display the mouth shape of the scene role in response to determining from the time period information that the dialogue scene is currently running in a voice time period;
and the static display module is specifically configured to, in the course of running the dialogue scene, statically display the mouth shape of the scene role in response to determining from the time period information that the dialogue scene is currently running in a mute time period.
8. The device according to claim 6, wherein both the dynamic display of the scene role's mouth shape and the static display of the scene role's mouth shape are realized by playing, in the course of running the dialogue scene, a video image file configured in advance for the dialogue scene; in the images of the video image file within the voice time period, the mouth shape of the scene role changes dynamically; and in the images of the video image file within the mute time period, the mouth shape of the scene role remains static.
9. The device according to claim 6, wherein the dynamic display module comprises:
a first dynamic display submodule, configured to dynamically display the mouth shape of the scene role using a first mouth shape when the dialogue scene runs in a first voice time period;
a second dynamic display submodule, configured to dynamically display the mouth shape of the scene role using a second mouth shape when the dialogue scene runs in a second voice time period;
wherein the first voice time period and the second voice time period are obtained by dividing the voice time period according to the pronunciation syllables of the audio within the voice time period, the pronunciation syllable within the first voice time period being a first pronunciation syllable and the pronunciation syllable within the second voice time period being a second pronunciation syllable;
and wherein the first mouth shape differs in shape from the second mouth shape.
10. The device according to claim 6, wherein every voice time period and every mute time period of the dialogue scene is no shorter than a preset minimum time interval.
CN201510279742.7A 2015-05-27 2015-05-27 A kind of method for displaying image and equipment of cooperation audio Active CN104869326B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201510279742.7A CN104869326B (en) 2015-05-27 2015-05-27 A kind of method for displaying image and equipment of cooperation audio

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201510279742.7A CN104869326B (en) 2015-05-27 2015-05-27 A kind of method for displaying image and equipment of cooperation audio

Publications (2)

Publication Number Publication Date
CN104869326A true CN104869326A (en) 2015-08-26
CN104869326B CN104869326B (en) 2018-09-11

Family

ID=53914807

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201510279742.7A Active CN104869326B (en) 2015-05-27 2015-05-27 A kind of method for displaying image and equipment of cooperation audio

Country Status (1)

Country Link
CN (1) CN104869326B (en)

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20050207582A1 (en) * 2004-03-17 2005-09-22 Kohei Asada Test apparatus, test method, and computer program
CN1731833A (en) * 2005-08-23 2006-02-08 孙丹 Method for composing audio/video file by voice driving head image
CN101482976A (en) * 2009-01-19 2009-07-15 腾讯科技(深圳)有限公司 Method for driving change of lip shape by voice, method and apparatus for acquiring lip cartoon
CN101751692A (en) * 2009-12-24 2010-06-23 四川大学 Method for voice-driven lip animation
US20140314391A1 (en) * 2013-03-18 2014-10-23 Samsung Electronics Co., Ltd. Method for displaying image combined with playing audio in an electronic device
CN104144280A (en) * 2013-05-08 2014-11-12 上海恺达广告有限公司 Voice and action animation synchronous control and device of electronic greeting card
CN104574478A (en) * 2014-12-30 2015-04-29 北京像素软件科技股份有限公司 Method and device for editing mouth shapes of animation figures

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109168067A (en) * 2018-11-02 2019-01-08 深圳Tcl新技术有限公司 Video timing correction method, correction terminal and computer readable storage medium
CN109600628A (en) * 2018-12-21 2019-04-09 广州酷狗计算机科技有限公司 Video creating method, device, computer equipment and storage medium
CN113421543A (en) * 2021-06-30 2021-09-21 深圳追一科技有限公司 Data labeling method, device and equipment and readable storage medium
CN113421543B (en) * 2021-06-30 2024-05-24 深圳追一科技有限公司 Data labeling method, device, equipment and readable storage medium
CN113660537A (en) * 2021-09-28 2021-11-16 北京七维视觉科技有限公司 Subtitle generating method and device
CN117714763A (en) * 2024-02-05 2024-03-15 深圳市鸿普森科技股份有限公司 Virtual object speaking video generation method and device, electronic equipment and storage medium

Also Published As

Publication number Publication date
CN104869326B (en) 2018-09-11

Similar Documents

Publication Publication Date Title
US11158102B2 (en) Method and apparatus for processing information
CN104869326A (en) Image display method for cooperating with audios and equipment thereof
US11183187B2 (en) Dialog method, dialog system, dialog apparatus and program that gives impression that dialog system understands content of dialog
JP2019102063A (en) Method and apparatus for controlling page
CN107423364B (en) Method, device and storage medium for answering operation broadcasting based on artificial intelligence
Lee et al. MMDAgent—A fully open-source toolkit for voice interaction systems
US11308671B2 (en) Method and apparatus for controlling mouth shape changes of three-dimensional virtual portrait
CN109447234A (en) A kind of model training method, synthesis are spoken the method and relevant apparatus of expression
US20180247443A1 (en) Emotional analysis and depiction in virtual reality
CN112309365B (en) Training method and device of speech synthesis model, storage medium and electronic equipment
CN109754783A (en) Method and apparatus for determining the boundary of audio sentence
JP2023525173A (en) Conversational AI platform with rendered graphical output
KR20190109651A (en) Voice imitation conversation service providing method and sytem based on artificial intelligence
CN110136715A (en) Audio recognition method and device
Yamamoto et al. Voice interaction system with 3D-CG virtual agent for stand-alone smartphones
JP2008125815A (en) Conversation robot system
CN111768759A (en) Method and apparatus for generating information
Umetani et al. Scalable component-based manzai robots as automated funny content generators
CN111128120B (en) Text-to-speech method and device
CN113707124A (en) Linkage broadcasting method and device of voice operation, electronic equipment and storage medium
US11848011B1 (en) Systems and methods for language translation during live oral presentation
KR102153220B1 (en) Method for outputting speech recognition results based on determination of sameness and appratus using the same
CN114201596A (en) Virtual digital human use method, electronic device and storage medium
CN112289298A (en) Processing method and device for synthesized voice, storage medium and electronic equipment
CN113157241A (en) Interaction equipment, interaction device and interaction system

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
EXSB Decision made by sipo to initiate substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant