CN108234735A - Media display method and terminal - Google Patents
- Publication number
- CN108234735A CN108234735A CN201611154485.5A CN201611154485A CN108234735A CN 108234735 A CN108234735 A CN 108234735A CN 201611154485 A CN201611154485 A CN 201611154485A CN 108234735 A CN108234735 A CN 108234735A
- Authority
- CN
- China
- Prior art keywords
- phonetic
- image
- mouth shape
- pronunciation mouth
- mapping table
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Withdrawn
Classifications
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04M—TELEPHONIC COMMUNICATION
- H04M1/00—Substation equipment, e.g. for use by subscribers
- H04M1/72—Mobile telephones; Cordless telephones, i.e. devices for establishing wireless links to base stations without route selection
- H04M1/724—User interfaces specially adapted for cordless or mobile telephones
- H04M1/72484—User interfaces specially adapted for cordless or mobile telephones wherein functions are triggered by incoming communication events
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V40/00—Recognition of biometric, human-related or animal-related patterns in image or video data
- G06V40/10—Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
- G06V40/16—Human faces, e.g. facial parts, sketches or expressions
- G06V40/161—Detection; Localisation; Normalisation
- G06V40/166—Detection; Localisation; Normalisation using acquisition arrangements
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V40/00—Recognition of biometric, human-related or animal-related patterns in image or video data
- G06V40/10—Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
- G06V40/16—Human faces, e.g. facial parts, sketches or expressions
- G06V40/168—Feature extraction; Face representation
- G06V40/171—Local features and components; Facial parts ; Occluding parts, e.g. glasses; Geometrical relationships
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04M—TELEPHONIC COMMUNICATION
- H04M1/00—Substation equipment, e.g. for use by subscribers
- H04M1/72—Mobile telephones; Cordless telephones, i.e. devices for establishing wireless links to base stations without route selection
- H04M1/724—User interfaces specially adapted for cordless or mobile telephones
- H04M1/72403—User interfaces specially adapted for cordless or mobile telephones with means for local support of applications that increase the functionality
- H04M1/7243—User interfaces specially adapted for cordless or mobile telephones with means for local support of applications that increase the functionality with interactive means for internal management of messages
- H04M1/72439—User interfaces specially adapted for cordless or mobile telephones with means for local support of applications that increase the functionality with interactive means for internal management of messages for image or video messaging
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04M—TELEPHONIC COMMUNICATION
- H04M1/00—Substation equipment, e.g. for use by subscribers
- H04M1/72—Mobile telephones; Cordless telephones, i.e. devices for establishing wireless links to base stations without route selection
- H04M1/725—Cordless telephones
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N7/00—Television systems
- H04N7/14—Systems for two-way working
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N7/00—Television systems
- H04N7/14—Systems for two-way working
- H04N7/141—Systems for two-way working between two video terminals, e.g. videophone
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04M—TELEPHONIC COMMUNICATION
- H04M2250/00—Details of telephonic subscriber devices
- H04M2250/74—Details of telephonic subscriber devices with voice recognition means
Abstract
The present invention provides a media display method and terminal. The method includes: acquiring a target call voice; obtaining, according to the target call voice, at least one frame of target media display image matching the target call voice; and playing the at least one frame of target media display image on a display interface. This scheme accompanies the call with virtualized video pictures, making the call livelier and more vivid, strengthening communication and adding enjoyment to it.
Description
Technical field
The present invention relates to the field of communication technology, and in particular to a media display method and terminal.
Background technology
During telephone calls, a traditional feature phone can only transmit communication by voice; a traditional call has no concrete video scene of the speaker, and a voice call is far less vivid and concrete than a video call.
An existing mobile phone can join a network environment via a SIM card or wireless WiFi and make video calls. However, video calling generally depends on wireless networking, and in practice wireless network coverage is not available anywhere at any time, while video calling over a phone-card data connection is expensive. Constrained by and confined to the network environment, video calling cannot become the norm of communication; and when people talk on the phone, a voice-only exchange is not flexible or lively enough, so the user experience is insufficient.
Summary of the invention
Embodiments of the present invention provide a media display method and terminal, to solve the prior-art problems that a traditional call has no concrete video scene, while network video calling is constrained by the network environment and is costly.
To solve the above technical problem, embodiments of the present invention adopt the following technical solutions.
In one aspect, an embodiment of the present invention provides a media display method, including:
acquiring a target call voice;
obtaining, according to the target call voice, at least one frame of target media display image matching the target call voice; and
playing the at least one frame of target media display image on a display interface.
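The acquire-match-play flow above can be sketched minimally as follows. The `Frame` type, the character-keyed matching rule, and the image names are invented placeholders for illustration, not part of this document.

```python
# Minimal illustration of the acquire -> match -> play flow.
from dataclasses import dataclass

@dataclass
class Frame:
    name: str  # which mouth shape or expression this frame depicts

def match_media_frames(voice_text, library):
    """Obtain at least one display frame matching the captured voice."""
    return [library[ch] for ch in voice_text if ch in library]

def play_frames(frames):
    """Stand-in for playing frames on the display interface."""
    return [f.name for f in frames]

library = {"a": Frame("open-mouth"), "o": Frame("round-mouth")}
shown = play_frames(match_media_frames("ao", library))
```

The placeholder matching rule selects one frame per recognized character; the claims below refine this into voiceprint-based matching.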
Optionally, the obtaining, according to the target call voice, at least one frame of target media display image matching the target call voice includes:
determining a target expression pack corresponding to the target call voice;
determining, according to a voiceprint feature of the target call voice, at least one target pronunciation mouth shape needed to utter the target call voice; and
matching, from the target expression pack, at least one frame of target media display image containing the target pronunciation mouth shape.
Optionally, the determining, according to the voiceprint feature of the target call voice, at least one target pronunciation mouth shape needed to utter the target call voice includes:
determining, according to the voiceprint feature of the target call voice, text content corresponding to the target call voice;
matching, according to the text content, a corresponding pinyin combination from a text-to-pinyin mapping table;
matching, according to the pinyin combination, at least one corresponding pronunciation mouth shape from a pinyin-to-mouth-shape mapping table; and
determining that the at least one pronunciation mouth shape is the at least one target pronunciation mouth shape needed to utter the target call voice.
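A toy rendering of the two mapping tables this clause relies on. The table entries and image filenames are invented examples; a real system would load complete character-to-pinyin and pinyin-to-mouth-shape dictionaries.

```python
# Two-table chain: text -> pinyin combination -> pronunciation mouth shapes.
WORD_TO_PINYIN = {"你": "ni", "好": "hao"}            # text -> pinyin table
PINYIN_TO_MOUTH = {"ni": "mouth_ni.png",              # pinyin -> mouth shape
                   "hao": "mouth_hao.png"}

def mouth_shapes_for_text(text: str):
    """Chain the two mapping tables for recognized text."""
    pinyin_combo = [WORD_TO_PINYIN[ch] for ch in text if ch in WORD_TO_PINYIN]
    return [PINYIN_TO_MOUTH[p] for p in pinyin_combo if p in PINYIN_TO_MOUTH]
```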
Optionally, the determining, according to the voiceprint feature of the target call voice, at least one target pronunciation mouth shape needed to utter the target call voice includes:
determining, according to the voiceprint feature of the target call voice, text content corresponding to the target call voice;
matching, according to the text content, a phonetic-transcription combination corresponding to the text content from a text-to-transcription mapping table;
matching, according to the transcription combination, at least one corresponding pronunciation mouth shape from a transcription-to-mouth-shape mapping table; and
determining that the at least one pronunciation mouth shape is the at least one target pronunciation mouth shape needed to utter the target call voice.
Optionally, the text-to-transcription mapping table includes a Chinese-character-to-pinyin mapping table, and the matching, according to the text content, a transcription combination corresponding to the text content from the text-to-transcription mapping table includes:
matching, according to the text content, a pinyin combination corresponding to the text content from the Chinese-character-to-pinyin mapping table.
The transcription-to-mouth-shape mapping table includes a pinyin-to-mouth-shape mapping table, and the matching, according to the transcription combination, at least one corresponding pronunciation mouth shape from the transcription-to-mouth-shape mapping table includes:
matching, according to the pinyin combination, at least one corresponding pronunciation mouth shape from the pinyin-to-mouth-shape mapping table.
Optionally, the pinyin-to-mouth-shape mapping table includes correspondences between pronunciation mouth shapes and the initials and finals of pinyin, and the matching, according to the pinyin combination, at least one corresponding pronunciation mouth shape from the pinyin-to-mouth-shape mapping table includes:
determining, according to the pinyin combination, the initial and final contained in the pinyin combination, or determining the final contained in the pinyin combination; and
matching, according to the initial and final, or according to the final alone, at least one corresponding pronunciation mouth shape from the correspondences between initials/finals and mouth shapes.
Optionally, the text-to-transcription mapping table includes an English-to-phonetic-symbol mapping table, and the matching, according to the text content, a transcription combination corresponding to the text content from the text-to-transcription mapping table includes:
matching, according to the text content, a phonetic-symbol combination corresponding to the text content from the English-to-phonetic-symbol mapping table.
The transcription-to-mouth-shape mapping table includes a phonetic-symbol-to-mouth-shape mapping table, and the matching, according to the transcription combination, at least one corresponding pronunciation mouth shape from the transcription-to-mouth-shape mapping table includes:
matching, according to the phonetic-symbol combination, at least one corresponding pronunciation mouth shape from the phonetic-symbol-to-mouth-shape mapping table.
Optionally, the phonetic-symbol-to-mouth-shape mapping table includes correspondences between pronunciation mouth shapes and the vowels and consonants of the phonetic symbols, and the matching, according to the phonetic-symbol combination, at least one corresponding pronunciation mouth shape from the phonetic-symbol-to-mouth-shape mapping table includes:
determining, according to the phonetic-symbol combination, the vowels and consonants contained in the combination, or determining the vowels contained in the combination; and
matching, according to the vowels and consonants, or according to the vowels alone, at least one corresponding pronunciation mouth shape from the correspondences between vowels/consonants and mouth shapes.
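For the English branch, a sketch of scanning a phonetic-symbol combination for vowels and consonants (or, per the alternative branch, vowels alone) and mapping each to a mouth shape. The symbol set and image names are invented placeholders.

```python
# Phonetic-symbol combination -> mouth shapes, with a vowels-only mode.
VOWELS = {"ɪ", "æ", "ə", "iː"}
SYMBOL_TO_MOUTH = {"ɪ": "mouth_ih.png", "æ": "mouth_ae.png", "k": "mouth_k.png"}

def mouths_for_phonetics(symbols, include_consonants=True):
    """Map each relevant symbol in the combination to its mouth shape."""
    out = []
    for s in symbols:
        if (s in VOWELS or include_consonants) and s in SYMBOL_TO_MOUTH:
            out.append(SYMBOL_TO_MOUTH[s])
    return out
```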
Optionally, the determining a target expression pack corresponding to the target call voice includes:
determining the target contact corresponding to the target call voice; and
retrieving the target expression pack pre-associated with the target contact.
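The pre-association in this clause can be as simple as a keyed lookup with a fallback pack; the phone numbers and pack names below are hypothetical illustrations.

```python
# Contact -> pre-associated expression pack, with a default fallback.
CONTACT_PACKS = {"+8613800000000": "mom_pack",
                 "+8613900000000": "friend_pack"}

def pack_for_contact(number: str, default: str = "default_pack") -> str:
    """Retrieve the expression pack pre-associated with the calling contact."""
    return CONTACT_PACKS.get(number, default)
```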
Optionally, the acquiring a target call voice includes:
monitoring a voice call; and
determining that the received call voice of the other party is the target call voice.
Optionally, before the step of acquiring a target call voice, the method further includes:
obtaining a material resource pack and a personal image of a call contact, the material resource pack containing at least one media material image; and
integrating the personal image of the call contact with each media material image to generate at least one media display image, obtaining an expression pack containing the at least one media display image.
Optionally, the personal image includes a personal face image, the media material images include mouth-shape images corresponding to the initials and finals of pinyin, or mouth-shape images corresponding to the vowels and consonants of English phonetic symbols, and the integrating the personal image of the call contact with each media material image to generate at least one media display image, obtaining an expression pack containing the at least one media display image, includes:
identifying the mouth region in the face image;
filling the mouth region with replacements taken from the mouth-shape images in the media material images; and
generating a media display image corresponding to each mouth-shape image in the media material images, obtaining an expression pack containing a media display image corresponding to each mouth-shape image.
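A toy version of this integration step: a 2-D character grid stands in for the contact's face image, the mouth region is given explicitly (a real implementation would locate it via face detection, which is not specified here), and each mouth-shape patch is pasted in to yield one display frame per mouth shape.

```python
# Paste mouth-shape patches into a face "image" to build an expression pack.
def paste_region(face, region, patch):
    """Return a copy of `face` with `patch` pasted at (top, left)."""
    top, left = region
    frame = [row[:] for row in face]  # copy so the base face can be reused
    for r, prow in enumerate(patch):
        for c, px in enumerate(prow):
            frame[top + r][left + c] = px
    return frame

def build_expression_pack(face, mouth_region, mouth_patches):
    """One media display image per mouth-shape patch."""
    return [paste_region(face, mouth_region, p) for p in mouth_patches]

face = [["f"] * 3 for _ in range(3)]      # 3x3 stand-in face image
patches = [[["o"]], [["-"]]]              # two 1x1 mouth-shape "images"
pack = build_expression_pack(face, (2, 1), patches)
```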
Optionally, the playing the at least one frame of target media display image on the display interface includes:
playing, through the display interface, the at least one frame of target media display image with a captured display image as the background picture.
In another aspect, an embodiment of the present invention also provides a media display terminal, including:
an acquisition module for acquiring a target call voice;
a first obtaining module for obtaining, according to the target call voice, at least one frame of target media display image matching the target call voice; and
a display module for playing the at least one frame of target media display image on a display interface.
Optionally, the first obtaining module includes:
a first determination submodule for determining a target expression pack corresponding to the target call voice;
a second determination submodule for determining, according to a voiceprint feature of the target call voice, at least one target pronunciation mouth shape needed to utter the target call voice; and
an obtaining submodule for matching, from the target expression pack, at least one frame of target media display image containing the target pronunciation mouth shape.
Optionally, the second determination submodule includes:
a first determination unit for determining, according to the voiceprint feature of the target call voice, text content corresponding to the target call voice;
a first matching unit for matching, according to the text content, a corresponding pinyin combination from a text-to-pinyin mapping table;
a second matching unit for matching, according to the pinyin combination, at least one corresponding pronunciation mouth shape from a pinyin-to-mouth-shape mapping table; and
a second determination unit for determining that the at least one pronunciation mouth shape is the at least one target pronunciation mouth shape needed to utter the target call voice.
Optionally, the pinyin-to-mouth-shape mapping table includes correspondences between pronunciation mouth shapes and the initials and finals of pinyin, and the second matching unit includes:
a determination subunit for determining, according to the pinyin combination, the initial and/or final contained in the pinyin combination; and
a matching subunit for matching, according to the initial and/or final, at least one corresponding pronunciation mouth shape from the correspondences between initials/finals and mouth shapes.
Optionally, the first determination submodule includes:
a third determination unit for determining the target contact corresponding to the target call voice; and
a retrieval unit for retrieving the target expression pack pre-associated with the target contact.
Optionally, the acquisition module includes:
a monitoring submodule for monitoring a voice call; and
a third determination submodule for determining that the received call voice of the other party is the target call voice.
Optionally, the terminal further includes:
a second obtaining module for obtaining a material resource pack and a personal image of a call contact, the material resource pack containing at least one media material image; and
a generation module for integrating the personal image of the call contact with each media material image to generate at least one media display image, obtaining an expression pack containing the at least one media display image.
Optionally, the personal image includes a personal face image, the media material images include mouth-shape images corresponding to the initials and finals of pinyin, or mouth-shape images corresponding to the vowels and consonants of English phonetic symbols, and the generation module includes:
an identification submodule for identifying the mouth region in the face image;
a replacement submodule for filling the mouth region with replacements taken from the mouth-shape images in the media material images; and
a generation submodule for generating a media display image corresponding to each mouth-shape image in the media material images, obtaining an expression pack containing a media display image corresponding to each mouth-shape image.
Optionally, the display module includes:
a display submodule for playing, through the display interface, the at least one frame of target media display image with a captured display image as the background picture.
One or more embodiments of the present invention have the following beneficial effects:
In the embodiments of the present invention, at least one frame of target media display image is obtained by matching against the collected call voice, and those target media display images are played continuously to form a video-style playing effect, realizing speech recognition converted into a video display. Through a local software application mode of the terminal device, the limitation of the network environment can be broken through and a virtual video telephone call carried out. The process does not depend on the network, saving data traffic or even escaping traffic restrictions, and it accompanies the call with virtualized video pictures, making the call livelier and more vivid, strengthening communication, adding enjoyment and improving the user experience.
Description of the drawings
The technical solutions in the embodiments of the present invention will be described clearly and completely below with reference to the accompanying drawings in the embodiments of the present invention. Obviously, the described embodiments are only some, rather than all, of the embodiments of the present invention. All other embodiments obtained by those of ordinary skill in the art based on the embodiments of the present invention without creative effort shall fall within the protection scope of the present invention.
Fig. 1 is a schematic flowchart of the media display method in the first embodiment of the present invention;
Fig. 2 is a schematic flowchart of the media display method in the second embodiment of the present invention;
Fig. 3 is a schematic structural diagram of the media display terminal in the third embodiment of the present invention;
Fig. 4 is a schematic diagram of an operation template in an embodiment of the present invention.
Specific embodiment
The technical solutions in the embodiments of the present invention will be described clearly and completely below with reference to the accompanying drawings in the embodiments of the present invention. Obviously, the described embodiments are only some, rather than all, of the embodiments of the present invention. All other embodiments obtained by those of ordinary skill in the art based on the embodiments of the present invention without creative effort shall fall within the protection scope of the present invention.
First embodiment
An embodiment of the present invention discloses a media display method which, referring to Fig. 1, includes:
Step 101: acquire a target call voice.
This step may occur during a normal telephone call, or during voice playback in an instant-messaging application such as WeChat. The target call voice may be the user's own voice in the call or the voice of the contact the user is talking with; the target call voice is the object of the subsequent processing.
When applied to a normal telephone call, as a specific embodiment, the step of acquiring a target call voice may include: monitoring the voice call; and determining that the received call voice of the other party is the target call voice. That is, the call voice of the other contact in the call is acquired and put through the subsequent processing, realizing an entertaining call.
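The "monitor the call, keep the other party's voice" step can be sketched as filtering a tagged chunk stream; the `(source, chunk)` tuple format is an assumption made purely for illustration.

```python
# Keep only the remote party's chunks from a monitored call stream.
def target_voice_chunks(call_stream):
    """Yield only the other party's chunks; these form the target call voice."""
    for source, chunk in call_stream:
        if source == "remote":
            yield chunk
```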
Step 102: according to the target call voice, obtain at least one frame of target media display image matching the target call voice.
The target media display image may be matched and selected from a preset group of image packs. The at least one frame of target media display image may match the dialogue content of the target call voice, or match its tone, its volume, or the mouth shape needed to utter it. The target media display image may be an expression, a limb action, symbols of different sizes, or a place and background environment corresponding to the voice content.
Step 103: play the at least one frame of target media display image on a display interface.
Correspondingly, while the at least one frame of target media display image is played continuously, not only can the expression change with the voice, but limb actions can also be switched with the voice.
In this process, at least one frame of target media display image is obtained by matching against the collected call voice, and those target media display images are played to form a video-style playing effect, realizing speech recognition converted into a video display. Through a local software application mode of the terminal device, the limitation of the network environment can be broken through and a virtual video telephone call carried out. The process does not depend on the network, saving data traffic or even escaping traffic restrictions, and it accompanies the call with virtualized video pictures, making the call livelier and more vivid, strengthening communication, adding enjoyment and improving the user experience.
Second embodiment
An embodiment of the present invention discloses a media display method which, referring to Fig. 2, includes:
Step 201: acquire a target call voice.
When applied to a normal telephone call, as a specific embodiment, the step of acquiring a target call voice may include: monitoring the voice call; and determining that the received call voice of the other party is the target call voice. That is, the call voice of the other contact in the call is acquired and put through the subsequent processing, realizing an entertaining call.
Step 202: determine a target expression pack corresponding to the target call voice.
The target expression pack may be a preset fixed expression pack matching different target call voices, determined and read directly from the storage device; or it may be an expression pack that varies with specific elements in the target call voice and must be determined by matching against the target call voice.
As a specific embodiment, the step of determining a target expression pack corresponding to the target call voice includes:
determining the target contact corresponding to the target call voice; and retrieving the target expression pack pre-associated with the target contact. The expression pack may hold resource content that establishes a correspondence with the target contact. For example, when a contact calls the user's mobile phone, the phone can learn that the incoming target call voice was uttered by that contact, and accordingly adapt the expression pack associated with the calling contact according to the contact-list information. Specifically, this may be a photo of the target contact or a particular picture; making different settings for specific contacts makes the displayed result more targeted and more interesting.
Step 203: according to a voiceprint feature of the target call voice, determine at least one target pronunciation mouth shape needed to utter the target call voice.
Different call voices correspond to different pronunciation mouth shapes, and different voices carry different voiceprint feature information; specifically, the required pronunciation mouth shapes can be determined from the voiceprint features of the collected target call voice.
As a specific embodiment, the step of determining, according to the voiceprint feature of the target call voice, at least one target pronunciation mouth shape needed to utter the target call voice includes:
determining, according to the voiceprint feature of the target call voice, text content corresponding to the target call voice; matching, according to the text content, a phonetic-transcription combination corresponding to the text content from a text-to-transcription mapping table; matching, according to the transcription combination, at least one corresponding pronunciation mouth shape from a transcription-to-mouth-shape mapping table; and determining that the at least one pronunciation mouth shape is the at least one target pronunciation mouth shape needed to utter the target call voice.
Here the phonetic transcription is the notation that marks how text is pronounced. Different languages correspond to different text, and different text corresponds to different transcription systems: for example, when the text is Chinese, the transcription is pinyin; when the text is English, the transcription is English phonetic symbols. Determining the corresponding pronunciation mouth shapes from the voiceprint features of the target call voice requires first converting the speech into text, then matching the text to its corresponding transcription combination, and then matching that transcription combination to the corresponding target pronunciation mouth shapes.
Specifically, this is implemented as follows. Voice is received, the voice data is quantized, and an open-source interface is called to convert the voice into text. The principle is that different voices carry different voiceprint feature information: voiceprint feature information can be pre-recorded and checked against text to build a database correspondence, and newly captured speech is compared with the pre-recorded database to find the corresponding text. Converting text into a transcription works the same way: the contrast relationships between different transcriptions and different texts are pre-recorded, written into arrays, and stored as database correspondences. After the voice is converted into text, the text is looked up in the database to find its transcription; the transcription is then looked up in the preset transcription-to-mouth-shape correspondence table in the database to obtain the target pronunciation mouth shape corresponding to the transcription. The transcription is split and adapted to facial expression images, which are switched rapidly for display, generating a virtual-video effect.
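The flow just described can be condensed into one hedged end-to-end sketch: voice to text (stubbed here; the description mentions calling an open-source speech interface), text to transcription via one lookup table, and transcription to display frame via another. All table entries and frame names are invented.

```python
# End-to-end sketch: voice -> text -> transcription -> display frames.
TEXT_TO_PHONETIC = {"你": "ni", "好": "hao"}
PHONETIC_TO_FRAME = {"ni": "frame_ni", "hao": "frame_hao"}

def speech_to_text_stub(voice: str) -> str:
    # Stand-in for the pre-recorded voiceprint-database comparison.
    return voice

def virtual_video_frames(voice: str):
    """Frames to switch rapidly for the virtual-video effect."""
    text = speech_to_text_stub(voice)
    frames = []
    for ch in text:
        ph = TEXT_TO_PHONETIC.get(ch)
        if ph in PHONETIC_TO_FRAME:
            frames.append(PHONETIC_TO_FRAME[ph])
    return frames
```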
On the one hand, when the text content determined from the target call voice is Chinese characters, as a specific embodiment, the text-to-transcription mapping table includes a Chinese-character-to-pinyin mapping table, and the matching, according to the text content, a transcription combination corresponding to the text content from the text-to-transcription mapping table includes:
matching, according to the text content, a pinyin combination corresponding to the text content from the Chinese-character-to-pinyin mapping table.
Correspondingly, the transcription-to-mouth-shape mapping table includes a pinyin-to-mouth-shape mapping table, and the matching, according to the transcription combination, at least one corresponding pronunciation mouth shape from the transcription-to-mouth-shape mapping table includes: matching, according to the pinyin combination, at least one corresponding pronunciation mouth shape from the pinyin-to-mouth-shape mapping table.
The process of converting Chinese characters to pinyin works the same way: the correspondence between pinyin and Chinese characters is recorded in advance, the standard GBK character set together with its pinyin annotation table is read into the database, and the data is written into an array to generate the database correspondence. After the voice is converted into Chinese characters, the characters are looked up against the database to find their pinyin; the preset mapping table between pinyin and pronunciation mouth shapes in the database is then consulted to obtain the target pronunciation mouth shape corresponding to the pinyin. The pinyin is split and matched to expression images, which are switched rapidly for display, producing a virtual video effect.
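The character-to-pinyin lookup described above can be sketched as a simple table-driven mapping. The sample entries below are illustrative placeholders, not the actual GBK annotation data described in the patent:

```python
# Hypothetical excerpt of the Chinese-character-to-pinyin mapping table;
# a real system would load the full GBK character set with its pinyin
# annotations into the database, as the text describes.
HANZI_TO_PINYIN = {
    "你": "ni",
    "好": "hao",
    "天": "tian",
}

def text_to_pinyin(text):
    """Look up each recognized character to form the pinyin combination."""
    return [HANZI_TO_PINYIN[ch] for ch in text if ch in HANZI_TO_PINYIN]

print(text_to_pinyin("你好"))  # ['ni', 'hao']
```

The matched pinyin combination is then used as the key into the pinyin-to-mouth-shape table in the same fashion.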
As a specific embodiment, the mapping table of pinyin and pronunciation mouth shapes includes the correspondence between the initials and finals of pinyin and pronunciation mouth shapes. The step of matching, according to the pinyin combination, at least one corresponding pronunciation mouth shape from the mapping table of pinyin and pronunciation mouth shapes includes: according to the pinyin combination, determining the initials and finals contained in the pinyin combination, or determining only the finals contained in it; and, according to the initials and finals, or according to the finals alone, matching and obtaining at least one corresponding pronunciation mouth shape from the correspondence between initials and finals and pronunciation mouth shapes.
In this process, after a pinyin combination corresponding to the word content is matched from the mapping table of Chinese characters and pinyin, the pinyin in the array structure is located and split into an initial and a final. Since some Chinese character pronunciations correspond only to a single final, the determined pinyin combination may contain both initials and finals, or finals alone. The expression images adapted to the mouth shapes corresponding to the initials and finals are then switched in the display window, rapidly alternating expressions to produce a virtual video display. In practice, the mapping table of pinyin and pronunciation mouth shapes may also be imported or downloaded.
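The initial/final split described here can be sketched as a longest-prefix match against the standard pinyin initials; a syllable such as "an" carries no initial and yields only a final. This is an illustrative sketch, not the patent's implementation:

```python
# Split a pinyin syllable into initial (shengmu) and final (yunmu).
# Two-letter initials ("zh", "ch", "sh") are tried before single letters
# so the longest prefix wins.
INITIALS = ["zh", "ch", "sh", "b", "p", "m", "f", "d", "t", "n", "l",
            "g", "k", "h", "j", "q", "x", "r", "z", "c", "s", "y", "w"]

def split_pinyin(syllable):
    for ini in INITIALS:
        if syllable.startswith(ini):
            return ini, syllable[len(ini):]
    return None, syllable  # final-only syllable, e.g. "an"

print(split_pinyin("zhang"))  # ('zh', 'ang')
print(split_pinyin("an"))     # (None, 'an')
```

Each resulting initial and final then indexes its own mouth-shape image in the resource library.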
Specifically, this scheme is described with respect to a phone call on a mobile phone, for realizing virtual video. A mouth-shape resource library for the Chinese pinyin alphabet is provided in the storage device. In this standard mouth-shape library, collected in advance, each mouth-shape image corresponds to the pronunciation of one pinyin letter, and the initials and finals in the mouth-shape images correspond one-to-one with the pinyin table. In this way, every pronunciation can be associated with the corresponding combination of initial and final images found in the mouth-shape library, so that the displayed pronunciation mouth shapes are consistent with the actual mouth movements of the speaking party.
On the other hand, when the word content corresponding to the destination call voice is determined to be English, as a specific implementation, the mapping table of words and phonetic notation includes a mapping table of English words and English phonetic symbols. The step of matching, according to the word content, a phonetic notation combination corresponding to the word content from the mapping table of words and phonetic notation then includes: according to the word content, matching a phonetic symbol combination corresponding to the word content from the mapping table of English words and English phonetic symbols.
Accordingly, the mapping table of phonetic notation and pronunciation mouth shapes includes a mapping table of English phonetic symbols and pronunciation mouth shapes, and the step of matching, according to the phonetic notation combination, at least one corresponding pronunciation mouth shape from the mapping table of phonetic notation and pronunciation mouth shapes includes: according to the phonetic symbol combination, matching at least one corresponding pronunciation mouth shape from the mapping table of English phonetic symbols and pronunciation mouth shapes.
The process of converting English words to phonetic symbols works the same way: the correspondence between English phonetic symbols and English words is recorded in advance and written into an array to generate the database correspondence. After the voice is converted into English words, the words are looked up against the database to find their phonetic symbols; the preset mapping table between English phonetic symbols and pronunciation mouth shapes in the database is then consulted to obtain the target pronunciation mouth shape corresponding to the phonetic symbols. The phonetic symbols are split and matched to expression images, which are switched rapidly for display, producing a virtual video effect.
As a specific embodiment, the mapping table of English phonetic symbols and pronunciation mouth shapes includes the correspondence between the vowels and consonants of the English phonetic symbols and pronunciation mouth shapes. The step of matching, according to the phonetic symbol combination, at least one corresponding pronunciation mouth shape from the mapping table of English phonetic symbols and pronunciation mouth shapes includes: according to the phonetic symbol combination, determining the vowels and consonants contained in the phonetic symbol combination, or determining only the vowels contained in it; and, according to the vowels and consonants, or according to the vowels alone, matching and obtaining at least one corresponding pronunciation mouth shape from the correspondence between vowels and consonants and pronunciation mouth shapes.
In this process, after a phonetic symbol combination corresponding to the word content is matched from the mapping table of English words and English phonetic symbols, the phonetic symbols in the array structure are located and split into vowels and consonants. Since some English pronunciations correspond only to a single vowel, the determined phonetic symbol combination may contain both vowels and consonants, or vowels alone. The expression images adapted to the mouth shapes corresponding to the vowels and consonants are then switched in the display window, rapidly alternating expressions to produce a virtual video display. In practice, the mapping table of English phonetic symbols and pronunciation mouth shapes may also be imported or downloaded.
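By analogy with the pinyin case, the English lookup and vowel/consonant split can be sketched as below. The dictionary entries and the vowel set are illustrative assumptions, not data from the patent:

```python
# Hypothetical word-to-phonetic-symbol table; a real system would load
# a full pronunciation dictionary into the database.
WORD_TO_PHONEMES = {
    "hello": ["h", "e", "l", "ou"],
    "hi": ["h", "ai"],
}

VOWELS = {"e", "ou", "ai", "i", "a", "o", "u"}  # illustrative vowel set

def word_to_mouth_keys(word):
    """Split a word's phonetic symbols into (vowels, consonants)."""
    phonemes = WORD_TO_PHONEMES[word]
    vowels = [p for p in phonemes if p in VOWELS]
    consonants = [p for p in phonemes if p not in VOWELS]
    return vowels, consonants

print(word_to_mouth_keys("hi"))  # (['ai'], ['h'])
```

Each vowel and consonant then indexes its own mouth-shape image, exactly as initials and finals do in the Chinese case.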
Specifically, this scheme is described with respect to a phone call on a mobile phone, for realizing virtual video. A mouth-shape resource library for English phonetic symbols is provided in the storage device. In this standard mouth-shape library, collected in advance, each mouth-shape image corresponds to the pronunciation of one English phonetic symbol, and the vowels and consonants in the mouth-shape images correspond one-to-one with the English phonetic symbol table. In this way, every pronunciation can be associated with the corresponding combination of vowel and consonant images found in the mouth-shape library, so that the displayed pronunciation mouth shapes are consistent with the actual mouth movements of the speaking party.
Step 204: matching, from the target expression packet, at least one frame of target media display image containing the target pronunciation mouth shape.
Specifically, the target expression packet is an expression packet containing different mouth shapes, and the media display images in it show the mouth with different degrees of opening and closing. Preferably, besides the target pronunciation mouth shape, a target media display image may also contain other varying elements matched with the mouth shape, such as variations of the eyebrows, expression, and face shape.
Step 205: playing and displaying the at least one frame of target media display image through a display interface.
In this process, during a two-party conversation, the user's terminal (for example, a mobile phone) adapts the associated expressions according to speech recognition and synthesizes and optimizes the corresponding image resources, so that the pictures play continuously with the voice and the expressions switch and update accordingly. A virtual video scene can thus be generated without a network environment, realizing a virtual video-call scene and making the conversation more effective and more interesting.
This process is described in detail below. The technique of converting voice to text and pinyin and adapting expressions is implemented in the following steps:
Step 1. Read the pinyin annotations from the standard Chinese character set (GBK) database in advance, and write the character-pinyin data into an array. Each element of the array is a structure with four fields: the pinyin, the initial and final obtained by splitting the pinyin, and the Chinese character corresponding to the pinyin.
Step 2. Convert the voice to text through an open-source interface, and search the array using the text as the key; the corresponding pinyin can be found, and the associated initial and final can then be queried from the split fields of the structure.
Step 3. Look up the associated expressions against the initials and finals, create a visual dialog window on the desktop as the user interface (UI), and present the expressions in the UI window.
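Steps 1 and 2 above can be sketched as an array of four-field structures searched by character; the sample entries are illustrative placeholders:

```python
# Array-of-structures sketch for step 1: each element holds the pinyin,
# the initial and final obtained by splitting it, and the character.
from dataclasses import dataclass

@dataclass
class PinyinEntry:
    pinyin: str
    initial: str   # shengmu; empty string for final-only syllables
    final: str     # yunmu
    hanzi: str

TABLE = [
    PinyinEntry("ni", "n", "i", "你"),
    PinyinEntry("hao", "h", "ao", "好"),
]

def lookup(hanzi):
    """Step 2: search the array using the character as the key."""
    return next((e for e in TABLE if e.hanzi == hanzi), None)

entry = lookup("好")
print(entry.initial, entry.final)  # h ao
```

Step 3 would then map the returned initial and final to their expression images for display.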
When the at least one frame of target media display image is played continuously, the presentation time can be determined by first calculating the time length of the destination call voice and then, according to that time length, determining the playing duration of each frame during continuous playback. Specifically, the time length of the destination call voice is divided by the number of frames of target media display images to obtain the playing duration of one frame. Taking "hello" (你好) as an example: assume the voice lasts 1 second and is decomposed into 4 expression images; each image is then displayed for 1/4 second, i.e. 250 milliseconds. The at least one frame of target media display image matched with the destination call voice is switched continuously in the order in which the voice is received. The expressions presented in the window thus correspond to the received voice and change with it throughout the call; the rapid switching of the user's expression packet in the UI window presents a video effect, thereby realizing a virtual video call function.
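The per-frame timing above reduces to dividing the voice duration by the frame count; the "hello" example gives 250 ms per image:

```python
# Playing duration per frame: voice length divided by number of frames.
def frame_duration_ms(voice_seconds, frame_count):
    return voice_seconds * 1000.0 / frame_count

# The example from the text: a 1-second "hello" decomposed into 4 images.
print(frame_duration_ms(1.0, 4))  # 250.0
```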
As a specific embodiment, the step of playing and displaying the at least one frame of target media display image through the display interface includes: through the display interface, playing and displaying the at least one frame of target media display image with an acquired display image as the background picture.
When the target media display images are played, a playback background can be added. The background can be a fixed background associated with the contact, or a corresponding image matched as the background picture according to keywords or words recognized in the destination call voice, so that the background picture can change with the user's speech content.
This process mainly applies in an environment without a wireless network: through speech recognition, mouth shapes are looked up and adapted, and synthesis is performed with known scene images (background pictures) in the library, thereby reproducing a virtual scene without a network environment. The continuously switching scenes and expressions realize a virtual video-call scene without a network, making the conversation more effective and more interesting.
Specifically, the implementation flow is as follows. When a voice call is conducted on mobile phones, both phones store expression and scene images in advance. On receiving the other party's voice, the local phone first starts the speech recognition module, switches to the corresponding media display resources according to the recognition result, and displays them in the terminal's call interface. As the voice changes over time, speech recognition continuously adapts the corresponding expression resources. The terminal interface can display a fixed background picture or adapt a corresponding background according to the voice content; the expressions switch against the background picture in the terminal interface, and the visual effect of this rapid switching is exactly that of a real video-call scene, so a virtual-reality video call is realized locally.
As a specific embodiment, before the step of acquiring the destination call voice, the method further includes: obtaining a material resource packet and a personal image of the call contact, the material resource packet containing at least one media material image; and integrating the personal image of the call contact with each media material image to generate at least one media display image, obtaining an expression packet containing the at least one media display image.
The personal image can be an expression image of the contact in person, or a related image associated with the call contact. This process corresponds to the initialization of the expression packet containing the media display images, so that at least one frame of matching target media display image can later be obtained according to the destination call voice. The material resource packet can be downloaded in advance when a network is available. The integration of the personal image with each media material image can be realized by methods such as cut-and-fill, partial replacement, or partial overlay.
Specifically, for example, the user installs virtual simulation software on the mobile phone. In the software, picture scenes can be set arbitrarily and user images uploaded; initialization is preset, and the user's expression packet is generated. The image resources required on the user's phone are first stored locally, either photographed in advance or downloaded from the network, and generally include the user's personal images, the material resource packet, and scene pictures for ordinary video conversations. The material resource packet is, for example, a mouth-shape resource packet, which is integrated into the software itself and provided for the user's use. Taking the mouth-shape resource packet as an example: the software integrates the mouth-shape image resources itself, and the user first provides personal images to initialize the software before use. Initialization synthesizes the mouth-shape images with the personal images and performs image optimization, merging the mouth shapes into the user's likeness to generate the user's expression packet corresponding to the letters of the pinyin table. The image synthesis technique first removes the mouth and its surrounding region from the user's facial image and superimposes a mouth-shape resource of the same size, then optimizes the image, obtaining the user's customized expression packet corresponding to the pinyin alphabet.
As a specific embodiment, the personal image includes a personal facial image, and the media material images include pronunciation mouth-shape images corresponding to the initials and finals of pinyin, or pronunciation mouth-shape images corresponding to the vowels and consonants of English phonetic symbols. The step of integrating the personal image of the call contact with each media material image to generate at least one media display image and obtain an expression packet containing the at least one media display image includes: identifying the mouth region in the facial image; filling and replacing the mouth region with the pronunciation mouth-shape images in the media material images; and generating media display images corresponding to each pronunciation mouth-shape image in the media material images, obtaining an expression packet containing the media display images corresponding to each pronunciation mouth-shape image.
Taking the case where the media material images are pronunciation mouth-shape images corresponding to the initials and finals of pinyin as an example, the specific technique of synthesizing the expression packet from the face and mouth shapes is implemented in the following steps:
Step 1. Taking a call between two parties A and B as an example, the voice communication device held by A prestores the image resources of B, and separately prestores the mouth-shape image resources for the pronunciation of every letter of the Chinese pinyin table.
Step 2. Convert the color images of the face and mouth shapes to grayscale. For color-to-grayscale conversion, the classical formula is Gray = R*0.299 + G*0.587 + B*0.114. To avoid slow floating-point operations, integer arithmetic with rounding is introduced, giving the equivalent variant Gray = (R*30 + G*59 + B*11 + 50)/100, which improves conversion efficiency.
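The integer variant in step 2 can be checked directly; note that the truncated coefficients (30/59/11 per hundred) can differ from the floating-point result by about one gray level:

```python
# Integer grayscale conversion from step 2, avoiding floating point:
# Gray = (R*30 + G*59 + B*11 + 50) / 100, with the +50 providing rounding.
def gray(r, g, b):
    return (r * 30 + g * 59 + b * 11 + 50) // 100

print(gray(255, 255, 255))  # 255 (white stays white)
print(gray(255, 0, 0))      # 77  (float formula gives 76.245)
```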
Step 3. Take a grayscale threshold on the grayscale image and segment the face using the threshold to realize mouth region detection. Perform edge detection on the facial image and process the grayscale image with a mean-operator template (i.e. each pixel value is updated to the mean of its neighboring pixels) to detect facial feature regions such as the eyes, mouth, and nose; alternatively, identify the mouth region according to facial symmetry and structural distribution.
Step 4. Fill and replace the mouth region detected in the face with the original mouth-shape resources to generate the expression. Each row of pixel values of the mouth-shape images from step 1 is quantized and sampled so that its pixel count matches the number of pixels in the mouth region of step 3; the sampled and quantized mouth-shape resources are then filled into the mouth region of step 3, reconstructing and generating the facial expression image.
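The per-row quantized sampling of step 4 can be read as resampling each mouth-shape row to the width of the detected region; under that reading, a minimal nearest-neighbour sketch looks like this (illustrative, not the patent's implementation):

```python
# Resample one pixel row of the mouth-shape resource to target_len pixels
# (nearest-neighbour), so its pixel count matches the detected mouth region.
def resample_row(row, target_len):
    n = len(row)
    return [row[i * n // target_len] for i in range(target_len)]

print(resample_row([10, 20, 30, 40], 2))  # [10, 30]
print(resample_row([10, 20], 4))          # [10, 10, 20, 20]
```

The resampled rows then overwrite the detected mouth region pixel for pixel.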
Step 5. Apply Gaussian filtering to the newly generated expression images to enhance the smoothness of the composites, generating an expression resource library in which the facial mouth shape of each expression image corresponds to the pronunciation mouth shape of one letter of the alphabet.
The face is synthesized with each different mouth shape to generate different expression images, each corresponding to the pronunciation mouth-shape expression of one pinyin letter. The newly generated facial expression images need denoising and smoothing by Gaussian filtering to obtain clear expression images. The template computed from the Gaussian function consists of floating-point numbers; to balance filtering effect and computational efficiency, an integer 5×5 template operator with coefficient 1/273 is used, as illustrated by the template of Fig. 4.
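Fig. 4 is not reproduced in this text, but the 1/273 normalization matches the classic 5×5 integer Gaussian template (σ ≈ 1), which is assumed below:

```python
# Classic 5x5 integer Gaussian smoothing template; the coefficients sum
# to 273, matching the 1/273 normalization factor mentioned in the text.
KERNEL = [
    [1,  4,  7,  4, 1],
    [4, 16, 26, 16, 4],
    [7, 26, 41, 26, 7],
    [4, 16, 26, 16, 4],
    [1,  4,  7,  4, 1],
]

def smooth_pixel(window):
    """Apply the kernel to a 5x5 grayscale window centred on one pixel."""
    acc = sum(KERNEL[i][j] * window[i][j] for i in range(5) for j in range(5))
    return acc // 273

print(sum(sum(row) for row in KERNEL))              # 273
print(smooth_pixel([[100] * 5 for _ in range(5)]))  # 100 (flat region unchanged)
```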
In this process, at least one frame of target media display image is obtained by matching against the collected call voice, and those target media display images are played continuously to form the playback result of a video mode, realizing the conversion of speech recognition into video display. Through a software application model local to the terminal device, the limitation of the network environment can be overcome and virtual video-phone communication carried out. The process does not depend on the network, saving data traffic and even escaping traffic restrictions, and gives the call a virtualized video picture, making the call more lively and vivid, strengthening communication efficiency, and adding enjoyment to the conversation.
Third embodiment
An embodiment of the present invention discloses a media display terminal, which, with reference to Fig. 3, includes: an acquisition module 301, a first acquisition module 302, and a display module 303. The media display terminal can be a terminal supporting voice, such as a smartwatch or a mobile phone.
The acquisition module 301 is configured to acquire the destination call voice.
The first acquisition module 302 is configured to obtain, according to the destination call voice, at least one frame of target media display image matched with the destination call voice.
The display module 303 is configured to play and display the at least one frame of target media display image through a display interface.
Wherein, the first acquisition module includes:
a first determination sub-module, configured to determine a target expression packet corresponding to the destination call voice;
a second determination sub-module, configured to determine, according to the voiceprint feature of the destination call voice, at least one target pronunciation mouth shape needed to utter the destination call voice; and
an acquisition sub-module, configured to match, from the target expression packet, at least one frame of target media display image containing the target pronunciation mouth shape.
Wherein, the second determination sub-module includes:
a first determination unit, configured to determine, according to the voiceprint feature of the destination call voice, word content corresponding to the destination call voice;
a first matching unit, configured to match, according to the word content, a phonetic notation combination corresponding to the word content from the mapping table of words and phonetic notation;
a second matching unit, configured to match, according to the phonetic notation combination, at least one corresponding pronunciation mouth shape from the mapping table of phonetic notation and pronunciation mouth shapes; and
a second determination unit, configured to determine that the at least one pronunciation mouth shape is the at least one target pronunciation mouth shape needed to utter the destination call voice.
Wherein, the mapping table of words and phonetic notation includes a mapping table of Chinese characters and pinyin, and the first matching unit includes:
a first matching sub-unit, configured to match, according to the word content, a pinyin combination corresponding to the word content from the mapping table of Chinese characters and pinyin.
The mapping table of phonetic notation and pronunciation mouth shapes includes a mapping table of pinyin and pronunciation mouth shapes, and the second matching unit includes:
a second matching sub-unit, configured to match, according to the pinyin combination, at least one corresponding pronunciation mouth shape from the mapping table of pinyin and pronunciation mouth shapes.
Wherein, the mapping table of pinyin and pronunciation mouth shapes includes the correspondence between the initials and finals of pinyin and pronunciation mouth shapes, and the second matching sub-unit is specifically configured to: according to the pinyin combination, determine the initials and finals contained in the pinyin combination, or determine only the finals contained in it; and, according to the initials and finals, or according to the finals alone, match and obtain at least one corresponding pronunciation mouth shape from the correspondence between initials and finals and pronunciation mouth shapes.
Wherein, the mapping table of words and phonetic notation includes a mapping table of English words and English phonetic symbols, and the first matching unit includes:
a third matching sub-unit, configured to match, according to the word content, a phonetic symbol combination corresponding to the word content from the mapping table of English words and English phonetic symbols.
The mapping table of phonetic notation and pronunciation mouth shapes includes a mapping table of English phonetic symbols and pronunciation mouth shapes, and the second matching unit includes:
a fourth matching sub-unit, configured to match, according to the phonetic symbol combination, at least one corresponding pronunciation mouth shape from the mapping table of English phonetic symbols and pronunciation mouth shapes.
Wherein, the mapping table of English phonetic symbols and pronunciation mouth shapes includes the correspondence between the vowels and consonants of English phonetic symbols and pronunciation mouth shapes, and the fourth matching sub-unit is specifically configured to: according to the phonetic symbol combination, determine the vowels and consonants contained in the phonetic symbol combination, or determine only the vowels contained in it; and, according to the vowels and consonants, or according to the vowels alone, match and obtain at least one corresponding pronunciation mouth shape from the correspondence between vowels and consonants and pronunciation mouth shapes.
Wherein, the first determination sub-module includes:
a third determination unit, configured to determine the target contact corresponding to the destination call voice; and
a retrieving unit, configured to retrieve the target expression packet pre-associated with the target contact.
Wherein, the acquisition module includes:
a monitoring sub-module, configured to monitor the voice call process; and
a third determination sub-module, configured to determine that the received call voice of the other party is the destination call voice.
Wherein, the terminal further includes:
a second acquisition module, configured to obtain a material resource packet and a personal image of the call contact, the material resource packet containing at least one media material image; and
a generation module, configured to integrate the personal image of the call contact with each media material image to generate at least one media display image, obtaining an expression packet containing the at least one media display image.
Wherein, the personal image includes a personal facial image, and the media material images include pronunciation mouth-shape images corresponding to the initials and finals of pinyin, or pronunciation mouth-shape images corresponding to the vowels and consonants of English phonetic symbols. The generation module includes:
an identification sub-module, configured to identify the mouth region in the facial image;
a replacement sub-module, configured to fill and replace the mouth region with the pronunciation mouth-shape images in the media material images; and
a generation sub-module, configured to generate media display images corresponding to each pronunciation mouth-shape image in the media material images, obtaining an expression packet containing the media display images corresponding to each pronunciation mouth-shape image.
Wherein, the display module includes:
a display sub-module, configured to play and display, through the display interface, the at least one frame of target media display image with an acquired display image as the background picture.
According to the collected call voice, the media display terminal obtains at least one frame of target media display image by matching and plays those target media display images continuously to form the playback result of a video mode, realizing the conversion of speech recognition into video display. Through a software application model local to the terminal device, the limitation of the network environment can be overcome and virtual video-phone communication carried out. The process does not depend on the network, saving data traffic and even escaping traffic restrictions, and gives the call a virtualized video picture, making the call more lively and vivid, strengthening communication efficiency, and adding enjoyment to the conversation.
Although preferred embodiments of the present invention have been described, those skilled in the art, once aware of the basic inventive concept, can make further changes and modifications to these embodiments. Therefore, the appended claims are intended to be construed as including the preferred embodiments and all changes and modifications falling within the scope of the embodiments of the present invention.
Finally, it should be noted that, in the embodiments of the present invention, relational terms such as "first" and "second" are used only to distinguish one entity or operation from another, and do not necessarily require or imply any actual relationship or order between these entities or operations. Moreover, the terms "comprise", "include", or any other variant thereof are intended to cover a non-exclusive inclusion, so that a process, method, article, or terminal device including a series of elements includes not only those elements but also other elements not explicitly listed, or elements inherent to such a process, method, article, or terminal device. In the absence of further limitation, an element defined by the phrase "including a ..." does not exclude the presence of additional identical elements in the process, method, article, or terminal device including that element.
The above are preferred embodiments of the present invention. It should be pointed out that, for those of ordinary skill in the art, several improvements and modifications can also be made without departing from the principle of the present invention, and these improvements and modifications also fall within the protection scope of the present invention.
Claims (24)
1. A media display method, characterized by comprising:
acquiring a destination call voice;
according to the destination call voice, obtaining at least one frame of target media display image matched with the destination call voice;
playing and displaying the at least one frame of target media display image through a display interface.
2. The method according to claim 1, characterized in that the step of obtaining, according to the destination call voice, at least one frame of target media display image matched with the destination call voice comprises:
determining a target expression packet corresponding to the destination call voice;
according to the voiceprint feature of the destination call voice, determining at least one target pronunciation mouth shape needed to utter the destination call voice;
matching, from the target expression packet, at least one frame of target media display image containing the target pronunciation mouth shape.
3. The method according to claim 2, wherein determining, according to the voiceprint feature of the target call voice, the at least one target pronunciation mouth shape required to utter the target call voice comprises:
determining, according to the voiceprint feature of the target call voice, text content corresponding to the target call voice;
matching, according to the text content, a phonetic-notation combination corresponding to the text content from a mapping table of text and phonetic notation;
matching, according to the phonetic-notation combination, at least one corresponding pronunciation mouth shape from a mapping table of phonetic notation and pronunciation mouth shapes;
determining that the at least one pronunciation mouth shape is the at least one target pronunciation mouth shape required to utter the target call voice.
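The two-stage lookup recited in claim 3 can be sketched as follows. This is a minimal Python illustration, not the patented implementation; the table contents and mouth-shape labels are invented placeholders.

```python
# Hypothetical mapping table of text (Chinese characters) and phonetic notation (Pinyin)
TEXT_TO_NOTATION = {"你": "ni", "好": "hao"}

# Hypothetical mapping table of phonetic notation and pronunciation mouth shapes
NOTATION_TO_MOUTH_SHAPE = {"ni": "narrow", "hao": "open-round"}

def target_mouth_shapes(text_content: str) -> list:
    """Map text content to the mouth shapes required to utter it, via two tables."""
    notation_combination = [TEXT_TO_NOTATION[ch] for ch in text_content]
    return [NOTATION_TO_MOUTH_SHAPE[n] for n in notation_combination]

print(target_mouth_shapes("你好"))  # ['narrow', 'open-round']
```

The claim leaves both tables and the matching strategy open; any dictionary-like correspondence satisfies the recited steps.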
4. The method according to claim 3, wherein the mapping table of text and phonetic notation comprises a mapping table of Chinese characters and Pinyin, and matching, according to the text content, the phonetic-notation combination corresponding to the text content from the mapping table of text and phonetic notation comprises:
matching, according to the text content, a Pinyin combination corresponding to the text content from the mapping table of Chinese characters and Pinyin;
the mapping table of phonetic notation and pronunciation mouth shapes comprises a mapping table of Pinyin and pronunciation mouth shapes, and matching, according to the phonetic-notation combination, the at least one corresponding pronunciation mouth shape from the mapping table of phonetic notation and pronunciation mouth shapes comprises:
matching, according to the Pinyin combination, the at least one corresponding pronunciation mouth shape from the mapping table of Pinyin and pronunciation mouth shapes.
5. The method according to claim 4, wherein the mapping table of Pinyin and pronunciation mouth shapes comprises correspondences between initials and finals in Pinyin and pronunciation mouth shapes, and matching, according to the Pinyin combination, the at least one corresponding pronunciation mouth shape from the mapping table of Pinyin and pronunciation mouth shapes comprises:
determining, according to the Pinyin combination, the initials and finals included in the Pinyin combination, or determining the finals included in the Pinyin combination;
matching, according to the initials and finals, or according to the finals, the at least one corresponding pronunciation mouth shape from the correspondences between the initials and finals and the pronunciation mouth shapes.
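The initial/final decomposition that claim 5 relies on can be sketched as a longest-prefix match against the standard Pinyin initials. This is an illustrative sketch only; the patent does not prescribe this procedure.

```python
# Standard Pinyin initials; two-letter initials must be tried before their
# one-letter prefixes ("zh" before "z", etc.).
PINYIN_INITIALS = [
    "zh", "ch", "sh",
    "b", "p", "m", "f", "d", "t", "n", "l",
    "g", "k", "h", "j", "q", "x", "r", "z", "c", "s", "y", "w",
]

def split_syllable(syllable: str):
    """Return (initial, final); the initial is '' for zero-initial syllables."""
    for initial in PINYIN_INITIALS:
        if syllable.startswith(initial):
            return initial, syllable[len(initial):]
    return "", syllable  # zero-initial syllable: only a final is present

print(split_syllable("zhang"))  # ('zh', 'ang')
print(split_syllable("ai"))     # ('', 'ai')
```

The zero-initial case is why the claim allows matching on the finals alone: syllables such as "ai" have no initial to look up.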
6. The method according to claim 3, wherein the mapping table of text and phonetic notation comprises a mapping table of English words and English phonetic symbols, and matching, according to the text content, the phonetic-notation combination corresponding to the text content from the mapping table of text and phonetic notation comprises:
matching, according to the text content, a phonetic-symbol combination corresponding to the text content from the mapping table of English words and English phonetic symbols;
the mapping table of phonetic notation and pronunciation mouth shapes comprises a mapping table of English phonetic symbols and pronunciation mouth shapes, and matching, according to the phonetic-notation combination, the at least one corresponding pronunciation mouth shape from the mapping table of phonetic notation and pronunciation mouth shapes comprises:
matching, according to the phonetic-symbol combination, the at least one corresponding pronunciation mouth shape from the mapping table of English phonetic symbols and pronunciation mouth shapes.
7. The method according to claim 6, wherein the mapping table of English phonetic symbols and pronunciation mouth shapes comprises correspondences between vowels and consonants in the English phonetic symbols and pronunciation mouth shapes, and matching, according to the phonetic-symbol combination, the at least one corresponding pronunciation mouth shape from the mapping table of English phonetic symbols and pronunciation mouth shapes comprises:
determining, according to the phonetic-symbol combination, the vowels and consonants included in the phonetic-symbol combination, or determining the vowels included in the phonetic-symbol combination;
matching, according to the vowels and consonants, or according to the vowels, the at least one corresponding pronunciation mouth shape from the correspondences between the vowels and consonants and the pronunciation mouth shapes.
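A toy version of the vowel/consonant lookup in claim 7 might look like the following; the symbol inventory and mouth-shape names are invented for illustration, and symbols absent from the table are simply skipped.

```python
# Hypothetical vowel set and symbol-to-mouth-shape correspondence table
VOWELS = {"i:", "e", "æ", "ɑ:", "ɒ", "u:"}
SYMBOL_TO_MOUTH_SHAPE = {
    "i:": "spread", "e": "mid-open", "æ": "wide-open",
    "ɑ:": "open", "ɒ": "round", "u:": "protruded",
    "h": "neutral", "l": "tip-up",
}

def mouth_shapes(symbols, vowels_only=False):
    """Match each symbol in the phonetic-symbol combination to a mouth shape.

    With vowels_only=True, only vowel symbols are considered, mirroring the
    claim's alternative of determining the vowels alone.
    """
    picked = [s for s in symbols if (not vowels_only) or s in VOWELS]
    return [SYMBOL_TO_MOUTH_SHAPE[s] for s in picked if s in SYMBOL_TO_MOUTH_SHAPE]

print(mouth_shapes(["h", "e", "l", "əʊ"]))              # ['neutral', 'mid-open', 'tip-up']
print(mouth_shapes(["h", "e", "l"], vowels_only=True))  # ['mid-open']
```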
8. The method according to claim 2, wherein determining the target emoji package corresponding to the target call voice comprises:
determining a target contact corresponding to the target call voice;
retrieving the target emoji package pre-associated with the target contact.
9. The method according to any one of claims 1-8, wherein collecting the target call voice comprises:
monitoring a voice call process;
determining that a received call voice of the other party is the target call voice.
10. The method according to any one of claims 1-8, wherein before collecting the target call voice, the method further comprises:
obtaining a material resource package and a personal image of a call contact, wherein the material resource package comprises at least one media material image;
integrating the personal image of the call contact with each media material image to generate at least one media display image, and obtaining an emoji package comprising the at least one media display image.
11. The method according to claim 10, wherein the personal image comprises a personal face image, the media material images comprise pronunciation mouth shape images corresponding to initials and finals in Pinyin, or pronunciation mouth shape images corresponding to vowels and consonants in English phonetic symbols, and integrating the personal image of the call contact with each media material image to generate the at least one media display image and obtain the emoji package comprising the at least one media display image comprises:
identifying a mouth region in the face image;
filling and replacing the mouth region with the pronunciation mouth shape images in the media material images;
generating media display images corresponding to each pronunciation mouth shape image in the media material images, and obtaining an emoji package comprising the media display images corresponding to each pronunciation mouth shape image.
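The fill-and-replace step of claim 11 can be illustrated with images modeled as 2-D lists of pixel labels; a real implementation would paste a bitmap at coordinates found by face-landmark detection. The function name, region coordinates, and pixel labels here are all assumptions for illustration.

```python
import copy

def fill_mouth_region(face, mouth_patch, top, left):
    """Return a copy of `face` with `mouth_patch` pasted at (top, left)."""
    out = copy.deepcopy(face)  # leave the original face image untouched
    for r, row in enumerate(mouth_patch):
        for c, px in enumerate(row):
            out[top + r][left + c] = px
    return out

face = [["f"] * 4 for _ in range(4)]  # 4x4 stand-in for a face image
patch = [["m", "m"]]                  # 1x2 stand-in for a mouth-shape image
result = fill_mouth_region(face, patch, 2, 1)
print(result[2])  # ['f', 'm', 'm', 'f']
```

Repeating this paste once per mouth-shape image yields one display image per shape, which together would form the emoji package the claim recites.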
12. The method according to claim 1, wherein playing and displaying the at least one frame of target media display image through the display interface comprises:
playing and displaying, through the display interface, the at least one frame of target media display image with a collected display image as a background frame.
13. A media display terminal, comprising:
a collection module, configured to collect a target call voice;
a first obtaining module, configured to obtain, according to the target call voice, at least one frame of target media display image matching the target call voice;
a display module, configured to play and display the at least one frame of target media display image through a display interface.
14. The media display terminal according to claim 13, wherein the first obtaining module comprises:
a first determining submodule, configured to determine a target emoji package corresponding to the target call voice;
a second determining submodule, configured to determine, according to a voiceprint feature of the target call voice, at least one target pronunciation mouth shape required to utter the target call voice;
an obtaining submodule, configured to match, from the target emoji package, at least one frame of target media display image containing the target pronunciation mouth shape.
15. The media display terminal according to claim 14, wherein the second determining submodule comprises:
a first determining unit, configured to determine, according to the voiceprint feature of the target call voice, text content corresponding to the target call voice;
a first matching unit, configured to match, according to the text content, a phonetic-notation combination corresponding to the text content from a mapping table of text and phonetic notation;
a second matching unit, configured to match, according to the phonetic-notation combination, at least one corresponding pronunciation mouth shape from a mapping table of phonetic notation and pronunciation mouth shapes;
a second determining unit, configured to determine that the at least one pronunciation mouth shape is the at least one target pronunciation mouth shape required to utter the target call voice.
16. The media display terminal according to claim 15, wherein the mapping table of text and phonetic notation comprises a mapping table of Chinese characters and Pinyin, and the first matching unit comprises:
a first matching subunit, configured to match, according to the text content, a Pinyin combination corresponding to the text content from the mapping table of Chinese characters and Pinyin;
the mapping table of phonetic notation and pronunciation mouth shapes comprises a mapping table of Pinyin and pronunciation mouth shapes, and the second matching unit comprises:
a second matching subunit, configured to match, according to the Pinyin combination, the at least one corresponding pronunciation mouth shape from the mapping table of Pinyin and pronunciation mouth shapes.
17. The media display terminal according to claim 16, wherein the mapping table of Pinyin and pronunciation mouth shapes comprises correspondences between initials and finals in Pinyin and pronunciation mouth shapes, and the second matching subunit is specifically configured to:
determine, according to the Pinyin combination, the initials and finals included in the Pinyin combination, or determine the finals included in the Pinyin combination;
match, according to the initials and finals, or according to the finals, the at least one corresponding pronunciation mouth shape from the correspondences between the initials and finals and the pronunciation mouth shapes.
18. The media display terminal according to claim 15, wherein the mapping table of text and phonetic notation comprises a mapping table of English words and English phonetic symbols, and the first matching unit comprises:
a third matching subunit, configured to match, according to the text content, a phonetic-symbol combination corresponding to the text content from the mapping table of English words and English phonetic symbols;
the mapping table of phonetic notation and pronunciation mouth shapes comprises a mapping table of English phonetic symbols and pronunciation mouth shapes, and the second matching unit comprises:
a fourth matching subunit, configured to match, according to the phonetic-symbol combination, the at least one corresponding pronunciation mouth shape from the mapping table of English phonetic symbols and pronunciation mouth shapes.
19. The media display terminal according to claim 18, wherein the mapping table of English phonetic symbols and pronunciation mouth shapes comprises correspondences between vowels and consonants in the English phonetic symbols and pronunciation mouth shapes, and the fourth matching subunit is specifically configured to:
determine, according to the phonetic-symbol combination, the vowels and consonants included in the phonetic-symbol combination, or determine the vowels included in the phonetic-symbol combination;
match, according to the vowels and consonants, or according to the vowels, the at least one corresponding pronunciation mouth shape from the correspondences between the vowels and consonants and the pronunciation mouth shapes.
20. The media display terminal according to claim 14, wherein the first determining submodule comprises:
a third determining unit, configured to determine a target contact corresponding to the target call voice;
a retrieving unit, configured to retrieve the target emoji package pre-associated with the target contact.
21. The media display terminal according to any one of claims 13-20, wherein the collection module comprises:
a monitoring submodule, configured to monitor a voice call process;
a third determining submodule, configured to determine that a received call voice of the other party is the target call voice.
22. The media display terminal according to any one of claims 13-20, further comprising:
a second obtaining module, configured to obtain a material resource package and a personal image of a call contact, wherein the material resource package comprises at least one media material image;
a generating module, configured to integrate the personal image of the call contact with each media material image to generate at least one media display image, and obtain an emoji package comprising the at least one media display image.
23. The media display terminal according to claim 22, wherein the personal image comprises a personal face image, the media material images comprise pronunciation mouth shape images corresponding to initials and finals in Pinyin, or pronunciation mouth shape images corresponding to vowels and consonants in English phonetic symbols, and the generating module comprises:
an identifying submodule, configured to identify a mouth region in the face image;
a replacing submodule, configured to fill and replace the mouth region with the pronunciation mouth shape images in the media material images;
a generating submodule, configured to generate media display images corresponding to each pronunciation mouth shape image in the media material images, and obtain an emoji package comprising the media display images corresponding to each pronunciation mouth shape image.
24. The media display terminal according to claim 13, wherein the display module comprises:
a display submodule, configured to play and display, through the display interface, the at least one frame of target media display image with a collected display image as a background frame.
Priority Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201611154485.5A CN108234735A (en) | 2016-12-14 | 2016-12-14 | A kind of media display methods and terminal |
PCT/CN2017/114843 WO2018108013A1 (en) | 2016-12-14 | 2017-12-06 | Medium displaying method and terminal |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201611154485.5A CN108234735A (en) | 2016-12-14 | 2016-12-14 | A kind of media display methods and terminal |
Publications (1)
Publication Number | Publication Date |
---|---|
CN108234735A true CN108234735A (en) | 2018-06-29 |
Family
ID=62557913
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201611154485.5A Withdrawn CN108234735A (en) | 2016-12-14 | 2016-12-14 | A kind of media display methods and terminal |
Country Status (2)
Country | Link |
---|---|
CN (1) | CN108234735A (en) |
WO (1) | WO2018108013A1 (en) |
Cited By (12)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109377540A (en) * | 2018-09-30 | 2019-02-22 | 网易(杭州)网络有限公司 | Method, device, storage medium, processor and terminal for synthesizing facial animation |
CN110062116A (en) * | 2019-04-29 | 2019-07-26 | 上海掌门科技有限公司 | Method and apparatus for handling information |
CN110446066A (en) * | 2019-08-28 | 2019-11-12 | 北京百度网讯科技有限公司 | Method and apparatus for generating video |
CN110784762A (en) * | 2019-08-21 | 2020-02-11 | 腾讯科技(深圳)有限公司 | Video data processing method, device, equipment and storage medium |
CN110809090A (en) * | 2019-10-31 | 2020-02-18 | Oppo广东移动通信有限公司 | Call control method and related product |
CN111063339A (en) * | 2019-11-11 | 2020-04-24 | 珠海格力电器股份有限公司 | Intelligent interaction method, device, equipment and computer readable medium |
CN111596841A (en) * | 2020-04-28 | 2020-08-28 | 维沃移动通信有限公司 | Image display method and electronic equipment |
CN111741162A (en) * | 2020-06-01 | 2020-10-02 | 广东小天才科技有限公司 | Recitation prompting method, electronic equipment and computer readable storage medium |
WO2020221104A1 (en) * | 2019-04-30 | 2020-11-05 | 上海连尚网络科技有限公司 | Emoji packet presentation method and equipment |
CN112804440A (en) * | 2019-11-13 | 2021-05-14 | 北京小米移动软件有限公司 | Method, device and medium for processing image |
WO2022089222A1 (en) * | 2020-10-28 | 2022-05-05 | Ningbo Geely Automobile Research & Development Co., Ltd. | A camera system and method for generating an eye contact image view of a person |
CN114827648A (en) * | 2022-04-19 | 2022-07-29 | 咪咕文化科技有限公司 | Method, device, equipment and medium for generating dynamic expression package |
Families Citing this family (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112770063B (en) * | 2020-12-22 | 2023-07-21 | 北京奇艺世纪科技有限公司 | Image generation method and device |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US6539354B1 (en) * | 2000-03-24 | 2003-03-25 | Fluent Speech Technologies, Inc. | Methods and devices for producing and using synthetic visual speech based on natural coarticulation |
CN101482975A (en) * | 2008-01-07 | 2009-07-15 | 丰达软件(苏州)有限公司 | Method and apparatus for converting words into animation |
CN104239394A (en) * | 2013-06-18 | 2014-12-24 | 三星电子株式会社 | Translation system comprising display apparatus and server and control method thereof |
CN104468959A (en) * | 2013-09-25 | 2015-03-25 | 中兴通讯股份有限公司 | Method, device and mobile terminal displaying image in communication process of mobile terminal |
Family Cites Families (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101968893A (en) * | 2009-07-28 | 2011-02-09 | 上海冰动信息技术有限公司 | Game sound-lip synchronization system |
CN104238991B (en) * | 2013-06-21 | 2018-05-25 | 腾讯科技(深圳)有限公司 | Phonetic entry matching process and device |
- 2016-12-14: CN CN201611154485.5A patent/CN108234735A/en, not active (Withdrawn)
- 2017-12-06: WO PCT/CN2017/114843 patent/WO2018108013A1/en, active (Application Filing)
Cited By (15)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109377540B (en) * | 2018-09-30 | 2023-12-19 | 网易(杭州)网络有限公司 | Method and device for synthesizing facial animation, storage medium, processor and terminal |
CN109377540A (en) * | 2018-09-30 | 2019-02-22 | 网易(杭州)网络有限公司 | Method, device, storage medium, processor and terminal for synthesizing facial animation |
CN110062116A (en) * | 2019-04-29 | 2019-07-26 | 上海掌门科技有限公司 | Method and apparatus for handling information |
WO2020221104A1 (en) * | 2019-04-30 | 2020-11-05 | 上海连尚网络科技有限公司 | Emoji packet presentation method and equipment |
CN110784762A (en) * | 2019-08-21 | 2020-02-11 | 腾讯科技(深圳)有限公司 | Video data processing method, device, equipment and storage medium |
CN110446066A (en) * | 2019-08-28 | 2019-11-12 | 北京百度网讯科技有限公司 | Method and apparatus for generating video |
CN110809090A (en) * | 2019-10-31 | 2020-02-18 | Oppo广东移动通信有限公司 | Call control method and related product |
CN111063339A (en) * | 2019-11-11 | 2020-04-24 | 珠海格力电器股份有限公司 | Intelligent interaction method, device, equipment and computer readable medium |
CN112804440A (en) * | 2019-11-13 | 2021-05-14 | 北京小米移动软件有限公司 | Method, device and medium for processing image |
CN111596841B (en) * | 2020-04-28 | 2021-09-07 | 维沃移动通信有限公司 | Image display method and electronic equipment |
CN111596841A (en) * | 2020-04-28 | 2020-08-28 | 维沃移动通信有限公司 | Image display method and electronic equipment |
CN111741162A (en) * | 2020-06-01 | 2020-10-02 | 广东小天才科技有限公司 | Recitation prompting method, electronic equipment and computer readable storage medium |
WO2022089222A1 (en) * | 2020-10-28 | 2022-05-05 | Ningbo Geely Automobile Research & Development Co., Ltd. | A camera system and method for generating an eye contact image view of a person |
CN114827648A (en) * | 2022-04-19 | 2022-07-29 | 咪咕文化科技有限公司 | Method, device, equipment and medium for generating dynamic expression package |
CN114827648B (en) * | 2022-04-19 | 2024-03-22 | 咪咕文化科技有限公司 | Method, device, equipment and medium for generating dynamic expression package |
Also Published As
Publication number | Publication date |
---|---|
WO2018108013A1 (en) | 2018-06-21 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN108234735A (en) | A kind of media display methods and terminal | |
CN110288077B (en) | Method and related device for synthesizing speaking expression based on artificial intelligence | |
CN110531860B (en) | Animation image driving method and device based on artificial intelligence | |
CN110941954B (en) | Text broadcasting method and device, electronic equipment and storage medium | |
US20060281064A1 (en) | Image communication system for compositing an image according to emotion input | |
CN112099628A (en) | VR interaction method and device based on artificial intelligence, computer equipment and medium | |
CN106537493A (en) | Speech recognition system and method, client device and cloud server | |
CA2416592A1 (en) | Method and device for providing speech-to-text encoding and telephony service | |
US20070136671A1 (en) | Method and system for directing attention during a conversation | |
CN112188304A (en) | Video generation method, device, terminal and storage medium | |
CN110970018A (en) | Speech recognition method and device | |
CN107291704A (en) | Treating method and apparatus, the device for processing | |
CN107623622A (en) | A kind of method and electronic equipment for sending speech animation | |
CN113538628A (en) | Expression package generation method and device, electronic equipment and computer readable storage medium | |
CN108073572A (en) | Information processing method and its device, simultaneous interpretation system | |
CN109215629A (en) | Method of speech processing, device and terminal | |
CN107800860A (en) | Method of speech processing, device and terminal device | |
KR20170135598A (en) | System and Method for Voice Conversation using Synthesized Virtual Voice of a Designated Person | |
WO2022193635A1 (en) | Customer service system, method and apparatus, electronic device, and storage medium | |
CN109686359B (en) | Voice output method, terminal and computer readable storage medium | |
CN109754816B (en) | Voice data processing method and device | |
CN115690280B (en) | Three-dimensional image pronunciation mouth shape simulation method | |
CN109166368A (en) | A kind of talking pen | |
CN116229311B (en) | Video processing method, device and storage medium | |
CN110134235A (en) | A kind of method of guiding interaction |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
WW01 | Invention patent application withdrawn after publication | ||
Application publication date: 20180629 |