CN108234735A - Media display method and terminal - Google Patents
- Publication number
- CN108234735A CN108234735A CN201611154485.5A CN201611154485A CN108234735A CN 108234735 A CN108234735 A CN 108234735A CN 201611154485 A CN201611154485 A CN 201611154485A CN 108234735 A CN108234735 A CN 108234735A
- Authority
- CN
- China
- Prior art keywords
- phonetic
- image
- mouth shape
- pronunciation mouth
- mapping table
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Withdrawn
Classifications
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04M—TELEPHONIC COMMUNICATION
- H04M1/00—Substation equipment, e.g. for use by subscribers
- H04M1/72—Mobile telephones; Cordless telephones, i.e. devices for establishing wireless links to base stations without route selection
- H04M1/724—User interfaces specially adapted for cordless or mobile telephones
- H04M1/72484—User interfaces specially adapted for cordless or mobile telephones wherein functions are triggered by incoming communication events
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V40/00—Recognition of biometric, human-related or animal-related patterns in image or video data
- G06V40/10—Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
- G06V40/16—Human faces, e.g. facial parts, sketches or expressions
- G06V40/161—Detection; Localisation; Normalisation
- G06V40/166—Detection; Localisation; Normalisation using acquisition arrangements
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V40/00—Recognition of biometric, human-related or animal-related patterns in image or video data
- G06V40/10—Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
- G06V40/16—Human faces, e.g. facial parts, sketches or expressions
- G06V40/168—Feature extraction; Face representation
- G06V40/171—Local features and components; Facial parts ; Occluding parts, e.g. glasses; Geometrical relationships
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04M—TELEPHONIC COMMUNICATION
- H04M1/00—Substation equipment, e.g. for use by subscribers
- H04M1/72—Mobile telephones; Cordless telephones, i.e. devices for establishing wireless links to base stations without route selection
- H04M1/724—User interfaces specially adapted for cordless or mobile telephones
- H04M1/72403—User interfaces specially adapted for cordless or mobile telephones with means for local support of applications that increase the functionality
- H04M1/7243—User interfaces specially adapted for cordless or mobile telephones with means for local support of applications that increase the functionality with interactive means for internal management of messages
- H04M1/72439—User interfaces specially adapted for cordless or mobile telephones with means for local support of applications that increase the functionality with interactive means for internal management of messages for image or video messaging
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04M—TELEPHONIC COMMUNICATION
- H04M1/00—Substation equipment, e.g. for use by subscribers
- H04M1/72—Mobile telephones; Cordless telephones, i.e. devices for establishing wireless links to base stations without route selection
- H04M1/725—Cordless telephones
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N7/00—Television systems
- H04N7/14—Systems for two-way working
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N7/00—Television systems
- H04N7/14—Systems for two-way working
- H04N7/141—Systems for two-way working between two video terminals, e.g. videophone
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04M—TELEPHONIC COMMUNICATION
- H04M2250/00—Details of telephonic subscriber devices
- H04M2250/74—Details of telephonic subscriber devices with voice recognition means
Abstract
The present invention provides a media display method and terminal. The method includes: acquiring a target call voice; obtaining, according to the target call voice, at least one frame of target media display image matching the target call voice; and playing the at least one frame of target media display image on a display interface. This scheme accompanies the call with virtualized video pictures, making the call livelier and more vivid, strengthening communication and adding enjoyment to it.
Description
Technical field
The present invention relates to the field of communication technology, and in particular to a media display method and terminal.
Background technology
During telephone calls, a traditional feature phone can only transmit communication by voice; a traditional call has no concrete video scene of the speaker, and a voice call is far less vivid and concrete than a video call.
An existing mobile phone can join a network environment via a SIM card or wireless WiFi and make video calls. However, video calling generally depends on wireless networking, and in practice wireless network coverage is not available anywhere at any time, while video calling over a phone-card data connection is expensive. Constrained by and confined to the network environment, video calling cannot become the norm of communication; and when people talk on the phone, a voice-only exchange is not flexible or lively enough, so the user experience is insufficient.
Summary of the invention
Embodiments of the present invention provide a media display method and terminal, to solve the prior-art problems that a traditional call has no concrete video scene, while network video calling is constrained by the network environment and is costly.
To solve the above technical problem, embodiments of the present invention adopt the following technical solutions.
In one aspect, an embodiment of the present invention provides a media display method, including:
acquiring a target call voice;
obtaining, according to the target call voice, at least one frame of target media display image matching the target call voice; and
playing the at least one frame of target media display image on a display interface.
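The acquire-match-play flow above can be sketched minimally as follows. The `Frame` type, the character-keyed matching rule, and the image names are invented placeholders for illustration, not part of this document.

```python
# Minimal illustration of the acquire -> match -> play flow.
from dataclasses import dataclass

@dataclass
class Frame:
    name: str  # which mouth shape or expression this frame depicts

def match_media_frames(voice_text, library):
    """Obtain at least one display frame matching the captured voice."""
    return [library[ch] for ch in voice_text if ch in library]

def play_frames(frames):
    """Stand-in for playing frames on the display interface."""
    return [f.name for f in frames]

library = {"a": Frame("open-mouth"), "o": Frame("round-mouth")}
shown = play_frames(match_media_frames("ao", library))
```

The placeholder matching rule selects one frame per recognized character; the claims below refine this into voiceprint-based matching.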
Optionally, the obtaining, according to the target call voice, at least one frame of target media display image matching the target call voice includes:
determining a target expression pack corresponding to the target call voice;
determining, according to a voiceprint feature of the target call voice, at least one target pronunciation mouth shape needed to utter the target call voice; and
matching, from the target expression pack, at least one frame of target media display image containing the target pronunciation mouth shape.
Optionally, the determining, according to the voiceprint feature of the target call voice, at least one target pronunciation mouth shape needed to utter the target call voice includes:
determining, according to the voiceprint feature of the target call voice, text content corresponding to the target call voice;
matching, according to the text content, a corresponding pinyin combination from a text-to-pinyin mapping table;
matching, according to the pinyin combination, at least one corresponding pronunciation mouth shape from a pinyin-to-mouth-shape mapping table; and
determining that the at least one pronunciation mouth shape is the at least one target pronunciation mouth shape needed to utter the target call voice.
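A toy rendering of the two mapping tables this clause relies on. The table entries and image filenames are invented examples; a real system would load complete character-to-pinyin and pinyin-to-mouth-shape dictionaries.

```python
# Two-table chain: text -> pinyin combination -> pronunciation mouth shapes.
WORD_TO_PINYIN = {"你": "ni", "好": "hao"}            # text -> pinyin table
PINYIN_TO_MOUTH = {"ni": "mouth_ni.png",              # pinyin -> mouth shape
                   "hao": "mouth_hao.png"}

def mouth_shapes_for_text(text: str):
    """Chain the two mapping tables for recognized text."""
    pinyin_combo = [WORD_TO_PINYIN[ch] for ch in text if ch in WORD_TO_PINYIN]
    return [PINYIN_TO_MOUTH[p] for p in pinyin_combo if p in PINYIN_TO_MOUTH]
```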
Optionally, the determining, according to the voiceprint feature of the target call voice, at least one target pronunciation mouth shape needed to utter the target call voice includes:
determining, according to the voiceprint feature of the target call voice, text content corresponding to the target call voice;
matching, according to the text content, a phonetic-transcription combination corresponding to the text content from a text-to-transcription mapping table;
matching, according to the transcription combination, at least one corresponding pronunciation mouth shape from a transcription-to-mouth-shape mapping table; and
determining that the at least one pronunciation mouth shape is the at least one target pronunciation mouth shape needed to utter the target call voice.
Optionally, the text-to-transcription mapping table includes a Chinese-character-to-pinyin mapping table, and the matching, according to the text content, a transcription combination corresponding to the text content from the text-to-transcription mapping table includes:
matching, according to the text content, a pinyin combination corresponding to the text content from the Chinese-character-to-pinyin mapping table.
The transcription-to-mouth-shape mapping table includes a pinyin-to-mouth-shape mapping table, and the matching, according to the transcription combination, at least one corresponding pronunciation mouth shape from the transcription-to-mouth-shape mapping table includes:
matching, according to the pinyin combination, at least one corresponding pronunciation mouth shape from the pinyin-to-mouth-shape mapping table.
Optionally, the pinyin-to-mouth-shape mapping table includes correspondences between pronunciation mouth shapes and the initials and finals of pinyin, and the matching, according to the pinyin combination, at least one corresponding pronunciation mouth shape from the pinyin-to-mouth-shape mapping table includes:
determining, according to the pinyin combination, the initial and final contained in the pinyin combination, or determining the final contained in the pinyin combination; and
matching, according to the initial and final, or according to the final alone, at least one corresponding pronunciation mouth shape from the correspondences between initials/finals and mouth shapes.
Optionally, the text-to-transcription mapping table includes an English-to-phonetic-symbol mapping table, and the matching, according to the text content, a transcription combination corresponding to the text content from the text-to-transcription mapping table includes:
matching, according to the text content, a phonetic-symbol combination corresponding to the text content from the English-to-phonetic-symbol mapping table.
The transcription-to-mouth-shape mapping table includes a phonetic-symbol-to-mouth-shape mapping table, and the matching, according to the transcription combination, at least one corresponding pronunciation mouth shape from the transcription-to-mouth-shape mapping table includes:
matching, according to the phonetic-symbol combination, at least one corresponding pronunciation mouth shape from the phonetic-symbol-to-mouth-shape mapping table.
Optionally, the phonetic-symbol-to-mouth-shape mapping table includes correspondences between pronunciation mouth shapes and the vowels and consonants of the phonetic symbols, and the matching, according to the phonetic-symbol combination, at least one corresponding pronunciation mouth shape from the phonetic-symbol-to-mouth-shape mapping table includes:
determining, according to the phonetic-symbol combination, the vowels and consonants contained in the combination, or determining the vowels contained in the combination; and
matching, according to the vowels and consonants, or according to the vowels alone, at least one corresponding pronunciation mouth shape from the correspondences between vowels/consonants and mouth shapes.
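For the English branch, a sketch of scanning a phonetic-symbol combination for vowels and consonants (or, per the alternative branch, vowels alone) and mapping each to a mouth shape. The symbol set and image names are invented placeholders.

```python
# Phonetic-symbol combination -> mouth shapes, with a vowels-only mode.
VOWELS = {"ɪ", "æ", "ə", "iː"}
SYMBOL_TO_MOUTH = {"ɪ": "mouth_ih.png", "æ": "mouth_ae.png", "k": "mouth_k.png"}

def mouths_for_phonetics(symbols, include_consonants=True):
    """Map each relevant symbol in the combination to its mouth shape."""
    out = []
    for s in symbols:
        if (s in VOWELS or include_consonants) and s in SYMBOL_TO_MOUTH:
            out.append(SYMBOL_TO_MOUTH[s])
    return out
```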
Optionally, the determining a target expression pack corresponding to the target call voice includes:
determining the target contact corresponding to the target call voice; and
retrieving the target expression pack pre-associated with the target contact.
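The pre-association in this clause can be as simple as a keyed lookup with a fallback pack; the phone numbers and pack names below are hypothetical illustrations.

```python
# Contact -> pre-associated expression pack, with a default fallback.
CONTACT_PACKS = {"+8613800000000": "mom_pack",
                 "+8613900000000": "friend_pack"}

def pack_for_contact(number: str, default: str = "default_pack") -> str:
    """Retrieve the expression pack pre-associated with the calling contact."""
    return CONTACT_PACKS.get(number, default)
```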
Optionally, the acquiring a target call voice includes:
monitoring a voice call; and
determining that the received call voice of the other party is the target call voice.
Optionally, before the step of acquiring a target call voice, the method further includes:
obtaining a material resource pack and a personal image of a call contact, the material resource pack containing at least one media material image; and
integrating the personal image of the call contact with each media material image to generate at least one media display image, obtaining an expression pack containing the at least one media display image.
Optionally, the personal image includes a personal face image, the media material images include mouth-shape images corresponding to the initials and finals of pinyin, or mouth-shape images corresponding to the vowels and consonants of English phonetic symbols, and the integrating the personal image of the call contact with each media material image to generate at least one media display image, obtaining an expression pack containing the at least one media display image, includes:
identifying the mouth region in the face image;
filling the mouth region with replacements taken from the mouth-shape images in the media material images; and
generating a media display image corresponding to each mouth-shape image in the media material images, obtaining an expression pack containing a media display image corresponding to each mouth-shape image.
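A toy version of this integration step: a 2-D character grid stands in for the contact's face image, the mouth region is given explicitly (a real implementation would locate it via face detection, which is not specified here), and each mouth-shape patch is pasted in to yield one display frame per mouth shape.

```python
# Paste mouth-shape patches into a face "image" to build an expression pack.
def paste_region(face, region, patch):
    """Return a copy of `face` with `patch` pasted at (top, left)."""
    top, left = region
    frame = [row[:] for row in face]  # copy so the base face can be reused
    for r, prow in enumerate(patch):
        for c, px in enumerate(prow):
            frame[top + r][left + c] = px
    return frame

def build_expression_pack(face, mouth_region, mouth_patches):
    """One media display image per mouth-shape patch."""
    return [paste_region(face, mouth_region, p) for p in mouth_patches]

face = [["f"] * 3 for _ in range(3)]      # 3x3 stand-in face image
patches = [[["o"]], [["-"]]]              # two 1x1 mouth-shape "images"
pack = build_expression_pack(face, (2, 1), patches)
```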
Optionally, the playing the at least one frame of target media display image on the display interface includes:
playing, through the display interface, the at least one frame of target media display image with a captured display image as the background picture.
In another aspect, an embodiment of the present invention also provides a media display terminal, including:
an acquisition module for acquiring a target call voice;
a first obtaining module for obtaining, according to the target call voice, at least one frame of target media display image matching the target call voice; and
a display module for playing the at least one frame of target media display image on a display interface.
Optionally, the first obtaining module includes:
a first determination submodule for determining a target expression pack corresponding to the target call voice;
a second determination submodule for determining, according to a voiceprint feature of the target call voice, at least one target pronunciation mouth shape needed to utter the target call voice; and
an obtaining submodule for matching, from the target expression pack, at least one frame of target media display image containing the target pronunciation mouth shape.
Optionally, the second determination submodule includes:
a first determination unit for determining, according to the voiceprint feature of the target call voice, text content corresponding to the target call voice;
a first matching unit for matching, according to the text content, a corresponding pinyin combination from a text-to-pinyin mapping table;
a second matching unit for matching, according to the pinyin combination, at least one corresponding pronunciation mouth shape from a pinyin-to-mouth-shape mapping table; and
a second determination unit for determining that the at least one pronunciation mouth shape is the at least one target pronunciation mouth shape needed to utter the target call voice.
Optionally, the pinyin-to-mouth-shape mapping table includes correspondences between pronunciation mouth shapes and the initials and finals of pinyin, and the second matching unit includes:
a determination subunit for determining, according to the pinyin combination, the initial and/or final contained in the pinyin combination; and
a matching subunit for matching, according to the initial and/or final, at least one corresponding pronunciation mouth shape from the correspondences between initials/finals and mouth shapes.
Optionally, the first determination submodule includes:
a third determination unit for determining the target contact corresponding to the target call voice; and
a retrieval unit for retrieving the target expression pack pre-associated with the target contact.
Optionally, the acquisition module includes:
a monitoring submodule for monitoring a voice call; and
a third determination submodule for determining that the received call voice of the other party is the target call voice.
Optionally, the terminal further includes:
a second obtaining module for obtaining a material resource pack and a personal image of a call contact, the material resource pack containing at least one media material image; and
a generation module for integrating the personal image of the call contact with each media material image to generate at least one media display image, obtaining an expression pack containing the at least one media display image.
Optionally, the personal image includes a personal face image, the media material images include mouth-shape images corresponding to the initials and finals of pinyin, or mouth-shape images corresponding to the vowels and consonants of English phonetic symbols, and the generation module includes:
an identification submodule for identifying the mouth region in the face image;
a replacement submodule for filling the mouth region with replacements taken from the mouth-shape images in the media material images; and
a generation submodule for generating a media display image corresponding to each mouth-shape image in the media material images, obtaining an expression pack containing a media display image corresponding to each mouth-shape image.
Optionally, the display module includes:
a display submodule for playing, through the display interface, the at least one frame of target media display image with a captured display image as the background picture.
One or more embodiments of the present invention have the following beneficial effects:
In the embodiments of the present invention, at least one frame of target media display image is obtained by matching against the collected call voice, and those target media display images are played continuously to form a video-style playing effect, realizing speech recognition converted into a video display. Through a local software application mode of the terminal device, the limitation of the network environment can be broken through and a virtual video telephone call carried out. The process does not depend on the network, saving data traffic or even escaping traffic restrictions, and it accompanies the call with virtualized video pictures, making the call livelier and more vivid, strengthening communication, adding enjoyment and improving the user experience.
Description of the drawings
The technical solutions in the embodiments of the present invention will be described clearly and completely below with reference to the accompanying drawings in the embodiments of the present invention. Obviously, the described embodiments are only some, rather than all, of the embodiments of the present invention. All other embodiments obtained by those of ordinary skill in the art based on the embodiments of the present invention without creative effort shall fall within the protection scope of the present invention.
Fig. 1 is a schematic flowchart of the media display method in the first embodiment of the present invention;
Fig. 2 is a schematic flowchart of the media display method in the second embodiment of the present invention;
Fig. 3 is a schematic structural diagram of the media display terminal in the third embodiment of the present invention;
Fig. 4 is a schematic diagram of an operation template in an embodiment of the present invention.
Specific embodiment
The technical solutions in the embodiments of the present invention will be described clearly and completely below with reference to the accompanying drawings in the embodiments of the present invention. Obviously, the described embodiments are only some, rather than all, of the embodiments of the present invention. All other embodiments obtained by those of ordinary skill in the art based on the embodiments of the present invention without creative effort shall fall within the protection scope of the present invention.
First embodiment
An embodiment of the present invention discloses a media display method which, referring to Fig. 1, includes:
Step 101: acquire a target call voice.
This step may occur during a normal telephone call, or during voice playback in an instant-messaging application such as WeChat. The target call voice may be the user's own voice in the call or the voice of the contact the user is talking with; the target call voice is the object of the subsequent processing.
When applied to a normal telephone call, as a specific embodiment, the step of acquiring a target call voice may include: monitoring the voice call; and determining that the received call voice of the other party is the target call voice. That is, the call voice of the other contact in the call is acquired and put through the subsequent processing, realizing an entertaining call.
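The "monitor the call, keep the other party's voice" step can be sketched as filtering a tagged chunk stream; the `(source, chunk)` tuple format is an assumption made purely for illustration.

```python
# Keep only the remote party's chunks from a monitored call stream.
def target_voice_chunks(call_stream):
    """Yield only the other party's chunks; these form the target call voice."""
    for source, chunk in call_stream:
        if source == "remote":
            yield chunk
```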
Step 102: according to the target call voice, obtain at least one frame of target media display image matching the target call voice.
The target media display image may be matched and selected from a preset group of image packs. The at least one frame of target media display image may match the dialogue content of the target call voice, or match its tone, its volume, or the mouth shape needed to utter it. The target media display image may be an expression, a limb action, symbols of different sizes, or a place and background environment corresponding to the voice content.
Step 103: play the at least one frame of target media display image on a display interface.
Correspondingly, while the at least one frame of target media display image is played continuously, not only can the expression change with the voice, but limb actions can also be switched with the voice.
In this process, at least one frame of target media display image is obtained by matching against the collected call voice, and those target media display images are played to form a video-style playing effect, realizing speech recognition converted into a video display. Through a local software application mode of the terminal device, the limitation of the network environment can be broken through and a virtual video telephone call carried out. The process does not depend on the network, saving data traffic or even escaping traffic restrictions, and it accompanies the call with virtualized video pictures, making the call livelier and more vivid, strengthening communication, adding enjoyment and improving the user experience.
Second embodiment
An embodiment of the present invention discloses a media display method which, referring to Fig. 2, includes:
Step 201: acquire a target call voice.
When applied to a normal telephone call, as a specific embodiment, the step of acquiring a target call voice may include: monitoring the voice call; and determining that the received call voice of the other party is the target call voice. That is, the call voice of the other contact in the call is acquired and put through the subsequent processing, realizing an entertaining call.
Step 202: determine a target expression pack corresponding to the target call voice.
The target expression pack may be a preset fixed expression pack matching different target call voices, determined and read directly from the storage device; or it may be an expression pack that varies with specific elements in the target call voice and must be determined by matching against the target call voice.
As a specific embodiment, the step of determining a target expression pack corresponding to the target call voice includes:
determining the target contact corresponding to the target call voice; and retrieving the target expression pack pre-associated with the target contact. The expression pack may hold resource content that establishes a correspondence with the target contact. For example, when a contact calls the user's mobile phone, the phone can learn that the incoming target call voice was uttered by that contact, and accordingly adapt the expression pack associated with the calling contact according to the contact-list information. Specifically, this may be a photo of the target contact or a particular picture; making different settings for specific contacts makes the displayed result more targeted and more interesting.
Step 203: according to a voiceprint feature of the target call voice, determine at least one target pronunciation mouth shape needed to utter the target call voice.
Different call voices correspond to different pronunciation mouth shapes, and different voices carry different voiceprint feature information; specifically, the required pronunciation mouth shapes can be determined from the voiceprint features of the collected target call voice.
As a specific embodiment, the step of determining, according to the voiceprint feature of the target call voice, at least one target pronunciation mouth shape needed to utter the target call voice includes:
determining, according to the voiceprint feature of the target call voice, text content corresponding to the target call voice; matching, according to the text content, a phonetic-transcription combination corresponding to the text content from a text-to-transcription mapping table; matching, according to the transcription combination, at least one corresponding pronunciation mouth shape from a transcription-to-mouth-shape mapping table; and determining that the at least one pronunciation mouth shape is the at least one target pronunciation mouth shape needed to utter the target call voice.
Here the phonetic transcription is the notation that marks how text is pronounced. Different languages correspond to different text, and different text corresponds to different transcription systems: for example, when the text is Chinese, the transcription is pinyin; when the text is English, the transcription is English phonetic symbols. Determining the corresponding pronunciation mouth shapes from the voiceprint features of the target call voice requires first converting the speech into text, then matching the text to its corresponding transcription combination, and then matching that transcription combination to the corresponding target pronunciation mouth shapes.
Specifically, this is implemented as follows. Voice is received, the voice data is quantized, and an open-source interface is called to convert the voice into text. The principle is that different voices carry different voiceprint feature information: voiceprint feature information can be pre-recorded and checked against text to build a database correspondence, and newly captured speech is compared with the pre-recorded database to find the corresponding text. Converting text into a transcription works the same way: the contrast relationships between different transcriptions and different texts are pre-recorded, written into arrays, and stored as database correspondences. After the voice is converted into text, the text is looked up in the database to find its transcription; the transcription is then looked up in the preset transcription-to-mouth-shape correspondence table in the database to obtain the target pronunciation mouth shape corresponding to the transcription. The transcription is split and adapted to facial expression images, which are switched rapidly for display, generating a virtual-video effect.
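The flow just described can be condensed into one hedged end-to-end sketch: voice to text (stubbed here; the description mentions calling an open-source speech interface), text to transcription via one lookup table, and transcription to display frame via another. All table entries and frame names are invented.

```python
# End-to-end sketch: voice -> text -> transcription -> display frames.
TEXT_TO_PHONETIC = {"你": "ni", "好": "hao"}
PHONETIC_TO_FRAME = {"ni": "frame_ni", "hao": "frame_hao"}

def speech_to_text_stub(voice: str) -> str:
    # Stand-in for the pre-recorded voiceprint-database comparison.
    return voice

def virtual_video_frames(voice: str):
    """Frames to switch rapidly for the virtual-video effect."""
    text = speech_to_text_stub(voice)
    frames = []
    for ch in text:
        ph = TEXT_TO_PHONETIC.get(ch)
        if ph in PHONETIC_TO_FRAME:
            frames.append(PHONETIC_TO_FRAME[ph])
    return frames
```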
On the one hand, when the text content determined from the target call voice is Chinese characters, as a specific embodiment, the text-to-transcription mapping table includes a Chinese-character-to-pinyin mapping table, and the matching, according to the text content, a transcription combination corresponding to the text content from the text-to-transcription mapping table includes:
matching, according to the text content, a pinyin combination corresponding to the text content from the Chinese-character-to-pinyin mapping table.
Correspondingly, the transcription-to-mouth-shape mapping table includes a pinyin-to-mouth-shape mapping table, and the matching, according to the transcription combination, at least one corresponding pronunciation mouth shape from the transcription-to-mouth-shape mapping table includes: matching, according to the pinyin combination, at least one corresponding pronunciation mouth shape from the pinyin-to-mouth-shape mapping table.
The process of converting Chinese characters to pinyin works the same way: the correspondence between pinyin and Chinese characters is recorded in advance, the standard GBK character set together with its pinyin annotation table is read into the database, and the data is written into an array to generate the database correspondence. After the voice is converted into Chinese characters, the characters are looked up against the database to find their pinyin; the preset mapping table between pinyin and pronunciation mouth shapes in the database is then consulted to obtain the target pronunciation mouth shape corresponding to the pinyin. The pinyin is split and matched to expression images, which are switched rapidly for display, producing a virtual video effect.
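The character-to-pinyin lookup described above can be sketched as a simple table-driven mapping. The sample entries below are illustrative placeholders, not the actual GBK annotation data described in the patent:

```python
# Hypothetical excerpt of the Chinese-character-to-pinyin mapping table;
# a real system would load the full GBK character set with its pinyin
# annotations into the database, as the text describes.
HANZI_TO_PINYIN = {
    "你": "ni",
    "好": "hao",
    "天": "tian",
}

def text_to_pinyin(text):
    """Look up each recognized character to form the pinyin combination."""
    return [HANZI_TO_PINYIN[ch] for ch in text if ch in HANZI_TO_PINYIN]

print(text_to_pinyin("你好"))  # ['ni', 'hao']
```

The matched pinyin combination is then used as the key into the pinyin-to-mouth-shape table in the same fashion.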
As a specific embodiment, the mapping table of pinyin and pronunciation mouth shapes includes the correspondence between the initials and finals of pinyin and pronunciation mouth shapes. The step of matching, according to the pinyin combination, at least one corresponding pronunciation mouth shape from the mapping table of pinyin and pronunciation mouth shapes includes: according to the pinyin combination, determining the initials and finals contained in the pinyin combination, or determining only the finals contained in it; and, according to the initials and finals, or according to the finals alone, matching and obtaining at least one corresponding pronunciation mouth shape from the correspondence between initials and finals and pronunciation mouth shapes.
In this process, after a pinyin combination corresponding to the word content is matched from the mapping table of Chinese characters and pinyin, the pinyin in the array structure is located and split into an initial and a final. Since some Chinese character pronunciations correspond only to a single final, the determined pinyin combination may contain both initials and finals, or finals alone. The expression images adapted to the mouth shapes corresponding to the initials and finals are then switched in the display window, rapidly alternating expressions to produce a virtual video display. In practice, the mapping table of pinyin and pronunciation mouth shapes may also be imported or downloaded.
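The initial/final split described here can be sketched as a longest-prefix match against the standard pinyin initials; a syllable such as "an" carries no initial and yields only a final. This is an illustrative sketch, not the patent's implementation:

```python
# Split a pinyin syllable into initial (shengmu) and final (yunmu).
# Two-letter initials ("zh", "ch", "sh") are tried before single letters
# so the longest prefix wins.
INITIALS = ["zh", "ch", "sh", "b", "p", "m", "f", "d", "t", "n", "l",
            "g", "k", "h", "j", "q", "x", "r", "z", "c", "s", "y", "w"]

def split_pinyin(syllable):
    for ini in INITIALS:
        if syllable.startswith(ini):
            return ini, syllable[len(ini):]
    return None, syllable  # final-only syllable, e.g. "an"

print(split_pinyin("zhang"))  # ('zh', 'ang')
print(split_pinyin("an"))     # (None, 'an')
```

Each resulting initial and final then indexes its own mouth-shape image in the resource library.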
Specifically, this scheme is described with respect to a phone call on a mobile phone, for realizing virtual video. A mouth-shape resource library for the Chinese pinyin alphabet is provided in the storage device. In this standard mouth-shape library, collected in advance, each mouth-shape image corresponds to the pronunciation of one pinyin letter, and the initials and finals in the mouth-shape images correspond one-to-one with the pinyin table. In this way, every pronunciation can be associated with the corresponding combination of initial and final images found in the mouth-shape library, so that the displayed pronunciation mouth shapes are consistent with the actual mouth movements of the speaking party.
On the other hand, when the word content corresponding to the destination call voice is determined to be English, as a specific implementation, the mapping table of words and phonetic notation includes a mapping table of English words and English phonetic symbols. The step of matching, according to the word content, a phonetic notation combination corresponding to the word content from the mapping table of words and phonetic notation then includes: according to the word content, matching a phonetic symbol combination corresponding to the word content from the mapping table of English words and English phonetic symbols.
Accordingly, the mapping table of phonetic notation and pronunciation mouth shapes includes a mapping table of English phonetic symbols and pronunciation mouth shapes, and the step of matching, according to the phonetic notation combination, at least one corresponding pronunciation mouth shape from the mapping table of phonetic notation and pronunciation mouth shapes includes: according to the phonetic symbol combination, matching at least one corresponding pronunciation mouth shape from the mapping table of English phonetic symbols and pronunciation mouth shapes.
The process of converting English words to phonetic symbols works the same way: the correspondence between English phonetic symbols and English words is recorded in advance and written into an array to generate the database correspondence. After the voice is converted into English words, the words are looked up against the database to find their phonetic symbols; the preset mapping table between English phonetic symbols and pronunciation mouth shapes in the database is then consulted to obtain the target pronunciation mouth shape corresponding to the phonetic symbols. The phonetic symbols are split and matched to expression images, which are switched rapidly for display, producing a virtual video effect.
As a specific embodiment, the mapping table of English phonetic symbols and pronunciation mouth shapes includes the correspondence between the vowels and consonants of the English phonetic symbols and pronunciation mouth shapes. The step of matching, according to the phonetic symbol combination, at least one corresponding pronunciation mouth shape from the mapping table of English phonetic symbols and pronunciation mouth shapes includes: according to the phonetic symbol combination, determining the vowels and consonants contained in the phonetic symbol combination, or determining only the vowels contained in it; and, according to the vowels and consonants, or according to the vowels alone, matching and obtaining at least one corresponding pronunciation mouth shape from the correspondence between vowels and consonants and pronunciation mouth shapes.
In this process, after a phonetic symbol combination corresponding to the word content is matched from the mapping table of English words and English phonetic symbols, the phonetic symbols in the array structure are located and split into vowels and consonants. Since some English pronunciations correspond only to a single vowel, the determined phonetic symbol combination may contain both vowels and consonants, or vowels alone. The expression images adapted to the mouth shapes corresponding to the vowels and consonants are then switched in the display window, rapidly alternating expressions to produce a virtual video display. In practice, the mapping table of English phonetic symbols and pronunciation mouth shapes may also be imported or downloaded.
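By analogy with the pinyin case, the English lookup and vowel/consonant split can be sketched as below. The dictionary entries and the vowel set are illustrative assumptions, not data from the patent:

```python
# Hypothetical word-to-phonetic-symbol table; a real system would load
# a full pronunciation dictionary into the database.
WORD_TO_PHONEMES = {
    "hello": ["h", "e", "l", "ou"],
    "hi": ["h", "ai"],
}

VOWELS = {"e", "ou", "ai", "i", "a", "o", "u"}  # illustrative vowel set

def word_to_mouth_keys(word):
    """Split a word's phonetic symbols into (vowels, consonants)."""
    phonemes = WORD_TO_PHONEMES[word]
    vowels = [p for p in phonemes if p in VOWELS]
    consonants = [p for p in phonemes if p not in VOWELS]
    return vowels, consonants

print(word_to_mouth_keys("hi"))  # (['ai'], ['h'])
```

Each vowel and consonant then indexes its own mouth-shape image, exactly as initials and finals do in the Chinese case.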
Specifically, this scheme is described with respect to a phone call on a mobile phone, for realizing virtual video. A mouth-shape resource library for English phonetic symbols is provided in the storage device. In this standard mouth-shape library, collected in advance, each mouth-shape image corresponds to the pronunciation of one English phonetic symbol, and the vowels and consonants in the mouth-shape images correspond one-to-one with the English phonetic symbol table. In this way, every pronunciation can be associated with the corresponding combination of vowel and consonant images found in the mouth-shape library, so that the displayed pronunciation mouth shapes are consistent with the actual mouth movements of the speaking party.
Step 204: matching, from the target expression packet, at least one frame of target media display image containing the target pronunciation mouth shape.
Specifically, the target expression packet is an expression packet containing different mouth shapes, and the media display images in it show the mouth with different degrees of opening and closing. Preferably, besides the target pronunciation mouth shape, a target media display image may also contain other varying elements matched with the mouth shape, such as variations of the eyebrows, expression, and face shape.
Step 205: playing and displaying the at least one frame of target media display image through a display interface.
In this process, during a two-party conversation, the user's terminal (for example, a mobile phone) adapts the associated expressions according to speech recognition and synthesizes and optimizes the corresponding image resources, so that the pictures play continuously with the voice and the expressions switch and update accordingly. A virtual video scene can thus be generated without a network environment, realizing a virtual video-call scene and making the conversation more effective and more interesting.
This process is described in detail below. The technique of converting voice to text and pinyin and adapting expressions is implemented in the following steps:
Step 1. Read the pinyin annotations from the standard Chinese character set (GBK) database in advance, and write the character-pinyin data into an array. Each element of the array is a structure with four fields: the pinyin, the initial and final obtained by splitting the pinyin, and the Chinese character corresponding to the pinyin.
Step 2. Convert the voice to text through an open-source interface, and search the array using the text as the key; the corresponding pinyin can be found, and the associated initial and final can then be queried from the split fields of the structure.
Step 3. Look up the associated expressions against the initials and finals, create a visual dialog window on the desktop as the user interface (UI), and present the expressions in the UI window.
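Steps 1 and 2 above can be sketched as an array of four-field structures searched by character; the sample entries are illustrative placeholders:

```python
# Array-of-structures sketch for step 1: each element holds the pinyin,
# the initial and final obtained by splitting it, and the character.
from dataclasses import dataclass

@dataclass
class PinyinEntry:
    pinyin: str
    initial: str   # shengmu; empty string for final-only syllables
    final: str     # yunmu
    hanzi: str

TABLE = [
    PinyinEntry("ni", "n", "i", "你"),
    PinyinEntry("hao", "h", "ao", "好"),
]

def lookup(hanzi):
    """Step 2: search the array using the character as the key."""
    return next((e for e in TABLE if e.hanzi == hanzi), None)

entry = lookup("好")
print(entry.initial, entry.final)  # h ao
```

Step 3 would then map the returned initial and final to their expression images for display.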
When the at least one frame of target media display image is played continuously, the presentation time can be determined by first calculating the time length of the destination call voice and then, according to that time length, determining the playing duration of each frame during continuous playback. Specifically, the time length of the destination call voice is divided by the number of frames of target media display images to obtain the playing duration of one frame. Taking "hello" (你好) as an example: assume the voice lasts 1 second and is decomposed into 4 expression images; each image is then displayed for 1/4 second, i.e. 250 milliseconds. The at least one frame of target media display image matched with the destination call voice is switched continuously in the order in which the voice is received. The expressions presented in the window thus correspond to the received voice and change with it throughout the call; the rapid switching of the user's expression packet in the UI window presents a video effect, thereby realizing a virtual video call function.
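The per-frame timing above reduces to dividing the voice duration by the frame count; the "hello" example gives 250 ms per image:

```python
# Playing duration per frame: voice length divided by number of frames.
def frame_duration_ms(voice_seconds, frame_count):
    return voice_seconds * 1000.0 / frame_count

# The example from the text: a 1-second "hello" decomposed into 4 images.
print(frame_duration_ms(1.0, 4))  # 250.0
```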
As a specific embodiment, the step of playing and displaying the at least one frame of target media display image through the display interface includes: through the display interface, playing and displaying the at least one frame of target media display image with an acquired display image as the background picture.
When the target media display images are played, a playback background can be added. The background can be a fixed background associated with the contact, or a corresponding image matched as the background picture according to keywords or words recognized in the destination call voice, so that the background picture can change with the user's speech content.
This process mainly applies in an environment without a wireless network: through speech recognition, mouth shapes are looked up and adapted, and synthesis is performed with known scene images (background pictures) in the library, thereby reproducing a virtual scene without a network environment. The continuously switching scenes and expressions realize a virtual video-call scene without a network, making the conversation more effective and more interesting.
Specifically, the implementation flow is as follows. When a voice call is conducted on mobile phones, both phones store expression and scene images in advance. On receiving the other party's voice, the local phone first starts the speech recognition module, switches to the corresponding media display resources according to the recognition result, and displays them in the terminal's call interface. As the voice changes over time, speech recognition continuously adapts the corresponding expression resources. The terminal interface can display a fixed background picture or adapt a corresponding background according to the voice content; the expressions switch against the background picture in the terminal interface, and the visual effect of this rapid switching is exactly that of a real video-call scene, so a virtual-reality video call is realized locally.
As a specific embodiment, before the step of acquiring the destination call voice, the method further includes: obtaining a material resource packet and a personal image of the call contact, the material resource packet containing at least one media material image; and integrating the personal image of the call contact with each media material image to generate at least one media display image, obtaining an expression packet containing the at least one media display image.
The personal image can be an expression image of the contact in person, or a related image associated with the call contact. This process corresponds to the initialization of the expression packet containing the media display images, so that at least one frame of matching target media display image can later be obtained according to the destination call voice. The material resource packet can be downloaded in advance when a network is available. The integration of the personal image with each media material image can be realized by methods such as cut-and-fill, partial replacement, or partial overlay.
Specifically, for example, the user installs virtual simulation software on the mobile phone. In the software, picture scenes can be set arbitrarily and user images uploaded; initialization is preset, and the user's expression packet is generated. The image resources required on the user's phone are first stored locally, either photographed in advance or downloaded from the network, and generally include the user's personal images, the material resource packet, and scene pictures for ordinary video conversations. The material resource packet is, for example, a mouth-shape resource packet, which is integrated into the software itself and provided for the user's use. Taking the mouth-shape resource packet as an example: the software integrates the mouth-shape image resources itself, and the user first provides personal images to initialize the software before use. Initialization synthesizes the mouth-shape images with the personal images and performs image optimization, merging the mouth shapes into the user's likeness to generate the user's expression packet corresponding to the letters of the pinyin table. The image synthesis technique first removes the mouth and its surrounding region from the user's facial image and superimposes a mouth-shape resource of the same size, then optimizes the image, obtaining the user's customized expression packet corresponding to the pinyin alphabet.
As a specific embodiment, the personal image includes a personal facial image, and the media material images include pronunciation mouth-shape images corresponding to the initials and finals of pinyin, or pronunciation mouth-shape images corresponding to the vowels and consonants of English phonetic symbols. The step of integrating the personal image of the call contact with each media material image to generate at least one media display image and obtain an expression packet containing the at least one media display image includes: identifying the mouth region in the facial image; filling and replacing the mouth region with the pronunciation mouth-shape images in the media material images; and generating media display images corresponding to each pronunciation mouth-shape image in the media material images, obtaining an expression packet containing the media display images corresponding to each pronunciation mouth-shape image.
Taking the case where the media material images are pronunciation mouth-shape images corresponding to the initials and finals of pinyin as an example, the specific technique of synthesizing the expression packet from the face and mouth shapes is implemented in the following steps:
Step 1. Taking a call between two parties A and B as an example, the voice communication device held by A prestores the image resources of B, and separately prestores the mouth-shape image resources for the pronunciation of every letter of the Chinese pinyin table.
Step 2. Convert the color images of the face and mouth shapes to grayscale. For color-to-grayscale conversion, the classical formula is Gray = R*0.299 + G*0.587 + B*0.114. To avoid slow floating-point operations, integer arithmetic with rounding is introduced, giving the equivalent variant Gray = (R*30 + G*59 + B*11 + 50)/100, which improves conversion efficiency.
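The integer variant in step 2 can be checked directly; note that the truncated coefficients (30/59/11 per hundred) can differ from the floating-point result by about one gray level:

```python
# Integer grayscale conversion from step 2, avoiding floating point:
# Gray = (R*30 + G*59 + B*11 + 50) / 100, with the +50 providing rounding.
def gray(r, g, b):
    return (r * 30 + g * 59 + b * 11 + 50) // 100

print(gray(255, 255, 255))  # 255 (white stays white)
print(gray(255, 0, 0))      # 77  (float formula gives 76.245)
```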
Step 3. Take a grayscale threshold on the grayscale image and segment the face using the threshold to realize mouth region detection. Perform edge detection on the facial image and process the grayscale image with a mean-operator template (i.e. each pixel value is updated to the mean of its neighboring pixels) to detect facial feature regions such as the eyes, mouth, and nose; alternatively, identify the mouth region according to facial symmetry and structural distribution.
Step 4. Fill and replace the mouth region detected in the face with the original mouth-shape resources to generate the expression. Each row of pixel values of the mouth-shape images from step 1 is quantized and sampled so that its pixel count matches the number of pixels in the mouth region of step 3; the sampled and quantized mouth-shape resources are then filled into the mouth region of step 3, reconstructing and generating the facial expression image.
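The per-row quantized sampling of step 4 can be read as resampling each mouth-shape row to the width of the detected region; under that reading, a minimal nearest-neighbour sketch looks like this (illustrative, not the patent's implementation):

```python
# Resample one pixel row of the mouth-shape resource to target_len pixels
# (nearest-neighbour), so its pixel count matches the detected mouth region.
def resample_row(row, target_len):
    n = len(row)
    return [row[i * n // target_len] for i in range(target_len)]

print(resample_row([10, 20, 30, 40], 2))  # [10, 30]
print(resample_row([10, 20], 4))          # [10, 10, 20, 20]
```

The resampled rows then overwrite the detected mouth region pixel for pixel.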
Step 5. Apply Gaussian filtering to the newly generated expression images to enhance the smoothness of the composites, generating an expression resource library in which the facial mouth shape of each expression image corresponds to the pronunciation mouth shape of one letter of the alphabet.
The face is synthesized with each different mouth shape to generate different expression images, each corresponding to the pronunciation mouth-shape expression of one pinyin letter. The newly generated facial expression images need denoising and smoothing by Gaussian filtering to obtain clear expression images. The template computed from the Gaussian function consists of floating-point numbers; to balance filtering effect and computational efficiency, an integer 5×5 template operator with coefficient 1/273 is used, as illustrated by the template of Fig. 4.
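Fig. 4 is not reproduced in this text, but the 1/273 normalization matches the classic 5×5 integer Gaussian template (σ ≈ 1), which is assumed below:

```python
# Classic 5x5 integer Gaussian smoothing template; the coefficients sum
# to 273, matching the 1/273 normalization factor mentioned in the text.
KERNEL = [
    [1,  4,  7,  4, 1],
    [4, 16, 26, 16, 4],
    [7, 26, 41, 26, 7],
    [4, 16, 26, 16, 4],
    [1,  4,  7,  4, 1],
]

def smooth_pixel(window):
    """Apply the kernel to a 5x5 grayscale window centred on one pixel."""
    acc = sum(KERNEL[i][j] * window[i][j] for i in range(5) for j in range(5))
    return acc // 273

print(sum(sum(row) for row in KERNEL))              # 273
print(smooth_pixel([[100] * 5 for _ in range(5)]))  # 100 (flat region unchanged)
```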
In this process, at least one frame of target media display image is obtained by matching against the collected call voice, and those target media display images are played continuously to form the playback result of a video mode, realizing the conversion of speech recognition into video display. Through a software application model local to the terminal device, the limitation of the network environment can be overcome and virtual video-phone communication carried out. The process does not depend on the network, saving data traffic and even escaping traffic restrictions, and gives the call a virtualized video picture, making the call more lively and vivid, strengthening communication efficiency, and adding enjoyment to the conversation.
Third embodiment
An embodiment of the present invention discloses a media display terminal, which, with reference to Fig. 3, includes: an acquisition module 301, a first acquisition module 302, and a display module 303. The media display terminal can be a terminal supporting voice, such as a smartwatch or a mobile phone.
The acquisition module 301 is configured to acquire the destination call voice.
The first acquisition module 302 is configured to obtain, according to the destination call voice, at least one frame of target media display image matched with the destination call voice.
The display module 303 is configured to play and display the at least one frame of target media display image through a display interface.
Wherein, the first acquisition module includes:
a first determination sub-module, configured to determine a target expression packet corresponding to the destination call voice;
a second determination sub-module, configured to determine, according to the voiceprint feature of the destination call voice, at least one target pronunciation mouth shape needed to utter the destination call voice; and
an acquisition sub-module, configured to match, from the target expression packet, at least one frame of target media display image containing the target pronunciation mouth shape.
Wherein, the second determination sub-module includes:
a first determination unit, configured to determine, according to the voiceprint feature of the destination call voice, word content corresponding to the destination call voice;
a first matching unit, configured to match, according to the word content, a phonetic notation combination corresponding to the word content from the mapping table of words and phonetic notation;
a second matching unit, configured to match, according to the phonetic notation combination, at least one corresponding pronunciation mouth shape from the mapping table of phonetic notation and pronunciation mouth shapes; and
a second determination unit, configured to determine that the at least one pronunciation mouth shape is the at least one target pronunciation mouth shape needed to utter the destination call voice.
Wherein, the mapping table of words and phonetic notation includes a mapping table of Chinese characters and pinyin, and the first matching unit includes:
a first matching sub-unit, configured to match, according to the word content, a pinyin combination corresponding to the word content from the mapping table of Chinese characters and pinyin.
The mapping table of phonetic notation and pronunciation mouth shapes includes a mapping table of pinyin and pronunciation mouth shapes, and the second matching unit includes:
a second matching sub-unit, configured to match, according to the pinyin combination, at least one corresponding pronunciation mouth shape from the mapping table of pinyin and pronunciation mouth shapes.
Wherein, the mapping table of pinyin and pronunciation mouth shapes includes the correspondence between the initials and finals of pinyin and pronunciation mouth shapes, and the second matching sub-unit is specifically configured to: according to the pinyin combination, determine the initials and finals contained in the pinyin combination, or determine only the finals contained in it; and, according to the initials and finals, or according to the finals alone, match and obtain at least one corresponding pronunciation mouth shape from the correspondence between initials and finals and pronunciation mouth shapes.
Wherein, the mapping table of words and phonetic notation includes a mapping table of English words and English phonetic symbols, and the first matching unit includes:
a third matching sub-unit, configured to match, according to the word content, a phonetic symbol combination corresponding to the word content from the mapping table of English words and English phonetic symbols.
The mapping table of phonetic notation and pronunciation mouth shapes includes a mapping table of English phonetic symbols and pronunciation mouth shapes, and the second matching unit includes:
a fourth matching sub-unit, configured to match, according to the phonetic symbol combination, at least one corresponding pronunciation mouth shape from the mapping table of English phonetic symbols and pronunciation mouth shapes.
Wherein, the mapping table of English phonetic symbols and pronunciation mouth shapes includes the correspondence between the vowels and consonants of English phonetic symbols and pronunciation mouth shapes, and the fourth matching sub-unit is specifically configured to: according to the phonetic symbol combination, determine the vowels and consonants contained in the phonetic symbol combination, or determine only the vowels contained in it; and, according to the vowels and consonants, or according to the vowels alone, match and obtain at least one corresponding pronunciation mouth shape from the correspondence between vowels and consonants and pronunciation mouth shapes.
Wherein, the first determination sub-module includes:
a third determination unit, configured to determine the target contact corresponding to the destination call voice; and
a retrieving unit, configured to retrieve the target expression packet pre-associated with the target contact.
Wherein, the acquisition module includes:
a monitoring sub-module, configured to monitor the voice call process; and
a third determination sub-module, configured to determine that the received call voice of the other party is the destination call voice.
Wherein, the terminal further includes:
a second acquisition module, configured to obtain a material resource packet and a personal image of the call contact, the material resource packet containing at least one media material image; and
a generation module, configured to integrate the personal image of the call contact with each media material image to generate at least one media display image, obtaining an expression packet containing the at least one media display image.
Wherein, the personal image includes a personal facial image, and the media material images include pronunciation mouth-shape images corresponding to the initials and finals of pinyin, or pronunciation mouth-shape images corresponding to the vowels and consonants of English phonetic symbols. The generation module includes:
an identification sub-module, configured to identify the mouth region in the facial image;
a replacement sub-module, configured to fill and replace the mouth region with the pronunciation mouth-shape images in the media material images; and
a generation sub-module, configured to generate media display images corresponding to each pronunciation mouth-shape image in the media material images, obtaining an expression packet containing the media display images corresponding to each pronunciation mouth-shape image.
Wherein, the display module includes:
a display sub-module, configured to play and display, through the display interface, the at least one frame of target media display image with an acquired display image as the background picture.
According to the collected call voice, the media display terminal obtains at least one frame of target media display image by matching and plays those target media display images continuously to form the playback result of a video mode, realizing the conversion of speech recognition into video display. Through a software application model local to the terminal device, the limitation of the network environment can be overcome and virtual video-phone communication carried out. The process does not depend on the network, saving data traffic and even escaping traffic restrictions, and gives the call a virtualized video picture, making the call more lively and vivid, strengthening communication efficiency, and adding enjoyment to the conversation.
Although preferred embodiments of the present invention have been described, those skilled in the art, once aware of the basic inventive concept, can make further changes and modifications to these embodiments. Therefore, the appended claims are intended to be construed as including the preferred embodiments and all changes and modifications falling within the scope of the embodiments of the present invention.
Finally, it should be noted that, in the embodiments of the present invention, relational terms such as "first" and "second" are used only to distinguish one entity or operation from another, and do not necessarily require or imply any actual relationship or order between these entities or operations. Moreover, the terms "comprise", "include", or any other variant thereof are intended to cover a non-exclusive inclusion, so that a process, method, article, or terminal device including a series of elements includes not only those elements but also other elements not explicitly listed, or elements inherent to such a process, method, article, or terminal device. In the absence of further limitation, an element defined by the phrase "including a ..." does not exclude the presence of additional identical elements in the process, method, article, or terminal device including that element.
The above are preferred embodiments of the present invention. It should be pointed out that, for those of ordinary skill in the art, several improvements and modifications can also be made without departing from the principle of the present invention, and these improvements and modifications also fall within the protection scope of the present invention.
Claims (24)
1. A media display method, characterized by comprising:
acquiring a destination call voice;
according to the destination call voice, obtaining at least one frame of target media display image matched with the destination call voice;
playing and displaying the at least one frame of target media display image through a display interface.
2. The method according to claim 1, characterized in that the step of obtaining, according to the destination call voice, at least one frame of target media display image matched with the destination call voice comprises:
determining a target expression packet corresponding to the destination call voice;
according to the voiceprint feature of the destination call voice, determining at least one target pronunciation mouth shape needed to utter the destination call voice;
matching, from the target expression packet, at least one frame of target media display image containing the target pronunciation mouth shape.
3. The method according to claim 2, wherein determining, according to the voiceprint feature of the target call voice, the at least one target pronunciation mouth shape required to utter the target call voice comprises:
determining, according to the voiceprint feature of the target call voice, text content corresponding to the target call voice;
matching, according to the text content, a phonetic-notation combination corresponding to the text content from a mapping table of text and phonetic notation;
matching, according to the phonetic-notation combination, at least one corresponding pronunciation mouth shape from a mapping table of phonetic notation and pronunciation mouth shapes;
determining that the at least one pronunciation mouth shape is the at least one target pronunciation mouth shape required to utter the target call voice.
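The two-stage lookup recited in claim 3 can be sketched as follows. This is a minimal Python illustration, not the patented implementation; the table contents and mouth-shape labels are invented placeholders.

```python
# Hypothetical mapping table of text (Chinese characters) and phonetic notation (Pinyin)
TEXT_TO_NOTATION = {"你": "ni", "好": "hao"}

# Hypothetical mapping table of phonetic notation and pronunciation mouth shapes
NOTATION_TO_MOUTH_SHAPE = {"ni": "narrow", "hao": "open-round"}

def target_mouth_shapes(text_content: str) -> list:
    """Map text content to the mouth shapes required to utter it, via two tables."""
    notation_combination = [TEXT_TO_NOTATION[ch] for ch in text_content]
    return [NOTATION_TO_MOUTH_SHAPE[n] for n in notation_combination]

print(target_mouth_shapes("你好"))  # ['narrow', 'open-round']
```

The claim leaves both tables and the matching strategy open; any dictionary-like correspondence satisfies the recited steps.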
4. The method according to claim 3, wherein the mapping table of text and phonetic notation comprises a mapping table of Chinese characters and Pinyin, and matching, according to the text content, the phonetic-notation combination corresponding to the text content from the mapping table of text and phonetic notation comprises:
matching, according to the text content, a Pinyin combination corresponding to the text content from the mapping table of Chinese characters and Pinyin;
the mapping table of phonetic notation and pronunciation mouth shapes comprises a mapping table of Pinyin and pronunciation mouth shapes, and matching, according to the phonetic-notation combination, the at least one corresponding pronunciation mouth shape from the mapping table of phonetic notation and pronunciation mouth shapes comprises:
matching, according to the Pinyin combination, the at least one corresponding pronunciation mouth shape from the mapping table of Pinyin and pronunciation mouth shapes.
5. The method according to claim 4, wherein the mapping table of Pinyin and pronunciation mouth shapes comprises correspondences between initials and finals in Pinyin and pronunciation mouth shapes, and matching, according to the Pinyin combination, the at least one corresponding pronunciation mouth shape from the mapping table of Pinyin and pronunciation mouth shapes comprises:
determining, according to the Pinyin combination, the initials and finals included in the Pinyin combination, or determining the finals included in the Pinyin combination;
matching, according to the initials and finals, or according to the finals, the at least one corresponding pronunciation mouth shape from the correspondences between the initials and finals and the pronunciation mouth shapes.
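The initial/final decomposition that claim 5 relies on can be sketched as a longest-prefix match against the standard Pinyin initials. This is an illustrative sketch only; the patent does not prescribe this procedure.

```python
# Standard Pinyin initials; two-letter initials must be tried before their
# one-letter prefixes ("zh" before "z", etc.).
PINYIN_INITIALS = [
    "zh", "ch", "sh",
    "b", "p", "m", "f", "d", "t", "n", "l",
    "g", "k", "h", "j", "q", "x", "r", "z", "c", "s", "y", "w",
]

def split_syllable(syllable: str):
    """Return (initial, final); the initial is '' for zero-initial syllables."""
    for initial in PINYIN_INITIALS:
        if syllable.startswith(initial):
            return initial, syllable[len(initial):]
    return "", syllable  # zero-initial syllable: only a final is present

print(split_syllable("zhang"))  # ('zh', 'ang')
print(split_syllable("ai"))     # ('', 'ai')
```

The zero-initial case is why the claim allows matching on the finals alone: syllables such as "ai" have no initial to look up.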
6. The method according to claim 3, wherein the mapping table of text and phonetic notation comprises a mapping table of English words and English phonetic symbols, and matching, according to the text content, the phonetic-notation combination corresponding to the text content from the mapping table of text and phonetic notation comprises:
matching, according to the text content, a phonetic-symbol combination corresponding to the text content from the mapping table of English words and English phonetic symbols;
the mapping table of phonetic notation and pronunciation mouth shapes comprises a mapping table of English phonetic symbols and pronunciation mouth shapes, and matching, according to the phonetic-notation combination, the at least one corresponding pronunciation mouth shape from the mapping table of phonetic notation and pronunciation mouth shapes comprises:
matching, according to the phonetic-symbol combination, the at least one corresponding pronunciation mouth shape from the mapping table of English phonetic symbols and pronunciation mouth shapes.
7. The method according to claim 6, wherein the mapping table of English phonetic symbols and pronunciation mouth shapes comprises correspondences between vowels and consonants in the English phonetic symbols and pronunciation mouth shapes, and matching, according to the phonetic-symbol combination, the at least one corresponding pronunciation mouth shape from the mapping table of English phonetic symbols and pronunciation mouth shapes comprises:
determining, according to the phonetic-symbol combination, the vowels and consonants included in the phonetic-symbol combination, or determining the vowels included in the phonetic-symbol combination;
matching, according to the vowels and consonants, or according to the vowels, the at least one corresponding pronunciation mouth shape from the correspondences between the vowels and consonants and the pronunciation mouth shapes.
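A toy version of the vowel/consonant lookup in claim 7 might look like the following; the symbol inventory and mouth-shape names are invented for illustration, and symbols absent from the table are simply skipped.

```python
# Hypothetical vowel set and symbol-to-mouth-shape correspondence table
VOWELS = {"i:", "e", "æ", "ɑ:", "ɒ", "u:"}
SYMBOL_TO_MOUTH_SHAPE = {
    "i:": "spread", "e": "mid-open", "æ": "wide-open",
    "ɑ:": "open", "ɒ": "round", "u:": "protruded",
    "h": "neutral", "l": "tip-up",
}

def mouth_shapes(symbols, vowels_only=False):
    """Match each symbol in the phonetic-symbol combination to a mouth shape.

    With vowels_only=True, only vowel symbols are considered, mirroring the
    claim's alternative of determining the vowels alone.
    """
    picked = [s for s in symbols if (not vowels_only) or s in VOWELS]
    return [SYMBOL_TO_MOUTH_SHAPE[s] for s in picked if s in SYMBOL_TO_MOUTH_SHAPE]

print(mouth_shapes(["h", "e", "l", "əʊ"]))              # ['neutral', 'mid-open', 'tip-up']
print(mouth_shapes(["h", "e", "l"], vowels_only=True))  # ['mid-open']
```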
8. The method according to claim 2, wherein determining the target emoji package corresponding to the target call voice comprises:
determining a target contact corresponding to the target call voice;
retrieving the target emoji package pre-associated with the target contact.
9. The method according to any one of claims 1-8, wherein collecting the target call voice comprises:
monitoring a voice call process;
determining that a received call voice of the other party is the target call voice.
10. The method according to any one of claims 1-8, wherein before collecting the target call voice, the method further comprises:
obtaining a material resource package and a personal image of a call contact, wherein the material resource package comprises at least one media material image;
integrating the personal image of the call contact with each media material image to generate at least one media display image, and obtaining an emoji package comprising the at least one media display image.
11. The method according to claim 10, wherein the personal image comprises a personal face image, the media material images comprise pronunciation mouth shape images corresponding to initials and finals in Pinyin, or pronunciation mouth shape images corresponding to vowels and consonants in English phonetic symbols, and integrating the personal image of the call contact with each media material image to generate the at least one media display image and obtain the emoji package comprising the at least one media display image comprises:
identifying a mouth region in the face image;
filling and replacing the mouth region with the pronunciation mouth shape images in the media material images;
generating media display images corresponding to each pronunciation mouth shape image in the media material images, and obtaining an emoji package comprising the media display images corresponding to each pronunciation mouth shape image.
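The fill-and-replace step of claim 11 can be illustrated with images modeled as 2-D lists of pixel labels; a real implementation would paste a bitmap at coordinates found by face-landmark detection. The function name, region coordinates, and pixel labels here are all assumptions for illustration.

```python
import copy

def fill_mouth_region(face, mouth_patch, top, left):
    """Return a copy of `face` with `mouth_patch` pasted at (top, left)."""
    out = copy.deepcopy(face)  # leave the original face image untouched
    for r, row in enumerate(mouth_patch):
        for c, px in enumerate(row):
            out[top + r][left + c] = px
    return out

face = [["f"] * 4 for _ in range(4)]  # 4x4 stand-in for a face image
patch = [["m", "m"]]                  # 1x2 stand-in for a mouth-shape image
result = fill_mouth_region(face, patch, 2, 1)
print(result[2])  # ['f', 'm', 'm', 'f']
```

Repeating this paste once per mouth-shape image yields one display image per shape, which together would form the emoji package the claim recites.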
12. The method according to claim 1, wherein playing and displaying the at least one frame of target media display image through the display interface comprises:
playing and displaying, through the display interface, the at least one frame of target media display image with a collected display image as a background frame.
13. A media display terminal, comprising:
a collection module, configured to collect a target call voice;
a first obtaining module, configured to obtain, according to the target call voice, at least one frame of target media display image matching the target call voice;
a display module, configured to play and display the at least one frame of target media display image through a display interface.
14. The media display terminal according to claim 13, wherein the first obtaining module comprises:
a first determining submodule, configured to determine a target emoji package corresponding to the target call voice;
a second determining submodule, configured to determine, according to a voiceprint feature of the target call voice, at least one target pronunciation mouth shape required to utter the target call voice;
an obtaining submodule, configured to match, from the target emoji package, at least one frame of target media display image containing the target pronunciation mouth shape.
15. The media display terminal according to claim 14, wherein the second determining submodule comprises:
a first determining unit, configured to determine, according to the voiceprint feature of the target call voice, text content corresponding to the target call voice;
a first matching unit, configured to match, according to the text content, a phonetic-notation combination corresponding to the text content from a mapping table of text and phonetic notation;
a second matching unit, configured to match, according to the phonetic-notation combination, at least one corresponding pronunciation mouth shape from a mapping table of phonetic notation and pronunciation mouth shapes;
a second determining unit, configured to determine that the at least one pronunciation mouth shape is the at least one target pronunciation mouth shape required to utter the target call voice.
16. The media display terminal according to claim 15, wherein the mapping table of text and phonetic notation comprises a mapping table of Chinese characters and Pinyin, and the first matching unit comprises:
a first matching subunit, configured to match, according to the text content, a Pinyin combination corresponding to the text content from the mapping table of Chinese characters and Pinyin;
the mapping table of phonetic notation and pronunciation mouth shapes comprises a mapping table of Pinyin and pronunciation mouth shapes, and the second matching unit comprises:
a second matching subunit, configured to match, according to the Pinyin combination, the at least one corresponding pronunciation mouth shape from the mapping table of Pinyin and pronunciation mouth shapes.
17. The media display terminal according to claim 16, wherein the mapping table of Pinyin and pronunciation mouth shapes comprises correspondences between initials and finals in Pinyin and pronunciation mouth shapes, and the second matching subunit is specifically configured to:
determine, according to the Pinyin combination, the initials and finals included in the Pinyin combination, or determine the finals included in the Pinyin combination;
match, according to the initials and finals, or according to the finals, the at least one corresponding pronunciation mouth shape from the correspondences between the initials and finals and the pronunciation mouth shapes.
18. The media display terminal according to claim 15, wherein the mapping table of text and phonetic notation comprises a mapping table of English words and English phonetic symbols, and the first matching unit comprises:
a third matching subunit, configured to match, according to the text content, a phonetic-symbol combination corresponding to the text content from the mapping table of English words and English phonetic symbols;
the mapping table of phonetic notation and pronunciation mouth shapes comprises a mapping table of English phonetic symbols and pronunciation mouth shapes, and the second matching unit comprises:
a fourth matching subunit, configured to match, according to the phonetic-symbol combination, the at least one corresponding pronunciation mouth shape from the mapping table of English phonetic symbols and pronunciation mouth shapes.
19. The media display terminal according to claim 18, wherein the mapping table of English phonetic symbols and pronunciation mouth shapes comprises correspondences between vowels and consonants in the English phonetic symbols and pronunciation mouth shapes, and the fourth matching subunit is specifically configured to:
determine, according to the phonetic-symbol combination, the vowels and consonants included in the phonetic-symbol combination, or determine the vowels included in the phonetic-symbol combination;
match, according to the vowels and consonants, or according to the vowels, the at least one corresponding pronunciation mouth shape from the correspondences between the vowels and consonants and the pronunciation mouth shapes.
20. The media display terminal according to claim 14, wherein the first determining submodule comprises:
a third determining unit, configured to determine a target contact corresponding to the target call voice;
a retrieving unit, configured to retrieve the target emoji package pre-associated with the target contact.
21. The media display terminal according to any one of claims 13-20, wherein the collection module comprises:
a monitoring submodule, configured to monitor a voice call process;
a third determining submodule, configured to determine that a received call voice of the other party is the target call voice.
22. The media display terminal according to any one of claims 13-20, further comprising:
a second obtaining module, configured to obtain a material resource package and a personal image of a call contact, wherein the material resource package comprises at least one media material image;
a generating module, configured to integrate the personal image of the call contact with each media material image to generate at least one media display image, and obtain an emoji package comprising the at least one media display image.
23. The media display terminal according to claim 22, wherein the personal image comprises a personal face image, the media material images comprise pronunciation mouth shape images corresponding to initials and finals in Pinyin, or pronunciation mouth shape images corresponding to vowels and consonants in English phonetic symbols, and the generating module comprises:
an identifying submodule, configured to identify a mouth region in the face image;
a replacing submodule, configured to fill and replace the mouth region with the pronunciation mouth shape images in the media material images;
a generating submodule, configured to generate media display images corresponding to each pronunciation mouth shape image in the media material images, and obtain an emoji package comprising the media display images corresponding to each pronunciation mouth shape image.
24. The media display terminal according to claim 13, wherein the display module comprises:
a display submodule, configured to play and display, through the display interface, the at least one frame of target media display image with a collected display image as a background frame.
Priority Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201611154485.5A CN108234735A (en) | 2016-12-14 | 2016-12-14 | A kind of media display methods and terminal |
PCT/CN2017/114843 WO2018108013A1 (en) | 2016-12-14 | 2017-12-06 | Medium displaying method and terminal |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201611154485.5A CN108234735A (en) | 2016-12-14 | 2016-12-14 | A kind of media display methods and terminal |
Publications (1)
Publication Number | Publication Date |
---|---|
CN108234735A true CN108234735A (en) | 2018-06-29 |
Family
ID=62557913
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201611154485.5A Withdrawn CN108234735A (en) | 2016-12-14 | 2016-12-14 | A kind of media display methods and terminal |
Country Status (2)
Country | Link |
---|---|
CN (1) | CN108234735A (en) |
WO (1) | WO2018108013A1 (en) |
Cited By (12)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109377540A (en) * | 2018-09-30 | 2019-02-22 | 网易(杭州)网络有限公司 | Method, device, storage medium, processor and terminal for synthesizing facial animation |
CN110062116A (en) * | 2019-04-29 | 2019-07-26 | 上海掌门科技有限公司 | Method and apparatus for handling information |
CN110446066A (en) * | 2019-08-28 | 2019-11-12 | 北京百度网讯科技有限公司 | Method and apparatus for generating video |
CN110784762A (en) * | 2019-08-21 | 2020-02-11 | 腾讯科技(深圳)有限公司 | Video data processing method, device, equipment and storage medium |
CN110809090A (en) * | 2019-10-31 | 2020-02-18 | Oppo广东移动通信有限公司 | Call control method and related product |
CN111063339A (en) * | 2019-11-11 | 2020-04-24 | 珠海格力电器股份有限公司 | Intelligent interaction method, device, equipment and computer readable medium |
CN111596841A (en) * | 2020-04-28 | 2020-08-28 | 维沃移动通信有限公司 | Image display method and electronic equipment |
CN111741162A (en) * | 2020-06-01 | 2020-10-02 | 广东小天才科技有限公司 | Recitation prompting method, electronic equipment and computer readable storage medium |
WO2020221104A1 (en) * | 2019-04-30 | 2020-11-05 | 上海连尚网络科技有限公司 | Emoji packet presentation method and equipment |
CN112804440A (en) * | 2019-11-13 | 2021-05-14 | 北京小米移动软件有限公司 | Method, device and medium for processing image |
WO2022089222A1 (en) * | 2020-10-28 | 2022-05-05 | Ningbo Geely Automobile Research & Development Co., Ltd. | A camera system and method for generating an eye contact image view of a person |
CN114827648A (en) * | 2022-04-19 | 2022-07-29 | 咪咕文化科技有限公司 | Method, device, equipment and medium for generating dynamic expression package |
Families Citing this family (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112770063B (en) * | 2020-12-22 | 2023-07-21 | 北京奇艺世纪科技有限公司 | Image generation method and device |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US6539354B1 (en) * | 2000-03-24 | 2003-03-25 | Fluent Speech Technologies, Inc. | Methods and devices for producing and using synthetic visual speech based on natural coarticulation |
CN101482975A (en) * | 2008-01-07 | 2009-07-15 | 丰达软件(苏州)有限公司 | Method and apparatus for converting words into animation |
CN104239394A (en) * | 2013-06-18 | 2014-12-24 | 三星电子株式会社 | Translation system comprising display apparatus and server and control method thereof |
CN104468959A (en) * | 2013-09-25 | 2015-03-25 | 中兴通讯股份有限公司 | Method, device and mobile terminal displaying image in communication process of mobile terminal |
Family Cites Families (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101968893A (en) * | 2009-07-28 | 2011-02-09 | 上海冰动信息技术有限公司 | Game sound-lip synchronization system |
CN104238991B (en) * | 2013-06-21 | 2018-05-25 | 腾讯科技(深圳)有限公司 | Phonetic entry matching process and device |
- 2016-12-14: CN CN201611154485.5A patent/CN108234735A/en, not active (Withdrawn)
- 2017-12-06: WO PCT/CN2017/114843 patent/WO2018108013A1/en, active (Application Filing)
Cited By (15)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109377540B (en) * | 2018-09-30 | 2023-12-19 | 网易(杭州)网络有限公司 | Method and device for synthesizing facial animation, storage medium, processor and terminal |
CN109377540A (en) * | 2018-09-30 | 2019-02-22 | 网易(杭州)网络有限公司 | Method, device, storage medium, processor and terminal for synthesizing facial animation |
CN110062116A (en) * | 2019-04-29 | 2019-07-26 | 上海掌门科技有限公司 | Method and apparatus for handling information |
WO2020221104A1 (en) * | 2019-04-30 | 2020-11-05 | 上海连尚网络科技有限公司 | Emoji packet presentation method and equipment |
CN110784762A (en) * | 2019-08-21 | 2020-02-11 | 腾讯科技(深圳)有限公司 | Video data processing method, device, equipment and storage medium |
CN110446066A (en) * | 2019-08-28 | 2019-11-12 | 北京百度网讯科技有限公司 | Method and apparatus for generating video |
CN110809090A (en) * | 2019-10-31 | 2020-02-18 | Oppo广东移动通信有限公司 | Call control method and related product |
CN111063339A (en) * | 2019-11-11 | 2020-04-24 | 珠海格力电器股份有限公司 | Intelligent interaction method, device, equipment and computer readable medium |
CN112804440A (en) * | 2019-11-13 | 2021-05-14 | 北京小米移动软件有限公司 | Method, device and medium for processing image |
CN111596841B (en) * | 2020-04-28 | 2021-09-07 | 维沃移动通信有限公司 | Image display method and electronic equipment |
CN111596841A (en) * | 2020-04-28 | 2020-08-28 | 维沃移动通信有限公司 | Image display method and electronic equipment |
CN111741162A (en) * | 2020-06-01 | 2020-10-02 | 广东小天才科技有限公司 | Recitation prompting method, electronic equipment and computer readable storage medium |
WO2022089222A1 (en) * | 2020-10-28 | 2022-05-05 | Ningbo Geely Automobile Research & Development Co., Ltd. | A camera system and method for generating an eye contact image view of a person |
CN114827648A (en) * | 2022-04-19 | 2022-07-29 | 咪咕文化科技有限公司 | Method, device, equipment and medium for generating dynamic expression package |
CN114827648B (en) * | 2022-04-19 | 2024-03-22 | 咪咕文化科技有限公司 | Method, device, equipment and medium for generating dynamic expression package |
Also Published As
Publication number | Publication date |
---|---|
WO2018108013A1 (en) | 2018-06-21 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN108234735A (en) | A kind of media display methods and terminal | |
CN110288077B (en) | Method and related device for synthesizing speaking expression based on artificial intelligence | |
CN110531860B (en) | Animation image driving method and device based on artificial intelligence | |
CN110941954B (en) | Text broadcasting method and device, electronic equipment and storage medium | |
US20060281064A1 (en) | Image communication system for compositing an image according to emotion input | |
CN112099628A (en) | VR interaction method and device based on artificial intelligence, computer equipment and medium | |
CN106537493A (en) | Speech recognition system and method, client device and cloud server | |
CA2416592A1 (en) | Method and device for providing speech-to-text encoding and telephony service | |
US20070136671A1 (en) | Method and system for directing attention during a conversation | |
CN112188304A (en) | Video generation method, device, terminal and storage medium | |
CN110970018A (en) | Speech recognition method and device | |
CN107291704A (en) | Treating method and apparatus, the device for processing | |
CN107623622A (en) | A kind of method and electronic equipment for sending speech animation | |
CN113538628A (en) | Expression package generation method and device, electronic equipment and computer readable storage medium | |
CN108073572A (en) | Information processing method and its device, simultaneous interpretation system | |
CN109215629A (en) | Method of speech processing, device and terminal | |
CN107800860A (en) | Method of speech processing, device and terminal device | |
KR20170135598A (en) | System and Method for Voice Conversation using Synthesized Virtual Voice of a Designated Person | |
WO2022193635A1 (en) | Customer service system, method and apparatus, electronic device, and storage medium | |
CN109686359B (en) | Voice output method, terminal and computer readable storage medium | |
CN109754816B (en) | Voice data processing method and device | |
CN115690280B (en) | Three-dimensional image pronunciation mouth shape simulation method | |
CN109166368A (en) | A kind of talking pen | |
CN116229311B (en) | Video processing method, device and storage medium | |
CN110134235A (en) | A kind of method of guiding interaction |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
WW01 | Invention patent application withdrawn after publication | ||
Application publication date: 20180629 |