CN108962251A - Automatic Chinese speech recognition method for game characters - Google Patents

Automatic Chinese speech recognition method for game characters

Info

Publication number
CN108962251A
Authority
CN
China
Prior art keywords
data
frequency spectrum
game role
identifying method
automatic identifying
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201810671470.9A
Other languages
Chinese (zh)
Inventor
杨键
陈镇秋
陈汉辉
李茂�
吴海权
卢歆翮
江卓浩
陈晨
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Western Hills Residence Guangzhou Shi You Network Technology Co Ltd
Zhuhai Kingsoft Online Game Technology Co Ltd
Original Assignee
Western Hills Residence Guangzhou Shi You Network Technology Co Ltd
Zhuhai Kingsoft Online Game Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Western Hills Residence Guangzhou Shi You Network Technology Co Ltd and Zhuhai Kingsoft Online Game Technology Co Ltd
Priority to CN201810671470.9A
Publication of CN108962251A
Legal status: Pending (current)

Classifications

    • G - PHYSICS
    • G10 - MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L - SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 - Speech recognition
    • G10L15/24 - Speech recognition using non-acoustical features
    • G10L15/25 - Speech recognition using non-acoustical features using position of the lips, movement of the lips or face analysis
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T13/00 - Animation
    • G06T13/20 - 3D [Three Dimensional] animation
    • G06T13/205 - 3D [Three Dimensional] animation driven by audio data

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Acoustics & Sound (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Processing Or Creating Images (AREA)

Abstract

The technical solution of the present invention comprises a method for automatically recognizing the Chinese speech of game characters. Spectrum data is extracted from the dubbing, smoothing filtering is applied to the spectrum data, formant data is computed from the processed data, the vowel being pronounced is identified from the characteristics of the vowels on the formants and matched to the corresponding vowel mouth movement, and the result is used in the game, where the voice mouth-shape animation is kept or fine-tuned according to its actual performance. The benefits of the invention are that it simplifies the repetitive creation and revision involved in producing game mouth-shape animation, enables efficient production of scene dialogue animation, and provides real-time feedback and adjustment of the mouth-shape animation, achieving good voice interaction and visual quality.

Description

Automatic Chinese speech recognition method for game characters
Technical field
The present invention relates to a method for automatically recognizing the Chinese speech of game characters, and belongs to the field of computer games.
Background art
As the internet has developed, people's forms of entertainment and leisure have become more and more varied, and the game industry has spread widely across the network. Most games today include plot dialogue animation to increase the player's sense of immersion and identification. At present there are two main ways to produce plot dialogue animation: the first is to produce the animation directly from the voice using three-dimensional animation; the second is to change the size of the mouth shape based on volume.
Figures 1 and 2 show process flow diagrams of the prior art.
The basic procedure shown in Fig. 1 is: the voice is first processed manually, and a speech animation for the whole sentence is made and adjusted to the voice in three-dimensional animation software; the result is then checked against the performance requirements of the game, and if it does not meet them the speech animation for the sentence is adjusted again until the desired effect is reached; finally each speech animation is exported.
The basic procedure shown in Fig. 2 is: the volume of the game dubbing is obtained, and simple coded parameters are used to control the opening and closing of the character's mouth shape according to the loudness of the game speech.
The above methods each have shortcomings. The method shown in Fig. 1 has the following deficiencies:
1. A game contains a large number of voiced dialogue lines, and each line requires at least one animation made to its specific requirements. If every line of voice has to be turned into a speech animation by an artist by hand, it consumes a large amount of 3D animation time; this heavy workload often leads game studios to abandon this level of detail and polish, which lowers game quality.
2. Because nothing is automatically generated, every line has to be listened to repeatedly during production; the volume of speech in game development is often enormous, so this production process becomes redundant and time-consuming.
The method shown in Fig. 2 has the following deficiency: it uses the common scheme of controlling mouth opening and closing from volume, i.e. roughly judging the character's mouth shape at each moment from the volume level and similar cues. This approach is unsuitable for games that demand the highest quality, because opening and closing the mouth based on volume alone makes the character lack realism.
Summary of the invention
To solve the above problems, the purpose of the present invention is to provide an automatic Chinese speech recognition method for game characters, so as to process the mouth-shape performance of game character voices automatically and meet the requirements of high-quality games.
The technical solution adopted by the present invention to solve the problem is as follows.
An automatic Chinese speech recognition method for game characters, the method comprising the following steps:
a step of extracting spectrum data: identifying an audio file, reading the audio file and extracting spectrum data; a step of processing the spectrum data: applying smoothing filtering to the spectrum data; a step of obtaining formant data: obtaining formant data from the smoothed spectrum data; and a step of generating mouth-shape animation data: generating mouth-shape animation data from the obtained formant data.
Further, the step of extracting spectrum data includes: obtaining the dubbing audio file and identifying whether it is a Chinese dubbing audio file; if so, extracting spectrum data from the audio file; if not, performing no processing.
Further, the step of processing the spectrum data includes: performing a convolution operation between the values of the input spectrum array and a Gaussian kernel, taking the convolution result as the output value, and then executing the step of obtaining formant data.
Further, the step of convolving the values of the input spectrum array with the Gaussian kernel includes Gaussian template generation and convolution processing.
Further, the Gaussian template generation includes: creating a Gaussian template from the Gaussian function G(x) = (1/(√(2π)·σ))·exp(-x²/(2σ²)); defining the size of the Gaussian template and σ; finding the centre of the template according to its size; and performing traversal processing, computing the value of each coefficient in the template from the Gaussian distribution function.
Further, the convolution processing step includes: using the obtained Gaussian template as weights and multiplying it with the spectrum data.
Further, the step of obtaining formant data includes: computing the peaks F1, F2 and F3 of the spectrum data to obtain the formant data.
Further, the step of generating mouth-shape animation data includes: identifying the vowel mouth shape pronounced in the current frame from the formant features and matching the corresponding vowel animation and weight; generating mouth-shape animation data for every frame of the whole passage of speech; and testing the saved mouth-shape animation data in the game.
Further, the method also includes: creating an editor for fine-tuning the weight thresholds and the vowel animations, and automatically generating a corresponding version of the mouth-shape animation according to the fine-tuned weight thresholds and vowel animations.
The beneficial effects of the present invention are: the automatic Chinese speech mouth-shape recognition algorithm used by the present invention automatically generates, from the voice, dialogue mouth-shape animation in the game that meets the requirements, replacing the large amount of time animators would otherwise spend making an animation for every line and saving animation resources; by automatically recognizing the voice vowels in real time, the mouth-shape animation is driven with real-time feedback, achieving good voice interaction and visual quality.
Description of the drawings
Fig. 1 shows the prior-art process flow using three-dimensional animation software;
Fig. 2 shows the prior-art process flow based on simple volume changes;
Fig. 3 shows the algorithm flow chart of an embodiment of the present invention;
Fig. 4 shows the overall flow chart of in-game use according to an embodiment of the present invention.
Detailed description of embodiments
The concept, specific structure and resulting technical effects of the present invention are described clearly and completely below with reference to the embodiments and the drawings, so that the purpose, scheme and effects of the present invention can be fully understood.
It should be noted that, unless otherwise specified, the singular forms "a", "the" and "said" used in this disclosure are intended to include the plural forms as well, unless the context clearly indicates otherwise. In addition, unless otherwise defined, all technical and scientific terms used herein have the meanings commonly understood by those skilled in the art. The terms used in this specification are only for describing specific embodiments and are not intended to limit the present invention. The term "and/or" as used herein includes any combination of one or more of the associated listed items.
It should be understood that any and all examples or exemplary language ("such as", "for example") provided herein are intended only to better illustrate the embodiments of the present invention and, unless otherwise claimed, do not limit the scope of the present invention.
Fig. 3 shows the algorithm flow chart of an embodiment of the present invention. Referring to the flow chart, spectrum data is extracted from the original Chinese dubbing audio to obtain the sound spectrum, and the spectrum data is then smoothed. Smoothing filtering here means convolving the input array with a Gaussian kernel and taking the convolution result as the output value. The smoothed spectrum data is then processed to compute its peaks F1, F2 and F3, giving the formant data. Finally, based on the characteristics of Chinese vowels on the formants, the vowel mouth shape pronounced in the current frame is identified, the corresponding vowel animation and weight are matched, and the mouth-shape animation data is generated.
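As a minimal sketch of the spectrum-extraction step, the Python fragment below reads a dub and computes one magnitude spectrum per frame; the 16-bit mono WAV assumption, the frame length, the hop size and the helper name extract_spectrum are illustrative choices, not values given in the patent.

import wave
import numpy as np

def extract_spectrum(path, frame_len=1024, hop=512):
    """Read a 16-bit mono WAV dub and return one magnitude spectrum per frame."""
    with wave.open(path, "rb") as wf:
        rate = wf.getframerate()
        raw = wf.readframes(wf.getnframes())
    samples = np.frombuffer(raw, dtype=np.int16).astype(np.float64) / 32768.0
    window = np.hanning(frame_len)
    frames = []
    for start in range(0, len(samples) - frame_len, hop):
        frame = samples[start:start + frame_len] * window
        frames.append(np.abs(np.fft.rfft(frame)))  # magnitude spectrum of this frame
    return np.array(frames), rate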
The specific in-game application is as follows. Spectrum data is extracted from the Chinese dubbing of the scene dialogue. Because every audio processing system operates in a complex environment during acquisition, capture, transmission and conversion, all audio is disturbed to some degree by perceptible or imperceptible noise. The countermeasure is to apply the necessary filtering and noise reduction to the audio, i.e. smoothing filtering: the values of the input array are convolved with a Gaussian kernel and the convolution result is taken as the output, which yields the smoothed spectrum data.
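A sketch of this smoothing step, assuming numpy and a Gaussian kernel such as the one built in the next sketch; smooth_spectrum is a hypothetical helper name.

import numpy as np

def smooth_spectrum(spectrum, kernel):
    """Smoothing filter as described: convolve the input array (one frame's
    spectrum) with a Gaussian kernel and take the convolution result as the output."""
    return np.convolve(spectrum, kernel, mode="same")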
To convolve the raw spectrum data with a Gaussian kernel, a Gaussian template must first be built. The template is based on the Gaussian function G(x) = (1/(√(2π)·σ))·exp(-x²/(2σ²)), with the template size ksize and sigma defined as parameters in the code. The template is generated as follows: according to the defined template size, find the centre of the template at ksize/2; then traverse the template starting from the centre and compute the value of each coefficient from the Gaussian distribution function. With this, the Gaussian template is complete.
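The following sketch builds such a template in Python following the procedure above; the default ksize and sigma values and the normalisation at the end are assumptions added for illustration.

import numpy as np

def gaussian_kernel(ksize=9, sigma=2.0):
    """Build a 1-D Gaussian template: define ksize and sigma, locate the centre
    at ksize // 2, then traverse the template and evaluate the Gaussian function
    at each position."""
    center = ksize // 2
    coeffs = np.array([np.exp(-((i - center) ** 2) / (2.0 * sigma ** 2))
                       for i in range(ksize)]) / (np.sqrt(2.0 * np.pi) * sigma)
    return coeffs / coeffs.sum()  # normalisation is an added step, not stated in the text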
The built Gaussian template is then used as weights and multiplied with the original audio spectrum data to obtain the smoothed spectrum data, from which the peaks F1, F2 and F3 are computed to give the formant data. A formant is a region of the sound spectrum where energy is relatively concentrated; it is not only a determinant of voice quality but also reflects the physical characteristics of the vocal tract (the resonating cavity). As sound passes through the resonating cavity, the cavity's filtering redistributes the energy of different frequencies across the frequency domain: part of the energy is reinforced by the cavity's resonance while another part is attenuated. Because the energy distribution is uneven, the reinforced parts stand out like mountain peaks, hence the name formants. In speech acoustics the pitch of a vowel can vary, but different vowels are distinguished from one another by two characteristic pitches related to their overtones, which broadly correspond to the front/back difference between vowels. The higher these characteristic pitches, the lower the tongue position; the lower they are, the higher the tongue position, consistent with what phonetics calls vowel height. These characteristic overtones are the formants of the vowel, and the formants determine the vowel's quality. Correspondingly, F1 on the spectrogram corresponds to tongue height, F2 corresponds to how front or back the tongue is, and lip rounding is related to F2 and F3.
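A very rough way to read F1, F2 and F3 off the smoothed spectrum is to take its lowest-frequency local maxima, as in the sketch below; real formant trackers typically use LPC analysis, but the patent only states that the peaks of the spectrum data are computed, so this stand-in is an assumption.

import numpy as np

def formants_from_spectrum(smoothed, sample_rate, n_formants=3):
    """Take the lowest-frequency local maxima of a smoothed magnitude spectrum
    as rough F1, F2, F3 estimates (in Hz)."""
    bin_hz = sample_rate / (2.0 * (len(smoothed) - 1))  # bin spacing of an rfft spectrum
    peaks = []
    for i in range(1, len(smoothed) - 1):
        if smoothed[i] > smoothed[i - 1] and smoothed[i] > smoothed[i + 1]:
            peaks.append(i * bin_hz)
            if len(peaks) == n_formants:
                break
    return peaks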
Based on the characteristics of Chinese vowels on the formants, the wider the mouth opening the higher F1, the more forward the tongue position the higher F2, and F3 of an unrounded vowel is higher than that of a rounded vowel. The vowel mouth shape pronounced in the current frame is identified and matched to the corresponding vowel animation and weight. Mouth-shape animation data is generated for every frame of the whole passage of speech and saved, after which the result is tested in the game and the data fine-tuned.
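A crude classifier along these lines might look like the following; the frequency thresholds and the returned animation weights are illustrative placeholders, not values disclosed in the patent.

def classify_vowel(f1, f2, f3):
    """Map F1/F2/F3 to one of the Mandarin vowel mouth shapes a, o, e, i, u,
    following the stated tendencies: wider opening raises F1, a fronter tongue
    raises F2, and lip rounding lowers F3."""
    if f1 > 700:                                   # wide jaw opening
        return "a", min(f1 / 1000.0, 1.0)
    if f2 > 2000:                                  # tongue far forward
        return "i", min(f2 / 2500.0, 1.0)
    if f3 < 2500:                                  # rounded lips depress F3
        return ("u", 0.8) if f2 < 900 else ("o", 0.8)
    return "e", 0.6                                # fallback mid vowel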
In actual development, the above algorithm can be implemented in code according to the requirements, together with a simple editor whose details can be adjusted by fine-tuning. Speech animations can then be generated automatically to the specific requirements, the editor used to fine-tune details until the effect the game needs is reached, and mouth-shape animations of different versions generated automatically by adjusting the weight thresholds and the vowel animations.
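For illustration only, such an editor might expose settings like the following; the keys, values and animation file names are hypothetical.

tuning = {
    "f1_open_threshold": 700.0,    # Hz above which the mouth is treated as wide open
    "f2_front_threshold": 2000.0,  # Hz above which the tongue is treated as fronted
    "min_weight": 0.2,             # animation weights below this are clamped to rest
    "vowel_clips": {               # vowel -> animation asset (file names invented)
        "a": "mouth_a.anim", "o": "mouth_o.anim", "e": "mouth_e.anim",
        "i": "mouth_i.anim", "u": "mouth_u.anim",
    },
}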
Fig. 4 shows the overall flow chart of in-game use according to an embodiment of the present invention. The generation of a specific dialogue animation, as shown in the flow of Fig. 4, proceeds as follows: the mouth-shape actions for the vowel pronunciations are built with the relevant software; the voice requirements of the game are determined for each game and scene; an editor for fine-tuning the weight thresholds and vowel animations is created; and a corresponding version of the mouth-shape animation is generated automatically from the fine-tuned weight thresholds and vowel animations. The relevant data is extracted from the dubbing with the algorithm above, the speech animation file is generated automatically from the audio's spectrum data by dedicated code, the generated animation files are imported into the game for testing, and the voice mouth-shape animation is kept or fine-tuned according to how it performs.
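Putting the sketches above together, an end-to-end pass over one dub might look like this; every helper name here is one of the hypothetical functions defined earlier, not an API disclosed by the patent.

frames, rate = extract_spectrum("scene_dialogue_cn.wav")
kernel = gaussian_kernel(ksize=9, sigma=2.0)

mouth_track = []
for frame in frames:
    smoothed = smooth_spectrum(frame, kernel)
    formants = formants_from_spectrum(smoothed, rate)
    if len(formants) == 3:
        mouth_track.append(classify_vowel(*formants))  # (vowel, weight) per frame
    else:
        mouth_track.append(("rest", 0.0))              # silence or no clear peaks

# mouth_track now holds one (vowel, weight) pair per audio frame; the game engine
# keys its mouth-shape blend weights off this track, and an animator keeps or
# fine-tunes the result in the editor.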
It should be understood that embodiments of the present invention may be implemented or carried out by computer hardware, by a combination of hardware and software, or by computer instructions stored in non-transitory computer-readable memory. The methods may be implemented in computer programs using standard programming techniques, including non-transitory computer-readable storage media configured with the computer programs, where the storage media so configured cause a computer to operate in a specific and predefined manner, in accordance with the methods and drawings described in the particular embodiments. Each program may be implemented in a high-level procedural or object-oriented programming language to communicate with a computer system, or, if desired, in assembly or machine language. In any case, the language may be compiled or interpreted. Furthermore, the program may run on an application-specific integrated circuit programmed for that purpose.
In addition, the operations of the processes described herein may be performed in any suitable order unless otherwise indicated herein or clearly contradicted by context. The processes described herein (or variations and/or combinations thereof) may be performed under the control of one or more computer systems configured with executable instructions, and may be implemented as code (for example, executable instructions, one or more computer programs, or one or more applications) executed jointly on one or more processors, by hardware, or by a combination thereof. The computer program comprises a plurality of instructions executable by one or more processors.
Further, the methods may be implemented on any type of suitable computing platform to which they are operably connected, including but not limited to a personal computer, minicomputer, mainframe, workstation, networked or distributed computing environment, or a separate or integrated computer platform, or a platform communicating with a charged-particle tool or other imaging device. Aspects of the present invention may be implemented as machine-readable code stored on a non-transitory storage medium or device, whether removable or integrated into the computing platform, such as a hard disk, an optically readable and/or writable storage medium, RAM or ROM, so that it is readable by a programmable computer; when the storage medium or device is read by the computer, it can be used to configure and operate the computer to perform the processes described herein. In addition, the machine-readable code, or portions thereof, may be transmitted over a wired or wireless network. When such media contain instructions or programs that, in conjunction with a microprocessor or other data processor, implement the steps described above, the invention described herein includes these and other different types of non-transitory computer-readable storage media. When programmed according to the methods and techniques of the present invention, the invention also includes the computer itself.
A computer program can be applied to input data to perform the functions described herein, thereby converting the input data to generate output data that is stored to non-volatile memory. The output information can also be applied to one or more output devices such as a display. In a preferred embodiment of the present invention, the converted data represents physical and tangible objects, including specific visual depictions of physical and tangible objects produced on the display.
The above are only preferred embodiments of the present invention, and the invention is not limited to the above embodiments. As long as the technical effects of the present invention are achieved by the same means, any modification, equivalent replacement or improvement made within the spirit and principles of the present invention shall fall within the scope of protection of the present invention. The technical solutions and/or embodiments within the scope of the invention may have various modifications and variations.

Claims (9)

1. An automatic Chinese speech recognition method for game characters, characterized in that the method comprises the following steps:
a step of extracting spectrum data: identifying an audio file, reading the audio file and extracting spectrum data;
a step of processing the spectrum data: applying smoothing filtering to the spectrum data;
a step of obtaining formant data: obtaining formant data from the smoothed spectrum data;
a step of generating mouth-shape animation data: generating mouth-shape animation data from the obtained formant data.
2. The automatic Chinese speech recognition method for game characters according to claim 1, characterized in that the step of extracting spectrum data comprises:
obtaining the dubbing audio file and identifying whether it is a Chinese dubbing audio file; if so, extracting spectrum data from the audio file; if not, performing no processing.
3. The automatic Chinese speech recognition method for game characters according to claim 1, characterized in that the step of processing the spectrum data comprises:
performing a convolution operation between the values of the input spectrum array and a Gaussian kernel, taking the convolution result as the output value, and executing the step of obtaining formant data.
4. The automatic Chinese speech recognition method for game characters according to claim 3, characterized in that the step of convolving the values of the input spectrum array with the Gaussian kernel comprises Gaussian template generation and convolution processing.
5. The automatic Chinese speech recognition method for game characters according to claim 4, characterized in that the Gaussian template generation comprises:
creating a Gaussian template from the Gaussian function G(x) = (1/(√(2π)·σ))·exp(-x²/(2σ²));
defining the size of the Gaussian template and σ;
finding the centre of the template according to its size;
performing traversal processing and computing the value of each coefficient in the template from the Gaussian distribution function.
6. The automatic Chinese speech recognition method for game characters according to claim 4, characterized in that the convolution processing step comprises: using the obtained Gaussian template as weights and multiplying it with the spectrum data.
7. The automatic Chinese speech recognition method for game characters according to claim 1, characterized in that the step of obtaining formant data comprises: computing the peaks F1, F2 and F3 of the spectrum data to obtain the formant data.
8. The automatic Chinese speech recognition method for game characters according to claim 1, characterized in that the step of generating mouth-shape animation data comprises:
identifying the vowel mouth shape pronounced in the current frame from the formant features, and matching the corresponding vowel animation and weight;
generating mouth-shape animation data for every frame of the whole passage of speech;
testing the saved mouth-shape animation data in the game.
9. The automatic Chinese speech recognition method for game characters according to claim 1, characterized in that the method further comprises:
creating an editor for fine-tuning the weight thresholds and the vowel animations, and automatically generating a corresponding version of the mouth-shape animation according to the fine-tuned weight thresholds and vowel animations.
CN201810671470.9A 2018-06-26 2018-06-26 Automatic Chinese speech recognition method for game characters Pending CN108962251A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810671470.9A CN108962251A (en) 2018-06-26 2018-06-26 Automatic Chinese speech recognition method for game characters

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201810671470.9A CN108962251A (en) 2018-06-26 2018-06-26 Automatic Chinese speech recognition method for game characters

Publications (1)

Publication Number Publication Date
CN108962251A true CN108962251A (en) 2018-12-07

Family

ID=64486970

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810671470.9A Pending CN108962251A (en) Automatic Chinese speech recognition method for game characters

Country Status (1)

Country Link
CN (1) CN108962251A (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112023391A (en) * 2020-09-02 2020-12-04 杭州瞳阳科技有限公司 Control system and method for game VR
CN112700520A (en) * 2020-12-30 2021-04-23 上海幻维数码创意科技股份有限公司 Mouth shape expression animation generation method and device based on formants and storage medium
CN112750187A (en) * 2021-01-19 2021-05-04 腾讯科技(深圳)有限公司 Animation generation method, device and equipment and computer readable storage medium

Citations (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101656069A (en) * 2009-09-17 2010-02-24 陈拙夫 Chinese voice information communication system and communication method thereof
CN101702198A (en) * 2009-11-19 2010-05-05 浙江大学 Identification method for video and living body faces based on background comparison
CN101894566A (en) * 2010-07-23 2010-11-24 北京理工大学 Visualization method of Chinese mandarin complex vowels based on formant frequency
CN101930747A (en) * 2010-07-30 2010-12-29 四川微迪数字技术有限公司 Method and device for converting voice into mouth shape image
CN201741384U (en) * 2010-07-30 2011-02-09 四川微迪数字技术有限公司 Anti-stammering device for converting Chinese speech into mouth-shaped images
CN102176313A (en) * 2009-10-10 2011-09-07 北京理工大学 Formant-frequency-based Mandarin single final voice visualizing method
CN102722721A (en) * 2012-05-25 2012-10-10 山东大学 Human falling detection method based on machine vision
CN103729654A (en) * 2014-01-22 2014-04-16 青岛新比特电子科技有限公司 Image matching retrieval system on account of improving Scale Invariant Feature Transform (SIFT) algorithm
CN105022835A (en) * 2015-08-14 2015-11-04 武汉大学 Public safety recognition method and system for crowd sensing big data
CN106503660A (en) * 2016-10-31 2017-03-15 天津大学 Time series complexity measuring method based on image microstructure Frequence Analysis
CN107742114A (en) * 2017-11-09 2018-02-27 深圳大学 high spectrum image feature detection method and device

Patent Citations (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101656069A (en) * 2009-09-17 2010-02-24 陈拙夫 Chinese voice information communication system and communication method thereof
CN102176313A (en) * 2009-10-10 2011-09-07 北京理工大学 Formant-frequency-based Mandarin single final voice visualizing method
CN101702198A (en) * 2009-11-19 2010-05-05 浙江大学 Identification method for video and living body faces based on background comparison
CN101894566A (en) * 2010-07-23 2010-11-24 北京理工大学 Visualization method of Chinese mandarin complex vowels based on formant frequency
CN101930747A (en) * 2010-07-30 2010-12-29 四川微迪数字技术有限公司 Method and device for converting voice into mouth shape image
CN201741384U (en) * 2010-07-30 2011-02-09 四川微迪数字技术有限公司 Anti-stammering device for converting Chinese speech into mouth-shaped images
CN102722721A (en) * 2012-05-25 2012-10-10 山东大学 Human falling detection method based on machine vision
CN103729654A (en) * 2014-01-22 2014-04-16 青岛新比特电子科技有限公司 Image matching retrieval system on account of improving Scale Invariant Feature Transform (SIFT) algorithm
CN105022835A (en) * 2015-08-14 2015-11-04 武汉大学 Public safety recognition method and system for crowd sensing big data
CN106503660A (en) * 2016-10-31 2017-03-15 天津大学 Time series complexity measuring method based on image microstructure Frequence Analysis
CN107742114A (en) * 2017-11-09 2018-02-27 深圳大学 high spectrum image feature detection method and device

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
柳杨 (Liu Yang), 《数字图像物体识别理论详解与实战》 [Digital Image Object Recognition: Theory and Practice], Beijing University of Posts and Telecommunications Press, 31 March 2018 *
王延江 等 (Wang Yanjiang et al.), 《数字图像处理》 [Digital Image Processing], Petroleum University Press, 30 November 2016 *

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112023391A (en) * 2020-09-02 2020-12-04 杭州瞳阳科技有限公司 Control system and method for game VR
CN112023391B (en) * 2020-09-02 2024-01-16 杭州瞳阳科技有限公司 Control system and method for game VR
CN112700520A (en) * 2020-12-30 2021-04-23 上海幻维数码创意科技股份有限公司 Mouth shape expression animation generation method and device based on formants and storage medium
CN112700520B (en) * 2020-12-30 2024-03-26 上海幻维数码创意科技股份有限公司 Formant-based mouth shape expression animation generation method, device and storage medium
CN112750187A (en) * 2021-01-19 2021-05-04 腾讯科技(深圳)有限公司 Animation generation method, device and equipment and computer readable storage medium

Similar Documents

Publication Publication Date Title
US10997764B2 (en) Method and apparatus for generating animation
Toda et al. The Voice Conversion Challenge 2016.
US11295721B2 (en) Generating expressive speech audio from text data
CN110136698B (en) Method, apparatus, device and storage medium for determining mouth shape
CN105244026B (en) A kind of method of speech processing and device
CN108231062B (en) Voice translation method and device
Lugosch et al. Using speech synthesis to train end-to-end spoken language understanding models
CN111433847B (en) Voice conversion method, training method, intelligent device and storage medium
CN108962251A (en) A kind of game role Chinese speech automatic identifying method
CN109524020A (en) A kind of speech enhan-cement processing method
Chen et al. Tone Classification in Mandarin Chinese Using Convolutional Neural Networks.
CN109582952A (en) Poem generation method, device, computer equipment and medium
CN105931631A (en) Voice synthesis system and method
Kapralova et al. A big data approach to acoustic model training corpus selection
Llorach et al. Web-based live speech-driven lip-sync
JP7124373B2 (en) LEARNING DEVICE, SOUND GENERATOR, METHOD AND PROGRAM
US20230039540A1 (en) Automated pipeline selection for synthesis of audio assets
Li et al. Expressive Speech Driven Talking Avatar Synthesis with DBLSTM Using Limited Amount of Emotional Bimodal Data.
CN112652041A (en) Virtual image generation method and device, storage medium and electronic equipment
Kang et al. Grad-stylespeech: Any-speaker adaptive text-to-speech synthesis with diffusion models
Luong et al. LaughNet: synthesizing laughter utterances from waveform silhouettes and a single laughter example
JP2020184100A (en) Information processing program, information processing apparatus, information processing method and learned model generation method
CN116095357B (en) Live broadcasting method, device and system of virtual anchor
CN110556092A (en) Speech synthesis method and device, storage medium and electronic device
Kumar et al. Towards building text-to-speech systems for the next billion users

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20181207