CN108962251A - A kind of game role Chinese speech automatic identifying method - Google Patents
- Publication number
- CN108962251A (application CN201810671470.9A)
- Authority
- CN
- China
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/24—Speech recognition using non-acoustical features
- G10L15/25—Speech recognition using non-acoustical features using position of the lips, movement of the lips or face analysis
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T13/00—Animation
- G06T13/20—3D [Three Dimensional] animation
- G06T13/205—3D [Three Dimensional] animation driven by audio data
Landscapes
- Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- Multimedia (AREA)
- Computational Linguistics (AREA)
- Health & Medical Sciences (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Human Computer Interaction (AREA)
- Acoustics & Sound (AREA)
- General Physics & Mathematics (AREA)
- Theoretical Computer Science (AREA)
- Processing Or Creating Images (AREA)
Abstract
The technical solution of the present invention comprises an automatic Chinese speech recognition method for game characters, which: extracts spectrum data from voice-over audio; applies smoothing filtering to the spectrum data; computes formant data from the processed data; extracts vowel sounds according to the formant characteristics of vowel pronunciation and matches them to the corresponding vowel mouth movements; and, once applied in a game, keeps or fine-tunes the speech mouth-shape animation according to the actual result. The beneficial effects of the invention are: it simplifies the repeated creation and modification steps in producing game mouth-shape animation, enables efficient production of story-dialogue animation, and feeds back and adjusts the mouth-shape animation in real time, achieving good voice-interaction and visual results.
Description
Technical field
The present invention relates to an automatic Chinese speech recognition method for game characters and belongs to the field of computer games.
Background art
As the internet develops, people's forms of entertainment and leisure are becoming increasingly diverse, and the game industry is flourishing online. Most games today include story-dialogue animation to increase the player's sense of involvement and immersion. There are currently two main methods of producing story-dialogue animation: the first is to make the animation by hand in 3D animation software according to the speech; the second is to vary the size of the mouth opening based on volume.
The process flows of the prior art are shown in Fig. 1 and Fig. 2.
The basic flow shown in Fig. 1 is: the speech is first processed manually, and the speech animation for the whole sentence is produced and adjusted in 3D animation software according to the speech; the result is then checked against the needs of the game's presentation; if it does not meet them, the whole-sentence speech animation continues to be adjusted according to the speech until it does; finally, each speech animation is exported.
The basic flow shown in Fig. 2 is: the volume of the game voice-over is obtained, and simple code parameters are used to control the opening and closing of the character's mouth according to the volume of the game speech.
The above methods have different defects. The method shown in Fig. 1 has the following deficiencies:
1. A game can require a large amount of voice dialogue, and each line needs at least one animation made to specific requirements. If artists must hand-make a speech animation for every line, this takes up a great deal of the 3D animators' time. This heavy workload often leads game studios to abandon this level of detail optimization, which lowers game quality.
2. Because nothing is generated automatically, every line must be listened to repeatedly during production. The amount of speech in a game is often very large, so this production process becomes redundant and time-consuming.
The method shown in Fig. 2 has the following deficiency: this scheme uses the more common volume-based control of mouth opening and closing, that is, it only roughly judges how far the character's mouth is open at a given moment from factors such as the volume level. This approach is unsuitable for games that demand high quality, because the mouth merely opens and closes with the volume, so the character lacks realism.
Summary of the invention
To solve the above problems, the purpose of the present invention is to provide an automatic Chinese speech recognition method for game characters that automatically handles the mouth-shape presentation of game character speech and meets the requirements of high-quality games.
The technical solution adopted by the present invention to solve the problem is:
An automatic Chinese speech recognition method for game characters, the method comprising the following steps:
A step of extracting spectrum data: identifying an audio file, reading the audio file, and extracting spectrum data;
A step of processing the spectrum data: applying smoothing filtering to the spectrum data;
A step of obtaining formant data: obtaining formant data from the smoothed spectrum data;
A step of generating mouth-shape animation data: generating mouth-shape animation data from the obtained formant data.
Further, the step of extracting spectrum data includes: obtaining a voice-over audio file and identifying whether it is a Chinese voice-over audio file; if so, extracting spectrum data from the audio file; if not, performing no processing.
Further, the step of processing the spectrum data includes: convolving the values of the spectrum data input array with a Gaussian kernel, taking the convolution result as the output value, and proceeding to the step of obtaining formant data.
Further, the step of convolving the values of the spectrum data input array with the Gaussian kernel includes: Gaussian template generation and convolution processing.
Further, the Gaussian template generation includes: creating a Gaussian template; defining the size and σ of the Gaussian template; finding the center position of the template according to its size; and traversing the template, computing the value of each coefficient according to the Gaussian distribution function.
Further, the step of convolution processing includes: using the obtained Gaussian template as weights and multiplying it with the spectrum data.
Further, the step of obtaining formant data includes: computing the peaks F1, F2, and F3 of the spectrum data to obtain the formant data.
Further, the step of generating mouth-shape animation data includes: identifying the vowel mouth shape uttered in the current frame according to formant features and matching the corresponding vowel animation and weight; generating per-frame mouth-shape animation data for the whole speech segment; and testing the saved mouth-shape animation data in the game.
Further, the method also includes: creating an editor for fine-tuning the weight threshold and the vowel animations, and automatically generating the corresponding version of the mouth-shape animation according to the fine-tuned weight threshold and vowel animations.
The beneficial effects of the present invention are: the automatic Chinese speech mouth-shape recognition algorithm for game characters used by the invention automatically generates, in the game, dialogue mouth-shape animation that meets requirements from the speech. This replaces the large amount of time animators spend manually making each speech animation and reduces the size of the animation resources. By automatically recognizing speech vowels in real time, the mouth-shape animation is fed back in real time, achieving good voice-interaction and visual results.
Brief description of the drawings
Fig. 1 shows the prior-art process flow using 3D animation software;
Fig. 2 shows the prior-art process flow of simple volume-based modification;
Fig. 3 shows the algorithm flowchart according to an embodiment of the present invention;
Fig. 4 shows the overall flowchart of use in a game according to an embodiment of the present invention.
Specific embodiment
The concept, specific structure, and technical effects of the present invention are described clearly and completely below with reference to the embodiments and drawings, so that the purpose, solutions, and effects of the invention can be fully understood.
It should be noted that, unless otherwise specified, the singular forms "a", "said", and "the" used in this disclosure are intended to include the plural forms as well, unless the context clearly indicates otherwise. In addition, unless otherwise defined, all technical and scientific terms used herein have the same meaning as commonly understood by those skilled in the art. The terms used in this specification are intended only to describe specific embodiments and not to limit the invention. The term "and/or" as used herein includes any combination of one or more of the associated listed items.
It should be understood that the use of any and all examples or exemplary language ("for example", "such as") provided herein is intended only to better illustrate embodiments of the invention and, unless otherwise required, does not limit the scope of the invention.
Fig. 3 shows the algorithm flowchart according to an embodiment of the present invention. Referring to the flowchart: spectrum data is extracted from the original Chinese voice-over audio to obtain the sound spectrum, and the spectrum data is then smoothed. Smoothing filtering here means convolving the values of the input array with a Gaussian kernel and taking the convolution result as the output value. The smoothed spectrum data is then processed: its peaks F1, F2, and F3 are computed to obtain the formant data. Finally, based on the formant characteristics of Chinese vowels, the vowel mouth shape uttered in the current frame is identified and matched to the corresponding vowel animation and weight, generating the mouth-shape animation data.
The specific application in a game is as follows. Spectrum data is extracted from the Chinese voice-over of the story dialogue. Because every audio processing system operates in a complex environment during audio capture, acquisition, transmission, and conversion, all audio is disturbed to some degree by perceptible or imperceptible noise. The countermeasure is to apply the necessary noise-reduction filtering to the audio, namely smoothing filtering. Specifically, the values of the input array are convolved with a Gaussian kernel, and the convolution result is output as the smoothed spectrum data.
To convolve the raw spectrum data with a Gaussian kernel, a Gaussian template must first be built. The template is based on the Gaussian function, with the template size ksize and sigma defined as parameters in code. The template is generated as follows: first, according to the chosen template size, find the center position ksize/2 of the template; then traverse the template starting from the center position and compute the value of each coefficient according to the Gaussian distribution function. Once this is complete, the Gaussian template has been built.
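The template generation described above can be sketched in a few lines. The formula itself is missing from this text, so the sketch assumes the standard one-dimensional Gaussian, with coefficient exp(-(i - c)^2 / (2 * sigma^2)) at offset i from the center c = ksize/2. Normalizing the coefficients so that they sum to 1 is also an assumption (it keeps the overall spectrum level unchanged after smoothing), and the function name `gaussian_template` is hypothetical. The loop runs from index 0 rather than outward from the center, which yields the same coefficients.

```python
import math


def gaussian_template(ksize, sigma):
    """Build a 1-D Gaussian template of ksize coefficients.

    Assumes the standard Gaussian exp(-(i - c)^2 / (2 * sigma^2)),
    normalized so that the coefficients sum to 1.
    """
    center = ksize // 2  # center position of the template
    weights = [
        math.exp(-((i - center) ** 2) / (2.0 * sigma ** 2))
        for i in range(ksize)
    ]
    total = sum(weights)
    return [w / total for w in weights]


kernel = gaussian_template(ksize=5, sigma=1.0)
```

An odd `ksize` gives a template that is symmetric about its center; a larger `sigma` spreads the weight over more neighbouring bins and smooths more strongly.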
Using the Gaussian template as weights, it is multiplied with the original audio spectrum data to obtain the smoothed spectrum data, and the peaks F1, F2, and F3 are computed to obtain the formant data. Formants are regions of the sound spectrum where energy is relatively concentrated. They not only determine timbre but also reflect the physical characteristics of the vocal tract (the resonant cavity). As sound passes through the resonant cavity, the cavity acts as a filter and redistributes the energy across frequencies: some components are strengthened by resonance, while others are attenuated. Because the energy distribution is uneven, the strong parts stand out like mountain peaks, which is why they are called resonance peaks, or formants.
In speech acoustics, the pitch of a vowel can vary, but different vowels are distinguished from one another by two characteristic pitches associated with their overtones, which corresponds roughly to the difference between front and back vowels. When the characteristic pitch is high, the tongue position is low; when it is low, the tongue position is high. This is consistent with vowel height as described in phonetic terminology. These characteristic overtones are the formants of the vowel, and the formants determine the vowel's quality. Accordingly, on a spectrogram, F1 corresponds to the height of the tongue, F2 corresponds to how far forward or back the tongue is, and lip rounding is related to F2 and F3.
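The smoothing and peak computation can be sketched as below. This is a simplified illustration under stated assumptions: the source only says that the peaks F1, F2, and F3 are computed, so taking the first three local maxima of the smoothed spectrum is an assumed reading; real formant trackers usually work on a spectral envelope instead. The helper names, the fixed three-tap kernel, the edge clamping, and the synthetic spectrum with bumps at bins 7, 12, and 25 are all hypothetical.

```python
import math


def smooth(spectrum, kernel):
    """Convolve a spectrum with a symmetric kernel, clamping at the edges."""
    half = len(kernel) // 2
    last = len(spectrum) - 1
    smoothed = []
    for i in range(len(spectrum)):
        acc = 0.0
        for j, weight in enumerate(kernel):
            idx = min(max(i + j - half, 0), last)  # repeat edge values
            acc += weight * spectrum[idx]
        smoothed.append(acc)
    return smoothed


def first_peaks(spectrum, bin_hz, count=3):
    """Frequencies (Hz) of the first `count` local maxima, lowest first."""
    peaks = []
    for i in range(1, len(spectrum) - 1):
        if spectrum[i] > spectrum[i - 1] and spectrum[i] >= spectrum[i + 1]:
            peaks.append(i * bin_hz)
            if len(peaks) == count:
                break
    return peaks


def bump(i, center, height):
    """One synthetic spectral bump used to fake a vowel-like spectrum."""
    return height * math.exp(-((i - center) ** 2) / 4.5)


# Synthetic spectrum with energy concentrated around bins 7, 12 and 25.
raw = [bump(i, 7, 1.0) + bump(i, 12, 0.8) + bump(i, 25, 0.5) for i in range(40)]
kernel = [0.25, 0.5, 0.25]  # small fixed smoothing template for the demo

formants = first_peaks(smooth(raw, kernel), bin_hz=100.0)
# formants == [700.0, 1200.0, 2500.0], read as F1, F2, F3
```

Each output value is a weighted sum of neighbouring bins, which is what multiplying the template with the spectrum data amounts to once the products are accumulated.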
The formant characteristics of Chinese vowels are: the wider the mouth opening, the higher F1; the further forward the tongue, the higher F2; and the F3 of unrounded vowels is higher than that of rounded vowels. From these features, the vowel mouth shape uttered in the current frame is identified and matched to the corresponding vowel animation and weight. Per-frame mouth-shape animation data is generated for the whole speech segment and saved; the result is then tested in the game and the data fine-tuned.
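One possible way to turn formants into a vowel and a weight is a nearest-reference lookup, sketched below. The source gives no reference values and does not define how the weight is computed, so everything here is an assumption: the reference (F1, F2) pairs are illustrative placeholders that merely follow the stated trends (open /a/ has a high F1, front /i/ has a high F2, back rounded /u/ has a low F2), the weight formula is invented for illustration, and F3 is left out for brevity.

```python
import math

# Illustrative placeholder (F1, F2) values in Hz; not taken from the source.
REFERENCE_FORMANTS = {
    "a": (850.0, 1300.0),  # wide opening: high F1
    "i": (300.0, 2300.0),  # front tongue position: high F2
    "u": (350.0, 700.0),   # back, rounded: low F2
}


def match_vowel(f1, f2, falloff_hz=500.0):
    """Return (vowel, weight) for the reference vowel nearest to (f1, f2).

    The weight is 1.0 for an exact match and decays with distance;
    this decay rule is an assumption made for the sketch.
    """
    distances = {
        vowel: math.hypot(f1 - ref_f1, f2 - ref_f2)
        for vowel, (ref_f1, ref_f2) in REFERENCE_FORMANTS.items()
    }
    vowel = min(distances, key=distances.get)
    weight = 1.0 / (1.0 + distances[vowel] / falloff_hz)
    return vowel, weight


vowel, weight = match_vowel(800.0, 1250.0)  # close to the "a" reference
```

Running `match_vowel` once per frame would give the per-frame sequence of vowel animations and weights that the text describes.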
In actual development, the above algorithm can be implemented in code, and a simple editor can be written for fine-tuning details. Speech animation can then be generated automatically according to specific requirements, the details fine-tuned in the editor to achieve the effect the game needs, and different versions of the mouth-shape animation generated automatically by adjusting the weight threshold and the vowel animations.
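The source does not say how the weight threshold is applied, so the following is only one plausible reading, sketched for illustration: frames whose weight falls below the threshold fall back to a neutral pose, so changing the threshold regenerates a different version of the same track. The names `apply_weight_threshold` and `CLOSED` and the sample track are hypothetical.

```python
CLOSED = "closed"  # hypothetical name for the neutral mouth pose


def apply_weight_threshold(frames, threshold):
    """Keep (vowel, weight) frames at or above threshold; close the rest."""
    result = []
    for vowel, weight in frames:
        if weight >= threshold:
            result.append((vowel, weight))
        else:
            result.append((CLOSED, 0.0))
    return result


# Per-frame (vowel, weight) pairs, as a recognizer might produce them.
track = [("a", 0.9), ("i", 0.4), ("u", 0.7)]

strict = apply_weight_threshold(track, 0.8)  # only confident frames move
loose = apply_weight_threshold(track, 0.3)   # every frame keeps its vowel
```

An editor built on this idea would expose the threshold as a single adjustable value and regenerate the track each time it changes.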
Fig. 4 shows the overall flowchart of use in a game according to an embodiment of the present invention. Specific dialogue animation is generated as shown in the flowchart of Fig. 4: the mouth-shape actions for vowel pronunciation are created in the relevant software; the game's speech requirements are determined according to the game and scene; an editor for fine-tuning the weight threshold and vowel animations is created, and the corresponding version of the mouth-shape animation is generated automatically according to the fine-tuned weight threshold and vowel animations. The relevant data in the voice-over is extracted with the above algorithm, and speech animation files are generated automatically from the audio spectrum data by dedicated code. The generated animation files are imported into the game and checked, and the speech mouth-shape animation is kept or fine-tuned according to how it performs.
It should be understood that embodiments of the present invention may be implemented by computer hardware, a combination of hardware and software, or computer instructions stored in non-transitory computer-readable memory. The method may be implemented in a computer program using standard programming techniques, including a non-transitory computer-readable storage medium configured with the computer program, where the storage medium so configured causes a computer to operate in a specific and predefined manner, according to the methods and drawings described in the specific embodiments. Each program may be implemented in a high-level procedural or object-oriented programming language to communicate with a computer system. However, if desired, the program may be implemented in assembly or machine language. In any case, the language may be a compiled or interpreted language. In addition, the program may run on an application-specific integrated circuit programmed for this purpose.
In addition, the operations of the processes described herein may be performed in any suitable order unless otherwise indicated herein or otherwise clearly contradicted by context. The processes described herein (or variations and/or combinations thereof) may be performed under the control of one or more computer systems configured with executable instructions, and may be implemented as code (for example, executable instructions, one or more computer programs, or one or more applications) executing collectively on one or more processors, by hardware, or by a combination thereof. The computer program includes a plurality of instructions executable by one or more processors.
Further, the method may be implemented in any type of suitable, operably connected computing platform, including but not limited to a personal computer, minicomputer, mainframe, workstation, networked or distributed computing environment, or a separate or integrated computer platform, or one in communication with a charged-particle tool or other imaging device. Aspects of the invention may be implemented as machine-readable code stored on a non-transitory storage medium or device, whether removable or integrated into the computing platform, such as a hard disk, an optically read and/or written storage medium, RAM, or ROM, so that it can be read by a programmable computer; when the storage medium or device is read by the computer, it can be used to configure and operate the computer to perform the processes described herein. In addition, the machine-readable code, or portions of it, may be transmitted over a wired or wireless network. The invention described herein includes these and other different types of non-transitory computer-readable storage media when such media contain instructions or programs that, in combination with a microprocessor or other data processor, implement the steps described above. The invention also includes the computer itself when programmed according to the methods and techniques described herein.
A computer program can be applied to input data to perform the functions described herein, transforming the input data to generate output data that is stored in non-volatile memory. The output information can also be applied to one or more output devices such as a display. In a preferred embodiment of the invention, the transformed data represents physical and tangible objects, including a particular visual depiction of the physical and tangible objects produced on a display.
The above are only preferred embodiments of the present invention, and the invention is not limited to the above embodiments. As long as the technical effects of the invention are achieved by the same means, any modification, equivalent replacement, or improvement made within the spirit and principles of the invention shall fall within its scope of protection. Within the scope of protection, the technical solutions and/or embodiments may have various modifications and variations.
Claims (9)
1. An automatic Chinese speech recognition method for game characters, characterized in that the method comprises the following steps:
a step of extracting spectrum data: identifying an audio file, reading the audio file, and extracting spectrum data;
a step of processing the spectrum data: applying smoothing filtering to the spectrum data;
a step of obtaining formant data: obtaining formant data from the smoothed spectrum data;
a step of generating mouth-shape animation data: generating mouth-shape animation data from the obtained formant data.
2. The automatic Chinese speech recognition method for game characters according to claim 1, characterized in that the step of extracting spectrum data includes:
obtaining a voice-over audio file and identifying whether it is a Chinese voice-over audio file; if so, extracting spectrum data from the audio file; if not, performing no processing.
3. The automatic Chinese speech recognition method for game characters according to claim 1, characterized in that the step of processing the spectrum data includes:
convolving the values of the spectrum data input array with a Gaussian kernel, taking the convolution result as the output value, and proceeding to the step of obtaining formant data.
4. The automatic Chinese speech recognition method for game characters according to claim 3, characterized in that the step of convolving the values of the spectrum data input array with the Gaussian kernel includes: Gaussian template generation and convolution processing.
5. The automatic Chinese speech recognition method for game characters according to claim 4, characterized in that the Gaussian template generation includes:
creating a Gaussian template;
defining the size and σ of the Gaussian template;
finding the center position of the template according to its size;
traversing the template and computing the value of each coefficient according to the Gaussian distribution function.
6. The automatic Chinese speech recognition method for game characters according to claim 4, characterized in that the step of convolution processing includes: using the obtained Gaussian template as weights and multiplying it with the spectrum data.
7. The automatic Chinese speech recognition method for game characters according to claim 1, characterized in that the step of obtaining formant data includes: computing the peaks F1, F2, and F3 of the spectrum data to obtain the formant data.
8. The automatic Chinese speech recognition method for game characters according to claim 1, characterized in that the step of generating mouth-shape animation data includes:
identifying the vowel mouth shape uttered in the current frame according to formant features, and matching the corresponding vowel animation and weight;
generating per-frame mouth-shape animation data for the whole speech segment;
testing the saved mouth-shape animation data in the game.
9. The automatic Chinese speech recognition method for game characters according to claim 1, characterized in that the method further includes:
creating an editor for fine-tuning the weight threshold and the vowel animations, and automatically generating the corresponding version of the mouth-shape animation according to the fine-tuned weight threshold and vowel animations.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810671470.9A CN108962251A (en) | 2018-06-26 | 2018-06-26 | A kind of game role Chinese speech automatic identifying method |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810671470.9A CN108962251A (en) | 2018-06-26 | 2018-06-26 | A kind of game role Chinese speech automatic identifying method |
Publications (1)
Publication Number | Publication Date |
---|---|
CN108962251A true CN108962251A (en) | 2018-12-07 |
Family
ID=64486970
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201810671470.9A Pending CN108962251A (en) | 2018-06-26 | 2018-06-26 | A kind of game role Chinese speech automatic identifying method |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN108962251A (en) |
Patent Citations (11)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101656069A (en) * | 2009-09-17 | 2010-02-24 | 陈拙夫 | Chinese voice information communication system and communication method thereof |
CN102176313A (en) * | 2009-10-10 | 2011-09-07 | 北京理工大学 | Formant-frequency-based Mandarin single final vioce visualizing method |
CN101702198A (en) * | 2009-11-19 | 2010-05-05 | 浙江大学 | Identification method for video and living body faces based on background comparison |
CN101894566A (en) * | 2010-07-23 | 2010-11-24 | 北京理工大学 | Visualization method of Chinese mandarin complex vowels based on formant frequency |
CN101930747A (en) * | 2010-07-30 | 2010-12-29 | 四川微迪数字技术有限公司 | Method and device for converting voice into mouth shape image |
CN201741384U (en) * | 2010-07-30 | 2011-02-09 | 四川微迪数字技术有限公司 | Anti-stammering device for converting Chinese speech into mouth-shaped images |
CN102722721A (en) * | 2012-05-25 | 2012-10-10 | 山东大学 | Human falling detection method based on machine vision |
CN103729654A (en) * | 2014-01-22 | 2014-04-16 | 青岛新比特电子科技有限公司 | Image matching retrieval system on account of improving Scale Invariant Feature Transform (SIFT) algorithm |
CN105022835A (en) * | 2015-08-14 | 2015-11-04 | 武汉大学 | Public safety recognition method and system for crowd sensing big data |
CN106503660A (en) * | 2016-10-31 | 2017-03-15 | 天津大学 | Time series complexity measuring method based on image microstructure Frequence Analysis |
CN107742114A (en) * | 2017-11-09 | 2018-02-27 | 深圳大学 | high spectrum image feature detection method and device |
Non-Patent Citations (2)
Title |
---|
柳杨: "《数字图像物体识别理论详解与实战》", 31 March 2018, 北京邮电大学出版社 * |
王延江 等: "《数字图像处理》", 30 November 2016, 石油大学出版社 * |
Cited By (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112023391A (en) * | 2020-09-02 | 2020-12-04 | 杭州瞳阳科技有限公司 | Control system and method for game VR |
CN112023391B (en) * | 2020-09-02 | 2024-01-16 | 杭州瞳阳科技有限公司 | Control system and method for game VR |
CN112700520A (en) * | 2020-12-30 | 2021-04-23 | 上海幻维数码创意科技股份有限公司 | Mouth shape expression animation generation method and device based on formants and storage medium |
CN112700520B (en) * | 2020-12-30 | 2024-03-26 | 上海幻维数码创意科技股份有限公司 | Formant-based mouth shape expression animation generation method, device and storage medium |
CN112750187A (en) * | 2021-01-19 | 2021-05-04 | 腾讯科技(深圳)有限公司 | Animation generation method, device and equipment and computer readable storage medium |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
 | PB01 | Publication | |
 | SE01 | Entry into force of request for substantive examination | |
 | RJ01 | Rejection of invention patent application after publication | Application publication date: 20181207 |