CN108962251A - Automatic Chinese speech recognition method for game characters - Google Patents

Automatic Chinese speech recognition method for game characters

Info

Publication number
CN108962251A
Authority
CN
China
Prior art keywords
data
frequency spectrum
game role
identifying method
automatic identifying
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201810671470.9A
Other languages
Chinese (zh)
Inventor
杨键
陈镇秋
陈汉辉
李茂�
吴海权
卢歆翮
江卓浩
陈晨
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Western Hills Residence Guangzhou Shi You Network Technology Co Ltd
Zhuhai Kingsoft Online Game Technology Co Ltd
Original Assignee
Western Hills Residence Guangzhou Shi You Network Technology Co Ltd
Zhuhai Kingsoft Online Game Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Western Hills Residence Guangzhou Shi You Network Technology Co Ltd and Zhuhai Kingsoft Online Game Technology Co Ltd
Priority to CN201810671470.9A
Publication of CN108962251A
Legal status: Pending (current)

Classifications

    • G - PHYSICS
    • G10 - MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L - SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 - Speech recognition
    • G10L15/24 - Speech recognition using non-acoustical features
    • G10L15/25 - Speech recognition using non-acoustical features using position of the lips, movement of the lips or face analysis
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T13/00 - Animation
    • G06T13/20 - 3D [Three Dimensional] animation
    • G06T13/205 - 3D [Three Dimensional] animation driven by audio data

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Acoustics & Sound (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Processing Or Creating Images (AREA)

Abstract

The technical solution of the present invention comprises a method for automatically recognizing the Chinese speech of game characters. Spectrum data is extracted from the dubbing, smoothing filtering is applied to the spectrum data, formant data is computed from the processed data, the vowel being pronounced is identified from the characteristics of the vowels on the formants and matched to the corresponding vowel mouth movement, and the result is used in the game, where the voice mouth-shape animation is kept or fine-tuned according to its actual performance. The benefits of the invention are that it simplifies the repetitive creation and revision involved in producing game mouth-shape animation, enables efficient production of scene dialogue animation, and provides real-time feedback and adjustment of the mouth-shape animation, achieving good voice interaction and visual quality.

Description

Automatic Chinese speech recognition method for game characters
Technical field
The present invention relates to a method for automatically recognizing the Chinese speech of game characters, and belongs to the field of computer games.
Background art
As the internet has developed, people's forms of entertainment and leisure have become more and more varied, and the game industry has spread widely across the network. Most games today include plot dialogue animation to increase the player's sense of immersion and identification. At present there are two main ways to produce plot dialogue animation: the first is to produce the animation directly from the voice using three-dimensional animation; the second is to change the size of the mouth shape based on volume.
Figures 1 and 2 show process flow diagrams of the prior art.
The basic procedure shown in Fig. 1 is: the voice is first processed manually, and a speech animation for the whole sentence is made and adjusted to the voice in three-dimensional animation software; the result is then checked against the performance requirements of the game, and if it does not meet them the speech animation for the sentence is adjusted again until the desired effect is reached; finally each speech animation is exported.
The basic procedure shown in Fig. 2 is: the volume of the game dubbing is obtained, and simple coded parameters are used to control the opening and closing of the character's mouth shape according to the loudness of the game speech.
The above methods each have shortcomings. The method shown in Fig. 1 has the following deficiencies:
1. A game contains a large number of voiced dialogue lines, and each line requires at least one animation made to its specific requirements. If every line of voice has to be turned into a speech animation by an artist by hand, it consumes a large amount of 3D animation time; this heavy workload often leads game studios to abandon this level of detail and polish, which lowers game quality.
2. Because nothing is automatically generated, every line has to be listened to repeatedly during production; the volume of speech in game development is often enormous, so this production process becomes redundant and time-consuming.
The method shown in Fig. 2 has the following deficiency: it uses the common scheme of controlling mouth opening and closing from volume, i.e. roughly judging the character's mouth shape at each moment from the volume level and similar cues. This approach is unsuitable for games that demand the highest quality, because opening and closing the mouth based on volume alone makes the character lack realism.
Summary of the invention
To solve the above problems, the purpose of the present invention is to provide an automatic Chinese speech recognition method for game characters, so as to process the mouth-shape performance of game character voices automatically and meet the requirements of high-quality games.
The technical solution adopted by the present invention to solve the problem is as follows.
An automatic Chinese speech recognition method for game characters, the method comprising the following steps:
a step of extracting spectrum data: identifying an audio file, reading the audio file and extracting spectrum data; a step of processing the spectrum data: applying smoothing filtering to the spectrum data; a step of obtaining formant data: obtaining formant data from the smoothed spectrum data; and a step of generating mouth-shape animation data: generating mouth-shape animation data from the obtained formant data.
Further, the step of extracting spectrum data includes: obtaining the dubbing audio file and identifying whether it is a Chinese dubbing audio file; if so, extracting spectrum data from the audio file; if not, performing no processing.
Further, the step of processing the spectrum data includes: performing a convolution operation between the values of the input spectrum array and a Gaussian kernel, taking the convolution result as the output value, and then executing the step of obtaining formant data.
Further, the step of convolving the values of the input spectrum array with the Gaussian kernel includes Gaussian template generation and convolution processing.
Further, the Gaussian template generation includes: creating a Gaussian template from the Gaussian function G(x) = (1/(√(2π)·σ))·exp(-x²/(2σ²)); defining the size of the Gaussian template and σ; finding the centre of the template according to its size; and performing traversal processing, computing the value of each coefficient in the template from the Gaussian distribution function.
Further, the convolution processing step includes: using the obtained Gaussian template as weights and multiplying it with the spectrum data.
Further, the step of obtaining formant data includes: computing the peaks F1, F2 and F3 of the spectrum data to obtain the formant data.
Further, the step of generating mouth-shape animation data includes: identifying the vowel mouth shape pronounced in the current frame from the formant features and matching the corresponding vowel animation and weight; generating mouth-shape animation data for every frame of the whole passage of speech; and testing the saved mouth-shape animation data in the game.
Further, the method also includes: creating an editor for fine-tuning the weight thresholds and the vowel animations, and automatically generating a corresponding version of the mouth-shape animation according to the fine-tuned weight thresholds and vowel animations.
The beneficial effects of the present invention are: the automatic Chinese speech mouth-shape recognition algorithm used by the present invention automatically generates, from the voice, dialogue mouth-shape animation in the game that meets the requirements, replacing the large amount of time animators would otherwise spend making an animation for every line and saving animation resources; by automatically recognizing the voice vowels in real time, the mouth-shape animation is driven with real-time feedback, achieving good voice interaction and visual quality.
Description of the drawings
Fig. 1 shows the prior-art process flow using three-dimensional animation software;
Fig. 2 shows the prior-art process flow based on simple volume changes;
Fig. 3 shows the algorithm flow chart of an embodiment of the present invention;
Fig. 4 shows the overall flow chart of in-game use according to an embodiment of the present invention.
Detailed description of embodiments
The concept, specific structure and resulting technical effects of the present invention are described clearly and completely below with reference to the embodiments and the drawings, so that the purpose, scheme and effects of the present invention can be fully understood.
It should be noted that, unless otherwise specified, the singular forms "a", "the" and "said" used in this disclosure are intended to include the plural forms as well, unless the context clearly indicates otherwise. In addition, unless otherwise defined, all technical and scientific terms used herein have the meanings commonly understood by those skilled in the art. The terms used in this specification are only for describing specific embodiments and are not intended to limit the present invention. The term "and/or" as used herein includes any combination of one or more of the associated listed items.
It should be understood that any and all examples or exemplary language ("such as", "for example") provided herein are intended only to better illustrate the embodiments of the present invention and, unless otherwise claimed, do not limit the scope of the present invention.
Fig. 3 shows the algorithm flow chart of an embodiment of the present invention. Referring to the flow chart, spectrum data is extracted from the original Chinese dubbing audio to obtain the sound spectrum, and the spectrum data is then smoothed. Smoothing filtering here means convolving the input array with a Gaussian kernel and taking the convolution result as the output value. The smoothed spectrum data is then processed to compute its peaks F1, F2 and F3, giving the formant data. Finally, based on the characteristics of Chinese vowels on the formants, the vowel mouth shape pronounced in the current frame is identified, the corresponding vowel animation and weight are matched, and the mouth-shape animation data is generated.
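As a minimal sketch of the spectrum-extraction step, the Python fragment below reads a dub and computes one magnitude spectrum per frame; the 16-bit mono WAV assumption, the frame length, the hop size and the helper name extract_spectrum are illustrative choices, not values given in the patent.

import wave
import numpy as np

def extract_spectrum(path, frame_len=1024, hop=512):
    """Read a 16-bit mono WAV dub and return one magnitude spectrum per frame."""
    with wave.open(path, "rb") as wf:
        rate = wf.getframerate()
        raw = wf.readframes(wf.getnframes())
    samples = np.frombuffer(raw, dtype=np.int16).astype(np.float64) / 32768.0
    window = np.hanning(frame_len)
    frames = []
    for start in range(0, len(samples) - frame_len, hop):
        frame = samples[start:start + frame_len] * window
        frames.append(np.abs(np.fft.rfft(frame)))  # magnitude spectrum of this frame
    return np.array(frames), rate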
The specific in-game application is as follows. Spectrum data is extracted from the Chinese dubbing of the scene dialogue. Because every audio processing system operates in a complex environment during acquisition, capture, transmission and conversion, all audio is disturbed to some degree by perceptible or imperceptible noise. The countermeasure is to apply the necessary filtering and noise reduction to the audio, i.e. smoothing filtering: the values of the input array are convolved with a Gaussian kernel and the convolution result is taken as the output, which yields the smoothed spectrum data.
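A sketch of this smoothing step, assuming numpy and a Gaussian kernel such as the one built in the next sketch; smooth_spectrum is a hypothetical helper name.

import numpy as np

def smooth_spectrum(spectrum, kernel):
    """Smoothing filter as described: convolve the input array (one frame's
    spectrum) with a Gaussian kernel and take the convolution result as the output."""
    return np.convolve(spectrum, kernel, mode="same")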
To convolve the raw spectrum data with a Gaussian kernel, a Gaussian template must first be built. The template is based on the Gaussian function G(x) = (1/(√(2π)·σ))·exp(-x²/(2σ²)), with the template size ksize and sigma defined as parameters in the code. The template is generated as follows: according to the defined template size, find the centre of the template at ksize/2; then traverse the template starting from the centre and compute the value of each coefficient from the Gaussian distribution function. With this, the Gaussian template is complete.
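The following sketch builds such a template in Python following the procedure above; the default ksize and sigma values and the normalisation at the end are assumptions added for illustration.

import numpy as np

def gaussian_kernel(ksize=9, sigma=2.0):
    """Build a 1-D Gaussian template: define ksize and sigma, locate the centre
    at ksize // 2, then traverse the template and evaluate the Gaussian function
    at each position."""
    center = ksize // 2
    coeffs = np.array([np.exp(-((i - center) ** 2) / (2.0 * sigma ** 2))
                       for i in range(ksize)]) / (np.sqrt(2.0 * np.pi) * sigma)
    return coeffs / coeffs.sum()  # normalisation is an added step, not stated in the text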
The built Gaussian template is then used as weights and multiplied with the original audio spectrum data to obtain the smoothed spectrum data, from which the peaks F1, F2 and F3 are computed to give the formant data. A formant is a region of the sound spectrum where energy is relatively concentrated; it is not only a determinant of voice quality but also reflects the physical characteristics of the vocal tract (the resonating cavity). As sound passes through the resonating cavity, the cavity's filtering redistributes the energy of different frequencies across the frequency domain: part of the energy is reinforced by the cavity's resonance while another part is attenuated. Because the energy distribution is uneven, the reinforced parts stand out like mountain peaks, hence the name formants. In speech acoustics the pitch of a vowel can vary, but different vowels are distinguished from one another by two characteristic pitches related to their overtones, which broadly correspond to the front/back difference between vowels. The higher these characteristic pitches, the lower the tongue position; the lower they are, the higher the tongue position, consistent with what phonetics calls vowel height. These characteristic overtones are the formants of the vowel, and the formants determine the vowel's quality. Correspondingly, F1 on the spectrogram corresponds to tongue height, F2 corresponds to how front or back the tongue is, and lip rounding is related to F2 and F3.
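A very rough way to read F1, F2 and F3 off the smoothed spectrum is to take its lowest-frequency local maxima, as in the sketch below; real formant trackers typically use LPC analysis, but the patent only states that the peaks of the spectrum data are computed, so this stand-in is an assumption.

import numpy as np

def formants_from_spectrum(smoothed, sample_rate, n_formants=3):
    """Take the lowest-frequency local maxima of a smoothed magnitude spectrum
    as rough F1, F2, F3 estimates (in Hz)."""
    bin_hz = sample_rate / (2.0 * (len(smoothed) - 1))  # bin spacing of an rfft spectrum
    peaks = []
    for i in range(1, len(smoothed) - 1):
        if smoothed[i] > smoothed[i - 1] and smoothed[i] > smoothed[i + 1]:
            peaks.append(i * bin_hz)
            if len(peaks) == n_formants:
                break
    return peaks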
Based on the characteristics of Chinese vowels on the formants, the wider the mouth opening the higher F1, the more forward the tongue position the higher F2, and F3 of an unrounded vowel is higher than that of a rounded vowel. The vowel mouth shape pronounced in the current frame is identified and matched to the corresponding vowel animation and weight. Mouth-shape animation data is generated for every frame of the whole passage of speech and saved, after which the result is tested in the game and the data fine-tuned.
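A crude classifier along these lines might look like the following; the frequency thresholds and the returned animation weights are illustrative placeholders, not values disclosed in the patent.

def classify_vowel(f1, f2, f3):
    """Map F1/F2/F3 to one of the Mandarin vowel mouth shapes a, o, e, i, u,
    following the stated tendencies: wider opening raises F1, a fronter tongue
    raises F2, and lip rounding lowers F3."""
    if f1 > 700:                                   # wide jaw opening
        return "a", min(f1 / 1000.0, 1.0)
    if f2 > 2000:                                  # tongue far forward
        return "i", min(f2 / 2500.0, 1.0)
    if f3 < 2500:                                  # rounded lips depress F3
        return ("u", 0.8) if f2 < 900 else ("o", 0.8)
    return "e", 0.6                                # fallback mid vowel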
In actual development, the above algorithm can be implemented in code according to the requirements, together with a simple editor whose details can be adjusted by fine-tuning. Speech animations can then be generated automatically to the specific requirements, the editor used to fine-tune details until the effect the game needs is reached, and mouth-shape animations of different versions generated automatically by adjusting the weight thresholds and the vowel animations.
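For illustration only, such an editor might expose settings like the following; the keys, values and animation file names are hypothetical.

tuning = {
    "f1_open_threshold": 700.0,    # Hz above which the mouth is treated as wide open
    "f2_front_threshold": 2000.0,  # Hz above which the tongue is treated as fronted
    "min_weight": 0.2,             # animation weights below this are clamped to rest
    "vowel_clips": {               # vowel -> animation asset (file names invented)
        "a": "mouth_a.anim", "o": "mouth_o.anim", "e": "mouth_e.anim",
        "i": "mouth_i.anim", "u": "mouth_u.anim",
    },
}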
Fig. 4 shows the overall flow chart of in-game use according to an embodiment of the present invention. The generation of a specific dialogue animation, as shown in the flow of Fig. 4, proceeds as follows: the mouth-shape actions for the vowel pronunciations are built with the relevant software; the voice requirements of the game are determined for each game and scene; an editor for fine-tuning the weight thresholds and vowel animations is created; and a corresponding version of the mouth-shape animation is generated automatically from the fine-tuned weight thresholds and vowel animations. The relevant data is extracted from the dubbing with the algorithm above, the speech animation file is generated automatically from the audio's spectrum data by dedicated code, the generated animation files are imported into the game for testing, and the voice mouth-shape animation is kept or fine-tuned according to how it performs.
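Putting the sketches above together, an end-to-end pass over one dub might look like this; every helper name here is one of the hypothetical functions defined earlier, not an API disclosed by the patent.

frames, rate = extract_spectrum("scene_dialogue_cn.wav")
kernel = gaussian_kernel(ksize=9, sigma=2.0)

mouth_track = []
for frame in frames:
    smoothed = smooth_spectrum(frame, kernel)
    formants = formants_from_spectrum(smoothed, rate)
    if len(formants) == 3:
        mouth_track.append(classify_vowel(*formants))  # (vowel, weight) per frame
    else:
        mouth_track.append(("rest", 0.0))              # silence or no clear peaks

# mouth_track now holds one (vowel, weight) pair per audio frame; the game engine
# keys its mouth-shape blend weights off this track, and an animator keeps or
# fine-tunes the result in the editor.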
It should be understood that embodiments of the present invention may be implemented or carried out by computer hardware, by a combination of hardware and software, or by computer instructions stored in non-transitory computer-readable memory. The methods may be implemented in computer programs using standard programming techniques, including non-transitory computer-readable storage media configured with the computer programs, where the storage media so configured cause a computer to operate in a specific and predefined manner, in accordance with the methods and drawings described in the particular embodiments. Each program may be implemented in a high-level procedural or object-oriented programming language to communicate with a computer system, or, if desired, in assembly or machine language. In any case, the language may be compiled or interpreted. Furthermore, the program may run on an application-specific integrated circuit programmed for that purpose.
In addition, the operations of the processes described herein may be performed in any suitable order unless otherwise indicated herein or clearly contradicted by context. The processes described herein (or variations and/or combinations thereof) may be performed under the control of one or more computer systems configured with executable instructions, and may be implemented as code (for example, executable instructions, one or more computer programs, or one or more applications) executed jointly on one or more processors, by hardware, or by a combination thereof. The computer program comprises a plurality of instructions executable by one or more processors.
Further, the methods may be implemented on any type of suitable computing platform to which they are operably connected, including but not limited to a personal computer, minicomputer, mainframe, workstation, networked or distributed computing environment, or a separate or integrated computer platform, or a platform communicating with a charged-particle tool or other imaging device. Aspects of the present invention may be implemented as machine-readable code stored on a non-transitory storage medium or device, whether removable or integrated into the computing platform, such as a hard disk, an optically readable and/or writable storage medium, RAM or ROM, so that it is readable by a programmable computer; when the storage medium or device is read by the computer, it can be used to configure and operate the computer to perform the processes described herein. In addition, the machine-readable code, or portions thereof, may be transmitted over a wired or wireless network. When such media contain instructions or programs that, in conjunction with a microprocessor or other data processor, implement the steps described above, the invention described herein includes these and other different types of non-transitory computer-readable storage media. When programmed according to the methods and techniques of the present invention, the invention also includes the computer itself.
A computer program can be applied to input data to perform the functions described herein, thereby converting the input data to generate output data that is stored to non-volatile memory. The output information can also be applied to one or more output devices such as a display. In a preferred embodiment of the present invention, the converted data represents physical and tangible objects, including specific visual depictions of physical and tangible objects produced on the display.
The above are only preferred embodiments of the present invention, and the invention is not limited to the above embodiments. As long as the technical effects of the present invention are achieved by the same means, any modification, equivalent replacement or improvement made within the spirit and principles of the present invention shall fall within the scope of protection of the present invention. The technical solutions and/or embodiments within the scope of the invention may have various modifications and variations.

Claims (9)

1. An automatic Chinese speech recognition method for game characters, characterized in that the method comprises the following steps:
a step of extracting spectrum data: identifying an audio file, reading the audio file and extracting spectrum data;
a step of processing the spectrum data: applying smoothing filtering to the spectrum data;
a step of obtaining formant data: obtaining formant data from the smoothed spectrum data;
a step of generating mouth-shape animation data: generating mouth-shape animation data from the obtained formant data.
2. The automatic Chinese speech recognition method for game characters according to claim 1, characterized in that the step of extracting spectrum data comprises:
obtaining the dubbing audio file and identifying whether it is a Chinese dubbing audio file; if so, extracting spectrum data from the audio file; if not, performing no processing.
3. The automatic Chinese speech recognition method for game characters according to claim 1, characterized in that the step of processing the spectrum data comprises:
performing a convolution operation between the values of the input spectrum array and a Gaussian kernel, taking the convolution result as the output value, and executing the step of obtaining formant data.
4. The automatic Chinese speech recognition method for game characters according to claim 3, characterized in that the step of convolving the values of the input spectrum array with the Gaussian kernel comprises Gaussian template generation and convolution processing.
5. The automatic Chinese speech recognition method for game characters according to claim 4, characterized in that the Gaussian template generation comprises:
creating a Gaussian template from the Gaussian function G(x) = (1/(√(2π)·σ))·exp(-x²/(2σ²));
defining the size of the Gaussian template and σ;
finding the centre of the template according to its size;
performing traversal processing and computing the value of each coefficient in the template from the Gaussian distribution function.
6. The automatic Chinese speech recognition method for game characters according to claim 4, characterized in that the convolution processing step comprises: using the obtained Gaussian template as weights and multiplying it with the spectrum data.
7. The automatic Chinese speech recognition method for game characters according to claim 1, characterized in that the step of obtaining formant data comprises: computing the peaks F1, F2 and F3 of the spectrum data to obtain the formant data.
8. The automatic Chinese speech recognition method for game characters according to claim 1, characterized in that the step of generating mouth-shape animation data comprises:
identifying the vowel mouth shape pronounced in the current frame from the formant features, and matching the corresponding vowel animation and weight;
generating mouth-shape animation data for every frame of the whole passage of speech;
testing the saved mouth-shape animation data in the game.
9. The automatic Chinese speech recognition method for game characters according to claim 1, characterized in that the method further comprises:
creating an editor for fine-tuning the weight thresholds and the vowel animations, and automatically generating a corresponding version of the mouth-shape animation according to the fine-tuned weight thresholds and vowel animations.
CN201810671470.9A 2018-06-26 2018-06-26 Automatic Chinese speech recognition method for game characters Pending CN108962251A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810671470.9A CN108962251A (en) 2018-06-26 2018-06-26 Automatic Chinese speech recognition method for game characters

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201810671470.9A CN108962251A (en) 2018-06-26 2018-06-26 Automatic Chinese speech recognition method for game characters

Publications (1)

Publication Number Publication Date
CN108962251A true CN108962251A (en) 2018-12-07

Family

ID=64486970

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810671470.9A Pending CN108962251A (en) Automatic Chinese speech recognition method for game characters

Country Status (1)

Country Link
CN (1) CN108962251A (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112023391A (en) * 2020-09-02 2020-12-04 杭州瞳阳科技有限公司 Control system and method for game VR
CN112700520A (en) * 2020-12-30 2021-04-23 上海幻维数码创意科技股份有限公司 Mouth shape expression animation generation method and device based on formants and storage medium
CN112750187A (en) * 2021-01-19 2021-05-04 腾讯科技(深圳)有限公司 Animation generation method, device and equipment and computer readable storage medium

Citations (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101656069A (en) * 2009-09-17 2010-02-24 陈拙夫 Chinese voice information communication system and communication method thereof
CN101702198A (en) * 2009-11-19 2010-05-05 浙江大学 Identification method for video and living body faces based on background comparison
CN101894566A (en) * 2010-07-23 2010-11-24 北京理工大学 Visualization method of Chinese mandarin complex vowels based on formant frequency
CN101930747A (en) * 2010-07-30 2010-12-29 四川微迪数字技术有限公司 Method and device for converting voice into mouth shape image
CN201741384U (en) * 2010-07-30 2011-02-09 四川微迪数字技术有限公司 Anti-stammering device for converting Chinese speech into mouth-shaped images
CN102176313A (en) * 2009-10-10 2011-09-07 北京理工大学 Formant-frequency-based Mandarin single final voice visualizing method
CN102722721A (en) * 2012-05-25 2012-10-10 山东大学 Human falling detection method based on machine vision
CN103729654A (en) * 2014-01-22 2014-04-16 青岛新比特电子科技有限公司 Image matching retrieval system on account of improving Scale Invariant Feature Transform (SIFT) algorithm
CN105022835A (en) * 2015-08-14 2015-11-04 武汉大学 Public safety recognition method and system for crowd sensing big data
CN106503660A (en) * 2016-10-31 2017-03-15 天津大学 Time series complexity measuring method based on image microstructure Frequence Analysis
CN107742114A (en) * 2017-11-09 2018-02-27 深圳大学 high spectrum image feature detection method and device

Patent Citations (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101656069A (en) * 2009-09-17 2010-02-24 陈拙夫 Chinese voice information communication system and communication method thereof
CN102176313A (en) * 2009-10-10 2011-09-07 北京理工大学 Formant-frequency-based Mandarin single final voice visualizing method
CN101702198A (en) * 2009-11-19 2010-05-05 浙江大学 Identification method for video and living body faces based on background comparison
CN101894566A (en) * 2010-07-23 2010-11-24 北京理工大学 Visualization method of Chinese mandarin complex vowels based on formant frequency
CN101930747A (en) * 2010-07-30 2010-12-29 四川微迪数字技术有限公司 Method and device for converting voice into mouth shape image
CN201741384U (en) * 2010-07-30 2011-02-09 四川微迪数字技术有限公司 Anti-stammering device for converting Chinese speech into mouth-shaped images
CN102722721A (en) * 2012-05-25 2012-10-10 山东大学 Human falling detection method based on machine vision
CN103729654A (en) * 2014-01-22 2014-04-16 青岛新比特电子科技有限公司 Image matching retrieval system on account of improving Scale Invariant Feature Transform (SIFT) algorithm
CN105022835A (en) * 2015-08-14 2015-11-04 武汉大学 Public safety recognition method and system for crowd sensing big data
CN106503660A (en) * 2016-10-31 2017-03-15 天津大学 Time series complexity measuring method based on image microstructure Frequence Analysis
CN107742114A (en) * 2017-11-09 2018-02-27 深圳大学 high spectrum image feature detection method and device

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
柳杨 (Liu Yang), 《数字图像物体识别理论详解与实战》 [Digital Image Object Recognition: Theory and Practice], Beijing University of Posts and Telecommunications Press, 31 March 2018 *
王延江 等 (Wang Yanjiang et al.), 《数字图像处理》 [Digital Image Processing], Petroleum University Press, 30 November 2016 *

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112023391A (en) * 2020-09-02 2020-12-04 杭州瞳阳科技有限公司 Control system and method for game VR
CN112023391B (en) * 2020-09-02 2024-01-16 杭州瞳阳科技有限公司 Control system and method for game VR
CN112700520A (en) * 2020-12-30 2021-04-23 上海幻维数码创意科技股份有限公司 Mouth shape expression animation generation method and device based on formants and storage medium
CN112700520B (en) * 2020-12-30 2024-03-26 上海幻维数码创意科技股份有限公司 Formant-based mouth shape expression animation generation method, device and storage medium
CN112750187A (en) * 2021-01-19 2021-05-04 腾讯科技(深圳)有限公司 Animation generation method, device and equipment and computer readable storage medium

Similar Documents

Publication Publication Date Title
US10997764B2 (en) Method and apparatus for generating animation
Toda et al. The Voice Conversion Challenge 2016.
US11295721B2 (en) Generating expressive speech audio from text data
CN110136698B (en) Method, apparatus, device and storage medium for determining mouth shape
CN105244026B (en) A kind of method of speech processing and device
CN108231062B (en) Voice translation method and device
Lugosch et al. Using speech synthesis to train end-to-end spoken language understanding models
CN111433847B (en) Voice conversion method, training method, intelligent device and storage medium
CN108962251A (en) A kind of game role Chinese speech automatic identifying method
CN109524020A (en) A kind of speech enhan-cement processing method
Chen et al. Tone Classification in Mandarin Chinese Using Convolutional Neural Networks.
CN109582952A (en) Poem generation method, device, computer equipment and medium
CN105931631A (en) Voice synthesis system and method
Kapralova et al. A big data approach to acoustic model training corpus selection
Llorach et al. Web-based live speech-driven lip-sync
JP7124373B2 (en) LEARNING DEVICE, SOUND GENERATOR, METHOD AND PROGRAM
US20230039540A1 (en) Automated pipeline selection for synthesis of audio assets
Li et al. Expressive Speech Driven Talking Avatar Synthesis with DBLSTM Using Limited Amount of Emotional Bimodal Data.
CN112652041A (en) Virtual image generation method and device, storage medium and electronic equipment
Kang et al. Grad-stylespeech: Any-speaker adaptive text-to-speech synthesis with diffusion models
Luong et al. LaughNet: synthesizing laughter utterances from waveform silhouettes and a single laughter example
JP2020184100A (en) Information processing program, information processing apparatus, information processing method and learned model generation method
CN116095357B (en) Live broadcasting method, device and system of virtual anchor
CN110556092A (en) Speech synthesis method and device, storage medium and electronic device
Kumar et al. Towards building text-to-speech systems for the next billion users

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20181207