CN106653054B - Method and device for generating voice animation - Google Patents

Method and device for generating voice animation

Info

Publication number
CN106653054B
CN106653054B
Authority
CN
China
Prior art keywords
value
determining
position point
peak
volume value
Prior art date
Legal status
Active
Application number
CN201610889079.7A
Other languages
Chinese (zh)
Other versions
CN106653054A (en)
Inventor
王夏鸣
赵志翔
Current Assignee
iFlytek Co Ltd
Original Assignee
iFlytek Co Ltd
Priority date
Filing date
Publication date
Application filed by iFlytek Co Ltd filed Critical iFlytek Co Ltd
Priority to CN201610889079.7A priority Critical patent/CN106653054B/en
Publication of CN106653054A publication Critical patent/CN106653054A/en
Application granted granted Critical
Publication of CN106653054B publication Critical patent/CN106653054B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G — PHYSICS
    • G10 — MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L — SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L 21/00 — Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L 21/06 — Transformation of speech into a non-audible representation, e.g. speech visualisation or speech processing for tactile aids
    • G10L 21/10 — Transforming into visible information
    • G10L 21/12 — Transforming into visible information by displaying time domain information

Landscapes

  • Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Computational Linguistics (AREA)
  • Quality & Reliability (AREA)
  • Signal Processing (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Telephone Function (AREA)
  • User Interface Of Digital Computer (AREA)

Abstract

The application provides a method and a device for generating voice animation. The method comprises: acquiring a volume value of a current voice signal; determining, according to the volume value, a peak position point among the position points forming a voice animation waveform to be generated, and determining a peak amplitude value of the peak position point; and generating the voice animation waveform according to the peak position point and the peak amplitude value. The method can generate a voice animation that more accurately imitates a sound spectrum, bringing a more realistic voice feedback experience to users.

Description

Method and device for generating voice animation
Technical Field
The present application relates to the field of speech signal processing, and in particular, to a method and an apparatus for generating speech animation.
Background
On mainstream smart phone systems, such as iOS and Android, third-party software has permission to access a mobile phone microphone, and acquires sound recorded by the microphone by using a system-level audio recording interface. When the application software with the voice recording function interacts with the user, the recording state can be fed back to the user in real time in a voice animation mode, so that the user is informed that the current state is recording.
Although the voice animation in the related art may exhibit a ripple effect similar to a sound spectrum, it is not derived from an analysis that accurately reflects the real characteristics of the sound signal, and therefore has shortcomings in simulation fidelity and the like.
Disclosure of Invention
The present application is directed to solving, at least to some extent, one of the technical problems in the related art.
Therefore, an object of the present application is to provide a method for generating a speech animation, which can generate a more accurate speech animation imitating a voice frequency spectrum, and bring a more realistic speech feedback experience to a user.
Another object of the present application is to provide an apparatus for generating speech animation.
In order to achieve the above object, an embodiment of the first aspect of the present application provides a method for generating a speech animation, including: acquiring a volume value of a current voice signal; determining a peak position point in position points forming a voice animation waveform to be generated according to the volume value, and determining a peak amplitude value of the peak position point; and generating a voice animation waveform according to the peak position point and the peak amplitude value.
According to the method for generating voice animation provided by the embodiment of the first aspect of the application, the peak position point and the peak amplitude value are determined according to the volume value, so that the spectral characteristics of the sound are reflected more faithfully, a more accurate spectrum-like voice animation is generated, and a more vivid voice feedback experience is brought to the user.
In order to achieve the above object, an apparatus for generating a voice animation according to an embodiment of a second aspect of the present application includes: the volume acquisition module is used for acquiring the volume value of the current voice signal; the peak determining module is used for determining a peak position point in position points forming a voice animation waveform to be generated according to the volume value and determining a peak amplitude value of the peak position point; and the waveform generating module is used for generating a voice animation waveform according to the peak position point and the peak amplitude value.
According to the device for generating voice animation provided by the embodiment of the second aspect of the application, the volume value is acquired, and the peak position point and the peak amplitude value are determined according to the volume value, so that the spectral characteristics of the sound are reflected more faithfully, a more accurate spectrum-like voice animation is generated, and a more vivid voice feedback experience is brought to the user.
Additional aspects and advantages of the present application will be set forth in part in the description which follows and, in part, will be obvious from the description, or may be learned by practice of the present application.
Drawings
The foregoing and/or additional aspects and advantages of the present application will become apparent and readily appreciated from the following description of the embodiments, taken in conjunction with the accompanying drawings of which:
FIG. 1 is a flow diagram of a method for generating speech animation according to an embodiment of the present application;
FIG. 2 is a schematic diagram of location points composing a voice animation waveform to be generated in an embodiment of the present application;
FIG. 3 is a flow chart illustrating a method for generating speech animation according to another embodiment of the present application;
FIG. 4 is a schematic diagram of candidate peak location points in an embodiment of the present application;
FIG. 5 is a schematic illustration of the peak location points ultimately employed in the embodiments of the present application;
FIG. 6 is a schematic diagram of an initial value of a peak amplitude value in an embodiment of the present application;
FIG. 7 is a diagram illustrating final values of peak amplitude values in an embodiment of the present application;
FIG. 8 is a schematic diagram of a speech animation waveform generated in an embodiment of the present application;
FIG. 9 is a schematic structural diagram of an apparatus for generating speech animation according to an embodiment of the present application;
FIG. 10 is a schematic structural diagram of an apparatus for generating a speech animation according to another embodiment of the present application.
Detailed Description
Reference will now be made in detail to embodiments of the present application, examples of which are illustrated in the accompanying drawings, wherein the same or similar reference numerals refer to the same or similar elements or elements having the same or similar functions throughout. The embodiments described below with reference to the drawings are exemplary, are intended only to explain the present application, and are not to be construed as limiting the present application. On the contrary, the embodiments of the application include all changes, modifications and equivalents coming within the spirit and scope of the appended claims.
Fig. 1 is a flowchart illustrating a method for generating a speech animation according to an embodiment of the present application.
As shown in fig. 1, the present embodiment includes the following steps:
s11: and acquiring the volume value of the current voice signal.
The volume value of the current voice signal may be detected by a volume detection module. It is to be understood that the volume detection module may be implemented using existing or future technologies and is not described in detail herein.
Assuming that the volume value of the current speech signal is represented by V0, the value of V0 is generally between 0 and 1. It can be understood that the physical unit of volume is usually the decibel; the volume value in this application refers to the volume value after normalization, i.e., the decibel value is normalized to a value between 0 and 1 for convenience of computation. During normalization, the normalized value is positively correlated with the original decibel value, i.e., the higher the original volume, the larger the normalized volume value. The specific normalization algorithm is not limited and may be implemented using existing or future technologies.
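As an illustration of this normalization, the sketch below maps a decibel reading onto the range 0 to 1 with a simple linear rescaling; the decibel range and the linear mapping are assumptions chosen for illustration, since the application does not fix a particular normalization algorithm.

```python
def normalize_volume(db_value, db_min=-60.0, db_max=0.0):
    """Map a decibel reading to a normalized volume value V0 in [0, 1].

    The dB range and the linear mapping are illustrative assumptions;
    any monotonically increasing normalization satisfies the text above.
    """
    clamped = max(db_min, min(db_max, db_value))
    return (clamped - db_min) / (db_max - db_min)
```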
S12: and determining a peak position point in the position points forming the voice animation waveform to be generated according to the volume value, and determining a peak amplitude value of the peak position point.
The parameters of the horizontal axis and the vertical axis of the voice animation waveform (waveform for short) to be generated can be set. In this embodiment, since a waveform imitating a sound spectrum is to be generated, the horizontal axis x is defined as frequency and the vertical axis h as amplitude, as shown in fig. 2. Further, the position points that make up the waveform can also be defined. To simplify the computation, N position points on the horizontal axis are defined as the position points constituting the waveform, as shown in fig. 2. Further, to match the axially symmetric design of the recording interface and buttons on the mobile device, the waveform is also designed to be axially symmetric, and frequencies farther from the axis are defined as the frequencies that have smaller amplitudes in the human sound spectrum. As shown in fig. 2, with the coordinate values x axisymmetric, the N position points may be distributed symmetrically and uniformly within the range bounded by the preset maximum coordinate value.
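As a minimal sketch of this layout, the N position points can be placed symmetrically and uniformly over the interval [-W, W]; the values of N and W used below are illustrative assumptions rather than values fixed by the application.

```python
def make_position_points(n_points=41, w=1.0):
    """Return n_points coordinate values x_j spread symmetrically and
    uniformly over [-w, w], with the axis of symmetry at x = 0."""
    if n_points < 2:
        return [0.0]
    step = 2.0 * w / (n_points - 1)
    return [-w + j * step for j in range(n_points)]
```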
After the position points are determined, whether the position points are candidate peak position points or not can be determined according to the volume value corresponding to each position point, and finally adopted peak position points are determined according to the candidate peak position points. For details, refer to the following embodiments.
Furthermore, the distribution range of the peak position points is positively correlated with the volume value; that is, the larger the volume value, the richer the distribution of peak position points and the more medium and high frequencies are involved. Because people produce sound through vocal cord vibration, the vibration range and amplitude of the vocal cord muscles are larger and the spectrum is wider when speaking loudly. Therefore, making the distribution range of the peak position points positively correlated with the volume value better matches the behavior of a real spectrum animation.
When determining the peak amplitude value, the peak amplitude value and the volume value have a positive correlation, i.e. the larger the volume value, the larger the peak amplitude value. Therefore, volume information can be fed back from the amplitude, and a user can adjust the volume according to the amplitude effect feedback so as to achieve a better recognition effect.
S13: and generating a voice animation waveform according to the peak position point and the peak amplitude value.
Wherein the generated voice animation is a waveform, which may be referred to as a voice animation waveform.
This step may include: determining the amplitude value of the position point according to the coordinate value of the position point, the coordinate value of the peak position point and the peak amplitude value; and generating a voice animation waveform according to the amplitude value of the position point. Please refer to the following description for details.
In this embodiment, the volume value is acquired, and the peak position point and the peak amplitude value are determined according to the volume value, so that the spectral characteristics of the sound are reflected more faithfully, a more accurate spectrum-like voice animation is generated, and a more vivid voice feedback experience is brought to the user.
Fig. 3 is a flowchart illustrating a method for generating a speech animation according to another embodiment of the present application.
As shown in fig. 3, the method of the present embodiment includes:
s31: and acquiring the volume value of the current voice signal.
The details can be found in the above example and are not described in detail herein.
S32: and determining an input volume value according to the volume value.
Assuming that the input volume value is represented by V, after the volume value V0 of the current speech signal is determined, the volume value V0 can be directly used as the input volume value V.
Further, in order to avoid excessive waveform generation due to background noise when the volume is small, the input volume value V may be determined according to the volume value V0 as follows:
if the volume value of the current voice signal is smaller than the preset volume value, setting the input volume value as a fixed value smaller than the preset value; and if the volume value of the current voice signal is greater than or equal to the preset volume value, increasing the volume value of the current voice signal by a preset increment to be used as an input volume value. It is understood that, as shown in the above embodiment, the volume value of the current speech signal is usually between 0 and 1, and therefore, in the above threshold determination, a limit between 0 and 1 may be added in addition to the preset volume value. For example, the volume value of the current speech signal being smaller than the preset volume value may specifically mean that the volume value of the current speech signal is smaller than the preset volume value and is greater than or equal to 0, the volume value of the current speech signal being greater than or equal to the preset volume value may specifically mean that the volume value of the current speech signal is greater than or equal to the preset volume value and is less than or equal to 1, and the preset volume value is a value between 0 and 1.
Specifically, taking the preset volume value as 0.1, the fixed value as 0.1, and the preset increment as 2 as an example, the calculation formula is as follows:
V = 0.1,      if 0 ≤ V0 < 0.1
V = 2 × V0,   if 0.1 ≤ V0 ≤ 1
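With the example parameters above (preset volume value 0.1, fixed value 0.1, preset increment 2), the mapping from V0 to the input volume value V can be sketched as follows. Reading the preset increment of 2 as a multiplicative factor is an interpretation, since the original formula is only given as an image.

```python
def input_volume(v0, preset=0.1, fixed=0.1, factor=2.0):
    """Map the normalized volume V0 to the input volume value V.

    Below the preset threshold the input volume is pinned to a small
    fixed value so that background noise does not produce waveforms;
    at or above the threshold V0 is boosted by the preset factor
    (an assumed reading of the "preset increment of 2").
    """
    if 0.0 <= v0 < preset:
        return fixed
    return factor * min(v0, 1.0)
```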
s33: the method comprises the steps of obtaining position points forming a voice animation waveform to be generated, and obtaining coordinate values of the position points.
As shown in fig. 2, the position points may be determined by definition, and once a position point is determined, its coordinate value is also determined. Specifically, in each embodiment of the present application, unless otherwise stated, the coordinate value refers to the horizontal-axis coordinate of the position point. Therefore, the coordinate value x_j, j = 1, …, N, of each position point can be acquired from the known information.
S34: and determining candidate peak position points in the position points according to the input volume value and the coordinate values of the position points.
Here, for each position point it is determined whether that point is a candidate peak position point, so that the candidate peak position points are identified among all the position points.
The method specifically comprises the following steps: determining a probability value of the position point as a candidate peak position point according to the input volume value and the coordinate value; and determining whether the corresponding position point is a candidate peak position point or not according to the probability value.
In particular, each position point is known, and therefore the coordinate value of each position point is known, specifically its abscissa value, denoted x_j.
After the input volume value V and the coordinate value x_j are determined, the probability value described above can be calculated. The probability value is positively correlated with the input volume value and with the absolute value of the coordinate value. One calculation is as follows:
[Formula for B(x_j) given as an image in the original; it is computed from the input volume value V, the coordinate value x_j, and the preset maximum coordinate value W, and increases with both V and |x_j|.]
where B(x_j) is the probability value that the position point with coordinate value x_j is a candidate peak position point, V is the input volume value, x_j is the coordinate value of the position point, and W is the preset maximum coordinate value.
After the probability value of the position point is determined, if the probability value is smaller than a preset probability value, determining that the corresponding position point is not a candidate peak position point; and if the probability value is greater than or equal to a preset probability value, determining that the corresponding position point is a candidate peak position point.
Assume that R(x_j) = 0 indicates that the position point with coordinate value x_j is not a candidate peak position point and R(x_j) = 1 indicates that it is. Taking the preset probability value as 0.5 as an example, the following formula can be adopted to determine whether a position point is a candidate peak position point:
R(x_j) = 0,   if B(x_j) < 0.5
R(x_j) = 1,   if B(x_j) ≥ 0.5
and determining whether the position point is a candidate peak position point or not by adopting the mode corresponding to each position point, so that the candidate peak position point can be determined from all the position points. For example, referring to fig. 4, a schematic diagram of candidate peak location points is given.
S35: and determining the finally adopted peak position point according to the candidate peak position points.
After the candidate peak position point is determined, the candidate peak position point can be directly used as the peak position point finally adopted.
Furthermore, in view of practical experience, the number of peaks needs to be limited; otherwise, once too many peaks are superimposed, the waveform loses the undulating character of a sound spectrum. Therefore, when determining the finally adopted peak position points according to the candidate peak position points, the method may further include: determining the number of peak position points, and selecting that number of candidate peak position points as the finally adopted peak position points.
When determining the number of peak position points, the number of peak position points may be determined according to the input volume value. The number of the peak position points and the input volume value form positive correlation. One calculation is as follows:
M=20*V
wherein M is the number of peak position points; v is the input volume value; 20 is an empirically selected coefficient, and may be selected to be other values.
Further, M may be varied within a predetermined range in order to simulate a situation where the number of sound frequency components may not be completely the same even at the same volume during the real sound production. Assuming that the variation of the preset range is random within the range of ± 1, the formula is:
M=20*V+Rand(0,2)-1
where Rand (0,2) refers to a random number in the range of 0-2.
After the number M is determined, M candidate peak position points may be randomly selected from all candidate peak position points as the peak position points to be finally adopted. For example, referring to FIG. 5, a schematic of the peak location points ultimately employed is given.
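A sketch of this selection step is given below: the peak count M follows the example formula M = 20 * V + Rand(0, 2) - 1, and M candidate points are then drawn at random.

```python
import random

def select_peaks(candidates, v, coeff=20.0):
    """Pick the finally adopted peak position points from the candidates.

    M = coeff * V plus a random perturbation in [-1, +1], following the
    example formula M = 20 * V + Rand(0, 2) - 1; M is clipped to the
    number of available candidate points.
    """
    m = int(round(coeff * v + random.uniform(0.0, 2.0) - 1.0))
    m = max(0, min(m, len(candidates)))
    return random.sample(candidates, m)
```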
S36: and determining the peak amplitude value of the peak position point according to the coordinate value of the peak position point and the input volume value.
After a peak position point is determined, its coordinate value is also determined, so the coordinate value of the peak position point can be acquired. Assume that the coordinate values of the peak position points are represented as x_i, i = 1, …, M.
The maximum amplitude value can be determined from the input volume value, and then the peak amplitude value can be determined from the coordinate value of the peak position point and the maximum amplitude value.
Specifically, the maximum amplitude value is positively correlated with the input volume value, and is expressed by the following formula:
H_max = V^0.333 × 0.8 × H
where H_max is the maximum amplitude value, V is the input volume value, H is a preset amplitude threshold, and 0.333 and 0.8 are empirically set values that may be chosen differently.
When determining the peak amplitude value according to the coordinate value of the peak position point and the maximum amplitude value, an initial value of the peak amplitude value may be determined first, and then a final value of the corresponding peak amplitude value may be determined according to the initial value.
The initial value of the peak amplitude value is positively correlated with the absolute value of the coordinate value of the peak position point and with the maximum amplitude value. The relationship is expressed by a formula (given as an image in the original) in which h_0(x_i) is the initial value of the peak amplitude value, H_max is the maximum amplitude value, x_i is the coordinate value of the peak position point, and W is the preset maximum coordinate value.
After adding the initial value of the corresponding peak amplitude value to the peak position point, the schematic diagram of the peak may be as shown in fig. 6.
After the initial value of the peak amplitude value is determined, it can be directly used as the corresponding final value.
Further, in order to simulate a more realistic effect, random environmental noise is introduced to influence the randomness of the sound amplitude of different frequencies, so that the product of the initial value and a random number in a preset range can be used as the final value.
Assuming the predetermined range is 0.8-1.2, it is formulated as:
h(x_i) = h_0(x_i) × Rand(0.8, 1.2)
where h(x_i) is the final value of the peak amplitude value, h_0(x_i) is the corresponding initial value, and Rand(0.8, 1.2) is a random number in the range of 0.8 to 1.2.
The peak diagram obtained by multiplying the initial value of the peak amplitude value by the random number can be shown in fig. 7.
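The amplitude computation of this step can be sketched as follows. H_max follows the formula given above and the final value multiplies the initial value by Rand(0.8, 1.2); the shape chosen for the initial value h_0(x_i) is an assumption that merely keeps it positively correlated with |x_i| and H_max, since the exact formula is only given as an image in the original.

```python
import random

def peak_amplitudes(peak_positions, v, w, h_threshold=1.0):
    """Compute the final peak amplitude value h(x_i) for each peak position point.

    H_max = V**0.333 * 0.8 * H follows the formula in the text; the shape
    of the initial value h_0(x_i) is an illustrative assumption, and the
    final value multiplies it by a random factor in [0.8, 1.2].
    """
    h_max = (v ** 0.333) * 0.8 * h_threshold
    amplitudes = {}
    for x in peak_positions:
        h0 = h_max * (0.5 + 0.5 * abs(x) / w)  # assumed initial-value shape
        amplitudes[x] = h0 * random.uniform(0.8, 1.2)
    return amplitudes
```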
S37: and generating a voice animation waveform according to the peak position point and the peak amplitude value.
The method specifically comprises the following steps: determining an amplitude value of the position point according to the coordinate value of the position point, the coordinate value of the peak position point and the peak amplitude value of the voice animation waveform to be generated; and generating a voice animation waveform according to the amplitude value of the position point.
Assuming that the coordinate value of each peak position point is denoted x_i and the coordinate value of each position point is denoted x_j, the amplitude value h(x_j) of each position point can be calculated by the following formula:
[Formula for h(x_j) given as an image in the original; it derives the amplitude of each position point from the peak coordinate values x_i and the peak amplitude values h(x_i).]
where 0.5 is the attenuation coefficient of the neighboring point, which may be set empirically, or may be other values.
After the amplitude value of each position point is determined, a corresponding amplitude value can be added to each position point, thereby generating a corresponding voice animation waveform. For example, the final generated speech animation waveform may be as shown in FIG. 8.
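The assembly of the final waveform can be sketched as follows: each peak point keeps its peak amplitude, its immediate neighbours receive the peak amplitude scaled by the attenuation coefficient 0.5, and the remaining points stay at zero. Restricting the attenuation to immediate neighbours is an assumed reading of the omitted formula. Chaining this with the earlier sketches (position points, input volume, candidate peaks, selected peaks, peak amplitudes) reproduces the overall flow of steps S31 to S37 under the stated assumptions.

```python
def build_waveform(positions, peak_amps, attenuation=0.5):
    """Assign an amplitude value h(x_j) to every position point.

    Peak points keep their peak amplitude; points adjacent to a peak get
    the peak amplitude scaled by the attenuation coefficient (0.5 in the
    text); all other points stay at zero.  Limiting the attenuation to
    immediate neighbours is an assumed reading of the omitted formula.
    """
    amps = [0.0] * len(positions)
    index = {x: j for j, x in enumerate(positions)}
    for x, h in peak_amps.items():
        j = index[x]
        amps[j] = max(amps[j], h)
        for k in (j - 1, j + 1):  # attenuate immediate neighbours only
            if 0 <= k < len(positions):
                amps[k] = max(amps[k], attenuation * h)
    return amps
```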
After the voice animation waveform is generated, the waveform may be presented to the user as a voice animation.
Further, in some embodiments, referring to fig. 3, the method may further include:
s38: and determining the updating period of the voice animation, and regenerating and displaying the voice animation according to the updating period.
The process of regenerating the voice animation can be re-executed by referring to the above steps.
The update period may be chosen as a fixed value ΔT0, which is set based on experience or the like.
Further, since the waveform refresh frequency is higher when speaking normally than when not speaking, the update period can be determined based on the volume value or the input volume value. Taking the example that the update period is related to the volume value V0, the formula can be expressed as:
ΔT = ΔT0,       if 0 ≤ V0 < 0.1
ΔT = ΔT0 / 3,   if 0.1 ≤ V0 ≤ 1
where ΔT is the finally adopted update period, V0 is the volume value of the current speech signal, and ΔT0 is a fixed value set empirically or the like. It is understood that the coefficient 3 above may be any other value larger than 1, and the threshold 0.1 may be any other value between 0 and 1.
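A sketch of the volume-dependent update period, using the threshold 0.1 and the speed-up coefficient 3 from the text; treating the coefficient as a divisor of the fixed period ΔT0 when speech is present is an assumed reading, since the original formula is only given as an image. The base period value is likewise an illustrative assumption.

```python
def update_period(v0, base_period=0.2, threshold=0.1, speedup=3.0):
    """Return the animation update period in seconds.

    When the current volume V0 is below the threshold (no speech), the
    fixed base period is used; otherwise the period is shortened by the
    speed-up coefficient so the waveform refreshes faster while speaking.
    """
    if v0 < threshold:
        return base_period
    return base_period / speedup
```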
In this embodiment, the peak amplitude value is positively correlated with the volume, so volume information can be fed back through the amplitude and the user can adjust the speaking volume according to the amplitude feedback to achieve a better recognition effect. When the volume is higher, the frequency distribution is richer, which better matches the characteristics of human vocalization and therefore reflects more realistic spectral characteristics. The update frequency is positively correlated with the volume, so the response sensitivity of the voice animation is also positively correlated with the volume: the higher the volume, the more timely the animation feedback. When the volume value of the voice signal is small (i.e., the volume is low), the input volume value is also small, and since the number of peaks is positively correlated with the input volume value, no superfluous peak animation appears when the user is not speaking. Random numbers are introduced when calculating the peak position points and the peak amplitude values, so the voice animation is essentially non-repeating (two identical states hardly ever occur), which simulates the influence of the uncertainty of environmental background noise on voice recording.
Fig. 9 is a schematic structural diagram of an apparatus for generating a voice animation according to an embodiment of the present application.
As shown in fig. 9, the apparatus 90 includes: a volume acquisition module 91, a peak determination module 92 and a waveform generation module 93.
A volume obtaining module 91, configured to obtain a volume value of a current voice signal;
a peak determining module 92, configured to determine a peak position point among position points forming a to-be-generated speech animation waveform according to the volume value, and determine a peak amplitude value of the peak position point;
and a waveform generating module 93, configured to generate a voice animation waveform according to the peak position point and the peak amplitude value.
In some embodiments, referring to fig. 10, the apparatus 90 further comprises:
an input volume determining module 94, configured to determine an input volume value according to a volume value of a current speech signal, so as to use the input volume value as a volume value for subsequent operations;
wherein the determining an input volume value according to the volume value of the current speech signal comprises:
determining the volume value of the current voice signal as the input volume value; or alternatively,
if the volume value of the current voice signal is smaller than the preset volume value, setting the input volume value as a fixed value smaller than the preset value; and if the volume value of the current voice signal is greater than or equal to the preset volume value, increasing the volume value of the current voice signal by the preset value to be used as an input volume value.
In some embodiments, referring to fig. 10, the peak determining module 92 comprises:
a position point coordinate obtaining submodule 921 for obtaining position points constituting a voice animation waveform to be generated, and obtaining coordinate values of the position points;
the candidate position point determining submodule 922 is configured to determine candidate peak position points in the position points according to the volume values and the coordinate values of the position points;
a peak position point determining submodule 923 configured to directly use the candidate peak position point as a peak position point to be finally adopted; or determining the number of peak position points; and selecting the candidate peak position points of the number as the peak position points finally adopted.
In some embodiments, the candidate location point determining sub-module 922 is specifically configured to:
determining the probability value of the position point as a candidate peak position point according to the volume value and the coordinate value of the position point;
and determining whether the corresponding position point is a candidate peak position point or not according to the probability value.
In some embodiments, the probability value is positively correlated with the input volume value and the absolute value of the coordinate value of the location point, and the candidate location point determination submodule 922 is further configured to:
if the probability value is smaller than a preset probability value, determining that the corresponding position point is not a candidate peak position point; and if the probability value is greater than or equal to a preset probability value, determining that the corresponding position point is a candidate peak position point.
In some embodiments, the peak position point determining submodule 923 is configured to determine the number of peak position points, including:
determining the preset multiple of the input volume value as the number of peak position points; or alternatively,
and determining the sum of the preset multiple of the input volume value and the random number in the preset range as the number of the peak position points.
In some embodiments, referring to fig. 10, the peak determining module 92 comprises:
an amplitude maximum determination sub-module 924 for determining an amplitude maximum from the volume value;
an amplitude initial value determining submodule 925, configured to determine an initial value of a peak amplitude value of the peak position point according to the coordinate value of the peak position point and the maximum amplitude value;
an amplitude final value determining sub-module 926, configured to determine a final value of the peak amplitude value according to the initial value.
In some embodiments, the initial value is positively correlated with the absolute value of the coordinate value of the peak position point and the maximum amplitude value, and the final amplitude value determining sub-module 926 is specifically configured to:
taking the initial value directly as the final value; or alternatively,
and taking the product of the initial value and the random number in a preset range as the final value.
In some embodiments, the waveform generation module 93 is specifically configured to:
determining the amplitude value of the position point according to the coordinate value of the position point, the coordinate value of the peak position point and the peak amplitude value;
and generating a voice animation waveform according to the amplitude value of the position point.
In some embodiments, referring to fig. 10, the apparatus 90 further comprises:
the updating module 95 is configured to determine an updating period of the voice animation, and regenerate and display the voice animation according to the updating period;
wherein the update period is a preset fixed value, or the update period is related to the volume value.
It is understood that the apparatus of the present embodiment corresponds to the method embodiment described above, and specific contents may be referred to the related description of the method embodiment, and are not described in detail herein.
In this embodiment, the volume value is acquired, and the peak position point and the peak amplitude value are determined according to the volume value, so that the spectral characteristics of the sound are reflected more faithfully, a more accurate spectrum-like voice animation is generated, and a more vivid voice feedback experience is brought to the user.
It is understood that the same or similar parts in the above embodiments may be mutually referred to, and the same or similar parts in other embodiments may be referred to for the content which is not described in detail in some embodiments.
It should be noted that, in the description of the present application, the terms "first", "second", etc. are used for descriptive purposes only and are not to be construed as indicating or implying relative importance. Further, in the description of the present application, the meaning of "a plurality" means at least two unless otherwise specified.
Any process or method descriptions in flow charts or otherwise described herein may be understood as representing modules, segments, or portions of code which include one or more executable instructions for implementing specific logical functions or steps of the process, and the scope of the preferred embodiments of the present application includes other implementations in which functions may be executed out of order from that shown or discussed, including substantially concurrently or in reverse order, depending on the functionality involved, as would be understood by those reasonably skilled in the art of the present application.
It should be understood that portions of the present application may be implemented in hardware, software, firmware, or a combination thereof. In the above embodiments, the various steps or methods may be implemented in software or firmware stored in memory and executed by a suitable instruction execution system. For example, if implemented in hardware, as in another embodiment, any one or combination of the following techniques, which are known in the art, may be used: a discrete logic circuit having a logic gate circuit for implementing a logic function on a data signal, an application specific integrated circuit having an appropriate combinational logic gate circuit, a Programmable Gate Array (PGA), a Field Programmable Gate Array (FPGA), or the like.
It will be understood by those skilled in the art that all or part of the steps carried by the method for implementing the above embodiments may be implemented by hardware related to instructions of a program, which may be stored in a computer readable storage medium, and when the program is executed, the program includes one or a combination of the steps of the method embodiments.
In addition, functional units in the embodiments of the present application may be integrated into one processing module, or each unit may exist alone physically, or two or more units are integrated into one module. The integrated module can be realized in a hardware mode, and can also be realized in a software functional module mode. The integrated module, if implemented in the form of a software functional module and sold or used as a stand-alone product, may also be stored in a computer readable storage medium.
The storage medium mentioned above may be a read-only memory, a magnetic or optical disk, etc.
In the description herein, reference to the description of the term "one embodiment," "some embodiments," "an example," "a specific example," or "some examples," etc., means that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the application. In this specification, the schematic representations of the terms used above do not necessarily refer to the same embodiment or example. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples.
Although embodiments of the present application have been shown and described above, it is understood that the above embodiments are exemplary and should not be construed as limiting the present application, and that variations, modifications, substitutions and alterations may be made to the above embodiments by those of ordinary skill in the art within the scope of the present application.

Claims (20)

1. A method of generating speech animation, comprising:
acquiring a volume value of a current voice signal;
determining a peak position point in position points forming a voice animation waveform to be generated and determining a peak amplitude value of the peak position point according to the volume value, wherein a plurality of position points forming the waveform are defined, whether the position points are candidate peak position points or not is determined according to the volume value corresponding to each position point, and the peak position point is determined according to the candidate peak position points;
and generating a voice animation waveform according to the peak position point and the peak amplitude value.
2. The method of claim 1, further comprising:
determining an input volume value according to the volume value of the current voice signal so as to take the input volume value as the volume value of the subsequent operation;
wherein the determining an input volume value according to the volume value of the current speech signal comprises:
determining the volume value of the current voice signal as the input volume value; or alternatively,
if the volume value of the current voice signal is smaller than the preset volume value, setting the input volume value as a fixed value smaller than the preset value; and if the volume value of the current voice signal is greater than or equal to the preset volume value, increasing the volume value of the current voice signal by the preset value to be used as an input volume value.
3. The method according to claim 1, wherein said determining peak position points among the position points composing the voice animation waveform to be generated according to the volume value comprises:
acquiring position points forming a voice animation waveform to be generated, and acquiring coordinate values of the position points;
determining candidate peak position points in the position points according to the volume value and the coordinate values of the position points;
directly taking the candidate peak position point as a finally adopted peak position point; or determining the number of peak position points; and selecting the candidate peak position points of the number as the peak position points finally adopted.
4. The method according to claim 3, wherein said determining candidate peak location points among said location points based on said volume value and coordinate values of said location points comprises:
determining the probability value of the position point as a candidate peak position point according to the volume value and the coordinate value of the position point;
and determining whether the corresponding position point is a candidate peak position point or not according to the probability value.
5. The method of claim 4, wherein the probability value has a positive correlation with the input volume value and an absolute value of a coordinate value of the location point, and the determining whether the corresponding location point is a candidate peak location point according to the probability value comprises:
if the probability value is smaller than a preset probability value, determining that the corresponding position point is not a candidate peak position point; and if the probability value is greater than or equal to a preset probability value, determining that the corresponding position point is a candidate peak position point.
6. The method of claim 3, wherein determining the number of peak location points comprises:
determining the preset multiple of the volume value as the number of peak position points; or alternatively,
and determining the sum of the preset multiple of the volume value and the random number in the preset range as the number of the peak position points.
7. The method of claim 1, wherein determining the peak amplitude value for the peak location point comprises:
determining an amplitude maximum value according to the volume value;
determining an initial value of a peak amplitude value of the peak position point according to the coordinate value of the peak position point and the maximum amplitude value;
and determining a final value of the peak amplitude value according to the initial value.
8. The method of claim 7, wherein the initial value is positively correlated with the absolute value of the coordinate value of the peak position point and the maximum amplitude value, and the determining the final value of the peak amplitude value according to the initial value comprises:
taking the initial value directly as the final value; or alternatively,
and taking the product of the initial value and the random number in a preset range as the final value.
9. The method of claim 1, wherein generating a speech animation waveform from the peak location points and the peak amplitude values comprises:
determining the amplitude value of the position point according to the coordinate value of the position point, the coordinate value of the peak position point and the peak amplitude value;
and generating a voice animation waveform according to the amplitude value of the position point.
10. The method of claim 1, further comprising:
determining the updating period of the voice animation, and regenerating and displaying the voice animation according to the updating period;
wherein the update period is a preset fixed value, or the update period is related to the volume value.
11. An apparatus for generating speech animation, comprising:
the volume acquisition module is used for acquiring the volume value of the current voice signal;
a peak determining module, configured to determine a peak position point among position points forming a to-be-generated speech animation waveform according to the volume value, and determine a peak amplitude value of the peak position point, where a plurality of position points forming the waveform are defined, and corresponding to each of the position points, whether the position point is a candidate peak position point is determined according to the volume value, and the peak position point is determined according to the candidate peak position point;
and the waveform generating module is used for generating a voice animation waveform according to the peak position point and the peak amplitude value.
12. The apparatus of claim 11, further comprising:
the input volume determining module is used for determining an input volume value according to the volume value of the current voice signal so as to take the input volume value as the volume value of the subsequent operation;
wherein the determining an input volume value according to the volume value of the current speech signal comprises:
determining the volume value of the current voice signal as the input volume value; or alternatively,
if the volume value of the current voice signal is smaller than the preset volume value, setting the input volume value as a fixed value smaller than the preset value; and if the volume value of the current voice signal is greater than or equal to the preset volume value, increasing the volume value of the current voice signal by the preset value to be used as an input volume value.
13. The apparatus of claim 11, wherein the peak determining module comprises:
the position point coordinate acquisition submodule is used for acquiring position points forming a voice animation waveform to be generated and acquiring coordinate values of the position points;
a candidate position point determining submodule for determining candidate peak position points in the position points according to the volume value and the coordinate values of the position points;
a peak position point determining submodule for directly taking the candidate peak position point as a finally adopted peak position point; or determining the number of peak position points; and selecting the candidate peak position points of the number as the peak position points finally adopted.
14. The apparatus of claim 13, wherein the candidate location point determination submodule is specifically configured to:
determining the probability value of the position point as a candidate peak position point according to the volume value and the coordinate value of the position point;
and determining whether the corresponding position point is a candidate peak position point or not according to the probability value.
15. The apparatus of claim 14, wherein the probability value is positively correlated with the input volume value and an absolute value of the coordinate value of the location point, and wherein the candidate location point determination sub-module is further configured to:
if the probability value is smaller than a preset probability value, determining that the corresponding position point is not a candidate peak position point; and if the probability value is greater than or equal to a preset probability value, determining that the corresponding position point is a candidate peak position point.
16. The apparatus of claim 13, wherein the peak location point determining submodule is configured to determine a number of peak location points, and comprises:
determining the preset multiple of the input volume value as the number of peak position points; or alternatively,
and determining the sum of the preset multiple of the input volume value and the random number in the preset range as the number of the peak position points.
17. The apparatus of claim 11, wherein the peak determining module comprises:
the amplitude maximum value determining submodule is used for determining the amplitude maximum value according to the volume value;
the amplitude initial value determining submodule is used for determining the initial value of the peak amplitude value of the peak position point according to the coordinate value of the peak position point and the maximum amplitude value;
and the amplitude final value determining submodule is used for determining a final value of the peak amplitude value according to the initial value.
18. The apparatus of claim 17, wherein the initial value is positively correlated with the absolute value of the coordinate values of the peak position point and the maximum amplitude value, and wherein the final amplitude value determination submodule is specifically configured to:
taking the initial value directly as the final value; or alternatively,
and taking the product of the initial value and the random number in a preset range as the final value.
19. The apparatus of claim 11, wherein the waveform generation module is specifically configured to:
determining the amplitude value of the position point according to the coordinate value of the position point, the coordinate value of the peak position point and the peak amplitude value;
and generating a voice animation waveform according to the amplitude value of the position point.
20. The apparatus of claim 11, further comprising:
the updating module is used for determining the updating period of the voice animation, regenerating the voice animation according to the updating period and displaying the voice animation;
wherein the update period is a preset fixed value, or the update period is related to the volume value.
CN201610889079.7A 2016-10-11 2016-10-11 Method and device for generating voice animation Active CN106653054B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201610889079.7A CN106653054B (en) 2016-10-11 2016-10-11 Method and device for generating voice animation

Publications (2)

Publication Number Publication Date
CN106653054A CN106653054A (en) 2017-05-10
CN106653054B (en) 2020-02-14

Family

ID=58855283

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201610889079.7A Active CN106653054B (en) 2016-10-11 2016-10-11 Method and device for generating voice animation

Country Status (1)

Country Link
CN (1) CN106653054B (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108566483A (en) * 2018-03-19 2018-09-21 百度在线网络技术(北京)有限公司 A kind of methods of exhibiting, device, terminal and the storage medium of typing voice
CN109327750B (en) * 2018-09-18 2020-10-02 广州视源电子科技股份有限公司 Method, device, equipment and storage medium for displaying microphone volume change
CN111966278B (en) * 2020-08-28 2022-03-25 网易(杭州)网络有限公司 Prompting method of terminal equipment, terminal equipment and storage medium

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2007235930A (en) * 2006-02-03 2007-09-13 Seiko Epson Corp Output control method of ultrasonic speaker, ultrasonic speaker system and display device
CN104144280A (en) * 2013-05-08 2014-11-12 上海恺达广告有限公司 Voice and action animation synchronous control and device of electronic greeting card
CN103942048B (en) * 2014-04-09 2018-10-09 Tcl集团股份有限公司 A kind of method and device that speech volume animation is shown

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US3071652A (en) * 1959-05-08 1963-01-01 Bell Telephone Labor Inc Time domain vocoder
JP2008141552A (en) * 2006-12-04 2008-06-19 Seiko Epson Corp Automatic modulation degree adjustment method and apparatus for ultrasonic speaker
CN105511833A (en) * 2015-11-25 2016-04-20 广州周立功单片机科技有限公司 Method for audio spectrum and animation display and system

Also Published As

Publication number Publication date
CN106653054A (en) 2017-05-10

Legal Events

Code Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant