CN106653054B - Method and device for generating voice animation - Google Patents

Method and device for generating voice animation

Info

Publication number
CN106653054B
CN106653054B
Authority
CN
China
Prior art keywords
value
determining
position point
peak
volume value
Prior art date
Legal status
Active
Application number
CN201610889079.7A
Other languages
Chinese (zh)
Other versions
CN106653054A (en)
Inventor
王夏鸣
赵志翔
Current Assignee
iFlytek Co Ltd
Original Assignee
iFlytek Co Ltd
Priority date
Filing date
Publication date
Application filed by iFlytek Co Ltd filed Critical iFlytek Co Ltd
Priority to CN201610889079.7A priority Critical patent/CN106653054B/en
Publication of CN106653054A publication Critical patent/CN106653054A/en
Application granted granted Critical
Publication of CN106653054B publication Critical patent/CN106653054B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G — PHYSICS
    • G10 — MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L — SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L 21/00 — Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L 21/06 — Transformation of speech into a non-audible representation, e.g. speech visualisation or speech processing for tactile aids
    • G10L 21/10 — Transforming into visible information
    • G10L 21/12 — Transforming into visible information by displaying time domain information

Landscapes

  • Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Computational Linguistics (AREA)
  • Quality & Reliability (AREA)
  • Signal Processing (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Telephone Function (AREA)
  • User Interface Of Digital Computer (AREA)

Abstract

The application provides a method and a device for generating voice animation. The method comprises: acquiring a volume value of a current voice signal; determining, according to the volume value, a peak position point among the position points forming a voice animation waveform to be generated, and determining a peak amplitude value of the peak position point; and generating the voice animation waveform according to the peak position point and the peak amplitude value. The method can generate a voice animation that more accurately imitates a sound spectrum, bringing a more realistic voice feedback experience to users.

Description

Method and device for generating voice animation
Technical Field
The present application relates to the field of speech signal processing, and in particular, to a method and an apparatus for generating speech animation.
Background
On mainstream smart phone systems, such as iOS and Android, third-party software has permission to access a mobile phone microphone, and acquires sound recorded by the microphone by using a system-level audio recording interface. When the application software with the voice recording function interacts with the user, the recording state can be fed back to the user in real time in a voice animation mode, so that the user is informed that the current state is recording.
Although the voice animation in the related art may exhibit a ripple effect similar to a sound spectrum, it is not derived from an analysis that accurately reflects the real characteristics of the sound signal, and therefore has shortcomings in simulation fidelity and the like.
Disclosure of Invention
The present application is directed to solving, at least to some extent, one of the technical problems in the related art.
Therefore, an object of the present application is to provide a method for generating a speech animation, which can generate a more accurate speech animation imitating a voice frequency spectrum, and bring a more realistic speech feedback experience to a user.
Another object of the present application is to provide an apparatus for generating speech animation.
In order to achieve the above object, an embodiment of the first aspect of the present application provides a method for generating a speech animation, including: acquiring a volume value of a current voice signal; determining a peak position point in position points forming a voice animation waveform to be generated according to the volume value, and determining a peak amplitude value of the peak position point; and generating a voice animation waveform according to the peak position point and the peak amplitude value.
According to the method for generating voice animation provided by the embodiment of the first aspect of the application, the peak position point and the peak amplitude value are determined according to the volume value, so that the spectral characteristics of the sound are reflected more faithfully, a more accurate spectrum-like voice animation is generated, and a more vivid voice feedback experience is brought to the user.
In order to achieve the above object, an apparatus for generating a voice animation according to an embodiment of a second aspect of the present application includes: the volume acquisition module is used for acquiring the volume value of the current voice signal; the peak determining module is used for determining a peak position point in position points forming a voice animation waveform to be generated according to the volume value and determining a peak amplitude value of the peak position point; and the waveform generating module is used for generating a voice animation waveform according to the peak position point and the peak amplitude value.
According to the device for generating voice animation provided by the embodiment of the second aspect of the application, the volume value is acquired, and the peak position point and the peak amplitude value are determined according to the volume value, so that the spectral characteristics of the sound are reflected more faithfully, a more accurate spectrum-like voice animation is generated, and a more vivid voice feedback experience is brought to the user.
Additional aspects and advantages of the present application will be set forth in part in the description which follows and, in part, will be obvious from the description, or may be learned by practice of the present application.
Drawings
The foregoing and/or additional aspects and advantages of the present application will become apparent and readily appreciated from the following description of the embodiments, taken in conjunction with the accompanying drawings of which:
FIG. 1 is a flow diagram of a method for generating speech animation according to an embodiment of the present application;
FIG. 2 is a schematic diagram of location points composing a voice animation waveform to be generated in an embodiment of the present application;
FIG. 3 is a flow chart illustrating a method for generating speech animation according to another embodiment of the present application;
FIG. 4 is a schematic diagram of candidate peak location points in an embodiment of the present application;
FIG. 5 is a schematic illustration of the peak location points ultimately employed in the embodiments of the present application;
FIG. 6 is a schematic diagram of an initial value of a peak amplitude value in an embodiment of the present application;
FIG. 7 is a diagram illustrating final values of peak amplitude values in an embodiment of the present application;
FIG. 8 is a schematic diagram of a speech animation waveform generated in an embodiment of the present application;
FIG. 9 is a schematic structural diagram of an apparatus for generating speech animation according to an embodiment of the present application;
FIG. 10 is a schematic structural diagram of an apparatus for generating a speech animation according to another embodiment of the present application.
Detailed Description
Reference will now be made in detail to embodiments of the present application, examples of which are illustrated in the accompanying drawings, wherein the same or similar reference numerals refer to the same or similar elements or elements having the same or similar functions throughout. The embodiments described below with reference to the drawings are exemplary, are intended only to explain the present application, and are not to be construed as limiting the present application. On the contrary, the embodiments of the application include all changes, modifications and equivalents coming within the spirit and scope of the appended claims.
Fig. 1 is a flowchart illustrating a method for generating a speech animation according to an embodiment of the present application.
As shown in fig. 1, the present embodiment includes the following steps:
s11: and acquiring the volume value of the current voice signal.
The volume value of the current voice signal may be detected by a volume detection module. It is to be understood that the volume detection module may be implemented using existing or future technologies and is not described in detail herein.
Assuming that the volume value of the current speech signal is represented by V0, the value of V0 is generally between 0 and 1. It can be understood that the physical unit of volume is usually the decibel; the volume value in this application refers to the volume value after normalization, i.e., the decibel value is normalized to a value between 0 and 1 for convenience of computation. During normalization, the normalized value is positively correlated with the original decibel value, i.e., the higher the original volume, the larger the normalized volume value. The specific normalization algorithm is not limited and may be implemented using existing or future technologies.
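As an illustration of this normalization, the sketch below maps a decibel reading onto the range 0 to 1 with a simple linear rescaling; the decibel range and the linear mapping are assumptions chosen for illustration, since the application does not fix a particular normalization algorithm.

```python
def normalize_volume(db_value, db_min=-60.0, db_max=0.0):
    """Map a decibel reading to a normalized volume value V0 in [0, 1].

    The dB range and the linear mapping are illustrative assumptions;
    any monotonically increasing normalization satisfies the text above.
    """
    clamped = max(db_min, min(db_max, db_value))
    return (clamped - db_min) / (db_max - db_min)
```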
S12: and determining a peak position point in the position points forming the voice animation waveform to be generated according to the volume value, and determining a peak amplitude value of the peak position point.
The parameters of the horizontal axis and the vertical axis of the voice animation waveform (waveform for short) to be generated can be set. In this embodiment, since a waveform imitating a sound spectrum is to be generated, the horizontal axis x is defined as frequency and the vertical axis h as amplitude, as shown in fig. 2. Further, the position points that make up the waveform can also be defined. To simplify the computation, N position points on the horizontal axis are defined as the position points constituting the waveform, as shown in fig. 2. Further, to match the axially symmetric design of the recording interface and buttons on the mobile device, the waveform is also designed to be axially symmetric, and frequencies farther from the axis are defined as the frequencies that have smaller amplitudes in the human sound spectrum. As shown in fig. 2, with the coordinate values x axisymmetric, the N position points may be distributed symmetrically and uniformly within the range bounded by the preset maximum coordinate value.
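As a minimal sketch of this layout, the N position points can be placed symmetrically and uniformly over the interval [-W, W]; the values of N and W used below are illustrative assumptions rather than values fixed by the application.

```python
def make_position_points(n_points=41, w=1.0):
    """Return n_points coordinate values x_j spread symmetrically and
    uniformly over [-w, w], with the axis of symmetry at x = 0."""
    if n_points < 2:
        return [0.0]
    step = 2.0 * w / (n_points - 1)
    return [-w + j * step for j in range(n_points)]
```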
After the position points are determined, whether the position points are candidate peak position points or not can be determined according to the volume value corresponding to each position point, and finally adopted peak position points are determined according to the candidate peak position points. For details, refer to the following embodiments.
Furthermore, the distribution range of the peak position points is positively correlated with the volume value; that is, the larger the volume value, the richer the distribution of peak position points and the more medium and high frequencies are involved. Because people produce sound through vocal cord vibration, the vibration range and amplitude of the vocal cord muscles are larger and the spectrum is wider when speaking loudly. Therefore, making the distribution range of the peak position points positively correlated with the volume value better matches the behavior of a real spectrum animation.
When determining the peak amplitude value, the peak amplitude value and the volume value have a positive correlation, i.e. the larger the volume value, the larger the peak amplitude value. Therefore, volume information can be fed back from the amplitude, and a user can adjust the volume according to the amplitude effect feedback so as to achieve a better recognition effect.
S13: and generating a voice animation waveform according to the peak position point and the peak amplitude value.
Wherein the generated voice animation is a waveform, which may be referred to as a voice animation waveform.
This step may include: determining the amplitude value of the position point according to the coordinate value of the position point, the coordinate value of the peak position point and the peak amplitude value; and generating a voice animation waveform according to the amplitude value of the position point. Please refer to the following description for details.
In this embodiment, the volume value is acquired, and the peak position point and the peak amplitude value are determined according to the volume value, so that the spectral characteristics of the sound are reflected more faithfully, a more accurate spectrum-like voice animation is generated, and a more vivid voice feedback experience is brought to the user.
Fig. 3 is a flowchart illustrating a method for generating a speech animation according to another embodiment of the present application.
As shown in fig. 3, the method of the present embodiment includes:
s31: and acquiring the volume value of the current voice signal.
The details can be found in the above example and are not described in detail herein.
S32: and determining an input volume value according to the volume value.
Assuming that the input volume value is represented by V, after the volume value V0 of the current speech signal is determined, the volume value V0 can be directly used as the input volume value V.
Further, in order to avoid excessive waveform generation due to background noise when the volume is small, the input volume value V may be determined according to the volume value V0 as follows:
if the volume value of the current voice signal is smaller than the preset volume value, setting the input volume value as a fixed value smaller than the preset value; and if the volume value of the current voice signal is greater than or equal to the preset volume value, increasing the volume value of the current voice signal by a preset increment to be used as an input volume value. It is understood that, as shown in the above embodiment, the volume value of the current speech signal is usually between 0 and 1, and therefore, in the above threshold determination, a limit between 0 and 1 may be added in addition to the preset volume value. For example, the volume value of the current speech signal being smaller than the preset volume value may specifically mean that the volume value of the current speech signal is smaller than the preset volume value and is greater than or equal to 0, the volume value of the current speech signal being greater than or equal to the preset volume value may specifically mean that the volume value of the current speech signal is greater than or equal to the preset volume value and is less than or equal to 1, and the preset volume value is a value between 0 and 1.
Specifically, taking the preset volume value as 0.1, the fixed value as 0.1, and the preset increment as 2 as an example, the calculation formula is as follows:
V = 0.1,      if 0 ≤ V0 < 0.1
V = 2 × V0,   if 0.1 ≤ V0 ≤ 1
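With the example parameters above (preset volume value 0.1, fixed value 0.1, preset increment 2), the mapping from V0 to the input volume value V can be sketched as follows. Reading the preset increment of 2 as a multiplicative factor is an interpretation, since the original formula is only given as an image.

```python
def input_volume(v0, preset=0.1, fixed=0.1, factor=2.0):
    """Map the normalized volume V0 to the input volume value V.

    Below the preset threshold the input volume is pinned to a small
    fixed value so that background noise does not produce waveforms;
    at or above the threshold V0 is boosted by the preset factor
    (an assumed reading of the "preset increment of 2").
    """
    if 0.0 <= v0 < preset:
        return fixed
    return factor * min(v0, 1.0)
```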
s33: the method comprises the steps of obtaining position points forming a voice animation waveform to be generated, and obtaining coordinate values of the position points.
As shown in fig. 2, the position points may be determined by definition, and once a position point is determined, its coordinate value is also determined. Specifically, in each embodiment of the present application, unless otherwise stated, the coordinate value refers to the horizontal-axis coordinate of the position point. Therefore, the coordinate value x_j, j = 1, …, N, of each position point can be acquired from the known information.
S34: and determining candidate peak position points in the position points according to the input volume value and the coordinate values of the position points.
Here, for each position point it is determined whether that point is a candidate peak position point, so that the candidate peak position points are identified among all the position points.
The method specifically comprises the following steps: determining a probability value of the position point as a candidate peak position point according to the input volume value and the coordinate value; and determining whether the corresponding position point is a candidate peak position point or not according to the probability value.
In particular, each position point is known, and therefore the coordinate value of each position point is known, specifically its abscissa value, denoted x_j.
After the input volume value V and the coordinate value x_j are determined, the probability value described above can be calculated. The probability value is positively correlated with the input volume value and with the absolute value of the coordinate value. One calculation is as follows:
[Formula for B(x_j) given as an image in the original; it is computed from the input volume value V, the coordinate value x_j, and the preset maximum coordinate value W, and increases with both V and |x_j|.]
where B(x_j) is the probability value that the position point with coordinate value x_j is a candidate peak position point, V is the input volume value, x_j is the coordinate value of the position point, and W is the preset maximum coordinate value.
After the probability value of the position point is determined, if the probability value is smaller than a preset probability value, determining that the corresponding position point is not a candidate peak position point; and if the probability value is greater than or equal to a preset probability value, determining that the corresponding position point is a candidate peak position point.
Assume that R(x_j) = 0 indicates that the position point with coordinate value x_j is not a candidate peak position point and R(x_j) = 1 indicates that it is. Taking the preset probability value as 0.5 as an example, the following formula can be adopted to determine whether a position point is a candidate peak position point:
R(x_j) = 0,   if B(x_j) < 0.5
R(x_j) = 1,   if B(x_j) ≥ 0.5
and determining whether the position point is a candidate peak position point or not by adopting the mode corresponding to each position point, so that the candidate peak position point can be determined from all the position points. For example, referring to fig. 4, a schematic diagram of candidate peak location points is given.
S35: and determining the finally adopted peak position point according to the candidate peak position points.
After the candidate peak position point is determined, the candidate peak position point can be directly used as the peak position point finally adopted.
Furthermore, in view of practical experience, the number of peaks needs to be limited; otherwise, once too many peaks are superimposed, the waveform loses the undulating character of a sound spectrum. Therefore, when determining the finally adopted peak position points according to the candidate peak position points, the method may further include: determining the number of peak position points, and selecting that number of candidate peak position points as the finally adopted peak position points.
When determining the number of peak position points, the number of peak position points may be determined according to the input volume value. The number of the peak position points and the input volume value form positive correlation. One calculation is as follows:
M=20*V
wherein M is the number of peak position points; v is the input volume value; 20 is an empirically selected coefficient, and may be selected to be other values.
Further, M may be varied within a predetermined range in order to simulate a situation where the number of sound frequency components may not be completely the same even at the same volume during the real sound production. Assuming that the variation of the preset range is random within the range of ± 1, the formula is:
M=20*V+Rand(0,2)-1
where Rand (0,2) refers to a random number in the range of 0-2.
After the number M is determined, M candidate peak position points may be randomly selected from all candidate peak position points as the peak position points to be finally adopted. For example, referring to FIG. 5, a schematic of the peak location points ultimately employed is given.
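A sketch of this selection step is given below: the peak count M follows the example formula M = 20 * V + Rand(0, 2) - 1, and M candidate points are then drawn at random.

```python
import random

def select_peaks(candidates, v, coeff=20.0):
    """Pick the finally adopted peak position points from the candidates.

    M = coeff * V plus a random perturbation in [-1, +1], following the
    example formula M = 20 * V + Rand(0, 2) - 1; M is clipped to the
    number of available candidate points.
    """
    m = int(round(coeff * v + random.uniform(0.0, 2.0) - 1.0))
    m = max(0, min(m, len(candidates)))
    return random.sample(candidates, m)
```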
S36: and determining the peak amplitude value of the peak position point according to the coordinate value of the peak position point and the input volume value.
After a peak position point is determined, its coordinate value is also determined, so the coordinate value of the peak position point can be acquired. Assume that the coordinate values of the peak position points are represented as x_i, i = 1, …, M.
The maximum amplitude value can be determined from the input volume value, and then the peak amplitude value can be determined from the coordinate value of the peak position point and the maximum amplitude value.
Specifically, the maximum amplitude value is positively correlated with the input volume value, and is expressed by the following formula:
H_max = V^0.333 × 0.8 × H
where H_max is the maximum amplitude value, V is the input volume value, H is a preset amplitude threshold, and 0.333 and 0.8 are empirically set values that may be chosen differently.
When determining the peak amplitude value according to the coordinate value of the peak position point and the maximum amplitude value, an initial value of the peak amplitude value may be determined first, and then a final value of the corresponding peak amplitude value may be determined according to the initial value.
The initial value of the peak amplitude value is positively correlated with the absolute value of the coordinate value of the peak position point and with the maximum amplitude value. The relationship is expressed by a formula (given as an image in the original) in which h_0(x_i) is the initial value of the peak amplitude value, H_max is the maximum amplitude value, x_i is the coordinate value of the peak position point, and W is the preset maximum coordinate value.
After adding the initial value of the corresponding peak amplitude value to the peak position point, the schematic diagram of the peak may be as shown in fig. 6.
After the initial value of the peak amplitude value is determined, it can be directly used as the corresponding final value.
Further, in order to simulate a more realistic effect, random environmental noise is introduced to influence the randomness of the sound amplitude of different frequencies, so that the product of the initial value and a random number in a preset range can be used as the final value.
Assuming the predetermined range is 0.8-1.2, it is formulated as:
h(x_i) = h_0(x_i) × Rand(0.8, 1.2)
where h(x_i) is the final value of the peak amplitude value, h_0(x_i) is the corresponding initial value, and Rand(0.8, 1.2) is a random number in the range of 0.8 to 1.2.
The peak diagram obtained by multiplying the initial value of the peak amplitude value by the random number can be shown in fig. 7.
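The amplitude computation of this step can be sketched as follows. H_max follows the formula given above and the final value multiplies the initial value by Rand(0.8, 1.2); the shape chosen for the initial value h_0(x_i) is an assumption that merely keeps it positively correlated with |x_i| and H_max, since the exact formula is only given as an image in the original.

```python
import random

def peak_amplitudes(peak_positions, v, w, h_threshold=1.0):
    """Compute the final peak amplitude value h(x_i) for each peak position point.

    H_max = V**0.333 * 0.8 * H follows the formula in the text; the shape
    of the initial value h_0(x_i) is an illustrative assumption, and the
    final value multiplies it by a random factor in [0.8, 1.2].
    """
    h_max = (v ** 0.333) * 0.8 * h_threshold
    amplitudes = {}
    for x in peak_positions:
        h0 = h_max * (0.5 + 0.5 * abs(x) / w)  # assumed initial-value shape
        amplitudes[x] = h0 * random.uniform(0.8, 1.2)
    return amplitudes
```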
S37: and generating a voice animation waveform according to the peak position point and the peak amplitude value.
The method specifically comprises the following steps: determining an amplitude value of the position point according to the coordinate value of the position point, the coordinate value of the peak position point and the peak amplitude value of the voice animation waveform to be generated; and generating a voice animation waveform according to the amplitude value of the position point.
Assuming that the coordinate value of each peak position point is denoted x_i and the coordinate value of each position point is denoted x_j, the amplitude value h(x_j) of each position point can be calculated by the following formula:
[Formula for h(x_j) given as an image in the original; it derives the amplitude of each position point from the peak coordinate values x_i and the peak amplitude values h(x_i).]
where 0.5 is the attenuation coefficient of the neighboring point, which may be set empirically, or may be other values.
After the amplitude value of each position point is determined, a corresponding amplitude value can be added to each position point, thereby generating a corresponding voice animation waveform. For example, the final generated speech animation waveform may be as shown in FIG. 8.
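The assembly of the final waveform can be sketched as follows: each peak point keeps its peak amplitude, its immediate neighbours receive the peak amplitude scaled by the attenuation coefficient 0.5, and the remaining points stay at zero. Restricting the attenuation to immediate neighbours is an assumed reading of the omitted formula. Chaining this with the earlier sketches (position points, input volume, candidate peaks, selected peaks, peak amplitudes) reproduces the overall flow of steps S31 to S37 under the stated assumptions.

```python
def build_waveform(positions, peak_amps, attenuation=0.5):
    """Assign an amplitude value h(x_j) to every position point.

    Peak points keep their peak amplitude; points adjacent to a peak get
    the peak amplitude scaled by the attenuation coefficient (0.5 in the
    text); all other points stay at zero.  Limiting the attenuation to
    immediate neighbours is an assumed reading of the omitted formula.
    """
    amps = [0.0] * len(positions)
    index = {x: j for j, x in enumerate(positions)}
    for x, h in peak_amps.items():
        j = index[x]
        amps[j] = max(amps[j], h)
        for k in (j - 1, j + 1):  # attenuate immediate neighbours only
            if 0 <= k < len(positions):
                amps[k] = max(amps[k], attenuation * h)
    return amps
```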
After the voice animation waveform is generated, the waveform may be presented to the user as a voice animation.
Further, in some embodiments, referring to fig. 3, the method may further include:
s38: and determining the updating period of the voice animation, and regenerating and displaying the voice animation according to the updating period.
The process of regenerating the voice animation can be re-executed by referring to the above steps.
The update period may be chosen as a fixed value ΔT0, which is set based on experience or the like.
Further, since the waveform refresh frequency is higher when speaking normally than when not speaking, the update period can be determined based on the volume value or the input volume value. Taking the example that the update period is related to the volume value V0, the formula can be expressed as:
ΔT = ΔT0,       if 0 ≤ V0 < 0.1
ΔT = ΔT0 / 3,   if 0.1 ≤ V0 ≤ 1
where ΔT is the finally adopted update period, V0 is the volume value of the current speech signal, and ΔT0 is a fixed value set empirically or the like. It is understood that the coefficient 3 above may be any other value larger than 1, and the threshold 0.1 may be any other value between 0 and 1.
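A sketch of the volume-dependent update period, using the threshold 0.1 and the speed-up coefficient 3 from the text; treating the coefficient as a divisor of the fixed period ΔT0 when speech is present is an assumed reading, since the original formula is only given as an image. The base period value is likewise an illustrative assumption.

```python
def update_period(v0, base_period=0.2, threshold=0.1, speedup=3.0):
    """Return the animation update period in seconds.

    When the current volume V0 is below the threshold (no speech), the
    fixed base period is used; otherwise the period is shortened by the
    speed-up coefficient so the waveform refreshes faster while speaking.
    """
    if v0 < threshold:
        return base_period
    return base_period / speedup
```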
In this embodiment, the peak amplitude value is positively correlated with the volume, so volume information can be fed back through the amplitude and the user can adjust the speaking volume according to the amplitude feedback to achieve a better recognition effect. When the volume is higher, the frequency distribution is richer, which better matches the characteristics of human vocalization and therefore reflects more realistic spectral characteristics. The update frequency is positively correlated with the volume, so the response sensitivity of the voice animation is also positively correlated with the volume: the higher the volume, the more timely the animation feedback. When the volume value of the voice signal is small (i.e., the volume is low), the input volume value is also small, and since the number of peaks is positively correlated with the input volume value, no superfluous peak animation appears when the user is not speaking. Random numbers are introduced when calculating the peak position points and the peak amplitude values, so the voice animation is essentially non-repeating (two identical states hardly ever occur), which simulates the influence of the uncertainty of environmental background noise on voice recording.
Fig. 9 is a schematic structural diagram of an apparatus for generating a voice animation according to an embodiment of the present application.
As shown in fig. 9, the apparatus 90 includes: a volume acquisition module 91, a peak determination module 92 and a waveform generation module 93.
A volume obtaining module 91, configured to obtain a volume value of a current voice signal;
a peak determining module 92, configured to determine a peak position point among position points forming a to-be-generated speech animation waveform according to the volume value, and determine a peak amplitude value of the peak position point;
and a waveform generating module 93, configured to generate a voice animation waveform according to the peak position point and the peak amplitude value.
In some embodiments, referring to fig. 10, the apparatus 90 further comprises:
an input volume determining module 94, configured to determine an input volume value according to a volume value of a current speech signal, so as to use the input volume value as a volume value for subsequent operations;
wherein the determining an input volume value according to the volume value of the current speech signal comprises:
determining the volume value of the current voice signal as the input volume value; or alternatively,
if the volume value of the current voice signal is smaller than the preset volume value, setting the input volume value as a fixed value smaller than the preset value; and if the volume value of the current voice signal is greater than or equal to the preset volume value, increasing the volume value of the current voice signal by the preset value to be used as an input volume value.
In some embodiments, referring to fig. 10, the peak determining module 92 comprises:
a position point coordinate obtaining submodule 921 for obtaining position points constituting a voice animation waveform to be generated, and obtaining coordinate values of the position points;
the candidate position point determining submodule 922 is configured to determine candidate peak position points in the position points according to the volume values and the coordinate values of the position points;
a peak position point determining submodule 923 configured to directly use the candidate peak position point as a peak position point to be finally adopted; or determining the number of peak position points; and selecting the candidate peak position points of the number as the peak position points finally adopted.
In some embodiments, the candidate location point determining sub-module 922 is specifically configured to:
determining the probability value of the position point as a candidate peak position point according to the volume value and the coordinate value of the position point;
and determining whether the corresponding position point is a candidate peak position point or not according to the probability value.
In some embodiments, the probability value is positively correlated with the input volume value and the absolute value of the coordinate value of the location point, and the candidate location point determination submodule 922 is further configured to:
if the probability value is smaller than a preset probability value, determining that the corresponding position point is not a candidate peak position point; and if the probability value is greater than or equal to a preset probability value, determining that the corresponding position point is a candidate peak position point.
In some embodiments, the peak position point determining submodule 923 is configured to determine the number of peak position points, including:
determining the preset multiple of the input volume value as the number of peak position points; or alternatively,
and determining the sum of the preset multiple of the input volume value and the random number in the preset range as the number of the peak position points.
In some embodiments, referring to fig. 10, the peak determining module 92 comprises:
an amplitude maximum determination sub-module 924 for determining an amplitude maximum from the volume value;
an amplitude initial value determining submodule 925, configured to determine an initial value of a peak amplitude value of the peak position point according to the coordinate value of the peak position point and the maximum amplitude value;
an amplitude final value determining sub-module 926, configured to determine a final value of the peak amplitude value according to the initial value.
In some embodiments, the initial value is positively correlated with the absolute value of the coordinate value of the peak position point and the maximum amplitude value, and the final amplitude value determining sub-module 926 is specifically configured to:
taking the initial value directly as the final value; or alternatively,
and taking the product of the initial value and the random number in a preset range as the final value.
In some embodiments, the waveform generation module 93 is specifically configured to:
determining the amplitude value of the position point according to the coordinate value of the position point, the coordinate value of the peak position point and the peak amplitude value;
and generating a voice animation waveform according to the amplitude value of the position point.
In some embodiments, referring to fig. 10, the apparatus 90 further comprises:
the updating module 95 is configured to determine an updating period of the voice animation, and regenerate and display the voice animation according to the updating period;
wherein the update period is a preset fixed value, or the update period is related to the volume value.
It is understood that the apparatus of the present embodiment corresponds to the method embodiment described above, and specific contents may be referred to the related description of the method embodiment, and are not described in detail herein.
In this embodiment, the volume value is acquired, and the peak position point and the peak amplitude value are determined according to the volume value, so that the spectral characteristics of the sound are reflected more faithfully, a more accurate spectrum-like voice animation is generated, and a more vivid voice feedback experience is brought to the user.
It is understood that the same or similar parts in the above embodiments may be mutually referred to, and the same or similar parts in other embodiments may be referred to for the content which is not described in detail in some embodiments.
It should be noted that, in the description of the present application, the terms "first", "second", etc. are used for descriptive purposes only and are not to be construed as indicating or implying relative importance. Further, in the description of the present application, the meaning of "a plurality" means at least two unless otherwise specified.
Any process or method descriptions in flow charts or otherwise described herein may be understood as representing modules, segments, or portions of code which include one or more executable instructions for implementing specific logical functions or steps of the process, and the scope of the preferred embodiments of the present application includes other implementations in which functions may be executed out of order from that shown or discussed, including substantially concurrently or in reverse order, depending on the functionality involved, as would be understood by those reasonably skilled in the art of the present application.
It should be understood that portions of the present application may be implemented in hardware, software, firmware, or a combination thereof. In the above embodiments, the various steps or methods may be implemented in software or firmware stored in memory and executed by a suitable instruction execution system. For example, if implemented in hardware, as in another embodiment, any one or combination of the following techniques, which are known in the art, may be used: a discrete logic circuit having a logic gate circuit for implementing a logic function on a data signal, an application specific integrated circuit having an appropriate combinational logic gate circuit, a Programmable Gate Array (PGA), a Field Programmable Gate Array (FPGA), or the like.
It will be understood by those skilled in the art that all or part of the steps carried by the method for implementing the above embodiments may be implemented by hardware related to instructions of a program, which may be stored in a computer readable storage medium, and when the program is executed, the program includes one or a combination of the steps of the method embodiments.
In addition, functional units in the embodiments of the present application may be integrated into one processing module, or each unit may exist alone physically, or two or more units are integrated into one module. The integrated module can be realized in a hardware mode, and can also be realized in a software functional module mode. The integrated module, if implemented in the form of a software functional module and sold or used as a stand-alone product, may also be stored in a computer readable storage medium.
The storage medium mentioned above may be a read-only memory, a magnetic or optical disk, etc.
In the description herein, reference to the description of the term "one embodiment," "some embodiments," "an example," "a specific example," or "some examples," etc., means that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the application. In this specification, the schematic representations of the terms used above do not necessarily refer to the same embodiment or example. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples.
Although embodiments of the present application have been shown and described above, it is understood that the above embodiments are exemplary and should not be construed as limiting the present application, and that variations, modifications, substitutions and alterations may be made to the above embodiments by those of ordinary skill in the art within the scope of the present application.

Claims (20)

1. A method of generating speech animation, comprising:
acquiring a volume value of a current voice signal;
determining a peak position point in position points forming a voice animation waveform to be generated and determining a peak amplitude value of the peak position point according to the volume value, wherein a plurality of position points forming the waveform are defined, whether the position points are candidate peak position points or not is determined according to the volume value corresponding to each position point, and the peak position point is determined according to the candidate peak position points;
and generating a voice animation waveform according to the peak position point and the peak amplitude value.
2. The method of claim 1, further comprising:
determining an input volume value according to the volume value of the current voice signal so as to take the input volume value as the volume value of the subsequent operation;
wherein the determining an input volume value according to the volume value of the current speech signal comprises:
determining the volume value of the current voice signal as the input volume value; or alternatively,
if the volume value of the current voice signal is smaller than the preset volume value, setting the input volume value as a fixed value smaller than the preset value; and if the volume value of the current voice signal is greater than or equal to the preset volume value, increasing the volume value of the current voice signal by the preset value to be used as an input volume value.
3. The method according to claim 1, wherein said determining peak position points among the position points composing the voice animation waveform to be generated according to the volume value comprises:
acquiring position points forming a voice animation waveform to be generated, and acquiring coordinate values of the position points;
determining candidate peak position points in the position points according to the volume value and the coordinate values of the position points;
directly taking the candidate peak position point as a finally adopted peak position point; or determining the number of peak position points; and selecting the candidate peak position points of the number as the peak position points finally adopted.
4. The method according to claim 3, wherein said determining candidate peak location points among said location points based on said volume value and coordinate values of said location points comprises:
determining the probability value of the position point as a candidate peak position point according to the volume value and the coordinate value of the position point;
and determining whether the corresponding position point is a candidate peak position point or not according to the probability value.
5. The method of claim 4, wherein the probability value has a positive correlation with the input volume value and an absolute value of a coordinate value of the location point, and the determining whether the corresponding location point is a candidate peak location point according to the probability value comprises:
if the probability value is smaller than a preset probability value, determining that the corresponding position point is not a candidate peak position point; and if the probability value is greater than or equal to a preset probability value, determining that the corresponding position point is a candidate peak position point.
6. The method of claim 3, wherein determining the number of peak location points comprises:
determining the preset multiple of the volume value as the number of peak position points; or alternatively,
and determining the sum of the preset multiple of the volume value and the random number in the preset range as the number of the peak position points.
7. The method of claim 1, wherein determining the peak amplitude value for the peak location point comprises:
determining an amplitude maximum value according to the volume value;
determining an initial value of a peak amplitude value of the peak position point according to the coordinate value of the peak position point and the maximum amplitude value;
and determining a final value of the peak amplitude value according to the initial value.
8. The method of claim 7, wherein the initial value is positively correlated with the absolute value of the coordinate value of the peak position point and the maximum amplitude value, and the determining the final value of the peak amplitude value according to the initial value comprises:
taking the initial value directly as the final value; or alternatively,
and taking the product of the initial value and the random number in a preset range as the final value.
9. The method of claim 1, wherein generating a speech animation waveform from the peak location points and the peak amplitude values comprises:
determining the amplitude value of the position point according to the coordinate value of the position point, the coordinate value of the peak position point and the peak amplitude value;
and generating a voice animation waveform according to the amplitude value of the position point.
10. The method of claim 1, further comprising:
determining the updating period of the voice animation, and regenerating and displaying the voice animation according to the updating period;
wherein the update period is a preset fixed value, or the update period is related to the volume value.
11. An apparatus for generating speech animation, comprising:
the volume acquisition module is used for acquiring the volume value of the current voice signal;
a peak determining module, configured to determine a peak position point among position points forming a to-be-generated speech animation waveform according to the volume value, and determine a peak amplitude value of the peak position point, where a plurality of position points forming the waveform are defined, and corresponding to each of the position points, whether the position point is a candidate peak position point is determined according to the volume value, and the peak position point is determined according to the candidate peak position point;
and the waveform generating module is used for generating a voice animation waveform according to the peak position point and the peak amplitude value.
12. The apparatus of claim 11, further comprising:
the input volume determining module is used for determining an input volume value according to the volume value of the current voice signal so as to take the input volume value as the volume value of the subsequent operation;
wherein the determining an input volume value according to the volume value of the current speech signal comprises:
determining the volume value of the current voice signal as the input volume value; or alternatively,
if the volume value of the current voice signal is smaller than the preset volume value, setting the input volume value as a fixed value smaller than the preset value; and if the volume value of the current voice signal is greater than or equal to the preset volume value, increasing the volume value of the current voice signal by the preset value to be used as an input volume value.
13. The apparatus of claim 11, wherein the peak determining module comprises:
the position point coordinate acquisition submodule is used for acquiring position points forming a voice animation waveform to be generated and acquiring coordinate values of the position points;
a candidate position point determining submodule for determining candidate peak position points in the position points according to the volume value and the coordinate values of the position points;
a peak position point determining submodule for directly taking the candidate peak position point as a finally adopted peak position point; or determining the number of peak position points; and selecting the candidate peak position points of the number as the peak position points finally adopted.
14. The apparatus of claim 13, wherein the candidate location point determination submodule is specifically configured to:
determining the probability value of the position point as a candidate peak position point according to the volume value and the coordinate value of the position point;
and determining whether the corresponding position point is a candidate peak position point or not according to the probability value.
15. The apparatus of claim 14, wherein the probability value is positively correlated with the input volume value and an absolute value of the coordinate value of the location point, and wherein the candidate location point determination sub-module is further configured to:
if the probability value is smaller than a preset probability value, determining that the corresponding position point is not a candidate peak position point; and if the probability value is greater than or equal to a preset probability value, determining that the corresponding position point is a candidate peak position point.
16. The apparatus of claim 13, wherein the peak location point determining submodule is configured to determine a number of peak location points, and comprises:
determining the preset multiple of the input volume value as the number of peak position points; or alternatively,
and determining the sum of the preset multiple of the input volume value and the random number in the preset range as the number of the peak position points.
17. The apparatus of claim 11, wherein the peak determining module comprises:
the amplitude maximum value determining submodule is used for determining the amplitude maximum value according to the volume value;
the amplitude initial value determining submodule is used for determining the initial value of the peak amplitude value of the peak position point according to the coordinate value of the peak position point and the maximum amplitude value;
and the amplitude final value determining submodule is used for determining a final value of the peak amplitude value according to the initial value.
18. The apparatus of claim 17, wherein the initial value is positively correlated with the absolute value of the coordinate values of the peak position point and the maximum amplitude value, and wherein the final amplitude value determination submodule is specifically configured to:
taking the initial value directly as the final value; or alternatively,
and taking the product of the initial value and the random number in a preset range as the final value.
19. The apparatus of claim 11, wherein the waveform generation module is specifically configured to:
determining the amplitude value of the position point according to the coordinate value of the position point, the coordinate value of the peak position point and the peak amplitude value;
and generating a voice animation waveform according to the amplitude value of the position point.
20. The apparatus of claim 11, further comprising:
the updating module is used for determining the updating period of the voice animation, regenerating the voice animation according to the updating period and displaying the voice animation;
wherein the update period is a preset fixed value, or the update period is related to the volume value.
CN201610889079.7A 2016-10-11 2016-10-11 Method and device for generating voice animation Active CN106653054B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201610889079.7A CN106653054B (en) 2016-10-11 2016-10-11 Method and device for generating voice animation

Publications (2)

Publication Number Publication Date
CN106653054A CN106653054A (en) 2017-05-10
CN106653054B (en) 2020-02-14

Family

ID=58855283

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201610889079.7A Active CN106653054B (en) 2016-10-11 2016-10-11 Method and device for generating voice animation

Country Status (1)

Country Link
CN (1) CN106653054B (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108566483A (en) * 2018-03-19 2018-09-21 百度在线网络技术(北京)有限公司 A kind of methods of exhibiting, device, terminal and the storage medium of typing voice
CN109327750B (en) * 2018-09-18 2020-10-02 广州视源电子科技股份有限公司 Method, device, equipment and storage medium for displaying microphone volume change
CN111966278B (en) * 2020-08-28 2022-03-25 网易(杭州)网络有限公司 Prompting method of terminal equipment, terminal equipment and storage medium

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2007235930A (en) * 2006-02-03 2007-09-13 Seiko Epson Corp Output control method of ultrasonic speaker, ultrasonic speaker system and display device
CN104144280A (en) * 2013-05-08 2014-11-12 上海恺达广告有限公司 Voice and action animation synchronous control and device of electronic greeting card
CN103942048B (en) * 2014-04-09 2018-10-09 Tcl集团股份有限公司 A kind of method and device that speech volume animation is shown

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US3071652A (en) * 1959-05-08 1963-01-01 Bell Telephone Labor Inc Time domain vocoder
JP2008141552A (en) * 2006-12-04 2008-06-19 Seiko Epson Corp Automatic modulation degree adjustment method and apparatus for ultrasonic speaker
CN105511833A (en) * 2015-11-25 2016-04-20 广州周立功单片机科技有限公司 Method for audio spectrum and animation display and system

Also Published As

Publication number Publication date
CN106653054A (en) 2017-05-10

Legal Events

Code Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant