CN106383648A

CN106383648A - Intelligent terminal voice display method and apparatus

Info

Publication number: CN106383648A
Application number: CN201510448262.9A
Authority: CN
Inventors: 王欣; 吴贵英
Original assignee: Qingdao Hisense Electronics Co Ltd
Current assignee: Qingdao Hisense Electronics Co Ltd
Priority date: 2015-07-27
Filing date: 2015-07-27
Publication date: 2017-02-08

Abstract

An embodiment of the invention discloses a method and apparatus for displaying voice on an intelligent terminal. The method comprises: receiving a voice data stream from a participant in a conversation; sampling and analyzing the voice data stream at fixed time intervals to obtain the loudness, pitch and speech-rate information of each sampled sound; determining a loudness bubble by comparing the loudness of the sampled sound with preset loudness thresholds and representing the sampled sound with a first-type circle whose diameter depends on the threshold interval in which the loudness lies; determining a pitch bubble by comparing the frequency of the sampled sound with preset frequency thresholds and representing the sampled sound with a second-type circle whose diameter depends on the threshold interval in which the frequency lies; combining the loudness bubbles and pitch bubbles into an animated object; determining the playback speed of the animated object from the speech-rate information of the sampled sound; determining the motion curve of the animated object from the loudness, pitch and speech-rate information; and playing the configured bubble animation on a display screen. With this method, voice can be displayed in an emotional and personalized way through a bubble animation effect, which enhances the user experience, and the animated bubble scene deepens the user's understanding of the voice.

Description

Method and apparatus for displaying voice on an intelligent terminal

Technical field

The present invention relates to intelligent terminals, and in particular to a method and apparatus for displaying voice on an intelligent terminal.

Background art

With the rapid development of the communications industry, intelligent mobile terminals such as smartphones, smart watches and smart bracelets are increasingly popular. As intelligent mobile terminals diversify, users inevitably place higher and higher requirements on human-computer interaction, and the resulting demands keep growing. Taking the smartphone as an example, user needs have developed from the original basic functions of making calls and sending text messages to today's browsing the Internet, taking photos, listening to music, watching videos, reading and more. Modes of human-computer interaction have likewise developed from keyboard and touch to voice and video. For the sake of the user-interface experience, an easily understood and vivid voice interaction interface needs to be provided during voice communication.

Current voice interaction interfaces mainly include the waveform display used by Apple Siri, the bar-graph display used by the WeChat platform, and the halo display used by the Wormhole voice assistant.

In the course of implementing the present invention, the inventors found that the animation effects of prior-art voice communication interfaces are stiff and cold, and that their design lacks emotional expression and warmth.

Summary of the invention

To solve the above technical problem, the method for displaying voice on an intelligent terminal provided by the present invention can be implemented by the following technical solution:

Receiving an audio data stream from a participant in the session, sampling and analyzing it at fixed time intervals, and obtaining the loudness, tone and speech-rate information of the sound;

Displaying the audio data stream as an animation in bubble form according to the analyzed loudness, tone and speech-rate information, wherein the bubbles consist of first-type circles of different diameters and second-type circles of different diameters, and have a certain speed and motion curve.

A method for displaying voice on an intelligent terminal, comprising:

Receiving an audio data stream from a participant in the session, sampling and analyzing it at fixed time intervals, and obtaining the loudness, tone and speech-rate information of each sampled sound, wherein the tone information is characterized by the frequency of the sound and the speech-rate information is characterized by the zero-crossing rate of the sound;

Determining a loudness bubble: comparing the loudness of the sampled sound with preset loudness thresholds, and representing the sampled sound with a first-type circle whose diameter depends on the threshold interval in which the loudness lies;

Determining a tone bubble: comparing the frequency of the sampled sound with preset frequency thresholds, and representing the sampled sound with a second-type circle whose diameter depends on the threshold interval in which the frequency lies;

Combining the obtained loudness bubbles and tone bubbles into an animation object;

Determining the playback speed of the animation object according to the obtained speech-rate information of the sampled sound, and determining the motion curve of the animation object according to the obtained loudness, tone and speech-rate information;

Playing the configured bubble animation on the display screen.

An apparatus for displaying voice on an intelligent terminal, comprising:

A sampling module, configured to sample the received audio data stream of the session at fixed time intervals to obtain sampled sounds;

A speech analysis module, configured to analyze the obtained sampled sounds and obtain the loudness, tone and speech-rate information of each sampled sound;

An animation object determination module, configured to determine the loudness bubble and tone bubble of each sampled sound and to combine the loudness bubbles and tone bubbles of all sampled sounds into an animation object;

An animation object setting module, configured to determine the playback speed of the animation object according to the speech rate of the obtained sampled sound, and to determine the motion curve of the animation object according to the loudness, tone and speech rate of the obtained sampled sound;

An animation playing module, configured to play the configured animation object on the display screen.

Brief description of the drawings

Fig. 1 is a schematic flowchart of the method of an embodiment of the present invention

Fig. 2 is a definition diagram of the loudness bubbles of an embodiment of the present invention

Fig. 3 is a definition diagram of the tone bubbles of an embodiment of the present invention

Fig. 4 is a static schematic diagram of fast and slow speech rates in an embodiment of the present invention

Fig. 5 is a definition diagram of sound-wave undulation in an embodiment of the present invention

Fig. 6 is a static schematic diagram of sound-wave undulation in an embodiment of the present invention

Detailed description of the embodiments

The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the accompanying drawings. Obviously, the described embodiments are only some of the embodiments of the present invention, not all of them. All other embodiments obtained by those skilled in the art based on the embodiments of the present invention without creative effort fall within the scope of protection of the present invention.

As shown in Fig. 1, an embodiment of the present invention provides a method for displaying voice on an intelligent terminal, comprising:

Receiving a voice stream from a participant in the session, sampling and analyzing it at fixed time intervals, and obtaining the loudness, tone and speech-rate information of the sound;

Displaying the voice as an animation in bubble form according to the analyzed loudness, tone and speech-rate information of the sampled sound, wherein the bubbles consist of first-type circles of different diameters and second-type circles of different diameters, and have a certain speed and motion curve.

A method for displaying voice on an intelligent terminal, comprising:

101: Receive an audio data stream from a participant in the session, sample and analyze it at a fixed time interval, and obtain the loudness h_t, frequency f_t and sound-wave zero-crossing rate λ_t of each sampled sound;

The fixed time interval is set to 100 ms. The loudness is the pulse-code modulation (PCM) quantized loudness value of the sound, and describes a person's subjective perception of how loud the sound is. The tone is the frequency of the sound, and describes a person's subjective perception of how high or low the sound is.
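The fixed-interval sampling of step 101 could be sketched as follows. This is only an illustration: the function name `split_into_frames`, the use of a plain list of PCM integers, and the choice to drop a trailing partial frame are assumptions of this sketch, not details given in the text.

```python
from typing import List, Sequence

FRAME_MS = 100  # fixed sampling interval stated in the text


def split_into_frames(pcm: Sequence[int], sample_rate_hz: int,
                      frame_ms: int = FRAME_MS) -> List[Sequence[int]]:
    """Split PCM samples into consecutive frames of frame_ms milliseconds.

    A trailing partial frame is dropped (an assumption of this sketch).
    """
    frame_len = sample_rate_hz * frame_ms // 1000
    if frame_len <= 0:
        raise ValueError("sample rate too low for the chosen frame length")
    n_frames = len(pcm) // frame_len
    return [pcm[i * frame_len:(i + 1) * frame_len] for i in range(n_frames)]


if __name__ == "__main__":
    # 0.25 s of dummy samples at a hypothetical 8 kHz rate gives two full frames.
    frames = split_into_frames(list(range(2000)), sample_rate_hz=8000)
    print(len(frames), len(frames[0]))
```

Each resulting frame would then be analyzed for its loudness, frequency and zero-crossing rate.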

102: Determine the loudness bubble: compare the loudness of the sampled sound with preset loudness thresholds, and represent the sampled sound with a first-type circle whose diameter depends on the threshold interval in which the loudness lies;

Specifically, assume that there are two loudness thresholds and that the first-type circles are filled circles. The loudness value h_t of each obtained sampled sound is taken as the input variable of the loudness-bubble determination algorithm and compared with the preset loudness thresholds. The algorithm is as follows:

Loudness is divided into three equally spaced intervals. The three intervals, in order from large to small, correspond to large, medium and small loudness, which are depicted with large, medium and small filled circles respectively. The formula is as follows:

I_{t} = \begin{cases} p_{1}, & H_{\min} < h_{t} < H_{\min} + \Delta \\ p_{2}, & H_{\min} + \Delta < h_{t} < H_{\min} + 2\Delta \\ p_{3}, & H_{\min} + 2\Delta < h_{t} < H_{\max} \end{cases}

where

\Delta = \frac{H_{\max} - H_{\min}}{3}

Here h_t is the loudness of the sampled sound at time t, I_t is the loudness bubble selected for the sampled sound at time t, and p₁, p₂ and p₃ denote the large, medium and small filled circles respectively. Because different recording devices quantize volume differently, H_max is set to 100 and H_min to 0 as a compromise.

The preset loudness thresholds are H_min + Δ and H_min + 2Δ.

As shown in Fig. 2, the large, medium and small filled circles are defined as follows: the large filled circle has a diameter of 10 pixels, the medium filled circle a diameter of 7 pixels, and the small filled circle a diameter of 4 pixels.
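A minimal sketch of the loudness-bubble selection, using the values stated above (H_min = 0, H_max = 100, diameters of 4, 7 and 10 pixels). It assumes the rule stated in the claims, that a louder sample gets a larger circle. The function name, the clamping of out-of-range values, and the assignment of a value lying exactly on a threshold to the upper interval are assumptions of this sketch, since the formula uses strict inequalities.

```python
H_MIN = 0.0
H_MAX = 100.0
# Diameters in pixels of the small, medium and large filled circles.
FILLED_DIAMETERS_PX = (4, 7, 10)


def loudness_bubble_diameter(h_t: float, h_min: float = H_MIN,
                             h_max: float = H_MAX) -> int:
    """Return the filled-circle diameter for a sample of loudness h_t."""
    if h_max <= h_min:
        raise ValueError("h_max must be greater than h_min")
    delta = (h_max - h_min) / 3          # width of each of the three intervals
    clamped = min(max(h_t, h_min), h_max)  # assumption: clamp out-of-range input
    index = min(int((clamped - h_min) / delta), 2)
    return FILLED_DIAMETERS_PX[index]


if __name__ == "__main__":
    for loudness in (10, 50, 90):
        print(loudness, loudness_bubble_diameter(loudness))
```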

103: Determine the tone bubble: compare the frequency of the sampled sound with preset frequency thresholds, and represent the sampled sound with a second-type circle whose diameter depends on the threshold interval in which the frequency lies. The frequency is the parameter value that characterizes the tone as subjectively perceived by a person;

Specifically, assume that there are two frequency thresholds and that the second-type circles are hollow circles. The frequency f_t of each obtained sampled sound is taken as the input variable of the tone-bubble determination algorithm and compared with the preset frequency thresholds. The algorithm is as follows:

Frequency is divided into three equally spaced intervals. The three intervals, in order from high to low, correspond to high, medium and low frequency, which are depicted with large, medium and small hollow circles respectively. The formula is as follows:

X_{t} = \begin{cases} B_{1}, & F_{\min} < f_{t} < F_{\min} + \delta \\ B_{2}, & F_{\min} + \delta < f_{t} < F_{\min} + 2\delta \\ B_{3}, & F_{\min} + 2\delta < f_{t} < F_{\max} \end{cases}

where δ = (F_max - F_min) / 3, F_max is the maximum frequency, F_min is the minimum frequency, f_t is the frequency of the sampled sound at time t, X_t is the tone bubble selected for the sampled sound at time t, and B₁, B₂ and B₃ denote the large, medium and small hollow circles respectively.

The preset frequency thresholds are F_min + δ and F_min + 2δ.

As shown in Fig. 3, the large, medium and small hollow circles are defined as follows: the large hollow circle has a diameter of 10 pixels, the medium hollow circle a diameter of 7 pixels, and the small hollow circle a diameter of 4 pixels.
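The tone-bubble selection can be sketched the same way, here with a threshold lookup. The text does not give numeric values for F_min and F_max, so they are required parameters, and the 100 Hz to 400 Hz range in the usage lines is purely hypothetical. The sketch assumes the rule stated in the claims (higher frequency, larger circle) and equal interval widths of (F_max - F_min) / 3.

```python
from bisect import bisect_right

# Diameters in pixels of the small, medium and large hollow circles.
HOLLOW_DIAMETERS_PX = (4, 7, 10)


def tone_bubble_diameter(f_t: float, f_min: float, f_max: float) -> int:
    """Return the hollow-circle diameter for a sample of frequency f_t."""
    if f_max <= f_min:
        raise ValueError("f_max must be greater than f_min")
    step = (f_max - f_min) / 3
    thresholds = [f_min + step, f_min + 2 * step]  # the two preset thresholds
    # bisect_right counts how many thresholds are <= f_t: 0, 1 or 2.
    return HOLLOW_DIAMETERS_PX[bisect_right(thresholds, f_t)]


if __name__ == "__main__":
    # Hypothetical frequency range, chosen only for illustration.
    for freq in (150.0, 250.0, 350.0):
        print(freq, tone_bubble_diameter(freq, f_min=100.0, f_max=400.0))
```

Frequencies below F_min or above F_max fall into the lowest or highest interval in this sketch, which is a design choice rather than something the text specifies.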

104: Combine the obtained loudness bubbles and tone bubbles into an animation object;

The bubbles are combined by placing them at random within a two-dimensional planar region. The length and width of this region are both set to the sum of the diameters of the largest loudness bubble and the largest tone bubble, i.e. 20 pixels.
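The random placement of step 104 might look like the sketch below. The `PlacedCircle` type, the function name, and the constraint that each circle lies entirely inside the 20 by 20 pixel region are assumptions of this sketch; the text only states that the bubbles are placed at random in the region, and says nothing about whether they may overlap.

```python
import random
from dataclasses import dataclass
from typing import List, Optional

REGION_PX = 20  # largest loudness bubble (10 px) + largest tone bubble (10 px)


@dataclass
class PlacedCircle:
    x: float        # centre x within the region, in pixels
    y: float        # centre y within the region, in pixels
    diameter: int
    filled: bool    # True for a loudness bubble, False for a tone bubble


def place_bubbles(loudness_diameter: int, tone_diameter: int,
                  rng: Optional[random.Random] = None,
                  region: int = REGION_PX) -> List[PlacedCircle]:
    """Place one loudness bubble and one tone bubble at random positions."""
    if rng is None:
        rng = random.Random()
    placed = []
    for diameter, filled in ((loudness_diameter, True), (tone_diameter, False)):
        radius = diameter / 2
        # Keep the whole circle inside the region (assumption of this sketch).
        x = rng.uniform(radius, region - radius)
        y = rng.uniform(radius, region - radius)
        placed.append(PlacedCircle(x, y, diameter, filled))
    return placed


if __name__ == "__main__":
    for circle in place_bubbles(10, 7, rng=random.Random(0)):
        print(circle)
```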

105: Set the playback speed of the animation object according to the zero-crossing rate of the obtained sampled sound;

Because the width of the path along which the animation object plays is fixed, the fast or slow playback effect can be achieved by setting the playing duration of the animation object. As shown in Fig. 4, the shorter the playing duration of a sampled sound, the denser the voice bubbles within a given range of the screen; conversely, the longer the duration, the sparser the bubbles.

Specifically, to ensure that the animation plays neither too fast to be seen clearly nor so slowly that it impairs function, the playing duration of the animation object is limited to the range [L_min, L_max], and the zero-crossing rate takes values in (0, λ_max), where 0 ≤ λ_max < 1. The playing duration of the animation object is obtained from the following formula:

l_{t} = \frac{L_{\max} - L_{\min}}{\lambda_{\max}} \lambda_{t} + L_{\min}

where l_t is the playing duration of the sampled sound at time t, L_max is the longest recording duration, L_min is the shortest recording duration, λ_t is the short-time average zero-crossing rate of the sampled sound corresponding to time t, and λ_max is the maximum short-time average zero-crossing rate over the frames of the sound signal within the 100 ms interval.

The short-time average zero-crossing rate is the number of times each frame of the signal crosses the zero axis. It is related to frequency and can reflect how fast the speech is. The faster the speech rate of the sound, the faster the animation plays; conversely, the slower the speech rate, the slower the animation plays.
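The two quantities in step 105 can be sketched as below. Normalizing the crossing count by the number of adjacent sample pairs (so that the rate lies between 0 and 1) is an assumption made to match the stated range of λ; the function names and the duration limits of 0.5 s and 2.0 s in the usage lines are hypothetical. The duration function evaluates the formula exactly as printed and requires λ_max > 0 to avoid division by zero.

```python
from typing import Sequence


def zero_crossing_rate(frame: Sequence[float]) -> float:
    """Fraction of adjacent sample pairs whose signs differ."""
    if len(frame) < 2:
        return 0.0
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))
    return crossings / (len(frame) - 1)


def playing_duration(lam_t: float, lam_max: float,
                     l_min: float, l_max: float) -> float:
    """l_t = (L_max - L_min) / lambda_max * lambda_t + L_min."""
    if not 0 < lam_max < 1:
        raise ValueError("lam_max must lie in (0, 1)")
    lam = min(max(lam_t, 0.0), lam_max)  # assumption: clamp to [0, lam_max]
    return (l_max - l_min) / lam_max * lam + l_min


if __name__ == "__main__":
    frame = [1.0, -1.0, 1.0, 1.0, -1.0]
    lam = zero_crossing_rate(frame)
    # Hypothetical limits: durations between 0.5 s and 2.0 s.
    print(lam, playing_duration(lam, lam_max=0.9, l_min=0.5, l_max=2.0))
```

As printed, the formula is increasing in λ_t: the result runs from L_min at λ_t = 0 to L_max at λ_t = λ_max.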

106: Determine the motion curve of the animation object according to the loudness, tone and speech-rate information of the obtained sampled sound;

The trajectory of the played animation is set to a sine curve whose amplitude is determined jointly by the loudness, tone and speech rate of the sampled sound. Specifically, the loudness, frequency and speech-rate information of the sampled sound are combined with different weights into the amplitude of the corresponding sampled sound on the sine curve. The formula is as follows:

A_{\partial} = \frac{\sum_{i=1}^{3} \partial_{i} T_{i}}{\sum_{i=1}^{3} T_{i}}

where ∂_i is the influence coefficient, which can be set dynamically according to time or application and takes values in (0, 1); T_i is the share of influence that loudness, tone and speech rate each have on the curve amplitude, which can be set to a fixed value for each application, with a value range that depends on A_max, where A_max is the height of the space in which the bubble animation is played in the application.

As shown in Fig. 5 or Fig. 6, the amount of rise and fall between the circles when the animation object is displayed depends on the amplitude of the corresponding sampled sound.
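The amplitude formula of step 106 is a weighted average of the three influence coefficients, weighted by the shares T_i. A minimal sketch follows; the function names, the ordering of the three entries (loudness, tone, speech rate) and the `phase` parameter of the sine curve are assumptions of this sketch, since the text does not say how position along the path maps to phase.

```python
import math
from typing import Sequence


def curve_amplitude(coefficients: Sequence[float],
                    shares: Sequence[float]) -> float:
    """Weighted combination: sum(coef_i * T_i) / sum(T_i) for i = 1..3."""
    if len(coefficients) != 3 or len(shares) != 3:
        raise ValueError("expected three coefficients and three shares")
    total = sum(shares)
    if total == 0:
        raise ValueError("shares must not sum to zero")
    return sum(c * t for c, t in zip(coefficients, shares)) / total


def vertical_offset(amplitude: float, phase: float) -> float:
    """Offset of a bubble on a sine trajectory at the given phase (radians)."""
    return amplitude * math.sin(phase)


if __name__ == "__main__":
    # Hypothetical coefficients and shares for loudness, tone and speech rate.
    amp = curve_amplitude((0.2, 0.4, 0.6), (1.0, 1.0, 2.0))
    print(amp, vertical_offset(amp, math.pi / 2))
```

Because the result is a weighted average, it always lies between the smallest and the largest of the three coefficients.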

107: Play the configured bubble animation.

An apparatus for displaying voice on an intelligent terminal, comprising:

801: A sampling module, configured to sample the received audio data stream of the session at a fixed time interval to obtain sampled sounds;

802: A speech analysis module, configured to analyze the obtained sampled sounds and obtain the loudness, tone and speech-rate information of each sampled sound;

803: An animation object determination module, configured to determine the loudness bubble and tone bubble of each sampled sound and to combine the loudness bubbles and tone bubbles of all sampled sounds into an animation object;

804: An animation object setting module, configured to determine the playback speed of the animation object according to the speech rate of the obtained sampled sound, and to determine the motion curve of the animation object according to the loudness, tone and speech rate of the obtained sampled sound;

Specifically, this module includes a playing-duration calculation unit and a motion-curve amplitude calculation unit. The playing-duration calculation unit calculates the time for which the bubble corresponding to each sampled sound is played, and the motion-curve amplitude calculation unit calculates the amplitude of the bubble corresponding to each sampled sound.

805: An animation playing module, configured to play the configured animation object on the display screen.

The method and apparatus for displaying voice on an intelligent terminal according to the embodiments of the present invention display the tone, loudness and speech-rate information of a sound on the mobile terminal screen in the form of different bubbles according to certain rules. This produces a dynamic and engaging voice-bubble recognition process, so that the whole voice interaction is no longer dull, and gives emotional expression to the voice information input by the user.

The method and apparatus for displaying voice on an intelligent terminal provided by the embodiments of the present invention have been described in detail above. The description of the above embodiments is only intended to help in understanding the method of the present invention and its core idea, and is not intended to limit the present invention. For those of ordinary skill in the art, any modification, equivalent replacement or improvement made within the spirit and principles of the present invention falls within the scope of protection of the claims of the present invention.

Claims

1. A method for displaying voice on an intelligent terminal, characterized by comprising:

Receiving an audio data stream from a participant in the session, sampling and analyzing it at fixed time intervals, and obtaining the loudness, tone and speech-rate information of each sampled sound;

Determining a loudness bubble: comparing the loudness of the sampled sound with preset loudness thresholds, and representing the sampled sound with a first-type circle whose diameter depends on the threshold interval in which the loudness lies, wherein the greater the loudness, the larger the diameter of the corresponding first-type circle;

Determining a tone bubble: comparing the frequency of the sampled sound with preset frequency thresholds, and representing the sampled sound with a second-type circle whose diameter depends on the threshold interval in which the frequency lies, wherein the higher the frequency, the larger the diameter of the corresponding second-type circle;

Determining the playback speed and motion curve: determining the playback speed of the animation object according to the obtained speech-rate information of the sampled sound, and determining the motion curve of the animation object according to the obtained loudness, tone and speech-rate information;

Playing the configured bubble animation on the display screen.

2. The method according to claim 1, characterized in that the speech-rate information is the short-time average zero-crossing rate of the sound, i.e. the number of times each frame of the signal passes through zero.

3. The method according to claim 1, characterized in that the first-type circle and the second-type circle may be a filled circle and a hollow circle respectively.

4. The method according to claim 1, characterized in that the number of loudness thresholds may be a preset value determined by the number of different diameters of the first-type circles, and the number of frequency thresholds may be a preset value determined by the number of different diameters of the second-type circles.

5. The method according to claim 1, characterized in that the obtained loudness bubbles and tone bubbles are combined into the animation object by placing the bubbles at random within a two-dimensional planar region, the length and width of which are both set to the sum of the diameters of the largest loudness bubble and the largest tone bubble.

6. The method according to claim 1, characterized in that the playback speed of the animation object is determined from the speech-rate information of the obtained sampled sound by the following formula:

l_{t} = \frac{L_{\max} - L_{\min}}{\lambda_{\max}} \lambda_{t} + L_{\min}

where l_t is the playing duration of the sampled sound at time t, L_max is the longest recording duration, L_min is the shortest recording duration, λ_t is the short-time average zero-crossing rate of the sampled sound corresponding to time t, and λ_max is the maximum short-time average zero-crossing rate over the frames of sound-wave data within the 100 ms interval.

7. The method according to claim 1, characterized in that the motion curve of the animation object is determined from the obtained loudness, tone and speech-rate information by the following formula:

A_{\partial} = \frac{\sum_{i=1}^{3} \partial_{i} T_{i}}{\sum_{i=1}^{3} T_{i}}

where ∂_i is the influence coefficient, which can be set dynamically according to time or application and takes values in (0, 1); T_i is the share of influence that loudness, tone and speech rate each have on the curve amplitude, which can be set to a fixed value for each application, with a value range that depends on A_max, where A_max is the height of the space in which the bubble animation is played in the application.

8. An apparatus for displaying voice on an intelligent terminal, characterized by comprising:

A sampling module, configured to sample the received audio data stream of the session at fixed time intervals to obtain sampled sounds;

An animation object determination module, configured to determine the loudness bubble and tone bubble of each sampled sound and to combine the loudness bubbles and tone bubbles of the sampled sounds into an animation object;

An animation object setting module, configured to determine the playback speed of the animation object according to the speech rate of the obtained sampled sound, and to determine the motion curve of the animation object according to the loudness, tone and speech-rate information of the obtained sampled sound;