CN109587603A - Volume control method, device and storage medium - Google Patents

Volume control method, device and storage medium

Info

Publication number
CN109587603A
CN109587603A (application CN201811506570.2A; granted as CN109587603B)
Authority
CN
China
Prior art keywords
sound signal
activity level
value
target sound
energy matrix
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201811506570.2A
Other languages
Chinese (zh)
Other versions
CN109587603B (en)
Inventor
张晨
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Dajia Internet Information Technology Co Ltd
Original Assignee
Beijing Dajia Internet Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Dajia Internet Information Technology Co Ltd
Priority to CN201811506570.2A
Publication of CN109587603A
Application granted
Publication of CN109587603B
Legal status: Active

Classifications

    • H - ELECTRICITY
    • H04 - ELECTRIC COMMUNICATION TECHNIQUE
    • H04R - LOUDSPEAKERS, MICROPHONES, GRAMOPHONE PICK-UPS OR LIKE ACOUSTIC ELECTROMECHANICAL TRANSDUCERS; DEAF-AID SETS; PUBLIC ADDRESS SYSTEMS
    • H04R3/00 - Circuits for transducers, loudspeakers or microphones
    • H04R3/12 - Circuits for transducers, loudspeakers or microphones for distributing signals to two or more loudspeakers
    • H - ELECTRICITY
    • H04 - ELECTRIC COMMUNICATION TECHNIQUE
    • H04R - LOUDSPEAKERS, MICROPHONES, GRAMOPHONE PICK-UPS OR LIKE ACOUSTIC ELECTROMECHANICAL TRANSDUCERS; DEAF-AID SETS; PUBLIC ADDRESS SYSTEMS
    • H04R2430/00 - Signal processing covered by H04R, not provided for in its groups

Landscapes

  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Otolaryngology (AREA)
  • Physics & Mathematics (AREA)
  • Engineering & Computer Science (AREA)
  • Acoustics & Sound (AREA)
  • Signal Processing (AREA)
  • Circuit For Audible Band Transducer (AREA)

Abstract

The present disclosure relates to a volume control method, a volume control device, and a storage medium, and belongs to the field of signal processing. The method includes: acquiring a sound signal; obtaining an energy matrix of the sound signal; determining an activity level of a target sound in the sound signal based on the energy matrix of the sound signal; determining an actual gain of the sound signal based on the activity level of the target sound; and adjusting the sound signal according to the actual gain, so as to control the volume of the sound signal. Because the activity level of the target sound in the sound signal is determined from the energy matrix of the sound signal, and the actual gain of the sound signal is determined from that activity level, amplification of the background noise in the environment is avoided, the volume control better matches human auditory characteristics, and the volume control effect is improved.

Description

Volume control method, device and storage medium
Technical field
The present disclosure relates to the field of signal processing, and in particular to a volume control method, device, and storage medium.
Background
With the rise of the Internet, more and more social media rely on the Internet, and online live streaming is one of them. Live streaming inherits the advantages of the Internet and broadcasts over the network in the form of audio and video. Because live streaming happens in real time and broadcasting environments vary widely, the loudness of the host's own voice differs and the distance to the microphone also varies, so the loudness of a live stream can differ greatly. To keep the listening experience of a live stream consistent and avoid sudden jumps or drops in loudness, automatic volume (loudness) control is needed.
In the related art, the gain value is adjusted automatically according to the amplitude of the input signal and a target amplitude, so that the amplitude of the output signal approaches the target amplitude.
However, this approach is based mainly on signal amplitude, so it may also amplify the background noise in the environment, which leads to a poor control effect.
Summary of the invention
The present disclosure provides a volume control method, device, and storage medium, which can overcome the problems in the related art.
According to a first aspect of the embodiments of the present disclosure, a volume control method is provided, including:
acquiring a sound signal;
obtaining an energy matrix of the sound signal;
determining an activity level of a target sound in the sound signal based on the energy matrix of the sound signal;
determining an actual gain of the sound signal based on the activity level of the target sound;
adjusting the sound signal according to the actual gain, so as to control the volume of the sound signal.
In a possible implementation, obtaining the energy matrix of the sound signal includes:
converting the sound signal into a frequency-domain signal by a fast Fourier transform (FFT);
obtaining a frequency-domain energy signal of the frequency-domain signal;
combining the frequency-domain energy signal with the frequency-domain energy signals of a reference number of preceding frames to obtain the energy matrix of the sound signal.
In a possible implementation, determining the activity level of the target sound in the sound signal based on the energy matrix of the sound signal includes:
obtaining, based on the energy matrix of the sound signal, features of an equivalent image of the energy matrix, the features including at least one of grayscale richness and texture complexity;
determining the activity level of the target sound in the sound signal based on the features of the equivalent image of the energy matrix.
In a possible implementation, obtaining the features of the equivalent image of the energy matrix based on the energy matrix of the sound signal includes:
obtaining a variance of the energy matrix, and obtaining a mean of the energy matrix;
obtaining the grayscale richness of the equivalent image of the energy matrix according to the variance and the mean of the energy matrix.
In a possible implementation, obtaining the features of the equivalent image of the energy matrix based on the energy matrix of the sound signal includes:
dividing the equivalent image of the energy matrix into a plurality of sub-blocks, and performing intra prediction in different directions on each sub-block;
for any sub-block and any direction, obtaining the absolute errors between the predicted values in that direction and the actual pixel values of each row of the sub-block, and averaging those absolute errors, the resulting mean absolute error being the block error of the sub-block in that direction;
averaging the block errors of the sub-block over all directions, the resulting mean block error being the distortion value of the sub-block;
averaging the distortion values of all sub-blocks, the resulting mean distortion value being the texture complexity of the equivalent image of the energy matrix.
In a possible implementation, determining the activity level of the target sound in the sound signal based on the features of the equivalent image of the energy matrix includes:
determining an initial activity level of the target sound in the sound signal based on the features of the equivalent image of the energy matrix;
determining the activity level of the target sound in the sound signal according to the relationship between the initial activity level and activity thresholds.
In a possible implementation, determining the initial activity level of the target sound in the sound signal based on the features of the equivalent image of the energy matrix includes:
computing a weighted sum of the grayscale richness and the texture complexity of the equivalent image of the energy matrix, and taking the weighted sum as the initial activity level of the target sound in the sound signal.
In a possible implementation, determining the activity level of the target sound in the sound signal according to the relationship between the initial activity level and the activity thresholds includes:
if the initial activity level is greater than a first activity threshold, the activity level of the target sound is a first reference value;
if the initial activity level is less than a second activity threshold, the activity level of the target sound is a second reference value;
if the initial activity level is greater than the second activity threshold and less than the first activity threshold, obtaining a first difference between the initial activity level and the second activity threshold, obtaining a second difference between the first activity threshold and the second activity threshold, and taking the quotient of the first difference and the second difference as the activity level of the target sound; where the first activity threshold is greater than the second activity threshold, and the first reference value is greater than the second reference value.
In a possible implementation, determining the actual gain of the sound signal based on the activity level of the target sound includes:
obtaining a loudness value of the sound signal;
determining a loudness value to be adjusted based on the loudness value of the sound signal and a target loudness value;
obtaining a target gain according to the loudness value to be adjusted;
determining a gain change step of the sound signal according to the activity level of the target sound;
determining the actual gain of the sound signal according to the gain change step of the sound signal, based on the relationship between the target gain and the actual gain of the previous frame of the sound signal.
In a possible implementation, after the sound signal is adjusted according to the actual gain, the method further includes:
performing limiting processing on the adjusted sound signal.
According to a second aspect of the embodiments of the present disclosure, a volume control device is provided, including:
a first acquisition unit configured to acquire a sound signal;
a second acquisition unit configured to obtain an energy matrix of the sound signal;
a first determination unit configured to determine an activity level of a target sound in the sound signal based on the energy matrix of the sound signal;
a second determination unit configured to determine an actual gain of the sound signal based on the activity level of the target sound;
a control unit configured to adjust the sound signal according to the actual gain, so as to control the volume of the sound signal.
In a possible implementation, the second acquisition unit is configured to convert the sound signal into a frequency-domain signal by a fast Fourier transform (FFT); obtain a frequency-domain energy signal of the frequency-domain signal; and combine the frequency-domain energy signal with the frequency-domain energy signals of a reference number of preceding frames to obtain the energy matrix of the sound signal.
In a possible implementation, the first determination unit includes:
an obtaining sub-unit configured to obtain, based on the energy matrix of the sound signal, features of an equivalent image of the energy matrix, the features including at least one of grayscale richness and texture complexity;
a determining sub-unit configured to determine the activity level of the target sound in the sound signal based on the features of the equivalent image of the energy matrix.
In a possible implementation, the obtaining sub-unit is configured to obtain a variance of the energy matrix and a mean of the energy matrix, and obtain the grayscale richness of the equivalent image of the energy matrix according to the variance and the mean.
In a possible implementation, the obtaining sub-unit is configured to divide the equivalent image of the energy matrix into a plurality of sub-blocks and perform intra prediction in different directions on each sub-block; for any sub-block and any direction, obtain the absolute errors between the predicted values in that direction and the actual pixel values of each row of the sub-block, and average those absolute errors to obtain the block error of the sub-block in that direction; average the block errors of the sub-block over all directions to obtain the distortion value of the sub-block; and average the distortion values of all sub-blocks, taking the resulting mean distortion value as the texture complexity of the equivalent image of the energy matrix.
In a possible implementation, the determining sub-unit includes:
a first determining module configured to determine an initial activity level of the target sound in the sound signal based on the features of the equivalent image of the energy matrix;
a second determining module configured to determine the activity level of the target sound in the sound signal according to the relationship between the initial activity level and activity thresholds.
In a possible implementation, the first determining module is configured to compute a weighted sum of the grayscale richness and the texture complexity of the equivalent image of the energy matrix and take the weighted sum as the initial activity level of the target sound in the sound signal.
In a possible implementation, the second determining module is configured to: if the initial activity level is greater than a first activity threshold, set the activity level of the target sound to a first reference value; if the initial activity level is less than a second activity threshold, set the activity level of the target sound to a second reference value; if the initial activity level is greater than the second activity threshold and less than the first activity threshold, obtain a first difference between the initial activity level and the second activity threshold, obtain a second difference between the first activity threshold and the second activity threshold, and take the quotient of the first difference and the second difference as the activity level of the target sound; where the first activity threshold is greater than the second activity threshold, and the first reference value is greater than the second reference value.
In a possible implementation, the second determination unit is configured to obtain a loudness value of the sound signal; determine a loudness value to be adjusted based on the loudness value of the sound signal and a target loudness value; obtain a target gain according to the loudness value to be adjusted; determine a gain change step of the sound signal according to the activity level of the target sound; and determine the actual gain of the sound signal according to the gain change step, based on the relationship between the target gain and the actual gain of the previous frame of the sound signal.
In a possible implementation, the device further includes:
a limiting unit configured to perform limiting processing on the adjusted sound signal.
According to a third aspect of the embodiments of the present disclosure, a non-transitory computer-readable storage medium is provided. When instructions in the storage medium are executed by a processor of a terminal, the terminal is enabled to perform a volume control method, the method including:
acquiring a sound signal;
obtaining an energy matrix of the sound signal;
determining an activity level of a target sound in the sound signal based on the energy matrix of the sound signal;
determining an actual gain of the sound signal based on the activity level of the target sound;
adjusting the sound signal according to the actual gain, so as to control the volume of the sound signal.
According to a fourth aspect of the embodiments of the present disclosure, an application program product is provided. When instructions in the application program product are executed by a processor of a terminal, the terminal is enabled to perform the following volume control method:
acquiring a sound signal;
obtaining an energy matrix of the sound signal;
determining an activity level of a target sound in the sound signal based on the energy matrix of the sound signal;
determining an actual gain of the sound signal based on the activity level of the target sound;
adjusting the sound signal according to the actual gain, so as to control the volume of the sound signal.
The technical solutions provided by the embodiments of the present disclosure include at least the following beneficial effects:
The activity level of the target sound in the sound signal is determined from the energy matrix of the sound signal, and the actual gain of the sound signal is determined from the activity level of the target sound. This prevents the background noise in the environment from being amplified, so that the volume control better matches human auditory characteristics, and the volume control effect can be improved.
It should be understood that the above general description and the following detailed description are merely exemplary and explanatory and do not limit the present disclosure.
Brief description of the drawings
The accompanying drawings herein are incorporated into and constitute a part of this specification, illustrate embodiments consistent with the present invention, and together with the specification serve to explain the principles of the present invention.
Fig. 1 is a flowchart of a volume control method according to an exemplary embodiment.
Fig. 2 is a flowchart of a volume control method according to an exemplary embodiment.
Fig. 3 is a schematic diagram of equal-loudness contours according to an exemplary embodiment.
Fig. 4 is an overall flowchart of a volume control method according to an exemplary embodiment.
Fig. 5 is a block diagram of a volume control device according to an exemplary embodiment.
Fig. 6 is a block diagram of a first determination unit according to an exemplary embodiment.
Fig. 7 is a block diagram of a determining sub-unit according to an exemplary embodiment.
Fig. 8 is a block diagram of a volume control device according to an exemplary embodiment.
Fig. 9 is a block diagram of a device according to an exemplary embodiment.
Detailed description of embodiments
Exemplary embodiments are described in detail here, and examples of them are illustrated in the accompanying drawings. Where the following description refers to the drawings, unless otherwise indicated, the same numerals in different drawings denote the same or similar elements. The implementations described in the following exemplary embodiments do not represent all embodiments consistent with the present invention; rather, they are merely examples of devices and methods consistent with some aspects of the invention as detailed in the appended claims.
With the rise of the Internet, more and more social media rely on the Internet, and online live streaming is one of them. Online live streaming is an emerging form of online social interaction, and live-streaming platforms have become a brand-new social medium. Live streaming inherits the advantages of the Internet and broadcasts over the network in the form of audio and video. Content such as product launches, related meetings, background introductions, solution evaluations, online surveys, interviews, and online training can be published on the Internet, and the promotional effect of an event can be strengthened by making use of the intuitiveness and speed of the Internet, its good form of expression, rich content, strong interactivity, freedom from geographical restrictions, and segmentable audience. After a live broadcast is finished, replay and on-demand playback can continue to be provided at any time, which effectively extends the time and space of the broadcast and maximizes the value of the live content.
However, because broadcasting environments vary widely, the loudness of the host's own voice differs and the distance to the microphone also varies; therefore, the loudness of a live stream can differ greatly. To keep the listening experience of a live stream consistent and avoid sudden jumps or drops in loudness, automatic volume (loudness) control is needed. To this end, embodiments of the present disclosure provide a volume control method.
Fig. 1 is a flowchart of a volume control method according to an exemplary embodiment. As shown in Fig. 1, the method is used in a terminal and includes the following steps.
In step S11, a sound signal is acquired.
In step S12, an energy matrix of the sound signal is obtained.
In step S13, the activity level of the target sound in the sound signal is determined based on the energy matrix of the sound signal.
In step S14, the actual gain of the sound signal is determined based on the activity level of the target sound.
In step S15, the sound signal is adjusted according to the actual gain, so as to control the volume of the sound signal.
In the method provided by the embodiments of the present disclosure, the activity level of the target sound in the sound signal is determined from the energy matrix of the sound signal, and the actual gain of the sound signal is determined from the activity level of the target sound. This prevents the background noise in the environment from being amplified, so that the volume control better matches human auditory characteristics, and the volume control effect can be improved.
In a possible implementation, obtaining the energy matrix of the sound signal includes:
converting the sound signal into a frequency-domain signal by a fast Fourier transform (FFT);
obtaining a frequency-domain energy signal of the frequency-domain signal;
combining the frequency-domain energy signal with the frequency-domain energy signals of a reference number of preceding frames to obtain the energy matrix of the sound signal.
In a possible implementation, determining the activity level of the target sound in the sound signal based on the energy matrix of the sound signal includes:
obtaining, based on the energy matrix of the sound signal, features of an equivalent image of the energy matrix, the features including at least one of grayscale richness and texture complexity;
determining the activity level of the target sound in the sound signal based on the features of the equivalent image of the energy matrix.
In a possible implementation, obtaining the features of the equivalent image of the energy matrix based on the energy matrix of the sound signal includes:
obtaining a variance of the energy matrix, and obtaining a mean of the energy matrix;
obtaining the grayscale richness of the equivalent image of the energy matrix according to the variance and the mean of the energy matrix.
In a possible implementation, obtaining the features of the equivalent image of the energy matrix based on the energy matrix of the sound signal includes:
dividing the equivalent image of the energy matrix into a plurality of sub-blocks, and performing intra prediction in different directions on each sub-block;
for any sub-block and any direction, obtaining the absolute errors between the predicted values in that direction and the actual pixel values of each row of the sub-block, and averaging those absolute errors to obtain the block error of the sub-block in that direction;
averaging the block errors of the sub-block over all directions to obtain the distortion value of the sub-block;
averaging the distortion values of all sub-blocks, and taking the resulting mean distortion value as the texture complexity of the equivalent image of the energy matrix.
In a possible implementation, determining the activity level of the target sound in the sound signal based on the features of the equivalent image of the energy matrix includes:
determining an initial activity level of the target sound in the sound signal based on the features of the equivalent image of the energy matrix;
determining the activity level of the target sound in the sound signal according to the relationship between the initial activity level and activity thresholds.
In a possible implementation, determining the initial activity level of the target sound in the sound signal based on the features of the equivalent image of the energy matrix includes:
computing a weighted sum of the grayscale richness and the texture complexity of the equivalent image of the energy matrix, and taking the weighted sum as the initial activity level of the target sound in the sound signal.
In a possible implementation, determining the activity level of the target sound in the sound signal according to the relationship between the initial activity level and the activity thresholds includes:
if the initial activity level is greater than a first activity threshold, the activity level of the target sound is a first reference value;
if the initial activity level is less than a second activity threshold, the activity level of the target sound is a second reference value;
if the initial activity level is greater than the second activity threshold and less than the first activity threshold, obtaining a first difference between the initial activity level and the second activity threshold, obtaining a second difference between the first activity threshold and the second activity threshold, and taking the quotient of the first difference and the second difference as the activity level of the target sound; where the first activity threshold is greater than the second activity threshold, and the first reference value is greater than the second reference value.
In a possible implementation, determining the actual gain of the sound signal based on the activity level of the target sound includes:
obtaining a loudness value of the sound signal;
determining a loudness value to be adjusted based on the loudness value of the sound signal and a target loudness value;
obtaining a target gain according to the loudness value to be adjusted;
determining a gain change step of the sound signal according to the activity level of the target sound;
determining the actual gain of the sound signal according to the gain change step of the sound signal, based on the relationship between the target gain and the actual gain of the previous frame of the sound signal.
In a possible implementation, after the sound signal is adjusted according to the actual gain, the method further includes:
performing limiting processing on the adjusted sound signal.
All of the above optional technical solutions can be combined in any way to form optional embodiments of the present disclosure, which will not be described here one by one.
Fig. 2 is a flowchart of a volume control method according to an exemplary embodiment. The method can be applied to an online live-streaming scenario, and of course it can also be applied to other scenarios that involve volume adjustment or control. As shown in Fig. 2, the method is used in a terminal and includes the following steps.
In step S21, a sound signal is acquired.
Taking an online live-streaming scenario as an example, the terminal is provided with a microphone, and the sound signal in the current environment can be collected by the microphone, so that the sound signal is acquired. Besides the sound signal collected by the microphone in real time, a previously collected sound signal can also be acquired; for example, the terminal obtains an audio clip from the network and thereby acquires the sound signal, or the terminal receives an audio clip uploaded by a user and thereby acquires the sound signal.
In short, the method provided by the embodiments of the present disclosure can be applied to an online live-streaming scenario, but is not limited to it; a sound signal acquired in any way can have its volume controlled with the method provided by the embodiments of the present disclosure. It should also be understood that, besides the target sound, the acquired sound signal may contain some noise from the environment. For example, in an online live-streaming scenario the host's voice is the target sound, and the acquired sound signal includes, in addition to the host's voice, some noise from the current broadcasting environment.
In step S22, the energy matrix of the sound signal is obtained.
When a sound wave propagates in a medium, it makes the medium particles oscillate back and forth around their equilibrium positions, producing kinetic energy; it also makes the medium undergo alternating compression and expansion, giving the medium the potential energy of deformation. The sum of these two parts of energy is the acoustic energy that the medium obtains from the acoustic vibration. Based on the acoustic energy, the energy of a sound signal can be represented by an energy matrix.
Optionally, obtaining the energy matrix of the sound signal includes: converting the sound signal into a frequency-domain signal by FFT; obtaining the frequency-domain energy signal of the frequency-domain signal; and combining the frequency-domain energy signal with the frequency-domain energy signals of a reference number of preceding frames to obtain the energy matrix of the sound signal.
For example, let the sound signal acquired at the current time t be s(t). The sound signal s(t) is converted by FFT into a frequency-domain signal S0(k, t) = FFT(s(t)), and the energy signal of the frequency-domain signal S0(k, t) is E(k, t) = 10*log10(S0(k, t)*S0(k, t)), where k = 1~M and M denotes the number of spectrum bands.
Taking the reference number as N, the frequency-domain energy signal is combined with the preceding N frames of frequency-domain energy signals, and the resulting energy matrix of the sound signal is E = [E(t-N+1), ..., E(t)], where E(t) = [E(1, t), ..., E(M, t)]'; E(1, t), ..., E(M, t) represents a row, and E(t) = [E(1, t), ..., E(M, t)]' is the transpose of that row, thus representing a column of the matrix.
It should be understood that the reference number N can be determined according to the volume-control scenario, or it can be set by the user. By combining the frequency-domain energy signal of the sound signal at time t with the frequency-domain energy signals of the preceding reference number of frames to obtain the energy matrix of the sound signal, energy information of the sound signal along the time dimension is added, so that volume control based on this energy matrix is more accurate. It should also be understood that the preceding N frames of frequency-domain energy signals are obtained in the same way as the frequency-domain energy signal of the sound signal at time t, which is not repeated here.
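As an illustration of step S22, the following Python sketch (using NumPy) builds the energy matrix E from successive frames. The frame length, the number of bands M, the history length N, the analysis window, and the uniform banding of FFT bins are all illustrative assumptions; the patent does not fix these choices.

    import numpy as np

    # Assumed parameters; the patent leaves them to the implementation.
    FRAME_LEN = 1024      # samples per frame (assumption)
    M = 64                # number of spectrum bands (assumption)
    N = 32                # reference number of history frames (assumption)

    def frame_log_energy(frame):
        """E(k, t) = 10*log10(|S0(k, t)|^2) over M bands, as in the formulas above."""
        spectrum = np.fft.rfft(frame * np.hanning(len(frame)))
        power = np.abs(spectrum) ** 2
        # Group FFT bins into M bands (simple uniform banding, an assumption).
        bands = np.array_split(power, M)
        band_power = np.array([b.sum() for b in bands])
        return 10.0 * np.log10(band_power + 1e-12)   # small floor avoids log(0)

    class EnergyMatrix:
        """Keeps the log-energy vectors of the most recent N frames as columns of E."""
        def __init__(self, n_frames=N, n_bands=M):
            self.history = np.zeros((n_bands, n_frames))

        def push(self, frame):
            e_t = frame_log_energy(frame)
            self.history = np.roll(self.history, -1, axis=1)
            self.history[:, -1] = e_t          # E = [E(t-N+1), ..., E(t)]
            return self.history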
In step S23, the activity level of the target sound in the sound signal is determined based on the energy matrix of the sound signal.
Since the energy matrix of the sound signal reflects the energy of the different sounds in the signal, for the target sound in the sound signal (that is, the sound other than noise), the embodiments of the present disclosure determine the activity level of the target sound in the sound signal, and this activity level reflects how active the target sound is. The target sound can thus be distinguished from the noise in the environment, so that in scenarios such as online live streaming the loudness of the host's voice does not differ too much.
In a possible implementation, the two-dimensional N*M energy matrix E obtained in the above step can be treated as an image, that is, the equivalent image of the energy matrix, with each element being a pixel. Determining the activity level of the target sound in the sound signal based on the energy matrix of the sound signal then includes: obtaining, based on the energy matrix of the sound signal, features of the equivalent image of the energy matrix, the features including at least one of grayscale richness and texture complexity; and determining the activity level of the target sound in the sound signal based on the features of the equivalent image of the energy matrix.
The grayscale richness reflects the overall characteristics of the equivalent image and can be called a global feature. The energy of the target sound in a sound signal generally fluctuates strongly, while the energy of the noise generally fluctuates little, so the target sound and the noise can be distinguished by the grayscale richness. The texture complexity, like the grayscale richness, is a feature of the equivalent image; unlike the grayscale richness, it reflects the local characteristics of the equivalent image. The texture of the target sound in a sound signal is generally distinct, while the texture of the noise is generally indistinct, so the target sound and the noise can be further distinguished by the texture complexity. For this reason, the embodiments of the present disclosure determine the activity level of the target sound in the sound signal by at least one of the grayscale richness and the texture complexity, which better matches human auditory characteristics. The grayscale richness and the texture complexity are determined as follows.
Determining the grayscale richness: taking the case where the features of the equivalent image include the grayscale richness as an example, obtaining the features of the equivalent image of the energy matrix based on the energy matrix of the sound signal includes: obtaining the variance of the energy matrix and obtaining the mean of the energy matrix; and obtaining the grayscale richness of the equivalent image of the energy matrix according to the variance and the mean of the energy matrix.
Optionally, obtaining the grayscale richness of the equivalent image of the energy matrix according to the variance and the mean of the energy matrix includes, but is not limited to, computing a weighted sum of the variance and the mean of the energy matrix and taking the resulting sum as the grayscale richness of the equivalent image. For example, the grayscale richness of the equivalent image is Brightness:
Brightness = sigma_E + a*mu_E
where sigma_E is the variance of E, mu_E is the mean of E, and a is a weighting factor with a value between 0 and 1. The size of the weighting factor can be determined empirically, or the terminal can provide a setting entry through which the user performs the setting. For example, in the embodiments of the present disclosure the weighting factor a can take the value 0.3, in which case the grayscale richness of the equivalent image is Brightness = sigma_E + 0.3*mu_E.
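Continuing the NumPy sketch above, the grayscale richness is a single statistic of E; the default a = 0.3 follows the example value given in the text.

    def grayscale_richness(E, a=0.3):
        """Brightness = sigma_E + a*mu_E, with sigma_E the variance of E and mu_E its mean."""
        return np.var(E) + a * np.mean(E)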
Determining the texture complexity: taking the case where the features of the equivalent image include the texture complexity as an example, obtaining the features of the equivalent image of the energy matrix based on the energy matrix of the sound signal includes: dividing the equivalent image of the energy matrix into a plurality of sub-blocks and performing intra prediction in different directions on each sub-block; for any sub-block and any direction, obtaining the absolute errors between the predicted values in that direction and the actual pixel values of each row of the sub-block, and averaging those absolute errors to obtain the block error of the sub-block in that direction; averaging the block errors of the sub-block over all directions to obtain the distortion value of the sub-block; and averaging the distortion values of all sub-blocks, taking the resulting mean distortion value as the texture complexity of the equivalent image of the energy matrix.
For example, the equivalent image E of the energy matrix is divided into 8x8 sub-blocks, and intra prediction in the vertical and horizontal directions is performed within each sub-block. For sub-block (m, n), the predicted pixel values in the vertical direction are taken from the row of pixels directly above the sub-block. The absolute errors between the actual pixel values and the predicted values are then computed, that is, the absolute errors between the pixel values of the row above sub-block (m, n) and the pixel values of each row of the sub-block. The computed absolute errors are averaged, and the resulting mean absolute error is taken as the block error of sub-block (m, n) in the vertical direction.
In the same way as the block error in the vertical direction is obtained, the block error of sub-block (m, n) in the horizontal direction is obtained. The block errors in the vertical and horizontal directions are then averaged, and the resulting mean block error is taken as the distortion value (best distortion) of sub-block (m, n). Further, the distortion value of each sub-block is obtained in the same way as the distortion value of sub-block (m, n), the distortion values of all sub-blocks are averaged, and the resulting mean distortion value is taken as the texture complexity of the equivalent image of the energy matrix. That is, the mean distortion over all sub-blocks measures the texture complexity of the whole equivalent image.
For example, the distortion value (best distortion) of sub-block (m, n) is Texture(m, n):
Texture(m, n) = (Texture_Vertical(m, n) + Texture_Horizontal(m, n)) / 2
where Texture_Vertical(m, n) denotes the vertical texture (block error) of sub-block (m, n), and Texture_Horizontal(m, n) denotes the horizontal texture (block error) of sub-block (m, n).
The distortion values of all sub-blocks are averaged, and the resulting mean distortion value can be expressed as Texture:
Texture = average(Texture(m, n))
where m = 1~M/8 and n = 1~N/8. Texture is used as the texture complexity of the equivalent image of the energy matrix.
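A sketch of the texture complexity computation follows, continuing the NumPy example. The 8x8 sub-blocks and the vertical/horizontal predictors follow the description above, while the treatment of edge blocks (blocks without a row above or a column to the left are simply skipped) is an assumption the text does not specify.

    def texture_complexity(E, block=8):
        """Mean 'best distortion' over sub-blocks: for each block, average the vertical and
        horizontal intra-prediction errors, then average over all blocks."""
        rows, cols = E.shape
        distortions = []
        for r0 in range(block, rows - rows % block, block):
            for c0 in range(block, cols - cols % block, block):
                blk = E[r0:r0 + block, c0:c0 + block]
                # Vertical prediction: every row predicted by the row just above the block.
                pred_v = E[r0 - 1, c0:c0 + block]
                err_v = np.mean(np.abs(blk - pred_v))
                # Horizontal prediction: every column predicted by the column left of the block.
                pred_h = E[r0:r0 + block, c0 - 1][:, None]
                err_h = np.mean(np.abs(blk - pred_h))
                distortions.append((err_v + err_h) / 2.0)   # Texture(m, n)
        return float(np.mean(distortions)) if distortions else 0.0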
Optionally, if the features of the equivalent image of the energy matrix include both the grayscale richness and the texture complexity, the grayscale richness and the texture complexity can each be obtained in the ways described above.
Further, whether the features of the equivalent image of the energy matrix include only the grayscale richness, only the texture complexity, or both, determining the activity level of the target sound in the sound signal based on the features of the equivalent image of the energy matrix includes: determining an initial activity level of the target sound in the sound signal based on the features of the equivalent image of the energy matrix; and determining the activity level of the target sound in the sound signal according to the relationship between the initial activity level and the activity thresholds.
Optionally, determining the initial activity level of the target sound in the sound signal based on the features of the equivalent image of the energy matrix includes: computing a weighted sum of the grayscale richness and the texture complexity of the equivalent image of the energy matrix, and taking the weighted sum as the initial activity level of the target sound in the sound signal.
For example, the initial activity level Activity of the target sound in the sound signal is:
Activity = α*Brightness + β*Texture
where α and β are the weights of the grayscale richness and the texture complexity, respectively. If only the grayscale richness is used as the feature of the equivalent image, β can be set to 0 and α to a non-zero value; if only the texture complexity is used as the feature of the equivalent image, α can be set to 0 and β to a non-zero value; if both the grayscale richness and the texture complexity are used as features of the equivalent image, both α and β can be set to non-zero values.
α and β can be determined empirically, or the terminal can provide a setting entry through which the user inputs the values; the embodiments of the present disclosure do not limit this.
Optionally, determining the activity level of the target sound in the sound signal according to the relationship between the initial activity level and the activity thresholds includes:
if the initial activity level is greater than the first activity threshold, the activity level of the target sound is the first reference value;
if the initial activity level is less than the second activity threshold, the activity level of the target sound is the second reference value;
if the initial activity level is greater than the second activity threshold and less than the first activity threshold, obtaining the first difference between the initial activity level and the second activity threshold, obtaining the second difference between the first activity threshold and the second activity threshold, and taking the quotient of the first difference and the second difference as the activity level of the target sound; where the first activity threshold is greater than the second activity threshold, and the first reference value is greater than the second reference value.
Taking the initial activity level as Activity, the first activity threshold as T1, the second activity threshold as T0, the first reference value as 1, and the second reference value as 0: if Activity > T1, the activity level of the target sound is Activity_factor = 1; if Activity < T0, the activity level of the target sound is Activity_factor = 0; and if T0 < Activity < T1, the activity level of the target sound is Activity_factor = (Activity - T0)/(T1 - T0).
Here T1 > T0. The values of T1 and T0 can be determined empirically, or the terminal can provide a setting entry through which the user sets them; the embodiments of the present disclosure do not limit the values of T1 and T0.
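Combining the two features, the initial activity level and its piecewise mapping to Activity_factor can be sketched as follows, continuing the example above. The weights alpha and beta and the thresholds T0 and T1 are placeholder values, since the text leaves them to be set empirically or by the user.

    def activity_factor(E, alpha=0.5, beta=0.5, T0=2.0, T1=8.0):
        """Activity = alpha*Brightness + beta*Texture, then clamp/interpolate between T0 and T1.
        alpha, beta, T0 and T1 are illustrative values only."""
        activity = alpha * grayscale_richness(E) + beta * texture_complexity(E)
        if activity > T1:
            return 1.0                      # first reference value
        if activity < T0:
            return 0.0                      # second reference value
        return (activity - T0) / (T1 - T0)  # linear transition between the thresholds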
In step S24, the actual gain of the sound signal is determined based on the activity level of the target sound.
Since the activity level of the target sound reflects how active the target sound is and clearly distinguishes it from the noise in the environment, determining the actual gain of the sound signal based on the activity level of the target sound prevents the background noise in the environment from being amplified. In a possible implementation, determining the actual gain of the sound signal based on the activity level of the target sound includes the following steps S241 to S245.
Step S241: obtain the loudness value of the sound signal.
When obtaining the loudness value of the sound signal, the frequency-domain signal S0(k, t) of the sound signal can be given equalizer (EQ) weighting by the equal-loudness contour L(k) to obtain a frequency-domain signal S2(k, t) = L(k)S0(k, t) that reflects the perceived loudness, and the loudness value of the sound signal is then obtained from this loudness-weighted frequency-domain signal.
For example, the equal-loudness contours can be as shown in Fig. 3, where the abscissa is frequency and the ordinate is sound pressure level. Sound pressure is the change produced in atmospheric pressure when it is disturbed, that is, the pressure in excess of atmospheric pressure, which is equivalent to the pressure change caused by superimposing a disturbance on atmospheric pressure. Along each curve, the sound pressure level differs at different frequencies, but the loudness perceived by the human ear is the same; each curve is labeled with a number whose unit is the phon. From the equal-loudness contours it can be seen that when the loudness is small, the human ear is insensitive to high and low frequencies; when the loudness is large, its perception of high and low frequencies gradually becomes more sensitive; and it is most sensitive to sounds between 2000 Hz and 5000 Hz.
On the curves of different loudness in Fig. 3, the sound pressure levels in the 2000 Hz to 5000 Hz range sit at the relatively low positions of each curve, which shows that the human ear responds sensitively to mid frequencies. For the low and high frequencies on either side of this range, the equal-loudness contours slope upward, which shows that the sensitivity of the human ear to low-frequency and high-frequency sound declines. The weakest sound intensity the human ear can hear is called the threshold of hearing (shown by the dashed line MAF in the figure), and the highest sound intensity that produces a sensation of pain is called the threshold of pain. The two equal-loudness contours formed by the threshold of hearing and the threshold of pain are the upper and lower bounds of the family of contours. Loudness depends mainly on sound intensity: as the intensity increases, the loudness level increases accordingly. But the loudness of a sound is not determined purely by its intensity; it also depends on frequency. Pure tones of different frequencies have different loudness growth rates, and the loudness of a low-frequency pure tone grows faster than that of a mid-frequency pure tone.
Applying EQ weighting with the equal-loudness contour L(k), that is, equal-loudness weighting of the sound signal, simulates the auditory characteristics of the human ear and removes part of the useless acoustic energy. For example, when noise pollutes frequency components to which hearing is insensitive, treating all frequency components equally without considering the frequency response of the human ear would let those frequencies, which the ear is not sensitive to, heavily pollute the characteristics of the whole sound, and the subsequent speech recognition rate would also degrade. Therefore, the method provided by the embodiments of the present disclosure applies equal-loudness weighting to the sound signal containing noise to simulate the characteristics of the human ear, suppressing part of the noise pollution and the speech components that carry little acoustic information, thereby improving noise robustness.
After the frequency-domain signal reflecting the perceived loudness is obtained, the loudness value of the sound signal is Loudness(t):
Loudness(t) = 10*log10(Sum(S2(k, t)*S2(k, t)))
where Sum() is the summation function.
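A sketch of step S241, continuing the NumPy example: the frame spectrum is weighted by an equal-loudness curve L(k) and the weighted energy is summed in decibels. The weighting curve used here is a crude placeholder (a flat response with a mid-band emphasis), not the actual contours of Fig. 3.

    def equal_loudness_weights(n_bins, sample_rate=48000):
        """Placeholder L(k): emphasize roughly 2-5 kHz, attenuate the extremes.
        A real implementation would tabulate the equal-loudness contours of Fig. 3."""
        freqs = np.linspace(0, sample_rate / 2, n_bins)
        w = np.ones(n_bins)
        w[(freqs >= 2000) & (freqs <= 5000)] = 1.5
        w[freqs < 100] = 0.2
        w[freqs > 12000] = 0.5
        return w

    def loudness_db(frame, sample_rate=48000):
        """Loudness(t) = 10*log10(Sum(S2(k,t)^2)) with S2 = L(k)*S0(k,t)."""
        s0 = np.fft.rfft(frame * np.hanning(len(frame)))
        s2 = equal_loudness_weights(len(s0), sample_rate) * s0
        return 10.0 * np.log10(np.sum(np.abs(s2) ** 2) + 1e-12)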
Step S242: determine the loudness value to be adjusted based on the loudness value of the sound signal and the target loudness value.
Taking the target loudness value as Loudness_target, the loudness value to be adjusted, determined from the loudness value of the sound signal and the target loudness value, is Loudness_diff:
Loudness_diff = Loudness_target - Loudness(t)
That is, the loudness value to be adjusted is the difference between the target loudness value and the loudness value of the sound signal. The target loudness value can be determined empirically, or the terminal can provide a setting entry through which the user sets it; the embodiments of the present disclosure do not limit how the target loudness value is determined.
Step S243: obtain the target gain according to the loudness value to be adjusted.
The target gain obtained according to the loudness value to be adjusted is TargetGain(t):
TargetGain(t) = pow(10, Loudness_diff/10)
where the function pow(x, y) raises x to the power y; the target gain is therefore 10 raised to the power Loudness_diff/10.
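Steps S242 and S243 then reduce to a couple of lines; the default target loudness of -20 dB below is an illustrative assumption, since the text leaves the target loudness to the configuration.

    def target_gain(loudness_t, loudness_target=-20.0):
        """TargetGain(t) = 10 ** (Loudness_diff / 10), with
        Loudness_diff = Loudness_target - Loudness(t). The -20 dB default is illustrative."""
        loudness_diff = loudness_target - loudness_t
        return 10.0 ** (loudness_diff / 10.0)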
Step S244: determine the gain change step of the sound signal according to the activity level of the target sound.
Because the target gain differs for every frame and changes too quickly over time, the gain is generally adjusted from the gain of the previous frame according to a gain change step, so that the actual gain moves toward the target gain; that is, the actual gain is adjusted according to the gain change step. Optionally, the gain change step of the sound signal is determined by the activity level of the target sound: with stepUp as the gain change step for increasing the gain and stepDown as the gain change step for decreasing the gain, stepUp and stepDown are determined from the activity level of the target sound as follows:
stepUp = release_factor * Activity_factor;
stepDown = attack_factor * Activity_factor;
where release_factor and attack_factor are weights; they can be determined empirically, or the terminal can provide a setting entry through which the user sets them, and the embodiments of the present disclosure do not limit this. For example, release_factor and attack_factor can each take a value between 0 and 1, with the value of attack_factor greater than the value of release_factor. The reason for weighting the gain change step by Activity_factor is that Activity_factor characterizes the activity level of the target sound: for the background noise, Activity_factor is essentially 0, so the noise floor is not amplified; for target sound, music, or other valid sound, Activity_factor is essentially 1, so the gain can change normally; and for transition regions (for example, regions where target sound and noise coexist, or regions that are hard to classify as target sound or noise), Activity_factor is a fraction between 0 and 1, which slows the change of the gain to some extent, that is, suppresses the amplification of noise.
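The two step sizes scale with the activity factor, so the gain effectively freezes on background noise. The release_factor and attack_factor defaults below are illustrative; the text only suggests values between 0 and 1 with attack_factor larger than release_factor.

    def gain_steps(act_factor, release_factor=0.02, attack_factor=0.1):
        """stepUp/stepDown scaled by Activity_factor; 0.02/0.1 are illustrative values."""
        step_up = release_factor * act_factor
        step_down = attack_factor * act_factor
        return step_up, step_down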
Step S245: determine the actual gain of the sound signal according to the gain change step of the sound signal, based on the relationship between the target gain and the actual gain of the previous frame of the sound signal.
Optionally, determining the actual gain of the sound signal according to the gain change step of the sound signal, based on the relationship between the target gain and the actual gain of the previous frame of the sound signal, includes:
if TargetGain(t) > Gain(t-1), then Gain(t) = Gain(t-1) + stepUp;
if TargetGain(t) < Gain(t-1), then Gain(t) = Gain(t-1) - stepDown;
if TargetGain(t) = Gain(t-1), then Gain(t) = Gain(t-1).
Here TargetGain(t) is the target gain and Gain(t-1) is the actual gain of the previous frame of the sound signal. Gain(t-1) is obtained on the same principle as the actual gain of the sound signal in the method provided by the embodiments of the present disclosure, which is not repeated here.
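The per-frame gain update is then a bounded move of the previous actual gain toward the target gain, as in the three cases above:

    def update_gain(prev_gain, tgt_gain, step_up, step_down):
        """Move Gain(t-1) one step toward TargetGain(t)."""
        if tgt_gain > prev_gain:
            return prev_gain + step_up
        if tgt_gain < prev_gain:
            return prev_gain - step_down
        return prev_gain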
In step S25, the sound signal is adjusted according to the actual gain, so as to control the volume of the sound signal.
In this step, adjusting the sound signal according to the actual gain means scaling the sound signal by the actual gain, thereby realizing volume control of the sound signal.
As described above, with the actual gain Gain(t) and the sound signal s(t), the sound signal after adjustment according to the actual gain is s1(t), where s1(t) = s(t)*Gain(t).
Since the actual gain in the present application is determined by the activity level of the target sound, the method provided by the embodiments of the present application can control the actual gain of the sound signal according to the activity level of the target sound, so as to scale the sound signal and realize volume control. For example, if the activity level of the target sound is high, the sound signal may sound loud, and it can be turned down by controlling the actual gain, which reduces the volume. Similarly, if the activity level of the target sound is low, the sound signal may sound quiet, and it can be turned up by controlling the actual gain, which raises the volume. It can be seen that the method provided by the embodiments of the present application avoids sudden jumps or drops in volume, solves the problem of excessive loudness differences, and improves the listening experience. In addition, since the activity level of the target sound reflects how active the target sound is, the actual gain determined from it keeps the noise in the environment (the noise floor) from being amplified, which further improves the listening experience.
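Tying the previous sketches together, one frame can be processed as follows; all parameter defaults remain the illustrative assumptions stated earlier, and the gain is applied as a simple per-sample multiplication, s1(t) = s(t)*Gain(t).

    def process_frame(frame, prev_gain, energy_matrix, sample_rate=48000):
        """One iteration of steps S22-S25 for a single frame s(t)."""
        E = energy_matrix.push(frame)                           # step S22
        act = activity_factor(E)                                # step S23
        tgt = target_gain(loudness_db(frame, sample_rate))      # steps S241-S243
        step_up, step_down = gain_steps(act)                    # step S244
        gain = update_gain(prev_gain, tgt, step_up, step_down)  # step S245
        return frame * gain, gain                               # s1(t) = s(t) * Gain(t)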
In step S26, limiting processing is performed on the adjusted sound signal.
Limiting refers to attenuating all instantaneous values of a certain characteristic of a signal (such as voltage, current, or power) that exceed a predetermined threshold to values close to that threshold, while keeping all other instantaneous values unchanged. The embodiments of the present disclosure do not limit how the limiting is implemented; for example, it can be realized by a limiter circuit. Performing limiting on the adjusted sound signal s1(t) yields a signal s2(t), which is the sound signal after volume control. Applying limiting to the adjusted sound signal avoids saturation and overflow distortion.
It should be understood that step S26 is an optional step; when no saturation or overflow distortion occurs, step S26 need not be performed, and the adjusted sound signal s1(t) is used directly as the sound signal after volume control.
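A minimal software stand-in for the limiter of step S26 simply clamps samples that would exceed a threshold near full scale; as the text notes, a limiter circuit is an equally valid implementation. The 0.99 threshold is an assumption.

    def limit(signal, threshold=0.99):
        """Hard-limit s1(t) to avoid saturation/overflow; 0.99 of full scale is assumed."""
        return np.clip(signal, -threshold, threshold)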
Optionally, the implementation of the above steps can be seen in Fig. 4. Based on the basic principle of AGC (Automatic Gain Control), the embodiments of the present disclosure propose a method of automatic volume control for scenarios such as online live streaming. Compared with the related art, which controls the volume only by a target amplitude, the method provided by the embodiments of the present disclosure better matches human auditory characteristics and keeps the background noise in the environment from being amplified, so that the loudness differences of a live stream are not excessive, which improves the listening experience.
In the method provided by the embodiments of the present disclosure, the activity level of the target sound in the sound signal is determined from the energy matrix of the sound signal, and the actual gain of the sound signal is determined from the activity level of the target sound. This prevents the background noise in the environment from being amplified, so that the volume control better matches human auditory characteristics, and the volume control effect can be improved.
Fig. 5 is a kind of sound volume control device block diagram shown according to an exemplary embodiment.Referring to Fig. 5, which includes First acquisition unit 51, second acquisition unit 52, the first determination unit 53, the second determination unit 54 and updating unit 55.
First acquisition unit 51 is configured as obtaining voice signal;
Second acquisition unit 52 is configured as obtaining the energy matrix of voice signal;
First determination unit 53 is configured as determining the target sound in voice signal based on the energy matrix of voice signal Mobility;
Second determination unit 54 is configured as determining the actual gain of voice signal based on the mobility of target sound;
Control unit 55 is configured as according to actual gain adjustments voice signal, to control the volume of voice signal.
In a kind of possible implementation, second acquisition unit 52 is configured as believing sound by Fourier transformation FFT Number it is converted into frequency-region signal;Obtain the frequency domain energy signal of frequency-region signal;By the frequency of frequency domain energy signal and reference number amount before The combination of domain energy signal, obtains the energy matrix of voice signal.
In a possible implementation, referring to Fig. 6, the first determination unit 53 includes:
an obtaining subunit 511, configured to obtain a feature of the equivalent image of the energy matrix based on the energy matrix of the sound signal, the feature of the equivalent image of the energy matrix including at least one of a grayscale richness and a texture complexity;
a determination subunit 512, configured to determine the mobility of the target sound in the sound signal based on the feature of the equivalent image of the energy matrix.
In a possible implementation, the obtaining subunit 511 is configured to obtain the variance of the energy matrix and the mean of the energy matrix, and to obtain the grayscale richness of the equivalent image of the energy matrix according to the variance and the mean of the energy matrix.
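The exact combination of variance and mean is not spelled out in this excerpt; a plausible sketch, which treats the mean-normalized variance as the grayscale richness, is given below (the normalization is an assumption):

    import numpy as np

    def grayscale_richness(energy_matrix: np.ndarray) -> float:
        # Illustrative sketch only; the combining formula is an assumption.
        var = float(np.var(energy_matrix))
        mean = float(np.mean(energy_matrix))
        # Normalize the variance by the mean so that louder frames do not
        # automatically appear "richer".
        return var / (mean + 1e-12)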
In a possible implementation, the obtaining subunit 511 is configured to divide the equivalent image of the energy matrix into a plurality of sub-blocks and perform intra prediction in different directions on each sub-block; obtain, for any sub-block and any direction, the absolute errors between the predicted values in that direction and the actual pixel values of each row of the sub-block, and average those absolute errors, the obtained mean absolute error being the block error of that sub-block in that direction; average the block errors of any sub-block over all directions, the obtained mean block error being the distortion value of that sub-block; and average the distortion values of all sub-blocks, the obtained mean distortion value being the texture complexity of the equivalent image of the energy matrix.
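A simplified sketch of this computation is shown below; it uses only two prediction directions (vertical and horizontal, in the spirit of intra prediction) and an 8x8 sub-block size, both of which are illustrative assumptions:

    import numpy as np

    def texture_complexity(image: np.ndarray, block: int = 8) -> float:
        # Illustrative sketch only; direction set and block size are assumptions.
        h, w = image.shape
        distortions = []
        for i in range(0, h - block + 1, block):
            for j in range(0, w - block + 1, block):
                sub = image[i:i + block, j:j + block].astype(float)
                # Vertical prediction: every row is predicted from the top row.
                pred_vertical = np.tile(sub[0:1, :], (block, 1))
                # Horizontal prediction: every column is predicted from the left column.
                pred_horizontal = np.tile(sub[:, 0:1], (1, block))
                # Block error per direction: mean absolute error between
                # the prediction and the actual pixel values.
                block_errors = [np.mean(np.abs(sub - pred_vertical)),
                                np.mean(np.abs(sub - pred_horizontal))]
                # Distortion value of the sub-block: mean block error over all directions.
                distortions.append(np.mean(block_errors))
        # Texture complexity: mean distortion value over all sub-blocks.
        return float(np.mean(distortions)) if distortions else 0.0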
In a possible implementation, referring to Fig. 7, the determination subunit 512 includes:
a first determination module 5121, configured to determine an initial mobility of the target sound in the sound signal based on the feature of the equivalent image of the energy matrix;
a second determination module 5122, configured to determine the mobility of the target sound in the sound signal according to the relationship between the initial mobility and the mobility thresholds.
In a possible implementation, the first determination module 5121 is configured to compute a weighted sum of the grayscale richness and the texture complexity of the equivalent image of the energy matrix, the obtained weighted sum being the initial mobility of the target sound in the sound signal.
In a possible implementation, the second determination module 5122 is configured such that: if the initial mobility is greater than a first mobility threshold, the mobility of the target sound is a first reference value; if the initial mobility is less than a second mobility threshold, the mobility of the target sound is a second reference value; if the initial mobility is greater than the second mobility threshold and less than the first mobility threshold, a first difference between the initial mobility and the second mobility threshold is obtained, a second difference between the first mobility threshold and the second mobility threshold is obtained, and the quotient of the first difference and the second difference is taken as the mobility of the target sound; wherein the first mobility threshold is greater than the second mobility threshold, and the first reference value is greater than the second reference value.
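The weighted sum and the threshold mapping can be sketched together as follows; the weights, thresholds and reference values below are assumptions chosen for illustration, not values given by the disclosure:

    def target_sound_mobility(gray_richness: float, texture_complexity: float,
                              w_gray: float = 0.5, w_texture: float = 0.5,
                              first_threshold: float = 0.8, second_threshold: float = 0.2,
                              first_reference: float = 1.0, second_reference: float = 0.0) -> float:
        # Illustrative sketch only; all parameter values are assumptions.
        # Initial mobility: weighted sum of the two features of the equivalent image.
        initial = w_gray * gray_richness + w_texture * texture_complexity
        if initial > first_threshold:
            return first_reference
        if initial < second_threshold:
            return second_reference
        # Between the two thresholds: quotient of the two differences.
        return (initial - second_threshold) / (first_threshold - second_threshold)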
In a possible implementation, the second determination unit 54 is configured to obtain the loudness value of the sound signal; determine the loudness value to be adjusted based on the loudness value of the sound signal and a target loudness value; obtain a target gain according to the loudness value to be adjusted; determine a gain change step of the sound signal from the mobility of the target sound; and determine the actual gain of the sound signal according to the gain change step of the sound signal, based on the relationship between the target gain and the actual gain of the preceding frame of the sound signal.
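A sketch of this gain update, working in dB and assuming that the gain change step is simply the mobility scaled by a maximum per-frame step (an assumption, since the exact mapping is described elsewhere in the disclosure), could look like:

    def next_actual_gain(loudness_db: float, target_loudness_db: float,
                         previous_gain_db: float, mobility: float,
                         max_step_db: float = 1.0) -> float:
        # Illustrative sketch only; units and max_step_db are assumptions.
        # Loudness value that needs to be adjusted, i.e. the target gain.
        target_gain_db = target_loudness_db - loudness_db
        # Gain change step determined by the mobility of the target sound.
        step_db = max_step_db * mobility
        # Move the preceding frame's actual gain toward the target gain by at most one step.
        if target_gain_db > previous_gain_db:
            return min(previous_gain_db + step_db, target_gain_db)
        return max(previous_gain_db - step_db, target_gain_db)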
In a possible implementation, referring to Fig. 8, the device further includes:
a clipping unit 56, configured to perform clipping processing on the adjusted sound signal.
With regard to the device in the above embodiments, the specific manner in which each module performs its operations has been described in detail in the embodiments of the related method, and will not be elaborated here.
Fig. 9 is a block diagram of a terminal 900 according to an exemplary embodiment. The terminal 900 may be a smartphone, a tablet computer, an MP3 (Moving Picture Experts Group Audio Layer III) player, an MP4 (Moving Picture Experts Group Audio Layer IV) player, a laptop or a desktop computer. The terminal 900 may also be called user equipment, a portable terminal, a laptop terminal, a desktop terminal, or other names.
In general, the terminal 900 includes a processor 901 and a memory 902.
The processor 901 may include one or more processing cores, such as a 4-core processor or an 8-core processor. The processor 901 may be implemented in hardware in at least one of the forms of a DSP (Digital Signal Processing), an FPGA (Field-Programmable Gate Array), and a PLA (Programmable Logic Array). The processor 901 may also include a main processor and a coprocessor. The main processor is a processor for processing data in an awake state, also called a CPU (Central Processing Unit); the coprocessor is a low-power processor for processing data in a standby state. In some embodiments, the processor 901 may be integrated with a GPU (Graphics Processing Unit), which is responsible for rendering and drawing the content to be displayed on the display screen. In some embodiments, the processor 901 may also include an AI (Artificial Intelligence) processor for handling computing operations related to machine learning.
The memory 902 may include one or more computer-readable storage media, which may be non-transient. The memory 902 may also include a high-speed random access memory and a non-volatile memory, such as one or more disk storage devices or flash memory devices. In some embodiments, the non-transient computer-readable storage medium in the memory 902 is used to store at least one instruction, the at least one instruction being executed by the processor 901 to implement the volume control method provided by the method embodiments of the present application.
In some embodiments, the terminal 900 optionally further includes a peripheral device interface 903 and at least one peripheral device. The processor 901, the memory 902 and the peripheral device interface 903 may be connected by a bus or signal lines. Each peripheral device may be connected to the peripheral device interface 903 by a bus, a signal line or a circuit board. Specifically, the peripheral devices include at least one of a radio frequency circuit 904, a display screen 905, a camera assembly 906, an audio circuit 907, a positioning component 908 and a power supply 909.
The peripheral device interface 903 may be used to connect at least one I/O (Input/Output) related peripheral device to the processor 901 and the memory 902. In some embodiments, the processor 901, the memory 902 and the peripheral device interface 903 are integrated on the same chip or circuit board; in some other embodiments, any one or two of the processor 901, the memory 902 and the peripheral device interface 903 may be implemented on a separate chip or circuit board, which is not limited in this embodiment.
The radio frequency circuit 904 is used to receive and transmit RF (Radio Frequency) signals, also called electromagnetic signals. The radio frequency circuit 904 communicates with communication networks and other communication devices through electromagnetic signals. The radio frequency circuit 904 converts electrical signals into electromagnetic signals for transmission, or converts received electromagnetic signals into electrical signals. Optionally, the radio frequency circuit 904 includes an antenna system, an RF transceiver, one or more amplifiers, a tuner, an oscillator, a digital signal processor, a codec chipset, a subscriber identity module card, and the like. The radio frequency circuit 904 can communicate with other terminals through at least one wireless communication protocol. The wireless communication protocol includes, but is not limited to, metropolitan area networks, mobile communication networks of various generations (2G, 3G, 4G and 5G), wireless local area networks and/or WiFi (Wireless Fidelity) networks. In some embodiments, the radio frequency circuit 904 may also include an NFC (Near Field Communication) related circuit, which is not limited in the present application.
The display screen 905 is used to display a UI (User Interface). The UI may include graphics, text, icons, video, and any combination thereof. When the display screen 905 is a touch display screen, the display screen 905 also has the ability to collect touch signals on or above its surface. The touch signal may be input to the processor 901 as a control signal for processing. In this case, the display screen 905 may also be used to provide virtual buttons and/or a virtual keyboard, also called soft buttons and/or a soft keyboard. In some embodiments, there may be one display screen 905, disposed on the front panel of the terminal 900; in other embodiments, there may be at least two display screens 905, respectively disposed on different surfaces of the terminal 900 or in a folded design; in still other embodiments, the display screen 905 may be a flexible display screen, disposed on a curved surface or a folded surface of the terminal 900. The display screen 905 may even be arranged in a non-rectangular irregular shape, that is, as a special-shaped screen. The display screen 905 may be made of materials such as an LCD (Liquid Crystal Display) or an OLED (Organic Light-Emitting Diode).
The camera assembly 906 is used to capture images or videos. Optionally, the camera assembly 906 includes a front camera and a rear camera. In general, the front camera is disposed on the front panel of the terminal, and the rear camera is disposed on the back of the terminal. In some embodiments, there are at least two rear cameras, each being any one of a main camera, a depth-of-field camera, a wide-angle camera and a telephoto camera, so that the main camera and the depth-of-field camera are fused to realize a background blurring function, the main camera and the wide-angle camera are fused to realize panoramic shooting and VR (Virtual Reality) shooting functions, or other fused shooting functions are realized. In some embodiments, the camera assembly 906 may also include a flash. The flash may be a single-color-temperature flash or a dual-color-temperature flash. A dual-color-temperature flash refers to a combination of a warm-light flash and a cold-light flash, and can be used for light compensation under different color temperatures.
The audio circuit 907 may include a microphone and a loudspeaker. The microphone is used to collect sound waves of the user and the environment, convert the sound waves into electrical signals and input them to the processor 901 for processing, or input them to the radio frequency circuit 904 to realize voice communication. For the purposes of stereo collection or noise reduction, there may be a plurality of microphones, respectively disposed at different parts of the terminal 900. The microphone may also be an array microphone or an omnidirectional collection microphone. The loudspeaker is used to convert electrical signals from the processor 901 or the radio frequency circuit 904 into sound waves. The loudspeaker may be a traditional thin-film loudspeaker or a piezoelectric ceramic loudspeaker. When the loudspeaker is a piezoelectric ceramic loudspeaker, it can not only convert electrical signals into sound waves audible to humans, but can also convert electrical signals into sound waves inaudible to humans for purposes such as ranging. In some embodiments, the audio circuit 907 may also include a headphone jack.
The positioning component 908 is used to locate the current geographic position of the terminal 900 in order to implement navigation or LBS (Location Based Service). The positioning component 908 may be a positioning component based on the GPS (Global Positioning System) of the United States, the BeiDou system of China, the GLONASS system of Russia or the Galileo system of the European Union.
The power supply 909 is used to supply power to the various components in the terminal 900. The power supply 909 may be an alternating current, a direct current, a disposable battery or a rechargeable battery. When the power supply 909 includes a rechargeable battery, the rechargeable battery may support wired charging or wireless charging. The rechargeable battery may also be used to support fast charging technology.
In some embodiments, the terminal 900 further includes one or more sensors 910. The one or more sensors 910 include, but are not limited to, an acceleration sensor 911, a gyroscope sensor 912, a pressure sensor 913, a fingerprint sensor 914, an optical sensor 915 and a proximity sensor 916.
The acceleration sensor 911 can detect the magnitude of acceleration on the three coordinate axes of the coordinate system established with the terminal 900. For example, the acceleration sensor 911 may be used to detect the components of gravitational acceleration on the three coordinate axes. The processor 901 may, according to the gravitational acceleration signal collected by the acceleration sensor 911, control the touch display screen 905 to display the user interface in a landscape view or a portrait view. The acceleration sensor 911 may also be used for the collection of game or user motion data.
The gyroscope sensor 912 can detect the body direction and rotation angle of the terminal 900, and can cooperate with the acceleration sensor 911 to collect the user's 3D motions on the terminal 900. Based on the data collected by the gyroscope sensor 912, the processor 901 can implement the following functions: motion sensing (for example, changing the UI according to the user's tilt operation), image stabilization during shooting, game control and inertial navigation.
The pressure sensor 913 may be disposed on the side frame of the terminal 900 and/or the lower layer of the touch display screen 905. When the pressure sensor 913 is disposed on the side frame of the terminal 900, the user's grip signal on the terminal 900 can be detected, and the processor 901 performs left/right hand recognition or quick operations according to the grip signal collected by the pressure sensor 913. When the pressure sensor 913 is disposed on the lower layer of the touch display screen 905, the processor 901 controls the operability controls on the UI according to the user's pressure operation on the touch display screen 905. The operability controls include at least one of a button control, a scroll bar control, an icon control and a menu control.
The fingerprint sensor 914 is used to collect the user's fingerprint. The processor 901 identifies the user's identity according to the fingerprint collected by the fingerprint sensor 914, or the fingerprint sensor 914 identifies the user's identity according to the collected fingerprint. When the user's identity is identified as a trusted identity, the processor 901 authorizes the user to perform related sensitive operations, including unlocking the screen, viewing encrypted information, downloading software, making payments, changing settings, and so on. The fingerprint sensor 914 may be disposed on the front, back or side of the terminal 900. When a physical button or a manufacturer logo is provided on the terminal 900, the fingerprint sensor 914 may be integrated with the physical button or the manufacturer logo.
The optical sensor 915 is used to collect the ambient light intensity. In one embodiment, the processor 901 may control the display brightness of the touch display screen 905 according to the ambient light intensity collected by the optical sensor 915. Specifically, when the ambient light intensity is high, the display brightness of the touch display screen 905 is increased; when the ambient light intensity is low, the display brightness of the touch display screen 905 is decreased. In another embodiment, the processor 901 may also dynamically adjust the shooting parameters of the camera assembly 906 according to the ambient light intensity collected by the optical sensor 915.
The proximity sensor 916, also called a distance sensor, is generally disposed on the front panel of the terminal 900. The proximity sensor 916 is used to collect the distance between the user and the front of the terminal 900. In one embodiment, when the proximity sensor 916 detects that the distance between the user and the front of the terminal 900 gradually decreases, the processor 901 controls the touch display screen 905 to switch from the screen-on state to the screen-off state; when the proximity sensor 916 detects that the distance between the user and the front of the terminal 900 gradually increases, the processor 901 controls the touch display screen 905 to switch from the screen-off state to the screen-on state.
It will be understood by those skilled in the art that the structure shown in Fig. 9 does not constitute a limitation on the terminal 900; the terminal may include more or fewer components than shown, combine certain components, or adopt a different component arrangement.
In an exemplary embodiment, a non-transitory computer-readable storage medium is also provided. When the instructions in the storage medium are executed by the processor of a terminal, the terminal is enabled to execute the following volume control method:
obtaining a sound signal;
obtaining an energy matrix of the sound signal;
determining the mobility of the target sound in the sound signal based on the energy matrix of the sound signal;
determining the actual gain of the sound signal based on the mobility of the target sound;
adjusting the sound signal according to the actual gain, so as to control the volume of the sound signal.
For example, the non-transitory computer-readable storage medium may be a ROM (Read-Only Memory), a RAM (Random Access Memory), a CD-ROM (Compact Disc Read-Only Memory), a magnetic tape, a floppy disk, an optical data storage device, or the like.
In an exemplary embodiment, an application program product is also provided. When the instructions in the application program product are executed by the processor of a terminal, the terminal is enabled to execute the following volume control method:
obtaining a sound signal;
obtaining an energy matrix of the sound signal;
determining the mobility of the target sound in the sound signal based on the energy matrix of the sound signal;
determining the actual gain of the sound signal based on the mobility of the target sound;
adjusting the sound signal according to the actual gain, so as to control the volume of the sound signal.
It should be appreciated that the volume control method which the terminal is enabled to execute, whether the instructions in the above storage medium or the instructions in the application program product are executed by the processor of the terminal, is as shown in the above method embodiments and will not be repeated here.
Other embodiments of the present application will readily occur to those skilled in the art after considering the specification and practicing the disclosure herein. The present application is intended to cover any variations, uses or adaptations of the present application that follow the general principles of the present application and include common knowledge or customary technical means in the art not disclosed herein. The specification and embodiments are to be regarded as exemplary only, with the true scope and spirit of the present application being indicated by the claims.
It should be understood that the present application is not limited to the precise structure described above and shown in the drawings, and that various modifications and changes may be made without departing from its scope. The scope of the present application is limited only by the appended claims.

Claims (10)

1. A method for controlling volume, characterized by comprising:
obtaining a sound signal;
obtaining an energy matrix of the sound signal;
determining the mobility of a target sound in the sound signal based on the energy matrix of the sound signal;
determining an actual gain of the sound signal based on the mobility of the target sound; and
adjusting the sound signal according to the actual gain, so as to control the volume of the sound signal.
2. The method for controlling volume according to claim 1, characterized in that the obtaining an energy matrix of the sound signal comprises:
converting the sound signal into a frequency-domain signal by a Fourier transform (FFT);
obtaining a frequency-domain energy signal of the frequency-domain signal; and
combining the frequency-domain energy signal with the frequency-domain energy signals of a reference number of preceding frames to obtain the energy matrix of the sound signal.
3. The method for controlling volume according to claim 1, characterized in that the determining the mobility of the target sound in the sound signal based on the energy matrix of the sound signal comprises:
obtaining a feature of an equivalent image of the energy matrix based on the energy matrix of the sound signal, the feature of the equivalent image of the energy matrix comprising at least one of a grayscale richness and a texture complexity; and
determining the mobility of the target sound in the sound signal based on the feature of the equivalent image of the energy matrix.
4. The method for controlling volume according to claim 3, characterized in that the obtaining a feature of the equivalent image of the energy matrix based on the energy matrix of the sound signal comprises:
obtaining the variance of the energy matrix and obtaining the mean of the energy matrix; and
obtaining the grayscale richness of the equivalent image of the energy matrix according to the variance and the mean of the energy matrix.
5. The method for controlling volume according to claim 3, characterized in that the obtaining a feature of the equivalent image of the energy matrix based on the energy matrix of the sound signal comprises:
dividing the equivalent image of the energy matrix into a plurality of sub-blocks, and performing intra prediction in different directions on each sub-block;
obtaining the absolute errors between the predicted values of any sub-block in any direction and the actual pixel values of each row of the sub-block, and averaging the absolute errors of the sub-block in that direction, the obtained mean absolute error being the block error of the sub-block in that direction;
averaging the block errors of the sub-block over all directions, the obtained mean block error being the distortion value of the sub-block; and
averaging the distortion values of all the sub-blocks, the obtained mean distortion value being the texture complexity of the equivalent image of the energy matrix.
6. The method for controlling volume according to claim 3, characterized in that the determining the mobility of the target sound in the sound signal based on the feature of the equivalent image of the energy matrix comprises:
determining an initial mobility of the target sound in the sound signal based on the feature of the equivalent image of the energy matrix; and
determining the mobility of the target sound in the sound signal according to the relationship between the initial mobility and mobility thresholds.
7. The method for controlling volume according to claim 6, characterized in that the determining the mobility of the target sound in the sound signal according to the relationship between the initial mobility and the mobility thresholds comprises:
if the initial mobility is greater than a first mobility threshold, the mobility of the target sound is a first reference value;
if the initial mobility is less than a second mobility threshold, the mobility of the target sound is a second reference value; and
if the initial mobility is greater than the second mobility threshold and less than the first mobility threshold, obtaining a first difference between the initial mobility and the second mobility threshold, obtaining a second difference between the first mobility threshold and the second mobility threshold, and taking the quotient of the first difference and the second difference as the mobility of the target sound;
wherein the first mobility threshold is greater than the second mobility threshold, and the first reference value is greater than the second reference value.
8. The method for controlling volume according to claim 1, characterized in that the determining the actual gain of the sound signal based on the mobility of the target sound comprises:
obtaining the loudness value of the sound signal;
determining a loudness value to be adjusted based on the loudness value of the sound signal and a target loudness value;
obtaining a target gain according to the loudness value to be adjusted;
determining a gain change step of the sound signal from the mobility of the target sound; and
determining the actual gain of the sound signal according to the gain change step of the sound signal, based on the relationship between the target gain and the actual gain of the preceding frame of the sound signal.
9. A volume control device, characterized by comprising:
a first acquisition unit, configured to obtain a sound signal;
a second acquisition unit, configured to obtain an energy matrix of the sound signal;
a first determination unit, configured to determine the mobility of a target sound in the sound signal based on the energy matrix of the sound signal;
a second determination unit, configured to determine an actual gain of the sound signal based on the mobility of the target sound; and
a control unit, configured to adjust the sound signal according to the actual gain, so as to control the volume of the sound signal.
10. A non-transitory computer-readable storage medium, characterized in that, when the instructions in the storage medium are executed by the processor of a terminal, the terminal is enabled to execute a method for controlling volume, the method comprising:
obtaining a sound signal;
obtaining an energy matrix of the sound signal;
determining the mobility of a target sound in the sound signal based on the energy matrix of the sound signal;
determining an actual gain of the sound signal based on the mobility of the target sound; and
adjusting the sound signal according to the actual gain, so as to control the volume of the sound signal.
CN201811506570.2A 2018-12-10 2018-12-10 Volume control method, device and storage medium Active CN109587603B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811506570.2A CN109587603B (en) 2018-12-10 2018-12-10 Volume control method, device and storage medium

Publications (2)

Publication Number Publication Date
CN109587603A true CN109587603A (en) 2019-04-05
CN109587603B CN109587603B (en) 2020-11-10

Family

ID=65929398

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811506570.2A Active CN109587603B (en) 2018-12-10 2018-12-10 Volume control method, device and storage medium

Country Status (1)

Country Link
CN (1) CN109587603B (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112385143A (en) * 2019-04-26 2021-02-19 谷歌有限责任公司 Dynamic volume level dependent on background level
CN113711624A (en) * 2019-04-23 2021-11-26 株式会社索思未来 Sound processing device
CN113711624B (en) * 2019-04-23 2024-06-07 株式会社索思未来 Sound processing device

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103684302A (en) * 2012-09-12 2014-03-26 国基电子(上海)有限公司 Volume control device and method
US20160261951A1 (en) * 2013-10-30 2016-09-08 Nuance Communications, Inc. Methods And Apparatus For Selective Microphone Signal Combining
US20160343242A1 (en) * 2015-05-20 2016-11-24 Google Inc. Systems and methods for self-administering a sound test
CN108573709A (en) * 2017-03-09 2018-09-25 中移(杭州)信息技术有限公司 A kind of auto gain control method and device
CN107799124A (en) * 2017-10-12 2018-03-13 安徽咪鼠科技有限公司 A kind of VAD detection methods applied to intelligent sound mouse
CN108711435A (en) * 2018-05-30 2018-10-26 中南大学 A kind of high efficiency audio control method towards loudness

Also Published As

Publication number Publication date
CN109587603B (en) 2020-11-10

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant