WO2012065392A1

WO2012065392A1 - Method and device for processing the quality of voice call

Info

Publication number: WO2012065392A1
Application number: PCT/CN2011/071792
Authority: WO
Inventors: Wei Guohua (魏国华)
Original assignee: ZTE Corporation (中兴通讯股份有限公司)
Priority date: 2010-11-19
Filing date: 2011-03-15
Publication date: 2012-05-24
Also published as: CN102014205A

Abstract

A method and device for processing the quality of a voice call are disclosed in the present invention. The method includes: setting multiple scenes in a mobile terminal, wherein each scene corresponds to a set of audio parameters; and setting the mobile terminal using the audio parameters corresponding to one of the multiple scenes. The invention solves the prior-art problem that the quality of a voice call cannot meet the actual needs of the user, and improves the user experience.

Description

The present invention relates to the field of communications, and in particular to a method and apparatus for processing voice call quality.

BACKGROUND

At present, a mobile terminal (for example, a mobile phone) uses the following processing methods to improve the voice quality of a call.

The mobile terminal can use dual-microphone noise reduction. However, this method increases the hardware design cost and the complexity of the structural design. In addition, dual-microphone noise reduction completely filters out non-stationary environmental noise, so the called party cannot perceive the real environment in which the caller is located.

The mobile terminal can also dynamically adjust its receive gain and transmit gain based only on the ambient noise level detected by a microphone (MIC) on the mobile terminal motherboard. The ambient noise picked up by the MIC cannot identify the exact noise environment in which the user is located. As a result, raising the transmit and receive gains does not truly solve the voice call quality problem. Instead, it introduces more speech distortion and degrades the user's subjective experience.

Both of the above processing methods are built into a fixed software version in the mobile terminal. The user cannot modify them after the terminal leaves the factory, so the quality of the voice call cannot meet the real needs of the user. In this case, the user may return the terminal or have the software version updated after sale, which harms the user experience.

SUMMARY OF THE INVENTION

A primary object of the present invention is to provide a method and apparatus for processing voice call quality, so as to solve at least the above problems.
According to one aspect of the present invention, a method for processing voice call quality is provided. The method includes: setting a plurality of scenarios in a mobile terminal, wherein each scenario corresponds to a set of audio parameters; and setting the mobile terminal using the audio parameters corresponding to one of the plurality of scenarios.

Preferably, the audio parameters corresponding to a scenario are determined according to at least one of the following: the age group of the user, the gender of the user, and the environment in which the mobile terminal is used for the call.

Preferably, the audio parameters corresponding to a scenario are determined by the following steps: collecting audio samples corresponding to the scenario; testing in a standard anechoic chamber using the audio samples; and determining the audio parameters corresponding to the scenario according to the test results.

Preferably, the audio parameters include at least one of the following:
- a parameter for controlling the magnitude of the sound gain in the transmit and/or receive direction;
- a parameter for adjusting the digital gain on the transmit and/or receive channel;
- a parameter for adjusting the analog gain on the transmit and/or receive channel;
- a parameter for modulating the frequency response of the transmitted and/or received speech;
- a parameter for suppressing the transmission of background noise;
- a parameter for enhancing double-talk performance.

Preferably, setting the mobile terminal using the audio parameters corresponding to one of the plurality of scenarios includes writing those audio parameters into a digital signal processing (DSP) register. The mobile terminal is then configured according to the audio parameters in the DSP register.
According to another aspect of the present invention, a processing device for voice call quality is provided. The device is located in a mobile terminal and includes: a first setting module, configured to set a plurality of scenarios, wherein each scenario corresponds to a set of audio parameters; and a second setting module, configured to set the mobile terminal using the audio parameters corresponding to one of the plurality of scenarios.

Preferably, the device further includes a parameter preset module, configured to determine the audio parameters corresponding to a scenario according to at least one of the following: the age group of the user, the gender of the user, and the environment in which the mobile terminal is used for the call.

Preferably, the device further includes: a collection module, configured to collect audio samples corresponding to the scenario; a test module, configured to perform testing in a standard anechoic chamber using the audio samples; and a determining module, configured to determine the audio parameters corresponding to the scenario according to the test results.

Preferably, the audio parameters corresponding to the scenario set by the first setting module include at least one of the following: a parameter for controlling the magnitude of the sound gain in the transmit and/or receive direction, a parameter for adjusting the digital gain on the transmit and/or receive channel, a parameter for adjusting the analog gain on the transmit and/or receive channel, a parameter for modulating the frequency response of the transmitted and/or received speech, a parameter for suppressing the transmission of background noise, and a parameter for enhancing double-talk performance.
Preferably, the device further includes a writing module, configured to write the audio parameters corresponding to one of the plurality of scenarios into a digital signal processing (DSP) register. The mobile terminal is configured according to the audio parameters in the DSP register.

The invention solves the prior-art problem that the quality of a voice call cannot meet the real needs of the user, and improves the user experience.

BRIEF DESCRIPTION OF THE DRAWINGS

The accompanying drawings are provided to illustrate the present invention. In the drawings:

FIG. 1 is a flowchart of a method for processing voice call quality according to an embodiment of the present invention;
FIG. 2 is a structural block diagram of a voice call quality processing device according to an embodiment of the present invention;
FIG. 3 is a structural block diagram of a preferred voice call quality processing device according to an embodiment of the present invention;
FIG. 4 is a schematic diagram of the environment used for a subjective listening test with human subjects according to an embodiment of the present invention;
FIG. 5 is a block diagram of the audio module of a mobile terminal according to an embodiment of the present invention;
FIG. 6 is a schematic diagram of scenario relationships based on gender, age, and place of use according to an embodiment of the present invention.

BEST MODE FOR CARRYING OUT THE INVENTION

The present invention will be described in detail below with reference to the accompanying drawings. Where no conflict arises, the embodiments of the present application and the features in those embodiments may be combined with each other.

FIG. 1 is a flowchart of a method for processing voice call quality according to an embodiment of the present invention. As shown in FIG. 1, the method includes the following steps:

Step S102: Set a plurality of scenarios in the mobile terminal, where each scenario corresponds to a set of audio parameters.

Step S104: Set the mobile terminal using the audio parameters corresponding to one of the plurality of scenarios.

These steps set a plurality of scenarios in the mobile terminal and so give the user multiple choices. The user can then select a suitable scenario according to his or her own needs and obtain satisfactory voice call quality.

An existing mobile terminal cannot determine the user's gender, age group, or the location where the terminal is used. The following preferred embodiment draws on psychoacoustics and physiological acoustics. It uses the hearing characteristics of users of different genders or age groups, together with the spectrum of their call speech, to determine how well different users can hear. Suitable scenarios are then set on that basis. For example, the audio parameters corresponding to a scenario are determined according to at least one of the following: the age of the user, the gender of the user, and the environment in which the mobile terminal is used for the call.

In this preferred embodiment, the user can select a separate voice call configuration according to his or her age, gender, and the environment in which the call is made. This gives the user more voice call quality options and significantly improves voice call quality without the user having to replace the mobile terminal.
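Steps S102 and S104 can be sketched as two parts: a table that maps each scenario name to a parameter set, and a function that copies the chosen set into the terminal's configuration. This is a minimal sketch, not the patent's implementation. The scenario names, the two gain fields, their values, and the dictionary standing in for the DSP registers are all hypothetical. A real parameter set would carry every item listed earlier: AGC, digital and analog gains, filter settings, and echo-canceller settings.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AudioParams:
    """One scenario's parameter set (hypothetical fields, gains in dB)."""
    tx_gain_db: float  # transmit-direction gain
    rx_gain_db: float  # receive-direction gain


# Step S102: a plurality of scenarios, each mapped to one set of audio parameters.
# Names and values are illustrative only.
SCENARIOS: dict[str, AudioParams] = {
    "default": AudioParams(tx_gain_db=0.0, rx_gain_db=0.0),
    "meeting_room": AudioParams(tx_gain_db=-3.0, rx_gain_db=-2.0),
    "road": AudioParams(tx_gain_db=2.0, rx_gain_db=4.0),
}


def apply_scenario(name: str, registers: dict[str, float]) -> AudioParams:
    """Step S104: configure the terminal with one scenario's parameters.

    Unknown names fall back to the default set; `registers` is a stand-in
    for the terminal's DSP registers.
    """
    params = SCENARIOS.get(name, SCENARIOS["default"])
    registers["tx_gain_db"] = params.tx_gain_db
    registers["rx_gain_db"] = params.rx_gain_db
    return params
```

For example, after `regs = {}` and `apply_scenario("road", regs)`, `regs` holds the road scenario's two gains.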
This embodiment also provides a preferred method for determining the audio parameters corresponding to a scenario. Other methods may of course be used, but this one is relatively easy to implement. The method includes the following steps:

Step S1: Collect audio samples corresponding to the scenario.
Step S2: Test in a standard anechoic chamber using the audio samples.
Step S3: Determine the audio parameters corresponding to the scenario according to the test results.

Preferably, the audio parameters include at least one of the following: a parameter for controlling the magnitude of the sound gain in the transmit and/or receive direction, a parameter for adjusting the digital gain on the transmit and/or receive channel, a parameter for adjusting the analog gain on the transmit and/or receive channel, a parameter for modulating the frequency response of the transmitted and/or received speech, a parameter for suppressing the transmission of background noise, and a parameter for enhancing double-talk performance.

To make implementation in the mobile terminal more convenient, the audio parameters corresponding to one of the plurality of scenarios can be written into a digital signal processing (DSP) register. The mobile terminal is then configured according to the audio parameters in that register.

According to the above embodiment and its preferred embodiments, several usage scenario modes are preset in the mobile terminal. The user selects a scenario according to his or her gender, age group, and the location and environment of the voice call. The mobile terminal then dynamically writes the audio parameters tuned for the selected scenario into its DSP.
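Steps S1 to S3 amount to a search over candidate parameter sets. This is a minimal sketch under stated assumptions. The chamber test is wrapped in a hypothetical callable `run_test(sample, params)` that returns a quality score where higher is better, for example a MOS. The exhaustive search over a fixed candidate list is also an assumption, because the source does not describe the search strategy.

```python
from typing import Callable, Sequence, TypeVar

P = TypeVar("P")


def tune_scenario(
    samples: Sequence[str],
    candidates: Sequence[P],
    run_test: Callable[[str, P], float],
) -> P:
    """Pick the candidate parameter set with the best mean test score.

    samples    -- audio samples collected for the scenario (step S1)
    candidates -- candidate parameter sets to try
    run_test   -- hypothetical hook: plays one sample through the terminal in
                  the anechoic chamber with `params` and returns a score (step S2)
    """
    if not samples or not candidates:
        raise ValueError("need at least one sample and one candidate")
    best, best_score = candidates[0], float("-inf")
    for params in candidates:
        score = sum(run_test(s, params) for s in samples) / len(samples)
        if score > best_score:  # strict '>' keeps the first of any ties
            best, best_score = params, score
    return best  # step S3: parameters adopted for this scenario
```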
This meets users' high requirements for voice call quality, reduces after-sales costs for mobile terminal manufacturers, and noticeably improves voice call quality when users use the mobile terminal.

FIG. 2 is a structural block diagram of a voice call quality processing device according to an embodiment of the present invention. The device is used to implement the foregoing embodiment and its preferred embodiments. Content already described is not repeated, and the modules of the device are described below. As shown in FIG. 2, the device includes a first setting module 10 and a second setting module 20. The first setting module 10 is configured to set a plurality of scenarios in the mobile terminal, where each scenario corresponds to a set of audio parameters. The second setting module 20 is configured to set the mobile terminal using the audio parameters corresponding to one of the plurality of scenarios.

FIG. 3 is a structural block diagram of a preferred voice call quality processing device according to an embodiment of the present invention. As shown in FIG. 3, the device further includes a parameter preset module 302. This module is configured to determine the audio parameters corresponding to a scenario according to at least one of the following: the age of the user, the gender of the user, and the environment in which the mobile terminal is used for the call. Preferably, the device further includes: a collection module 304, configured to collect audio samples corresponding to the scenario; a test module 306, configured to perform testing in a standard anechoic chamber using the audio samples; and a determining module 308, configured to determine the audio parameters corresponding to the scenario according to the test results.
Preferably, the audio parameters corresponding to the scenario set by the first setting module 10 include at least one of the following: a parameter for controlling the magnitude of the sound gain in the transmit and/or receive direction, a parameter for adjusting the digital gain on the transmit and/or receive channel, a parameter for adjusting the analog gain on the transmit and/or receive channel, a parameter for modulating the frequency response of the transmitted and/or received speech, a parameter for suppressing the transmission of background noise, and a parameter for enhancing double-talk performance. Preferably, the device further includes a writing module 310, configured to write the audio parameters corresponding to one of the plurality of scenarios into the DSP register. The mobile terminal is configured according to the audio parameters in the DSP register.

Another preferred embodiment, which combines the above embodiments and their preferred implementations, is described below. In this preferred embodiment, several types of call usage scenarios are built into the software programmed into the mobile terminal. The user selects his or her age group, gender, and the real environment of the voice call, such as home, conference room, road, beach, or bus. Users can make these settings according to their own call needs and actual usage. With these three selections (age group, gender, and environment), users can choose the voice call quality mode that suits them.

For each scenario, a separate set of audio parameters is provided in the built-in software of the mobile terminal. These parameters are written into the DSP registers in real time when the user selects the scenario, which improves voice call quality.
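Writing a scenario's parameters into the DSP in real time usually means converting each value into the fixed-point format that a register expects. This is a minimal sketch under stated assumptions. The register addresses and the signed 16-bit Q12 encoding of linear gain are hypothetical and are not specified in the source. The caller supplies the platform's actual register-write routine as `write_reg`.

```python
from typing import Callable

# Hypothetical register addresses and fixed-point format, for illustration only.
REG_TX_DIGITAL_GAIN = 0x10
REG_RX_DIGITAL_GAIN = 0x11
Q_BITS = 12  # Q12: the word 4096 represents a linear gain of 1.0 (0 dB)


def db_to_q12(gain_db: float) -> int:
    """Convert a gain in dB to a 16-bit Q12 word, saturating at the int16 maximum."""
    linear = 10 ** (gain_db / 20)  # dB -> linear amplitude ratio (always > 0)
    return min(32767, round(linear * (1 << Q_BITS)))


def write_scenario(params: dict[str, float],
                   write_reg: Callable[[int, int], None]) -> None:
    """Push one scenario's transmit/receive digital gains into the DSP."""
    write_reg(REG_TX_DIGITAL_GAIN, db_to_q12(params["tx_gain_db"]))
    write_reg(REG_RX_DIGITAL_GAIN, db_to_q12(params["rx_gain_db"]))
```

Saturating rather than wrapping on overflow keeps an overly large requested gain from turning into a very small or negative register value.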
In this preferred embodiment, the individual audio parameters of different scenarios are obtained by capturing different sound samples, for example from different age groups, different genders, and different places of use, and testing them with a test system in a standard anechoic chamber. For example, the Advanced Communication Quality Analysis (ACQUA) audio test system can be used to test the terminal while the audio parameters are adjusted in real time. The test results, such as the Mean Opinion Score (MOS) or results under an ITU-defined voice test standard, are then processed to find the audio parameters that give the best results. Next, a MOS derived from volunteers' subjective listening impressions is used to judge whether voice call quality is good. Finally, this judgment determines whether the selected audio parameters need further adjustment.

FIG. 4 is a schematic diagram of the environment used for a subjective listening test with human subjects according to an embodiment of the present invention. The arrangement shown in FIG. 4 tests the voice call quality of the mobile terminal in the transmit direction. Swapping the mobile terminal and the fixed telephone between the two rooms tests the receive direction instead. To account for the noise of different usage environments, a loudspeaker reproduces the background noise of a real scene in the room, so that voice call quality can be tested in different scenarios. In this embodiment, the adjusted audio parameters take effect when written into the DSP registers. The DSP registers and corresponding algorithms used include filters, analog gain, digital gain, echo algorithms, and the like. The audio-related modules in the mobile terminal are described below.
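The MOS used in the subjective test above is the arithmetic mean of listeners' ratings on the five-point absolute category rating (ACR) scale of ITU-T P.800, where 1 means bad and 5 means excellent. This is a minimal sketch of the decision between "quality is good enough" and "keep adjusting". The acceptance target of 4.0 is an assumption, not a value from the source.

```python
def mos(ratings: list[int]) -> float:
    """Mean Opinion Score: arithmetic mean of 1..5 ACR ratings from a listener panel."""
    if not ratings:
        raise ValueError("no ratings")
    if any(r < 1 or r > 5 for r in ratings):
        raise ValueError("ACR ratings must be in 1..5")
    return sum(ratings) / len(ratings)


def needs_more_tuning(ratings: list[int], target: float = 4.0) -> bool:
    """True if the panel's MOS falls below the (assumed) acceptance target."""
    return mos(ratings) < target
```

For instance, ratings of 4, 4, 5, and 3 give a MOS of 4.0, which meets the assumed target.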
The following modules are merely exemplary, and these functions are not limited to being implemented in these modules. FIG. 5 is a structural block diagram of the audio module of a mobile terminal according to an embodiment of the present invention. As shown in FIG. 5, the algorithms generally used in a mobile terminal to adjust voice call quality include:
- an Automatic Gain Control (AGC) module;
- digital gain;
- analog gain;
- a Finite Impulse Response (FIR) or Infinite Impulse Response (IIR) filter;
- an Echo Canceller (EC) module.

The AGC module controls the magnitude of the sound gain in the transmit and receive directions, so that sound that is too loud or too quiet does not degrade the user's subjective listening experience. It adjusts the gain according to the configured compression threshold, expansion threshold, compression slope, expansion slope, and static gain registers, and it can also filter out low-frequency noise. Both digital gain and analog gain can increase or decrease the gain on the transmit or receive channel. The FIR or IIR filter modulates the frequency response of the received or transmitted speech and can be tuned for the best result in each scenario. The EC module cancels echo during a call, and its registers can suppress the transmission of background noise and enhance double-talk performance. Each voice call quality mode or scenario needs its own adjustment of these DSP registers. The effect of each range of the sound spectrum is described below.
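The five AGC registers named above (compression threshold, expansion threshold, compression slope, expansion slope, and static gain) are commonly read as defining a static input/output level curve. Loud input above the compression threshold is compressed, quiet input below the expansion threshold is pushed further down, and levels in between receive the static gain. This sketch follows that common interpretation. The default values, the dBFS convention, and the exact curve shape are assumptions, and a real AGC also applies attack and release smoothing, which is omitted here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AgcConfig:
    """Static AGC curve; field meanings follow an assumed, common interpretation."""
    static_gain_db: float = 6.0
    compression_threshold_db: float = -20.0  # above this, output grows slowly
    compression_slope: float = 0.25          # dB out per dB in above the threshold
    expansion_threshold_db: float = -50.0    # below this, low-level signals are pushed down
    expansion_slope: float = 2.0             # dB out per dB in below the threshold


def agc_gain_db(level_db: float, cfg: AgcConfig) -> float:
    """Gain in dB that the static curve applies to an input at level_db (dBFS)."""
    if level_db > cfg.compression_threshold_db:
        out = (cfg.compression_threshold_db + cfg.static_gain_db
               + cfg.compression_slope * (level_db - cfg.compression_threshold_db))
    elif level_db < cfg.expansion_threshold_db:
        out = (cfg.expansion_threshold_db + cfg.static_gain_db
               + cfg.expansion_slope * (level_db - cfg.expansion_threshold_db))
    else:
        out = level_db + cfg.static_gain_db  # linear region: static gain only
    return out - level_db
```

With the defaults, a -30 dBFS input gets the full +6 dB, a 0 dBFS input is turned down by 9 dB, and a -60 dBFS input (likely background noise) is attenuated by 4 dB.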

60~100Hz: This range affects the richness of the sound and is the fundamental range of the bass. If this range is full, the tone sounds thick and heavy. If it is insufficient, the tone becomes weak. If it is too strong, the tone takes on a booming low-frequency resonance.

100~150Hz: This range affects the fullness of the tone. Enhancing it creates a sense of room space and thickness. If it is missing, the tone becomes thin and pale. If it is too strong, the tone sounds muddy and the clarity of the voice deteriorates.

150~300Hz: This range affects the strength of the sound, especially of male voices. It contains the low fundamental frequencies of the male voice and is also where the root notes of chords lie. If it is lacking, the tone sounds weak and unsteady and the voice becomes soft. If it is too strong, the sound becomes stiff and unnatural and lacks character.

200~500Hz: This mid band determines the strength of the sound. Boosting it by 5 to 10 dB makes the sound muddy and less sharp. Cutting it by 6 to 10 dB makes the sound weak and thin, as well as hard and narrow.

300~500Hz: This is the main range of the human voice. If its components are full, the voice sounds strong. If its amplitude is insufficient, the sound is hollow and lacks solidity. If it is too strong, the other components, especially the high frequencies, become relatively weaker. The tone then becomes monotonous, and the voice sounds like it is heard over a telephone.

500~1KHz: This is the pitch range of the human voice and an important frequency range. If it is full, the vocal contour is clear and the overall impression is good. If it is insufficient, the voice sounds contracted. If it is too strong, the voice sounds pushed forward, as if it reaches the listener ahead of the rest of the sound.

800Hz: The amplitude at this frequency affects the strength of the tone. If it is full, the tone sounds strong and powerful. If it is insufficient, the tone sounds slack, because the content below 800 Hz (the low-frequency components) becomes prominent. If it is too strong, a throaty sound is produced. Everyone's voice contains some throat sound, but too much of it degrades the sound of the voice.

1~2KHz: This range gives the sound obvious transparency and smoothness. If it is lacking, the tone sounds loose and disjointed. If it is too strong, the tone sounds jumpy.

2~3KHz: This is the band the ear is most sensitive to for the brightness of the sound. If its components are rich, the tone becomes brighter. If it is insufficient, the tone loses brightness. If it is too strong, the tone sounds stiff, hard, and unnatural.

The mid-to-high band of 1~3KHz plays an important role in brightness, sharpness, and presence:
- Boosting it by 3 to 5 dB hardens the sound.
- Boosting it by 5 to 10 dB produces a metallic sound.
- Cutting it by 3 to 5 dB makes the tone lose brightness.
- Cutting it by 5 to 10 dB makes the sound dull and unclear.
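In a DSP, a band is often boosted or cut by a given number of decibels, as described above, with a peaking ("bell") equalizer section. This sketch uses the biquad peaking filter from Robert Bristow-Johnson's Audio EQ Cookbook as one possible realization of the IIR option listed for FIG. 5. The source does not specify a filter design. The 8 kHz narrowband sample rate and the Q values in the usage are assumptions.

```python
import cmath
import math


def peaking_biquad(fs: float, f0: float, gain_db: float, q: float):
    """Peaking-EQ biquad (RBJ Audio EQ Cookbook), returned as (b, a) with a[0] == 1.

    gain_db > 0 boosts the band around f0, gain_db < 0 cuts it; q sets the bandwidth.
    """
    amp = 10 ** (gain_db / 40)  # "A" in the cookbook
    w0 = 2 * math.pi * f0 / fs
    alpha = math.sin(w0) / (2 * q)
    cos_w0 = math.cos(w0)
    b = [1 + alpha * amp, -2 * cos_w0, 1 - alpha * amp]
    a = [1 + alpha / amp, -2 * cos_w0, 1 - alpha / amp]
    a0 = a[0]
    return [x / a0 for x in b], [x / a0 for x in a]


def response_db(b, a, fs: float, f: float) -> float:
    """Magnitude response of the biquad at frequency f, in dB."""
    z1 = cmath.exp(-2j * math.pi * f / fs)  # z^-1 evaluated on the unit circle
    num = b[0] + b[1] * z1 + b[2] * z1 ** 2
    den = a[0] + a[1] * z1 + a[2] * z1 ** 2
    return 20 * math.log10(abs(num / den))


def filter_samples(b, a, x):
    """Run the biquad over a list of samples (direct form I)."""
    y, x1, x2, y1, y2 = [], 0.0, 0.0, 0.0, 0.0
    for xn in x:
        yn = b[0] * xn + b[1] * x1 + b[2] * x2 - a[1] * y1 - a[2] * y2
        y.append(yn)
        x2, x1 = x1, xn
        y2, y1 = y1, yn
    return y
```

For example, `peaking_biquad(8000, 2000, 3.0, 1.0)` gives a +3 dB boost centered at 2 kHz while leaving DC at 0 dB. A negative gain centered near 3.5 kHz would give the kind of 3000~4000 Hz cut for female voices discussed in this section.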

3~4KHz: This range carries very well. The resonant frequency of the human ear canal lies between 1 and 4 kHz, so the ear is also very sensitive to this range. If this component is too weak, intelligibility drops and the voice sounds blurred. If it is too strong, it produces a coughing sensation.

Regarding the gender of the mobile terminal holder: a typical male voice has a spectrum rich in low frequencies, which makes the sound dull at the far end of the call (the called end). In this case, the FIR or IIR filter can adjust the low-frequency band by raising the spectral gain between 100 and 500 Hz to improve call quality. A female voice has a spectrum rich in high frequencies, which makes the sound sharp and harsh at the called end. Here the FIR or IIR filter can suppress the spectral gain between 3000 and 4000 Hz to some extent to improve call quality.

Regarding the age of the mobile terminal holder: people are generally divided into the elderly, the middle-aged, the young, and children. Hearing ability differs between these groups, so their sound spectra differ and each group needs different FIR and IIR filter parameters. For example, elderly users often have poorer hearing, so the receive strength of the receive channel needs to be increased. If the receiver (earpiece) supports hearing aid compatibility (HAC), the handset's HAC function can be turned on. The volume must still stay within acceptable limits. It must exceed neither the human threshold of pain nor the power range of the earpiece or speaker.
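The limit on the receive-path boost described above can be expressed as a clamp. The boost requested for a hearing-impaired user is cut back so that the resulting level stays below both the threshold of pain and the earpiece's maximum output. This is a minimal sketch that assumes all levels are expressed in dB SPL. The function name and parameters are hypothetical, and the patent gives no numeric limits.

```python
def rx_boost_db(requested_db: float,
                nominal_spl_db: float,
                pain_threshold_spl_db: float,
                earpiece_max_spl_db: float) -> float:
    """Receive-path boost actually applied, clamped to the available headroom.

    nominal_spl_db        -- output level before the boost (assumed known per device)
    pain_threshold_spl_db -- level that must not be exceeded for hearing safety
    earpiece_max_spl_db   -- maximum level the earpiece/speaker can deliver
    """
    ceiling_spl_db = min(pain_threshold_spl_db, earpiece_max_spl_db)
    headroom_db = ceiling_spl_db - nominal_spl_db
    # Never exceed the headroom, and never turn a "boost" into an attenuation.
    return max(0.0, min(requested_db, headroom_db))
```

With illustrative values of a 95 dB SPL nominal level, a 120 dB SPL pain limit, and a 110 dB SPL earpiece limit, a requested 20 dB boost is reduced to 15 dB.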
Regarding the usage environment of the mobile terminal holder: on a road, for example, heavy traffic makes the noise relatively loud. The noise suppression algorithm can then suppress vehicle noise more strongly, and the AGC can filter and specially process the frequency band of road noise to achieve the best call quality.

FIG. 6 is a schematic diagram of scenario relationships based on gender, age, and place of use according to an embodiment of the present invention. Based on FIG. 6, about 40 usage scenarios can be provided, and more can be added as actual needs require. In other words, covering the gender, age group, and usage location of mobile terminal users described above requires approximately 40 audio call scenario configurations. The audio lab rigorously tests each scenario configuration with different audio samples and verifies it in the actual scene to ensure voice call quality.

When the user makes no selection, the mobile terminal uses a default set of audio parameters. When the user selects his or her gender, age group, and call environment, the mobile terminal invokes the matching scenario configuration. This improves voice call quality, makes it far more flexible to improve voice call quality on the mobile terminal, and is very convenient for the user. The above embodiments meet users' requirements for voice call quality, reduce costs for mobile terminal manufacturers, and significantly improve voice call quality when users use the mobile terminal.
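The text names two genders, four age groups, and five call environments (home, conference room, road, beach, bus). Their combinations give exactly 2 × 4 × 5 = 40 scenarios, consistent with the "about 40" configurations mentioned. Whether FIG. 6 uses exactly this grid is an assumption. This sketch enumerates the grid and falls back to the default parameter set when the user has not made a selection. Falling back on a partial selection as well is another assumption. The identifiers are hypothetical.

```python
from itertools import product

# Assumed dimensions, taken from the categories named in the description.
GENDERS = ("male", "female")
AGE_GROUPS = ("elderly", "middle_aged", "young", "child")
ENVIRONMENTS = ("home", "meeting_room", "road", "beach", "bus")

# Every (gender, age group, environment) combination is one scenario.
SCENARIO_KEYS = frozenset(product(GENDERS, AGE_GROUPS, ENVIRONMENTS))


def select_scenario(gender=None, age_group=None, environment=None):
    """Return the scenario key for the user's choices, or "default" if incomplete."""
    key = (gender, age_group, environment)
    return key if key in SCENARIO_KEYS else "default"
```

A complete selection such as `select_scenario("female", "young", "bus")` returns its own key. Calling `select_scenario()` with no choices returns `"default"`, which matches the default audio parameters used when the user makes no selection.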
Those skilled in the art should understand that the above modules or steps of the present invention can be implemented by a general-purpose computing device. They can be concentrated on a single computing device or distributed over a network of multiple computing devices. Optionally, they may be implemented as program code executable by a computing device, stored in a storage device, and executed by that computing device. Alternatively, they may be made into individual integrated circuit modules, or several of the modules or steps may be combined into a single integrated circuit module. The present invention is therefore not limited to any specific combination of hardware and software.

The above describes only the preferred embodiments of the present invention and is not intended to limit it. Those skilled in the art can make various modifications and changes to the present invention. Any modification, equivalent substitution, improvement, or the like made within the spirit and principles of the present invention shall fall within the scope of protection of the present invention.

Claims

Claim

1. A method for processing voice call quality, comprising:

Setting a plurality of scenarios in the mobile terminal, where each scenario corresponds to a set of audio parameters;

Setting the mobile terminal using the audio parameters corresponding to one of the plurality of scenarios.

2. The method according to claim 1, wherein the audio parameter corresponding to the scene is determined according to at least one of the following:

The age group of the user, the gender of the user, and the environment in which the mobile terminal is in a call.

3. The method of claim 1 or 2, wherein the following steps are used to determine audio parameters corresponding to the scene:

Collecting audio samples corresponding to the scene;

Testing the audio sample in a standard anechoic chamber;

The audio parameters corresponding to the scene are determined according to the test results.

4. The method according to claim 1 or 2, wherein the audio parameters comprise at least one of the following:

Parameters for controlling the magnitude of the sound gain in the transmit and/or receive directions, parameters for adjusting the digital gain on the transmit and/or receive channels, parameters for adjusting the analog gain on the transmit and/or receive channels, parameters for modulating the frequency response of the transmitted and/or received speech, parameters for suppressing the transmission of background noise, and parameters for enhancing double-talk performance.

5. The method according to claim 1 or 2, wherein setting the mobile terminal using the audio parameters corresponding to one of the plurality of scenarios comprises:

Writing the audio parameters corresponding to one of the plurality of scenarios into a digital signal processing (DSP) register, and setting the mobile terminal according to the audio parameters in the DSP register.

6. A processing device for voice call quality, located in a mobile terminal, comprising:

a first setting module, configured to set a plurality of scenarios, where each scenario corresponds to a set of audio parameters; and

a second setting module, configured to set the mobile terminal using the audio parameters corresponding to one of the plurality of scenarios.

7. The device according to claim 6, wherein the device further comprises:

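a parameter preset module, configured to determine the audio parameters corresponding to the scene according to at least one of the following:

the age group of the user, the gender of the user, and the environment in which the mobile terminal is in a call.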

8. The device according to claim 6 or 7, wherein the device further comprises:

a collection module, configured to collect audio samples corresponding to the scene;

a test module, configured to perform testing in the standard anechoic chamber using the audio samples; and

a determining module, configured to determine the audio parameters corresponding to the scene according to the test results.

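9. The device according to claim 6 or 7, wherein the audio parameters corresponding to the scene set by the first setting module comprise at least one of the following:

parameters for controlling the magnitude of the sound gain in the transmit and/or receive directions, parameters for adjusting the digital gain on the transmit and/or receive channels, parameters for adjusting the analog gain on the transmit and/or receive channels, parameters for modulating the frequency response of the transmitted and/or received speech, parameters for suppressing the transmission of background noise, and parameters for enhancing double-talk performance.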

10. The device according to claim 6 or 7, wherein the device further comprises:

a writing module, configured to write the audio parameters corresponding to one of the plurality of scenarios into a digital signal processing (DSP) register, wherein the mobile terminal is set according to the audio parameters in the DSP register.