CN107819964B

CN107819964B - Method, device, terminal and computer readable storage medium for improving call quality

Info

Publication number: CN107819964B
Application number: CN201711125884.3A
Authority: CN
Inventors: 杨宗业
Original assignee: Guangdong Oppo Mobile Telecommunications Corp Ltd
Current assignee: Guangdong Oppo Mobile Telecommunications Corp Ltd
Priority date: 2017-11-10
Filing date: 2017-11-10
Publication date: 2021-04-06
Anticipated expiration: 2037-11-10
Also published as: CN107819964A

Abstract

The invention belongs to the technical field of voice communication, and in particular relates to a method, a device, a terminal and a computer-readable storage medium for improving call quality. The method comprises: when a user terminal is in a hands-free call state, acquiring an uplink signal of an uplink channel and a downlink signal of a downlink channel, and determining a call mode of the user terminal according to the uplink signal and the downlink signal; if the user terminal is in a double-talk mode, identifying whether the downlink signal is a non-voice signal; and if the downlink signal is identified as a non-voice signal, attenuating the gain of the downlink channel. In this way echo is suppressed, and both the sound quality of the uplink voice and the overall call quality are improved.

Description

Method, device, terminal and computer readable storage medium for improving call quality

Technical Field

The invention belongs to the technical field of voice communication, and in particular relates to a method, a device, a terminal and a computer-readable storage medium for improving call quality.

Background

In the prior art, echo often occurs in voice double-talk scenarios such as hands-free calls, video chat and WeChat voice calls. Echo arises because the microphone of the local-end talker picks up the local-end talker's voice signal and, at the same time, the non-voice signal that comes from the opposite-end talker and is played by the loudspeaker. When the local end sends everything the microphone collects to the opposite-end talker, the opposite-end talker hears the local-end talker's voice together with this non-voice signal, i.e. the echo. To keep the non-voice signal from degrading call quality, everything the microphone collects must undergo echo suppression. However, prior-art echo suppression also affects the local-end talker's voice signal: while removing the echo, it damages that voice signal to some extent and causes signal distortion, so the opposite-end talker hears intermittent sound and voice call quality suffers.

Disclosure of Invention

Embodiments of the present invention provide a method, an apparatus, a terminal and a computer-readable storage medium for improving call quality, which prevent non-voice signals from degrading call quality and improve the sound quality of the uplink voice.

A first aspect of an embodiment of the present invention provides a method for improving call quality, including:

when a user terminal is in a hands-free call state, acquiring an uplink signal of an uplink channel and a downlink signal of a downlink channel, and judging a call mode of the user terminal according to the uplink signal and the downlink signal;

if the user terminal is in a double-talk mode, identifying whether the downlink signal is a non-voice signal;

and if the downlink signal is identified to be a non-voice signal, attenuating the gain of the downlink channel.

A second aspect of the embodiments of the present invention provides an apparatus for improving call quality, including:

a call mode judging unit, configured to, when a user terminal is in a hands-free call state, acquire an uplink signal of an uplink channel and a downlink signal of a downlink channel, and judge a call mode of the user terminal according to the uplink signal and the downlink signal;

an identification unit, configured to identify whether the downlink signal is a non-voice signal if the user terminal is in a double-talk mode; and

an attenuation unit, configured to attenuate the gain of the downlink channel if the downlink signal is identified as a non-voice signal.

A third aspect of the embodiments of the present invention provides a terminal, including a memory, a processor, and a computer program stored in the memory and executable on the processor, where the processor implements the steps of the method when executing the computer program.

A fourth aspect of the embodiments of the present invention provides a computer-readable storage medium, which stores a computer program, and when the computer program is executed by a processor, the computer program implements the steps of the above method.

In the embodiments of the invention, when the user terminal is in double-talk mode, it is identified whether the downlink signal is a non-voice signal, and if it is, the gain of the downlink channel is attenuated. As a result, when the microphone of the local-end talker collects the local-end talker's voice signal, it does not also pick up the non-voice signal played by the loudspeaker, so echo is suppressed and the sound quality of the uplink voice is improved. In addition, because only the gain of the downlink channel is attenuated, and only when the downlink signal is a non-voice signal, the local-end talker's voice signal is not damaged, so the opposite-end talker does not hear intermittent sound and call quality is improved.

Drawings

In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings that are needed in the description of the embodiments or the prior art will be briefly described below, it should be understood that the following drawings only illustrate some embodiments of the present invention and therefore should not be considered as limiting the scope, and that for a person skilled in the art, other relevant drawings can be obtained from these drawings without inventive effort.

Fig. 1 is a schematic structural diagram of a user terminal according to an embodiment of the present invention;

Fig. 2 is an implementation flowchart of a method for improving call quality according to an embodiment of the present invention;

Fig. 3 is a flowchart of a specific implementation of step S202 of the method for improving call quality according to the second embodiment of the present invention;

Fig. 4 is a flowchart of a specific implementation of step S202 of the method for improving call quality according to the third embodiment of the present invention;

Fig. 5 is a schematic structural diagram of an apparatus for improving call quality according to the fourth embodiment of the present invention;

Fig. 6 is a schematic structural diagram of a terminal according to an embodiment of the present invention.

Detailed Description

In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments of the present invention without making any creative effort, shall fall within the protection scope of the present invention.

Fig. 1 shows a schematic structural diagram of a user terminal according to an embodiment of the present invention. The user terminal may be a smartphone or a tablet (pad) and includes dual microphones and a speaker 103. The dual microphones comprise a primary microphone 101 and a secondary microphone 102. During a call, the primary microphone 101 mainly collects the user's voice, and the secondary microphone 102 mainly collects ambient noise. After the processor of the smartphone or tablet processes the different sounds collected by the primary microphone 101 and the secondary microphone 102, it generates a sound wave opposite to the ambient noise in order to cancel it.

However, when the user terminal is in a hands-free call state, for example in voice double-talk scenarios such as hands-free calls, video chat, WeChat voice or in-game voice chat, echo often occurs. That is, the microphone of the local-end talker collects not only the local-end talker's voice signal but also the non-voice signal transmitted from the opposite-end talker and played by the speaker 103. When the local end sends everything the microphone collects to the opposite-end talker, the opposite-end talker hears the local-end talker's voice together with the non-voice signal, i.e. the echo, which degrades call quality.

The louder the hands-free downlink sound, the more demanding echo suppression becomes. In the prior art, echo suppression is applied to everything collected by the local-end talker's microphone, which also affects the local-end talker's voice signal: while the echo is removed, the voice signal is damaged to some extent, causing signal distortion, so the opposite-end talker hears intermittent sound and voice call quality suffers.

To solve this problem, in the method for improving call quality provided by the embodiments of the present invention, when the user terminal is in double-talk mode, it is identified whether the downlink signal is a non-voice signal, and if it is, the gain of the downlink channel is attenuated. Consequently, when the local-end talker's microphone collects the local-end talker's voice signal, it does not pick up the non-voice signal played by the speaker, which suppresses echo and improves the sound quality of the uplink voice. Moreover, because only the gain of the downlink channel is attenuated, and only when the downlink signal is a non-voice signal, the local-end talker's voice signal is not affected, so the opposite-end talker does not hear intermittent sound and call quality is improved.

Example one

Fig. 2 shows a flowchart of an implementation of a method for improving call quality, which is applied to a user terminal and includes steps S201 to S203.

In S201, when the user terminal is in the hands-free call state, an uplink signal of an uplink channel and a downlink signal of a downlink channel are obtained, and a call mode of the user terminal is determined according to the uplink signal and the downlink signal.

When the user terminal is in a hands-free call state, acquiring the uplink signal of the uplink channel and the downlink signal of the downlink channel means acquiring these two signals for the talker at one end of the call; the embodiments of the present invention are described using the local-end talker as an example.

Determining the call mode of the user terminal according to the uplink signal and the downlink signal means determining whether the user terminal is in double-talk mode or single-talk mode by checking whether the uplink channel and the downlink channel are transmitting signals simultaneously. Double-talk mode is a call mode in which the uplink channel and the downlink channel transmit signals simultaneously; single-talk mode is a call mode in which only one of the uplink channel and the downlink channel transmits signals.

In some embodiments of the present invention, determining the call mode of the user terminal according to the uplink signal and the downlink signal includes: acquiring an uplink decibel value of the uplink signal and a downlink decibel value of the downlink signal; if the uplink decibel value exceeds a first preset threshold and the downlink decibel value exceeds a second preset threshold, determining that the user terminal is in double-talk mode; and if the uplink decibel value is less than or equal to the first preset threshold, or the downlink decibel value is less than or equal to the second preset threshold, determining that the user terminal is in single-talk mode.

The first preset threshold and the second preset threshold may be equal or different, and are set according to the actual application, for example to a value between 5 dB and 10 dB; this is not limited herein.
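The decibel test of S201 can be sketched as below. This is a minimal Python sketch under stated assumptions, not the patent's implementation: each frame's level is taken as its RMS value in dB relative to a hypothetical reference amplitude `ref` (for example an estimated noise floor), because the text does not say what the decibel values are measured against, and the 6 dB defaults are just one choice inside the 5 to 10 dB example range. The names `level_db` and `call_mode` are hypothetical.

```python
import math
from typing import Sequence

DOUBLE_TALK = "double-talk"
SINGLE_TALK = "single-talk"


def level_db(frame: Sequence[float], ref: float = 1e-3) -> float:
    """RMS level of one frame in dB relative to a hypothetical reference amplitude `ref`."""
    if not frame:
        return float("-inf")
    rms = math.sqrt(sum(x * x for x in frame) / len(frame))
    if rms == 0.0:
        return float("-inf")  # silence: below any finite threshold
    return 20.0 * math.log10(rms / ref)


def call_mode(uplink: Sequence[float], downlink: Sequence[float],
              first_threshold_db: float = 6.0,
              second_threshold_db: float = 6.0) -> str:
    """Double-talk only if BOTH channels exceed their preset thresholds (S201)."""
    up_db = level_db(uplink)
    down_db = level_db(downlink)
    if up_db > first_threshold_db and down_db > second_threshold_db:
        return DOUBLE_TALK
    # At or below either threshold: classified as single-talk.
    return SINGLE_TALK
```

Any frame pair that is not double-talk is reported as single-talk, mirroring the two-way classification in the text even when both channels happen to be quiet.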

Because the echo problem arises only in double-talk mode, the embodiments of the invention perform echo suppression when the user terminal is in double-talk mode.

In S202, if the user terminal is in double-talk mode, it is identified whether the downlink signal is a non-voice signal.

A non-voice signal is a downlink signal that does not contain the opposite-end talker's voice, i.e. a downlink signal that contains only sounds other than the opposite-end talker's voice. For example, when the opposite-end talker is not speaking, the local end still receives a signal transmitted from the opposite end, and that signal is a non-voice signal; such a signal may contain the local-end talker's voice, animal sounds, or sound from devices such as a television.

In S203, if the downlink signal is identified as a non-voice signal, the gain of the downlink channel is attenuated.

While collecting the local-end talker's voice signal, the local-end talker's microphone also collects the non-voice signal played by the loudspeaker. When the local end sends all of these sounds to the opposite-end talker, the opposite-end talker hears the non-voice signal together with the local-end talker's voice. The non-voice signal is therefore a signal that should not be transmitted to the opposite end. Moreover, because the non-voice signal does not contain the opposite-end talker's voice, it is merely noise to the local-end talker.

Therefore, when the downlink signal is determined to be a non-voice signal, the invention attenuates the gain of the downlink channel, so that the local-end talker's microphone does not pick up the non-voice signal played by the loudspeaker while collecting the local-end talker's voice, and the opposite-end talker hears the local-end talker's voice without the non-voice signal, thereby improving the sound quality of the uplink voice.
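Step S203 amounts to scaling the downlink samples before they reach the loudspeaker. The sketch below assumes a fixed attenuation of -20 dB applied per frame; the attenuation amount and the function names are hypothetical, since the patent does not specify them.

```python
from typing import List, Sequence


def db_to_linear(gain_db: float) -> float:
    """Convert an amplitude gain in dB to a linear factor."""
    return 10.0 ** (gain_db / 20.0)


def apply_downlink_gain(frame: Sequence[float], is_non_voice: bool,
                        attenuation_db: float = -20.0) -> List[float]:
    """Attenuate a downlink frame classified as non-voice; pass voice frames through unchanged."""
    if not is_non_voice:
        return list(frame)
    g = db_to_linear(attenuation_db)
    return [x * g for x in frame]
```

Switching the gain abruptly between frames can produce audible clicks, so a practical version would probably ramp the gain over a few milliseconds; that smoothing is an assumption and is not described in the patent.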

Example two

This embodiment further specifies how S202 of the first embodiment identifies whether the downlink signal is a non-voice signal.

In this embodiment, a non-voice signal is a downlink signal that does not contain the opposite-end talker's voice. For example, when the opposite-end talker is not speaking, the local end still receives a signal transmitted from the opposite end, and that signal is a non-voice signal; such a signal may contain the local-end talker's voice, animal sounds, or sound from devices such as a television.

As shown in Fig. 3, identifying whether the downlink signal is a non-voice signal includes steps S301 to S303.

In S301, the sound spectra of the different sound sources contained in the downlink signal are separated.

The sound spectrum is a physical parameter for distinguishing different sound sources, and can be displayed by using an electrical instrument.

Since a non-voice signal may contain sounds produced by various people or objects, such as people, animals and televisions, the sound spectra of the different sound sources can be separated and then used to determine whether the downlink signal is a non-voice signal.
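The patent does not state how S301 separates the sources, so the sketch below covers only the representation such a separation would operate on: the magnitude spectrum of one Hann-windowed frame. A naive O(N^2) DFT keeps it dependency-free; in practice `numpy.fft.rfft` would normally be used, and the separation step itself (for example some form of blind source separation) is not shown. The function names are hypothetical.

```python
import cmath
import math
from typing import List, Sequence


def hann(n: int) -> List[float]:
    """Symmetric Hann window of length n."""
    if n == 1:
        return [1.0]
    return [0.5 - 0.5 * math.cos(2.0 * math.pi * i / (n - 1)) for i in range(n)]


def magnitude_spectrum(frame: Sequence[float]) -> List[float]:
    """Magnitudes of DFT bins 0..N/2 of a Hann-windowed frame (naive DFT)."""
    n = len(frame)
    x = [s * w for s, w in zip(frame, hann(n))]
    spectrum = []
    for k in range(n // 2 + 1):
        acc = sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        spectrum.append(abs(acc))
    return spectrum
```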

In S302, it is determined whether the sound spectra include the voice spectrum of the opposite-end talker.

Optionally, before determining whether the sound spectra include the voice spectrum of the opposite-end talker, the method includes collecting and storing the voice spectrum of the opposite-end talker.

For example, the opposite-end talker's voice spectrum is collected and stored when the voice call begins, and for the rest of the call the stored spectrum is used to identify whether the sound spectra include the opposite-end talker's voice spectrum.
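One simple way to form the stored voice spectrum is to average the magnitude spectra of frames captured at the start of the call. This is an assumption for illustration: the patent only says the spectrum is collected and stored, and practical voiceprint systems usually rely on richer features. `enroll_template` is a hypothetical name, and each input spectrum is assumed to be a per-frame magnitude spectrum with a fixed number of bins.

```python
from typing import List, Sequence


def enroll_template(frame_spectra: Sequence[Sequence[float]]) -> List[float]:
    """Average per-bin magnitudes over enrollment frames to form a stored voice template."""
    if not frame_spectra:
        raise ValueError("need at least one spectrum")
    n_bins = len(frame_spectra[0])
    if any(len(s) != n_bins for s in frame_spectra):
        raise ValueError("all spectra must have the same number of bins")
    count = len(frame_spectra)
    return [sum(s[k] for s in frame_spectra) / count for k in range(n_bins)]
```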

It should be noted that the opposite-end talker's voice spectrum may be stored in the local-end memory together with the opposite-end talker's identity information, so that in the next call the pre-stored voice spectrum can be used to identify whether the sound spectra of that call include the opposite-end talker's voice spectrum. The identity information may include the opposite-end talker's name, phone number, game ID, WeChat ID or another identification code, which is not limited herein.

However, to save local-end memory, the opposite-end talker's voice spectrum need not be stored long-term: for example, the storage space holding it may be released when the call ends, or after a set period such as a week, a month or a quarter, which is not limited herein.
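The storage policy just described (key the template by identity, release it when the call ends or after a retention period) can be sketched as a small cache. The class name, the use of wall-clock seconds, and lazy expiry on lookup are all assumptions; the patent only names the release conditions. The clock is injectable so expiry can be exercised without waiting.

```python
import time
from typing import Callable, Dict, List, Optional, Tuple


class VoiceprintStore:
    """Hypothetical cache of opposite-end voice templates keyed by identity (e.g. phone number)."""

    def __init__(self, retention_s: float, clock: Callable[[], float] = time.time):
        self._retention_s = retention_s
        self._clock = clock
        self._entries: Dict[str, Tuple[List[float], float]] = {}

    def put(self, identity: str, template: List[float]) -> None:
        """Store a copy of the template with its storage time."""
        self._entries[identity] = (list(template), self._clock())

    def get(self, identity: str) -> Optional[List[float]]:
        """Return the template, or None if absent or older than the retention period."""
        entry = self._entries.get(identity)
        if entry is None:
            return None
        template, stored_at = entry
        if self._clock() - stored_at > self._retention_s:
            del self._entries[identity]  # expired: free the storage
            return None
        return template

    def release(self, identity: str) -> None:
        """Free the entry immediately, e.g. when the call ends."""
        self._entries.pop(identity, None)
```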

In some embodiments of the present invention, determining whether the sound spectra include the voice spectrum of the opposite-end talker includes: matching each sound spectrum against the opposite-end talker's voice spectrum; if any sound spectrum matches it, determining that the sound spectra include the opposite-end talker's voice spectrum; and if no sound spectrum matches it, determining that the sound spectra do not include the opposite-end talker's voice spectrum.

Matching the sound spectra against the opposite-end talker's voice spectrum includes matching the spectral features of each sound spectrum against the spectral features of the opposite-end talker's voice spectrum, which improves matching efficiency.
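The matching in S302 and the resulting decision in S303 can be sketched as follows. The patent only says that spectral features are matched; using cosine similarity between magnitude spectra and a 0.9 match threshold are assumptions, and the function names are hypothetical.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two spectra (0.0 if either is all zeros)."""
    if len(a) != len(b):
        raise ValueError("spectra must have the same number of bins")
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    if na == 0.0 or nb == 0.0:
        return 0.0
    return dot / (na * nb)


def is_non_voice_example_two(source_spectra: Sequence[Sequence[float]],
                             far_end_template: Sequence[float],
                             threshold: float = 0.9) -> bool:
    """S302/S303: non-voice if NO separated source matches the stored opposite-end template."""
    return not any(cosine_similarity(s, far_end_template) >= threshold
                   for s in source_spectra)
```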

In S303, if the sound spectra do not include the opposite-end talker's voice spectrum, the downlink signal is identified as a non-voice signal.

When the sound spectra do not include the opposite-end talker's voice spectrum, the opposite-end talker is not speaking, so the downlink signal carries no information the local-end talker needs and can be identified as a non-voice signal. The gain applied to the downlink signal can then be attenuated, for example by attenuating its amplitude.

As a result, the local-end talker's microphone does not pick up the non-voice signal played by the speaker while collecting the local-end talker's voice, so the opposite-end talker hears the local-end talker's voice without the non-voice signal, which improves the sound quality of the uplink voice.

Example three

As shown in Fig. 4, identifying whether the downlink signal is a non-voice signal includes steps S401 to S404.

In S401, the sound spectra of the different sound sources contained in the downlink signal are separated.

It should be noted that the implementation of S401 is the same as that of S301 in the second embodiment, and details are not repeated here.

In S402, it is determined whether the sound spectra include a human voice spectrum.

Because human voice spectra are more periodic than noise such as machine noise, it is relatively easy to determine whether the sound spectra include a human voice spectrum.
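The periodicity cue in S402 can be approximated, for the time-domain signal of one separated source, by the peak of its normalized autocorrelation over lags that correspond to typical voice pitch. The 80 to 400 Hz pitch range, the 0.5 threshold and the function names are assumptions; tonal non-speech sounds such as music would also score as periodic, so this is only a rough proxy and not the patent's stated method.

```python
import math
from typing import Sequence


def periodicity(signal: Sequence[float], sample_rate: int,
                f_min: float = 80.0, f_max: float = 400.0) -> float:
    """Peak autocorrelation, normalized by frame energy, over pitch lags for f_max..f_min Hz."""
    n = len(signal)
    if n == 0:
        return 0.0
    mean = math.fsum(signal) / n
    x = [s - mean for s in signal]  # remove DC so an offset does not look periodic
    energy = math.fsum(s * s for s in x)
    if energy == 0.0:
        return 0.0
    lag_min = max(1, int(sample_rate / f_max))      # shortest pitch period in samples
    lag_max = min(n - 1, int(sample_rate / f_min))  # longest pitch period in samples
    best = 0.0
    for lag in range(lag_min, lag_max + 1):
        r = math.fsum(x[t] * x[t + lag] for t in range(n - lag))
        best = max(best, r / energy)
    return best


def looks_like_human_voice(signal: Sequence[float], sample_rate: int,
                           threshold: float = 0.5) -> bool:
    """Treat a strongly periodic source as voiced speech (crude proxy)."""
    return periodicity(signal, sample_rate) >= threshold
```

For example, an 80 ms, 200 Hz tone sampled at 8 kHz scores about 0.94, while a single click scores close to zero.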

In S403, if the sound spectra do not include any human voice spectrum, the downlink signal is identified as a non-voice signal.

A non-voice signal is one that does not contain the opposite-end talker's voice, i.e. it contains only sounds other than the opposite-end talker's voice. Therefore, when the sound spectra contain no human voice spectrum at all, the downlink signal can be determined to be a non-voice signal.

For example, when neither the local-end talker nor the opposite-end talker is speaking, the channel carries only sound that contains no human voice spectrum.

In S404, if the sound spectra include human voice spectra, it is determined whether each human voice spectrum is the voice spectrum of the local-end talker; if every human voice spectrum is determined to be the local-end talker's voice spectrum, the downlink signal is identified as a non-voice signal; and if any human voice spectrum is determined not to be the local-end talker's voice spectrum, the downlink signal is identified as not being a non-voice signal.

Determining whether each human voice spectrum is the local-end talker's voice spectrum means identifying whether the human voice spectra consist only of the local-end talker's voice spectrum. If they do, the downlink signal is determined to be a non-voice signal. If the human voice spectra include other human voices in addition to the local-end talker's, the downlink signal is not a non-voice signal.

When human voice spectra other than the local-end talker's are present, the downlink signal may contain the opposite-end talker's voice. In this case, to avoid misjudgment, the downlink signal must not be identified as a non-voice signal; otherwise the local-end talker would not be able to hear the opposite-end talker. In other words, the downlink signal is treated as a voice signal and its gain is not attenuated.

Optionally, before identifying whether the human voice spectra are the local-end talker's voice spectrum, the method includes collecting and storing the local-end talker's voice spectrum.

For example, the local-end talker records his or her voice on the user terminal in advance and the corresponding voice spectrum is stored; during subsequent calls, the stored spectrum is used to identify whether the sound spectra include the local-end talker's voice spectrum.

In some embodiments of the present invention, identifying whether the human voice spectra are the local-end talker's voice spectrum includes: matching each human voice spectrum against the local-end talker's voice spectrum; if every human voice spectrum matches it, identifying the downlink signal as a non-voice signal; and if any human voice spectrum fails to match it, identifying the downlink signal as not being a non-voice signal.
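Once each separated source carries two flags, whether it is a human voice and whether it matches the stored local-end template, the decision rule of S402 to S404 reduces to a few lines. How the flags are produced (for example a periodicity test and spectral matching) is left open here, and the function name is hypothetical.

```python
from typing import Sequence


def is_non_voice_example_three(is_human: Sequence[bool],
                               is_local_talker: Sequence[bool]) -> bool:
    """S402-S404 decision from per-source flags.

    is_human[i]: separated source i looks like a human voice (S402).
    is_local_talker[i]: source i matches the stored local-end voice template.
    """
    if len(is_human) != len(is_local_talker):
        raise ValueError("flag lists must have the same length")
    human_sources = [local for human, local in zip(is_human, is_local_talker) if human]
    if not human_sources:
        return True  # S403: no human voice at all
    # S404: non-voice only if every human voice is the local-end talker's
    return all(human_sources)
```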

Since the local-end talker's identity is relatively fixed, this embodiment determines whether the sound spectra of the downlink signal include a human voice spectrum and, if so, whether each human voice spectrum is the local-end talker's, and thereby determines whether the downlink signal is a non-voice signal.

Compared with the second embodiment, this embodiment does not need to re-collect the opposite-end talker's voice spectrum whenever the opposite-end party changes, whereas the second embodiment offers higher judgment accuracy.

Example four

As shown in fig. 5, an apparatus 500 for improving call quality according to an embodiment of the present invention includes:

a call mode determining unit 501, configured to, when a user terminal is in a hands-free call state, obtain an uplink signal of an uplink channel and a downlink signal of a downlink channel, and determine a call mode of the user terminal according to the uplink signal and the downlink signal;

an identifying unit 502, configured to identify whether the downlink signal is a non-voice signal if the user terminal is in double-talk mode;

an attenuating unit 503, configured to attenuate the gain of the downlink channel if the downlink signal is identified as a non-voice signal.
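The apparatus 500 can be sketched as a small class that wires the three units together, with the determination and identification units injected as callables so that any detection or identification method can be plugged in. The class, field names and the default linear gain of 0.1 (-20 dB) are hypothetical, and the hands-free state check is assumed to happen before `process` is called.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence


@dataclass
class CallQualityApparatus:
    """Hypothetical composition of units 501 to 503."""
    is_double_talk: Callable[[Sequence[float], Sequence[float]], bool]  # unit 501
    is_non_voice: Callable[[Sequence[float]], bool]                     # unit 502
    attenuation_gain: float = 0.1                                       # unit 503, linear gain

    def process(self, uplink: Sequence[float], downlink: Sequence[float]) -> List[float]:
        """Return the downlink frame, attenuated only for non-voice frames during double-talk."""
        if self.is_double_talk(uplink, downlink) and self.is_non_voice(downlink):
            return [x * self.attenuation_gain for x in downlink]
        return list(downlink)
```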

It should be noted that, for convenience and brevity of description, for the specific working process of the apparatus 500 for improving call quality described above, reference may be made to the corresponding processes of the methods described with reference to Figs. 2 to 4, which are not repeated here.

Fig. 6 is a schematic diagram of a terminal according to an embodiment of the present invention. As shown in fig. 6, the terminal of this embodiment may include: a processor 601, a memory 602, an input device 603, and an output device 604, the processor 601, the memory 602, the input device 603, and the output device 604 being connected by a bus 605. The input devices 603 may include a keyboard, touchpad, fingerprint sensor, microphone, etc., and the output devices 604 may include a display, speaker, etc.

The terminal further includes a computer program stored in the memory 602 and executable on the processor 601, such as a program for improving call quality. When executing the computer program, the processor 601 implements the steps of the above method embodiments for improving call quality, such as steps S201 to S203 shown in Fig. 2; alternatively, when executing the computer program, the processor 601 implements the functions of the modules/units in the above apparatus embodiments, such as the functions of units 501 to 503 shown in Fig. 5.

Illustratively, the computer program may be partitioned into one or more modules/units that are stored in the memory 602 and executed by the processor 601 to implement the present invention. The one or more modules/units may be a series of computer program instruction segments capable of performing specific functions, which are used to describe the execution of the computer program in the terminal. For example, the computer program may be divided into a call mode determination unit, an identification unit, and an attenuation unit, and each unit may specifically function as follows: a call mode judging unit, configured to, when a user terminal is in a hands-free call state, acquire an uplink signal of an uplink channel and a downlink signal of a downlink channel, and judge a call mode of the user terminal according to the uplink signal and the downlink signal; the identification unit is used for identifying whether the downlink signal is a non-voice signal or not if the user terminal is in a double-talk mode; and the attenuation unit is used for attenuating the gain of the downlink channel if the downlink signal is identified to be a non-voice signal.

The terminal can be a mobile terminal such as a smart phone or a computing device such as a desktop computer, a notebook, a palm computer and a cloud server. The terminal may include, but is not limited to, a processor 601, a memory 602. Those skilled in the art will appreciate that fig. 6 is only an example of a terminal and is not intended to be limiting and may include more or fewer components than those shown, or some components may be combined, or different components, e.g., the terminal may also include input-output devices, network access devices, buses, etc.

The processor 601 may be a central processing unit (CPU), another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like. A general-purpose processor may be a microprocessor or any conventional processor.

The memory 602 may be an internal storage unit of the terminal, such as a hard disk or a memory of the terminal. The memory 602 may also be an external storage device of the terminal, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) Card, a Flash memory Card (Flash Card), and the like, which are provided on the terminal. Further, the memory 602 may also include both an internal storage unit and an external storage device of the terminal. The memory 602 is used for storing the computer programs and other programs and data required by the terminal. The memory 602 may also be used to temporarily store data that has been output or is to be output.

It will be apparent to those skilled in the art that, for convenience and brevity of description, only the above-mentioned division of the functional units and modules is illustrated, and in practical applications, the above-mentioned function distribution may be performed by different functional units and modules according to needs, that is, the internal structure of the apparatus is divided into different functional units or modules to perform all or part of the above-mentioned functions. Each functional unit and module in the embodiments may be integrated in one processing unit, or each unit may exist alone physically, or two or more units are integrated in one unit, and the integrated unit may be implemented in a form of hardware, or in a form of software functional unit. In addition, specific names of the functional units and modules are only for convenience of distinguishing from each other, and are not used for limiting the protection scope of the present application. The specific working processes of the units and modules in the system may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again.

In the above embodiments, the descriptions of the respective embodiments have respective emphasis, and reference may be made to the related descriptions of other embodiments for parts that are not described or illustrated in a certain embodiment.

Those of ordinary skill in the art will appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware or combinations of computer software and electronic hardware. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the implementation. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present invention.

In the embodiments provided in the present invention, it should be understood that the disclosed apparatus/terminal and method may be implemented in other ways. For example, the apparatus/terminal embodiments described above are merely illustrative: the division into modules or units is only a logical division of functions, and other divisions are possible in an actual implementation. For instance, multiple units or components may be combined or integrated into another system, or some features may be omitted or not executed. In addition, the mutual couplings, direct couplings, or communication connections shown or discussed may be indirect couplings or communication connections through some interfaces, devices, or units, and may be electrical, mechanical, or in other forms.

The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.

In addition, functional units in the embodiments of the present invention may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, and can also be realized in a form of a software functional unit.

If the integrated modules/units are implemented as software functional units and sold or used as independent products, they may be stored in a computer-readable storage medium. Based on this understanding, all or part of the flow of the methods in the embodiments of the present invention may also be implemented by a computer program, which may be stored in a computer-readable storage medium and which, when executed by a processor, implements the steps of the method embodiments. The computer program comprises computer program code, which may be in the form of source code, object code, an executable file, some intermediate form, or the like. The computer-readable medium may include any entity or device capable of carrying the computer program code, such as a recording medium, a USB flash drive, a removable hard disk, a magnetic disk, an optical disc, a computer memory, a read-only memory (ROM), a random access memory (RAM), an electrical carrier signal, a telecommunications signal, or a software distribution medium. It should be noted that what the computer-readable medium may contain can be expanded or restricted as required by legislation and patent practice in a given jurisdiction; for example, in some jurisdictions, in accordance with legislation and patent practice, computer-readable media do not include electrical carrier signals and telecommunications signals.

The above embodiments are intended only to illustrate the technical solutions of the present invention, not to limit them. Although the present invention has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art will understand that the technical solutions described in the foregoing embodiments may still be modified, or some of their technical features may be replaced by equivalents; such modifications and substitutions do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of the present invention, and shall all fall within the protection scope of the present invention.

Claims

1. A method for improving call quality, comprising:

when the user terminal is in a hands-free call state, acquiring an uplink signal of an uplink channel and a downlink signal of a downlink channel of a local-end talker, and determining a call mode of the user terminal according to the uplink signal and the downlink signal;

if the user terminal is in a double-talk mode, identifying whether the downlink signal is a non-voice signal, which comprises: separating the sound spectra of the different sound sources contained in the downlink signal; determining whether the sound spectra include the voice spectrum of the opposite-end talker; and, if none of the sound spectra includes the voice spectrum of the opposite-end talker, identifying the downlink signal as a non-voice signal; wherein the voice spectrum of the opposite-end talker is stored, together with identity information of the opposite-end talker, in a memory of the local-end talker's terminal, and the identity information of the opposite-end talker comprises the name and telephone number of the opposite-end talker; and

if the downlink signal is identified as a non-voice signal, attenuating the gain of the downlink channel.
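The overall flow of claim 1 (judge the call mode, then attenuate the downlink gain only in double-talk mode when the downlink is non-voice) can be sketched per audio frame as follows. This is a minimal sketch under stated assumptions: the patent does not say how the call mode is judged, so the energy-threshold detector, the names `CallMode`, `judge_call_mode` and `process_downlink`, the -40 dB activity threshold and the -20 dB attenuation are all hypothetical. The non-voice decision is taken as an input rather than computed here.

```python
import math
from enum import Enum
from typing import List, Sequence


class CallMode(Enum):
    IDLE = "idle"
    NEAR_END_SINGLE_TALK = "near_end_single_talk"  # only the uplink is active
    FAR_END_SINGLE_TALK = "far_end_single_talk"    # only the downlink is active
    DOUBLE_TALK = "double_talk"                    # both channels are active


def frame_energy_db(frame: Sequence[float]) -> float:
    """Mean-square energy of one frame in dB, floored to avoid log10(0)."""
    power = sum(x * x for x in frame) / max(len(frame), 1)
    return 10.0 * math.log10(max(power, 1e-12))


def judge_call_mode(uplink: Sequence[float], downlink: Sequence[float],
                    threshold_db: float = -40.0) -> CallMode:
    """Hypothetical energy-based call-mode detector (threshold is assumed)."""
    up_active = frame_energy_db(uplink) > threshold_db
    down_active = frame_energy_db(downlink) > threshold_db
    if up_active and down_active:
        return CallMode.DOUBLE_TALK
    if up_active:
        return CallMode.NEAR_END_SINGLE_TALK
    if down_active:
        return CallMode.FAR_END_SINGLE_TALK
    return CallMode.IDLE


def process_downlink(uplink: Sequence[float], downlink: Sequence[float],
                     is_non_voice: bool,
                     attenuation_db: float = -20.0) -> List[float]:
    """Attenuate the downlink frame only in double-talk mode and only when the
    separately computed non-voice decision is True; otherwise pass it through."""
    if judge_call_mode(uplink, downlink) is CallMode.DOUBLE_TALK and is_non_voice:
        gain = 10.0 ** (attenuation_db / 20.0)  # -20 dB -> amplitude x 0.1
        return [gain * x for x in downlink]
    return list(downlink)
```

A practical detector would also have to tell the local talker's speech apart from loudspeaker echo picked up on the uplink, which a plain energy threshold does not do; the sketch only shows where the gain attenuation sits in the claimed flow.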

2. The method of claim 1, wherein determining whether the sound spectra include the voice spectrum of the opposite-end talker comprises:

collecting and storing the voice spectrum of the opposite-end talker.
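Collecting and storing the opposite-end talker's voice spectrum, together with the name and telephone number that claim 1 says are stored alongside it, might look like the sketch below. It assumes the talker's speech is already available as equal-length magnitude-spectrum frames and that a running mean of those frames is an acceptable stored template; `VoiceprintRecord`, `VoiceprintStore`, `enroll` and `lookup` are hypothetical names, not part of the patent.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Sequence


@dataclass
class VoiceprintRecord:
    name: str
    phone_number: str
    spectrum: List[float]  # running mean of enrolled magnitude spectra
    frames_seen: int = 0


class VoiceprintStore:
    """Hypothetical in-memory store on the local terminal, keyed by phone number."""

    def __init__(self) -> None:
        self._records: Dict[str, VoiceprintRecord] = {}

    def enroll(self, name: str, phone_number: str,
               spectrum: Sequence[float]) -> None:
        """Add one magnitude-spectrum frame and update the per-contact mean."""
        rec = self._records.get(phone_number)
        if rec is None:
            self._records[phone_number] = VoiceprintRecord(
                name, phone_number, list(spectrum), 1)
            return
        if len(spectrum) != len(rec.spectrum):
            raise ValueError("spectrum length mismatch")
        n = rec.frames_seen + 1
        # Incremental mean: m_n = m_{n-1} + (x - m_{n-1}) / n
        rec.spectrum = [m + (x - m) / n for m, x in zip(rec.spectrum, spectrum)]
        rec.frames_seen = n

    def lookup(self, phone_number: str) -> Optional[VoiceprintRecord]:
        """Return the stored template for this caller, or None if unknown."""
        return self._records.get(phone_number)
```

Keying the store by telephone number lets the terminal fetch the right template as soon as a call is connected, before any audio has been analysed.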

3. The method of claim 1 or 2, wherein determining whether the sound spectra include the voice spectrum of the opposite-end talker comprises:

matching each sound spectrum against the voice spectrum of the opposite-end talker; if any sound spectrum successfully matches the voice spectrum of the opposite-end talker, determining that the sound spectra include the voice spectrum of the opposite-end talker; and if no sound spectrum successfully matches the voice spectrum of the opposite-end talker, determining that the sound spectra do not include the voice spectrum of the opposite-end talker.
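The matching step of claim 3, and the resulting non-voice decision of claim 1, can be sketched as below. The patent does not define what a successful match is; this sketch assumes cosine similarity between magnitude spectra with a hypothetical threshold of 0.9, and assumes the downlink has already been separated into per-source spectra. The function names are illustrative only.

```python
import math
from typing import Sequence

Spectrum = Sequence[float]


def cosine_similarity(a: Spectrum, b: Spectrum) -> float:
    """Cosine of the angle between two magnitude spectra (0.0 if either is silent)."""
    if len(a) != len(b):
        raise ValueError("spectra must have the same number of bins")
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    if na == 0.0 or nb == 0.0:
        return 0.0
    return dot / (na * nb)


def spectrum_matches(candidate: Spectrum, reference: Spectrum,
                     threshold: float = 0.9) -> bool:
    """Assumed match criterion: similarity at or above the threshold."""
    return cosine_similarity(candidate, reference) >= threshold


def contains_talker_voice(sound_spectra: Sequence[Spectrum],
                          talker_spectrum: Spectrum,
                          threshold: float = 0.9) -> bool:
    """Claim 3: True if any separated spectrum matches the stored voice spectrum."""
    return any(spectrum_matches(s, talker_spectrum, threshold)
               for s in sound_spectra)


def downlink_is_non_voice(sound_spectra: Sequence[Spectrum],
                          far_end_spectrum: Spectrum,
                          threshold: float = 0.9) -> bool:
    """Claim 1: non-voice when no separated spectrum matches the opposite-end talker."""
    return not contains_talker_voice(sound_spectra, far_end_spectrum, threshold)
```

Cosine similarity ignores overall level, so a quiet and a loud rendition of the same spectral shape score alike; a real voiceprint matcher would more likely compare speaker features than raw spectra.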

4. The method of claim 1, wherein identifying whether the downlink signal is a non-voice signal comprises:

separating the sound spectra of the different sound sources contained in the downlink signal;

determining whether the sound spectra include a human voice spectrum;

if none of the sound spectra is a human voice spectrum, identifying the downlink signal as a non-voice signal;

if the sound spectra include human voice spectra, determining whether each human voice spectrum is the voice spectrum of the local-end talker; if every human voice spectrum is determined to be the voice spectrum of the local-end talker, identifying the downlink signal as a non-voice signal; and if any human voice spectrum is determined not to be the voice spectrum of the local-end talker, identifying the downlink signal as not being a non-voice signal.
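The decision tree of claim 4 can be sketched as a small function. It assumes the sound spectra have already been separated and that a human-voice detector and a local-talker matcher are supplied as callables; the patent does not specify either, so `is_human_voice`, `matches_local_talker` and `identify_non_voice` are hypothetical.

```python
from typing import Callable, Sequence

Spectrum = Sequence[float]


def identify_non_voice(
    sound_spectra: Sequence[Spectrum],
    is_human_voice: Callable[[Spectrum], bool],
    matches_local_talker: Callable[[Spectrum], bool],
) -> bool:
    """Claim 4 decision tree: True means the downlink is treated as non-voice
    (and is therefore a candidate for gain attenuation)."""
    human_spectra = [s for s in sound_spectra if is_human_voice(s)]
    if not human_spectra:
        # No human voice at all (e.g. music, noise or tones): non-voice.
        return True
    # Every human voice present is the local-end talker's own voice (for
    # example returned from the far end): still treated as non-voice.
    # A single voice that is not the local talker's makes it a voice signal.
    return all(matches_local_talker(s) for s in human_spectra)
```

Passing the detectors in as callables keeps the claimed decision logic separate from whichever voice-activity or voiceprint technique an implementation actually uses.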

5. The method of claim 4, wherein determining whether each human voice spectrum is the voice spectrum of the local-end talker comprises:

collecting and storing the voice spectrum of the local-end talker.

6. The method of claim 4 or 5, wherein determining whether each human voice spectrum is the voice spectrum of the local-end talker comprises:

matching each human voice spectrum against the voice spectrum of the local-end talker;

if every human voice spectrum successfully matches the voice spectrum of the local-end talker, identifying the downlink signal as a non-voice signal, this identification comprising: separating the sound spectra of the different sound sources contained in the downlink signal; determining whether the sound spectra include the voice spectrum of the opposite-end talker; and, if none of the sound spectra includes the voice spectrum of the opposite-end talker, identifying the downlink signal as a non-voice signal, wherein the voice spectrum of the opposite-end talker is stored together with the identity information of the opposite-end talker in a memory of the local-end talker's terminal; and if any of the human voice spectra fails to match the voice spectrum of the local-end talker, identifying the downlink signal as not being a non-voice signal.

7. An apparatus for improving call quality, comprising:

a call mode judging unit, configured to, when a user terminal is in a hands-free call state, acquire an uplink signal of an uplink channel and a downlink signal of a downlink channel of a local-end talker, and determine a call mode of the user terminal according to the uplink signal and the downlink signal;

an identification unit, configured to, if the user terminal is in a double-talk mode, identify whether the downlink signal is a non-voice signal, which comprises: separating the sound spectra of the different sound sources contained in the downlink signal; determining whether the sound spectra include the voice spectrum of the opposite-end talker; and, if none of the sound spectra includes the voice spectrum of the opposite-end talker, identifying the downlink signal as a non-voice signal; wherein the voice spectrum of the opposite-end talker is stored together with the identity information of the opposite-end talker in a memory of the local-end talker's terminal, and the identity information of the opposite-end talker comprises the name and telephone number of the opposite-end talker;

8. A terminal device comprising a memory, a processor and a computer program stored in the memory and executable on the processor, characterized in that the processor implements the steps of the method according to any of claims 1 to 6 when executing the computer program.

9. A computer-readable storage medium, in which a computer program is stored which, when being executed by a processor, carries out the steps of the method according to any one of claims 1 to 6.