CN113824843B

CN113824843B - Voice call quality detection method, device, equipment and storage medium

Info

Publication number: CN113824843B
Application number: CN202010566405.7A
Authority: CN
Inventors: 王夏鸣
Original assignee: Volkswagen Mobvoi Beijing Information Technology Co Ltd
Current assignee: Volkswagen Mobvoi Beijing Information Technology Co Ltd
Priority date: 2020-06-19
Filing date: 2020-06-19
Publication date: 2023-11-21
Anticipated expiration: 2040-06-19
Also published as: CN113824843A

Abstract

The embodiment of the invention discloses a voice call quality detection method, device, equipment and storage medium. The method comprises the following steps: during a voice call between a first terminal and a second terminal, acquiring a first voice signal collected by the first terminal and sending the first voice signal to the second terminal so that the second terminal plays the first voice signal; acquiring a second voice signal collected by the second terminal; determining an estimated voice signal according to the first voice signal and the voice call parameters; determining a voice call state of the second terminal according to the first voice signal and the second voice signal; and, if the voice call state is unmanned call, determining a first residual signal according to the second voice signal and the estimated voice signal, and generating a sound receiving detection result of the second terminal. The embodiment of the invention keeps abnormal sound receiving during a voice call from blocking feedback about the sound receiving problem between the call parties, optimizes the existing voice call quality detection mode, and improves call efficiency.

Description

Voice call quality detection method, device, equipment and storage medium

Technical Field

The embodiment of the invention relates to the technical field of computers, in particular to a voice call quality detection method, a device, equipment and a storage medium.

Background

In current voice call systems, the system can judge call quality by detecting the network speed, the packet loss rate and the like, and can then actively prompt the party whose signal is normal that the other party cannot carry on the call normally because of a network abnormality. However, in practical use there are many scenarios of the following kind: before starting the conversation, the speaker needs to confirm with the receiving party that the receiving party's side of the call is working normally (that is, whether the receiving party can hear the speaker's voice).

In the related art, a speaker generally determines whether sound receiving is normal from the active oral feedback of the receiving party. During the call, if sound receiving at the receiving party is abnormal, the speaker cannot learn of this directly and must rely on the receiving party's active oral feedback. However, because sound receiving at the receiving party is abnormal, the communication process itself is already impaired. The problem has to occur, be noticed, and then be confirmed by both the receiving party and the speaker, so the call is suspended for a long time while the abnormality is eliminated, which reduces call efficiency.

Disclosure of Invention

The embodiment of the invention provides a voice call quality detection method, a device, equipment and a storage medium, which are used for optimizing the existing voice call quality detection mode, actively detecting abnormal sound receiving and improving call efficiency.

In a first aspect, an embodiment of the present invention provides a method for detecting voice call quality, including:

in the voice communication process of a first terminal and a second terminal, acquiring a first voice signal acquired by the first terminal, and sending the first voice signal to the second terminal so that the second terminal plays the first voice signal;

acquiring a second voice signal which is acquired by the second terminal and corresponds to the first voice signal;

determining an estimated voice signal corresponding to the first voice signal according to the first voice signal and the voice call parameters of the second terminal;

determining a voice call state of the second terminal according to the first voice signal and the second voice signal;

if the voice call state of the second terminal is unmanned call, determining a first residual signal corresponding to the second voice signal according to the second voice signal and the estimated voice signal, and generating a sound receiving detection result of the second terminal according to the first residual signal.

In a second aspect, an embodiment of the present invention further provides a device for detecting voice call quality, including:

the first signal acquisition module is used for acquiring a first voice signal acquired by a first terminal and sending the first voice signal to a second terminal in the voice communication process of the first terminal and the second terminal so that the second terminal plays the first voice signal;

the second signal acquisition module is used for acquiring a second voice signal corresponding to the first voice signal acquired by the second terminal;

an estimated signal determining module, configured to determine an estimated voice signal corresponding to the first voice signal according to the first voice signal and a voice call parameter of the second terminal;

the call state determining module is used for determining the voice call state of the second terminal according to the first voice signal and the second voice signal;

and the first result generation module is used for determining a first residual signal corresponding to the second voice signal according to the second voice signal and the estimated voice signal and generating a sound receiving detection result of the second terminal according to the first residual signal if the voice call state of the second terminal is unmanned call.

In a third aspect, an embodiment of the present invention further provides a computer device, including a memory, a processor, and a computer program stored in the memory and capable of running on the processor, where the processor implements the method for detecting voice call quality according to the embodiment of the present invention when executing the computer program.

In a fourth aspect, an embodiment of the present invention further provides a computer readable storage medium, on which a computer program is stored, where the computer program is executed by a processor to implement a voice call quality detection method according to an embodiment of the present invention.

According to the technical scheme of the embodiment of the invention, the echo signal at the receiving party during the voice call is estimated from the reference signal and compared, through an algorithm, with the echo signal actually collected at the receiving party, so as to judge whether the receiving party receives sound normally and to give a corresponding prompt to the speaking party. This keeps abnormal sound receiving during the voice call from blocking feedback about the sound receiving problem between the call parties, optimizes the existing voice call quality detection mode, and improves call efficiency.

Drawings

Fig. 1 is a flowchart of a voice call quality detection method according to an embodiment of the present invention.

Fig. 2 is a flowchart of a voice call quality detection method according to a second embodiment of the present invention.

Fig. 3 is a schematic structural diagram of a voice call quality detection apparatus according to a third embodiment of the present invention.

Fig. 4 is a schematic structural diagram of a computer device according to a fourth embodiment of the present invention.

Detailed Description

The invention is described in further detail below with reference to the drawings and examples. It is to be understood that the specific embodiments described herein are merely illustrative of the invention and are not limiting thereof.

It should be further noted that, for convenience of description, only some, but not all of the matters related to the present invention are shown in the accompanying drawings. Before discussing exemplary embodiments in more detail, it should be mentioned that some exemplary embodiments are described as processes or methods depicted as flowcharts. Although a flowchart depicts operations (or steps) as a sequential process, many of the operations can be performed in parallel, concurrently, or at the same time. Furthermore, the order of the operations may be rearranged. The process may be terminated when its operations are completed, but may have additional steps not included in the figures. The processes may correspond to methods, functions, procedures, subroutines, and the like.

Example 1

Fig. 1 is a flowchart of a voice call quality detection method according to an embodiment of the present invention. The embodiment is applicable to situations in which abnormal sound receiving during a voice call blocks feedback about the sound receiving problem between the call parties. The method can be executed by the voice call quality detection device provided by the embodiment of the invention; the device can be implemented in software and/or hardware and can generally be integrated in computer equipment, such as a cloud server. As shown in Fig. 1, the method in the embodiment of the present invention specifically includes:

Step 101, in a voice call process between a first terminal and a second terminal, acquiring a first voice signal acquired by the first terminal, and sending the first voice signal to the second terminal, so that the second terminal plays the first voice signal.

In this embodiment, the first terminal is the terminal device of the speaking party in the voice call and may be any electronic device with a voice call function, for example a mobile phone or a fixed phone; there may be one or more first terminals, corresponding to two-party and multi-party calls. The second terminal is the terminal device of the receiving party in the voice call and may likewise be any electronic device with a voice call function, such as a mobile phone or a fixed phone; there may be one or more second terminals, corresponding to two-party and multi-party calls. The first voice signal is the voice signal that the first terminal can collect when the speaker starts to speak, including but not limited to the sound of the speaker speaking and noise in the environment where the speaker is located. The device that collects the first voice signal in the first terminal may be a microphone or any other voice signal input device that can be connected to the first terminal, without limitation. The device that plays the first voice signal in the second terminal may be a speaker or any other voice signal output device that can be connected to the second terminal, without limitation.

Step 102, acquiring a second voice signal corresponding to the first voice signal acquired by the second terminal.

The second voice signal is the voice signal that the second terminal can collect while its speaker plays the first voice signal. It includes, but is not limited to, noise in the environment where the receiving party is located, the sound produced when the speaker of the second terminal plays the first voice signal, and that sound as reflected by complex and changeable wall surfaces. The device that collects the second voice signal in the second terminal may be a microphone or any other voice signal input device that can be connected to the second terminal, without limitation.

Step 103, determining an estimated voice signal corresponding to the first voice signal according to the first voice signal and the voice call parameters of the second terminal.

Wherein the voice call parameters include parameters that a call party can set on the second terminal and parameters inherent to the device, specifically including, but not limited to, the system sound value of the second terminal, the microphone sensitivity of the second terminal, and the speaker frequency response curve and speaker power of the second terminal.

In this embodiment, optionally, the voice call parameter may be acquired in the cloud server before the first voice signal acquired by the first terminal is acquired.

The estimated voice signal corresponding to the first voice signal may be determined by substituting the voice call parameters into the formula of a relevant algorithm. Preferably, this embodiment provides the following voice signal estimation formula for calculating the estimated voice signal corresponding to the first voice signal:

B(F, t) = α(V) · V · S · f(F) · A(F, t),

wherein B(F, t) is the estimated voice signal corresponding to the first voice signal; V is the system sound value of the second terminal; α(V) is an empirical amplification factor that is a function of V and gives, for each system sound value of the second terminal, the proportionality coefficient of the loudness of the second voice signal collected by the second terminal relative to the first voice signal collected by the first terminal; S is the microphone sensitivity of the second terminal; f(F) is the speaker frequency response curve of the second terminal; and A(F, t) is the first voice signal.
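
As an illustrative sketch only, the estimation formula can be evaluated on a sampled frequency-time grid. The function name, the list-of-rows layout (one row per frequency bin) and the example α and f callables are assumptions made for this example; the embodiment does not prescribe any of them.

```python
from typing import Callable, List, Sequence

Grid = List[List[float]]  # grid[i][j] holds the value at frequency freqs[i], time step j


def estimate_voice_signal(
    first_signal: Grid,
    freqs: Sequence[float],
    system_sound_value: float,
    mic_sensitivity: float,
    alpha: Callable[[float], float],
    speaker_response: Callable[[float], float],
) -> Grid:
    """Evaluate B(F, t) = alpha(V) * V * S * f(F) * A(F, t) cell by cell."""
    # The factor alpha(V) * V * S does not depend on frequency or time.
    gain = alpha(system_sound_value) * system_sound_value * mic_sensitivity
    return [
        [gain * speaker_response(f) * a for a in row]
        for f, row in zip(freqs, first_signal)
    ]


# Hypothetical parameter values, chosen only to make the arithmetic easy to follow.
A = [[1.0, 2.0], [3.0, 4.0]]
B = estimate_voice_signal(
    A,
    freqs=[100.0, 200.0],
    system_sound_value=2.0,
    mic_sensitivity=1.0,
    alpha=lambda v: 0.5,
    speaker_response=lambda f: 1.0 if f < 150.0 else 0.5,
)
```

With these made-up values the frequency-independent gain is 1, so the first row passes through unchanged and the second row is halved by the assumed speaker response.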

Step 104, determining the voice call state of the second terminal according to the first voice signal and the second voice signal.

The voice call state of the second terminal may be determined by a double-talk detection algorithm. The double-talk detection algorithm detects whether the voice signal collected at the speaking party and a voice signal collected at the receiving party exist simultaneously: if so, the voice call state of the second terminal can be determined to be a person call (someone at the second terminal is speaking); if not, it can be determined to be an unmanned call. Conventional double-talk detection algorithms include the energy comparison method, the correlation comparison method, the double-filter method, and the like, and this embodiment does not limit the choice.
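
A minimal sketch of the energy comparison idea, assuming a frame-by-frame decision: if the microphone frame of the second terminal carries clearly more energy than playback of the first voice signal alone would explain, someone at the second terminal is taken to be speaking. The function names, the `expected_echo_gain` and `margin` parameters and their default values are hypothetical; the embodiment leaves the concrete double-talk detection algorithm open.

```python
from typing import Sequence


def frame_energy(frame: Sequence[float]) -> float:
    """Sum of squared samples in one frame."""
    return sum(x * x for x in frame)


def detect_call_state(
    first_frame: Sequence[float],
    second_frame: Sequence[float],
    expected_echo_gain: float = 0.5,  # assumed amplitude gain of the echo path
    margin: float = 2.0,              # assumed safety factor on the energy bound
) -> str:
    """Return "person call" or "unmanned call" for one pair of frames."""
    far_energy = frame_energy(first_frame)
    near_energy = frame_energy(second_frame)
    # Echo alone should stay below margin * gain^2 * far-end energy.
    echo_bound = margin * expected_echo_gain ** 2 * far_energy
    return "person call" if near_energy > echo_bound else "unmanned call"


echo_only = detect_call_state([1.0] * 4, [0.4] * 4)
with_speech = detect_call_state([1.0] * 4, [1.0] * 4)
```

Here the far-end frame has energy 4, so the bound is 2: the quiet microphone frame (energy 0.64) is classified as echo only, while the louder one (energy 4) is classified as containing local speech.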

Step 105, if the voice call state of the second terminal is unmanned call, determining a first residual signal corresponding to the second voice signal according to the second voice signal and the estimated voice signal, and generating a sound receiving detection result of the second terminal according to the first residual signal.

The first residual signal is a signal representing a difference between the second voice signal and the estimated voice signal when the voice call state of the second terminal is an unmanned call, and specifically may be a difference obtained by subtracting the estimated voice signal from the second voice signal.

Preferably, the following residual signal calculation formula is provided in this embodiment, and is used for calculating a first residual signal corresponding to the second speech signal:

e₁(F, t) = C(F, t) - B(F, t),

wherein e₁(F, t) is the first residual signal corresponding to the second voice signal, C(F, t) is the second voice signal, and B(F, t) is the estimated voice signal.

Preferably, the sound receiving detection result of the second terminal is either sound receiving normal or sound receiving too small (the received sound is too quiet).

Optionally, the generating, according to the first residual signal, a receiving detection result of the second terminal includes: calculating an integration area of the frequency-time coordinates such that the first residual signal amplitude is positive; calculating a frequency domain time domain area corresponding to the first residual signal; calculating a variance of the first residual signal; and if the ratio of the integral area of the frequency-time coordinate enabling the amplitude of the first residual signal to be positive to the frequency-time area corresponding to the first residual signal is larger than a first preset threshold value and the variance of the first residual signal is smaller than a second preset threshold value, determining that the sound receiving detection result of the second terminal is sound receiving normal.

Optionally, the first preset threshold and the second preset threshold are preset empirical parameters and can be adjusted according to the actual system performance of the voice call system. Illustratively, the first preset threshold is 90% and the second preset threshold is 6 dB. Preferably, the method of generating the sound receiving detection result of the second terminal according to the first residual signal includes: calculating, with the formula ∬ dF dt taken over the region where e₁(F, t) > 0, the integral area of the frequency-time coordinates for which the first residual signal amplitude is positive; calculating the frequency domain time domain area (F_max - F_min)·t_max, wherein F_max is the maximum frequency of the first residual signal, F_min is the minimum frequency of the first residual signal, and t_max is the maximum time of the first residual signal; and calculating the variance of the first residual signal, which under the notation of this embodiment may be written s²(e₁). If the ratio of the integral area of the frequency-time coordinates for which the first residual signal amplitude is positive to the frequency domain time domain area corresponding to the first residual signal is greater than the first preset threshold, and the variance of the first residual signal is smaller than the second preset threshold, the loudness of the second voice signal is large enough and stable enough to maintain a normal conversation, and the sound receiving detection result of the second terminal is determined to be sound receiving normal.
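
The "sound receiving normal" test can be sketched as follows, assuming the residual is sampled on a uniform frequency-time grid so that the area ratio reduces to the fraction of grid cells with positive amplitude. The function names are hypothetical, the default thresholds are the illustrative values quoted in this embodiment, and the use of the sample variance is one reading of the notation s²(e₁) rather than something the embodiment states.

```python
import statistics
from typing import List

Grid = List[List[float]]  # grid[i][j]: value at frequency bin i, time step j


def first_residual(second_signal: Grid, estimated_signal: Grid) -> Grid:
    """e1(F, t) = C(F, t) - B(F, t), computed cell by cell."""
    return [
        [c - b for c, b in zip(c_row, b_row)]
        for c_row, b_row in zip(second_signal, estimated_signal)
    ]


def is_sound_receiving_normal(
    residual: Grid,
    ratio_threshold: float = 0.9,     # illustrative first preset threshold (90%)
    variance_threshold: float = 6.0,  # illustrative second preset threshold (6 dB)
) -> bool:
    cells = [v for row in residual for v in row]
    # On a uniform grid, positive area / total area equals the share of positive cells.
    positive_ratio = sum(1 for v in cells if v > 0) / len(cells)
    # Sample variance; needs at least two cells.
    return (
        positive_ratio > ratio_threshold
        and statistics.variance(cells) < variance_threshold
    )


e1 = first_residual([[5.0, 5.5], [6.0, 5.0]], [[4.0, 4.0], [4.0, 4.0]])
normal = is_sound_receiving_normal(e1)
```

In this made-up case every residual cell is positive and the spread is small, so both conditions hold.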

Optionally, the generating the receiving detection result of the second terminal according to the first residual signal further includes: calculating an integration area of a frequency-time coordinate such that the first residual signal amplitude is negative; determining an absolute value signal corresponding to the first residual signal, and calculating a mean value of the absolute value signal corresponding to the first residual signal; and if the ratio of the integral area of the frequency-time coordinate with the negative amplitude of the first residual signal to the frequency domain time domain area corresponding to the first residual signal is larger than a third preset threshold value and the average value of the absolute value signal corresponding to the first residual signal is larger than a fourth preset threshold value, determining that the sound receiving detection result of the second terminal is too small, and sending the sound receiving detection result to the first terminal.

Optionally, the third preset threshold and the fourth preset threshold are preset empirical parameters and can be adjusted according to the actual system performance of the voice call system. Illustratively, the third preset threshold is 90% and the fourth preset threshold is 20 dB.

Preferably, the method of generating the sound receiving detection result of the second terminal according to the first residual signal further includes: calculating, with the formula ∬ dF dt taken over the region where e₁(F, t) < 0, the integral area of the frequency-time coordinates for which the first residual signal amplitude is negative; and determining the absolute value signal corresponding to the first residual signal and calculating the mean value of that absolute value signal. If the ratio of the integral area of the frequency-time coordinates for which the first residual signal amplitude is negative to the frequency domain time domain area corresponding to the first residual signal is greater than the third preset threshold, and the mean value of the absolute value signal corresponding to the first residual signal is greater than the fourth preset threshold, the loudness of the second voice signal is too small to maintain a normal conversation; the sound receiving detection result of the second terminal is determined to be sound receiving too small, and the sound receiving detection result is sent to the first terminal.
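
A companion sketch for the "sound receiving too small" test, again assuming a uniformly sampled residual so that the negative-area ratio is the fraction of negative cells. The function name is hypothetical and the defaults are the illustrative 90% and 20 dB values; sending the result to the first terminal is left out.

```python
from typing import List

Grid = List[List[float]]  # grid[i][j]: value at frequency bin i, time step j


def is_sound_receiving_too_small(
    residual: Grid,
    ratio_threshold: float = 0.9,      # illustrative third preset threshold (90%)
    mean_abs_threshold: float = 20.0,  # illustrative fourth preset threshold (20 dB)
) -> bool:
    cells = [v for row in residual for v in row]
    # Share of the frequency-time grid where the residual is negative.
    negative_ratio = sum(1 for v in cells if v < 0) / len(cells)
    # Mean of the absolute value signal |e(F, t)|.
    mean_abs = sum(abs(v) for v in cells) / len(cells)
    return negative_ratio > ratio_threshold and mean_abs > mean_abs_threshold


far_too_quiet = is_sound_receiving_too_small([[-25.0, -30.0], [-22.0, -28.0]])
slightly_quiet = is_sound_receiving_too_small([[-5.0, -4.0], [-3.0, -2.0]])
```

The first residual is negative everywhere with a mean magnitude of 26.25, so it is flagged; the second is negative everywhere but only 3.5 on average, so it is not.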

The embodiment of the invention provides a voice call quality detection method in which, during the voice call, the echo signal at the receiving party is estimated from the reference signal and, when no one is speaking at the receiving party, compared through an algorithm with the echo signal actually collected at the receiving party, so as to judge whether the receiving party receives sound normally and to give a corresponding prompt to the speaking party. This keeps abnormal sound receiving during the voice call from blocking feedback about the sound receiving problem between the call parties, optimizes the existing voice call quality detection mode, and improves call efficiency.

Example 2

Fig. 2 is a flowchart of a voice call quality detection method according to a second embodiment of the present invention. This embodiment may be combined with any of the alternatives in one or more of the above embodiments. In this embodiment, after determining the voice call state of the second terminal according to the first voice signal and the second voice signal, the method may further include: if the voice call state of the second terminal is a person call, determining a second residual signal corresponding to the second voice signal according to the estimated voice signal and an echo signal, corresponding to the first voice signal, that is estimated by a preset adaptive filter, and generating a sound receiving detection result of the second terminal according to the second residual signal.

As shown in fig. 2, the method in the embodiment of the present invention specifically includes:

Step 201, in a voice call process between a first terminal and a second terminal, acquiring a first voice signal acquired by the first terminal, and sending the first voice signal to the second terminal, so that the second terminal plays the first voice signal.

Step 202, acquiring a second voice signal corresponding to the first voice signal acquired by the second terminal.

Step 203, determining an estimated voice signal corresponding to the first voice signal according to the first voice signal and the voice call parameters of the second terminal.

Step 204, determining the voice call state of the second terminal according to the first voice signal and the second voice signal.

Step 205, if the voice call state of the second terminal is a person call, determining a second residual signal corresponding to the second voice signal according to the estimated voice signal and an echo signal, corresponding to the first voice signal, that is estimated by a preset adaptive filter, and generating a sound receiving detection result of the second terminal according to the second residual signal.

The preset adaptive filter is a filter that uses an adaptive algorithm to adjust its filter coefficients automatically according to the input voice signal and processes the voice signal to obtain a desired response. In this embodiment, when the voice call state of the second terminal is a person call, the preset adaptive filter can eliminate, from what the second terminal collects, the voice signals other than the echo signal corresponding to the first voice signal, thereby obtaining an estimated echo signal corresponding to the first voice signal. This estimated echo signal is then compared with the estimated voice signal.
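
The embodiment does not name a particular adaptive algorithm. As one common choice, swapped in here purely for illustration, a normalized least-mean-squares (NLMS) filter can estimate the echo of the first voice signal in the time domain. The function name, tap count, step size and the synthetic two-tap echo path below are all assumptions, and local speech is omitted so that convergence is easy to check.

```python
import random
from typing import List, Sequence, Tuple


def nlms_echo_estimate(
    far_end: Sequence[float],
    mic: Sequence[float],
    taps: int = 4,
    step: float = 0.5,
    eps: float = 1e-8,
) -> Tuple[List[float], List[float]]:
    """Return (estimated echo samples, final filter weights)."""
    weights = [0.0] * taps
    history = [0.0] * taps  # most recent far-end samples, newest first
    echo: List[float] = []
    for x, d in zip(far_end, mic):
        history = [x] + history[:-1]
        y = sum(w * h for w, h in zip(weights, history))  # echo estimate for this sample
        error = d - y
        power = sum(h * h for h in history) + eps  # normalization term
        weights = [w + step * error * h / power for w, h in zip(weights, history)]
        echo.append(y)
    return echo, weights


rng = random.Random(0)
far_end = [rng.uniform(-1.0, 1.0) for _ in range(2000)]
# Hypothetical echo path: direct sound plus one delayed reflection.
mic = [
    0.5 * far_end[n] + (0.2 * far_end[n - 1] if n > 0 else 0.0)
    for n in range(len(far_end))
]
echo, weights = nlms_echo_estimate(far_end, mic)
```

On this noise-free synthetic input the weights settle close to the assumed echo path (0.5, 0.2, 0, 0). A practical system would also need to handle local speech during adaptation and to map the time-domain echo estimate into the frequency-time representation used for W(F, t).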

The second residual signal is a signal representing a difference between the estimated echo signal corresponding to the first voice signal and the estimated voice signal when the voice call state of the second terminal is a person call, and specifically may be a difference obtained by subtracting the estimated voice signal from the estimated echo signal corresponding to the first voice signal.

Preferably, the following residual signal calculation formula is provided in this embodiment, and is used for calculating a second residual signal corresponding to the second speech signal:

e₂(F, t) = W(F, t) - B(F, t),

wherein e₂(F, t) is the second residual signal corresponding to the second voice signal, W(F, t) is the echo signal corresponding to the first voice signal estimated by the preset adaptive filter, and B(F, t) is the estimated voice signal.
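
The two residual definitions differ only in which observed signal is compared with the estimated voice signal. A small sketch of that selection, with a hypothetical function name and with the call state passed as a plain string:

```python
from typing import List

Grid = List[List[float]]  # grid[i][j]: value at frequency bin i, time step j


def select_residual(
    call_state: str,
    second_signal: Grid,     # C(F, t), collected by the second terminal
    echo_estimate: Grid,     # W(F, t), output of the preset adaptive filter
    estimated_signal: Grid,  # B(F, t), from the voice signal estimation formula
) -> Grid:
    """Return e1 = C - B for an unmanned call, otherwise e2 = W - B."""
    observed = second_signal if call_state == "unmanned call" else echo_estimate
    return [
        [o - b for o, b in zip(o_row, b_row)]
        for o_row, b_row in zip(observed, estimated_signal)
    ]


C = [[5.0, 6.0]]
W = [[3.0, 3.5]]
B = [[2.0, 4.0]]
e1 = select_residual("unmanned call", C, W, B)
e2 = select_residual("person call", C, W, B)
```

Using W instead of C in the person call case keeps local speech at the second terminal out of the residual, which is the reason the adaptive filter is introduced.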

Optionally, the generating, according to the second residual signal, a receiving detection result of the second terminal includes: calculating an integration area of the frequency-time coordinates such that the second residual signal amplitude is positive; calculating a frequency domain time domain area corresponding to the second residual signal; calculating a variance of the second residual signal; and if the ratio of the integral area of the frequency-time coordinate enabling the amplitude of the second residual signal to be positive to the frequency-time area corresponding to the second residual signal is larger than a fifth preset threshold value, and the variance of the second residual signal is smaller than a sixth preset threshold value, determining that the sound receiving detection result of the second terminal is sound receiving normal.

Optionally, the fifth preset threshold and the sixth preset threshold are preset empirical parameters and can be adjusted according to the actual system performance of the voice call system. Illustratively, the fifth preset threshold is 90% and the sixth preset threshold is 6 dB.

Preferably, the method of generating the sound receiving detection result of the second terminal according to the second residual signal includes: calculating, with the formula ∬ dF dt taken over the region where e₂(F, t) > 0, the integral area of the frequency-time coordinates for which the second residual signal amplitude is positive; calculating the frequency domain time domain area (F_max - F_min)·t_max, wherein F_max is the maximum frequency of the second residual signal, F_min is the minimum frequency of the second residual signal, and t_max is the maximum time of the second residual signal; and calculating the variance of the second residual signal, which under the notation of this embodiment may be written s²(e₂). If the ratio of the integral area of the frequency-time coordinates for which the second residual signal amplitude is positive to the frequency domain time domain area corresponding to the second residual signal is greater than the fifth preset threshold, and the variance of the second residual signal is smaller than the sixth preset threshold, the loudness of the second voice signal is large enough and stable enough to maintain a normal conversation, and the sound receiving detection result of the second terminal is determined to be sound receiving normal.

Optionally, the generating the receiving detection result of the second terminal according to the second residual signal further includes: calculating an integration area of the frequency-time coordinates such that the second residual signal amplitude is negative; determining an absolute value signal corresponding to the second residual signal, and calculating a mean value of the absolute value signal corresponding to the second residual signal; and if the ratio of the integral area of the frequency-time coordinate with the negative amplitude of the second residual signal to the frequency domain time domain area corresponding to the second residual signal is greater than a seventh preset threshold value and the average value of the absolute value signals corresponding to the second residual signal is greater than an eighth preset threshold value, determining that the sound receiving detection result of the second terminal is too small, and sending the sound receiving detection result to the first terminal.

Optionally, the seventh preset threshold and the eighth preset threshold are preset empirical parameters and can be adjusted according to the actual system performance of the voice call system. Illustratively, the seventh preset threshold is 90% and the eighth preset threshold is 20 dB.

Preferably, the method of generating the sound receiving detection result of the second terminal according to the second residual signal further includes: calculating, with the formula ∬ dF dt taken over the region where e₂(F, t) < 0, the integral area of the frequency-time coordinates for which the second residual signal amplitude is negative; and determining the absolute value signal corresponding to the second residual signal and calculating the mean value of that absolute value signal. If the ratio of the integral area of the frequency-time coordinates for which the second residual signal amplitude is negative to the frequency domain time domain area corresponding to the second residual signal is greater than the seventh preset threshold, and the mean value of the absolute value signal corresponding to the second residual signal is greater than the eighth preset threshold, the loudness of the second voice signal is too small to maintain a normal conversation; the sound receiving detection result of the second terminal is determined to be sound receiving too small, and the sound receiving detection result is sent to the first terminal.

The embodiment of the invention provides a voice call quality detection method in which, during the voice call, the echo signal at the receiving party is estimated from the reference signal and compared, through an algorithm, with the echo signal actually collected at the receiving party, so as to judge whether the receiving party receives sound normally and to give a corresponding prompt to the speaking party. This keeps abnormal sound receiving during the voice call from blocking feedback about the sound receiving problem between the call parties, optimizes the existing voice call quality detection mode, and improves call efficiency.

Example 3

Fig. 3 is a schematic structural diagram of a voice call quality detection apparatus according to a third embodiment of the present invention, where, as shown in fig. 3, the apparatus includes:

a first signal obtaining module 311, configured to obtain a first voice signal collected by a first terminal during a voice call between the first terminal and a second terminal, and send the first voice signal to the second terminal, so that the second terminal plays the first voice signal;

a second signal obtaining module 312, configured to obtain a second voice signal corresponding to the first voice signal, where the second voice signal is collected by the second terminal;

an estimated signal determining module 313, configured to determine an estimated voice signal corresponding to the first voice signal according to the first voice signal and a voice call parameter of the second terminal;

a call state determining module 314, configured to determine a voice call state of the second terminal according to the first voice signal and the second voice signal;

the first result generating module 315 is configured to determine, if the voice call state of the second terminal is an unmanned call, a first residual signal corresponding to the second voice signal according to the second voice signal and the estimated voice signal, and generate a sound receiving detection result of the second terminal according to the first residual signal.

Optionally, the voice call quality detection device may further include a call parameter acquisition module, configured to acquire voice call parameters of the first terminal and the second terminal before acquiring the first voice signal acquired by the first terminal.

Further, the estimated signal determining module 313 is specifically configured to calculate an estimated speech signal corresponding to the first speech signal according to the following speech signal estimation formula:

B(F, t) = α(V) * V * S * f(F) * A(F, t),

wherein B(F, t) is the estimated speech signal corresponding to the first speech signal, α(V) is an empirical amplification factor, V is the system volume value of the second terminal, S is the microphone sensitivity of the second terminal, f(F) is the speaker frequency response curve of the second terminal, and A(F, t) is the first speech signal.
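The estimation formula can be illustrated with a minimal sketch. It assumes the signals are stored as a grid of real-valued magnitudes (rows are frequency bins, columns are time frames); the function name, the grid layout, and every numeric value below are hypothetical and are not taken from the embodiment.

```python
from typing import Callable, List

Grid = List[List[float]]  # rows: frequency bins, columns: time frames (assumed layout)


def estimate_signal(
    a: Grid,
    freqs: List[float],
    volume: float,
    sensitivity: float,
    alpha: Callable[[float], float],
    speaker_response: Callable[[float], float],
) -> Grid:
    """Return B(F, t) = alpha(V) * V * S * f(F) * A(F, t) for every grid cell."""
    # The factor alpha(V) * V * S does not depend on frequency or time.
    gain = alpha(volume) * volume * sensitivity
    return [
        [gain * speaker_response(freq) * value for value in row]
        for freq, row in zip(freqs, a)
    ]


# Hypothetical values for illustration only.
freqs = [500.0, 1000.0]
a = [[1.0, 2.0], [3.0, 4.0]]
b = estimate_signal(
    a,
    freqs,
    volume=0.5,
    sensitivity=2.0,
    alpha=lambda v: 1.0,  # placeholder for the empirical amplification factor
    speaker_response=lambda f: 1.0 if f < 800.0 else 0.5,  # placeholder curve f(F)
)
print(b)
```

In practice α(V) and f(F) would come from measurements of the second terminal; the lambdas above only stand in for them.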

Further, the call state determining module 314 is specifically configured to determine the voice call state of the second terminal according to the first voice signal and the second voice signal by using a double-talk detection algorithm.
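The text does not say which detection algorithm is used. Purely for illustration, the sketch below uses the Geigel detector, a classic double-talk detection rule, which flags near-end speech when a microphone sample exceeds a fraction of the recent far-end peak. The function name, window length, threshold of 0.5, and sample values are assumptions, as is the mapping of a negative result to the unmanned call state.

```python
from typing import Sequence


def geigel_double_talk(
    far_end: Sequence[float],
    near_end: Sequence[float],
    window: int = 4,
    threshold: float = 0.5,
) -> bool:
    """Return True if any near-end sample exceeds threshold times the recent far-end peak."""
    for n, d in enumerate(near_end):
        start = max(0, n - window + 1)
        # Peak magnitude of the far-end (first terminal) signal over the last `window` samples.
        recent_peak = max((abs(x) for x in far_end[start:n + 1]), default=0.0)
        if abs(d) > threshold * recent_peak:
            return True
    return False


# Hypothetical time-domain samples.
far = [1.0, -1.0, 1.0, -1.0]            # first voice signal (played by the second terminal)
echo_only = [0.2, -0.2, 0.2, -0.2]      # microphone picks up only the attenuated echo
with_talker = [0.2, 0.9, 0.2, -0.2]     # someone speaks at the second terminal

print(geigel_double_talk(far, echo_only))    # False: treated here as an unmanned call
print(geigel_double_talk(far, with_talker))  # True: treated here as someone speaking
```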

Further, the first result generating module 315 is specifically configured to:

calculating a first residual signal corresponding to the second speech signal according to the following residual signal calculation formula:

e₁(F, t) = C(F, t) - B(F, t),

wherein e₁(F, t) is the first residual signal corresponding to the second speech signal, C(F, t) is the second speech signal, and B(F, t) is the estimated speech signal;

calculating an integration area of the frequency-time coordinates such that the first residual signal amplitude is positive;

calculating a frequency domain time domain area corresponding to the first residual signal;

calculating a variance of the first residual signal;

if the ratio of the integral area of the frequency-time coordinate enabling the first residual signal amplitude to be positive and the frequency domain time area corresponding to the first residual signal is greater than a first preset threshold value, and the variance of the first residual signal is smaller than a second preset threshold value, determining that the sound receiving detection result of the second terminal is sound receiving normal;

calculating an integration area of a frequency-time coordinate such that the first residual signal amplitude is negative;

determining an absolute value signal corresponding to the first residual signal, and calculating a mean value of the absolute value signal corresponding to the first residual signal;

and if the ratio of the integral area of the frequency-time coordinate with the negative amplitude of the first residual signal to the frequency domain time domain area corresponding to the first residual signal is larger than a third preset threshold value and the average value of the absolute value signal corresponding to the first residual signal is larger than a fourth preset threshold value, determining that the sound receiving detection result of the second terminal is too small, and sending the sound receiving detection result to the first terminal.
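The decision steps above can be sketched as follows. The sketch assumes the residual is sampled on a uniform frequency-time grid, so that each area ratio reduces to the fraction of grid cells with the given sign; the function name, the result labels, and the four threshold values are hypothetical, since the embodiment only states that the thresholds are empirical.

```python
from statistics import fmean, pvariance
from typing import List, Optional


def classify_residual(
    residual: List[List[float]],
    pos_ratio_min: float,
    variance_max: float,
    neg_ratio_min: float,
    mean_abs_min: float,
) -> Optional[str]:
    """Apply the two threshold rules to a residual sampled on a uniform grid."""
    cells = [value for row in residual for value in row]
    if not cells:
        raise ValueError("residual must contain at least one cell")
    total = len(cells)
    # On a uniform grid, area ratio equals the fraction of cells with that sign.
    pos_ratio = sum(value > 0 for value in cells) / total
    neg_ratio = sum(value < 0 for value in cells) / total
    if pos_ratio > pos_ratio_min and pvariance(cells) < variance_max:
        return "normal"
    if neg_ratio > neg_ratio_min and fmean([abs(v) for v in cells]) > mean_abs_min:
        return "too_small"
    return None  # neither rule fired; the text does not define this case


# Hypothetical thresholds and residuals.
limits = dict(pos_ratio_min=0.8, variance_max=0.01, neg_ratio_min=0.6, mean_abs_min=0.5)
steady = [[0.1, 0.2], [0.1, 0.2]]
quiet = [[-0.9, -1.1], [-1.0, 0.1]]
mixed = [[0.5, -0.5], [0.5, -0.5]]

print(classify_residual(steady, **limits))  # normal
print(classify_residual(quiet, **limits))   # too_small
print(classify_residual(mixed, **limits))   # None
```

The same function could serve the second residual signal by passing the fifth to eighth thresholds instead of the first to fourth.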

The embodiment of the invention provides a voice call quality detection device. During a voice call, when no one is speaking at the receiving party, the device estimates the receiving party's echo signal from a reference signal, compares it by an algorithm with the echo signal actually picked up at the receiving party, judges whether the receiving party receives sound normally, and gives the speaking party a corresponding prompt. This avoids the sound receiving feedback obstacle among the call parties caused by abnormal sound receiving during the voice call, optimizes the existing voice call quality detection mode, and improves call efficiency.

In an optional implementation manner of the embodiment of the present invention, the voice call quality detection apparatus further includes:

a second result generating module, configured to determine, if the voice call state of the second terminal is a call with someone speaking, a second residual signal corresponding to the second voice signal according to the estimated voice signal and the echo signal corresponding to the first voice signal as estimated by a preset adaptive filter, and to generate a sound receiving detection result of the second terminal according to the second residual signal.
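The difference between the two result generating modules comes down to which signal is compared with the estimated voice signal. A minimal sketch, assuming grid-shaped signals and hypothetical state labels "unmanned" and "someone":

```python
from typing import List

Grid = List[List[float]]  # rows: frequency bins, columns: time frames (assumed layout)


def residual(state: str, c: Grid, w: Grid, b: Grid) -> Grid:
    """Return e1 = C - B when nobody is speaking, otherwise e2 = W - B."""
    # C is the second voice signal; W is the adaptive-filter echo estimate.
    observed = c if state == "unmanned" else w
    return [
        [obs - est for obs, est in zip(obs_row, est_row)]
        for obs_row, est_row in zip(observed, b)
    ]


# Hypothetical values.
c = [[1.0, 2.0]]
w = [[0.5, 1.5]]
b = [[0.5, 0.5]]
e1 = residual("unmanned", c, w, b)
e2 = residual("someone", c, w, b)
print(e1)  # [[0.5, 1.5]]
print(e2)  # [[0.0, 1.0]]
```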

Further, the second result generation module is specifically configured to:

calculating a second residual signal corresponding to the second speech signal according to the following residual signal calculation formula:

e₂(F, t) = W(F, t) - B(F, t),

wherein e₂(F, t) is the second residual signal corresponding to the second speech signal, W(F, t) is the echo signal corresponding to the first speech signal as estimated by a preset adaptive filter, and B(F, t) is the estimated speech signal;

calculating an integration area of the frequency-time coordinates such that the second residual signal amplitude is positive;

calculating a frequency domain time domain area corresponding to the second residual signal;

calculating a variance of the second residual signal;

if the ratio of the integral area of the frequency-time coordinate enabling the amplitude of the second residual signal to be positive to the frequency-time area corresponding to the second residual signal is greater than a fifth preset threshold value, and the variance of the second residual signal is smaller than a sixth preset threshold value, determining that the sound receiving detection result of the second terminal is sound receiving normal;

calculating an integration area of the frequency-time coordinates such that the second residual signal amplitude is negative;

determining an absolute value signal corresponding to the second residual signal, and calculating a mean value of the absolute value signal corresponding to the second residual signal;

and if the ratio of the integral area of the frequency-time coordinate with the negative amplitude of the second residual signal to the frequency domain time domain area corresponding to the second residual signal is greater than a seventh preset threshold value and the average value of the absolute value signals corresponding to the second residual signal is greater than an eighth preset threshold value, determining that the sound receiving detection result of the second terminal is too small, and sending the sound receiving detection result to the first terminal.
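The embodiment does not specify the preset adaptive filter that produces the echo estimate W. One common choice in echo estimation is a normalized least mean squares (NLMS) filter; the time-domain sketch below is offered only under that assumption, and the function name, tap count, step size, and test signal are hypothetical. Converting its output to the frequency-time form W(F, t) is outside the sketch.

```python
from typing import List, Sequence


def nlms_echo_estimate(
    far_end: Sequence[float],
    near_end: Sequence[float],
    taps: int = 4,
    step: float = 0.5,
    eps: float = 1e-8,
) -> List[float]:
    """Return the per-sample echo estimate produced by an NLMS adaptive filter."""
    weights = [0.0] * taps
    history = [0.0] * taps  # most recent far-end sample first
    estimate = []
    for x, d in zip(far_end, near_end):
        history = [x] + history[:-1]
        y = sum(wt * h for wt, h in zip(weights, history))  # predicted echo
        err = d - y                                         # prediction error
        norm = sum(h * h for h in history) + eps            # input power, regularized
        weights = [wt + step * err * h / norm for wt, h in zip(weights, history)]
        estimate.append(y)
    return estimate


# Hypothetical signals: the microphone hears the far-end signal attenuated by half.
far = [1.0 if n % 2 == 0 else -1.0 for n in range(200)]
near = [0.5 * x for x in far]
est = nlms_echo_estimate(far, near)
print(abs(est[-1] - near[-1]) < 1e-3)  # True once the filter has converged
```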

The voice call quality detection device provided by this embodiment estimates the receiving party's echo signal during a voice call from a reference signal, compares it by an algorithm with the echo signal actually picked up at the receiving party, judges whether the receiving party receives sound normally, and gives the speaking party a corresponding prompt. This avoids the sound receiving feedback obstacle among the call parties caused by abnormal sound receiving during the voice call, optimizes the existing voice call quality detection mode, and improves call efficiency.

The voice call quality detection device can execute the voice call quality detection method provided by any embodiment of the invention, and has the corresponding functional modules and beneficial effects of executing the voice call quality detection method.

Example IV

Fig. 4 is a schematic structural diagram of a computer device according to a fourth embodiment of the present invention. Fig. 4 illustrates a block diagram of an exemplary computer device 12 suitable for use in implementing embodiments of the present invention. The computer device 12 shown in fig. 4 is merely an example and should not be construed as limiting the functionality and scope of use of embodiments of the present invention.

As shown in FIG. 4, the computer device 12 is in the form of a general purpose computing device. Components of computer device 12 may include, but are not limited to: one or more processors 16, a memory 28, a bus 18 that connects the various system components, including the memory 28 and the processor 16.

Bus 18 represents one or more of several types of bus structures, including a memory bus or memory controller, a peripheral bus, an accelerated graphics port, and a processor or local bus using any of a variety of bus architectures. By way of example, and not limitation, such architectures include the Industry Standard Architecture (ISA) bus, the Micro Channel Architecture (MCA) bus, the Enhanced ISA (EISA) bus, the Video Electronics Standards Association (VESA) local bus, and the Peripheral Component Interconnect (PCI) bus.

Computer device 12 typically includes a variety of computer system readable media. Such media can be any available media that is accessible by computer device 12 and includes both volatile and nonvolatile media, removable and non-removable media.

Memory 28 may include computer system readable media in the form of volatile memory, such as Random Access Memory (RAM) 30 and/or cache memory 32. The computer device 12 may further include other removable/non-removable, volatile/nonvolatile computer system storage media. By way of example only, storage system 34 may be used to read from or write to non-removable, nonvolatile magnetic media (not shown in FIG. 4, commonly referred to as a "hard disk drive"). Although not shown in fig. 4, a magnetic disk drive for reading from and writing to a removable non-volatile magnetic disk (e.g., a "floppy disk"), and an optical disk drive for reading from or writing to a removable non-volatile optical disk (e.g., a CD-ROM, DVD-ROM, or other optical media) may be provided. In such cases, each drive may be coupled to bus 18 through one or more data medium interfaces. Memory 28 may include at least one program product having a set (e.g., at least one) of program modules configured to carry out the functions of embodiments of the invention.

A program/utility 40 having a set (at least one) of program modules 42 may be stored in, for example, memory 28, such program modules 42 including, but not limited to, an operating system, one or more application programs, other program modules, and program data, each or some combination of which may include an implementation of a network environment. Program modules 42 generally perform the functions and/or methods of the embodiments described herein.

The computer device 12 may also communicate with one or more external devices 14 (e.g., keyboard, pointing device, display 24, etc.), one or more devices that enable a user to interact with the computer device 12, and/or any devices (e.g., network card, modem, etc.) that enable the computer device 12 to communicate with one or more other computing devices. Such communication may occur through an input/output (I/O) interface 22. Moreover, computer device 12 may also communicate with one or more networks such as a Local Area Network (LAN), a Wide Area Network (WAN) and/or a public network, such as the Internet, through network adapter 20. As shown, network adapter 20 communicates with other modules of computer device 12 via bus 18. It should be appreciated that although not shown in fig. 4, other hardware and/or software modules may be used in connection with computer device 12, including, but not limited to: microcode, device drivers, redundant processing units, external disk drive arrays, RAID systems, tape drives, data backup storage systems, and the like.

The processor 16 executes a program stored in the memory 28 to perform various functional applications and data processing, thereby implementing the voice call quality detection method provided by the embodiment of the present invention: in the voice communication process of a first terminal and a second terminal, acquiring a first voice signal acquired by the first terminal, and sending the first voice signal to the second terminal so that the second terminal plays the first voice signal; acquiring a second voice signal which is acquired by the second terminal and corresponds to the first voice signal; determining an estimated voice signal corresponding to the first voice signal according to the first voice signal and the voice call parameters of the second terminal; determining a voice call state of the second terminal according to the first voice signal and the second voice signal; if the voice call state of the second terminal is unmanned call, determining a first residual signal corresponding to the second voice signal according to the second voice signal and the estimated voice signal, and generating a sound receiving detection result of the second terminal according to the first residual signal.

Example V

The fifth embodiment of the present invention provides a computer readable storage medium having a computer program stored thereon, which when executed by a processor, implements the voice call quality detection method provided by the embodiments of the present invention: in the voice communication process of a first terminal and a second terminal, acquiring a first voice signal acquired by the first terminal, and sending the first voice signal to the second terminal so that the second terminal plays the first voice signal; acquiring a second voice signal which is acquired by the second terminal and corresponds to the first voice signal; determining an estimated voice signal corresponding to the first voice signal according to the first voice signal and the voice call parameters of the second terminal; determining a voice call state of the second terminal according to the first voice signal and the second voice signal; if the voice call state of the second terminal is unmanned call, determining a first residual signal corresponding to the second voice signal according to the second voice signal and the estimated voice signal, and generating a sound receiving detection result of the second terminal according to the first residual signal.

Any combination of one or more computer readable media may be employed. The computer readable medium may be a computer readable signal medium or a computer readable storage medium. The computer readable storage medium can be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or a combination of any of the foregoing. More specific examples (a non-exhaustive list) of the computer-readable storage medium would include the following: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In this document, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device.

The computer readable signal medium may include a propagated data signal with computer readable program code embodied therein, either in baseband or as part of a carrier wave. Such a propagated data signal may take any of a variety of forms, including, but not limited to, electro-magnetic, optical, or any suitable combination of the foregoing. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device.

Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to wireless, wireline, optical fiber cable, RF, etc., or any suitable combination of the foregoing.

Computer program code for carrying out operations of the present invention may be written in any combination of one or more programming languages, including an object oriented programming language such as Java, Smalltalk, or C++, and conventional procedural programming languages, such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on the remote computer or computer device. In the case of a remote computer, the remote computer may be connected to the user's computer through any kind of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or may be connected to an external computer (for example, through the Internet using an Internet service provider).

Note that the above is only a preferred embodiment of the present invention and the technical principle applied. It will be understood by those skilled in the art that the present invention is not limited to the particular embodiments described herein, but is capable of various obvious changes, rearrangements and substitutions as will now become apparent to those skilled in the art without departing from the scope of the invention. Therefore, while the invention has been described in connection with the above embodiments, the invention is not limited to the embodiments, but may be embodied in many other equivalent forms without departing from the spirit or scope of the invention, which is set forth in the following claims.

Claims

1. A voice call quality detection method, comprising:

determining an estimated voice signal corresponding to the first voice signal according to the first voice signal, the voice call parameters of the second terminal and a preset voice signal estimation formula;

if the voice call state of the second terminal is unmanned call, determining a first residual signal corresponding to the second voice signal according to the second voice signal and the estimated voice signal, and generating a sound receiving detection result of the second terminal according to the first residual signal;

the determining, according to the first voice signal, the voice call parameter of the second terminal, and a preset voice signal estimation formula, an estimated voice signal corresponding to the first voice signal includes:

calculating an estimated speech signal corresponding to the first speech signal according to the following speech signal estimation formula:

B(F, t) = α(V) * V * S * f(F) * A(F, t),

2. The method of claim 1, further comprising, after determining the voice call state of the second terminal from the first voice signal and the second voice signal:

if the voice call state of the second terminal is a call with someone speaking, determining a second residual signal corresponding to the second voice signal according to the estimated voice signal and the echo signal corresponding to the first voice signal as estimated by a preset adaptive filter, and generating a sound receiving detection result of the second terminal according to the second residual signal.

3. The method of claim 1, wherein the determining the voice call state of the second terminal from the first voice signal and the second voice signal comprises:

and determining the voice call state of the second terminal according to the first voice signal and the second voice signal by using a double-talk detection algorithm.

4. The method of claim 1, wherein said determining a first residual signal corresponding to said second speech signal from said second speech signal and said estimated speech signal comprises:

e₁(F, t) = C(F, t) - B(F, t),

5. The method of claim 4, wherein generating the sound receiving detection result of the second terminal from the first residual signal comprises:

calculating a variance of the first residual signal;

and if the ratio of the integral area of the frequency-time coordinate enabling the amplitude of the first residual signal to be positive to the frequency-time area corresponding to the first residual signal is larger than a first preset threshold value and the variance of the first residual signal is smaller than a second preset threshold value, determining that the sound receiving detection result of the second terminal is sound receiving normal.

6. The method of claim 5, wherein generating the sound receiving detection result of the second terminal from the first residual signal further comprises:

7. The method of claim 2, wherein the determining a second residual signal corresponding to the second speech signal based on the echo signal corresponding to the first speech signal estimated by the preset adaptive filter and the estimated speech signal comprises:

e₂(F, t) = W(F, t) - B(F, t)

8. The method of claim 7, wherein generating the second terminal's received sound detection result from the second residual signal comprises:

calculating a variance of the second residual signal;

and if the ratio of the integral area of the frequency-time coordinate enabling the amplitude of the second residual signal to be positive to the frequency-time area corresponding to the second residual signal is larger than a fifth preset threshold value, and the variance of the second residual signal is smaller than a sixth preset threshold value, determining that the sound receiving detection result of the second terminal is sound receiving normal.

9. The method of claim 8, wherein generating the second terminal's received sound detection result from the second residual signal further comprises:

10. A voice call quality detection apparatus, comprising:

the first result generation module is used for determining a first residual signal corresponding to the second voice signal according to the second voice signal and the estimated voice signal and generating a sound receiving detection result of the second terminal according to the first residual signal if the voice call state of the second terminal is unmanned call;

The estimation signal determining module is specifically configured to:

B(F, t) = α(V) * V * S * f(F) * A(F, t),

11. The apparatus as recited in claim 10, further comprising:

12. A computer device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the processor implements the voice call quality detection method according to any one of claims 1-9 when executing the computer program.

13. A computer readable storage medium having stored thereon a computer program, which when executed by a processor implements a voice call quality detection method according to any of claims 1-9.