CN113824843A - Voice call quality detection method, device, equipment and storage medium

Info

Publication number: CN113824843A
Application number: CN202010566405.7A
Authority: CN
Inventors: 王夏鸣
Original assignee: Volkswagen Mobvoi Beijing Information Technology Co Ltd
Current assignee: Volkswagen Mobvoi Beijing Information Technology Co Ltd
Priority date: 2020-06-19
Filing date: 2020-06-19
Publication date: 2021-12-21
Anticipated expiration: 2040-06-19
Also published as: CN113824843B

Abstract

An embodiment of the invention discloses a voice call quality detection method, apparatus, device and storage medium. The method comprises: during a voice call between a first terminal and a second terminal, acquiring a first voice signal collected by the first terminal and sending the first voice signal to the second terminal so that the second terminal plays it; acquiring a second voice signal collected by the second terminal; determining an estimated voice signal according to the first voice signal and a voice call parameter; determining the voice call state of the second terminal according to the first voice signal and the second voice signal; and, if the voice call state is an unmanned call, determining a first residual signal according to the second voice signal and the estimated voice signal, and generating a voice reception detection result of the second terminal. The embodiment prevents abnormal voice reception during a call from obstructing feedback about the reception problem between the call parties, optimizes the existing voice call quality detection approach, and improves call efficiency.

Description

Voice call quality detection method, device, equipment and storage medium

Technical Field

The embodiment of the invention relates to the technical field of computers, in particular to a voice call quality detection method, a voice call quality detection device, voice call quality detection equipment and a storage medium.

Background

In current voice call systems, the system can judge call quality by detecting the network speed, the packet loss rate and similar indicators, and can then actively prompt the party whose signal is normal that the other party cannot communicate normally because of a network abnormality during the call. In actual use, however, many scenarios remain in which the speaking party must first confirm that sound reception is normal (that is, that the receiving party can hear the speaking party) before starting the conversation.

In the related art, the speaking party usually relies on active oral feedback from the receiving party to confirm that reception is normal. If reception at the receiving party becomes abnormal during the call, the speaking party cannot learn of it directly and depends on that oral feedback. At that point, however, the abnormal reception already impairs the exchange itself, so a long time passes between the problem occurring, both parties confirming it, and the call being suspended to resolve it, which reduces call efficiency.

Disclosure of Invention

The embodiment of the invention provides a voice call quality detection method, a voice call quality detection device, voice call quality detection equipment and a storage medium, which are used for optimizing the existing voice call quality detection mode, actively detecting abnormal sound reception and improving the call efficiency.

In a first aspect, an embodiment of the present invention provides a method for detecting voice call quality, including:

in the process of voice communication between a first terminal and a second terminal, acquiring a first voice signal acquired by the first terminal, and sending the first voice signal to the second terminal so as to enable the second terminal to play the first voice signal;

acquiring a second voice signal corresponding to the first voice signal and acquired by the second terminal;

determining an estimated voice signal corresponding to the first voice signal according to the first voice signal and the voice call parameter of the second terminal;

determining the voice call state of the second terminal according to the first voice signal and the second voice signal;

and if the voice call state of the second terminal is unmanned call, determining a first residual signal corresponding to the second voice signal according to the second voice signal and the estimated voice signal, and generating a voice reception detection result of the second terminal according to the first residual signal.

In a second aspect, an embodiment of the present invention further provides a device for detecting voice call quality, including:

the first signal acquisition module is used for acquiring a first voice signal acquired by a first terminal in the voice call process between the first terminal and a second terminal, and sending the first voice signal to the second terminal so as to enable the second terminal to play the first voice signal;

the second signal acquisition module is used for acquiring a second voice signal which is acquired by the second terminal and corresponds to the first voice signal;

the estimated signal determining module is used for determining an estimated voice signal corresponding to the first voice signal according to the first voice signal and the voice call parameter of the second terminal;

the call state determining module is used for determining the voice call state of the second terminal according to the first voice signal and the second voice signal;

and the first result generation module is used for determining a first residual signal corresponding to the second voice signal according to the second voice signal and the estimated voice signal and generating a voice receiving detection result of the second terminal according to the first residual signal if the voice call state of the second terminal is unmanned call.

In a third aspect, an embodiment of the present invention further provides a computer device, which includes a memory, a processor, and a computer program stored in the memory and executable on the processor, where the processor executes the computer program to implement the voice call quality detection method according to the embodiment of the present invention.

In a fourth aspect, an embodiment of the present invention further provides a computer-readable storage medium, on which a computer program is stored, where the computer program is executed by a processor to implement the voice call quality detection method according to the embodiment of the present invention.

According to the technical scheme of the embodiments of the invention, the echo signal at the receiving party during a voice call is estimated from the reference signal and compared by an algorithm with the echo signal actually picked up at the receiving party, so as to judge whether the receiving party is receiving sound normally and to prompt the speaking party accordingly. This prevents abnormal voice reception during a call from obstructing feedback about the reception problem between the call parties, optimizes the existing voice call quality detection approach, and improves call efficiency.

Drawings

Fig. 1 is a flowchart of a voice call quality detection method according to a first embodiment of the present invention.

Fig. 2 is a flowchart of a voice call quality detection method according to a second embodiment of the present invention.

Fig. 3 is a schematic structural diagram of a voice call quality detection apparatus according to a third embodiment of the present invention.

Fig. 4 is a schematic structural diagram of a computer device according to a fourth embodiment of the present invention.

Detailed Description

The present invention will be described in further detail with reference to the accompanying drawings and examples. It is to be understood that the specific embodiments described herein are merely illustrative of the invention and are not limiting of the invention.

It should be further noted that, for the convenience of description, only some but not all of the relevant aspects of the present invention are shown in the drawings. Before discussing exemplary embodiments in more detail, it should be noted that some exemplary embodiments are described as processes or methods depicted as flowcharts. Although a flowchart may describe the operations (or steps) as a sequential process, many of the operations can be performed in parallel, concurrently or simultaneously. In addition, the order of the operations may be re-arranged. The process may be terminated when its operations are completed, but may have additional steps not included in the figure. The processes may correspond to methods, functions, procedures, subroutines, and the like.

Example one

Fig. 1 is a flowchart of a voice call quality detection method according to a first embodiment of the present invention. This embodiment is applicable where abnormal voice reception during a voice call obstructs feedback about the reception problem between the call parties. The method can be executed by the voice call quality detection apparatus provided by the embodiments of the present invention; the apparatus can be implemented in software and/or hardware and can generally be integrated in a computer device, such as a cloud server. As shown in fig. 1, the method of this embodiment specifically includes:

step 101, in a voice communication process between a first terminal and a second terminal, acquiring a first voice signal acquired by the first terminal, and sending the first voice signal to the second terminal so that the second terminal plays the first voice signal.

In this embodiment, the first terminal is the terminal device of the speaking party in the voice call. It may be any electronic device with a voice call function, such as a mobile phone or a fixed-line telephone, and there may be one or more first terminals, corresponding to two-party and multi-party calls. The second terminal is the terminal device of the receiving party in the voice call. It may likewise be any electronic device with a voice call function, such as a mobile phone or a fixed-line telephone, and there may be one or more second terminals, corresponding to two-party and multi-party calls. The first voice signal is the sound signal that the first terminal can collect once the speaking party starts speaking; it includes, but is not limited to, the speaking party's voice and the noise in the speaking party's environment. The device in the first terminal that collects the first voice signal may be a microphone or any other voice signal input device that can be connected to the first terminal, without limitation. The device in the second terminal that plays the first voice signal may be a speaker or any other voice signal output device that can be connected to the second terminal, without limitation.

Step 102, acquiring a second voice signal which is collected by the second terminal and corresponds to the first voice signal.

The second voice signal is the sound signal that the second terminal can collect while its speaker plays the first voice signal. It includes, but is not limited to, noise in the receiving party's environment, the sound produced directly by the second terminal's speaker playing the first voice signal, and that sound after reflection from surrounding wall surfaces, which may be complex and variable. The device in the second terminal that collects the second voice signal may be a microphone or any other voice signal input device that can be connected to the second terminal, without limitation.

Step 103, determining an estimated voice signal corresponding to the first voice signal according to the first voice signal and the voice call parameter of the second terminal.

The voice call parameters include parameters that the call party can set on the second terminal and parameters inherent to the device, specifically including, but not limited to, the system volume value of the second terminal, the microphone sensitivity of the second terminal, and the speaker frequency response curve and speaker power of the second terminal.

In this embodiment, optionally, the voice call parameter may be acquired in a cloud server before acquiring the first voice signal acquired by the first terminal.

The estimated voice signal corresponding to the first voice signal may be determined by substituting the voice call parameters into the formula of a relevant algorithm and calculating the result. Preferably, this embodiment provides the following voice signal estimation formula for calculating the estimated voice signal corresponding to the first voice signal:

$$B(F, t) = \alpha(V) \cdot V \cdot S \cdot f(F) \cdot A(F, t),$$

where $B(F, t)$ is the estimated voice signal corresponding to the first voice signal; $V$ is the system volume value of the second terminal; $\alpha(V)$ is an empirical amplification coefficient that is a function of $V$, namely the ratio of the loudness of the second voice signal collected by the second terminal to that of the first voice signal collected by the first terminal at a given system volume value of the second terminal; $S$ is the microphone sensitivity of the second terminal; $f(F)$ is the speaker frequency response curve of the second terminal; and $A(F, t)$ is the first voice signal.
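As an illustration only, the estimation formula can be sketched in Python. The source does not specify how the signals are represented or what form α(V) takes, so the frame-by-bin list layout, the function names and the placeholder `alpha` below are all assumptions made for this sketch.

```python
def alpha(volume):
    """Hypothetical stand-in for the empirical coefficient alpha(V).

    The patent only says alpha(V) is determined empirically; this linear
    form is a placeholder, not taken from the source.
    """
    return 0.5 + 0.5 * volume


def estimate_signal(first_signal, volume, sensitivity, speaker_response,
                    alpha_fn=alpha):
    """Compute B(F, t) = alpha(V) * V * S * f(F) * A(F, t).

    first_signal: list of frames (time), each a list of magnitudes per
                  frequency bin, standing in for A(F, t).
    speaker_response: f(F) sampled at the same frequency bins.
    """
    gain = alpha_fn(volume) * volume * sensitivity
    return [
        [gain * speaker_response[k] * a for k, a in enumerate(frame)]
        for frame in first_signal
    ]


# Two frames, two frequency bins (made-up numbers).
A = [[1.0, 2.0], [0.5, 4.0]]
B = estimate_signal(A, volume=0.8, sensitivity=0.5,
                    speaker_response=[1.0, 0.5])
```

In a real system α(V) and f(F) would presumably come from measurements of the particular second terminal rather than from fixed formulas.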

Step 104, determining the voice call state of the second terminal according to the first voice signal and the second voice signal.

The voice call state of the second terminal may be determined with an existing dual-talk detection algorithm. If the first voice signal played by the second terminal and speech from a person at the second terminal are not present at the same time, the voice call state of the second terminal is determined to be an unmanned call. Existing dual-talk detection algorithms include the energy comparison method, the correlation comparison method, the dual-filter method and the like, which is not limited in this embodiment.
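The energy comparison idea can be sketched roughly as follows. This is a simplified illustration, not the detector used by the patent: the function names, the `max_echo_gain` bound and the `margin` factor are hypothetical, and a practical detector would add smoothing and hangover logic.

```python
def frame_energy(frame):
    """Sum of squared samples in one time-domain frame."""
    return sum(x * x for x in frame)


def call_state(first_frame, second_frame, max_echo_gain=1.0, margin=2.0):
    """Classify one frame pair by comparing energies.

    first_frame:  samples of the first voice signal (far end).
    second_frame: samples picked up by the second terminal's microphone.
    max_echo_gain and margin are assumed tuning values: if the microphone
    energy clearly exceeds what the echo alone could produce, someone at
    the second terminal is taken to be speaking.
    """
    echo_bound = max_echo_gain * frame_energy(first_frame)
    if frame_energy(second_frame) > margin * echo_bound:
        return "manned call"
    return "unmanned call"


quiet_room = call_state([1.0] * 4, [0.5] * 4)    # echo only
someone_talks = call_state([1.0] * 4, [2.0] * 4)  # echo plus local speech
```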

Step 105, if the voice call state of the second terminal is an unmanned call, determining a first residual signal corresponding to the second voice signal according to the second voice signal and the estimated voice signal, and generating a voice reception detection result of the second terminal according to the first residual signal.

The first residual signal is a signal representing a difference between the second voice signal and the estimated voice signal when the voice call state of the second terminal is an unmanned call, and specifically may be a difference obtained by subtracting the estimated voice signal from the second voice signal.

Preferably, the present embodiment provides the following residual signal calculation formula for calculating the first residual signal corresponding to the second speech signal:

$$e_1(F, t) = C(F, t) - B(F, t),$$

where $e_1(F, t)$ is the first residual signal corresponding to the second voice signal, $C(F, t)$ is the second voice signal, and $B(F, t)$ is the estimated voice signal.
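A minimal sketch of this subtraction, assuming the second voice signal and the estimated voice signal are sampled on the same time-frequency grid (a list of frames, each a list of per-bin values). The grid layout and the function name are assumptions of the sketch.

```python
def first_residual(second_signal, estimated_signal):
    """Element-wise e1(F, t) = C(F, t) - B(F, t) on a shared grid."""
    same_shape = len(second_signal) == len(estimated_signal) and all(
        len(c) == len(b) for c, b in zip(second_signal, estimated_signal)
    )
    if not same_shape:
        raise ValueError("C and B must be sampled on the same grid")
    return [
        [c - b for c, b in zip(c_frame, b_frame)]
        for c_frame, b_frame in zip(second_signal, estimated_signal)
    ]


e1 = first_residual([[1.0, 2.0]], [[0.5, 2.5]])
```

A positive cell means the second terminal picked up more than the estimate predicted at that frequency and time; a negative cell means it picked up less.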

Preferably, the voice reception detection result of the second terminal is either normal voice reception or voice reception that is too small (too quiet).

Optionally, generating the voice reception detection result of the second terminal according to the first residual signal includes: calculating the integral area of the frequency-time coordinates at which the amplitude of the first residual signal is positive; calculating the frequency-domain time-domain area corresponding to the first residual signal; calculating the variance of the first residual signal; and, if the ratio of that integral area to that frequency-domain time-domain area is greater than a first preset threshold and the variance of the first residual signal is smaller than a second preset threshold, determining that the voice reception detection result of the second terminal is normal voice reception.

Optionally, the first preset threshold and the second preset threshold are preset empirical parameters and may be adjusted according to the actual performance of the voice call system. Illustratively, the first preset threshold is 90% and the second preset threshold is 6 dB. Preferably, generating the voice reception detection result of the second terminal according to the first residual signal includes: calculating the integral area of the frequency-time coordinates at which the amplitude of the first residual signal is positive, which under the notation of this embodiment is $\iint_{e_1(F,t)>0} dF\,dt$; calculating the frequency-domain time-domain area corresponding to the first residual signal, $(F_{\max}-F_{\min})\,t_{\max}$, where $F_{\max}$ is the maximum frequency of the first residual signal, $F_{\min}$ is its minimum frequency and $t_{\max}$ is its maximum time; and calculating the variance of the first residual signal, which under the notation of this embodiment is written $s^2(e_1)$. If the ratio of the integral area to the frequency-domain time-domain area is greater than the first preset threshold and the variance of the first residual signal is smaller than the second preset threshold, the loudness of the second voice signal is large enough and stable enough to maintain a normal conversation, and the voice reception detection result of the second terminal is determined to be normal voice reception.
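The normal-reception test can be sketched on a discretised residual. Two assumptions are made here that the source does not state: the residual is sampled on a uniform time-frequency grid, so the area ratio reduces to the fraction of cells with a positive value, and the residual values are in the same units as the variance threshold. The function names are hypothetical.

```python
from statistics import variance


def positive_area_ratio(residual):
    """Fraction of time-frequency cells where the residual is positive.

    On a uniform grid this equals the integral area over {e > 0}
    divided by (F_max - F_min) * t_max.
    """
    cells = [v for frame in residual for v in frame]
    return sum(1 for v in cells if v > 0) / len(cells)


def is_normal_reception(residual, ratio_threshold=0.9, variance_threshold=6.0):
    """Apply the two conditions for normal voice reception.

    Defaults follow the illustrative values in the text (90% and 6);
    statistics.variance is the sample variance, matching the s^2 notation.
    """
    cells = [v for frame in residual for v in frame]
    return (positive_area_ratio(residual) > ratio_threshold
            and variance(cells) < variance_threshold)


steady = is_normal_reception([[1.0, 2.0], [1.5, 0.5]])    # all cells positive
patchy = is_normal_reception([[1.0, -2.0], [1.5, 0.5]])   # only 75% positive
```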

Optionally, generating the voice reception detection result of the second terminal according to the first residual signal further includes: calculating the integral area of the frequency-time coordinates at which the amplitude of the first residual signal is negative; determining the absolute value signal corresponding to the first residual signal and calculating the mean value of that absolute value signal; and, if the ratio of that integral area to the frequency-domain time-domain area corresponding to the first residual signal is greater than a third preset threshold and the mean value of the absolute value signal is greater than a fourth preset threshold, determining that the voice reception detection result of the second terminal is that voice reception is too small, and sending the detection result to the first terminal.

Optionally, the third preset threshold and the fourth preset threshold are preset empirical parameters and may be adjusted according to the actual performance of the voice call system. Illustratively, the third preset threshold is 90% and the fourth preset threshold is 20 dB.

Preferably, generating the voice reception detection result of the second terminal according to the first residual signal further includes: calculating the integral area of the frequency-time coordinates at which the amplitude of the first residual signal is negative, which under the notation of this embodiment is $\iint_{e_1(F,t)<0} dF\,dt$; and determining the absolute value signal $|e_1(F,t)|$ corresponding to the first residual signal and calculating its mean value. If the ratio of that integral area to the frequency-domain time-domain area corresponding to the first residual signal is greater than the third preset threshold, and the mean value of the absolute value signal is greater than the fourth preset threshold, the loudness of the second voice signal is too small to maintain a normal conversation; the voice reception detection result of the second terminal is determined to be that voice reception is too small, and the detection result is sent to the first terminal. The third preset threshold and the fourth preset threshold are the preset empirical parameters.
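The too-small test mirrors the normal-reception test with the sign reversed and the variance replaced by the mean absolute residual. The sketch below again assumes a uniform time-frequency grid and residual values in the same units as the fourth threshold; the function name is hypothetical, and sending the result to the first terminal is left out.

```python
def is_reception_too_small(residual, ratio_threshold=0.9,
                           mean_abs_threshold=20.0):
    """Apply the two conditions for voice reception that is too small.

    Defaults follow the illustrative values in the text (90% and 20).
    """
    cells = [v for frame in residual for v in frame]
    # Share of the time-frequency area where the pickup falls short
    # of the estimate (residual below zero).
    negative_ratio = sum(1 for v in cells if v < 0) / len(cells)
    # Mean of the absolute value signal |e(F, t)|.
    mean_abs = sum(abs(v) for v in cells) / len(cells)
    return negative_ratio > ratio_threshold and mean_abs > mean_abs_threshold


far_too_quiet = is_reception_too_small([[-25.0, -30.0], [-22.0, -28.0]])
slightly_low = is_reception_too_small([[-1.0, -2.0], [-1.5, -0.5]])
```

Note that a residual can fail both this test and the normal-reception test; the source does not say which result is reported in that case.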

This embodiment of the invention provides a voice call quality detection method that estimates the echo signal at the receiving party during a voice call from the reference signal, compares the estimate by an algorithm with the echo signal actually picked up at the receiving party, judges whether reception at the receiving party is normal while no one is speaking there, and prompts the speaking party accordingly. This prevents abnormal voice reception during a call from obstructing feedback about the reception problem between the call parties, optimizes the existing voice call quality detection approach, and improves call efficiency.

Example two

Fig. 2 is a flowchart of a voice call quality detection method according to a second embodiment of the present invention. In this embodiment, after determining the voice call state of the second terminal according to the first voice signal and the second voice signal, the method may further include: if the voice call state of the second terminal is a manned call, determining a second residual signal corresponding to the second voice signal according to the estimated voice signal and an echo signal corresponding to the first voice signal that is estimated by a preset adaptive filter, and generating a voice reception detection result of the second terminal according to the second residual signal.

As shown in fig. 2, the method of the embodiment of the present invention specifically includes:

step 201, in a voice communication process between a first terminal and a second terminal, acquiring a first voice signal acquired by the first terminal, and sending the first voice signal to the second terminal, so that the second terminal plays the first voice signal.

Step 202, acquiring a second voice signal corresponding to the first voice signal and acquired by the second terminal.

Step 203, determining an estimated voice signal corresponding to the first voice signal according to the first voice signal and the voice call parameter of the second terminal.

Step 204, determining the voice call state of the second terminal according to the first voice signal and the second voice signal.

Step 205, if the voice call state of the second terminal is a manned call, determining a second residual signal corresponding to the second voice signal according to the estimated voice signal and an echo signal corresponding to the first voice signal that is estimated by a preset adaptive filter, and generating a voice reception detection result of the second terminal according to the second residual signal.

In this embodiment, when the voice call state of the second terminal is a manned call, the preset adaptive filter can remove, from the sound collected by the second terminal, the sound signals other than the echo of the first voice signal, yielding an estimated echo signal corresponding to the first voice signal for comparison with the estimated voice signal.

The second residual signal represents the difference between the estimated echo signal corresponding to the first voice signal and the estimated voice signal when the voice call state of the second terminal is a manned call; specifically, it may be the difference obtained by subtracting the estimated voice signal from the estimated echo signal corresponding to the first voice signal.

Preferably, the present embodiment provides the following residual signal calculation formula for calculating the second residual signal corresponding to the second speech signal:

$$e_2(F, t) = W(F, t) - B(F, t),$$

where $e_2(F, t)$ is the second residual signal corresponding to the second voice signal, $W(F, t)$ is the echo signal corresponding to the first voice signal as estimated by the preset adaptive filter, and $B(F, t)$ is the estimated voice signal.
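The source does not say which adaptive filter is used. As one possible illustration, the sketch below uses a time-domain normalized least-mean-squares (NLMS) filter, a common choice in echo cancellation, to estimate the echo of the first voice signal in the microphone signal. The choice of NLMS, the function name, the tap count and the step size are all assumptions, and the result would still have to be transformed to the time-frequency domain before it could play the role of W(F, t).

```python
import random


def nlms_echo_estimate(first_signal, mic_signal, taps=4, mu=0.5, eps=1e-8):
    """Estimate the echo of first_signal contained in mic_signal with NLMS.

    Returns (echo_estimate, weights). taps, mu and eps are assumed values.
    """
    weights = [0.0] * taps
    history = [0.0] * taps          # most recent far-end samples, newest first
    echo_estimate = []
    for x, d in zip(first_signal, mic_signal):
        history = [x] + history[:-1]
        y = sum(w * h for w, h in zip(weights, history))   # predicted echo
        err = d - y                                        # what the filter missed
        norm = sum(h * h for h in history) + eps
        weights = [w + mu * err * h / norm
                   for w, h in zip(weights, history)]
        echo_estimate.append(y)
    return echo_estimate, weights


# Synthetic check: the "room" is a made-up two-tap echo path (0.5 direct,
# 0.2 delayed by one sample) with no local speech and no noise.
rng = random.Random(0)
far_end = [rng.uniform(-1.0, 1.0) for _ in range(2000)]
mic = [0.5 * far_end[n] + (0.2 * far_end[n - 1] if n > 0 else 0.0)
       for n in range(len(far_end))]
echo, weights = nlms_echo_estimate(far_end, mic)
```

In echo cancellers the filter is commonly adapted while only the far end is active and held fixed during double talk; treating the patent's "preset" filter this way is an interpretation, not something the source states.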

Optionally, generating the voice reception detection result of the second terminal according to the second residual signal includes: calculating the integral area of the frequency-time coordinates at which the amplitude of the second residual signal is positive; calculating the frequency-domain time-domain area corresponding to the second residual signal; calculating the variance of the second residual signal; and, if the ratio of that integral area to that frequency-domain time-domain area is greater than a fifth preset threshold and the variance of the second residual signal is smaller than a sixth preset threshold, determining that the voice reception detection result of the second terminal is normal voice reception.

Optionally, the fifth preset threshold and the sixth preset threshold are preset empirical parameters and may be adjusted according to the actual performance of the voice call system. Illustratively, the fifth preset threshold is 90% and the sixth preset threshold is 6 dB.

Preferably, generating the voice reception detection result of the second terminal according to the second residual signal includes: calculating the integral area of the frequency-time coordinates at which the amplitude of the second residual signal is positive, which under the notation of this embodiment is $\iint_{e_2(F,t)>0} dF\,dt$; calculating the frequency-domain time-domain area corresponding to the second residual signal, $(F_{\max}-F_{\min})\,t_{\max}$, where $F_{\max}$ is the maximum frequency of the second residual signal, $F_{\min}$ is its minimum frequency and $t_{\max}$ is its maximum time; and calculating the variance of the second residual signal, which under the notation of this embodiment is written $s^2(e_2)$. If the ratio of the integral area to the frequency-domain time-domain area is greater than the fifth preset threshold and the variance of the second residual signal is smaller than the sixth preset threshold, the loudness of the second voice signal is large enough and stable enough to maintain a normal conversation, and the voice reception detection result of the second terminal is determined to be normal voice reception.

Optionally, generating the voice reception detection result of the second terminal according to the second residual signal further includes: calculating the integral area of the frequency-time coordinates at which the amplitude of the second residual signal is negative; determining the absolute value signal corresponding to the second residual signal and calculating the mean value of that absolute value signal; and, if the ratio of that integral area to the frequency-domain time-domain area corresponding to the second residual signal is greater than a seventh preset threshold and the mean value of the absolute value signal is greater than an eighth preset threshold, determining that the voice reception detection result of the second terminal is that voice reception is too small, and sending the detection result to the first terminal.

Optionally, the seventh preset threshold and the eighth preset threshold are preset empirical parameters and may be adjusted according to the actual performance of the voice call system. Illustratively, the seventh preset threshold is 90% and the eighth preset threshold is 20 dB.

Preferably, generating the voice reception detection result of the second terminal according to the second residual signal further includes: calculating the integral area of the frequency-time coordinates at which the amplitude of the second residual signal is negative, which under the notation of this embodiment is $\iint_{e_2(F,t)<0} dF\,dt$; and determining the absolute value signal $|e_2(F,t)|$ corresponding to the second residual signal and calculating its mean value. If the ratio of that integral area to the frequency-domain time-domain area corresponding to the second residual signal is greater than the seventh preset threshold, and the mean value of the absolute value signal is greater than the eighth preset threshold, the loudness of the second voice signal is too small to maintain a normal conversation; the voice reception detection result of the second terminal is determined to be that voice reception is too small, and the detection result is sent to the first terminal. The seventh preset threshold and the eighth preset threshold are the preset empirical parameters.

This embodiment of the invention provides a voice call quality detection method that estimates the echo signal at the receiving party during a voice call from the reference signal, compares the estimate by an algorithm with the echo signal actually picked up at the receiving party, judges whether the receiving party is receiving sound normally, and prompts the speaking party accordingly. This prevents abnormal voice reception during a call from obstructing feedback about the reception problem between the call parties, optimizes the existing voice call quality detection approach, and improves call efficiency.

Example three

Fig. 3 is a schematic structural diagram of a voice call quality detection apparatus according to a third embodiment of the present invention, and as shown in fig. 3, the apparatus includes:

the first signal obtaining module 311 is configured to obtain a first voice signal collected by a first terminal during a voice call between the first terminal and a second terminal, and send the first voice signal to the second terminal, so that the second terminal plays the first voice signal;

a second signal obtaining module 312, configured to obtain a second voice signal corresponding to the first voice signal, where the second voice signal is collected by the second terminal;

an estimated signal determining module 313, configured to determine, according to the first voice signal and the voice call parameter of the second terminal, an estimated voice signal corresponding to the first voice signal;

a call state determining module 314, configured to determine a voice call state of the second terminal according to the first voice signal and the second voice signal;

a first result generating module 315, configured to, if the voice call state of the second terminal is an unmanned call, determine a first residual signal corresponding to the second voice signal according to the second voice signal and the estimated voice signal, and generate a voice reception detection result of the second terminal according to the first residual signal.

Optionally, the voice call quality detection apparatus may further include a call parameter obtaining module, configured to obtain voice call parameters of the first terminal and the second terminal before obtaining the first voice signal collected by the first terminal.

Further, the estimated signal determining module 313 is specifically configured to calculate the estimated voice signal corresponding to the first voice signal according to the following voice signal estimation formula:

B(F, t) = α(V) * V * S * f(F) * A(F, t),

wherein B(F, t) is the estimated voice signal corresponding to the first voice signal, α(V) is an empirical amplification factor, V is the system volume value of the second terminal, S is the microphone sensitivity of the second terminal, f(F) is the speaker frequency response curve of the second terminal, and A(F, t) is the first voice signal.
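As an illustration only, the estimation formula can be sketched in Python as follows. The sketch assumes that the first voice signal A(F, t) is stored as a list of rows, one row per frequency bin, and that α(V) and f(F) are supplied as callables. The function name `estimate_signal`, the argument names and all numeric values are hypothetical and are not specified by this embodiment.

```python
from typing import Callable, List


def estimate_signal(
    a: List[List[float]],                        # A(F, t): one row per frequency bin
    freqs_hz: List[float],                       # frequency F of each row (assumed layout)
    volume: float,                               # V, system volume of the second terminal
    mic_sensitivity: float,                      # S, microphone sensitivity
    alpha: Callable[[float], float],             # alpha(V), empirical amplification factor
    speaker_response: Callable[[float], float],  # f(F), speaker frequency response
) -> List[List[float]]:
    """Compute B(F, t) = alpha(V) * V * S * f(F) * A(F, t) cell by cell."""
    gain = alpha(volume) * volume * mic_sensitivity
    return [
        [gain * speaker_response(f) * value for value in row]
        for f, row in zip(freqs_hz, a)
    ]


# Illustrative values only; real parameters would come from the second terminal.
a = [[1.0, 2.0], [0.5, 0.25]]
b = estimate_signal(
    a,
    freqs_hz=[100.0, 1000.0],
    volume=0.5,
    mic_sensitivity=2.0,
    alpha=lambda v: 1.0,
    speaker_response=lambda f: 0.5 if f < 500.0 else 1.0,
)
```

Passing α(V) and f(F) as callables keeps the sketch independent of how the empirical factor and the response curve are actually measured or stored.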

Further, the call state determining module 314 is specifically configured to determine the voice call state of the second terminal according to the first voice signal and the second voice signal by using a double-talk detection algorithm.
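The embodiment does not name a particular double-talk detection algorithm. As a minimal sketch under that caveat, the classic Geigel detector is shown below: it declares near-end speech when the magnitude of a sample picked up by the second terminal exceeds a fraction of the recent peak of the first voice signal. The function names, the window length, the threshold of 0.5 and the mapping to the "manned call" and "unmanned call" states are assumptions made for illustration.

```python
from typing import List


def geigel_flags(far_end: List[float], near_end: List[float],
                 window: int = 4, threshold: float = 0.5) -> List[bool]:
    """Return one flag per sample: True where near-end speech is declared."""
    flags = []
    for n, d in enumerate(near_end):
        start = max(0, n - window + 1)
        # Peak magnitude of the far-end (first) signal over the recent window.
        recent_peak = max(abs(x) for x in far_end[start:n + 1])
        flags.append(abs(d) > threshold * recent_peak)
    return flags


def call_state(far_end: List[float], near_end: List[float]) -> str:
    """Map the flags to the two call states used in this description (assumed mapping)."""
    return "manned call" if any(geigel_flags(far_end, near_end)) else "unmanned call"


far = [1.0, -1.0, 1.0, -1.0]
echo_only = [0.2, -0.2, 0.2, -0.2]   # attenuated echo of the far-end signal
with_talker = [0.2, 0.9, 0.2, -0.2]  # a louder sample suggests someone is speaking
```

A practical detector would usually add a hangover period so that the state does not flip on a single sample; that is omitted here for brevity.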

Further, the first result generating module 315 is specifically configured to:

calculating a first residual signal corresponding to the second voice signal according to the following residual signal calculation formula:

e₁(F, t) = C(F, t) - B(F, t),

wherein e₁(F, t) is the first residual signal corresponding to the second voice signal, C(F, t) is the second voice signal, and B(F, t) is the estimated voice signal;

calculating an integral area of a frequency-time coordinate that makes the first residual signal amplitude positive;

calculating the frequency domain time domain area corresponding to the first residual signal;

calculating a variance of the first residual signal;

if the ratio of the integral area of the frequency-time coordinate with the positive amplitude of the first residual signal to the frequency-domain time-domain area corresponding to the first residual signal is larger than a first preset threshold value, and the variance of the first residual signal is smaller than a second preset threshold value, determining that the voice receiving detection result of the second terminal is normal;

calculating an integral area of a frequency-time coordinate that makes the first residual signal amplitude negative;

determining an absolute value signal corresponding to the first residual signal and calculating a mean value of the absolute value signal corresponding to the first residual signal;

and if the ratio of the integral area of the frequency-time coordinates at which the amplitude of the first residual signal is negative to the frequency-domain time-domain area corresponding to the first residual signal is larger than a third preset threshold value, and the mean value of the absolute value signal corresponding to the first residual signal is larger than a fourth preset threshold value, determining that the voice reception detection result of the second terminal is that voice reception is too quiet, and sending the voice reception detection result to the first terminal.
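The steps performed by the first result generating module can be sketched as follows. This is one possible reading, not a definitive implementation: the "integral area of the frequency-time coordinate" with positive (or negative) amplitude is interpreted as the number of frequency-time cells whose residual is positive (or negative), and the "frequency-domain time-domain area" as the total number of cells. The function names, the threshold values and the fallback label "undetermined" are hypothetical; the "too quiet" label corresponds to the result described above as reception being too small.

```python
from statistics import fmean, pvariance
from typing import List


def residual(c: List[List[float]], b: List[List[float]]) -> List[List[float]]:
    """e1(F, t) = C(F, t) - B(F, t), computed cell by cell."""
    return [[cv - bv for cv, bv in zip(c_row, b_row)] for c_row, b_row in zip(c, b)]


def classify_residual(e: List[List[float]], pos_ratio_min: float, var_max: float,
                      neg_ratio_min: float, abs_mean_min: float) -> str:
    cells = [v for row in e for v in row]
    total = len(cells)
    pos_ratio = sum(1 for v in cells if v > 0) / total  # share of positive cells
    neg_ratio = sum(1 for v in cells if v < 0) / total  # share of negative cells
    if pos_ratio > pos_ratio_min and pvariance(cells) < var_max:
        return "normal"
    if neg_ratio > neg_ratio_min and fmean([abs(v) for v in cells]) > abs_mean_min:
        return "too quiet"
    return "undetermined"  # assumed fallback; the description defines no third result


b = [[1.0, 1.0], [0.9, 1.0]]             # estimated voice signal (illustrative)
c_ok = [[1.1, 1.2], [1.0, 1.1]]          # picked-up signal close to the estimate
c_quiet = [[0.1, 0.2], [0.1, 0.1]]       # picked-up signal far below the estimate
result_ok = classify_residual(residual(c_ok, b), 0.8, 0.05, 0.8, 0.5)
result_quiet = classify_residual(residual(c_quiet, b), 0.8, 0.05, 0.8, 0.5)
```

The four threshold arguments stand in for the first to fourth preset thresholds, which the description treats as empirical parameters.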

The embodiment of the invention provides a voice call quality detection device, which estimates the echo signal of the receiving party in a voice call process according to a reference signal, compares the estimated echo signal with the actual echo signal of the receiving party through an algorithm, judges whether the voice reception of the receiving party is normal when no one is speaking at the receiving party, and gives a corresponding prompt to the speaking party. This avoids the feedback obstacle between the calling parties caused by abnormal voice reception in the voice call process, optimizes the existing voice call quality detection mode and improves the call efficiency.

In an optional implementation manner of the embodiment of the present invention, the apparatus for detecting voice call quality further includes:

a second result generation module, configured to, if the voice call state of the second terminal is a manned call, determine a second residual signal corresponding to the second voice signal according to the estimated voice signal and an echo signal that corresponds to the first voice signal and is estimated by a preset adaptive filter, and generate a voice reception detection result of the second terminal according to the second residual signal.

Further, the second result generation module is specifically configured to:

calculating a second residual signal corresponding to the second voice signal according to the following residual signal calculation formula:

e₂(F, t) = W(F, t) - B(F, t)

wherein e₂(F, t) is the second residual signal corresponding to the second voice signal, W(F, t) is the echo signal corresponding to the first voice signal estimated by the preset adaptive filter, and B(F, t) is the estimated voice signal;

calculating an integral area of a frequency-time coordinate that makes the second residual signal amplitude positive;

calculating the frequency domain time domain area corresponding to the second residual signal;

calculating a variance of the second residual signal;

if the ratio of the integral area of the frequency-time coordinate with the positive amplitude of the second residual signal to the frequency-domain time-domain area corresponding to the second residual signal is greater than a fifth preset threshold, and the variance of the second residual signal is less than a sixth preset threshold, determining that the voice reception detection result of the second terminal is normal;

calculating an integral area of a frequency-time coordinate that makes the second residual signal amplitude negative;

determining an absolute value signal corresponding to the second residual signal and calculating a mean value of the absolute value signal corresponding to the second residual signal;

and if the ratio of the integral area of the frequency-time coordinates at which the amplitude of the second residual signal is negative to the frequency-domain time-domain area corresponding to the second residual signal is greater than a seventh preset threshold value, and the mean value of the absolute value signal corresponding to the second residual signal is greater than an eighth preset threshold value, determining that the voice reception detection result of the second terminal is that voice reception is too quiet, and sending the voice reception detection result to the first terminal.
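The "preset adaptive filter" is not specified further. As a hedged sketch, a normalized least-mean-squares (NLMS) filter, a common choice for echo estimation, is shown below as a stand-in for the step that produces the echo estimate W. It operates on time-domain samples, so the conversion to the frequency-time representation W(F, t) used in the formula for e₂ is omitted. The function names, tap count and step size are assumptions.

```python
from typing import List


def nlms_echo_estimate(far_end: List[float], mic: List[float],
                       taps: int = 4, mu: float = 0.5, eps: float = 1e-8) -> List[float]:
    """Estimate the echo of far_end contained in mic with an NLMS adaptive filter."""
    w = [0.0] * taps        # filter weights, adapted sample by sample
    history = [0.0] * taps  # most recent far-end samples, newest first
    echo = []
    for x, d in zip(far_end, mic):
        history = [x] + history[:-1]
        y = sum(wi * xi for wi, xi in zip(w, history))  # current echo estimate
        err = d - y                                     # part of mic not explained by echo
        norm = sum(xi * xi for xi in history) + eps     # input power, eps avoids division by zero
        w = [wi + mu * err * xi / norm for wi, xi in zip(w, history)]
        echo.append(y)
    return echo


def second_residual(echo: List[float], estimate: List[float]) -> List[float]:
    """Time-domain analogue of e2 = W - B, computed sample by sample."""
    return [wv - bv for wv, bv in zip(echo, estimate)]


far = [1.0, -1.0] * 50
mic = [0.5 * x for x in far]  # pure echo at half amplitude, no near-end talker
echo = nlms_echo_estimate(far, mic, taps=1)
```

Because the filter tracks only the component of the microphone signal that is correlated with the first voice signal, its output can serve as an echo estimate even while a person is speaking at the second terminal, which is why this branch does not use the second voice signal directly.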

In the voice call quality detection device provided in this embodiment, the echo signal of the receiving party in the voice call process is estimated according to the reference signal and compared with the echo signal of the actual receiving party through the algorithm, whether the receiving party receives the voice normally or not is judged, and a corresponding prompt is made to the speaking party, so that the feedback obstacle of the receiving problem between the speaking parties caused by abnormal receiving in the voice call process is avoided, the existing voice call quality detection mode is optimized, and the call efficiency is improved.

The voice call quality detection device can execute the voice call quality detection method provided by any embodiment of the invention, and has the corresponding functional modules and beneficial effects of executing the voice call quality detection method.

Example four

Fig. 4 is a schematic structural diagram of a computer device according to a fourth embodiment of the present invention. FIG. 4 illustrates a block diagram of an exemplary computer device 12 suitable for use in implementing embodiments of the present invention. The computer device 12 shown in FIG. 4 is only one example and should not bring any limitations to the functionality or scope of use of embodiments of the present invention.

As shown in FIG. 4, computer device 12 is in the form of a general purpose computing device. The components of computer device 12 may include, but are not limited to: one or more processors 16, a memory 28, and a bus 18 that connects the various system components (including the memory 28 and the processors 16).

Bus 18 represents one or more of any of several types of bus structures, including a memory bus or memory controller, a peripheral bus, an accelerated graphics port, and a processor or local bus using any of a variety of bus architectures. By way of example, such architectures include, but are not limited to, Industry Standard Architecture (ISA) bus, Micro Channel Architecture (MCA) bus, Enhanced ISA (EISA) bus, Video Electronics Standards Association (VESA) local bus, and Peripheral Component Interconnect (PCI) bus.

Computer device 12 typically includes a variety of computer system readable media. Such media may be any available media that is accessible by computer device 12 and includes both volatile and nonvolatile media, removable and non-removable media.

The memory 28 may include computer system readable media in the form of volatile memory, such as Random Access Memory (RAM) 30 and/or cache memory 32. Computer device 12 may further include other removable/non-removable, volatile/nonvolatile computer system storage media. By way of example only, storage system 34 may be used to read from and write to non-removable, nonvolatile magnetic media (not shown in FIG. 4, and commonly referred to as a "hard drive"). Although not shown in FIG. 4, a magnetic disk drive for reading from and writing to a removable, nonvolatile magnetic disk (e.g., a "floppy disk") and an optical disk drive for reading from or writing to a removable, nonvolatile optical disk (e.g., a CD-ROM, DVD-ROM, or other optical media) may be provided. In these cases, each drive may be connected to bus 18 by one or more data media interfaces. Memory 28 may include at least one program product having a set (e.g., at least one) of program modules that are configured to carry out the functions of embodiments of the invention.

A program/utility 40 having a set (at least one) of program modules 42 may be stored, for example, in memory 28. Such program modules 42 include, but are not limited to, an operating system, one or more application programs, other program modules, and program data; each of these examples, or some combination thereof, may include an implementation of a network environment. Program modules 42 generally carry out the functions and/or methodologies of the described embodiments of the invention.

Computer device 12 may also communicate with one or more external devices 14 (e.g., keyboard, pointing device, display 24, etc.), with one or more devices that enable a user to interact with computer device 12, and/or with any devices (e.g., network card, modem, etc.) that enable computer device 12 to communicate with one or more other computing devices. Such communication may be through an input/output (I/O) interface 22. Also, computer device 12 may communicate with one or more networks (e.g., a Local Area Network (LAN), a Wide Area Network (WAN), and/or a public network such as the Internet) via network adapter 20. As shown, network adapter 20 communicates with the other modules of computer device 12 via bus 18. It should be appreciated that although not shown in FIG. 4, other hardware and/or software modules may be used in conjunction with computer device 12, including but not limited to: microcode, device drivers, redundant processing units, external disk drive arrays, RAID systems, tape drives, and data backup storage systems, among others.

The processor 16 executes various functional applications and data processing by running the program stored in the memory 28, so as to implement the voice call quality detection method provided by the embodiment of the present invention: in the process of voice communication between a first terminal and a second terminal, acquiring a first voice signal acquired by the first terminal, and sending the first voice signal to the second terminal so as to enable the second terminal to play the first voice signal; acquiring a second voice signal corresponding to the first voice signal and acquired by the second terminal; determining an estimated voice signal corresponding to the first voice signal according to the first voice signal and the voice call parameter of the second terminal; determining the voice call state of the second terminal according to the first voice signal and the second voice signal; and if the voice call state of the second terminal is unmanned call, determining a first residual signal corresponding to the second voice signal according to the second voice signal and the estimated voice signal, and generating a voice reception detection result of the second terminal according to the first residual signal.

Example five

The fifth embodiment of the present invention provides a computer-readable storage medium on which a computer program is stored. When the computer program is executed by a processor, the voice call quality detection method provided by the embodiments of the present invention is implemented: in the process of voice communication between a first terminal and a second terminal, acquiring a first voice signal acquired by the first terminal, and sending the first voice signal to the second terminal so as to enable the second terminal to play the first voice signal; acquiring a second voice signal corresponding to the first voice signal and acquired by the second terminal; determining an estimated voice signal corresponding to the first voice signal according to the first voice signal and the voice call parameter of the second terminal; determining the voice call state of the second terminal according to the first voice signal and the second voice signal; and if the voice call state of the second terminal is unmanned call, determining a first residual signal corresponding to the second voice signal according to the second voice signal and the estimated voice signal, and generating a voice reception detection result of the second terminal according to the first residual signal.

Any combination of one or more computer-readable media may be employed. The computer readable medium may be a computer readable signal medium or a computer readable storage medium. A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples (a non-exhaustive list) of the computer readable storage medium would include the following: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the context of this document, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device.

A computer readable signal medium may include a propagated data signal with computer readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take any of a variety of forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device.

Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to wireless, wireline, optical fiber cable, RF, etc., or any suitable combination of the foregoing.

Computer program code for carrying out operations for aspects of the present invention may be written in any combination of one or more programming languages, including an object oriented programming language such as Java, Smalltalk, C++ or the like and conventional procedural programming languages, such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer or entirely on the remote computer or computer device. In the case of a remote computer, the remote computer may be connected to the user's computer through any type of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet service provider).

It is to be noted that the foregoing is only illustrative of the preferred embodiments of the present invention and the technical principles employed. It will be understood by those skilled in the art that the present invention is not limited to the particular embodiments described herein, but is capable of various obvious changes, rearrangements and substitutions as will now become apparent to those skilled in the art without departing from the scope of the invention. Therefore, although the present invention has been described in greater detail by the above embodiments, the present invention is not limited to the above embodiments, and may include other equivalent embodiments without departing from the spirit of the present invention, and the scope of the present invention is determined by the scope of the appended claims.

Claims

1. A voice call quality detection method is characterized by comprising the following steps:

2. The method of claim 1, further comprising, after determining the voice call state of the second terminal based on the first voice signal and the second voice signal:

and if the voice call state of the second terminal is a manned call, determining a second residual signal corresponding to the second voice signal according to the estimated voice signal and an echo signal that corresponds to the first voice signal and is estimated by a preset adaptive filter, and generating a voice reception detection result of the second terminal according to the second residual signal.

3. The method of claim 1, wherein determining an estimated voice signal corresponding to the first voice signal according to the first voice signal, the voice call parameter of the second terminal, and a preset voice signal estimation formula comprises:

calculating an estimated voice signal corresponding to the first voice signal according to the following voice signal estimation formula:

B(F, t) = α(V) * V * S * f(F) * A(F, t),

4. The method of claim 1, wherein the determining the voice call state of the second terminal according to the first voice signal and the second voice signal comprises:

and determining the voice call state of the second terminal according to the first voice signal and the second voice signal by using a double-talk detection algorithm.

5. The method of claim 1, wherein determining a first residual signal corresponding to the second voice signal based on the second voice signal and the estimated voice signal comprises:

e₁(F, t) = C(F, t) - B(F, t),

6. The method according to claim 5, wherein generating the voice reception detection result of the second terminal according to the first residual signal comprises:

calculating a variance of the first residual signal;

and if the ratio of the integral area of the frequency-time coordinate with the positive amplitude of the first residual signal to the frequency-domain time-domain area corresponding to the first residual signal is larger than a first preset threshold value, and the variance of the first residual signal is smaller than a second preset threshold value, determining that the voice reception detection result of the second terminal is normal.

7. The method of claim 6, wherein generating the voice reception detection result of the second terminal based on the first residual signal further comprises:

8. The method of claim 2, wherein determining a second residual signal corresponding to the second voice signal according to the echo signal corresponding to the first voice signal and the estimated voice signal estimated by the preset adaptive filter comprises:

e₂(F, t) = W(F, t) - B(F, t)

9. The method according to claim 8, wherein generating the voice reception detection result of the second terminal according to the second residual signal comprises:

calculating a variance of the second residual signal;

and if the ratio of the integral area of the frequency-time coordinate with the positive amplitude of the second residual signal to the frequency-domain time-domain area corresponding to the second residual signal is greater than a fifth preset threshold, and the variance of the second residual signal is smaller than a sixth preset threshold, determining that the voice reception detection result of the second terminal is normal.

10. The method of claim 9, wherein generating the voice reception detection result of the second terminal based on the second residual signal further comprises:

11. A voice call quality detection apparatus, comprising:

12. The apparatus of claim 11, further comprising:

13. A computer device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the processor implements the voice call quality detection method according to any one of claims 1 to 10 when executing the computer program.

14. A computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, carries out a voice call quality detection method according to any one of claims 1 to 10.