WO2018133247A1

WO2018133247A1 - Abnormal sound detection method and apparatus

Info

Publication number: WO2018133247A1
Application number: PCT/CN2017/082415
Authority: WO
Inventors: 马骅; 吴元友; 仇存收; 孙建华
Original assignee: 华为技术有限公司
Priority date: 2017-01-20
Filing date: 2017-04-28
Publication date: 2018-07-26
Also published as: CN108605191B; CN108605191A

Abstract

An abnormal sound detection method and apparatus (02). The method comprises: acquiring a first speech signal played by a sound output device (03) of a terminal device (01), the first speech signal being locally stored in the terminal device (01), and the first speech signal comprising audio information with an irregularly changing frequency (101, 301, 401, 501); obtaining a residual signal according to a reference speech signal acquired in advance and the first speech signal, the residual signal comprising a part of the first speech signal that is different from the reference speech signal (102, 304, 404); and determining, according to the residual signal, whether there is an abnormal sound in the first speech signal so as to determine whether the sound output device is abnormal (103). The speech signal represents a real use scenario of a user, and frequency points are triggered repeatedly together in an actual frequency band of the speech during an entire playing process of the speech signal, facilitating discovery of a problematic frequency point. The speech signal itself represents actual frequency points to undergo detection, and the probability of missing a problematic frequency point is greatly reduced. The detection method is convenient and universal, and has an accurate detection result.

Description

Abnormal sound detection method and apparatus

Related application cross-reference

The present application claims priority to a Chinese patent application filed on January 20, 2017.

Technical field

The present application relates to the field of terminal technologies, and in particular, to an abnormal sound detection method and apparatus.

Background technique

With the development of terminal technologies, various types of terminals, such as smartphones, computers, earphones, and smart watches, have come into wide use in people's lives. A terminal is generally provided with a sound output device, for example a speaker or a receiver, which the terminal uses to play audio signals. The sound output device may produce an abnormal sound when playing an audio signal for various reasons, such as design defects, assembly defects, or the entry of foreign matter. Therefore, before the terminal is sold, the sound output device on the terminal needs to be tested to detect whether it produces an abnormal sound when playing an audio signal.

In the prior art, the sound output device to be detected plays a frequency sweep signal, and the detection system records the frequency sweep signal played by the device. The system then calculates the high-order harmonic distortion energy of each frequency band of the frequency sweep signal and determines whether the high-order harmonic distortion energy of each frequency band exceeds the energy threshold of that frequency band. When the high-order harmonic distortion energy of one frequency band exceeds the energy threshold of that band, or when the high-order harmonic distortion energy of multiple frequency bands exceeds their respective energy thresholds, it can be determined that the sound output device to be detected has an abnormal sound, and thus that the sound output device to be detected is abnormal.

However, in the prior art, the frequency of the sweep signal changes monotonically within a certain frequency band, either from high to low or from low to high, so each frequency point in the frequency sweep signal lasts only a short time. As a result, the sweep may move on to the next frequency point before a given frequency point has excited relatively high harmonic energy, and a problem that may occur at that frequency point goes undetected. In addition, when the sound output device is actually used, it is unlikely to play only a simple audio signal such as a sweep signal. Therefore, in the prior art, the abnormal sound in the frequency sweep signal played by the sound output device to be detected cannot be accurately detected, so it cannot be accurately determined whether the sound output device to be detected is abnormal; the existing detection method is inaccurate.

Summary of the invention

The present application provides an abnormal sound detection method and apparatus, to solve the problem in the prior art that detecting whether the sound output device to be detected produces an abnormal sound when playing an audio signal is inaccurate, so that it cannot be accurately determined whether the sound output device to be detected is abnormal.

In a first aspect, the present application provides an abnormal sound detecting method, including: acquiring a first voice signal played by a sound output device of a terminal device, where the first voice signal is locally stored in the terminal device and includes audio information with irregular frequency changes; obtaining a residual signal according to a pre-acquired voice reference signal and the first voice signal, where the residual signal includes a portion of the first voice signal that differs from the voice reference signal; and determining, according to the residual signal, whether the first voice signal has an abnormal sound, so as to determine whether the sound output device is abnormal.

In a possible design, determining whether the first voice signal has an abnormal sound according to the residual signal includes: determining an energy value of the residual signal; and determining, according to the energy value, whether the first voice signal has an abnormal sound.

In a possible design, determining the energy value of the residual signal includes: removing the voice main-band energy from the residual signal to obtain a residual signal with the voice main-band energy removed, where the frequency of the removed voice main-band energy is set to be smaller than a first frequency value; and then determining the energy value of the residual signal with the voice main-band energy removed.
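As a minimal sketch of this design (not the implementation of the present application), the following Python function measures the energy that remains in one frame after all spectral content below a cutoff is discarded. The naive DFT, the function name `high_band_energy`, and the parameters `sample_rate` and `cutoff_hz` (standing in for the first frequency value) are assumptions made for illustration; the text does not specify how the voice main-band energy is removed.

```python
import cmath
import math


def high_band_energy(frame, sample_rate, cutoff_hz):
    """Energy of `frame` carried by DFT bins at or above `cutoff_hz`.

    Uses a naive O(N^2) DFT so that the sketch needs only the standard library.
    """
    n = len(frame)
    energy = 0.0
    for k in range(n // 2 + 1):
        freq = k * sample_rate / n
        if freq < cutoff_hz:
            continue  # discard the assumed voice main band
        coeff = sum(
            x * cmath.exp(-2j * math.pi * k * i / n) for i, x in enumerate(frame)
        )
        # For a real signal every bin except DC and Nyquist has a mirror image.
        is_unpaired = k == 0 or (n % 2 == 0 and k == n // 2)
        weight = 1.0 if is_unpaired else 2.0
        energy += weight * abs(coeff) ** 2 / n  # Parseval scaling
    return energy


# Usage: a 100 Hz tone lies below a 200 Hz cutoff, a 300 Hz tone lies above it.
fs = 800
low_tone = [math.sin(2 * math.pi * 100 * i / fs) for i in range(80)]
high_tone = [math.sin(2 * math.pi * 300 * i / fs) for i in range(80)]
low_energy = high_band_energy(low_tone, fs, 200.0)
high_energy = high_band_energy(high_tone, fs, 200.0)
```

In this toy example the low tone contributes essentially no energy after the cutoff, while the high tone keeps all of its energy; a practical implementation would more likely use a high-pass filter or an FFT.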

In a possible design, determining the energy value of the residual signal with the voice main-band energy removed includes: determining the portion of that residual signal that is above a second frequency value, and then calculating the energy value of that portion in each frame. Correspondingly, determining whether the first voice signal has an abnormal sound according to the energy value includes the following process:

Determining whether a preset number of the energy values of the frames are greater than or equal to the first energy threshold corresponding to each energy value;

If it is determined that a preset number of the energy values of the frames are greater than or equal to their corresponding first energy thresholds, determining that the first voice signal has an abnormal sound, and determining that the sound output device is abnormal;

If it is determined that fewer than the preset number of the energy values of the frames are greater than or equal to their corresponding first energy thresholds, determining that the first voice signal does not have an abnormal sound, and determining that the sound output device is normal.
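The counting decision above can be sketched as follows. This is an illustrative sketch only: it assumes that a frame counts toward the abnormal decision when its energy is greater than or equal to its threshold (the same direction as the energy-maximum design), and the names `has_abnormal_sound`, `first_thresholds`, and `preset_count` are hypothetical.

```python
def has_abnormal_sound(frame_energies, first_thresholds, preset_count):
    """Return True when at least `preset_count` frames reach their threshold."""
    if len(frame_energies) != len(first_thresholds):
        raise ValueError("need exactly one threshold per frame energy")
    # Count the frames whose energy reaches the corresponding first threshold.
    exceeding = sum(
        1 for energy, limit in zip(frame_energies, first_thresholds) if energy >= limit
    )
    return exceeding >= preset_count


# Usage with made-up numbers: two frames must reach the threshold of 2.0.
thresholds = [2.0, 2.0, 2.0, 2.0]
abnormal = has_abnormal_sound([1.0, 5.0, 6.0, 0.5], thresholds, preset_count=2)
normal = has_abnormal_sound([1.0, 5.0, 1.0, 0.5], thresholds, preset_count=2)
```

Requiring several frames to reach their thresholds, rather than one, makes the decision less sensitive to a single noisy frame.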

Or, in a possible design, determining the energy value of the residual signal with the voice main-band energy removed includes: determining the portion of that residual signal that is above the second frequency value; calculating the energy value of that portion in each frame; and then calculating the energy maximum, which is the largest of the energy values of the frames. Correspondingly, determining whether the first voice signal has an abnormal sound according to the energy value includes the following process:

Determining whether the energy maximum is greater than or equal to the second energy threshold;

If it is determined that the energy maximum value is greater than or equal to the second energy threshold, it may be determined that the first voice signal has an abnormal sound, and determining that the sound output device is abnormal;

If it is determined that the energy maximum is less than the second energy threshold, it may be determined that the first voice signal does not have an abnormal sound, and it is determined that the sound output device is normal.
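The energy-maximum decision can be sketched in Python as follows. The frame length, the use of non-overlapping frames, the sum-of-squares energy measure, and the function names are assumptions for illustration; the input is taken to be the residual signal after the band selection has already been applied.

```python
def frame_energies(signal, frame_len):
    """Sum of squared samples for each full, non-overlapping frame."""
    return [
        sum(s * s for s in signal[start:start + frame_len])
        for start in range(0, len(signal) - frame_len + 1, frame_len)
    ]


def exceeds_energy_maximum(signal, frame_len, second_threshold):
    """Return True when the largest frame energy reaches `second_threshold`."""
    energies = frame_energies(signal, frame_len)
    if not energies:
        raise ValueError("signal is shorter than one frame")
    return max(energies) >= second_threshold


# Usage: a short burst in the third frame of an otherwise silent residual.
residual = [0.0] * 8 + [1.0, -1.0, 1.0, -1.0] + [0.0] * 4
energies = frame_energies(residual, 4)
flagged = exceeds_energy_maximum(residual, 4, second_threshold=3.0)
not_flagged = exceeds_energy_maximum(residual, 4, second_threshold=5.0)
```

Compared with the counting design, a single loud frame is enough to trigger this decision, so the choice of the second energy threshold matters more.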

In a possible design, before obtaining the residual signal according to the pre-acquired voice reference signal and the first voice signal, the method further includes: acquiring a second voice signal played by each of at least one other sound output device, where each other sound output device is a sound output device that plays sound normally, and the voice content of the second voice signal is the same as the voice content of the first voice signal; and then performing signal superposition processing on the second voice signals to generate the voice reference signal.
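One plausible reading of the superposition step is a sample-wise average of recordings taken from known-good devices. The sketch below assumes that reading, and also assumes the recordings are already time-aligned; the text only says the second voice signals are superimposed, so both the averaging and the name `build_reference` are illustrative assumptions.

```python
def build_reference(recordings):
    """Average several aligned recordings sample by sample.

    Recordings of different lengths are truncated to the shortest one.
    """
    if not recordings:
        raise ValueError("at least one recording is required")
    length = min(len(r) for r in recordings)
    count = len(recordings)
    return [sum(r[i] for r in recordings) / count for i in range(length)]


# Usage with two tiny made-up recordings from normal devices.
reference = build_reference([[1.0, 2.0, 3.0], [3.0, 2.0, 1.0, 9.0]])
```

Averaging several normal devices tends to suppress device-specific noise in the reference, which is one reason to combine more than one recording.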

In a possible design, before obtaining the residual signal according to the pre-acquired voice reference signal and the first voice signal, the method further includes: performing delay alignment processing on the first voice signal and the voice reference signal in the time domain, to generate a first voice signal that is aligned with the voice reference signal.
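A common way to perform delay alignment in the time domain is to pick the lag that maximizes the cross-correlation between the two signals. The sketch below assumes that technique, which the text does not name; `estimate_delay`, `align`, and `max_lag` are hypothetical names.

```python
def estimate_delay(signal, reference, max_lag):
    """Lag in samples (positive means `signal` trails `reference`)."""
    best_lag, best_score = 0, float("-inf")
    for lag in range(-max_lag, max_lag + 1):
        score = 0.0
        for i, ref_sample in enumerate(reference):
            j = i + lag
            if 0 <= j < len(signal):
                score += ref_sample * signal[j]
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag


def align(signal, reference, max_lag):
    """Shift `signal` by the estimated lag and fit it to the reference length."""
    lag = estimate_delay(signal, reference, max_lag)
    if lag >= 0:
        shifted = signal[lag:]
    else:
        shifted = [0.0] * (-lag) + signal
    shifted = shifted[:len(reference)]
    return shifted + [0.0] * (len(reference) - len(shifted))


# Usage: the recorded signal is the reference delayed by two samples.
ref = [0.0, 0.0, 1.0, 2.0, 3.0, 0.0, 0.0, 0.0]
recorded = [0.0, 0.0, 0.0, 0.0, 1.0, 2.0, 3.0, 0.0]
delay = estimate_delay(recorded, ref, max_lag=4)
aligned = align(recorded, ref, max_lag=4)
```

The brute-force search costs time proportional to the signal length times the number of lags; for real recordings an FFT-based correlation would be the usual choice.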

The technical solution provided by the embodiments of the present application may have the following beneficial effects. The first voice signal played by the sound output device of the terminal device is acquired, where the first voice signal is locally stored in the terminal device and includes audio information with irregular frequency changes; a residual signal is obtained according to the pre-acquired voice reference signal and the first voice signal, where the residual signal includes the portion of the first voice signal that differs from the voice reference signal; and whether the first voice signal has an abnormal sound is determined according to the residual signal, so as to determine whether the sound output device is abnormal. This provides a way of detecting whether an abnormal sound occurs when the sound output device plays audio, and thus of determining whether the sound output device is abnormal. Because the signal to be detected is a voice signal, it can represent the user's real use scenario, and throughout playback the frequency points within the actual frequency band of speech are triggered repeatedly, which makes it easier to find abnormalities at problematic frequency points. Moreover, in the present application the voice signal itself represents the real frequency points to be detected, so a problematic frequency point is much less likely to be missed, which helps detect the frequency points that have an abnormal sound. In addition, the signal to be detected is a voice signal stored locally in the terminal device and played by the sound output device, which avoids the voice signal being lost, or being mixed with noise, during transmission. Finally, the portion of the first voice signal that differs from the voice reference signal, whose voice content is the same as that of the first voice signal, is used to determine whether the first voice signal contains an abnormal sound. The detection method is convenient and versatile, and the accuracy of the detection result is improved.

In a second aspect, the present application provides an abnormal sound detecting apparatus, including:

an acquiring unit, configured to acquire a first voice signal played by a sound output device of a terminal device, where the first voice signal is locally stored in the terminal device and includes audio information with irregular frequency changes;

a calculating unit, configured to obtain a residual signal according to a pre-acquired voice reference signal and the first voice signal, where the residual signal includes a portion of the first voice signal that differs from the voice reference signal; and

a determining unit, configured to determine, according to the residual signal, whether the first voice signal has an abnormal sound, so as to determine whether the sound output device is abnormal.

In a possible design, the determining unit includes: a first determining module, configured to determine an energy value of the residual signal; and a second determining module, configured to determine, according to the energy value, whether the first voice signal has an abnormal sound.

In a possible design, the first determining module comprises:

a removing sub-module, configured to remove the voice main-band energy from the residual signal to obtain a residual signal with the voice main-band energy removed, where the frequency of the removed voice main-band energy is set to be smaller than a first frequency value; and

a determining sub-module, configured to determine the energy value of the residual signal with the voice main-band energy removed.

In a possible design, the determining sub-module is specifically configured to: determine the portion of the residual signal with the voice main-band energy removed that is above a second frequency value, and then calculate the energy value of that portion in each frame. Correspondingly, the second determining module is specifically configured to perform the per-frame threshold determination described for the corresponding design of the first aspect.

Or, in a possible design, the determining sub-module is specifically configured to: determine the portion of the residual signal with the voice main-band energy removed that is above the second frequency value; calculate the energy value of that portion in each frame; and then calculate the energy maximum, which is the largest of the energy values of the frames. Correspondingly, the second determining module is specifically configured to perform the energy-maximum threshold determination described for the corresponding design of the first aspect.

In one possible design, the device further includes:

a generating unit, configured to: before the calculating unit obtains the residual signal according to the pre-acquired voice reference signal and the first voice signal, acquire a second voice signal played by each of at least one other sound output device, where each other sound output device is a sound output device that plays sound normally, and the voice content of the second voice signal is the same as the voice content of the first voice signal; and then perform signal superposition processing on the second voice signals to generate the voice reference signal.

In one possible design, the device further includes:

an aligning unit, configured to: before the calculating unit obtains the residual signal according to the pre-acquired voice reference signal and the first voice signal, perform delay alignment processing on the first voice signal and the voice reference signal in the time domain, to generate a first voice signal that is aligned with the voice reference signal.

In a third aspect, the present application provides a computer program for performing the method of the above first aspect when executed by a processor.

In a fourth aspect, the application provides a program product, such as a computer readable storage medium, comprising the program of the third aspect.

In a fifth aspect, the present application provides a computer program product comprising instructions that, when run on a computer, cause the computer to perform the methods of the above aspects.

It can be seen that, in the foregoing third, fourth, and fifth aspects, the first voice signal played by the sound output device of the terminal device is acquired, where the first voice signal is locally stored in the terminal device and includes audio information with irregular frequency changes; a residual signal is obtained according to the pre-acquired voice reference signal and the first voice signal, where the residual signal includes the portion of the first voice signal that differs from the voice reference signal; and whether the first voice signal has an abnormal sound is determined according to the residual signal, so as to determine whether the sound output device is abnormal. This provides a way of detecting whether an abnormal sound occurs when the sound output device plays audio, and thus of determining whether the sound output device is abnormal. Because the signal to be detected is a voice signal, it can represent the user's real use scenario, and throughout playback the frequency points within the actual frequency band of speech are triggered repeatedly, which makes it easier to find abnormalities at problematic frequency points. Moreover, in the present application the voice signal itself represents the real frequency points to be detected, so a problematic frequency point is much less likely to be missed, which helps detect the frequency points that have an abnormal sound. In addition, the signal to be detected is a voice signal stored locally in the terminal device and played by the sound output device, which avoids the voice signal being lost, or being mixed with noise, during transmission. Finally, the portion of the first voice signal that differs from the voice reference signal, whose voice content is the same as that of the first voice signal, is used to determine whether the first voice signal contains an abnormal sound. The detection method is convenient and versatile, and the accuracy of the detection result is improved.

Drawings

FIG. 1 is a schematic diagram 1 of an application scenario according to an embodiment of the present disclosure;

FIG. 2 is a schematic flowchart 1 of a method for detecting an abnormal sound according to an embodiment of the present application;

FIG. 3 is a schematic diagram of an adaptive filtering method used in an abnormal sound detecting method according to an embodiment of the present disclosure;

FIG. 4 is a schematic flowchart 2 of a method for detecting an abnormal sound according to an embodiment of the present application;

FIG. 5 is a schematic flowchart diagram of still another abnormal sound detecting method according to an embodiment of the present application;

FIG. 6 is an energy curve diagram of still another abnormal sound detecting method according to an embodiment of the present application;

FIG. 7 is a schematic flowchart diagram of another method for detecting an abnormal sound according to an embodiment of the present application;

FIG. 8 is a schematic flowchart diagram of still another abnormal sound detecting method according to an embodiment of the present application;

FIG. 9 is a schematic structural diagram of an abnormal sound detecting apparatus according to an embodiment of the present application;

FIG. 10 is a schematic structural diagram of still another abnormal sound detecting apparatus according to an embodiment of the present application;

FIG. 11 is a schematic structural diagram of another abnormal sound detecting apparatus according to an embodiment of the present application.

Detailed description

The embodiments of the present application may be applied to an abnormal sound detecting device, to an audio detection system, or to any system that can carry out the embodiments of the present application. Some terms used in the present application are explained below to facilitate understanding by those skilled in the art. It should be noted that when the solution of the embodiments of the present application is applied to an audio detection system or to any other system that can carry out the embodiments, the names of the audio detection system and the abnormal sound detecting device may change, but this does not affect the implementation of the solution of the embodiments of the present application.

1) A terminal device, also referred to as a terminal or user device, is a device that provides voice and/or data connectivity to a user, for example, a handheld device having a wireless connection function, an in-vehicle device, and the like. Common terminal devices include, for example, a mobile phone, a tablet computer, a notebook computer, a palmtop computer, a mobile internet device (MID), and a wearable device. The wearable device includes, for example, a smart watch, a smart wristband, a step counter, and so on.

2) A sound output device, which is a device that can play an audio signal, for example, a speaker or a receiver; the sound output device can be disposed on the terminal device.

3) "Multiple" means two or more, and other quantifiers are similar. "and/or", describing the association relationship of the associated objects, indicating that there may be three relationships, for example, A and/or B, which may indicate that there are three cases where A exists separately, A and B exist at the same time, and B exists separately. The character "/" generally indicates that the contextual object is an "or" relationship.

FIG. 1 is a schematic diagram 1 of an application scenario provided by an embodiment of the present application. As shown in FIG. 1, the embodiment of the present application uses a terminal device 01 and an abnormal sound detecting device 02. The terminal device 01 is provided with a sound output device 03, which can play an audio signal. As shown in FIG. 1, the sound output device 03 on the terminal device 01 plays an audio signal, the abnormal sound detecting device 02 acquires the audio signal played by the sound output device 03 on the terminal device 01, and the abnormal sound detecting device 02 then carries out the solution of the embodiments of the present application.

The terminal device in the embodiments of the present application may refer to an access terminal, a user terminal, a terminal, a wireless communication device, a user agent, a user device, or the like. The user terminal is, for example, a smartphone, a smart watch, a personal computer, or the like.

The sound output device in the embodiments of the present application may be a speaker, a receiver, or the like, and may be disposed on the terminal device in the embodiments of the present application.

FIG. 2 is a schematic flowchart 1 of a method for detecting an abnormal sound according to an embodiment of the present application. As shown in Figure 2, the method includes:

S101. Acquire a first voice signal played by a sound output device of the terminal device, where the first voice signal is locally stored in the terminal device, and the first voice signal includes audio information whose frequency changes irregularly.

In this embodiment, the description takes the abnormal sound detecting device as the execution subject. The sound output device of the terminal device plays the first voice signal, and the abnormal sound detecting device can then acquire the first voice signal played by the sound output device.

In the present application, the abnormal sound detecting device acquires the first voice signal played by the sound output device as follows: the voice is pre-stored in the terminal device, and the sound output device of the terminal device plays the first voice signal according to the voice stored locally by the terminal device; the abnormal sound detecting device can then acquire the first voice signal.

In the present application, the first voice signal may be the female-voice prompt "Please dial 120 for the emergency center" from the 112 voice announcement. For example, the sound output device plays the voice "Please dial 120 for the emergency center" stored locally in the terminal device. A female voice can be used in the present application because, compared with a male voice, a female voice has a higher fundamental frequency and covers a wider frequency band, and its frequency energy distribution over the time axis is more diverse.

Compared with the prior art, the present application uses a very different signal: a voice signal rather than a frequency sweep signal. First, the signal to be detected in the prior art is a frequency sweep signal, whose frequency changes from high to low or from low to high, so each frequency point in the sweep signal lasts only a short time; the sweep may therefore move on to the next frequency point before a given frequency point has excited relatively high harmonic energy, and a problem that may occur at that frequency point goes undetected. The present application instead uses a voice signal as the signal to be detected. Because a voice signal can represent the user's real use scenario, the present application acquires the first voice signal played by the sound output device, which contains audio information with irregular frequency changes: the duration of each frequency point in the first voice signal varies, the frequency changes in the first voice signal are varied, and throughout playback the frequency points within the actual frequency band of speech are triggered repeatedly, which makes it easier to find abnormalities at problematic frequency points. Second, an abnormal sound caused by distortion is usually generated at a few very narrow resonance frequencies. In the prior art, where a frequency sweep signal is used as the signal to be detected, the sweep steps through discrete frequencies, so the frequency points are not continuous and the truly problematic frequency is very likely to be missed during the sweep. In the present application, the voice signal itself represents the real frequency points to be detected, so a problematic frequency point is much less likely to be missed, which helps detect the frequencies that have an abnormal sound.

S102. Obtain a residual signal according to the pre-acquired voice reference signal and the first voice signal, where the residual signal includes a portion of the first voice signal that is different from the signal of the voice reference signal.

In this embodiment, the abnormal sound detecting device has previously acquired the voice reference signal, wherein the voice content of the voice reference signal is the same as the voice content of the first voice signal. For example, the voice content of the first voice signal is "Hello, please dial 00", and the voice content of the voice reference signal is also "Hello, please dial 00".

The abnormal sound detecting device uses the voice reference signal to perform adaptive filtering on the first voice signal to be detected: the portion of the first voice signal that is consistent with the voice reference signal is removed, and the portion that differs from the voice reference signal is retained. The retained portion is the residual signal. Alternatively, the abnormal sound detecting device may use another filtering method to filter the first voice signal to be detected according to the voice reference signal and obtain the residual signal.

The residual signal includes the portion of the first voice signal that differs from the voice reference signal; it may also include some signal information of the first voice signal, or some signal information of the voice reference signal.

For example, an adaptive filtering method may be used to obtain the residual signal. FIG. 3 is a schematic diagram of an adaptive filtering method used in an abnormal sound detecting method according to an embodiment of the present application. As shown in FIG. 3, in the context of the present application, x is the first voice signal, d is the voice reference signal, and e is the residual signal. The idea of adaptive filtering is to continuously adjust the filtering parameters, under some criterion driven by the value of e, so that the filtered x value (that is, the y value) approaches the value of the voice reference signal d. Specifically, x(j) represents the value of the input first voice signal at time j, y(j) represents the value of the filtered first voice signal at time j, and d(j) represents the value of the voice reference signal at time j; the residual signal e(j) is the difference between d(j) and y(j). The filtering parameters of the adaptive filter are controlled by the residual signal e(j) and are automatically adjusted according to its value, so that the value of y(j) output at the next moment is closer to the value of the desired voice reference signal d(j).
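The relationship between x(j), y(j), d(j), and e(j) described above can be illustrated with a least-mean-squares (LMS) filter, one common adaptation criterion. The text does not state which criterion is used, so the LMS update, the tap count, and the step size below are assumptions, and `lms_residual` is a hypothetical name.

```python
import math


def lms_residual(x, d, num_taps=4, step=0.05):
    """Filter x adaptively toward d and return the residual e(j) = d(j) - y(j)."""
    weights = [0.0] * num_taps
    residual = []
    for j in range(len(x)):
        # Most recent input samples, newest first, zero-padded before the start.
        window = [x[j - k] if j - k >= 0 else 0.0 for k in range(num_taps)]
        y = sum(w * v for w, v in zip(weights, window))  # filtered output y(j)
        e = d[j] - y                                     # residual e(j)
        # LMS update: move the weights so the next y is closer to d.
        weights = [w + step * e * v for w, v in zip(weights, window)]
        residual.append(e)
    return residual


# Usage: the reference is a scaled copy of the input, so the residual decays.
x = [math.sin(0.3 * j) for j in range(2000)]
d = [0.5 * v for v in x]
clean_residual = lms_residual(x, d)

# A single distorted sample in the input shows up as a jump in the residual.
x_distorted = list(x)
x_distorted[1500] += 1.0
distorted_residual = lms_residual(x_distorted, d)
```

Once the filter has converged, whatever the filter cannot map onto the reference remains in e(j), which is why a distortion present only in the played signal stands out in the residual.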

S103. Determine, according to the residual signal, whether the first voice signal has an abnormal sound to determine whether the sound output device is abnormal.

In this embodiment, the abnormal sound detecting device analyzes whether the obtained residual signal has an abnormal signal, and further determines whether the first voice signal has an abnormal sound. When it is determined that the first speech signal has an abnormal sound, it is determined that the sound output device is abnormal; when it is determined that the first speech signal does not have an abnormal sound, it is determined that the sound output device is normal.

FIG. 4 is a schematic flowchart 2 of a method for detecting an abnormal sound according to an embodiment of the present application. As shown in Figure 4, the process includes:

S201. The abnormal sound detecting device starts the recording function of the abnormal sound detecting device.

In the present embodiment, the abnormal sound detecting means activates its own recording function.

S202. The sound output device of the terminal device plays the first voice signal, and the abnormal sound detecting device acquires the first voice signal played by the sound output device of the terminal device, wherein the first voice signal is locally stored in the terminal device.

In this embodiment, the voice is pre-stored in the terminal device, and the sound output device of the terminal device plays the first voice signal according to the voice stored locally by the terminal device; the abnormal sound detecting device can then acquire the first voice signal. For this step, refer to step S101 shown in FIG. 2; the principle and process are the same as those of step S101.

S203. The abnormal sound detecting device saves the first voice signal.

In this embodiment, the abnormal sound detecting means holds the first voice signal that has been recorded.

S204. The abnormal sound detecting device acquires a voice reference signal.

In this embodiment, the abnormal sound detecting device acquires a voice reference signal, wherein the voice content of the voice reference signal is the same as the voice content of the first voice signal.

S205. The abnormal sound detecting device runs an abnormal sound detecting algorithm.

In this embodiment, the abnormal sound detecting device runs the abnormal sound detecting algorithm, whose process includes S102 and S103 shown in FIG. 2, and thereby determines whether there is an abnormal sound in the first voice signal, so as to determine whether the sound output device is abnormal.

S206. The abnormal sound detecting device outputs the detection result.

For the process shown in FIG. 4, refer to the process shown in FIG. 2.

In the embodiments provided in FIG. 2 and FIG. 4, the abnormal sound detecting device outputs the detection result obtained in S205: when it is determined that the first voice signal has an abnormal sound, it is determined that the sound output device is abnormal; when it is determined that the first voice signal does not have an abnormal sound, it is determined that the sound output device is normal.

An existing method provides a manner in which the sound output device plays a frequency sweep signal, the frequency sweep signal played by the sound output device is acquired, and the energy of the 12th to 15th harmonics of the frequency sweep signal is calculated; whether there is an abnormal sound in the frequency sweep signal is then determined according to the energy of the 12th to 15th harmonics, so as to determine whether the sound output device is abnormal. However, in this manner the signal to be detected is still a frequency sweep signal. As with the problem mentioned earlier, the abnormal sound in the frequency sweep signal played by the sound output device to be detected may still not be detected accurately, and therefore whether the sound output device to be detected is abnormal cannot be accurately determined. Furthermore, when a sound output device such as the earpiece of some terminal devices is detected, the detection result may indicate no abnormal sound, but when the terminal device is actually used to play a sound source, the user may hear an obvious abnormal sound.

Another existing method provides a manner of acquiring an audio signal transmitted over a communication network; acquiring the frequency domain energy distribution parameter of the current frame of the audio signal and of each frame within a preset neighborhood of the current frame; acquiring the pitch parameter of the current frame and of each frame within the preset neighborhood of the current frame; determining, according to the pitch parameter of the current frame and the pitch parameter of each frame within the preset neighborhood of the current frame, whether the current frame is in a voice segment; and, if it is determined that the current frame is in a voice segment and, among all the frequency domain energy distribution parameters, the number of parameters falling within a preset voice-like noise frequency domain energy distribution parameter interval is greater than or equal to a first threshold, determining that the current frame is voice-like noise.

This existing method has the following problems. First, the audio signal to be detected is an audio signal transmitted over the communication network; during transmission, packets of the audio signal may be lost, or external noise may be mixed into the audio signal. Therefore, if voice-like noise is detected, it may have been caused by packet loss or by noise mixed in during transmission, and it cannot be determined whether the noise is caused by a defect of the sound output device itself, so the existing method is not accurate. Second, the existing method analyzes the frequency domain energy distribution parameter of the audio signal and compares it with a preset frequency domain energy distribution parameter interval to determine whether there is an abnormal sound in the audio signal; however, because the frequency domain characteristics of different audio signals may vary widely, it may be difficult to preset the frequency domain energy distribution parameter interval, which likewise may lead to inaccurate detection results. Third, the existing detection method is directed to the same type of audio signal, but different types of terminal devices differ greatly in design process, assembly process, and electro-acoustic device selection, so that the same type of audio signal played by different terminal devices also differs greatly in frequency domain characteristics. This also makes it very difficult to preset the frequency domain energy distribution parameter interval, and the resulting poor versatility can also cause inaccurate detection results.

In the present application, the process of FIG. 2 or FIG. 4 is adopted. Because the signal to be detected is a voice signal, it can represent a real use scenario of the user, and throughout the playback of the voice signal the frequency points concentrated in the actual frequency band of speech are triggered repeatedly, which helps expose abnormality at a problematic frequency point. In addition, the voice signal itself represents the real frequency points that need to be detected, so the possibility of missing a problematic frequency point is much smaller, which helps detect frequency points with abnormal sound. Meanwhile, the signal to be detected is a voice signal that is stored locally in the terminal device and played by the sound output device, not a signal transmitted over a communication network; this avoids abnormal sound caused by packet loss or mixed-in noise during transmission of the voice signal and improves the accuracy of the detection result. Moreover, the residual signal includes the portion of the first voice signal that differs from the voice reference signal, and the residual signal is then analyzed to determine whether there is an abnormal sound in the first voice signal. Because the first voice signal and the voice reference signal have the same voice content, the detection method is convenient and more versatile, and the detection result is more accurate than that of the manner of analyzing noise using the frequency domain energy distribution parameter of the audio signal.

In this embodiment, the first voice signal played by the sound output device of the terminal device is acquired, where the first voice signal is stored locally in the terminal device and includes audio information whose frequency changes irregularly; a residual signal is obtained according to the pre-acquired voice reference signal and the first voice signal, where the residual signal includes the portion of the first voice signal that differs from the voice reference signal; and whether the first voice signal has an abnormal sound is determined according to the residual signal, so as to determine whether the sound output device is abnormal. This provides a way of detecting whether an abnormal sound occurs when the sound output device plays audio, so as to determine whether the sound output device is abnormal. Because the signal to be detected is a voice signal, it can represent a real use scenario of the user, and throughout the playback of the voice signal the frequency points concentrated in the actual frequency band of speech are triggered repeatedly, which helps expose abnormality at a problematic frequency point. In addition, the voice signal itself represents the real frequency points that need to be detected, so the possibility of missing a problematic frequency point is much smaller, which helps detect frequency points with abnormal sound. Meanwhile, the signal to be detected is a voice signal that is stored locally in the terminal device and played by the sound output device, which avoids abnormal sound caused by packet loss or mixed-in noise during transmission of the voice signal. Moreover, the residual signal includes the portion of the first voice signal that differs from the voice reference signal, and the residual signal is then analyzed to determine whether there is an abnormal sound in the first voice signal. Because the first voice signal and the voice reference signal have the same voice content, the detection method is convenient and versatile, and the accuracy of the detection result is improved.

FIG. 5 is a schematic flowchart diagram of still another abnormal sound detecting method according to an embodiment of the present application. As shown in FIG. 5, the method includes:

S301. Acquire a first voice signal played by a sound output device of the terminal device, where the first voice signal is locally stored in the terminal device, and the first voice signal includes audio information whose frequency changes irregularly.

In this embodiment, for this step, refer to step S101 in the first flowchart of the abnormal sound detecting method provided in FIG. 2 and step S202 in the second flowchart of the abnormal sound detecting method provided in FIG. 4.

S302. Acquire a second voice signal played by at least one other sound output device, where the other sound output device is a sound output device that plays sound normally, and the voice content in the second voice signal is the same as the voice content in the first voice signal; and perform signal superposition processing on each of the second voice signals to generate a voice reference signal.

In this embodiment, a plurality of normal sound output devices, that is, sound output devices that play sound normally, may be used to play the same second voice signal; the second voice signal played by each normal sound output device is likewise stored in the terminal device corresponding to that normal sound output device. The voice content in the second voice signal is the same as the voice content in the first voice signal. After the normal sound output devices play the same second voice signal, the abnormal sound detecting device separately records the second voice signal played by each normal sound output device.

Then, the abnormal sound detecting device performs signal superposition processing on the second voice signals to obtain a voice reference signal, where the voice content of the voice reference signal is the same as the voice content of the second voice signal. The signal superposition processing may be performed in the following modes. In the first mode, the abnormal sound detecting device splices the second voice signals to obtain the voice reference signal. In the second mode, the abnormal sound detecting device superimposes the second voice signals in the time domain to obtain the voice reference signal. In the third mode, the abnormal sound detecting device examines each second voice signal in each frequency band, filters out of each second voice signal the frequency bands in which the signal exceeds a preset frequency range, and then synthesizes the filtered second voice signals to obtain the voice reference signal.
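As a non-limiting illustration of the second mode, time-domain superposition can be sketched as a sample-by-sample combination of the recorded second voice signals. The sketch assumes that the recordings are already time-aligned and averages them (rather than merely summing) so that the amplitude of the result stays comparable to that of a single recording; both choices, and the function name `superimpose`, are assumptions for illustration only.

```python
def superimpose(recordings):
    """Sketch: average several recordings sample by sample.

    recordings: list of sample lists, one per normal sound output device.
    The result is truncated to the shortest recording.
    """
    if not recordings:
        raise ValueError("at least one recording is required")
    length = min(len(r) for r in recordings)
    count = len(recordings)
    return [sum(r[i] for r in recordings) / count for i in range(length)]
```

For example, `superimpose([[1.0, 2.0, 3.0], [3.0, 4.0]])` combines only the first two samples, because the second recording is shorter.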

S303. Perform delay alignment of the first voice signal with the voice reference signal in the time domain to generate a first voice signal aligned with the voice reference signal.

In this embodiment, the abnormal sound detecting device performs delay alignment processing on the first voice signal and the voice reference signal in the time domain, so that the first voice signal is aligned with the voice reference signal in the time domain, obtaining a first voice signal aligned with the voice reference signal.

A delay alignment algorithm may be used to align the first voice signal with the voice reference signal in the time domain, for example, the generalized cross-correlation (GCC) algorithm, the adaptive least mean square (LMS) algorithm, the subspace-based eigenvalue decomposition (EVD) algorithm, or the acoustic transfer function ratio (ATF-s ratio) algorithm.
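As a non-limiting illustration, delay alignment can be sketched with a plain time-domain cross-correlation search. This is a simplified relative of GCC without any frequency weighting, and it is not necessarily the algorithm used in the embodiment; the function names `estimate_delay` and `align` and the search bound `max_lag` are hypothetical.

```python
def estimate_delay(first, reference, max_lag):
    """Return the lag (in samples) by which `first` trails `reference`.

    A negative result means that `first` leads `reference`.
    """
    best_lag, best_score = 0, float("-inf")
    for lag in range(-max_lag, max_lag + 1):
        score = 0.0
        for i, r in enumerate(reference):
            k = i + lag
            if 0 <= k < len(first):
                score += r * first[k]  # correlation at this candidate lag
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag


def align(first, reference, max_lag):
    """Shift `first` by the estimated lag and trim both signals to equal length."""
    lag = estimate_delay(first, reference, max_lag)
    aligned = first[lag:] if lag >= 0 else [0.0] * (-lag) + list(first)
    n = min(len(aligned), len(reference))
    return aligned[:n], reference[:n]
```

The exhaustive search is quadratic in cost, so a practical implementation would more likely compute the correlation through a fast Fourier transform.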

S304. Obtain a residual signal according to the pre-acquired voice reference signal and the first voice signal, where the residual signal includes a portion of the first voice signal that is different from the signal of the voice reference signal.

In this embodiment, for this step, refer to step S102 in the first flowchart of the abnormal sound detecting method provided in FIG. 2 and step S205 in the second flowchart of the abnormal sound detecting method provided in FIG. 4.

S305. Determine an energy value of the residual signal.

S305 specifically includes: removing the speech main band energy from the residual signal to generate a residual signal with the speech main band energy removed, where the frequency of the speech main band energy is less than the first frequency value; and determining the energy value of the residual signal with the speech main band energy removed.

Determining the energy value of the residual signal with the speech main band energy removed includes: determining, for the portion of that residual signal whose frequency is greater than a second frequency value, the energy value in each frame.

In this embodiment, the abnormal sound detecting device first needs to calculate the energy value of the residual signal. The speech main band energy in the residual signal is at low frequencies and is greater than the energy of the high-frequency abnormal sound portion of the residual signal, so slight fluctuations in the speech main band energy directly affect the judgment of the high-frequency abnormal sound energy in the residual signal; the speech main band energy in the residual signal therefore needs to be filtered out. To do so, the abnormal sound detecting device first processes the residual signal by high-pass filtering to remove the speech main band energy, obtaining a residual signal with the speech main band energy removed. Because the frequency of the speech main band energy in the residual signal is less than the first frequency value, this energy can be removed in the removal process.

Specifically, high-pass filtering is a filtering method whose rule is that high-frequency signals pass through the high-pass filter normally, whereas low-frequency signals below a set threshold are blocked and attenuated by the filter, so that the high-pass filter outputs the high-frequency signal.

For example, consider a sampled voice signal whose sampling rate is 8 kHz. According to the Nyquist theorem, the frequency of the speech main band energy in the sampled voice signal is below 4 kHz. The speech main band energy is much stronger than the energy of the higher harmonics. Analyzing the spectrum of the voice reference signal shows that the voice reference signal is very clean and that almost no higher-harmonic energy is visible.

From the analysis in the above example, it can be seen that the energy of the higher harmonics represents the abnormal part of the voice signal. In the present application, the residual signal is analyzed, and the energy of the speech main band portion of the residual signal is stronger than the energy of the higher harmonics. If the residual signal is not high-pass filtered, the energy of the higher harmonics accounts for only a small fraction of the total energy of the residual signal in the frequency domain; moreover, slight fluctuations or changes in the speech main band energy may be larger than the fluctuations or changes caused by the higher harmonics, which seriously affects the determination of whether higher harmonics are present in the residual signal and therefore of whether the residual signal has an abnormal sound. A high-pass filter is therefore used here to filter out the speech main band energy whose frequency is less than the first frequency value; the energy remaining in the residual signal is then mainly the energy of the higher harmonics, that is, the energy of the abnormal sound portion. The first frequency value may be set to 4 kHz.
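As a non-limiting illustration, removal of the speech main band energy can be sketched with a windowed-sinc finite impulse response high-pass filter. The filter design (Hamming window, spectral inversion of a low-pass prototype), the tap count, and the function names `highpass_fir` and `fir_filter` are assumptions for illustration; the description above does not prescribe a particular filter structure.

```python
import math


def highpass_fir(cutoff_hz, sample_rate_hz, num_taps=101):
    """Sketch: windowed-sinc high-pass coefficients (num_taps must be odd)."""
    fc = cutoff_hz / sample_rate_hz  # cutoff in cycles per sample
    mid = (num_taps - 1) // 2
    taps = []
    for n in range(num_taps):
        m = n - mid
        # Ideal low-pass impulse response, tapered by a Hamming window.
        ideal = 2 * fc if m == 0 else math.sin(2 * math.pi * fc * m) / (math.pi * m)
        window = 0.54 - 0.46 * math.cos(2 * math.pi * n / (num_taps - 1))
        taps.append(ideal * window)
    gain = sum(taps)
    taps = [t / gain for t in taps]  # unity gain at 0 Hz for the low-pass prototype
    taps = [-t for t in taps]        # spectral inversion: high-pass = delta - low-pass
    taps[mid] += 1.0
    return taps


def fir_filter(signal, taps):
    """Direct-form convolution of the signal with the filter taps."""
    out = []
    for j in range(len(signal)):
        acc = 0.0
        for k, t in enumerate(taps):
            if j - k >= 0:
                acc += t * signal[j - k]
        out.append(acc)
    return out
```

With a first frequency value of 4 kHz, a call such as `fir_filter(residual, highpass_fir(4000, 48000))` would attenuate the speech main band; the 48 kHz recording rate here is a hypothetical value chosen so that content above 4 kHz can be represented.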

Then, the abnormal sound detecting device calculates the energy value of the residual signal from which the speech main band energy has been removed. In this step, the abnormal sound detecting device can calculate, for each frame, the energy value of the portion of that residual signal whose frequency is greater than the second frequency value. The energy value of the residual signal with the speech main band energy removed is also called the out-of-band energy.

Specifically, in the ideal case, the high-pass filtered residual signal contains no components whose frequency is smaller than the first frequency value, so the time domain energy of the high-pass filtered residual signal can be calculated directly in the time domain to obtain the energy value of the residual signal with the speech main band energy removed.

In the non-ideal case, however, the high-pass filtered residual signal still contains components whose frequency is lower than the first frequency value, and the frequency domain energy of the high-pass filtered residual signal then needs to be calculated in the frequency domain, so that the energy of components whose frequency is less than the first frequency value is not counted. Therefore, in this step, the abnormal sound detecting device performs the calculation on the portion of the residual signal (with the speech main band energy removed) whose frequency is greater than the second frequency value. The second frequency value may be set equal to the first frequency value, or may be set greater than the first frequency value according to actual requirements. The abnormal sound detecting device calculates, for each frame, the energy value E_thr_n of the portion whose frequency is greater than the second frequency value, that is, one energy value E_thr_n is obtained per frame, where the energy value of a frame is the sum of the squares of the amplitude values of the points in the frame. The abnormal sound detecting device then fits the energy values E_thr_n into an energy curve, which is compared with a preset energy curve.
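As a non-limiting illustration, the two ways of obtaining the per-frame energy value E_thr_n can be sketched as follows: a time-domain sum of squares for the ideal case, and a frequency-domain sum restricted to components above the second frequency value for the non-ideal case. The non-overlapping framing, the naive discrete Fourier transform, and the function names `frame_energies` and `band_energy` are assumptions for illustration.

```python
import cmath


def frame_energies(signal, frame_len):
    """Time-domain energy per frame: sum of squared sample amplitudes."""
    return [
        sum(s * s for s in signal[i:i + frame_len])
        for i in range(0, len(signal) - frame_len + 1, frame_len)
    ]


def band_energy(frame, sample_rate_hz, min_freq_hz):
    """Energy of one frame counted only at frequencies above min_freq_hz.

    Uses a naive DFT scaled so that, by Parseval's relation, summing over
    all bins would give the same value as the time-domain sum of squares.
    """
    n = len(frame)
    total = 0.0
    for k in range(n):
        freq = min(k, n - k) * sample_rate_hz / n  # bin k and its mirror share a frequency
        if freq > min_freq_hz:
            xk = sum(frame[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
            total += abs(xk) ** 2 / n
    return total
```

Applying `band_energy` to successive frames with `min_freq_hz` set to the second frequency value yields the sequence of values from which the measured energy curve is formed.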

S306. Determine, according to the energy value, whether the first voice signal has an abnormal sound to determine whether the sound output device is abnormal.

S306 specifically includes: when it is determined that, among the energy values of the frames, there are not a preset number of energy values each smaller than the first energy threshold corresponding to that energy value, determining that the first voice signal has an abnormal sound and that the sound output device is abnormal; when it is determined that, among the energy values of the frames, there are a preset number of energy values each smaller than the first energy threshold corresponding to that energy value, determining that the first voice signal has no abnormal sound and that the sound output device is normal.

In this embodiment, the abnormal sound detecting device compares the energy curve obtained from the energy values E_thr_n with a preset energy curve. For each energy value E_thr_n there is a corresponding first energy threshold on the preset energy curve. If the abnormal sound detecting device determines that, among the energy values E_thr_n, there are not a preset number of energy values each smaller than its corresponding first energy threshold, it may determine that the first voice signal has an abnormal sound and that the sound output device that plays the first voice signal is abnormal; if the abnormal sound detecting device determines that, among the energy values E_thr_n, there are a preset number of energy values each smaller than its corresponding first energy threshold, it may determine that the first voice signal does not have an abnormal sound and that the sound output device that plays the first voice signal is normal.

For example, FIG. 6 is an energy curve diagram of still another abnormal sound detecting method according to an embodiment of the present application. As shown in FIG. 6, the measured energy curve of the first voice signal is obtained by the method provided in this embodiment; the measured energy curve is the solid curve in FIG. 6, and the dotted curve in FIG. 6 is the preset energy curve. The measured energy curve is compared with the preset energy curve to determine whether each energy value E_thr_n on the measured energy curve is smaller than the corresponding first energy threshold on the preset energy curve. It can be seen from FIG. 6 that the energy values E_thr_n on the measured energy curve are not all smaller than their corresponding first energy thresholds on the preset energy curve, so it can be determined that the first voice signal has an abnormal sound and that the sound output device that plays the first voice signal is abnormal.
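As a non-limiting illustration, the comparison of the measured energy curve with the preset energy curve can be sketched as a count of frames whose energy value lies below the corresponding first energy threshold. The function name `has_abnormal_sound` and the parameter names are hypothetical; setting `preset_number` to the number of frames corresponds to requiring every frame to lie below the preset curve.

```python
def has_abnormal_sound(frame_energy_values, first_energy_thresholds, preset_number):
    """Sketch of the per-frame decision rule.

    frame_energy_values: measured energy value of each frame
    first_energy_thresholds: value of the preset energy curve for each frame
    preset_number: how many frames must lie below their threshold for a pass
    Returns True when an abnormal sound is deemed present.
    """
    if len(frame_energy_values) != len(first_energy_thresholds):
        raise ValueError("one threshold is required per frame")
    below = sum(
        1 for energy, threshold in zip(frame_energy_values, first_energy_thresholds)
        if energy < threshold
    )
    # Too few frames under the preset curve means the device is judged abnormal.
    return below < preset_number
```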

In this embodiment, the first voice signal played by the sound output device of the terminal device is acquired, where the first voice signal is stored locally in the terminal device and includes audio information whose frequency changes irregularly; a second voice signal played by at least one other sound output device is acquired, where the other sound output device is a sound output device that plays sound normally, and the voice content in the second voice signal is the same as the voice content in the first voice signal; signal superposition processing is performed on the second voice signals to generate a voice reference signal; delay alignment of the first voice signal with the voice reference signal is performed in the time domain to generate a first voice signal aligned with the voice reference signal; a residual signal is obtained according to the pre-acquired voice reference signal and the first voice signal, where the residual signal includes the portion of the first voice signal that differs from the voice reference signal; the speech main band energy is removed from the residual signal to generate a residual signal with the speech main band energy removed, where the frequency of the speech main band energy is less than the first frequency value; for the portion of that residual signal whose frequency is greater than the second frequency value, the energy value in each frame is determined; and whether the first voice signal has an abnormal sound is determined according to the energy values, so as to determine whether the sound output device is abnormal. This provides a way of detecting whether an abnormal sound occurs when the sound output device plays audio, so as to determine whether the sound output device is abnormal. Because the signal to be detected is a voice signal, it can represent a real use scenario of the user, and throughout the playback of the voice signal the frequency points concentrated in the actual frequency band of speech are triggered repeatedly, which helps expose abnormality at a problematic frequency point. In addition, the voice signal itself represents the real frequency points that need to be detected, so the possibility of missing a problematic frequency point is much smaller, which helps detect frequency points with abnormal sound. Meanwhile, the signal to be detected is a voice signal that is stored locally in the terminal device and played by the sound output device, which avoids abnormal sound caused by packet loss or mixed-in noise during transmission of the voice signal. Moreover, the residual signal includes the portion of the first voice signal that differs from the voice reference signal, and the residual signal is then analyzed to determine whether there is an abnormal sound in the first voice signal. Because the first voice signal and the voice reference signal have the same voice content, the detection method is convenient and versatile, and the accuracy of the detection result is improved.

FIG. 7 is a schematic flowchart diagram of another method for detecting an abnormal sound according to an embodiment of the present application. As shown in FIG. 7, the method includes:

S401. Acquire a first voice signal played by a sound output device of the terminal device, where the first voice signal is locally stored in the terminal device, and the first voice signal includes audio information whose frequency changes irregularly.

In this embodiment, for this step, refer to step S101 in the first flowchart of the abnormal sound detecting method provided in FIG. 2, step S202 in the second flowchart of the abnormal sound detecting method provided in FIG. 4, and step S301 in the flowchart of still another abnormal sound detecting method provided in FIG. 5.

S402. Acquire a second voice signal played by at least one other sound output device, where the other sound output device is a sound output device that plays sound normally, and the voice content in the second voice signal is the same as the voice content in the first voice signal; and perform signal superposition processing on each of the second voice signals to generate a voice reference signal.

In this embodiment, for this step, refer to step S302 in the flowchart of still another abnormal sound detecting method provided in FIG. 5.

S403. Perform delay alignment of the first voice signal with the voice reference signal in the time domain to generate a first voice signal aligned with the voice reference signal.

In this embodiment, for this step, refer to step S303 in the flowchart of still another abnormal sound detecting method provided in FIG. 5.

S404. Obtain a residual signal according to the pre-acquired voice reference signal and the first voice signal, where the residual signal includes a portion of the first voice signal that is different from the signal of the voice reference signal.

In this embodiment, for this step, refer to step S102 in the first flowchart of the abnormal sound detecting method provided in FIG. 2, step S205 in the second flowchart of the abnormal sound detecting method provided in FIG. 4, and step S304 in the flowchart of still another abnormal sound detecting method provided in FIG. 5.

S405. Determine an energy value of the residual signal.

S405 specifically includes: removing the speech main band energy from the residual signal to generate a residual signal with the speech main band energy removed, where the frequency of the speech main band energy is less than the first frequency value; and determining the energy value of the residual signal with the speech main band energy removed.

Determining the energy value of the residual signal with the speech main band energy removed includes: determining, for the portion of that residual signal whose frequency is greater than a second frequency value, the energy value in each frame; and determining an energy maximum value, where the energy maximum value is the largest of the energy values of the frames.

In this embodiment, for the operations in this step of removing the speech main band energy from the residual signal to generate a residual signal with the speech main band energy removed, where the frequency of the speech main band energy is less than the first frequency value, and of determining, for the portion of that residual signal whose frequency is greater than the second frequency value, the energy value in each frame, refer to step S305 in the flowchart of still another abnormal sound detecting method provided in FIG. 5.

Then, in this step, after obtaining the energy value E_thr_n of each frame, the abnormal sound detecting device takes the maximum of the energy values E_thr_n over all frames to obtain the energy maximum value.

S406. Determine, according to the energy value, whether the first voice signal has an abnormal sound to determine whether the sound output device is abnormal.

S406 specifically includes: when it is determined that the energy maximum value is greater than or equal to a second energy threshold, determining that the first voice signal has an abnormal sound and that the sound output device is abnormal; when it is determined that the energy maximum value is less than the second energy threshold, determining that the first voice signal has no abnormal sound and that the sound output device is normal.

In this embodiment, the abnormal sound detecting device compares the obtained energy maximum value with the second energy threshold. If the abnormal sound detecting device determines that the energy maximum value is greater than or equal to the second energy threshold, it determines that the first voice signal has an abnormal sound and that the sound output device that plays the first voice signal is abnormal; if the abnormal sound detecting device determines that the energy maximum value is less than the second energy threshold, it determines that there is no abnormal sound in the first voice signal and that the sound output device that plays the first voice signal is normal.

Alternatively, in S405, the energy values E_thr_n of the frames may be averaged to obtain an energy average value; in S406, the abnormal sound detecting device then compares the obtained energy average value with a third energy threshold. If the abnormal sound detecting device determines that the energy average value is greater than or equal to the third energy threshold, it determines that the first voice signal has an abnormal sound and that the sound output device that plays the first voice signal is abnormal; if the abnormal sound detecting device determines that the energy average value is less than the third energy threshold, it determines that the first voice signal does not have an abnormal sound and that the sound output device that plays the first voice signal is normal.
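As a non-limiting illustration, the two single-threshold decision rules of S405 and S406 can be sketched as follows. The function names `abnormal_by_max` and `abnormal_by_mean` are hypothetical, and the threshold values would in practice be chosen for the device under test.

```python
def abnormal_by_max(frame_energy_values, second_energy_threshold):
    """Abnormal when the largest per-frame energy reaches the second energy threshold."""
    return max(frame_energy_values) >= second_energy_threshold


def abnormal_by_mean(frame_energy_values, third_energy_threshold):
    """Abnormal when the average per-frame energy reaches the third energy threshold."""
    mean = sum(frame_energy_values) / len(frame_energy_values)
    return mean >= third_energy_threshold
```

The maximum-based rule reacts to a single loud frame, whereas the mean-based rule responds to abnormal energy sustained over the whole recording, so the two rules can give different results for the same set of energy values.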

In this embodiment, the first voice signal played by the sound output device of the terminal device is acquired, where the first voice signal is stored locally in the terminal device and includes audio information whose frequency changes irregularly; a second voice signal played by at least one other sound output device is acquired, where the other sound output device is a sound output device that plays sound normally, and the voice content in the second voice signal is the same as the voice content in the first voice signal; signal superposition processing is performed on the second voice signals to generate a voice reference signal; delay alignment of the first voice signal with the voice reference signal is performed in the time domain to generate a first voice signal aligned with the voice reference signal; a residual signal is obtained according to the pre-acquired voice reference signal and the first voice signal, where the residual signal includes the portion of the first voice signal that differs from the voice reference signal; the speech main band energy is removed from the residual signal to generate a residual signal with the speech main band energy removed, where the frequency of the speech main band energy is less than the first frequency value; for the portion of that residual signal whose frequency is greater than the second frequency value, the energy value in each frame is determined; an energy maximum value is determined, where the energy maximum value is the largest of the energy values of the frames; and whether there is an abnormal sound in the first voice signal is determined according to the energy maximum value, so as to determine whether the sound output device is abnormal. This provides a way of detecting whether an abnormal sound occurs when the sound output device plays audio, so as to determine whether the sound output device is abnormal. Because the signal to be detected is a voice signal, it can represent a real use scenario of the user, and throughout the playback of the voice signal the frequency points concentrated in the actual frequency band of speech are triggered repeatedly, which helps expose abnormality at a problematic frequency point. In addition, the voice signal itself represents the real frequency points that need to be detected, so the possibility of missing a problematic frequency point is much smaller, which helps detect frequency points with abnormal sound. Meanwhile, the signal to be detected is a voice signal that is stored locally in the terminal device and played by the sound output device, which avoids abnormal sound caused by packet loss or mixed-in noise during transmission of the voice signal. Moreover, the residual signal includes the portion of the first voice signal that differs from the voice reference signal, and the residual signal is then analyzed to determine whether there is an abnormal sound in the first voice signal. Because the first voice signal and the voice reference signal have the same voice content, the detection method is convenient and versatile, and the accuracy of the detection result is improved.

FIG. 8 is a schematic flowchart of still another abnormal sound detecting method according to an embodiment of the present application. As shown in FIG. 8, the method includes:

S501. Acquire a first voice signal played by a sound output device of the terminal device, where the first voice signal is locally stored in the terminal device, and the first voice signal includes audio information whose frequency changes irregularly.

S502. Acquire a voice reference signal.

S503. Perform delay alignment of the first voice signal with the voice reference signal in the time domain to generate a first voice signal aligned with the voice reference signal.

S504. Perform filtering processing according to the voice reference signal acquired in advance and the first voice signal to obtain a residual signal.

S505. Perform high-pass filtering on the residual signal to obtain a residual signal with the voice main band energy removed.

S506. Determine an energy value of the residual signal from which the energy of the main band of the voice is removed.

S507. Input an energy threshold.

S508. Determine whether the energy value is greater than or equal to the energy threshold, so as to determine whether the first voice signal has an abnormal sound and whether the sound output device is abnormal.

S509. When the energy value is greater than or equal to the energy threshold, determine that the sound output device is abnormal.

S5010. When the energy value is less than the energy threshold, determine that the sound output device is normal.
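Steps S504 to S5010 can be sketched as below, under stated assumptions. The application does not specify the filters: here the residual is approximated by a sample-wise difference of already aligned signals, and the high-pass stage is a generic first-order filter with an assumed coefficient `alpha`. All function names are hypothetical.

```python
def residual(first, reference):
    """Sample-wise difference; a simple stand-in for the filtering of S504."""
    n = min(len(first), len(reference))
    return [first[i] - reference[i] for i in range(n)]


def high_pass(signal, alpha=0.9):
    """First-order high-pass with zero initial state (stand-in for S505).

    y[n] = alpha * (y[n-1] + x[n] - x[n-1])
    """
    out = []
    prev_x = 0.0
    prev_y = 0.0
    for x in signal:
        y = alpha * (prev_y + x - prev_x)
        out.append(y)
        prev_x, prev_y = x, y
    return out


def energy(signal):
    """Sum of squared samples (S506)."""
    return sum(s * s for s in signal)


def sound_output_is_abnormal(first, reference, energy_threshold, alpha=0.9):
    """Compare the high-passed residual energy with the threshold (S507-S5010)."""
    filtered = high_pass(residual(first, reference), alpha)
    return energy(filtered) >= energy_threshold
```

With identical signals the residual is zero and the device is judged normal for any positive threshold; a rapidly alternating disturbance added to the first signal survives the high-pass stage and raises the energy.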

For each step in this embodiment, refer to the steps of the abnormal sound detecting method shown in FIG. 5 and the steps of the abnormal sound detecting method shown in FIG. 7. The principle and effect are the same as those of the methods provided by the above embodiments.

FIG. 9 is a schematic structural diagram of an abnormal sound detecting apparatus according to an embodiment of the present application. As shown in Figure 9, the device includes:

The acquiring unit 81 is configured to acquire a first voice signal that is played by the sound output device of the terminal device, where the first voice signal is locally stored in the terminal device, and the first voice signal includes audio information whose frequency changes irregularly;

The calculating unit 82 is configured to obtain a residual signal according to the pre-acquired voice reference signal and the first voice signal, where the residual signal includes a portion of the first voice signal that is different from the signal of the voice reference signal;

The determining unit 83 is configured to determine, according to the residual signal, whether the first voice signal has an abnormal sound to determine whether the sound output device is abnormal.

The acquiring unit 81 may perform step S101 of the method shown in FIG. 2, step S202 of the method shown in FIG. 4, step S301 of the method shown in FIG. 5, or step S401 of the method shown in FIG. 7. The calculating unit 82 may perform step S102 of the method shown in FIG. 2, step S205 of the method shown in FIG. 4, step S304 of the method shown in FIG. 5, or step S404 of the method shown in FIG. 7. The determining unit 83 may perform step S103 of the method shown in FIG. 2 or step S205 of the method shown in FIG. 4.

The abnormal sound detecting device of the embodiment shown in FIG. 9 can be used to perform the technical solution of the embodiment shown in FIG. 2 to FIG. 4 in the above method, and the implementation principle and technical effects are similar, and details are not described herein again.

FIG. 10 is a schematic structural diagram of still another abnormal sound detecting apparatus according to an embodiment of the present application. On the basis of the apparatus shown in FIG. 9, as shown in FIG. 10, in the apparatus, the determining unit 83 includes:

The first determining module 831 is configured to determine an energy value of the residual signal. The first determining module 831 can perform step S305 of the method shown in FIG. 5 or step S405 of the method shown in FIG. 7.

The second determining module 832 is configured to determine, according to the energy value, whether the first voice signal has an abnormal sound. The second determining module 832 can perform step S306 of the method shown in FIG. 5 or step S406 of the method shown in FIG. 7.

The first determining module 831 includes:

The removing sub-module 8311 is configured to remove the voice main band energy from the residual signal and generate a residual signal with the voice main band energy removed, where the frequency of the voice main band energy is less than the first frequency value. The removing sub-module 8311 can perform the process of "removing the voice main band energy from the residual signal and generating a residual signal with the voice main band energy removed, where the frequency of the voice main band energy is less than the first frequency value" in step S305 of the method shown in FIG. 5, or the same process in step S405 of the method shown in FIG. 7.

The determining sub-module 8312 is configured to determine an energy value of the residual signal with the voice main band energy removed. The determining sub-module 8312 can perform the process of "determining the energy value of the residual signal with the voice main band energy removed" in step S305 of the method shown in FIG. 5, or the same process in step S405 of the method shown in FIG. 7.

The determining submodule 8312 is specifically configured to:

Determine, for the portion of the residual signal with the voice main band energy removed whose frequency is greater than the second frequency value, the energy value in each frame. In this case, the determining sub-module 8312 can perform the process of "determining the energy value of the residual signal with the voice main band energy removed" in step S305 of the method shown in FIG. 5.

Correspondingly, the second determining module 832 is specifically configured to:

When it is determined that, among the energy values of the frames, there is not a preset number of energy values each less than the first energy threshold corresponding to that energy value, determine that the first voice signal has an abnormal sound and that the sound output device is abnormal; when it is determined that, among the energy values of the frames, there is a preset number of energy values each less than the first energy threshold corresponding to that energy value, determine that the first voice signal has no abnormal sound and that the sound output device is normal. In this case, the second determining module 832 can perform step S306 of the method shown in FIG. 5.
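The count-based decision of the second determining module 832 can be sketched as follows. This is a minimal sketch that reads "there is a preset number of energy values" as "at least that many"; that reading, the function name `is_abnormal_by_count`, and the use of one threshold per frame are assumptions, not statements of the application.

```python
def is_abnormal_by_count(frame_energies, frame_thresholds, preset_number):
    """Return True (abnormal) unless enough frames fall below their thresholds.

    frame_thresholds[i] is the first energy threshold corresponding to
    frame_energies[i].
    """
    if len(frame_energies) != len(frame_thresholds):
        raise ValueError("each energy value needs a corresponding threshold")
    # Count the frames whose energy is below the corresponding threshold.
    below = sum(1 for e, t in zip(frame_energies, frame_thresholds) if e < t)
    # Normal only when at least preset_number frames are below threshold.
    return below < preset_number
```

For instance, with energies `[0.1, 0.2, 5.0]` and thresholds `[1.0, 1.0, 1.0]`, two frames are below threshold, so the result is abnormal for a preset number of 3 and normal for a preset number of 2.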

Alternatively, the determining sub-module 8312 is specifically configured to:

Determine, for the portion of the residual signal with the voice main band energy removed whose frequency is greater than the second frequency value, the energy value in each frame; and determine an energy maximum, where the energy maximum is the largest of the energy values of the frames. In this case, the determining sub-module 8312 can perform the process of "determining the energy value of the residual signal with the voice main band energy removed" in step S405 of the method shown in FIG. 7.

Correspondingly, the second determining module 832 is specifically configured to: when it is determined that the energy maximum is greater than or equal to the second energy threshold, determine that the first voice signal has an abnormal sound and that the sound output device is abnormal; and when it is determined that the energy maximum is less than the second energy threshold, determine that the first voice signal has no abnormal sound and that the sound output device is normal. In this case, the second determining module 832 can perform step S406 of the method shown in FIG. 7.
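The maximum-based variant can be sketched in the same spirit. The sketch assumes the input already contains only the portion of the residual above the second frequency value, splits it into fixed-length frames, and uses the sum of squares as the frame energy; the frame length and the names `frame_energies` and `is_abnormal_by_max` are hypothetical.

```python
def frame_energies(signal, frame_len):
    """Split the signal into consecutive frames and return each frame's energy."""
    if frame_len <= 0:
        raise ValueError("frame_len must be positive")
    return [
        sum(s * s for s in signal[i:i + frame_len])
        for i in range(0, len(signal), frame_len)
    ]


def is_abnormal_by_max(signal, frame_len, second_threshold):
    """Abnormal when the largest per-frame energy reaches the second threshold."""
    energies = frame_energies(signal, frame_len)
    if not energies:
        raise ValueError("signal must contain at least one sample")
    return max(energies) >= second_threshold
```

Taking the maximum makes the decision sensitive to a single loud frame, whereas the mean-based alternative smooths short bursts over the whole playback.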

The apparatus of this embodiment further includes:

The generating unit 91 is configured to: before the calculating unit 82 obtains the residual signal according to the pre-acquired voice reference signal and the first voice signal, acquire a second voice signal played by at least one other sound output device, where the other sound output device is a sound output device that plays a normal sound, and the voice content in the second voice signal is the same as the voice content in the first voice signal; and perform signal superposition processing on the second voice signals to generate the voice reference signal. The generating unit 91 can perform step S302 of the method shown in FIG. 5 or step S402 of the method shown in FIG. 7.

The aligning unit 92 is configured to: before the calculating unit 82 obtains the residual signal according to the pre-acquired voice reference signal and the first voice signal, perform delay alignment of the first voice signal with the voice reference signal in the time domain to generate a first voice signal aligned with the voice reference signal. The aligning unit 92 can perform step S303 of the method shown in FIG. 5 or step S403 of the method shown in FIG. 7.
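The work of the generating unit 91 and the aligning unit 92 can be illustrated with the sketch below. It rests on two assumptions that the application does not state: signal superposition is taken to be a sample-wise average of the second voice signals, and the delay is estimated by searching for the lag that maximizes the cross-correlation within an assumed range `max_lag`. All names are hypothetical.

```python
def build_reference(second_signals):
    """Average the second voice signals sample by sample (assumed superposition)."""
    if not second_signals:
        raise ValueError("at least one second voice signal is required")
    n = min(len(s) for s in second_signals)
    count = len(second_signals)
    return [sum(s[i] for s in second_signals) / count for i in range(n)]


def estimate_delay(first, reference, max_lag):
    """Return the lag of `first` relative to `reference`, in samples."""
    best_lag = 0
    best_score = float("-inf")
    for lag in range(-max_lag, max_lag + 1):
        score = 0.0
        for i, r in enumerate(reference):
            j = i + lag
            if 0 <= j < len(first):
                score += r * first[j]
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag


def align(first, reference, max_lag):
    """Shift `first` by the estimated lag and fit it to the reference length."""
    lag = estimate_delay(first, reference, max_lag)
    if lag >= 0:
        shifted = list(first[lag:])
    else:
        shifted = [0.0] * (-lag) + list(first)
    shifted = shifted[:len(reference)]
    # Zero-pad when the shifted signal is shorter than the reference.
    return shifted + [0.0] * (len(reference) - len(shifted))
```

After alignment, the first voice signal and the voice reference signal can be compared sample by sample when the residual signal is computed.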

The abnormal sound detecting device of the embodiment shown in FIG. 10 can be used to perform the technical solution of the embodiment shown in FIG. 5 to FIG. 8 in the above method, and the implementation principle and technical effects are similar, and details are not described herein again.

Moreover, the implementation of this embodiment does not depend on whether the embodiment shown in FIG. 9 is implemented, and the embodiment can be implemented independently.

FIG. 11 is a schematic structural diagram of another abnormal sound detecting apparatus according to an embodiment of the present application. As shown in FIG. 11, the network device includes a transmitter 261, a receiver 262, and a processor 263. The receiver 262 is configured to acquire a first voice signal played by a sound output device of the terminal device, where the first voice signal is locally stored in the terminal device, and the first voice signal includes audio information whose frequency changes irregularly. The processor 263 is configured to obtain a residual signal according to the pre-acquired voice reference signal and the first voice signal, where the residual signal includes a portion of the first voice signal that is different from the voice reference signal, and to determine, according to the residual signal, whether the first voice signal has an abnormal sound, so as to determine whether the sound output device is abnormal. In this case, the receiver 262 can implement the function of the acquiring unit 81 in the apparatus shown in FIG. 9; further, the receiver 262 can perform step S101 of the method shown in FIG. 2, step S202 of the method shown in FIG. 4, step S301 of the method shown in FIG. 5, or step S401 of the method shown in FIG. 7. The processor 263 can implement the functions of the calculating unit 82 and the determining unit 83 in the apparatus shown in FIG. 9; further, the processor 263 can perform steps S102 and S103 of the method shown in FIG. 2, or step S205 of the method shown in FIG. 4.

The processor 263 is specifically configured to determine an energy value of the residual signal, and to determine, according to the energy value, whether the first voice signal has an abnormal sound. In this case, the processor 263 can implement the functions of the first determining module 831 and the second determining module 832 in the apparatus shown in FIG. 10; further, the processor 263 can perform steps S305 and S306 of the method shown in FIG. 5, or steps S405 and S406 of the method shown in FIG. 7.

The processor 263 is specifically configured to remove the voice main band energy from the residual signal and generate a residual signal with the voice main band energy removed, where the frequency of the voice main band energy is less than the first frequency value, and to determine the energy value of the residual signal with the voice main band energy removed. In this case, the processor 263 can implement the functions of the removing sub-module 8311 and the determining sub-module 8312 in the apparatus shown in FIG. 10; further, the processor 263 can perform step S305 of the method shown in FIG. 5 or step S405 of the method shown in FIG. 7.

The processor 263 is specifically configured to: determine, for the portion of the residual signal with the voice main band energy removed whose frequency is greater than the second frequency value, the energy value in each frame; when it is determined that, among the energy values of the frames, there is not a preset number of energy values each less than the first energy threshold corresponding to that energy value, determine that the first voice signal has an abnormal sound and that the sound output device is abnormal; and when it is determined that, among the energy values of the frames, there is a preset number of energy values each less than the first energy threshold corresponding to that energy value, determine that the first voice signal has no abnormal sound and that the sound output device is normal. In this case, the processor 263 can implement the functions of the determining sub-module 8312 and the second determining module 832 in the apparatus shown in FIG. 10; further, the processor 263 can perform the process of "determining the energy value of the residual signal with the voice main band energy removed" in step S305 of the method shown in FIG. 5, and step S306 of the method shown in FIG. 5.

Alternatively, the processor 263 is specifically configured to: determine, for the portion of the residual signal with the voice main band energy removed whose frequency is greater than the second frequency value, the energy value in each frame; determine an energy maximum, where the energy maximum is the largest of the energy values of the frames; when it is determined that the energy maximum is greater than or equal to the second energy threshold, determine that the first voice signal has an abnormal sound and that the sound output device is abnormal; and when it is determined that the energy maximum is less than the second energy threshold, determine that the first voice signal has no abnormal sound and that the sound output device is normal. In this case, the processor 263 can implement the functions of the determining sub-module 8312 and the second determining module 832 in the apparatus shown in FIG. 10; further, the processor 263 can perform the process of "determining the energy value of the residual signal with the voice main band energy removed" in step S405 of the method shown in FIG. 7, and step S406 of the method shown in FIG. 7.

The receiver 262 is further configured to acquire a second voice signal played by at least one other sound output device, where the other sound output device is a sound output device that plays a normal sound, and the voice content in the second voice signal is the same as the voice content in the first voice signal. In this case, the receiver 262 can implement part of the functions of the generating unit 91 in the apparatus shown in FIG. 10; further, the receiver 262 can perform the process of "acquiring the second voice signal played by the at least one other sound output device" in step S302 of the method shown in FIG. 5, or the same process in step S402 of the method shown in FIG. 7.

The processor 263 is further configured to perform signal superposition processing on the second voice signals to generate the voice reference signal. In this case, the processor 263 can implement part of the functions of the generating unit 91 in the apparatus shown in FIG. 10; further, the processor 263 can perform the process of "performing signal superposition processing on the second voice signals to generate a voice reference signal" in step S302 of the method shown in FIG. 5, or the same process in step S402 of the method shown in FIG. 7.

The processor 263 is further configured to perform delay alignment of the first voice signal with the voice reference signal in the time domain to generate a first voice signal aligned with the voice reference signal. In this case, the processor 263 can implement the function of the aligning unit 92 in the apparatus shown in FIG. 10; further, the processor 263 can perform step S303 of the method shown in FIG. 5 or step S403 of the method shown in FIG. 7.

The abnormal sound detecting device of the embodiment shown in FIG. 11 can be used to execute the technical solutions of the above method embodiments, or the programs of the units and modules of the embodiments shown in FIG. 9 and FIG. 10. The processor 263 calls the programs to perform the operations of the above method embodiments, so as to implement the units and modules shown in FIG. 9 and FIG. 10.

The processor 263 may also be a controller, and is represented as "controller/processor 263" in FIG. 11. The transmitter 261 and the receiver 262 are configured to support transmission and reception of information between the network device and the terminal device in the above embodiments, and to support radio communication between the terminal device and other terminal devices. The processor 263 performs various functions for communicating with the terminal device.

Further, the network device may further include a memory 264 for storing program codes and data of the network device.

The processor 263 may be, for example, a central processing unit (CPU), or may be one or more integrated circuits configured to implement the above methods, for example, one or more application-specific integrated circuits (ASICs), one or more digital signal processors (DSPs), or one or more field-programmable gate arrays (FPGAs). The memory 264 may be a single memory or a collective name for a plurality of storage elements.

It should be noted that the transmitter 261 included in the abnormal sound detecting apparatus of FIG. 11 provided by the embodiment of the present application may perform the sending action corresponding to the foregoing method embodiments, the processor 263 performs processing actions such as processing, determining, and acquiring, and the receiver 262 may perform the receiving action. For details, refer to the foregoing method embodiments. The receiver 262 included in the abnormal sound detecting apparatus of FIG. 11 corresponds to the action of acquiring a voice signal in the foregoing method embodiments.

The above embodiments may be implemented in whole or in part by software, hardware, firmware, or any combination thereof. When implemented in software, they may be implemented in whole or in part in the form of a computer program product. The computer program product includes one or more computer instructions. When the computer program instructions are loaded and executed on a computer, the processes or functions described in the embodiments of the present application are generated in whole or in part. The computer may be a general-purpose computer, a special-purpose computer, a computer network, or another programmable device. The computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another; for example, the computer instructions may be transmitted from one website, computer, server, or data center to another website, computer, server, or data center by wire (e.g., coaxial cable, optical fiber, or digital subscriber line (DSL)) or wirelessly (e.g., infrared, radio, or microwave). The computer-readable storage medium may be any available medium that can be accessed by a computer, or a data storage device, such as a server or data center, that integrates one or more available media. The available medium may be a magnetic medium (e.g., a floppy disk, a hard disk, or a magnetic tape), an optical medium (e.g., a DVD), or a semiconductor medium (e.g., a solid-state disk (SSD)).

Those skilled in the art should appreciate that in one or more of the above examples, the functions described in the embodiments of the present application may be implemented in hardware, software, firmware, or any combination thereof. When implemented in software, the functions may be stored in a computer readable medium or transmitted as one or more instructions or code on a computer readable medium. Computer readable media includes both computer storage media and communication media including any medium that facilitates transfer of a computer program from one location to another. A storage medium may be any available media that can be accessed by a general purpose or special purpose computer.

Claims

An abnormal sound detecting method, comprising:

Acquiring a first voice signal played by a sound output device of a terminal device, wherein the first voice signal is locally stored in the terminal device, and the first voice signal includes audio information whose frequency changes irregularly;

Obtaining a residual signal according to a pre-acquired voice reference signal and the first voice signal, wherein the residual signal includes a portion of the first voice signal that is different from the voice reference signal; and

Determining, according to the residual signal, whether the first voice signal has an abnormal sound, so as to determine whether the sound output device is abnormal.
The method according to claim 1, wherein the determining whether the first voice signal has an abnormal sound according to the residual signal comprises:

Determining an energy value of the residual signal;

Determining whether the first voice signal has an abnormal sound according to the energy value.
The method according to claim 2, wherein said determining an energy value of said residual signal comprises:

Removing voice main band energy from the residual signal to generate a residual signal with the voice main band energy removed, wherein the frequency of the voice main band energy is less than a first frequency value;

Determining an energy value of the residual signal with the voice main band energy removed.
The method according to claim 3, wherein the determining an energy value of the residual signal with the voice main band energy removed comprises:

Determining, for a portion of the residual signal with the voice main band energy removed whose frequency is greater than a second frequency value, an energy value in each frame.
The method according to claim 4, wherein the determining whether the first voice signal has an abnormal sound according to the energy value comprises:

When it is determined that, among the energy values of the frames, there is not a preset number of energy values each less than the first energy threshold corresponding to that energy value, determining that the first voice signal has an abnormal sound, and determining that the sound output device is abnormal;

When it is determined that, among the energy values of the frames, there is a preset number of energy values each less than the first energy threshold corresponding to that energy value, determining that the first voice signal has no abnormal sound, and determining that the sound output device is normal.
The method according to claim 3, wherein the determining an energy value of the residual signal with the voice main band energy removed comprises:

Determining, for a portion of the residual signal with the voice main band energy removed whose frequency is greater than a second frequency value, an energy value in each frame;

Determining an energy maximum, wherein the energy maximum is the largest of the energy values of the frames.
The method according to claim 6, wherein the determining whether the first voice signal has an abnormal sound according to the energy value comprises:

When it is determined that the energy maximum value is greater than or equal to the second energy threshold, determining that the first voice signal has an abnormal sound, and determining that the sound output device is abnormal;

When it is determined that the energy maximum is less than the second energy threshold, determining that the first voice signal does not have an abnormal sound, and determining that the sound output device is normal.
The method according to any one of claims 1 to 7, wherein before the obtaining the residual signal according to the pre-acquired voice reference signal and the first voice signal, the method further includes:

Acquiring a second voice signal played by at least one other sound output device, wherein the other sound output device is a sound output device that plays a normal sound, and the voice content in the second voice signal is the same as the voice content in the first voice signal;

Performing signal superposition processing on the second voice signals to generate the voice reference signal.
The method according to any one of claims 1 to 7, wherein before the obtaining the residual signal according to the pre-acquired voice reference signal and the first voice signal, the method further includes:

Performing delay alignment of the first voice signal with the voice reference signal in the time domain to generate a first voice signal aligned with the voice reference signal.
An abnormal sound detecting device, comprising:

An acquiring unit, configured to acquire a first voice signal played by a sound output device of a terminal device, wherein the first voice signal is locally stored in the terminal device, and the first voice signal includes audio information whose frequency changes irregularly;

A calculating unit, configured to obtain a residual signal according to a pre-acquired voice reference signal and the first voice signal, wherein the residual signal includes a portion of the first voice signal that is different from the voice reference signal; and

A determining unit, configured to determine, according to the residual signal, whether the first voice signal has an abnormal sound, so as to determine whether the sound output device is abnormal.
The device according to claim 10, wherein the determining unit comprises:

a first determining module, configured to determine an energy value of the residual signal;

And a second determining module, configured to determine, according to the energy value, whether the first voice signal has an abnormal sound.
The device according to claim 11, wherein the first determining module comprises:

A removing sub-module, configured to remove voice main band energy from the residual signal and generate a residual signal with the voice main band energy removed, wherein the frequency of the voice main band energy is less than a first frequency value;

A determining sub-module, configured to determine an energy value of the residual signal with the voice main band energy removed.
The device according to claim 12, wherein the determining sub-module is specifically configured to:

Determine, for a portion of the residual signal with the voice main band energy removed whose frequency is greater than a second frequency value, an energy value in each frame.
The device according to claim 13, wherein the second determining module is specifically configured to:

When it is determined that, among the energy values of the frames, there is not a preset number of energy values each less than the first energy threshold corresponding to that energy value, determine that the first voice signal has an abnormal sound, and determine that the sound output device is abnormal;

When it is determined that, among the energy values of the frames, there is a preset number of energy values each less than the first energy threshold corresponding to that energy value, determine that the first voice signal has no abnormal sound, and determine that the sound output device is normal.
The device according to claim 12, wherein the determining sub-module is specifically configured to:

Determine, for a portion of the residual signal with the voice main band energy removed whose frequency is greater than a second frequency value, an energy value in each frame;

Determine an energy maximum, wherein the energy maximum is the largest of the energy values of the frames.
The device according to claim 15, wherein the second determining module is specifically configured to:

When it is determined that the energy maximum value is greater than or equal to the second energy threshold, determining that the first voice signal has an abnormal sound, and determining that the sound output device is abnormal;

When it is determined that the energy maximum is less than the second energy threshold, determining that the first voice signal does not have an abnormal sound, and determining that the sound output device is normal.
The device according to any one of claims 10-16, wherein the device further comprises:

A generating unit, configured to: before the calculating unit obtains the residual signal according to the pre-acquired voice reference signal and the first voice signal, acquire a second voice signal played by at least one other sound output device, wherein the other sound output device is a sound output device that plays a normal sound, and the voice content in the second voice signal is the same as the voice content in the first voice signal; and perform signal superposition processing on the second voice signals to generate the voice reference signal.
The device according to any one of claims 10-16, wherein the device further comprises:

An aligning unit, configured to: before the calculating unit obtains the residual signal according to the pre-acquired voice reference signal and the first voice signal, perform delay alignment of the first voice signal with the voice reference signal in the time domain to generate a first voice signal aligned with the voice reference signal.
A computer readable storage medium comprising instructions which, when executed on a computer, cause the computer to perform the method of any of claims 1-9.