WO2018133247A1 - Abnormal sound detection method and apparatus - Google Patents
- Publication number
- WO2018133247A1 (PCT/CN2017/082415)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- signal
- voice
- energy
- determining
- output device
Classifications
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04R—LOUDSPEAKERS, MICROPHONES, GRAMOPHONE PICK-UPS OR LIKE ACOUSTIC ELECTROMECHANICAL TRANSDUCERS; DEAF-AID SETS; PUBLIC ADDRESS SYSTEMS
- H04R29/00—Monitoring arrangements; Testing arrangements
Definitions
- the present application relates to the field of terminal technologies, and in particular, to an abnormal sound detection method and apparatus.
- A sound output device, such as a speaker or a receiver, is generally provided in a terminal, and the terminal plays audio signals through the sound output device.
- A sound output device may produce abnormal sound when playing an audio signal for various reasons, such as design defects, assembly defects, or the entry of foreign matter. Therefore, before a terminal is sold, its sound output device must be tested to determine whether it produces abnormal sound when playing an audio signal.
- In the prior art, the sound output device to be detected plays a frequency sweep signal. A detection system records the sweep signal played by the device, calculates the high-order harmonic distortion energy of each frequency band of the sweep signal, and determines whether that energy exceeds the energy threshold of each band. When the high-order harmonic distortion energy of one frequency band, or of multiple frequency bands, exceeds the corresponding threshold, the sound output device to be detected is determined to have an abnormal sound and is therefore determined to be abnormal.
- However, because the frequency of a sweep signal changes monotonically within a given band, either from high to low or from low to high, each frequency point in the sweep lasts only a short time. When a given frequency point has not yet excited relatively high harmonic energy, the sweep has already moved to the next frequency point, so a problem that may occur at that frequency point goes undetected. Moreover, in actual use a sound output device is unlikely to play only a simple audio signal such as a sweep. The prior art therefore cannot accurately detect abnormal sound in the sweep signal played by the sound output device to be detected, and so cannot accurately determine whether that device is abnormal.
- The present application provides an abnormal sound detecting method and apparatus to solve the prior-art problem that detection of abnormal sound during audio playback is inaccurate, so that it cannot be accurately determined whether the sound output device to be detected is abnormal.
- The present application provides an abnormal sound detecting method, including: acquiring a first voice signal played by a sound output device of a terminal device, where the first voice signal is stored locally in the terminal device and includes audio information with irregular frequency changes; obtaining a residual signal according to a pre-acquired voice reference signal and the first voice signal, where the residual signal includes the portion of the first voice signal that differs from the voice reference signal; and determining, according to the residual signal, whether the first voice signal has an abnormal sound, and thereby determining whether the sound output device is abnormal.
- Determining whether the first voice signal has an abnormal sound according to the residual signal includes: determining an energy value of the residual signal; and determining, according to the calculated energy value, whether the first voice signal has an abnormal sound.
- Determining the energy value of the residual signal includes: removing the voice main-band energy from the residual signal, where the removed voice main-band energy lies at frequencies below a first frequency value; and then determining the energy value of the residual signal from which the voice main-band energy has been removed.
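As a rough illustration of removing the voice main-band energy, the sketch below attenuates content below a cutoff with a first-order high-pass filter. The filter type, the function names `remove_voice_main_band` and `energy`, and the parameter `first_frequency_hz` are assumptions made for illustration; the source does not say which filter is used or what the first frequency value is.

```python
import math
from typing import List, Sequence


def remove_voice_main_band(residual: Sequence[float],
                           sample_rate_hz: float,
                           first_frequency_hz: float) -> List[float]:
    """Attenuate content below first_frequency_hz with a first-order high-pass.

    Assumption: a simple RC-style high-pass stands in for whatever filter a
    real implementation would use to drop the voice main band.
    """
    if not residual:
        return []
    dt = 1.0 / sample_rate_hz
    rc = 1.0 / (2.0 * math.pi * first_frequency_hz)
    alpha = rc / (rc + dt)
    out = [residual[0]]
    for n in range(1, len(residual)):
        # y[n] = alpha * (y[n-1] + x[n] - x[n-1])
        out.append(alpha * (out[-1] + residual[n] - residual[n - 1]))
    return out


def energy(signal: Sequence[float]) -> float:
    """Sum of squared samples."""
    return sum(s * s for s in signal)
```

A first-order filter has a gentle roll-off, so a practical detector would probably need a steeper filter to separate the speech band cleanly from the high-frequency residue.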
- Determining the energy value of the residual signal from which the voice main-band energy has been removed includes: determining the portion of that residual signal that lies above a second frequency value, and then calculating the energy value of that portion in each frame.
- Determining whether the first voice signal has an abnormal sound includes the following process:
- If the energy value of each frame is smaller than the first energy threshold corresponding to that energy value, it may be determined that the first voice signal does not have an abnormal sound, and the sound output device is determined to be normal.
- Alternatively, determining the energy value of the residual signal from which the voice main-band energy has been removed includes: determining the portion of that residual signal that lies above the second frequency value; calculating the energy value of that portion in each frame; and then calculating the energy maximum, which is the largest of the per-frame energy values.
- Determining whether the first voice signal has an abnormal sound includes the following process:
- If the energy maximum is greater than or equal to the second energy threshold, it may be determined that the first voice signal has an abnormal sound, and the sound output device is determined to be abnormal;
- If the energy maximum is less than the second energy threshold, it may be determined that the first voice signal does not have an abnormal sound, and the sound output device is determined to be normal.
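The per-frame energy calculation and the maximum-versus-threshold decision can be sketched as follows. This is a minimal sketch that assumes non-overlapping frames and an input that has already been restricted to the band above the second frequency value; `frame_len` and the threshold value are hypothetical parameters, since the source gives no concrete numbers.

```python
from typing import List, Sequence


def frame_energies(signal: Sequence[float], frame_len: int) -> List[float]:
    """Split the signal into non-overlapping frames and return each frame's energy."""
    if frame_len <= 0:
        raise ValueError("frame_len must be positive")
    energies = []
    for start in range(0, len(signal) - frame_len + 1, frame_len):
        frame = signal[start:start + frame_len]
        energies.append(sum(s * s for s in frame))
    return energies


def has_abnormal_sound(residual_high_band: Sequence[float],
                       frame_len: int,
                       second_energy_threshold: float) -> bool:
    """Return True when the largest per-frame energy reaches the threshold."""
    energies = frame_energies(residual_high_band, frame_len)
    if not energies:
        raise ValueError("signal is shorter than one frame")
    return max(energies) >= second_energy_threshold
```

Comparing only the maximum is equivalent to asking whether any single frame reaches the threshold, so one short burst is enough to flag the device.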
- Before obtaining the residual signal according to the pre-acquired voice reference signal and the first voice signal, the method further includes: acquiring a second voice signal played by each of at least one other sound output device, where each other sound output device is one that plays normal sound and the voice content of the second voice signal is the same as that of the first voice signal; and then superimposing the second voice signals to generate the voice reference signal.
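One way to picture the superposition step is a sample-wise average of recordings taken from known-good devices. The sketch below assumes the recordings are already time-aligned and interprets "superposition" as summation followed by division by the number of recordings; both the normalization and the function name `build_reference` are assumptions, not details given in the source.

```python
from typing import List, Sequence


def build_reference(recordings: Sequence[Sequence[float]]) -> List[float]:
    """Average several time-aligned recordings sample by sample.

    Recordings of different lengths are truncated to the shortest one.
    """
    if not recordings:
        raise ValueError("at least one recording is required")
    length = min(len(r) for r in recordings)
    count = len(recordings)
    return [sum(r[i] for r in recordings) / count for i in range(length)]
```

Averaging several good devices tends to suppress device-specific quirks and uncorrelated recording noise, which is a plausible reason for combining more than one recording.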
- Before obtaining the residual signal according to the pre-acquired voice reference signal and the first voice signal, the method further includes: performing delay alignment on the first voice signal and the voice reference signal in the time domain, to generate a first voice signal that is aligned with the voice reference signal.
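Delay alignment in the time domain is commonly done by searching for the lag that maximizes the cross-correlation between the two signals. The source does not name a technique, so the brute-force search below, along with the names `estimate_delay`, `align`, and `max_lag`, is an illustrative assumption.

```python
from typing import List, Sequence


def estimate_delay(recorded: Sequence[float],
                   reference: Sequence[float],
                   max_lag: int) -> int:
    """Return the lag in samples at which recorded best matches reference.

    A positive lag means the recording is delayed relative to the reference.
    """
    best_lag, best_score = 0, float("-inf")
    for lag in range(-max_lag, max_lag + 1):
        score = 0.0
        for i, ref in enumerate(reference):
            j = i + lag
            if 0 <= j < len(recorded):
                score += ref * recorded[j]
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag


def align(recorded: Sequence[float],
          reference: Sequence[float],
          max_lag: int) -> List[float]:
    """Shift the recording so it lines up with the reference, same length."""
    lag = estimate_delay(recorded, reference, max_lag)
    if lag >= 0:
        shifted = list(recorded[lag:])
    else:
        shifted = [0.0] * (-lag) + list(recorded)
    shifted = shifted[:len(reference)]
    shifted += [0.0] * (len(reference) - len(shifted))
    return shifted
```

The search costs on the order of `max_lag` times the signal length, which is fine for a sketch; longer recordings would normally use an FFT-based correlation.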
- The technical solution provided by the embodiments of the present disclosure may have the following beneficial effects. The first voice signal played by the sound output device of the terminal device is acquired, where the first voice signal is stored locally in the terminal device and includes audio information with irregular frequency changes; a residual signal is obtained according to the pre-acquired voice reference signal and the first voice signal, where the residual signal is the portion of the first voice signal that differs from the voice reference signal; and whether the first voice signal has an abnormal sound is determined according to the residual signal, so as to determine whether the sound output device is abnormal. This provides a way of detecting whether abnormal sound occurs when the sound output device plays audio, and thus of determining whether the sound output device is abnormal.
- Because the signal to be detected is a voice signal, it represents the user's real usage scenario, and throughout playback it repeatedly excites the frequency band in which real speech is concentrated, which helps reveal problematic frequencies.
- The signal to be detected is a voice signal stored locally in the terminal device and played by the sound output device, which avoids abnormal sound caused by packet loss during transmission or by noise mixed into the signal.
- The detection method is convenient and general, and it improves the accuracy of the detection result.
- an abnormal sound detecting apparatus including:
- An acquiring unit configured to acquire a first voice signal that is played by a sound output device of the terminal device, where the first voice signal is locally stored in the terminal device, and the first voice signal includes audio information with irregular frequency changes;
- a calculating unit configured to obtain, according to the pre-acquired voice reference signal and the first voice signal, a residual signal, where the residual signal includes a portion of the first voice signal that is different from the signal of the voice reference signal;
- a determining unit configured to determine, according to the residual signal, whether the first voice signal has an abnormal sound, thereby determining whether the sound output device is abnormal.
- The determining unit includes: a first determining module, configured to determine an energy value of the residual signal; and a second determining module, configured to determine, according to the calculated energy value, whether the first voice signal has an abnormal sound.
- The first determining module comprises:
- a determining sub-module, configured to determine the energy value of the residual signal from which the voice main-band energy has been removed.
- The determining sub-module is specifically configured to: determine the portion of the residual signal, with the voice main-band energy removed, that lies above the second frequency value, and then calculate the energy value of that portion in each frame.
- The second determining module is specifically configured so that:
- if the energy value of each frame is smaller than the first energy threshold corresponding to that energy value, it may be determined that the first voice signal does not have an abnormal sound, and the sound output device is determined to be normal.
- Alternatively, the determining sub-module is specifically configured to: determine the portion of the residual signal, with the voice main-band energy removed, that lies above the second frequency value; calculate the energy value of that portion in each frame; and then calculate the energy maximum, which is the largest of the per-frame energy values.
- The second determining module is then specifically configured so that:
- if the energy maximum is greater than or equal to the second energy threshold, it may be determined that the first voice signal has an abnormal sound, and the sound output device is determined to be abnormal;
- if the energy maximum is less than the second energy threshold, it may be determined that the first voice signal does not have an abnormal sound, and the sound output device is determined to be normal.
- The apparatus further includes:
- a generating unit, configured to acquire, before the calculating unit obtains the residual signal according to the pre-acquired voice reference signal and the first voice signal, a second voice signal played by each of at least one other sound output device, where each other sound output device is one that plays normal sound and the voice content of the second voice signal is the same as that of the first voice signal; and then to superimpose the second voice signals to generate the voice reference signal.
- The apparatus further includes:
- an aligning unit, configured to perform delay alignment on the first voice signal and the voice reference signal in the time domain before the calculating unit obtains the residual signal according to the pre-acquired voice reference signal and the first voice signal, to generate a first voice signal that is aligned with the voice reference signal.
- the present application provides a computer program for performing the method of the above first aspect when executed by a processor.
- the application provides a program product, such as a computer readable storage medium, comprising the program of the third aspect.
- a computer program product comprising instructions for causing a computer to perform the methods of the above aspects when run on a computer is provided.
- In the present application, the first voice signal played by the sound output device of the terminal device is acquired, where the first voice signal is stored locally in the terminal device and includes audio information whose frequency changes irregularly; a residual signal is obtained according to the pre-acquired voice reference signal and the first voice signal, where the residual signal is the portion of the first voice signal that differs from the voice reference signal; and whether the first voice signal has an abnormal sound is determined according to the residual signal, so as to determine whether the sound output device is abnormal.
- FIG. 1 is a schematic diagram 1 of an application scenario according to an embodiment of the present disclosure
- FIG. 2 is a schematic flowchart 1 of a method for detecting an abnormal sound according to an embodiment of the present application
- FIG. 3 is a schematic diagram of an adaptive filtering method used in an abnormal sound detecting method according to an embodiment of the present disclosure
- FIG. 4 is a schematic flowchart 2 of a method for detecting an abnormal sound according to an embodiment of the present application
- FIG. 5 is a schematic flowchart diagram of still another abnormal sound detecting method according to an embodiment of the present application.
- FIG. 6 is an energy curve diagram of still another abnormal sound detecting method according to an embodiment of the present application.
- FIG. 7 is a schematic flowchart diagram of another method for detecting an abnormal sound according to an embodiment of the present application.
- FIG. 8 is a schematic flowchart diagram of still another abnormal sound detecting method according to an embodiment of the present application.
- FIG. 9 is a schematic structural diagram of an abnormal sound detecting apparatus according to an embodiment of the present application.
- FIG. 10 is a schematic structural diagram of still another abnormal sound detecting apparatus according to an embodiment of the present application.
- FIG. 11 is a schematic structural diagram of another abnormal sound detecting apparatus according to an embodiment of the present application.
- The embodiments of the present application apply to an abnormal sound detecting device, an audio detecting system, or any system that can carry out the embodiments of the present application. Some of the terms used in the present application are explained below to aid understanding by those skilled in the art. Note that when the solution of the embodiments is applied to an audio detecting system, or to any other system that can carry out the embodiments, the names of the audio detecting system and the abnormal sound detecting device may change, but this does not affect the implementation of the solution.
- A terminal device, also referred to as a terminal or user device, is a device that provides voice and/or data connectivity to a user, for example, a handheld device with a wireless connection function or an in-vehicle device.
- Common terminal devices include, for example, a mobile phone, a tablet computer, a notebook computer, a palmtop computer, a mobile internet device (MID), and a wearable device.
- Wearable devices include, for example, a smart watch, a smart wristband, and a step counter.
- A sound output device is a device that can play an audio signal, for example, a speaker or a receiver; the sound output device can be disposed on the terminal device.
- "Multiple" means two or more, and other quantifiers are similar. "And/or" describes the association between associated objects and indicates that three relationships are possible; for example, "A and/or B" may indicate three cases: A exists alone, A and B exist at the same time, or B exists alone.
- The character "/" generally indicates an "or" relationship between the objects before and after it.
- FIG. 1 is a schematic diagram 1 of an application scenario provided by an embodiment of the present application.
- The embodiment of the present application uses a terminal device 01 and an abnormal sound detecting device 02.
- The terminal device 01 is provided with a sound output device 03, which can play an audio signal.
- The sound output device 03 on the terminal device 01 plays an audio signal.
- The abnormal sound detecting device 02 acquires the audio signal played by the sound output device 03 on the terminal device 01 and then carries out the solution of the embodiment of the present application.
- The terminal device in the embodiments of the present application may be an access terminal, a user terminal, a terminal, a wireless communication device, a user agent, a user device, or the like.
- Examples of user terminals include a smart phone, a smart watch, and a personal computer.
- The sound output device in the embodiments of the present application may be a speaker, a receiver, or the like, and may be disposed on the terminal device in the embodiments of the present application.
- FIG. 2 is a schematic flowchart 1 of a method for detecting an abnormal sound according to an embodiment of the present application. As shown in Figure 2, the method includes:
- the description will be made with the execution subject being an abnormal sound detecting device.
- The sound output device of the terminal device plays the first voice signal, and the abnormal sound detecting device then acquires the first voice signal played by the sound output device.
- The abnormal sound detecting device acquires the first voice signal as follows: a voice is pre-stored in the terminal device, and the sound output device of the terminal device plays the first voice signal from that locally stored voice; the abnormal sound detecting device then acquires the first voice signal.
- the first voice signal may be the voice of the "first aid center dial 120" voice of the female voice at 112.
- For example, the sound output device plays the voice stored locally in the terminal device: "Please dial 120 for the emergency center."
- A female voice may be used because, compared with a male voice, its fundamental frequency is higher and it covers a wider frequency band; the frequency energy distribution of a female voice over time is also more varied.
- The frequency sweep signal and the voice signal differ greatly.
- The signal to be detected in the prior art is a frequency sweep signal, in which the frequency changes monotonically from high to low or from low to high.
- Each frequency point in the sweep signal lasts only a short time, so when a given frequency point has not yet excited higher harmonic energy, the sweep has already moved on to the next frequency point, and a problem that may occur at that frequency point goes undetected. The present application uses a voice signal instead.
- The present application acquires the first voice signal played by the sound output device. The first voice signal carries audio information whose frequency changes irregularly: the duration of each frequency point is variable and the frequency changes are varied, so throughout playback the first voice signal repeatedly excites the frequency band in which real speech is concentrated, which helps reveal anomalies at problematic frequencies.
- In addition, abnormal sound is usually generated at a very narrow individual resonance frequency. In the prior art, the sweep signal steps through discrete frequencies that are not continuous, so the true problem frequency is quite likely to be missed during the sweep. In the present application, the voice signal itself represents the frequency points that actually need to be tested, so a problem frequency is much less likely to be missed, which helps detect the frequencies at which abnormal sound occurs.
- The abnormal sound detecting device has acquired the voice reference signal in advance; the voice content of the voice reference signal is the same as that of the first voice signal.
- For example, if the voice content of the first voice signal is "Hello, please dial 00", the voice content of the voice reference signal is also "Hello, please dial 00".
- The abnormal sound detecting device uses the voice reference signal to perform adaptive filtering on the first voice signal to be detected. This removes the portion of the first voice signal that is consistent with the voice reference signal and retains the portion that differs from it; the retained portion is the residual signal.
- The abnormal sound detecting device may also use another filtering method to filter the first voice signal to be detected according to the voice reference signal and obtain the residual signal.
- The residual signal may include the portion of the first voice signal that differs from the voice reference signal; it may also include some signal information of the first voice signal or of the voice reference signal.
- FIG. 3 is a schematic diagram of an adaptive filtering method used in an abnormal sound detecting method according to an embodiment of the present application.
- In FIG. 3, x is the first voice signal, d is the voice reference signal, and e is the residual signal.
- The idea of adaptive filtering is to continually adjust the filter, under some criterion and according to the value of e, so that the filtered x (that is, y) approaches the voice reference signal d.
- x(j) is the value of the input first voice signal at time j, y(j) is the value of the filtered first voice signal at time j, and d(j) is the value of the voice reference signal at time j.
- The residual signal e(j) is the difference between d(j) and y(j).
- The filtering parameters of the adaptive filter are controlled by the residual signal e(j): they are adjusted automatically according to the value of e(j), so that the output y(j) at the next moment is closer to the desired voice reference signal d(j).
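The adaptive filtering loop, with x as the first voice signal, d as the voice reference signal, y as the filter output, and e(j) = d(j) - y(j), can be sketched with a normalized LMS (NLMS) update. The source says only that the filter is adjusted "by some criterion", so NLMS, the function name `lms_residual`, and the values of `num_taps`, `step`, and `eps` are assumptions chosen for illustration.

```python
from typing import List, Sequence


def lms_residual(x: Sequence[float],
                 d: Sequence[float],
                 num_taps: int = 8,
                 step: float = 0.5,
                 eps: float = 1e-8) -> List[float]:
    """Filter x adaptively toward d and return the residual e(j) = d(j) - y(j).

    Assumption: a normalized LMS update is used as the adaptation criterion.
    """
    weights = [0.0] * num_taps
    history = [0.0] * num_taps  # most recent x samples, newest first
    residual = []
    for xj, dj in zip(x, d):
        history = [xj] + history[:-1]
        y = sum(w * h for w, h in zip(weights, history))
        e = dj - y
        norm = sum(h * h for h in history) + eps
        # Adjust the filter parameters according to e(j) so that the next
        # output is closer to the reference.
        weights = [w + step * e * h / norm for w, h in zip(weights, history)]
        residual.append(e)
    return residual
```

When the recording is just a scaled or lightly filtered copy of the reference, the residual decays toward zero as the weights converge; content that the filter cannot predict from x, such as distortion products, remains in e.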
- S103 Determine, according to the residual signal, whether the first voice signal has an abnormal sound to determine whether the sound output device is abnormal.
- the abnormal sound detecting device analyzes whether the obtained residual signal has an abnormal signal, and further determines whether the first voice signal has an abnormal sound. When it is determined that the first speech signal has an abnormal sound, it is determined that the sound output device is abnormal; when it is determined that the first speech signal does not have an abnormal sound, it is determined that the sound output device is normal.
- FIG. 4 is a schematic flowchart 2 of a method for detecting an abnormal sound according to an embodiment of the present application. As shown in Figure 4, the process includes:
- the abnormal sound detecting device starts the recording function of the abnormal sound detecting device.
- the abnormal sound detecting means activates its own recording function.
- the sound output device of the terminal device plays the first voice signal
- the abnormal sound detecting device acquires the first voice signal played by the sound output device of the terminal device, wherein the first voice signal is locally stored in the terminal device.
- A voice is pre-stored in the terminal device, and the sound output device of the terminal device plays the first voice signal from that locally stored voice; the abnormal sound detecting device then acquires the first voice signal.
- the process of this step can be referred to step S101 provided in FIG. 2, and the principle and process are the same as step S101.
- the abnormal sound detecting device saves the first voice signal.
- the abnormal sound detecting means holds the first voice signal that has been recorded.
- the abnormal sound detecting device acquires a voice reference signal.
- the abnormal sound detecting device acquires a voice reference signal, wherein the voice content of the voice reference signal is the same as the voice content of the first voice signal.
- the abnormal sound detecting device runs an abnormal sound detecting algorithm.
- The abnormal sound detecting device runs the abnormal sound detecting algorithm, which includes steps S102 and S103 described above, to determine whether there is an abnormal sound in the first voice signal and thus whether the sound output device is abnormal.
- The abnormal sound detecting device outputs the detection result.
- The abnormal sound detecting device outputs the detection result obtained in S205: when the first voice signal is determined to have an abnormal sound, the sound output device is determined to be abnormal; when the first voice signal is determined not to have an abnormal sound, the sound output device is determined to be normal.
- One existing method has the sound output device play a frequency sweep signal, acquires the sweep signal played by the sound output device, and calculates the 12th to 15th harmonic energy of the sweep signal; based on that harmonic energy, it determines whether the sweep signal contains abnormal sound and thus whether the sound output device is abnormal.
- However, the signal to be detected in this method is still a frequency sweep signal.
- As a result, the detection result may indicate no abnormal sound even though the user hears an obvious abnormal sound when the terminal device actually plays a sound source.
- Another existing method acquires an audio signal transmitted over a communication network; acquires the frequency-domain energy distribution parameter of the current frame of the audio signal and of each frame within a preset neighborhood of the current frame; acquires the pitch parameter of the current frame and of each frame within the preset neighborhood; and determines, from those pitch parameters, whether the current frame is in a voice segment. If the current frame is in a voice segment and, among all the frequency-domain energy distribution parameters, the number that fall within a preset voice-like noise frequency-domain energy distribution parameter interval is greater than or equal to a first threshold, the current frame is determined to be voice-like noise.
- the first point, the audio signal to be detected is an audio signal transmitted by the communication network, and the audio signal is in the process of transmission.
- the existing method There is a packet loss phenomenon of the audio signal, or other external noise may occur to make the audio signal doped noise during the transmission; thus, in the existing method, if the voice noise is detected, the noise may be Because the audio signal is caused by packet loss during the transmission process, or is caused by the noise, it is impossible to determine whether the noise is caused by the defect of the sound output device itself, and the existing method is not accurate.
- Second, the frequency-domain energy distribution parameter of the audio signal is analyzed and compared with the preset frequency-domain energy distribution parameter interval to determine whether noise is present in the audio signal.
- The existing detection method is directed at a single type of audio signal, yet different types of terminal devices differ greatly in design process, assembly process, and electro-acoustic component selection. As a result, the same type of audio signal played by different terminal devices also differs greatly in its frequency-domain characteristics, which makes it very difficult to preset the frequency-domain energy distribution parameter interval. This poor versatility can also cause inaccurate detection results.
- In the present application, the process of FIG. 2 or FIG. 4 is adopted. Because the signal to be detected is a voice signal, it represents a real usage scenario of the user, and throughout playback the voice signal repeatedly excites the frequency band in which speech is actually concentrated, which helps expose abnormalities at problem frequency points. Moreover, the voice signal itself represents the real frequency points that need to be detected, so problem frequency points are much less likely to be missed, which helps detect the frequency points with abnormal sound. Meanwhile, the signal to be detected is a voice signal stored locally in the terminal device and played by the sound output device, not a signal transmitted over a communication network, which avoids packet loss during transmission of the voice signal.
- The residual signal may include the portion of the first voice signal that differs from the voice reference signal; the residual signal is then examined to determine whether there is an abnormal sound in the first voice signal.
- Because the first voice signal has the same voice content as the voice reference signal, detection is convenient and the detection method is more versatile, and the detection result is more accurate than that of a method that analyzes noise using the frequency-domain energy distribution parameter of the audio signal.
- A first voice signal played by the sound output device of the terminal device is acquired, where the first voice signal is stored locally in the terminal device and includes audio information whose frequency changes irregularly; a residual signal is obtained according to a pre-acquired voice reference signal and the first voice signal, where the residual signal includes the portion of the first voice signal that differs from the voice reference signal; and whether the first voice signal has an abnormal sound is determined according to the residual signal, so as to determine whether the sound output device is abnormal.
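- The residual step can be illustrated with a minimal sketch. It assumes that both signals are already time-aligned lists of samples and that the residual is taken as a sample-wise difference; the text does not specify the exact operation, and the name `residual_signal` is hypothetical.

```python
from typing import List, Sequence


def residual_signal(first: Sequence[float], reference: Sequence[float]) -> List[float]:
    """Sample-wise difference between the recorded signal and the reference.

    Assumption: subtraction is used to isolate the differing portion; the
    source text only says the residual contains that portion.
    """
    n = min(len(first), len(reference))  # compare only the overlapping part
    return [first[i] - reference[i] for i in range(n)]


reference = [0.0, 0.5, 1.0, 0.5, 0.0]
recorded = [0.0, 0.5, 1.25, 0.5, 0.0]  # one sample deviates from the reference
print(residual_signal(recorded, reference))
```

- In this sketch a device that reproduces the reference exactly yields an all-zero residual, so any remaining energy points to a difference between the two playbacks.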
- The voice signal represents a real usage scenario of the user, and throughout playback the voice signal repeatedly excites the frequency band in which speech is actually concentrated, which helps expose problem frequency points.
- The signal to be detected is a voice signal stored locally in the terminal device and played by the sound output device, which avoids noise caused by packet loss or by noise mixed into the voice signal during transmission.
- The residual signal includes the portion of the first voice signal that differs from the voice reference signal, and the residual signal is then examined to determine whether there is an abnormal sound in the first voice signal. Because the first voice signal has the same voice content as the voice reference signal, detection is convenient, the detection method is versatile, and the accuracy of the detection result is improved.
- FIG. 5 is a schematic flowchart diagram of still another abnormal sound detecting method according to an embodiment of the present application. As shown in FIG. 5, the method includes:
- For this step, refer to step S101 of the abnormal sound detection method shown in FIG. 2 and step S202 of the abnormal sound detection method shown in FIG. 4.
- A plurality of normal sound output devices, that is, devices that play sound normally, can be used to play the same second voice signal; the second voice signal played by each normal sound output device is likewise stored in the terminal device to which that sound output device belongs. The voice content in the second voice signal is the same as the voice content in the first voice signal.
- the abnormal sound detecting device separately records the second voice signal played by each normal sound output device.
- the abnormal sound detecting means performs signal superimposition processing on each of the second speech signals to obtain a voice reference signal, wherein the voice content of the voice reference signal and the voice content in the second voice signal are the same.
- The signal superposition processing can be performed in any of the following ways.
- In the first mode, the abnormal sound detecting device splices the second voice signals together to obtain the voice reference signal.
- In the second mode, the abnormal sound detecting device superimposes the second voice signals in the time domain to obtain the voice reference signal.
- In the third mode, the abnormal sound detecting device examines each second voice signal band by band, filters out the bands of each second voice signal that fall outside a preset frequency range, and then synthesizes the filtered second voice signals to obtain the voice reference signal.
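- The second mode above can be sketched as follows. This is a minimal illustration that assumes the recordings are already aligned and that superposition means a sample-wise sum divided by the number of recordings; the text only says the signals are superimposed in the time domain, and the name `superimpose` is hypothetical.

```python
from typing import List, Sequence


def superimpose(signals: Sequence[Sequence[float]]) -> List[float]:
    """Average several aligned recordings into one reference signal.

    Assumption: averaging is used so the reference keeps the amplitude
    scale of a single recording.
    """
    if not signals:
        raise ValueError("at least one second voice signal is required")
    n = min(len(s) for s in signals)  # use only the common length
    count = len(signals)
    return [sum(s[i] for s in signals) / count for i in range(n)]


print(superimpose([[1.0, 2.0, 3.0], [3.0, 2.0, 1.0]]))
```

- Averaging several normal devices tends to suppress device-specific deviations, which is one plausible reason to build the reference from more than one recording.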
- The first voice signal is delay-aligned with the voice reference signal in the time domain to generate a first voice signal aligned with the voice reference signal.
- The abnormal sound detecting device performs delay alignment processing on the first voice signal and the voice reference signal in the time domain, so that the first voice signal is aligned with the voice reference signal in the time domain, yielding the first voice signal aligned with the voice reference signal.
- A delay alignment algorithm may be used to align the first voice signal with the voice reference signal in the time domain, for example:
- GCC: generalized cross-correlation algorithm
- LMS: least mean square algorithm
- EMD: subspace-based eigenvalue decomposition
- ATF-s ratio: acoustic transfer functions ratio
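- Delay alignment can be sketched with a plain, unweighted time-domain cross-correlation, used here as a simplified stand-in for the algorithms named above. The function names `estimate_delay` and `align`, the search range `max_lag`, and the zero padding are all assumptions made for illustration.

```python
from typing import List, Sequence


def estimate_delay(signal: Sequence[float], reference: Sequence[float], max_lag: int) -> int:
    """Return the lag (in samples) at which `signal` best matches `reference`.

    A positive result means `signal` is delayed relative to `reference`.
    """
    best_lag, best_score = 0, float("-inf")
    for lag in range(-max_lag, max_lag + 1):
        score = 0.0
        for i, ref in enumerate(reference):
            j = i + lag
            if 0 <= j < len(signal):
                score += ref * signal[j]  # correlation at this lag
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag


def align(signal: Sequence[float], lag: int, length: int) -> List[float]:
    """Shift `signal` by `lag` samples, padding with zeros where no data exists."""
    return [signal[i + lag] if 0 <= i + lag < len(signal) else 0.0 for i in range(length)]


reference = [0.0, 0.0, 1.0, 2.0, 1.0, 0.0, 0.0, 0.0]
recorded = [0.0, 0.0, 0.0, 0.0, 1.0, 2.0, 1.0, 0.0]  # same pulse, two samples late
lag = estimate_delay(recorded, reference, max_lag=3)
print(lag, align(recorded, lag, len(reference)))
```

- The brute-force search costs on the order of `max_lag` times the signal length; a practical implementation would more likely compute the correlation with an FFT.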
- For this step, refer to step S102 of the abnormal sound detection method shown in FIG. 2 and step S205 of the abnormal sound detection method shown in FIG. 4.
- S305 specifically includes: removing the voice main-band energy from the residual signal to generate a residual signal with the voice main-band energy removed, where the frequency of the voice main-band energy is less than a first frequency value; and determining an energy value of the residual signal with the voice main-band energy removed.
- Determining the energy value of the residual signal with the voice main-band energy removed includes: determining, for the portion of that residual signal whose frequency is greater than a second frequency value, the energy value in each frame.
- The abnormal sound detecting device first needs to calculate the energy value of the residual signal. The voice main-band energy in the residual signal lies at low frequencies and is greater than the energy of the high-frequency abnormal part of the residual signal, so slight fluctuations in the voice main-band energy directly affect the judgment of the high-frequency noise energy in the residual signal. The voice main-band energy therefore needs to be filtered out of the residual signal. To do so, the abnormal sound detecting device first applies high-pass filtering to the residual signal, removing the voice main-band energy and producing a residual signal with the voice main-band energy removed. Because the frequency of the voice main-band energy in the residual signal is less than the first frequency value, it can be removed in this process.
- High-pass filtering is a filtering method.
- Its rule is that high-frequency signals pass through the high-pass filter normally, while low-frequency signals below the set threshold are blocked and attenuated by the high-pass filter, so that the high-pass filter outputs the high-frequency signal.
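- The behavior described above can be illustrated with a first-order RC high-pass filter. This is only a sketch: the text does not say which filter design is used, the name `high_pass` is hypothetical, and the 48 kHz sampling rate and 4 kHz cutoff in the usage lines are assumed values chosen for the demonstration.

```python
import math
from typing import List, Sequence


def high_pass(samples: Sequence[float], sample_rate_hz: float, cutoff_hz: float) -> List[float]:
    """First-order RC high-pass filter (illustrative choice, not from the text)."""
    if not samples:
        return []
    rc = 1.0 / (2.0 * math.pi * cutoff_hz)  # time constant for the cutoff
    dt = 1.0 / sample_rate_hz               # sampling interval
    alpha = rc / (rc + dt)
    out = [samples[0]]
    for n in range(1, len(samples)):
        # Output follows changes in the input and decays when the input is steady.
        out.append(alpha * (out[-1] + samples[n] - samples[n - 1]))
    return out


# Assumed settings: 48 kHz sampling, 4 kHz cutoff.
low = high_pass([1.0] * 200, 48000.0, 4000.0)                          # constant (0 Hz) input
high = high_pass([(-1.0) ** n for n in range(200)], 48000.0, 4000.0)   # 24 kHz input
print(abs(low[-1]), abs(high[-1]))
```

- A first-order filter attenuates low frequencies only gradually, so a design with a steeper roll-off would probably be needed to remove the voice main-band energy cleanly.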
- the sampling rate of the sampled speech signal is 8 kHz.
- the frequency of the main energy band of the speech in the sampled speech signal can be calculated to be below 4 kHz.
- the energy of the main frequency band of the speech is much stronger than the energy of the higher harmonics.
- the result of analyzing the speech spectrum of the speech reference signal is that the speech reference signal is very clean and the energy of the higher harmonics is hardly seen.
- the portion of the energy of the higher harmonics represents the portion of the abnormal signal in the speech signal.
- Analysis of a residual signal shows that the energy of its main-band portion is stronger than the energy of the higher harmonics. If the residual signal is not high-pass filtered, then in the frequency domain the energy of the higher harmonics accounts for only a small fraction of the total energy of the residual signal. Further, slight fluctuations or changes in the voice main-band energy may be larger than the fluctuations or changes caused by the higher harmonics, which seriously affects the judgment of whether higher harmonics are present in the residual signal and therefore of whether the residual signal has an abnormal sound.
- A high-pass filter can therefore be used to filter out the voice main-band energy whose frequency is less than the first frequency value. The energy remaining in the residual signal is then mainly the energy of the higher harmonics, that is, the energy of the abnormal sound signal.
- the first frequency value can be set to 4 kHz.
- the abnormal sound detecting means calculates the energy value for the residual signal from which the energy of the main band of the voice is removed.
- The abnormal sound detecting device can calculate, for the portion of the residual signal with the voice main-band energy removed whose frequency is greater than the second frequency value, the energy value in each frame.
- The energy value of the residual signal with the voice main-band energy removed is also called the out-of-band energy.
- If the high-pass-filtered residual signal contains no component whose frequency is less than the first frequency value, the time-domain energy of the high-pass-filtered residual signal can be calculated directly in the time domain, which yields the energy value of the residual signal with the voice main-band energy removed.
- If the high-pass-filtered residual signal still contains components whose frequency is lower than the first frequency value, the frequency-domain energy of the high-pass-filtered residual signal needs to be calculated in the frequency domain instead, ensuring that the energy of components whose frequency is less than the first frequency value is not counted.
- The abnormal sound detecting device performs the calculation on the portion of the residual signal with the voice main-band energy removed whose frequency is greater than the second frequency value. The second frequency value can be set equal to the first frequency value, or set greater than the first frequency value according to actual requirements. For that portion, the abnormal sound detecting device calculates the energy value E_thr n in each frame, that is, one energy value E_thr n is obtained per frame, where the energy value of a frame is the sum of the squares of the amplitude values of the points in the frame. The abnormal sound detecting device then fits the energy values E_thr n to an energy curve, which is compared with a preset energy curve.
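- The per-frame energy defined above, the sum of the squares of the amplitude values in a frame, can be sketched as follows. The frame length, the use of non-overlapping frames, and the dropping of an incomplete trailing frame are assumptions; the name `frame_energies` is hypothetical.

```python
from typing import List, Sequence


def frame_energies(samples: Sequence[float], frame_len: int) -> List[float]:
    """Energy of each frame: the sum of squared amplitudes in that frame.

    Assumption: frames do not overlap and a trailing partial frame is ignored.
    """
    if frame_len <= 0:
        raise ValueError("frame_len must be positive")
    energies = []
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        energies.append(sum(x * x for x in frame))
    return energies


print(frame_energies([1.0, 2.0, 3.0, 4.0, 5.0], frame_len=2))
```

- Each returned value corresponds to one energy value E_thr n in the text, and the list as a whole plays the role of the measured energy curve.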
- S306. Determine, according to the energy value, whether the first voice signal has an abnormal sound to determine whether the sound output device is abnormal.
- S306 specifically includes: when it is determined that, among the energy values of the frames, there is not a preset number of energy values that are each less than the first energy threshold corresponding to that energy value, determining that the first voice signal has an abnormal sound and that the sound output device is abnormal; and when it is determined that, among the energy values of the frames, there is a preset number of energy values that are each less than the first energy threshold corresponding to that energy value, determining that the first voice signal has no abnormal sound and that the sound output device is normal.
- The abnormal sound detecting device compares the energy curve obtained from the energy values E_thr n with a preset energy curve. For each energy value E_thr n there is a corresponding first energy threshold on the preset energy curve. If the abnormal sound detecting device determines that, among the energy values E_thr n, there is not a preset number of energy values that are each less than the corresponding first energy threshold, it may determine that the first voice signal has an abnormal sound.
- FIG. 6 is an energy curve diagram of still another abnormal sound detecting method provided by an embodiment of the present application.
- The measured energy curve of the first voice signal is obtained by the method provided in this embodiment. In FIG. 6, the solid curve is the measured energy curve and the dotted curve is the preset energy curve. The measured energy curve is compared with the preset energy curve to determine whether each energy value E_thr n on the measured energy curve is less than the corresponding first energy threshold on the preset energy curve. It can be seen from FIG. 6 that the energy values E_thr n on the measured energy curve are not all less than their corresponding first energy thresholds, so it can be determined that the first voice signal has an abnormal sound and that the sound output device that played the first voice signal is abnormal.
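- The curve comparison can be sketched as a per-frame threshold check. The interpretation used here is an assumption: the signal is treated as normal only when at least `preset_number` frame energies fall below their corresponding first energy thresholds. The name `abnormal_by_curve` and the sample values are hypothetical.

```python
from typing import Sequence


def abnormal_by_curve(
    energies: Sequence[float],
    first_thresholds: Sequence[float],
    preset_number: int,
) -> bool:
    """Return True when too few frame energies lie below the preset curve.

    Assumption: "a preset number" is read as "at least preset_number frames".
    """
    if len(energies) != len(first_thresholds):
        raise ValueError("one threshold is required per frame energy")
    below = sum(1 for e, t in zip(energies, first_thresholds) if e < t)
    return below < preset_number


preset_curve = [3.0, 3.0, 3.0, 3.0]
print(abnormal_by_curve([1.0, 2.0, 9.0, 1.0], preset_curve, preset_number=4))
print(abnormal_by_curve([1.0, 2.0, 2.0, 1.0], preset_curve, preset_number=4))
```

- Setting `preset_number` equal to the number of frames requires every frame to stay below the preset curve, which matches the FIG. 6 discussion where a curve that is not entirely below the preset curve is judged abnormal.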
- A first voice signal played by the sound output device of the terminal device is acquired, where the first voice signal is stored locally in the terminal device and includes audio information whose frequency changes irregularly; a second voice signal played by at least one other sound output device is acquired, where the other sound output device is a sound output device that plays sound normally and the voice content in the second voice signal is the same as the voice content in the first voice signal; signal superposition processing is performed on the second voice signals to generate a voice reference signal; the first voice signal is delay-aligned with the voice reference signal in the time domain to generate a first voice signal aligned with the voice reference signal; a residual signal is obtained according to the voice reference signal and the first voice signal, where the residual signal includes the portion of the first voice signal that differs from the voice reference signal; the voice main-band energy is removed from the residual signal to generate a residual signal with the voice main-band energy removed, where the frequency of the voice main-band energy is less than a first frequency value; for the portion of that residual signal whose frequency is greater than a second frequency value, the energy value in each frame is determined; and whether the first voice signal has an abnormal sound is determined according to the energy values, so as to determine whether the sound output device is abnormal.
- the speech signal itself represents the real frequency point to be detected, so the possibility of missing the problem frequency point is much smaller, and it is advantageous to detect the frequency point with abnormal sound.
- The signal to be detected is a voice signal stored locally in the terminal device and played by the sound output device, which avoids noise caused by packet loss or by noise mixed into the voice signal during transmission.
- The residual signal includes the portion of the first voice signal that differs from the voice reference signal, and the residual signal is then examined to determine whether there is an abnormal sound in the first voice signal. Because the first voice signal and the voice reference signal have the same voice content, detection is convenient, the detection method is versatile, and the accuracy of the detection result is improved.
- FIG. 7 is a schematic flowchart diagram of another method for detecting an abnormal sound according to an embodiment of the present application. As shown in FIG. 7, the method includes:
- For this step, refer to step S101 of the abnormal sound detection method shown in FIG. 2, step S202 of the abnormal sound detection method shown in FIG. 4, and step S301 of the abnormal sound detection method shown in FIG.
- S402. Acquire a second voice signal played by at least one other sound output device, where the other sound output device is a sound output device that plays a normal sound, and the voice content in the second voice signal is the same as the voice content in the first voice signal. And performing signal superposition processing on each of the second speech signals to generate a speech reference signal.
- For this step, refer to step S302 of the abnormal sound detection method shown in FIG. 5.
- For this step, refer to step S303 of the abnormal sound detection method shown in FIG. 5.
- S405 specifically includes: removing the voice main-band energy from the residual signal to generate a residual signal with the voice main-band energy removed, where the frequency of the voice main-band energy is less than the first frequency value; and determining the energy value of the residual signal with the voice main-band energy removed.
- Determining the energy value of the residual signal with the voice main-band energy removed includes: determining, for the portion of that residual signal whose frequency is greater than a second frequency value, the energy value in each frame; and determining an energy maximum, where the energy maximum is the largest of the energy values of the frames.
- The abnormal sound detecting device obtains the energy value E_thr n of each frame, and then takes the maximum of these per-frame energy values E_thr n to obtain the energy maximum.
- S406. Determine, according to the energy value, whether the first voice signal has an abnormal sound to determine whether the sound output device is abnormal.
- S406 specifically includes: when it is determined that the energy maximum is greater than or equal to a second energy threshold, determining that the first voice signal has an abnormal sound and that the sound output device is abnormal; and when it is determined that the energy maximum is less than the second energy threshold, determining that the first voice signal has no abnormal sound and that the sound output device is normal.
- The abnormal sound detecting device compares the obtained energy maximum with the second energy threshold. If the abnormal sound detecting device determines that the energy maximum is greater than or equal to the second energy threshold, it determines that the first voice signal has an abnormal sound and that the sound output device that played the first voice signal is abnormal; if it determines that the energy maximum is less than the second energy threshold, it determines that the first voice signal has no abnormal sound and that the sound output device that played the first voice signal is normal.
- The energy values E_thr n of the frames may also be averaged to obtain an energy mean; in S406, the abnormal sound detecting device then compares the energy mean with a third energy threshold.
- If the abnormal sound detecting device determines that the energy mean is greater than or equal to the third energy threshold, it determines that the first voice signal has an abnormal sound and that the sound output device that played the first voice signal is abnormal; if it determines that the energy mean is less than the third energy threshold, it determines that the first voice signal has no abnormal sound and that the sound output device that played the first voice signal is normal.
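- The two scalar decision rules, based on the energy maximum and on the energy mean, can be sketched together. The function names and the threshold values in the usage lines are hypothetical; only the comparisons themselves follow the text.

```python
from typing import Sequence


def abnormal_by_maximum(energies: Sequence[float], second_energy_threshold: float) -> bool:
    """Abnormal when the largest frame energy reaches the second energy threshold."""
    return max(energies) >= second_energy_threshold


def abnormal_by_mean(energies: Sequence[float], third_energy_threshold: float) -> bool:
    """Abnormal when the average frame energy reaches the third energy threshold."""
    return sum(energies) / len(energies) >= third_energy_threshold


energies = [0.1, 0.2, 0.9, 0.2]  # one frame carries a burst of out-of-band energy
print(abnormal_by_maximum(energies, 0.5))  # the burst alone triggers this rule
print(abnormal_by_mean(energies, 0.5))     # the average stays below the threshold
```

- The example shows a practical difference between the rules: the maximum reacts to a single short burst, while the mean responds only when out-of-band energy is sustained across frames.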
- A first voice signal played by the sound output device of the terminal device is acquired, where the first voice signal is stored locally in the terminal device and includes audio information whose frequency changes irregularly; a second voice signal played by at least one other sound output device is acquired, where the other sound output device is a sound output device that plays sound normally and the voice content in the second voice signal is the same as the voice content in the first voice signal; signal superposition processing is performed on the second voice signals to generate a voice reference signal; the first voice signal is delay-aligned with the voice reference signal in the time domain to generate a first voice signal aligned with the voice reference signal; a residual signal is obtained according to the voice reference signal and the first voice signal, where the residual signal includes the portion of the first voice signal that differs from the voice reference signal; the voice main-band energy is removed from the residual signal to generate a residual signal with the voice main-band energy removed, where the frequency of the voice main-band energy is less than a first frequency value; for the portion of that residual signal whose frequency is greater than a second frequency value, the energy value in each frame is determined; an energy maximum is determined, where the energy maximum is the largest of the energy values of the frames; and whether the first voice signal has an abnormal sound is determined according to the energy maximum, so as to determine whether the sound output device is abnormal.
- The voice signal represents a real usage scenario of the user, and throughout playback the voice signal repeatedly excites the frequency band in which speech is actually concentrated, which helps expose abnormalities at problem frequency points. Moreover, in this application the voice signal itself represents the real frequency points that need to be detected, so problem frequency points are much less likely to be missed, which helps detect the frequency points with abnormal sound.
- The signal to be detected is a voice signal stored locally in the terminal device and played by the sound output device, which avoids noise caused by packet loss or by noise mixed into the voice signal during transmission.
- The residual signal includes the portion of the first voice signal that differs from the voice reference signal, and the residual signal is then examined to determine whether there is an abnormal sound in the first voice signal. Because the first voice signal and the voice reference signal have the same voice content, detection is convenient, the detection method is versatile, and the accuracy of the detection result is improved.
- FIG. 8 is a schematic flowchart diagram of still another abnormal sound detecting method according to an embodiment of the present application. As shown in FIG. 8, the method includes:
- S505. Perform high-pass filtering on the residual signal to obtain a residual signal with the voice main-band energy removed.
- S506. Determine an energy value of the residual signal with the voice main-band energy removed.
- S508. Determine whether the energy value is greater than or equal to the energy threshold, so as to determine whether the first voice signal has an abnormal sound and whether the sound output device is abnormal.
- S5010. When the energy value is determined to be less than the energy threshold, determine that the sound output device is normal.
- For each step, refer to the steps of the abnormal sound detection method shown in FIG. 5 and the steps of the abnormal sound detection method shown in FIG. 7.
- The principle and effect are the same as those of the methods provided in the above embodiments.
- FIG. 9 is a schematic structural diagram of an abnormal sound detecting apparatus according to an embodiment of the present application. As shown in FIG. 9, the device includes:
- the acquiring unit 81 is configured to acquire a first voice signal that is played by the sound output device of the terminal device, where the first voice signal is locally stored in the terminal device, and the first voice signal includes audio information whose frequency changes randomly;
- the calculating unit 82 is configured to obtain a residual signal according to the pre-acquired voice reference signal and the first voice signal, where the residual signal includes a portion of the first voice signal that is different from the signal of the voice reference signal;
- the determining unit 83 is configured to determine, according to the residual signal, whether the first voice signal has an abnormal sound to determine whether the sound output device is abnormal.
- the acquiring unit 81 may perform step S101 of the method shown in FIG. 2, or the acquiring unit 81 may perform step S202 of the method shown in FIG. 4, or the acquiring unit 81 may perform step S301 of the method shown in FIG. Or the acquiring unit 81 can perform step S401 of the method shown in FIG.
- the computing unit 82 may perform step S102 of the method illustrated in FIG. 2, or the computing unit 82 may perform step S205 of the method illustrated in FIG. 4, or the computing unit 82 may perform step S304 of the method illustrated in FIG. 5, or the computing unit 82 may perform Step S404 of the method shown in FIG.
- the determining unit 83 may perform step S103 of the method illustrated in FIG. 2, or the determining unit 83 may perform step S205 of the method illustrated in FIG.
- the abnormal sound detecting device of the embodiment shown in FIG. 9 can be used to perform the technical solution of the embodiment shown in FIG. 2 to FIG. 4 in the above method, and the implementation principle and technical effects are similar, and details are not described herein again.
- FIG. 10 is a schematic structural diagram of still another abnormal sound detecting apparatus according to an embodiment of the present application.
- the determining unit 83 includes:
- the first determining module 831 is configured to determine an energy value of the residual signal.
- the first determining module 831 can perform step S305 of the method shown in FIG. 5, or the first determining module 831 can perform step S405 of the method shown in FIG.
- the second determining module 832 is configured to determine, according to the energy value, whether the first voice signal has an abnormal sound.
- the second determining module 832 can perform step S306 of the method shown in FIG. 5, or the second determining module 832 can perform step S406 of the method shown in FIG.
- the first determining module 831 includes:
- the removing submodule 8311 is configured to remove the voice main-band energy from the residual signal and generate a residual signal with the voice main-band energy removed, where the frequency of the voice main-band energy is less than the first frequency value.
- the removing submodule 8311 can perform the process of "removing the voice main-band energy from the residual signal and generating a residual signal with the voice main-band energy removed, where the frequency of the voice main-band energy is less than the first frequency value" in step S305 of the method shown in FIG. 5, or the corresponding process in step S405.
- the determining sub-module 8312 is configured to determine an energy value of the residual signal from which the speech main band energy is removed.
- the determining sub-module 8312 may perform the process of “determining the energy value of the residual signal with the voice mainband energy removed” in step S305 of the method shown in FIG. 5, or the determining sub-module 8312 may perform the process shown in FIG. 7.
- the determining submodule 8312 is specifically configured to:
- determine, for the portion of the residual signal with the voice main-band energy removed whose frequency is greater than the second frequency value, the energy value in each frame.
- the determination sub-module 8312 can perform the process of "determining the energy value of the residual signal from which the speech main band energy is removed" in step S305 of the method shown in FIG.
- the second determining module 832 is specifically configured to:
- the second determination module 832 can perform step S306 of the method shown in FIG.
- the determining sub-module 8312 is specifically configured to:
- the determination sub-module 8312 can perform the process of "determining the energy value of the residual signal from which the speech main band energy is removed" in step S405 of the method shown in FIG.
- the second determining module 832 is specifically configured to:
- the second determination module 832 can perform step S406 of the method shown in FIG.
- the device further includes:
- the generating unit 91 is configured to, before the calculating unit 82 obtains the residual signal according to the pre-acquired voice reference signal and the first voice signal, acquire a second voice signal played by at least one other sound output device, where the other sound output device is a sound output device that plays sound normally and the voice content in the second voice signal is the same as the voice content in the first voice signal; and perform signal superposition processing on the second voice signals to generate the voice reference signal.
- the generating unit 91 may perform step S302 of the method shown in FIG. 5, or the generating unit 91 may perform step S402 of the method shown in FIG.
- the aligning unit 92 is configured to: before the calculating unit 82 obtains the residual signal according to the pre-acquired voice reference signal and the first voice signal, perform time-delay alignment of the first voice signal with the voice reference signal in the time domain, to generate a first voice signal aligned with the voice reference signal.
- the aligning unit 92 can perform step S303 of the method shown in FIG. 5, or the aligning unit 92 can perform step S403 of the method shown in FIG.
- the abnormal sound detecting device of the embodiment shown in FIG. 10 can be used to perform the technical solution of the embodiment shown in FIG. 5 to FIG. 8 in the above method, and the implementation principle and technical effects are similar, and details are not described herein again.
- this embodiment does not depend on whether the embodiment shown in FIG. 9 is implemented, and the embodiment can be implemented independently.
- FIG. 11 is a schematic structural diagram of another abnormal sound detecting apparatus according to an embodiment of the present application.
- the network device includes a transmitter 261, a receiver 262, and a processor 263.
- the receiver 262 is configured to acquire a first voice signal played by a sound output device of the terminal device, where the first voice signal is locally stored in the terminal device, and the first voice signal includes audio information whose frequency changes irregularly.
- the processor 263 is configured to obtain a residual signal according to the pre-acquired voice reference signal and the first voice signal, where the residual signal includes the portion of the first voice signal that differs from the voice reference signal; and to determine, according to the residual signal, whether the first voice signal has an abnormal sound, so as to determine whether the sound output device is abnormal.
- the receiver 262 can implement the function of the obtaining unit 81 in the apparatus shown in FIG. 9, and further, the receiver 262 can perform step S101 of the method shown in FIG. 2, or the receiver 262 can perform the steps of the method shown in FIG. S202, or the receiver 262 may perform step S301 of the method illustrated in FIG. 5, or the receiver 262 may perform step S401 of the method illustrated in FIG.
- the processor 263 can implement the functions of the computing unit 82 and the determining unit 83 in the apparatus shown in FIG. 9, and further, the processor 263 can perform steps S102 and S103 of the method shown in FIG. 2, or the processor 263 can perform step S205 of the method shown in FIG.
- the processor 263 is specifically configured to determine an energy value of the residual signal, and determine, according to the energy value, whether the first voice signal has an abnormal sound. In this case, the processor 263 can implement the functions of the first determining module 831 and the second determining module 832 in the apparatus shown in FIG. 10, and further, the processor 263 can perform steps S305 and S306 of the method shown in FIG. 5, or the processor 263 can perform steps S405 and S406 of the method shown in FIG.
- the processor 263 is specifically configured to remove the voice main band energy from the residual signal to generate a residual signal with the voice main band energy removed, wherein the frequency of the voice main band energy is less than the first frequency value; and to determine the energy value of the residual signal with the voice main band energy removed.
- the processor 263 can implement the functions of the removal submodule 8311 and the determination submodule 8312 in the apparatus shown in FIG. 10, and further, the processor 263 can perform step S305 of the method shown in FIG. 5, or the processor 263 can perform step S405 of the method shown in FIG. 7.
- the processor 263 is specifically configured to: determine, for the portion of the residual signal with the voice main band energy removed whose frequency is greater than the second frequency value, the energy value in each frame; when determining that, among the energy values of the frames, there is not a preset number of energy values that are each less than the first energy threshold corresponding to that energy value, determine that the first voice signal has an abnormal sound and that the sound output device is abnormal; and when determining that, among the energy values of the frames, there is a preset number of energy values that are each less than the corresponding first energy threshold, determine that the first voice signal has no abnormal sound and that the sound output device is normal.
- the processor 263 can implement the functions of the determining sub-module 8312 and the second determining module 832 in the apparatus shown in FIG. 10, and further, the processor 263 can perform the process of "determining the energy value of the residual signal with the voice main band energy removed" in step S305 of the method shown in FIG. 5, and step S306 of the method shown in FIG.
- the processor 263 is specifically configured to: determine, for the portion of the residual signal with the voice main band energy removed whose frequency is greater than the second frequency value, the energy value in each frame; determine an energy maximum, where the energy maximum is the largest of the energy values of the frames; when determining that the energy maximum is greater than or equal to the second energy threshold, determine that the first voice signal has an abnormal sound and that the sound output device is abnormal; and when determining that the energy maximum is less than the second energy threshold, determine that the first voice signal has no abnormal sound and that the sound output device is normal.
- the processor 263 can implement the functions of the determining sub-module 8312 and the second determining module 832 in the apparatus shown in FIG. 10, and further, the processor 263 can perform the process of "determining the energy value of the residual signal with the voice main band energy removed" in step S405 of the method shown in FIG. 7, and step S406 of the method shown in FIG.
- the receiver 262 is further configured to acquire a second voice signal played by at least one other sound output device, wherein the other sound output device is a sound output device that plays sound normally, and the voice content in the second voice signal is the same as the voice content in the first voice signal.
- the receiver 262 can implement a part of the functions of the generating unit 91 in the apparatus shown in FIG. 10, and further, the receiver 262 can perform the process of "acquiring the second voice signal played by at least one other sound output device" in step S302 of the method shown in FIG. 5, or the receiver 262 can perform the same process in step S402 of the method shown in FIG.
- the processor 263 is further configured to perform signal superposition processing on the second voice signals to generate the voice reference signal.
- the processor 263 can implement a part of the functions of the generating unit 91 in the apparatus shown in FIG. 10, and further, the processor 263 can perform the process of "performing signal superposition processing on each second voice signal to generate a voice reference signal" in step S302 of the method shown in FIG. 5, or the processor 263 can perform the same process in step S402 of the method shown in FIG.
- the processor 263 is further configured to perform time-delay alignment of the first voice signal with the voice reference signal in the time domain, to generate a first voice signal aligned with the voice reference signal.
- the processor 263 can implement the function of the aligning unit 92 in the apparatus shown in FIG. 10, and further, the processor 263 can perform step S303 of the method shown in FIG. 5, or the processor 263 can perform step S403 of the method shown in FIG.
- the abnormal sound detecting device of the embodiment shown in FIG. 11 can be used to execute the technical solutions of the above method embodiments, or the programs of the modules of the embodiment shown in FIG. 10; the processor 263 calls the program to perform the operations of the above method embodiments, so as to implement the modules shown in FIG. 9 and FIG.
- the processor 263 may also be a controller, and is represented as "controller/processor 263" in FIG.
- the transmitter 261 and the receiver 262 are configured to support transmission and reception of information between the network device and the terminal device in the above embodiment, and to support radio communication between the terminal device and other terminal devices.
- the processor 263 performs various functions for communicating with the terminal device.
- the network device may further include a memory 264 for storing program codes and data of the network device.
- the processor 263, such as a central processing unit (CPU), may also be one or more integrated circuits configured to implement the above method, for example, one or more application-specific integrated circuits (ASICs), one or more digital signal processors (DSPs), or one or more field-programmable gate arrays (FPGAs).
- the memory 264 can be a single memory or a collective term for a plurality of storage elements.
- the transmitter 261 included in the abnormal sound detecting apparatus of FIG. 11 may perform the sending actions of the foregoing method embodiments, the processor 263 performs processing actions such as processing, determining, and acquiring, and the receiver 262 may perform the receiving actions.
- the receiver 262 included in the abnormal sound detecting device of Fig. 11 corresponds to the operation of acquiring a voice signal in the above-described method embodiment.
- the computer program product includes one or more computer instructions.
- the computer can be a general purpose computer, a special purpose computer, a computer network, or other programmable device.
- the computer instructions can be stored in a computer readable storage medium or transferred from one computer readable storage medium to another; for example, the computer instructions can be transmitted from one website, computer, server, or data center to another website, computer, server, or data center in a wired manner (e.g., coaxial cable, optical fiber, or digital subscriber line (DSL)) or a wireless manner (e.g., infrared, radio, or microwave).
- the computer readable storage medium can be any available media that can be accessed by a computer or a data storage device such as a server, data center, or the like that includes one or more available media.
- the usable medium may be a magnetic medium (e.g., a floppy disk, a hard disk, or a magnetic tape), an optical medium (e.g., a DVD), or a semiconductor medium (e.g., a solid state disk (SSD)).
- the functions described in the embodiments of the present application may be implemented in hardware, software, firmware, or any combination thereof.
- the functions may be stored in a computer readable medium or transmitted as one or more instructions or code on a computer readable medium.
- Computer readable media includes both computer storage media and communication media including any medium that facilitates transfer of a computer program from one location to another.
- a storage medium may be any available media that can be accessed by a general purpose or special purpose computer.
Landscapes
- Health & Medical Sciences (AREA)
- General Health & Medical Sciences (AREA)
- Otolaryngology (AREA)
- Physics & Mathematics (AREA)
- Engineering & Computer Science (AREA)
- Acoustics & Sound (AREA)
- Signal Processing (AREA)
- Telephone Function (AREA)
Abstract
An abnormal sound detection method and apparatus (02). The method comprises: acquiring a first speech signal played by a sound output device (03) of a terminal device (01), the first speech signal being locally stored in the terminal device (01), and the first speech signal comprising audio information with an irregularly changing frequency (101, 301, 401, 501); obtaining a residual signal according to a reference speech signal acquired in advance and the first speech signal, the residual signal comprising a part of the first speech signal that is different from the reference speech signal (102, 304, 404); and determining, according to the residual signal, whether there is an abnormal sound in the first speech signal so as to determine whether the sound output device is abnormal (103). The speech signal represents a real use scenario of a user, and frequency points are triggered repeatedly together in an actual frequency band of the speech during an entire playing process of the speech signal, facilitating discovery of a problematic frequency point. The speech signal itself represents actual frequency points to undergo detection, and the probability of missing a problematic frequency point is greatly reduced. The detection method is convenient and universal, and has an accurate detection result.
Description
Cross-reference to related applications
This application claims priority to Chinese Patent Application No. 201710045605.6, filed with the Chinese Patent Office on January 20, 2017 and entitled "Voice excitation method and terminal", which is incorporated herein by reference in its entirety.
The present application relates to the field of terminal technologies, and in particular, to an abnormal sound detection method and apparatus.
With the development of terminal technologies, terminals of various types, such as smartphones, computers, earphones, and smart watches, have been widely used in people's lives. A terminal is generally provided with a sound output device, for example a speaker or a receiver, and the terminal uses the sound output device to play audio signals. Because of design defects, assembly flaws, intrusion of foreign matter, and other causes, a sound output device may produce an abnormal sound when playing an audio signal. Therefore, before a terminal is sold, the sound output device on the terminal needs to be tested to detect whether an abnormal sound occurs when it plays an audio signal.
In the prior art, the sound output device under test plays a frequency sweep signal, a detection system records the sweep signal played by that device, the high-order harmonic distortion energy of each frequency band of the sweep signal is calculated, and it is then determined whether the high-order harmonic distortion energy of each band exceeds the energy threshold of that band. When the high-order harmonic distortion energy of any one band exceeds the energy threshold of that band, or when the high-order harmonic distortion energy of multiple bands exceeds the energy thresholds of the respective bands, it can be determined that the sound output device under test has an abnormal sound, and hence that the device is abnormal.
However, in the prior art, a sweep signal changes monotonically within a frequency band, from high frequency to low or from low to high, and each frequency point in the sweep lasts only a short time. The sweep may therefore move on to the next frequency point before a given frequency point has excited noticeable high-order harmonic energy, so a problem at that frequency point goes undetected. Moreover, when the sound output device is actually in use, it is unlikely to play only a signal as simple as a sweep. As a result, the prior art may fail to accurately detect abnormal sound in the sweep signal played by the sound output device under test, and therefore cannot accurately determine whether the device is abnormal; the existing detection method is inaccurate.
Summary of the invention
The present application provides an abnormal sound detection method and apparatus, to solve the problem in the prior art that detecting whether a sound output device under test produces an abnormal sound when playing an audio signal is inaccurate, so that whether the device is abnormal cannot be accurately determined.
In a first aspect, the present application provides an abnormal sound detection method, including: acquiring a first voice signal played by a sound output device of a terminal device, where the first voice signal is stored locally in the terminal device and includes audio information whose frequency changes irregularly; obtaining a residual signal according to a pre-acquired voice reference signal and the first voice signal, where the residual signal includes the portion of the first voice signal that differs from the voice reference signal; and determining, according to the residual signal, whether the first voice signal has an abnormal sound, thereby determining whether the sound output device is abnormal.
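The residual step can be illustrated with a minimal Python sketch. The text does not say how the residual is computed, so the sample-wise time-domain subtraction below is an assumption, as are the function name `residual_signal` and the toy sample values; both signals are assumed to be already aligned and sampled at the same rate.

```python
def residual_signal(first, reference):
    """Sample-wise difference between the recorded signal and the reference.

    Assumption: plain time-domain subtraction of aligned signals; the patent
    text only says the residual is the part of the first voice signal that
    differs from the voice reference signal.
    """
    n = min(len(first), len(reference))  # compare only the overlapping part
    return [first[i] - reference[i] for i in range(n)]

# Hypothetical toy data: the recording deviates from the reference at one sample.
reference = [0.0, 0.5, 1.0, 0.5, 0.0]
recorded = [0.0, 0.5, 1.2, 0.5, 0.0]
residual = residual_signal(recorded, reference)
```

In this sketch the residual is zero wherever the device under test reproduces the reference, so any remaining energy points at a deviation.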
In a possible design, determining, according to the residual signal, whether the first voice signal has an abnormal sound includes: determining an energy value of the residual signal; and determining, according to the calculated energy value, whether the first voice signal has an abnormal sound.
In a possible design, determining the energy value of the residual signal includes: removing the voice main band energy from the residual signal to obtain a residual signal with the voice main band energy removed, where the removal is configured so that the frequency of the removed voice main band energy is less than a first frequency value; and then determining the energy value of the residual signal with the voice main band energy removed.
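One way to picture this design is the Python sketch below. It is only an illustration under stated assumptions: the text does not specify how the main band is removed, so the sketch uses a naive discrete Fourier transform and simply ignores bins below the cutoff. The names `dft` and `energy_without_main_band`, the 2000 Hz cutoff, and the synthetic residual are all hypothetical.

```python
import cmath
import math

def dft(x):
    """Naive discrete Fourier transform, O(n^2); adequate for a short sketch."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
            for k in range(n)]

def energy_without_main_band(residual, sample_rate, first_freq_hz):
    """Sum of squared spectral magnitudes after dropping bins below first_freq_hz."""
    n = len(residual)
    spectrum = dft(residual)
    energy = 0.0
    for k in range(n // 2 + 1):          # one-sided spectrum of a real signal
        freq = k * sample_rate / n
        if freq >= first_freq_hz:        # bins below the cutoff are treated as main band
            energy += abs(spectrum[k]) ** 2
    return energy

# Hypothetical residual: a 1 kHz component plus a weaker 3 kHz component.
sample_rate = 8000
residual = [math.sin(2 * math.pi * 1000 * t / sample_rate)
            + 0.5 * math.sin(2 * math.pi * 3000 * t / sample_rate)
            for t in range(64)]
high_energy = energy_without_main_band(residual, sample_rate, first_freq_hz=2000)
```

With the cutoff at 2000 Hz, only the 3 kHz component contributes to `high_energy`; a production implementation would more likely use a filter or an FFT library, which the text leaves open.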
In a possible design, determining the energy value of the residual signal with the voice main band energy removed includes: determining the portion of that residual signal whose frequency is greater than a second frequency value, and then calculating the energy value of that portion in each frame. Correspondingly, determining, according to the energy values, whether the first voice signal has an abnormal sound includes the following process:
determining whether, among the per-frame energy values, there is a preset number of energy values that are each less than the first energy threshold corresponding to that energy value;
if it is determined that, among the per-frame energy values, there is not a preset number of energy values that are each less than the corresponding first energy threshold, determining that the first voice signal has an abnormal sound and that the sound output device is abnormal;
if it is determined that, among the per-frame energy values, there is a preset number of energy values that are each less than the corresponding first energy threshold, determining that the first voice signal has no abnormal sound and that the sound output device is normal.
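The per-frame decision rule above can be sketched in Python as follows. This is a hedged illustration, not the patent's implementation: the frame energy is taken as the sum of squared samples of an already filtered residual, "a preset number" is read as "at least `preset_count` frames", and the names `frame_energies`, `has_abnormal_sound`, and all numeric values are hypothetical.

```python
def frame_energies(signal, frame_len):
    """Sum of squared samples in each complete, non-overlapping frame."""
    count = len(signal) // frame_len
    return [sum(s * s for s in signal[i * frame_len:(i + 1) * frame_len])
            for i in range(count)]

def has_abnormal_sound(energies, thresholds, preset_count):
    """True when fewer than preset_count frames stay below their own threshold."""
    below = sum(1 for e, th in zip(energies, thresholds) if e < th)
    return below < preset_count

# Hypothetical filtered residual with one loud frame (the third).
energies = frame_energies([0.1, 0.1, 0.1, 0.1, 2.0, 2.0, 0.1, 0.1], frame_len=2)
thresholds = [1.0] * len(energies)       # one first energy threshold per frame
abnormal = has_abnormal_sound(energies, thresholds, preset_count=4)
```

Here three of the four frames stay below their thresholds, so requiring four quiet frames flags the signal as abnormal, while requiring only three would not.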
Alternatively, in a possible design, determining the energy value of the residual signal with the voice main band energy removed includes: determining the portion of that residual signal whose frequency is greater than a second frequency value, calculating the energy value of that portion in each frame, and then calculating an energy maximum, which is the largest of the per-frame energy values. Correspondingly, determining, according to the energy values, whether the first voice signal has an abnormal sound includes the following process:
determining whether the energy maximum is greater than or equal to a second energy threshold;
if it is determined that the energy maximum is greater than or equal to the second energy threshold, determining that the first voice signal has an abnormal sound and that the sound output device is abnormal;
if it is determined that the energy maximum is less than the second energy threshold, determining that the first voice signal has no abnormal sound and that the sound output device is normal.
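The maximum-energy variant reduces to a single comparison, sketched below in Python. The function name `has_abnormal_sound_by_peak` and the sample values are hypothetical; the per-frame energies are assumed to have been computed already for the portion above the second frequency value.

```python
def has_abnormal_sound_by_peak(frame_energies, second_threshold):
    """True when the largest per-frame energy reaches the second energy threshold."""
    energy_max = max(frame_energies)     # the energy maximum over all frames
    return energy_max >= second_threshold

# Hypothetical per-frame energies with one outlier frame.
peak_energies = [0.02, 0.02, 8.0, 0.02]
abnormal_by_peak = has_abnormal_sound_by_peak(peak_energies, second_threshold=5.0)
```

Compared with the counting rule, this variant reacts to a single loud frame, which may suit short, isolated rattles.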
In a possible design, before the residual signal is obtained according to the pre-acquired voice reference signal and the first voice signal, the method further includes: acquiring a second voice signal played by each of at least one other sound output device, where each other sound output device is one that plays sound normally, and the voice content of the second voice signal is the same as that of the first voice signal; and then performing signal superposition processing on the second voice signals to generate the voice reference signal.
In a possible design, before the residual signal is obtained according to the pre-acquired voice reference signal and the first voice signal, the method further includes: performing time-delay alignment of the first voice signal with the voice reference signal in the time domain, to generate a first voice signal aligned with the voice reference signal.
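Generating the reference from known-good devices might look like the Python sketch below. The text only says the second voice signals undergo "signal superposition processing"; dividing the sample-wise sum by the number of recordings (that is, averaging) is an assumption made here to keep the reference at a comparable amplitude, and `build_reference` and the sample values are hypothetical.

```python
def build_reference(recordings):
    """Superpose recordings from known-good devices into one reference signal.

    Assumption: the superposition is a sample-wise sum normalised by the
    number of recordings; the patent text does not state the normalisation.
    """
    n = min(len(r) for r in recordings)  # use only the common length
    count = len(recordings)
    return [sum(r[i] for r in recordings) / count for i in range(n)]

# Hypothetical recordings of the same voice content from two good devices.
good_recordings = [[1.0, 2.0, 3.0], [3.0, 2.0, 1.0]]
voice_reference = build_reference(good_recordings)
```

Averaging several good devices tends to smooth out device-specific noise, which is presumably why more than one reference device may be used.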
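Time-delay alignment can be sketched in Python with a brute-force cross-correlation search. The text does not name the alignment technique, so cross-correlation is an assumption, as are the names `best_lag` and `align`, the `max_lag` search window, and the toy signals.

```python
def best_lag(signal, reference, max_lag):
    """Lag in samples at which `signal` correlates best with `reference`."""
    def corr(lag):
        total = 0.0
        for i, r in enumerate(reference):
            j = i + lag
            if 0 <= j < len(signal):     # skip samples outside the recording
                total += signal[j] * r
        return total
    return max(range(-max_lag, max_lag + 1), key=corr)

def align(signal, reference, max_lag):
    """Shift `signal` so that it lines up with `reference` in the time domain."""
    lag = best_lag(signal, reference, max_lag)
    if lag >= 0:
        aligned = signal[lag:]           # recording starts late: drop the lead-in
    else:
        aligned = [0.0] * (-lag) + signal  # recording starts early: pad with silence
    return aligned[:len(reference)]

# Hypothetical recording that lags the reference by two samples.
ref = [0.0, 1.0, 2.0, 1.0, 0.0, 0.0]
rec = [0.0, 0.0, 0.0, 1.0, 2.0, 1.0, 0.0, 0.0]
aligned = align(rec, ref, max_lag=3)
```

Alignment matters because a residual computed from unaligned signals would be dominated by the time offset rather than by any abnormal sound.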
The technical solutions provided by the embodiments of the present disclosure may have the following beneficial effects. A first voice signal played by a sound output device of a terminal device is acquired, where the first voice signal is stored locally in the terminal device and includes audio information whose frequency changes irregularly; a residual signal is obtained according to a pre-acquired voice reference signal and the first voice signal, where the residual signal is the portion of the first voice signal that differs from the voice reference signal; and whether the first voice signal has an abnormal sound is determined according to the residual signal, so as to determine whether the sound output device is abnormal. This provides a way to detect whether an abnormal sound occurs when a sound output device plays audio, and thus whether the device is abnormal. Because the signal under test is a voice signal, it can represent the user's real usage scenario: throughout playback, frequency points are excited repeatedly and are concentrated in the actual frequency band of speech, which makes it easier to expose anomalies at problematic frequency points. In addition, the voice signal itself represents the real frequency points that need to be tested, so a problematic frequency point is much less likely to be missed, which helps to detect frequency points that exhibit abnormal sound. Furthermore, the signal under test is a voice signal stored locally in the terminal device and played by the sound output device, which avoids abnormal sound caused by packet loss or added noise during transmission of the voice signal. Finally, detection is performed on the portion of the first voice signal that differs from the voice reference signal, and the first voice signal has the same voice content as the voice reference signal; the detection is therefore convenient and broadly applicable, and the accuracy of the detection result is improved.
In a second aspect, the present application provides an abnormal sound detection apparatus, including:
an acquiring unit, configured to acquire a first voice signal played by a sound output device of a terminal device, where the first voice signal is stored locally in the terminal device and includes audio information whose frequency changes irregularly;
a calculating unit, configured to obtain a residual signal according to a pre-acquired voice reference signal and the first voice signal, where the residual signal includes the portion of the first voice signal that differs from the voice reference signal; and
a determining unit, configured to determine, according to the residual signal, whether the first voice signal has an abnormal sound, thereby determining whether the sound output device is abnormal.
In a possible design, the determining unit includes: a first determining module, configured to determine an energy value of the residual signal; and a second determining module, configured to determine, according to the calculated energy value, whether the first voice signal has an abnormal sound.
In a possible design, the first determining module includes:
a removing submodule, configured to remove the voice main band energy from the residual signal to obtain a residual signal with the voice main band energy removed, where the removal is configured so that the frequency of the removed voice main band energy is less than a first frequency value; and
a determining submodule, configured to determine the energy value of the residual signal with the voice main band energy removed.
In a possible design, the determining submodule is specifically configured to: determine the portion of the residual signal with the voice main band energy removed whose frequency is greater than a second frequency value, and then calculate the energy value of that portion in each frame. Correspondingly, the second determining module is specifically configured to:
determine whether, among the per-frame energy values, there is a preset number of energy values that are each less than the first energy threshold corresponding to that energy value;
if it is determined that, among the per-frame energy values, there is not a preset number of energy values that are each less than the corresponding first energy threshold, determine that the first voice signal has an abnormal sound and that the sound output device is abnormal;
if it is determined that, among the per-frame energy values, there is a preset number of energy values that are each less than the corresponding first energy threshold, determine that the first voice signal has no abnormal sound and that the sound output device is normal.
Alternatively, in a possible design, the determining submodule is specifically configured to: determine the portion of the residual signal with the voice main band energy removed whose frequency is greater than a second frequency value, calculate the energy value of that portion in each frame, and then calculate an energy maximum, which is the largest of the per-frame energy values. Correspondingly, the second determining module is specifically configured to:
determine whether the energy maximum is greater than or equal to a second energy threshold;
if it is determined that the energy maximum is greater than or equal to the second energy threshold, determine that the first voice signal has an abnormal sound and that the sound output device is abnormal;
if it is determined that the energy maximum is less than the second energy threshold, determine that the first voice signal has no abnormal sound and that the sound output device is normal.
在一种可能的设计中,装置,还包括:In one possible design, the device further includes:
生成单元,用于在计算单元根据预先获取的语音参考信号、以及第一语音信号,得到一个残差信号之前,获取至少一个其他声音输出器件所播放的第二语音信号,各该其他声音输出器件为播放声音正常的声音输出器件,第二语音信号中的语音内容与第一语音信号中的语音内容是相同的;然后再将各第二语音信号进行信号叠加处理,生成上述语音参考信号。a generating unit, configured to acquire, by the computing unit, a second voice signal played by at least one other sound output device before obtaining a residual signal according to the pre-acquired voice reference signal and the first voice signal, each of the other sound output devices In order to play the sound output device with normal sound, the voice content in the second voice signal is the same as the voice content in the first voice signal; then the second voice signal is subjected to signal superposition processing to generate the voice reference signal.
In a possible design, the apparatus further includes:
an aligning unit, configured to: before the calculating unit obtains a residual signal according to the pre-acquired voice reference signal and the first voice signal, perform delay alignment processing on the first voice signal and the voice reference signal in the time domain, to generate a first voice signal that is aligned with the voice reference signal.
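The text does not say how the delay is estimated. A common technique, used here purely as an assumed illustration, is to pick the lag that maximizes the cross-correlation between the recording and the reference, then shift the recording by that lag. The names `estimate_delay` and `align_to_reference` and the `max_lag` search bound are hypothetical.

```python
def estimate_delay(recorded, reference, max_lag):
    """Return the lag, in samples, by which `recorded` trails `reference`.

    Brute-force cross-correlation over lags in [-max_lag, max_lag]; a
    negative result means the recording starts later in the content.
    """
    best_lag, best_score = 0, float("-inf")
    for lag in range(-max_lag, max_lag + 1):
        score = 0.0
        for i, ref_sample in enumerate(reference):
            j = i + lag
            if 0 <= j < len(recorded):
                score += ref_sample * recorded[j]
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag


def align_to_reference(recorded, reference, max_lag):
    """Shift `recorded` so it lines up with `reference`, same length as it."""
    lag = estimate_delay(recorded, reference, max_lag)
    if lag >= 0:
        shifted = list(recorded[lag:])            # drop the leading delay
    else:
        shifted = [0.0] * (-lag) + list(recorded)  # pad the missing start
    shifted = shifted[:len(reference)]
    return shifted + [0.0] * (len(reference) - len(shifted))
```

For long recordings an FFT-based correlation would be much faster; the brute-force loop is kept only because it makes the lag convention easy to follow.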
The technical solutions provided by the embodiments of the present disclosure may include the following beneficial effects. A first voice signal played by a sound output device of a terminal device is acquired, where the first voice signal is stored locally in the terminal device and includes audio information whose frequency changes irregularly. A residual signal is obtained according to a pre-acquired voice reference signal and the first voice signal, where the residual signal is the portion of the first voice signal that differs from the voice reference signal. Whether the first voice signal has an abnormal sound is determined according to the residual signal, so as to determine whether the sound output device is abnormal. This provides a way to detect whether an abnormal sound occurs when the sound output device plays audio, and thereby to determine whether the sound output device is abnormal. Because the signal under test is a voice signal, it represents the user's real usage scenario; throughout playback, the voice signal repeatedly excites frequencies concentrated in the actual voice band, which helps expose abnormalities at problematic frequency points. Moreover, the voice signal itself represents the real frequency points that need to be tested, so problematic frequency points are much less likely to be missed, which helps detect the frequency points that have an abnormal sound. In addition, the signal under test is a voice signal stored locally in the terminal device and played by the sound output device, which avoids abnormal sounds caused by packet loss or added noise during transmission. Finally, detection is performed on the portion of the first voice signal that differs from the voice reference signal to determine whether an abnormal sound is present, and the first voice signal and the voice reference signal have the same voice content; the detection is therefore convenient and broadly applicable, and the accuracy of the detection result is improved.
In a third aspect, this application provides a computer program that, when executed by a processor, is used to perform the method of the first aspect.
In a fourth aspect, this application provides a program product, for example a computer-readable storage medium, that includes the program of the third aspect.
In a fifth aspect, a computer program product containing instructions is provided; when the instructions are run on a computer, the computer performs the methods of the foregoing aspects.
It can be seen that, in each of the third, fourth, and fifth aspects above, a first voice signal played by a sound output device of a terminal device is acquired, where the first voice signal is stored locally in the terminal device and includes audio information whose frequency changes irregularly. A residual signal is obtained according to a pre-acquired voice reference signal and the first voice signal, where the residual signal is the portion of the first voice signal that differs from the voice reference signal. Whether the first voice signal has an abnormal sound is determined according to the residual signal, so as to determine whether the sound output device is abnormal. This provides a way to detect whether an abnormal sound occurs when the sound output device plays audio, and thereby to determine whether the sound output device is abnormal. Because the signal under test is a voice signal, it represents the user's real usage scenario; throughout playback, the voice signal repeatedly excites frequencies concentrated in the actual voice band, which helps expose abnormalities at problematic frequency points. Moreover, the voice signal itself represents the real frequency points that need to be tested, so problematic frequency points are much less likely to be missed, which helps detect the frequency points that have an abnormal sound. In addition, the signal under test is a voice signal stored locally in the terminal device and played by the sound output device, which avoids abnormal sounds caused by packet loss or added noise during transmission. Finally, detection is performed on the portion of the first voice signal that differs from the voice reference signal to determine whether an abnormal sound is present, and the first voice signal and the voice reference signal have the same voice content; the detection is therefore convenient and broadly applicable, and the accuracy of the detection result is improved.
FIG. 1 is schematic diagram 1 of an application scenario according to an embodiment of this application;
FIG. 2 is schematic flowchart 1 of an abnormal sound detection method according to an embodiment of this application;
FIG. 3 is a schematic diagram of the adaptive filtering method used in an abnormal sound detection method according to an embodiment of this application;
FIG. 4 is schematic flowchart 2 of an abnormal sound detection method according to an embodiment of this application;
FIG. 5 is a schematic flowchart of still another abnormal sound detection method according to an embodiment of this application;
FIG. 6 is an energy curve diagram for still another abnormal sound detection method according to an embodiment of this application;
FIG. 7 is a schematic flowchart of another abnormal sound detection method according to an embodiment of this application;
FIG. 8 is a schematic flowchart of yet another abnormal sound detection method according to an embodiment of this application;
FIG. 9 is a schematic structural diagram of an abnormal sound detection apparatus according to an embodiment of this application;
FIG. 10 is a schematic structural diagram of still another abnormal sound detection apparatus according to an embodiment of this application;
FIG. 11 is a schematic structural diagram of another abnormal sound detection apparatus according to an embodiment of this application.
The embodiments of this application are applied to an abnormal sound detection apparatus, an audio detection system, or any system capable of carrying out the embodiments of this application. Some terms used in this application are explained below for ease of understanding by a person skilled in the art. It should be noted that when the solutions of the embodiments are applied to an audio detection system or to any such system, the names of the audio detection system and the abnormal sound detection apparatus may change, but this does not affect the implementation of the solutions.
1) A terminal device, also called a terminal or user equipment, is a device that provides voice and/or data connectivity to a user, for example a handheld device or an in-vehicle device with a wireless connection function. Common terminal devices include mobile phones, tablet computers, notebook computers, palmtop computers, mobile internet devices (MID), and wearable devices; wearable devices include, for example, smart watches, smart wristbands, and pedometers.
2) A sound output device is a device that can play an audio signal, for example a speaker or a receiver; the sound output device may be disposed on a terminal device.
3) "A plurality of" means two or more; other quantifiers are similar. "And/or" describes an association between associated objects and indicates that three relationships are possible; for example, "A and/or B" may mean that A exists alone, that both A and B exist, or that B exists alone. The character "/" generally indicates an "or" relationship between the objects before and after it.
FIG. 1 is schematic diagram 1 of an application scenario according to an embodiment of this application. As shown in FIG. 1, the embodiments of this application use a terminal device 01 and an abnormal sound detection apparatus 02. The terminal device 01 is provided with a sound output device 03, which can play audio signals. As shown in FIG. 1, the sound output device 03 on the terminal device 01 plays an audio signal, the abnormal sound detection apparatus 02 acquires the audio signal played by the sound output device 03 on the terminal device 01, and the abnormal sound detection apparatus 02 then carries out the solutions of the embodiments of this application.
The terminal device in the embodiments of this application may be an access terminal, a user terminal, a terminal, a wireless communication device, a user agent, a user apparatus, or the like. Examples of user terminals include smartphones, smart watches, and personal computers.
The sound output device in the embodiments of this application may be a speaker, a receiver, or the like, and may be disposed on the terminal device in the embodiments of this application.
FIG. 2 is schematic flowchart 1 of an abnormal sound detection method according to an embodiment of this application. As shown in FIG. 2, the method includes:
S101. Acquire a first voice signal played by a sound output device of a terminal device, where the first voice signal is stored locally in the terminal device and includes audio information whose frequency changes irregularly.
In this embodiment, the method is described with the abnormal sound detection apparatus as the executing entity. The sound output device of the terminal device plays the first voice signal, and the abnormal sound detection apparatus acquires the first voice signal played by the sound output device.
In this application, the abnormal sound detection apparatus acquires the first voice signal as follows: a voice recording is stored in the terminal device in advance, so the sound output device of the terminal device can play the first voice signal from that locally stored recording; the abnormal sound detection apparatus then records the first voice signal.
In this application, the first voice signal may be the female-voice recording, at 112, of the prompt "Emergency center, please dial 120". For example, the sound output device plays the prompt "Emergency center, please dial 120" stored locally in the terminal device. A female voice may be used because, compared with a male voice, it has a higher fundamental frequency and covers a wider frequency range, and the distribution of its frequency energy over time is more varied.
Compared with the prior art, a frequency sweep signal and a voice signal differ considerably. First, the signal under test in the prior art is a frequency sweep signal, in which the frequency changes monotonically within a band, either from high to low or from low to high, and each frequency point lasts only a short time. The sweep may therefore move on to the next frequency point before the current one has excited noticeable higher-order harmonic energy, so a problem at that frequency point goes undetected. This application instead uses a voice signal as the signal under test, because a voice signal represents the user's real usage scenario. The first voice signal played by the sound output device contains audio information whose frequency changes irregularly: the duration of each frequency point varies, and the frequency changes are themselves varied. Throughout playback, the first voice signal repeatedly excites frequencies concentrated in the actual voice band, which helps expose abnormalities at problematic frequency points. Second, abnormal-sound distortion usually occurs at individual, very narrow resonance frequency points. When a frequency sweep signal is used as the signal under test, its frequency points form a discrete stepped sweep and are not continuous, so the truly problematic frequency point is likely to be skipped during the sweep. In this application, the voice signal itself represents the real frequency points that need to be tested, so problematic frequency points are much less likely to be missed, which helps detect the frequency points that have an abnormal sound.
S102. Obtain a residual signal according to a pre-acquired voice reference signal and the first voice signal, where the residual signal includes the portion of the first voice signal that differs from the voice reference signal.
In this embodiment, the abnormal sound detection apparatus has acquired the voice reference signal in advance, and the voice content of the voice reference signal is the same as the voice content of the first voice signal. For example, if the voice content of the first voice signal is "Hello, please dial 00", the voice content of the voice reference signal is also "Hello, please dial 00".
The abnormal sound detection apparatus uses the voice reference signal to perform adaptive filtering on the first voice signal under test, removing the portion of the first voice signal that is consistent with the voice reference signal and retaining the portion that differs from it; the retained portion is the residual signal. Alternatively, the apparatus may use another filtering method to filter the first voice signal under test according to the voice reference signal and obtain the residual signal.
The residual signal contains the portion of the first voice signal that differs from the voice reference signal; it may also contain some signal information of the first voice signal, or some signal information of the voice reference signal.
For example, consider obtaining the residual signal by adaptive filtering. FIG. 3 is a schematic diagram of the adaptive filtering method used in an abnormal sound detection method according to an embodiment of this application. As shown in FIG. 3, in the context of this application, x is the first voice signal, d is the voice reference signal, and e is the residual signal. The idea of adaptive filtering is to continually adjust the filter, under some criterion on e, so that the filtered value of x (that is, y) approaches the value of the voice reference signal d. Specifically, x(j) is the value of the input first voice signal at time j, y(j) is the value of the filtered first voice signal output at time j, d(j) is the value of the voice reference signal at time j, and the residual signal e(j) is the difference between d(j) and y(j), that is, e(j) = d(j) - y(j). The filter parameters of the adaptive filter are controlled by the residual signal: they are adjusted automatically according to the value of e(j), so that the value of y output at the next time is closer to the value of the desired voice reference signal d.
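The x, y, d, e relationship described for FIG. 3 can be illustrated with a short adaptive filter. The text does not name a specific adaptation rule; the sketch below uses the normalized least-mean-squares (NLMS) update as one common choice, and the function name, tap count, and step size are assumptions made for illustration only.

```python
def nlms_residual(x, d, num_taps=8, step=0.5, eps=1e-8):
    """Return the residual e(j) = d(j) - y(j) of an NLMS adaptive filter.

    x is the first voice signal (filter input) and d is the voice reference
    signal (desired output), following the notation of FIG. 3. NLMS is an
    assumed adaptation rule, not one stated in the source.
    """
    weights = [0.0] * num_taps
    residual = []
    for j in range(len(x)):
        # Most recent num_taps input samples, newest first, zero-padded.
        window = [x[j - k] if j - k >= 0 else 0.0 for k in range(num_taps)]
        y = sum(w * s for w, s in zip(weights, window))  # filtered x, y(j)
        e = d[j] - y                                     # residual e(j)
        # The residual drives the parameter update for the next time step.
        norm = sum(s * s for s in window) + eps
        weights = [w + step * e * s / norm for w, s in zip(weights, window)]
        residual.append(e)
    return residual
```

When x and d carry the same content, the residual decays toward zero once the filter has adapted; content present in only one of the two signals cannot be cancelled and remains in e, which is what the later detection step inspects.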
S103. Determine, according to the residual signal, whether the first voice signal has an abnormal sound, so as to determine whether the sound output device is abnormal.
In this embodiment, the abnormal sound detection apparatus analyzes whether the obtained residual signal contains an abnormal signal, and thereby determines whether the first voice signal has an abnormal sound. When the first voice signal is determined to have an abnormal sound, the sound output device is determined to be abnormal; when the first voice signal is determined to have no abnormal sound, the sound output device is determined to be normal.
FIG. 4 is schematic flowchart 2 of an abnormal sound detection method according to an embodiment of this application. As shown in FIG. 4, the procedure includes:
S201. The abnormal sound detection apparatus starts its recording function.
In this embodiment, the abnormal sound detection apparatus starts its own recording function.
S202. The sound output device of the terminal device plays a first voice signal, and the abnormal sound detection apparatus acquires the first voice signal played by the sound output device of the terminal device, where the first voice signal is stored locally in the terminal device.
In this embodiment, a voice recording is stored in the terminal device in advance, so the sound output device of the terminal device can play the first voice signal from that locally stored recording; the abnormal sound detection apparatus then records the first voice signal. For this step, refer to step S101 in FIG. 2; the principle and procedure are the same as those of step S101.
S203. The abnormal sound detection apparatus saves the first voice signal.
In this embodiment, the abnormal sound detection apparatus saves the recorded first voice signal.
S204. The abnormal sound detection apparatus acquires a voice reference signal.
In this embodiment, the abnormal sound detection apparatus acquires a voice reference signal whose voice content is the same as the voice content of the first voice signal.
S205. The abnormal sound detection apparatus runs an abnormal sound detection algorithm.
In this embodiment, the abnormal sound detection apparatus runs an abnormal sound detection algorithm whose procedure includes S102 and S103 shown in FIG. 2, and thereby determines whether the first voice signal has an abnormal sound, so as to determine whether the sound output device is abnormal.
S206. The abnormal sound detection apparatus outputs the detection result.
For the procedure shown in FIG. 4, refer to the procedure shown in FIG. 2.
In the embodiments shown in FIG. 2 and FIG. 4, the abnormal sound detection apparatus outputs the detection result obtained in S205: when the first voice signal is determined to have an abnormal sound, the sound output device is determined to be abnormal; when the first voice signal is determined to have no abnormal sound, the sound output device is determined to be normal.
One existing approach is as follows: the sound output device plays a frequency sweep signal; after the frequency sweep signal played by the sound output device is acquired, the 12th to 15th harmonic energy of the frequency sweep signal is calculated; and whether the frequency sweep signal has an abnormal sound is determined according to that harmonic energy, so as to determine whether the sound output device is abnormal. In this approach, however, the signal under test is still a frequency sweep signal, so the problem described earlier remains: abnormal sounds in the frequency sweep signal played by the sound output device under test cannot be detected accurately, and therefore whether the device is abnormal cannot be determined accurately. As a result, when a sound output device such as the earpiece of some terminal devices is tested, the detection result may indicate no abnormal sound even though the user hears an obvious abnormal sound when the terminal device actually plays audio.
Another existing approach is as follows: an audio signal transmitted over a communication network is acquired; the frequency-domain energy distribution parameter of the current frame of the audio signal is acquired, along with that of each frame within a preset neighborhood of the current frame; the pitch parameter of the current frame is acquired, along with that of each frame within the preset neighborhood; whether the current frame is in a speech segment is determined from these pitch parameters; and if the current frame is in a speech segment and, among all the frequency-domain energy distribution parameters, the number that fall within a preset speech-like noise frequency-domain energy distribution parameter interval is greater than or equal to a first threshold, the current frame is determined to be speech-like noise. This approach has three drawbacks. First, the audio signal under test is transmitted over a communication network; during transmission, packets of the audio signal may be lost, or external noise may be mixed into it. If speech-like noise is eventually detected, it may have been caused by packet loss or added noise during transmission, so it cannot be attributed with certainty to a defect of the sound output device itself, and the approach is inaccurate. Second, the approach analyzes the frequency-domain energy distribution parameters of the audio signal and compares them with a preset parameter interval to decide whether the audio signal contains an abnormal sound; because the frequency-domain characteristics of different audio signals can vary widely, presetting that interval is difficult, which also makes the detection result inaccurate. Third, even for the same type of audio signal, different types of terminal devices differ greatly in design process, assembly process, and choice of electro-acoustic components, so the same type of audio signal played by different terminal devices also differs greatly in its frequency-domain characteristics. This again makes presetting the frequency-domain energy distribution parameter interval difficult, so the approach generalizes poorly and yields inaccurate detection results.
In this application, with the procedure of FIG. 2 or FIG. 4, the signal under test is a voice signal, which represents the user's real usage scenario. Throughout playback, the voice signal repeatedly excites frequencies concentrated in the actual voice band, which helps expose abnormalities at problematic frequency points. Moreover, the voice signal itself represents the real frequency points that need to be tested, so problematic frequency points are much less likely to be missed, which helps detect the frequency points that have an abnormal sound. In addition, the signal under test is a voice signal stored locally in the terminal device and played by the sound output device, not a signal transmitted over a communication network; this avoids abnormal sounds caused by packet loss or added noise during transmission and improves the accuracy of the detection result. Furthermore, the residual signal contains the portion of the first voice signal that differs from the voice reference signal, and the residual signal is then examined to determine whether the first voice signal has an abnormal sound. Because the first voice signal and the voice reference signal have the same voice content, this detection is more convenient and more broadly applicable than analyzing abnormal sounds through the frequency-domain energy distribution parameters of an audio signal, and the detection result is more accurate.
In this embodiment, a first voice signal played by the sound output device of the terminal device is acquired, where the first voice signal is stored locally in the terminal device and includes audio information whose frequency changes irregularly; a residual signal is obtained from a pre-acquired voice reference signal and the first voice signal, where the residual signal includes the part of the first voice signal that differs from the voice reference signal; and whether the first voice signal contains an abnormal sound is determined from the residual signal, so as to determine whether the sound output device is abnormal. This provides a way of detecting whether an abnormal sound occurs when the sound output device plays audio, and thus of determining whether the sound output device is abnormal. Because the signal to be detected is a voice signal, it represents the user's real usage scenario: throughout playback, the voice signal repeatedly excites the frequency band that speech actually occupies, which helps expose abnormalities at problematic frequency points. Moreover, in the present application the voice signal itself represents the real frequency points that need to be tested, so problematic frequency points are much less likely to be missed, which helps detect the frequency points at which abnormal sound occurs. In addition, the signal to be detected is a voice signal stored locally in the terminal device and played by the sound output device, which avoids abnormal sound caused by packet loss or by noise mixed in during transmission. Furthermore, the residual signal contains the part of the first voice signal that differs from the voice reference signal, and this residual signal is examined to determine whether the first voice signal contains an abnormal sound. Because the first voice signal and the voice reference signal have the same voice content, the detection is convenient and general, and the accuracy of the detection result is improved.
FIG. 5 is a schematic flowchart of still another abnormal sound detection method according to an embodiment of the present application. As shown in FIG. 5, the method includes:
S301. Acquire a first voice signal played by a sound output device of a terminal device, where the first voice signal is stored locally in the terminal device and includes audio information whose frequency changes irregularly.
In this embodiment, for this step, refer to step S101 of the abnormal sound detection method shown in FIG. 2 (flowchart 1) and step S202 of the abnormal sound detection method shown in FIG. 4 (flowchart 2).
S302. Acquire a second voice signal played by each of at least one other sound output device, where each other sound output device is a sound output device that plays sound normally, and the voice content of the second voice signal is the same as that of the first voice signal; perform signal superposition on the second voice signals to generate a voice reference signal.
In this embodiment, several sound output devices that have been confirmed to play sound normally may be used to play the same second voice signal. The second voice signal played by each normal sound output device is likewise stored in the terminal device to which that device belongs, and its voice content is the same as that of the first voice signal. After the normal sound output devices have played the second voice signal, the abnormal sound detection apparatus records the second voice signal played by each of them.
The abnormal sound detection apparatus then performs signal superposition on the second voice signals to obtain the voice reference signal, whose voice content is the same as that of the second voice signals. The signal superposition may be carried out in any of the following ways. In the first way, the apparatus splices the second voice signals together to obtain the voice reference signal. In the second way, the apparatus superimposes the second voice signals in the time domain to obtain the voice reference signal. In the third way, the apparatus examines each second voice signal in each frequency band, filters out of each second voice signal the frequency bands that fall outside a preset frequency range, and then synthesizes the filtered second voice signals to obtain the voice reference signal.
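The second way (time-domain superposition) can be sketched as follows. This is a minimal illustration only: it assumes the recordings are already time-aligned and equal in length, and it takes "superposition" to mean a sample-by-sample average, which the text does not specify. The function name build_reference is hypothetical.

```python
def build_reference(recordings):
    """Average several equally long, already aligned recordings sample by sample.

    Assumption: "superposition" is taken to be a plain average; the source
    text does not specify any weighting or normalisation.
    """
    if not recordings:
        raise ValueError("at least one recording is required")
    length = len(recordings[0])
    if any(len(r) != length for r in recordings):
        raise ValueError("all recordings must have the same length")
    count = len(recordings)
    # zip(*recordings) yields, for each sample index, the values from every recording.
    return [sum(samples) / count for samples in zip(*recordings)]


# Two toy "second voice signals" recorded from known-good devices.
second_signals = [
    [0.0, 1.0, 2.0, 1.0],
    [0.0, 3.0, 2.0, -1.0],
]
reference = build_reference(second_signals)
print(reference)  # [0.0, 2.0, 2.0, 0.0]
```

Averaging rather than summing keeps the reference at the same amplitude scale as a single recording, which is convenient if a residual is later formed against it; that choice is part of the sketch, not of the source.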
S303. Time-align the first voice signal with the voice reference signal in the time domain to generate a first voice signal aligned with the voice reference signal.
In this embodiment, the abnormal sound detection apparatus performs delay alignment on the first voice signal and the voice reference signal in the time domain, so that the first voice signal is aligned with the voice reference signal, yielding the first voice signal aligned with the voice reference signal.
The delay alignment may use a delay alignment algorithm to align the first voice signal with the voice reference signal in the time domain. Examples of such algorithms include generalized cross-correlation (GCC), the adaptive least mean square (LMS) algorithm, subspace-based eigenvalue decomposition (EVD), and the acoustic transfer function ratio (ATF ratio) algorithm.
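As a rough illustration of delay alignment, the sketch below estimates the delay with a plain (unweighted) cross-correlation, which is a simplification of GCC rather than any of the listed algorithms in full. The function names estimate_delay and align, the max_lag search window, and the zero-filling of gaps are all assumptions made for this example.

```python
def estimate_delay(signal, reference, max_lag):
    """Return the lag (in samples) at which `signal` best matches `reference`.

    A positive result means `signal` is delayed relative to `reference`.
    Plain cross-correlation is used; GCC would add a frequency weighting.
    """
    best_lag, best_score = 0, float("-inf")
    for lag in range(-max_lag, max_lag + 1):
        score = 0.0
        for i, ref_sample in enumerate(reference):
            j = i + lag
            if 0 <= j < len(signal):
                score += ref_sample * signal[j]
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag


def align(signal, lag):
    """Shift `signal` by `lag` samples to undo the delay; gaps are zero-filled."""
    if lag >= 0:
        return signal[lag:] + [0.0] * lag
    return [0.0] * (-lag) + signal[:lag]


reference = [0.0, 1.0, 0.0, -1.0, 0.0, 0.0, 0.0]
recorded = [0.0, 0.0, 0.0, 1.0, 0.0, -1.0, 0.0]  # same content, two samples late
lag = estimate_delay(recorded, reference, max_lag=3)
aligned = align(recorded, lag)
print(lag, aligned == reference)  # 2 True
```

The brute-force search is quadratic in the signal length; for real recordings an FFT-based correlation would normally be preferred.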
S304. Obtain a residual signal from the pre-acquired voice reference signal and the first voice signal, where the residual signal includes the part of the first voice signal that differs from the voice reference signal.
In this embodiment, for this step, refer to step S102 of the abnormal sound detection method shown in FIG. 2 (flowchart 1) and step S205 of the abnormal sound detection method shown in FIG. 4 (flowchart 2).
S305. Determine an energy value of the residual signal.
Specifically, S305 includes: removing the voice main-band energy from the residual signal to generate a residual signal from which the voice main-band energy has been removed, where the frequency of the voice main-band energy is lower than a first frequency value; and determining the energy value of the residual signal from which the voice main-band energy has been removed.
Determining the energy value of the residual signal from which the voice main-band energy has been removed includes: determining, for each frame, the energy value of the part of that residual signal whose frequency is higher than a second frequency value.
In this embodiment, the abnormal sound detection apparatus first needs to calculate the energy value of the residual signal. The voice main-band energy in the residual signal lies at relatively low frequencies and is larger than the energy of the high-frequency abnormal sound in the residual signal, so even a slight fluctuation of the voice main-band energy directly affects the assessment of the high-frequency abnormal sound energy. The voice main-band energy therefore has to be filtered out of the residual signal. To do so, the apparatus first applies high-pass filtering to the residual signal, which removes the voice main-band energy and yields the residual signal from which the voice main-band energy has been removed. The removal works because the frequency of the voice main-band energy in the residual signal is lower than the first frequency value.
Specifically, high-pass filtering is a filtering technique in which high-frequency signals pass through the high-pass filter normally, while low-frequency signals below a set cutoff are blocked or attenuated, so that the filter outputs the high-frequency signal.
For example, consider a sampled voice signal with a sampling rate of 8 kHz. By the Nyquist theorem, the voice main-band energy of this sampled signal is concentrated below 4 kHz, and the energy of the voice main band is much stronger than that of the higher harmonics. A spectrogram analysis of the voice reference signal shows that the reference signal is very clean, with almost no visible higher-harmonic energy.
From the analysis of the above example, the higher-harmonic energy represents the abnormal sound component of the voice signal. In the present application, a residual signal may be analyzed in the same way: the energy of its voice main band is stronger than that of its higher harmonics. If the residual signal were not high-pass filtered, then in the frequency domain the higher-harmonic energy would make up only a very small part of the total energy of the residual signal, and a slight fluctuation or change in the voice main-band energy would outweigh the energy fluctuation or change introduced by the higher harmonics. This would seriously impair the determination of whether higher harmonics are present in the residual signal, and hence of whether the residual signal contains an abnormal sound. A high-pass filter whose cutoff frequency is the first frequency value is therefore used to filter out the voice main-band energy below the first frequency value; the energy remaining in the residual signal is then mainly that of the higher harmonics, that is, the energy of the abnormal sound component. The first frequency value may be set to 4 kHz.
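The effect of the high-pass step can be illustrated with a very simple filter. The sketch below uses a first-order RC-style high-pass as a stand-in; the text does not say which filter design is used, and a practical implementation would likely use a much steeper filter. The 48 kHz recording rate, the 500 Hz and 12 kHz test tones, and all function names are assumptions chosen for illustration; only the 4 kHz cutoff comes from the text.

```python
import math


def high_pass(samples, cutoff_hz, sample_rate_hz):
    """First-order RC-style high-pass filter (a deliberately simple stand-in)."""
    if not samples:
        return []
    rc = 1.0 / (2.0 * math.pi * cutoff_hz)
    dt = 1.0 / sample_rate_hz
    alpha = rc / (rc + dt)
    out = [samples[0]]
    for n in range(1, len(samples)):
        # Each output follows changes in the input and decays otherwise.
        out.append(alpha * (out[-1] + samples[n] - samples[n - 1]))
    return out


def sine(freq_hz, sample_rate_hz, count):
    """Generate `count` samples of a unit-amplitude sine tone."""
    return [math.sin(2.0 * math.pi * freq_hz * n / sample_rate_hz) for n in range(count)]


def energy(samples):
    """Sum of squared sample values."""
    return sum(s * s for s in samples)


fs = 48_000                    # assumed recording rate
low = sine(500, fs, 4800)      # stands in for voice main-band content
high = sine(12_000, fs, 4800)  # stands in for high-frequency abnormal sound
low_kept = energy(high_pass(low, 4000, fs)) / energy(low)
high_kept = energy(high_pass(high, 4000, fs)) / energy(high)
print(f"low band kept: {low_kept:.3f}, high band kept: {high_kept:.3f}")
```

Even this gentle filter removes most of the low-band energy while keeping a large share of the high-band energy, which is the property the method relies on before measuring the residual energy.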
The abnormal sound detection apparatus then calculates the energy value of the residual signal from which the voice main-band energy has been removed. In this step, the apparatus may calculate, for each frame, the energy value of the part of that residual signal whose frequency is higher than the second frequency value. The energy value of the residual signal from which the voice main-band energy has been removed is also called the out-of-band energy.
Specifically, in the ideal case the high-pass-filtered residual signal no longer contains any signal whose frequency is lower than the first frequency value, so its time-domain energy can be calculated directly in the time domain to obtain the energy value of the residual signal from which the voice main-band energy has been removed.
In the less ideal case, however, the high-pass-filtered residual signal still contains signal components whose frequency is lower than the first frequency value, and its energy then has to be calculated in the frequency domain, which ensures that the energy of components below the first frequency value is not counted. In this step, therefore, the apparatus performs the calculation on the part of the residual signal (with the voice main-band energy removed) whose frequency is higher than the second frequency value. The second frequency value may be set equal to the first frequency value or, depending on actual requirements, greater than the first frequency value. The apparatus thus calculates, for each frame, the energy value E_thr_n of the part whose frequency is higher than the second frequency value, that is, one energy value E_thr_n per frame, where the energy value of a frame is the sum of the squares of the amplitude values of the points in that frame. The apparatus then fits the energy values E_thr_n into an energy curve and compares that curve with a preset energy curve.
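The per-frame energy described above (the sum of the squared amplitude values within each frame) can be sketched as follows. The example covers only the ideal, time-domain case and assumes its input has already been high-pass filtered; the function name, the non-overlapping framing, and the dropping of a trailing partial frame are assumptions of this sketch.

```python
def frame_energies(samples, frame_length):
    """Split `samples` into consecutive frames and return each frame's energy.

    Energy of a frame = sum of the squared amplitude of every sample in it.
    A trailing partial frame is dropped (an assumption of this sketch).
    """
    if frame_length <= 0:
        raise ValueError("frame_length must be positive")
    full_frames = len(samples) // frame_length
    return [
        sum(s * s for s in samples[k * frame_length:(k + 1) * frame_length])
        for k in range(full_frames)
    ]


# Toy filtered residual: two full frames of four samples plus one leftover sample.
residual = [1.0, -1.0, 0.5, 0.5, 2.0, 0.0, 0.0, 1.0, 3.0]
print(frame_energies(residual, 4))  # [2.5, 5.0]
```

The resulting list plays the role of the energy values E_thr_n, one per frame, from which the measured energy curve is built.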
S306. Determine, according to the energy value, whether the first voice signal contains an abnormal sound, so as to determine whether the sound output device is abnormal.
Specifically, S306 includes: when it is determined that the per-frame energy values do not include a preset number of energy values that are each smaller than the first energy threshold corresponding to that energy value, determining that the first voice signal contains an abnormal sound and that the sound output device is abnormal; and when it is determined that the per-frame energy values include a preset number of energy values that are each smaller than the corresponding first energy threshold, determining that the first voice signal contains no abnormal sound and that the sound output device is normal.
In this embodiment, the abnormal sound detection apparatus compares the energy curve obtained from the energy values E_thr_n with the preset energy curve, which provides a first energy threshold for each energy value E_thr_n. If the apparatus determines that the energy values E_thr_n do not include a preset number of energy values that are each smaller than the corresponding first energy threshold, it can determine that the first voice signal contains an abnormal sound and that the sound output device playing the first voice signal is abnormal. If the apparatus determines that the energy values E_thr_n include a preset number of energy values that are each smaller than the corresponding first energy threshold, it can determine that the first voice signal contains no abnormal sound and that the sound output device playing the first voice signal is normal.
For example, FIG. 6 is an energy curve diagram for still another abnormal sound detection method according to an embodiment of the present application. As shown in FIG. 6, the method of this embodiment yields the measured energy curve of the first voice signal, drawn as the solid curve; the dashed curve is the preset energy curve. The measured curve can be compared with the preset curve to judge whether every energy value E_thr_n on the measured curve is smaller than the first energy threshold on the preset curve that corresponds to it one to one. FIG. 6 shows that not all energy values E_thr_n on the measured curve are smaller than their corresponding first energy thresholds, so it can be determined that the first voice signal contains an abnormal sound and that the sound output device playing it is abnormal.
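The curve comparison above can be sketched as a per-frame threshold check. The sketch reads "a preset number of energy values" as "at least required_count frames fall below their threshold"; this reading, the function name, and the toy numbers are assumptions, and FIG. 6 corresponds to the strict case in which every frame must be below its threshold.

```python
def has_abnormal_sound(energies, thresholds, required_count):
    """Return True when fewer than `required_count` frames fall below their threshold.

    `thresholds` holds one first energy threshold per frame, i.e. the preset
    energy curve sampled at the same frames as the measured curve.
    """
    if len(energies) != len(thresholds):
        raise ValueError("one threshold per frame is required")
    below = sum(1 for e, t in zip(energies, thresholds) if e < t)
    return below < required_count


measured = [0.2, 0.3, 1.5, 0.4]      # toy measured energy curve
preset_curve = [0.5, 0.5, 0.5, 0.5]  # toy preset energy curve
print(has_abnormal_sound(measured, preset_curve, required_count=4))  # True
print(has_abnormal_sound(measured, preset_curve, required_count=3))  # False
```

Setting required_count to the number of frames reproduces the strict comparison; a smaller value tolerates a few frames that exceed the preset curve.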
In this embodiment, a first voice signal played by the sound output device of the terminal device is acquired, where the first voice signal is stored locally in the terminal device and includes audio information whose frequency changes irregularly; a second voice signal played by each of at least one other sound output device is acquired, where each other sound output device plays sound normally and the voice content of the second voice signal is the same as that of the first voice signal; signal superposition is performed on the second voice signals to generate a voice reference signal; the first voice signal is time-aligned with the voice reference signal in the time domain to generate a first voice signal aligned with the voice reference signal; a residual signal is obtained from the pre-acquired voice reference signal and the first voice signal, where the residual signal includes the part of the first voice signal that differs from the voice reference signal; the voice main-band energy is removed from the residual signal to generate a residual signal from which the voice main-band energy has been removed, where the frequency of the voice main-band energy is lower than the first frequency value; the energy value, for each frame, of the part of that residual signal whose frequency is higher than the second frequency value is determined; and whether the first voice signal contains an abnormal sound is determined according to the energy value, so as to determine whether the sound output device is abnormal. This provides a way of detecting whether an abnormal sound occurs when the sound output device plays audio, and thus of determining whether the sound output device is abnormal. Because the signal to be detected is a voice signal, it represents the user's real usage scenario: throughout playback, the voice signal repeatedly excites the frequency band that speech actually occupies, which helps expose abnormalities at problematic frequency points. Moreover, in the present application the voice signal itself represents the real frequency points that need to be tested, so problematic frequency points are much less likely to be missed, which helps detect the frequency points at which abnormal sound occurs. In addition, the signal to be detected is a voice signal stored locally in the terminal device and played by the sound output device, which avoids abnormal sound caused by packet loss or by noise mixed in during transmission. Furthermore, the residual signal contains the part of the first voice signal that differs from the voice reference signal, and this residual signal is examined to determine whether the first voice signal contains an abnormal sound. Because the first voice signal and the voice reference signal have the same voice content, the detection is convenient and general, and the accuracy of the detection result is improved.
FIG. 7 is a schematic flowchart of another abnormal sound detection method according to an embodiment of the present application. As shown in FIG. 7, the method includes:
S401. Acquire a first voice signal played by a sound output device of a terminal device, where the first voice signal is stored locally in the terminal device and includes audio information whose frequency changes irregularly.
In this embodiment, for this step, refer to step S101 of the abnormal sound detection method shown in FIG. 2 (flowchart 1), step S202 of the abnormal sound detection method shown in FIG. 4 (flowchart 2), and step S301 of the still another abnormal sound detection method shown in FIG. 5.
S402. Acquire a second voice signal played by each of at least one other sound output device, where each other sound output device is a sound output device that plays sound normally, and the voice content of the second voice signal is the same as that of the first voice signal; perform signal superposition on the second voice signals to generate a voice reference signal.
In this embodiment, for this step, refer to step S302 of the still another abnormal sound detection method shown in FIG. 5.
S403. Time-align the first voice signal with the voice reference signal in the time domain to generate a first voice signal aligned with the voice reference signal.
In this embodiment, for this step, refer to step S303 of the still another abnormal sound detection method shown in FIG. 5.
S404. Obtain a residual signal from the pre-acquired voice reference signal and the first voice signal, where the residual signal includes the part of the first voice signal that differs from the voice reference signal.
In this embodiment, for this step, refer to step S102 of the abnormal sound detection method shown in FIG. 2 (flowchart 1), step S205 of the abnormal sound detection method shown in FIG. 4 (flowchart 2), and step S304 of the still another abnormal sound detection method shown in FIG. 5.
S405. Determine an energy value of the residual signal.
Specifically, S405 includes: removing the voice main-band energy from the residual signal to generate a residual signal from which the voice main-band energy has been removed, where the frequency of the voice main-band energy is lower than the first frequency value; and determining the energy value of the residual signal from which the voice main-band energy has been removed.
Determining the energy value of the residual signal from which the voice main-band energy has been removed includes: determining, for each frame, the energy value of the part of that residual signal whose frequency is higher than the second frequency value; and determining an energy maximum, where the energy maximum is the largest of the per-frame energy values.
In this embodiment, for the operations in this step of "removing the voice main-band energy from the residual signal to generate a residual signal from which the voice main-band energy has been removed, where the frequency of the voice main-band energy is lower than the first frequency value; determining the energy value of the residual signal from which the voice main-band energy has been removed; and determining, for each frame, the energy value of the part of that residual signal whose frequency is higher than the second frequency value", refer to step S305 of the still another abnormal sound detection method shown in FIG. 5.
Then, in this step, after obtaining the energy value E_thr_n of each frame, the abnormal sound detection apparatus takes the maximum of the per-frame energy values E_thr_n to obtain an energy maximum.
S406. Determine, according to the energy value, whether the first voice signal contains an abnormal sound, so as to determine whether the sound output device is abnormal.
Specifically, S406 includes: when it is determined that the energy maximum is greater than or equal to a second energy threshold, determining that the first voice signal contains an abnormal sound and that the sound output device is abnormal; and when it is determined that the energy maximum is smaller than the second energy threshold, determining that the first voice signal contains no abnormal sound and that the sound output device is normal.
In this embodiment, the abnormal sound detection apparatus compares the energy maximum with a second energy threshold. If the apparatus determines that the energy maximum is greater than or equal to the second energy threshold, it determines that the first voice signal contains an abnormal sound and that the sound output device playing the first voice signal is abnormal. If the apparatus determines that the energy maximum is smaller than the second energy threshold, it determines that the first voice signal contains no abnormal sound and that the sound output device playing the first voice signal is normal.
Alternatively, in S405 the per-frame energy values E_thr_n may be averaged to obtain an energy mean. In that case, in S406 the abnormal sound detection apparatus compares the energy mean with a third energy threshold. If the apparatus determines that the energy mean is greater than or equal to the third energy threshold, it determines that the first voice signal contains an abnormal sound and that the sound output device playing the first voice signal is abnormal. If the apparatus determines that the energy mean is smaller than the third energy threshold, it determines that the first voice signal contains no abnormal sound and that the sound output device playing the first voice signal is normal.
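The two decision rules of S406 (energy maximum against a second threshold, or energy mean against a third threshold) can be sketched as follows. The function names, the threshold values, and the toy energies are hypothetical; actual thresholds would have to be calibrated on known-good devices.

```python
def abnormal_by_peak(energies, second_threshold):
    """Abnormal when the largest per-frame energy reaches the second threshold."""
    return max(energies) >= second_threshold


def abnormal_by_mean(energies, third_threshold):
    """Abnormal when the average per-frame energy reaches the third threshold."""
    return sum(energies) / len(energies) >= third_threshold


energies = [0.2, 0.3, 1.5, 0.4]  # toy per-frame energy values
print(abnormal_by_peak(energies, 1.0))  # True: one frame peaks at 1.5
print(abnormal_by_mean(energies, 1.0))  # False: the mean is only 0.6
```

The toy data shows how the two rules can disagree: the maximum reacts to a single loud frame, whereas the mean only reacts when the excess energy is sustained across frames.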
In this embodiment, a first voice signal played by the sound output device of a terminal device is acquired, where the first voice signal is stored locally in the terminal device and includes audio information whose frequency varies irregularly; a second voice signal played by at least one other sound output device is acquired, where each other sound output device is a sound output device that plays sound normally, and the voice content of the second voice signal is the same as that of the first voice signal; the second voice signals are superposed to generate a voice reference signal; the first voice signal is delay-aligned with the voice reference signal in the time domain to generate a first voice signal aligned with the voice reference signal; a residual signal is obtained according to the pre-acquired voice reference signal and the first voice signal, where the residual signal includes the portion of the first voice signal that differs from the voice reference signal; the voice main-band energy is removed from the residual signal to generate a residual signal with the voice main-band energy removed, where the frequency of the voice main-band energy is less than a first frequency value; the energy value, in each frame, of the portion of that residual signal whose frequency is greater than a second frequency value is determined; an energy maximum is determined, where the energy maximum is the largest of the per-frame energy values; and, according to the energy maximum, it is determined whether the first voice signal contains an abnormal sound, so as to determine whether the sound output device is abnormal. This provides a way of detecting whether an abnormal sound occurs when a sound output device plays audio, and thereby of determining whether the sound output device is abnormal. Because the signal under test is a voice signal, it represents the user's real usage scenario; throughout playback, the voice signal repeatedly excites frequencies concentrated in the actual voice band, which helps expose abnormalities at problematic frequency points. Moreover, in this application the voice signal itself represents the real frequency points that need to be tested, so problematic frequency points are much less likely to be missed, which helps detect the frequency points at which abnormal sound occurs. In addition, the signal under test is a voice signal stored locally in the terminal device and played by the sound output device, which avoids abnormal sound caused by packet loss or by noise mixed into the voice signal during transmission. Furthermore, the residual signal contains the portion of the first voice signal that differs from the voice reference signal, and this residual signal is then examined to determine whether the first voice signal contains an abnormal sound. Because the first voice signal and the voice reference signal have the same voice content, the detection is convenient and broadly applicable, and the accuracy of the detection result is improved.
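The reference-generation and delay-alignment steps summarized above can be sketched as follows. This is only an illustration under stated assumptions: the application says the second voice signals undergo "signal superposition processing" without giving a formula, so averaging is assumed here, and a brute-force cross-correlation search is assumed for the alignment. The names `build_reference`, `estimate_delay`, `align`, and `max_lag` are hypothetical.

```python
def build_reference(recordings):
    """Superpose equal-length recordings from known-good devices.
    Assumption: superposition is implemented as a sample-wise average."""
    n = len(recordings[0])
    if any(len(r) != n for r in recordings):
        raise ValueError("all recordings must have the same length")
    return [sum(r[i] for r in recordings) / len(recordings) for i in range(n)]


def estimate_delay(signal, reference, max_lag):
    """Return the lag (in samples) within [-max_lag, max_lag] that maximizes
    the cross-correlation sum of signal[i + lag] * reference[i]."""
    best_lag, best_score = 0, float("-inf")
    for lag in range(-max_lag, max_lag + 1):
        score = 0.0
        for i, ref in enumerate(reference):
            j = i + lag
            if 0 <= j < len(signal):
                score += signal[j] * ref
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag


def align(signal, reference, max_lag):
    """Shift the signal by the estimated delay so that it lines up with the
    reference; samples that fall outside the recording are zero-padded."""
    lag = estimate_delay(signal, reference, max_lag)
    aligned = []
    for i in range(len(reference)):
        j = i + lag
        aligned.append(signal[j] if 0 <= j < len(signal) else 0.0)
    return aligned
```

The exhaustive search costs on the order of `max_lag` times the signal length; an FFT-based correlation would be the usual choice for long recordings, but the simple loop keeps the idea visible.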
FIG. 8 is a schematic flowchart of yet another abnormal sound detection method according to an embodiment of this application. As shown in FIG. 8, the method includes:
S501. Acquire a first voice signal played by the sound output device of a terminal device, where the first voice signal is stored locally in the terminal device and includes audio information whose frequency varies irregularly.
S502. Acquire a voice reference signal.
S503. Delay-align the first voice signal with the voice reference signal in the time domain to generate a first voice signal aligned with the voice reference signal.
S504. Perform filtering according to the pre-acquired voice reference signal and the first voice signal to obtain a residual signal.
S505. High-pass filter the residual signal to obtain a residual signal with the voice main-band energy removed.
S506. Determine the energy value of the residual signal with the voice main-band energy removed.
S507. Input an energy threshold.
S508. Determine whether the energy value is greater than or equal to the energy threshold, so as to determine whether the first voice signal contains an abnormal sound and thereby whether the sound output device is abnormal.
S509. When the energy value is determined to be greater than or equal to the energy threshold, determine that the sound output device is abnormal.
S5010. When the energy value is determined to be less than the energy threshold, determine that the sound output device is normal.
In this embodiment, for each step, refer to the steps of the abnormal sound detection method shown in FIG. 5 and the steps of the abnormal sound detection method shown in FIG. 7. The principles and effects are the same as those of the methods provided in the foregoing embodiments.
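Steps S504 to S5010 can be sketched end to end as below. Several parts are assumptions rather than the method of the application: the residual is taken as a plain sample-wise difference (S504 only says "filtering"), the high-pass filter is a generic first-order filter with a hypothetical coefficient `alpha`, and the energy is the sum of squared samples. All function names are illustrative.

```python
def residual(aligned, reference):
    """Assumed residual: sample-wise difference between the aligned
    first voice signal and the voice reference signal."""
    return [a - r for a, r in zip(aligned, reference)]


def high_pass(samples, alpha=0.9):
    """Generic first-order high-pass filter (an assumption standing in for
    the unspecified filter of S505): y[n] = alpha * (y[n-1] + x[n] - x[n-1])."""
    out = []
    prev_x = prev_y = 0.0
    for x in samples:
        y = alpha * (prev_y + x - prev_x)
        out.append(y)
        prev_x, prev_y = x, y
    return out


def energy(samples):
    """Energy as the sum of squared samples."""
    return sum(s * s for s in samples)


def device_is_abnormal(aligned, reference, threshold, alpha=0.9):
    """S504 to S5010: residual, high-pass filtering, energy, then the
    greater-than-or-equal comparison against the input energy threshold."""
    filtered = high_pass(residual(aligned, reference), alpha)
    return energy(filtered) >= threshold
```

When the device under test reproduces the reference exactly, the residual is all zeros and the function reports a normal device for any positive threshold.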
FIG. 9 is a schematic structural diagram of an abnormal sound detection apparatus according to an embodiment of this application. As shown in FIG. 9, the apparatus includes:
an acquiring unit 81, configured to acquire a first voice signal played by the sound output device of a terminal device, where the first voice signal is stored locally in the terminal device and includes audio information whose frequency varies irregularly;
a calculating unit 82, configured to obtain a residual signal according to a pre-acquired voice reference signal and the first voice signal, where the residual signal includes the portion of the first voice signal that differs from the voice reference signal; and
a determining unit 83, configured to determine, according to the residual signal, whether the first voice signal contains an abnormal sound, so as to determine whether the sound output device is abnormal.
The acquiring unit 81 may perform step S101 of the method shown in FIG. 2, step S202 of the method shown in FIG. 4, step S301 of the method shown in FIG. 5, or step S401 of the method shown in FIG. 7. The calculating unit 82 may perform step S102 of the method shown in FIG. 2, step S205 of the method shown in FIG. 4, step S304 of the method shown in FIG. 5, or step S404 of the method shown in FIG. 7. The determining unit 83 may perform step S103 of the method shown in FIG. 2 or step S205 of the method shown in FIG. 4.
The abnormal sound detection apparatus of the embodiment shown in FIG. 9 may be used to execute the technical solutions of the embodiments shown in FIG. 2 to FIG. 4. The implementation principles and technical effects are similar and are not described again here.
FIG. 10 is a schematic structural diagram of another abnormal sound detection apparatus according to an embodiment of this application. On the basis of the apparatus shown in FIG. 9, as shown in FIG. 10, the determining unit 83 of the apparatus includes:
A first determining module 831, configured to determine the energy value of the residual signal. The first determining module 831 may perform step S305 of the method shown in FIG. 5 or step S405 of the method shown in FIG. 7.
A second determining module 832, configured to determine, according to the energy value, whether the first voice signal contains an abnormal sound. The second determining module 832 may perform step S306 of the method shown in FIG. 5 or step S406 of the method shown in FIG. 7.
The first determining module 831 includes:
A removing submodule 8311, configured to remove the voice main-band energy from the residual signal to generate a residual signal with the voice main-band energy removed, where the frequency of the voice main-band energy is less than a first frequency value. The removing submodule 8311 may perform this removal as the corresponding part of step S305 of the method shown in FIG. 5 or of step S405 of the method shown in FIG. 7.
A determining submodule 8312, configured to determine the energy value of the residual signal with the voice main-band energy removed. The determining submodule 8312 may perform the process of "determining the energy value of the residual signal with the voice main-band energy removed" in step S305 of the method shown in FIG. 5 or in step S405 of the method shown in FIG. 7.
The determining submodule 8312 is specifically configured to:
Determine the energy value, in each frame, of the portion of the residual signal with the voice main-band energy removed whose frequency is greater than a second frequency value. In this case, the determining submodule 8312 may perform the process of "determining the energy value of the residual signal with the voice main-band energy removed" in step S305 of the method shown in FIG. 5.
Correspondingly, the second determining module 832 is specifically configured to:
When it is determined that the per-frame energy values do not include a preset number of energy values that are each less than the first energy threshold corresponding to that energy value, determine that the first voice signal contains an abnormal sound and that the sound output device is abnormal; when it is determined that the per-frame energy values include a preset number of energy values that are each less than the first energy threshold corresponding to that energy value, determine that the first voice signal contains no abnormal sound and that the sound output device is normal. In this case, the second determining module 832 may perform step S306 of the method shown in FIG. 5.
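The count-based rule above can be read as "the device is normal only if at least a preset number of per-frame energy values fall below their corresponding first energy thresholds". The following minimal sketch implements that reading, which is an interpretation of the wording rather than a confirmed implementation; the function name and the per-frame `thresholds` list are hypothetical.

```python
def has_abnormal_sound_by_count(frame_energies, thresholds, preset_count):
    """Return True (abnormal) when fewer than preset_count per-frame energy
    values are below their corresponding first energy thresholds."""
    if len(frame_energies) != len(thresholds):
        raise ValueError("one threshold is required per frame energy")
    below = sum(1 for e, t in zip(frame_energies, thresholds) if e < t)
    return below < preset_count
```

For example, with energies `[0.1, 0.2, 0.9]` and a threshold of `0.5` for every frame, two frames are below threshold, so a preset count of 3 yields an abnormal result and a preset count of 2 yields a normal one.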
Alternatively, the determining submodule 8312 is specifically configured to:
Determine the energy value, in each frame, of the portion of the residual signal with the voice main-band energy removed whose frequency is greater than the second frequency value; and determine an energy maximum, where the energy maximum is the largest of the per-frame energy values. In this case, the determining submodule 8312 may perform the process of "determining the energy value of the residual signal with the voice main-band energy removed" in step S405 of the method shown in FIG. 7.
Correspondingly, the second determining module 832 is specifically configured to:
When the energy maximum is determined to be greater than or equal to the second energy threshold, determine that the first voice signal contains an abnormal sound and that the sound output device is abnormal; when the energy maximum is determined to be less than the second energy threshold, determine that the first voice signal contains no abnormal sound and that the sound output device is normal. In this case, the second determining module 832 may perform step S406 of the method shown in FIG. 7.
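The per-frame energy of the portion above the second frequency value, from which the energy maximum is taken, could be computed along the following lines. This is a sketch under assumptions: the application does not say how frames are formed or how the band energy is measured, so non-overlapping frames and a direct discrete Fourier transform over the one-sided spectrum are assumed, without normalization. The names `frame_band_energies`, `frame_len`, and `min_freq` are hypothetical.

```python
import cmath


def frame_band_energies(samples, sample_rate, frame_len, min_freq):
    """For each non-overlapping frame, sum |X[k]|^2 over the DFT bins whose
    frequency is strictly greater than min_freq (one-sided, unnormalized)."""
    energies = []
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        total = 0.0
        for k in range(frame_len // 2 + 1):
            # Bin k corresponds to k * sample_rate / frame_len hertz.
            if k * sample_rate / frame_len <= min_freq:
                continue
            coeff = sum(
                x * cmath.exp(-2j * cmath.pi * k * n / frame_len)
                for n, x in enumerate(frame)
            )
            total += abs(coeff) ** 2
        energies.append(total)
    return energies
```

The energy maximum is then `max(frame_band_energies(...))`. Because the scaling of the energies is arbitrary in this sketch, any threshold would have to be calibrated against the same computation; a production version would normally use an FFT instead of the quadratic-time loop.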
The apparatus of this embodiment further includes:
A generating unit 91, configured to: before the calculating unit 82 obtains the residual signal according to the pre-acquired voice reference signal and the first voice signal, acquire a second voice signal played by at least one other sound output device, where each other sound output device is a sound output device that plays sound normally, and the voice content of the second voice signal is the same as that of the first voice signal; and superpose the second voice signals to generate the voice reference signal. The generating unit 91 may perform step S302 of the method shown in FIG. 5 or step S402 of the method shown in FIG. 7.
An aligning unit 92, configured to: before the calculating unit 82 obtains the residual signal according to the pre-acquired voice reference signal and the first voice signal, delay-align the first voice signal with the voice reference signal in the time domain to generate a first voice signal aligned with the voice reference signal. The aligning unit 92 may perform step S303 of the method shown in FIG. 5 or step S403 of the method shown in FIG. 7.
The abnormal sound detection apparatus of the embodiment shown in FIG. 10 may be used to execute the technical solutions of the embodiments shown in FIG. 5 to FIG. 8. The implementation principles and technical effects are similar and are not described again here.
Moreover, implementation of this embodiment does not depend on whether the embodiment shown in FIG. 9 is implemented; this embodiment may be implemented independently.
FIG. 11 is a schematic structural diagram of another abnormal sound detection apparatus according to an embodiment of this application. As shown in FIG. 11, the network device includes a transmitter 261, a receiver 262, and a processor 263. The receiver 262 is configured to acquire a first voice signal played by the sound output device of a terminal device, where the first voice signal is stored locally in the terminal device and includes audio information whose frequency varies irregularly. The processor 263 is configured to: obtain a residual signal according to a pre-acquired voice reference signal and the first voice signal, where the residual signal includes the portion of the first voice signal that differs from the voice reference signal; and determine, according to the residual signal, whether the first voice signal contains an abnormal sound, so as to determine whether the sound output device is abnormal. In this case, the receiver 262 may implement the function of the acquiring unit 81 in the apparatus shown in FIG. 9, and may therefore perform step S101 of the method shown in FIG. 2, step S202 of the method shown in FIG. 4, step S301 of the method shown in FIG. 5, or step S401 of the method shown in FIG. 7. The processor 263 may implement the functions of the calculating unit 82 and the determining unit 83 in the apparatus shown in FIG. 9, and may therefore perform steps S102 and S103 of the method shown in FIG. 2 or step S205 of the method shown in FIG. 4.
The processor 263 is specifically configured to determine the energy value of the residual signal, and to determine, according to the energy value, whether the first voice signal contains an abnormal sound. In this case, the processor 263 may implement the functions of the first determining module 831 and the second determining module 832 in the apparatus shown in FIG. 10, and may therefore perform steps S305 and S306 of the method shown in FIG. 5 or steps S405 and S406 of the method shown in FIG. 7.
The processor 263 is specifically configured to: remove the voice main-band energy from the residual signal to generate a residual signal with the voice main-band energy removed, where the frequency of the voice main-band energy is less than a first frequency value; and determine the energy value of the residual signal with the voice main-band energy removed. In this case, the processor 263 may implement the functions of the removing submodule 8311 and the determining submodule 8312 in the apparatus shown in FIG. 10, and may therefore perform step S305 of the method shown in FIG. 5 or step S405 of the method shown in FIG. 7.
The processor 263 is specifically configured to: determine the energy value, in each frame, of the portion of the residual signal with the voice main-band energy removed whose frequency is greater than a second frequency value; when it is determined that the per-frame energy values do not include a preset number of energy values that are each less than the first energy threshold corresponding to that energy value, determine that the first voice signal contains an abnormal sound and that the sound output device is abnormal; and when it is determined that the per-frame energy values include a preset number of energy values that are each less than the first energy threshold corresponding to that energy value, determine that the first voice signal contains no abnormal sound and that the sound output device is normal. In this case, the processor 263 may implement the functions of the determining submodule 8312 and the second determining module 832 in the apparatus shown in FIG. 10, and may therefore perform the process of "determining the energy value of the residual signal with the voice main-band energy removed" in step S305 of the method shown in FIG. 5, as well as step S306 of the method shown in FIG. 5.
Alternatively, the processor 263 is specifically configured to: determine the energy value, in each frame, of the portion of the residual signal with the voice main-band energy removed whose frequency is greater than the second frequency value; determine an energy maximum, where the energy maximum is the largest of the per-frame energy values; when the energy maximum is determined to be greater than or equal to the second energy threshold, determine that the first voice signal contains an abnormal sound and that the sound output device is abnormal; and when the energy maximum is determined to be less than the second energy threshold, determine that the first voice signal contains no abnormal sound and that the sound output device is normal. In this case, the processor 263 may implement the functions of the determining submodule 8312 and the second determining module 832 in the apparatus shown in FIG. 10, and may therefore perform the process of "determining the energy value of the residual signal with the voice main-band energy removed" in step S405 of the method shown in FIG. 7, as well as step S406 of the method shown in FIG. 7.
The receiver 262 is further configured to acquire a second voice signal played by at least one other sound output device, where each other sound output device is a sound output device that plays sound normally, and the voice content of the second voice signal is the same as that of the first voice signal. In this case, the receiver 262 may implement part of the function of the generating unit 91 in the apparatus shown in FIG. 10, and may therefore perform the process of "acquiring a second voice signal played by at least one other sound output device" in step S302 of the method shown in FIG. 5 or in step S402 of the method shown in FIG. 7.
The processor 263 is then further configured to superpose the second voice signals to generate the voice reference signal. In this case, the processor 263 may implement part of the function of the generating unit 91 in the apparatus shown in FIG. 10, and may therefore perform the process of "superposing the second voice signals to generate the voice reference signal" in step S302 of the method shown in FIG. 5 or in step S402 of the method shown in FIG. 7.
The processor 263 is further configured to delay-align the first voice signal with the voice reference signal in the time domain to generate a first voice signal aligned with the voice reference signal. In this case, the processor 263 may implement the function of the aligning unit 92 in the apparatus shown in FIG. 10, and may therefore perform step S303 of the method shown in FIG. 5 or step S403 of the method shown in FIG. 7.
The abnormal sound detection apparatus of the embodiment shown in FIG. 11 may be used to execute the technical solutions of the foregoing method embodiments, or the programs of the modules of the embodiments shown in FIG. 9 and FIG. 10. The processor 263 invokes these programs to perform the operations of the foregoing method embodiments, so as to implement the modules shown in FIG. 9 and FIG. 10.
The processor 263 may also be a controller, and is denoted "controller/processor 263" in FIG. 11. The transmitter 261 and the receiver 262 are configured to support the sending and receiving of information between the network device and the terminal device in the foregoing embodiments, and to support radio communication between the terminal device and other terminal devices. The processor 263 performs various functions for communicating with the terminal device.
Further, the network device may include a memory 264, which is configured to store program code and data of the network device.
The processor 263 may be, for example, a central processing unit (CPU), or may be one or more integrated circuits configured to implement the foregoing methods, for example, one or more application-specific integrated circuits (ASICs), one or more digital signal processors (DSPs), or one or more field-programmable gate arrays (FPGAs). The memory 264 may be a single memory or a collective term for multiple storage elements.
It should be noted that the transmitter 261 included in the abnormal sound detection apparatus of FIG. 11 provided in this embodiment of the present invention may perform the sending actions in the foregoing method embodiments, the processor 263 performs processing actions such as processing, determining, and acquiring, and the receiver may perform the receiving actions. For details, refer to the foregoing method embodiments. The receiver 262 included in the abnormal sound detection apparatus of FIG. 11 corresponds to the action of acquiring a voice signal in the foregoing method embodiments.
The foregoing embodiments may be implemented in whole or in part by software, hardware, firmware, or any combination thereof. When implemented in software, they may be implemented in whole or in part in the form of a computer program product. The computer program product includes one or more computer instructions. When the computer program instructions are loaded and executed on a computer, the procedures or functions described in the embodiments of this application are generated in whole or in part. The computer may be a general-purpose computer, a special-purpose computer, a computer network, or another programmable apparatus. The computer instructions may be stored in a computer-readable storage medium, or transmitted from one computer-readable storage medium to another. For example, the computer instructions may be transmitted from one website, computer, server, or data center to another website, computer, server, or data center in a wired manner (for example, coaxial cable, optical fiber, or digital subscriber line (DSL)) or a wireless manner (for example, infrared, radio, or microwave). The computer-readable storage medium may be any available medium that a computer can access, or a data storage device such as a server or data center that integrates one or more available media. The available medium may be a magnetic medium (for example, a floppy disk, hard disk, or magnetic tape), an optical medium (for example, a DVD), or a semiconductor medium (for example, a solid state disk (SSD)).
Those skilled in the art should appreciate that, in one or more of the foregoing examples, the functions described in the embodiments of this application may be implemented in hardware, software, firmware, or any combination thereof. When implemented in software, these functions may be stored in a computer-readable medium or transmitted as one or more instructions or code on a computer-readable medium. Computer-readable media include computer storage media and communication media, where communication media include any medium that facilitates transfer of a computer program from one place to another. A storage medium may be any available medium that a general-purpose or special-purpose computer can access.
Claims (19)
- 一种异音检测方法,其特征在于,包括:An abnormal sound detecting method, comprising:获取终端设备的声音输出器件所播放的第一语音信号,其中,所述第一语音信号为所述终端设备中本地存储的,且所述第一语音信号包括频率无规则变化的音频信息;Obtaining, by the sound output device of the terminal device, the first voice signal, wherein the first voice signal is locally stored in the terminal device, and the first voice signal includes audio information whose frequency changes irregularly;根据预先获取的语音参考信号、以及所述第一语音信号,得到残差信号,其中,所述残差信号中包括了所述第一语音信号中与所述语音参考信号的信号不同的部分;Obtaining a residual signal according to the pre-acquired voice reference signal and the first voice signal, wherein the residual signal includes a portion of the first voice signal that is different from a signal of the voice reference signal;根据所述残差信号,确定所述第一语音信号中是否具有异音,以确定所述声音输出器件是否异常。And determining, according to the residual signal, whether the first voice signal has an abnormal sound to determine whether the sound output device is abnormal.
- The method according to claim 1, wherein the determining, according to the residual signal, whether the first voice signal contains an abnormal sound comprises: determining an energy value of the residual signal; and determining, according to the energy value, whether the first voice signal contains an abnormal sound.
- The method according to claim 2, wherein the determining an energy value of the residual signal comprises: removing the voice main frequency band energy from the residual signal to generate a residual signal from which the voice main frequency band energy has been removed, wherein the frequency of the voice main frequency band energy is lower than a first frequency value; and determining an energy value of the residual signal from which the voice main frequency band energy has been removed.
- The method according to claim 3, wherein the determining an energy value of the residual signal from which the voice main frequency band energy has been removed comprises: determining the energy value, in each frame, of the portion of that residual signal whose frequency is greater than a second frequency value.
- The method according to claim 4, wherein the determining, according to the energy value, whether the first voice signal contains an abnormal sound comprises: upon determining that the per-frame energy values do not include a preset number of energy values that are each smaller than the corresponding first energy threshold, determining that the first voice signal contains an abnormal sound and that the sound output device is abnormal; and upon determining that the per-frame energy values include a preset number of energy values that are each smaller than the corresponding first energy threshold, determining that the first voice signal does not contain an abnormal sound and that the sound output device is normal.
- The method according to claim 3, wherein the determining an energy value of the residual signal from which the voice main frequency band energy has been removed comprises: determining the energy value, in each frame, of the portion of that residual signal whose frequency is greater than a second frequency value; and determining an energy maximum, wherein the energy maximum is the largest of the per-frame energy values.
- The method according to claim 6, wherein the determining, according to the energy value, whether the first voice signal contains an abnormal sound comprises: upon determining that the energy maximum is greater than or equal to a second energy threshold, determining that the first voice signal contains an abnormal sound and that the sound output device is abnormal; and upon determining that the energy maximum is less than the second energy threshold, determining that the first voice signal does not contain an abnormal sound and that the sound output device is normal.
- The method according to any one of claims 1 to 7, further comprising, before the obtaining a residual signal according to the pre-acquired voice reference signal and the first voice signal: obtaining a second voice signal played by each of at least one other sound output device, wherein each other sound output device is a sound output device that plays sound normally, and the voice content of the second voice signal is the same as the voice content of the first voice signal; and performing signal superposition processing on the second voice signals to generate the voice reference signal.
- The method according to any one of claims 1 to 7, further comprising, before the obtaining a residual signal according to the pre-acquired voice reference signal and the first voice signal: performing time delay alignment of the first voice signal with the voice reference signal in the time domain, to generate a first voice signal that is aligned with the voice reference signal.
- An abnormal sound detection apparatus, comprising: an acquiring unit, configured to acquire a first voice signal played by a sound output device of a terminal device, wherein the first voice signal is stored locally in the terminal device and includes audio information whose frequency changes irregularly; a calculating unit, configured to obtain a residual signal according to a pre-acquired voice reference signal and the first voice signal, wherein the residual signal includes the portion of the first voice signal that differs from the voice reference signal; and a determining unit, configured to determine, according to the residual signal, whether the first voice signal contains an abnormal sound, so as to determine whether the sound output device is abnormal.
- The apparatus according to claim 10, wherein the determining unit comprises: a first determining module, configured to determine an energy value of the residual signal; and a second determining module, configured to determine, according to the energy value, whether the first voice signal contains an abnormal sound.
- The apparatus according to claim 11, wherein the first determining module comprises: a removing submodule, configured to remove the voice main frequency band energy from the residual signal to generate a residual signal from which the voice main frequency band energy has been removed, wherein the frequency of the voice main frequency band energy is lower than a first frequency value; and a determining submodule, configured to determine an energy value of the residual signal from which the voice main frequency band energy has been removed.
- The apparatus according to claim 12, wherein the determining submodule is specifically configured to: determine the energy value, in each frame, of the portion of the residual signal from which the voice main frequency band energy has been removed whose frequency is greater than a second frequency value.
- The apparatus according to claim 13, wherein the second determining module is specifically configured to: upon determining that the per-frame energy values do not include a preset number of energy values that are each smaller than the corresponding first energy threshold, determine that the first voice signal contains an abnormal sound and that the sound output device is abnormal; and upon determining that the per-frame energy values include a preset number of energy values that are each smaller than the corresponding first energy threshold, determine that the first voice signal does not contain an abnormal sound and that the sound output device is normal.
- The apparatus according to claim 12, wherein the determining submodule is specifically configured to: determine the energy value, in each frame, of the portion of the residual signal from which the voice main frequency band energy has been removed whose frequency is greater than a second frequency value; and determine an energy maximum, wherein the energy maximum is the largest of the per-frame energy values.
- The apparatus according to claim 15, wherein the second determining module is specifically configured to: upon determining that the energy maximum is greater than or equal to a second energy threshold, determine that the first voice signal contains an abnormal sound and that the sound output device is abnormal; and upon determining that the energy maximum is less than the second energy threshold, determine that the first voice signal does not contain an abnormal sound and that the sound output device is normal.
- The apparatus according to any one of claims 10 to 16, further comprising: a generating unit, configured to, before the calculating unit obtains the residual signal according to the pre-acquired voice reference signal and the first voice signal, obtain a second voice signal played by each of at least one other sound output device, wherein each other sound output device is a sound output device that plays sound normally, and the voice content of the second voice signal is the same as the voice content of the first voice signal; and perform signal superposition processing on the second voice signals to generate the voice reference signal.
- The apparatus according to any one of claims 10 to 16, further comprising: an aligning unit, configured to, before the calculating unit obtains the residual signal according to the pre-acquired voice reference signal and the first voice signal, perform time delay alignment of the first voice signal with the voice reference signal in the time domain, to generate a first voice signal that is aligned with the voice reference signal.
- A computer-readable storage medium comprising instructions that, when run on a computer, cause the computer to perform the method according to any one of claims 1 to 9.
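The method claims above can be illustrated with a minimal Python sketch. It is not the applicant's implementation, and every function name in it is hypothetical. It rests on several assumptions the claims do not fix: "signal superposition" is taken to be a sample-wise average of known-good recordings, time delay alignment is done by a brute-force cross-correlation search, the residual is a sample-wise difference, and a single cutoff `min_freq` stands in for both the first and the second frequency value. The decision rule shown is the maximum-energy variant (energy maximum compared with a threshold); the threshold itself is an arbitrary illustrative number.

```python
import cmath
import math
from typing import List, Sequence


def build_reference(recordings: Sequence[Sequence[float]]) -> List[float]:
    """Superpose known-good recordings (assumed here: a sample-wise average)."""
    n = min(len(r) for r in recordings)
    return [sum(r[i] for r in recordings) / len(recordings) for i in range(n)]


def align(signal: Sequence[float], reference: Sequence[float],
          max_lag: int) -> List[float]:
    """Shift `signal` by the lag in [-max_lag, max_lag] that maximizes its
    cross-correlation with `reference`; missing samples become zeros."""
    def corr(lag: int) -> float:
        return sum(reference[i] * signal[i + lag]
                   for i in range(len(reference))
                   if 0 <= i + lag < len(signal))

    best = max(range(-max_lag, max_lag + 1), key=corr)
    return [signal[i + best] if 0 <= i + best < len(signal) else 0.0
            for i in range(len(reference))]


def residual(signal: Sequence[float], reference: Sequence[float]) -> List[float]:
    """Part of the recorded signal that differs from the reference."""
    return [s - r for s, r in zip(signal, reference)]


def frame_energies_above(x: Sequence[float], frame_len: int,
                         sample_rate: float, min_freq: float) -> List[float]:
    """Per-frame energy of the DFT bins at or above `min_freq`.

    Skipping the low bins plays the role of removing the voice main band.
    A direct DFT keeps the sketch dependency-free; it is slow for long signals.
    """
    energies = []
    for start in range(0, len(x) - frame_len + 1, frame_len):
        frame = x[start:start + frame_len]
        energy = 0.0
        for k in range(frame_len // 2 + 1):
            if k * sample_rate / frame_len < min_freq:
                continue  # low-frequency bin: treated as voice main band
            coeff = sum(v * cmath.exp(-2j * math.pi * k * n / frame_len)
                        for n, v in enumerate(frame))
            energy += abs(coeff) ** 2
        energies.append(energy)
    return energies


def has_abnormal_sound(energies: Sequence[float], threshold: float) -> bool:
    """Maximum-energy rule: abnormal if the largest frame energy reaches the threshold."""
    return max(energies) >= threshold
```

A typical call chain under these assumptions would be `frame_energies_above(residual(align(recorded, reference, max_lag), reference), frame_len, sample_rate, min_freq)` followed by `has_abnormal_sound(...)`. In practice the cutoff and the threshold would have to be calibrated on recordings from known-good devices.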
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201780009940.9A CN108605191B (en) | 2017-01-20 | 2017-04-28 | Abnormal sound detection method and device |
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710045605 | 2017-01-20 | ||
CN201710045605.6 | 2017-01-20 |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2018133247A1 (en) | 2018-07-26 |
Family ID: 62907582
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/CN2017/082415 WO2018133247A1 (en) | 2017-01-20 | 2017-04-28 | Abnormal sound detection method and apparatus |
Country Status (2)
Country | Link |
---|---|
CN (1) | CN108605191B (en) |
WO (1) | WO2018133247A1 (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
TWI778437B (en) * | 2020-10-23 | 2022-09-21 | 財團法人資訊工業策進會 | Defect-detecting device and defect-detecting method for an audio device |
Families Citing this family (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113611327A (en) * | 2020-10-23 | 2021-11-05 | 深圳市冠旭电子股份有限公司 | Abnormal sound detection and analysis method and device, terminal equipment and readable storage medium |
CN112969134B (en) * | 2021-02-07 | 2022-05-10 | 深圳市微纳感知计算技术有限公司 | Microphone abnormality detection method, device, equipment and storage medium |
Citations (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20050253713A1 (en) * | 2004-05-17 | 2005-11-17 | Teppei Yokota | Audio apparatus and monitoring method using the same |
CN103546853A (en) * | 2013-09-18 | 2014-01-29 | 浙江中科电声研发中心 | Speaker abnormal sound detecting method based on short-time Fourier transformation |
JP2014182092A (en) * | 2013-03-21 | 2014-09-29 | Jx Nippon Oil & Energy Corp | Abnormality detection method and abnormality detection device |
CN104168532A (en) * | 2013-05-15 | 2014-11-26 | 光宝光电(常州)有限公司 | Method and apparatus for abnormal noise detection of loudspeaker |
CN104363554A (en) * | 2014-09-29 | 2015-02-18 | 嘉善恩益迪电声技术服务有限公司 | Method for detecting loudspeaker abnormal sounds |
CN105163262A (en) * | 2015-09-30 | 2015-12-16 | 南京师范大学 | Loudspeaker abnormal sound detection method and system |
CN105810213A (en) * | 2014-12-30 | 2016-07-27 | 浙江大华技术股份有限公司 | Typical abnormal sound detection method and device |
CN106303876A (en) * | 2015-05-19 | 2017-01-04 | 比亚迪股份有限公司 | Voice system, abnormal sound detection method and electronic installation |
Family Cites Families (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN100571452C (en) * | 2006-04-07 | 2009-12-16 | 清华大学 | Loudspeaker simple tone detecting method |
CN100554916C (en) * | 2006-04-28 | 2009-10-28 | 孙盈军 | A kind of method of testing of digital product and isolated plant thereof |
CN101917735A (en) * | 2010-05-06 | 2010-12-15 | 王芸 | Mobile terminal audio calibrating method and automatic testing system |
CN102324229B (en) * | 2011-09-08 | 2012-11-28 | 中国科学院自动化研究所 | Method and system for detecting abnormal use of voice input equipment |
CN106034272A (en) * | 2015-03-17 | 2016-10-19 | 钰太芯微电子科技(上海)有限公司 | Loudspeaker compensation system and portable mobile terminal |
CN106488376B (en) * | 2016-10-28 | 2020-03-27 | 努比亚技术有限公司 | Method and device for carrying out fault diagnosis on audio element of mobile terminal |
2017
- 2017-04-28: CN application CN201780009940.9A, patent CN108605191B (status: Active)
- 2017-04-28: WO application PCT/CN2017/082415, publication WO2018133247A1 (status: Application Filing)
Also Published As
Publication number | Publication date |
---|---|
CN108605191B (en) | 2020-12-25 |
CN108605191A (en) | 2018-09-28 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| | 121 | EP: the EPO has been informed by WIPO that EP was designated in this application | Ref document number: 17893451; Country of ref document: EP; Kind code of ref document: A1 |
| | NENP | Non-entry into the national phase | Ref country code: DE |
| | 122 | EP: PCT application non-entry in European phase | Ref document number: 17893451; Country of ref document: EP; Kind code of ref document: A1 |