WO2018133247A1 - Abnormal sound detection apparatus and method - Google Patents

Publication number
WO2018133247A1
Authority
WO
WIPO (PCT)
Prior art keywords
signal
voice
energy
determining
output device
Prior art date
Application number
PCT/CN2017/082415
Other languages
English (en)
Chinese (zh)
Inventor
马骅
吴元友
仇存收
孙建华
Original Assignee
华为技术有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 华为技术有限公司
Priority to CN201780009940.9A (patent CN108605191B)
Publication of WO2018133247A1

Classifications

    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04R: LOUDSPEAKERS, MICROPHONES, GRAMOPHONE PICK-UPS OR LIKE ACOUSTIC ELECTROMECHANICAL TRANSDUCERS; DEAF-AID SETS; PUBLIC ADDRESS SYSTEMS
    • H04R29/00: Monitoring arrangements; Testing arrangements

Definitions

  • the present application relates to the field of terminal technologies, and in particular, to an abnormal sound detection method and apparatus.
  • a sound output device is generally provided in the terminal, and the sound output device includes, for example, a speaker, a receiver, etc., and the terminal needs to play an audio signal by using the sound output device.
  • the sound output device may produce an abnormal sound when playing an audio signal, for reasons such as design defects, assembly defects, or the entry of foreign matter. Therefore, before the terminal is sold, the sound output device on the terminal needs to be tested to detect whether it produces an abnormal sound when playing an audio signal.
  • in the prior art, the sound output device to be detected plays a frequency sweep signal; the detection system records the played frequency sweep signal, calculates the high-order harmonic distortion energy of each frequency band of the sweep signal, and then determines whether the high-order harmonic distortion energy of each band exceeds the energy threshold of that band. When the high-order harmonic distortion energy of one band, or of multiple bands, exceeds the corresponding energy threshold, it can be determined that the sound output device to be detected has an abnormal sound, and therefore that the sound output device is abnormal.
  • however, because the frequency of a sweep signal changes monotonically within a certain band, either from high to low or from low to high, each frequency point in the sweep signal lasts only a short time. When a certain frequency point has not yet excited relatively high harmonic energy, the sweep has already moved on to the next frequency point, and a problem that may occur at that frequency point is not detected. Also, when the sound output device is actually used, it is unlikely to play only a simple audio signal such as a sweep signal. Therefore, the prior art cannot accurately detect abnormal sound in the frequency sweep signal played by the sound output device to be detected, cannot accurately determine whether that sound output device is abnormal, and the existing detection method is not accurate.
  • the present application provides an abnormal sound detecting method and apparatus, to solve the problems in the prior art that detecting whether the sound output device to be detected produces an abnormal sound when playing an audio signal is inaccurate, and that whether the sound output device is abnormal cannot be accurately determined.
  • the present application provides an abnormal sound detecting method, including: acquiring a first voice signal played by a sound output device of a terminal device, where the first voice signal is stored locally in the terminal device and includes audio information whose frequency changes irregularly; obtaining a residual signal according to a pre-acquired voice reference signal and the first voice signal, where the residual signal includes the portion of the first voice signal that differs from the voice reference signal; and determining, according to the residual signal, whether the first voice signal has an abnormal sound, so as to determine whether the sound output device is abnormal.
  • determining whether the first voice signal has an abnormal sound according to the residual signal comprises: determining an energy value of the residual signal; and determining, according to the calculated energy value, whether the first voice signal has an abnormal sound.
  • determining the energy value of the residual signal includes: removing the voice main band energy from the residual signal to obtain a residual signal with the voice main band energy removed, where the removed voice main band energy lies at frequencies below a first frequency value; and then determining the energy value of the residual signal with the voice main band energy removed.
  • determining the energy value of the residual signal with the voice main band energy removed includes: determining the portion of that residual signal whose frequency is greater than a second frequency value, and then calculating the energy value of that portion in each frame.
  • determining whether the first voice signal has an abnormal sound includes the following process:
  • if the energy value of each frame is smaller than the first energy threshold corresponding to that energy value, it may be determined that the first voice signal does not have an abnormal sound, and that the sound output device is normal.
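The per-frame check described above can be sketched as follows. This is a minimal illustration, not the patent's implementation: the function names, the naive DFT, the non-overlapping framing, and the one-sided energy scaling are all assumptions, and the source does not give values for the second frequency value or the per-frame thresholds.

```python
import cmath
import math


def high_band_frame_energies(residual, sample_rate, frame_len, cutoff_hz):
    """Per-frame energy of `residual` in DFT bins strictly above `cutoff_hz`.

    `cutoff_hz` plays the role of the "second frequency value"; frames are
    non-overlapping and a trailing partial frame is ignored (assumptions).
    """
    energies = []
    for start in range(0, len(residual) - frame_len + 1, frame_len):
        frame = residual[start:start + frame_len]
        energy = 0.0
        # Inspect bins up to Nyquist; bin k sits at k * sample_rate / frame_len Hz.
        for k in range(frame_len // 2 + 1):
            if k * sample_rate / frame_len <= cutoff_hz:
                continue
            coeff = sum(
                x * cmath.exp(-2j * math.pi * k * n / frame_len)
                for n, x in enumerate(frame)
            )
            energy += abs(coeff) ** 2 / frame_len
        energies.append(energy)
    return energies


def no_abnormal_sound(energies, thresholds):
    """True when every frame energy is below its own (first) energy threshold."""
    return all(e < t for e, t in zip(energies, thresholds))
```

A residual that only contains low-frequency content yields near-zero energies and passes the check, while a residual with a strong component above the cutoff fails it for any threshold smaller than that component's energy.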
  • determining the energy value of the residual signal with the voice main band energy removed includes: determining the portion of that residual signal whose frequency is greater than the second frequency value; calculating the energy value of that portion in each frame; and then calculating the energy maximum, which is the largest of the per-frame energy values.
  • determining whether the first voice signal has an abnormal sound includes the following process:
  • if the energy maximum is greater than or equal to the second energy threshold, it may be determined that the first voice signal has an abnormal sound, and that the sound output device is abnormal;
  • if the energy maximum is less than the second energy threshold, it may be determined that the first voice signal does not have an abnormal sound, and that the sound output device is normal.
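The energy-maximum decision above reduces to a single comparison. The sketch below assumes the per-frame energies have already been computed; the function name and the string return values are illustrative only.

```python
def judge_by_energy_maximum(frame_energies, second_energy_threshold):
    """Classify the sound output device from the largest per-frame energy."""
    energy_maximum = max(frame_energies)
    # ">=" mirrors the text: reaching the threshold already counts as abnormal.
    if energy_maximum >= second_energy_threshold:
        return "abnormal"
    return "normal"
```

Compared with the per-frame variant, this form needs only one threshold for the whole recording rather than one per frame.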
  • before obtaining a residual signal according to the pre-acquired voice reference signal and the first voice signal, the method further includes: acquiring a second voice signal played by at least one other sound output device, where each of the other sound output devices is a sound output device that plays normal sound, and the voice content in the second voice signal is the same as the voice content in the first voice signal; and then performing signal superposition processing on the second voice signals to generate the voice reference signal.
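One possible reading of the superposition step is a sample-wise average of recordings from known-good devices. The source does not say how the signals are superimposed or normalized, so the averaging, the truncation to the shortest recording, and the function name below are assumptions; the recordings are also assumed to be time-aligned already.

```python
def build_voice_reference(second_voice_signals):
    """Average several known-good recordings into one voice reference signal."""
    if not second_voice_signals:
        raise ValueError("at least one known-good recording is required")
    # Truncate to the shortest recording so every sample has all contributors.
    length = min(len(signal) for signal in second_voice_signals)
    count = len(second_voice_signals)
    return [
        sum(signal[n] for signal in second_voice_signals) / count
        for n in range(length)
    ]
```

Averaging keeps the reference at the same scale as a single recording while reducing noise that is uncorrelated between devices.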
  • before obtaining a residual signal according to the pre-acquired voice reference signal and the first voice signal, the method further includes: performing delay alignment processing on the first voice signal and the voice reference signal in the time domain, to generate a first voice signal that is aligned with the voice reference signal.
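Delay alignment in the time domain is commonly done by searching for the lag that maximizes the cross-correlation between the two signals. The source does not name a technique, so the brute-force search below, the `max_lag` parameter, and the zero padding are assumptions made for illustration.

```python
def estimate_delay(signal, reference, max_lag):
    """Lag in samples by which `signal` trails `reference` (negative if it leads)."""
    best_lag, best_score = 0, float("-inf")
    for lag in range(-max_lag, max_lag + 1):
        score = 0.0
        for n, ref in enumerate(reference):
            m = n + lag
            if 0 <= m < len(signal):
                score += ref * signal[m]
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag


def align_to_reference(signal, reference, max_lag):
    """Shift `signal` so it lines up with `reference`, padding with zeros."""
    lag = estimate_delay(signal, reference, max_lag)
    if lag >= 0:
        aligned = signal[lag:]
    else:
        aligned = [0.0] * (-lag) + signal
    aligned = aligned[:len(reference)]
    return aligned + [0.0] * (len(reference) - len(aligned))
```

The returned signal has the same length as the reference, which lets a later stage compare the two sample by sample.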
  • the technical solution provided by the embodiments of the present application may include the following beneficial effects: the first voice signal played by the sound output device of the terminal device is obtained, where the first voice signal is stored locally in the terminal device and includes audio information whose frequency changes irregularly; the residual signal is obtained according to the pre-acquired voice reference signal and the first voice signal, where the residual signal is the portion of the first voice signal that differs from the voice reference signal; and whether the first voice signal has an abnormal sound is determined according to the residual signal, so as to determine whether the sound output device is abnormal. This provides a way of detecting whether an abnormal sound occurs when the sound output device plays audio, and thus of determining whether the sound output device is abnormal.
  • the voice signal can represent the user's real usage scenario, and throughout its playback it repeatedly excites the frequency band in which speech is actually concentrated, which makes problematic frequencies easier to find.
  • the signal to be detected is a voice signal stored locally in the terminal device and played by the sound output device, which avoids abnormal sound caused by packet loss or mixed-in noise during transmission of the voice signal.
  • the detection method is convenient and broadly applicable, and the accuracy of the detection result is improved.
  • an abnormal sound detecting apparatus including:
  • an acquiring unit configured to acquire a first voice signal that is played by a sound output device of the terminal device, where the first voice signal is locally stored in the terminal device, and the first voice signal includes audio information with irregular frequency changes;
  • a calculating unit configured to obtain, according to the pre-acquired voice reference signal and the first voice signal, a residual signal, where the residual signal includes a portion of the first voice signal that is different from the signal of the voice reference signal;
  • a determining unit configured to determine, according to the residual signal, whether the first voice signal has an abnormal sound, thereby determining whether the sound output device is abnormal.
  • the determining unit includes: a first determining module, configured to determine an energy value of the residual signal; and a second determining module, configured to determine, according to the calculated energy value, whether the first voice signal has an abnormal sound.
  • the first determining module comprises:
  • a determining sub-module, configured to determine the energy value of the residual signal with the voice main band energy removed.
  • the determining sub-module is specifically configured to: determine the portion of the residual signal with the voice main band energy removed whose frequency is greater than the second frequency value, and then calculate the energy value of that portion in each frame.
  • the second determining module is specifically used for:
  • if the energy value of each frame is smaller than the first energy threshold corresponding to that energy value, it may be determined that the first voice signal does not have an abnormal sound, and that the sound output device is normal.
  • the determining sub-module is specifically configured to: determine the portion of the residual signal with the voice main band energy removed whose frequency is greater than the second frequency value, calculate the energy value of that portion in each frame, and then calculate the energy maximum, which is the largest of the per-frame energy values.
  • the second determining module is specifically used for:
  • if the energy maximum is greater than or equal to the second energy threshold, it may be determined that the first voice signal has an abnormal sound, and that the sound output device is abnormal;
  • if the energy maximum is less than the second energy threshold, it may be determined that the first voice signal does not have an abnormal sound, and that the sound output device is normal.
  • the device further includes:
  • a generating unit, configured to: before the calculating unit obtains a residual signal according to the pre-acquired voice reference signal and the first voice signal, acquire a second voice signal played by at least one other sound output device, where each of the other sound output devices is a sound output device that plays normal sound, and the voice content in the second voice signal is the same as the voice content in the first voice signal; and then perform signal superposition processing on the second voice signals to generate the voice reference signal.
  • the device further includes:
  • an aligning unit, configured to perform delay alignment processing on the first voice signal and the voice reference signal in the time domain before the calculating unit obtains a residual signal according to the pre-acquired voice reference signal and the first voice signal, to generate a first voice signal that is aligned with the voice reference signal.
  • the present application provides a computer program for performing the method of the above first aspect when executed by a processor.
  • the application provides a program product, such as a computer readable storage medium, comprising the program of the third aspect.
  • a computer program product comprising instructions for causing a computer to perform the methods of the above aspects when run on a computer is provided.
  • the first voice signal played by the sound output device of the terminal device is obtained, where the first voice signal is stored locally in the terminal device and includes audio information whose frequency changes irregularly; the residual signal is obtained according to the pre-acquired voice reference signal and the first voice signal, where the residual signal is the portion of the first voice signal that differs from the voice reference signal; and whether the first voice signal has an abnormal sound is determined according to the residual signal, so as to determine whether the sound output device is abnormal.
  • FIG. 1 is a first schematic diagram of an application scenario according to an embodiment of the present application.
  • FIG. 2 is a first schematic flowchart of a method for detecting an abnormal sound according to an embodiment of the present application.
  • FIG. 3 is a schematic diagram of an adaptive filtering method used in an abnormal sound detecting method according to an embodiment of the present application.
  • FIG. 4 is a second schematic flowchart of a method for detecting an abnormal sound according to an embodiment of the present application.
  • FIG. 5 is a schematic flowchart diagram of still another abnormal sound detecting method according to an embodiment of the present application.
  • FIG. 6 is an energy curve diagram of still another abnormal sound detecting method according to an embodiment of the present application.
  • FIG. 7 is a schematic flowchart diagram of another method for detecting an abnormal sound according to an embodiment of the present application.
  • FIG. 8 is a schematic flowchart diagram of still another abnormal sound detecting method according to an embodiment of the present application.
  • FIG. 9 is a schematic structural diagram of an abnormal sound detecting apparatus according to an embodiment of the present application.
  • FIG. 10 is a schematic structural diagram of still another abnormal sound detecting apparatus according to an embodiment of the present application.
  • FIG. 11 is a schematic structural diagram of another abnormal sound detecting apparatus according to an embodiment of the present application.
  • the embodiments of the present application may be applied to an abnormal sound detecting device, an audio detecting system, or any system that can perform the embodiments of the present application. Some of the terms in the present application are explained below to facilitate understanding by those skilled in the art. It should be noted that when the solution of the embodiments is applied to an audio detection system or to any other system that can perform the embodiments, the names of the audio detection system and the abnormal sound detection device may change, but this does not affect the implementation of the solution.
  • a terminal device also referred to as a terminal or user device, is a device that provides voice and/or data connectivity to a user, for example, a handheld device having a wireless connection function, an in-vehicle device, and the like.
  • Common terminal devices include, for example, a mobile phone, a tablet computer, a notebook computer, a palmtop computer, a mobile internet device (MID), and a wearable device. Wearable devices include, for example, a smart watch, a smart wristband, and a step counter.
  • a sound output device which is a device that can play an audio signal, for example, a speaker or a receiver; the sound output device can be disposed on the terminal device.
  • "Multiple" means two or more, and other quantifiers are similar. "and/or" describes the association relationship of the associated objects and indicates that three relationships may exist; for example, A and/or B may indicate three cases: A exists alone, A and B exist at the same time, and B exists alone.
  • the character "/" generally indicates an "or" relationship between the objects before and after it.
  • FIG. 1 is a first schematic diagram of an application scenario provided by an embodiment of the present application.
  • the embodiment of the present application uses the terminal device 01 and the abnormal sound detecting device 02.
  • the terminal device 01 is provided with a sound output device 03, which can play an audio signal.
  • when the sound output device 03 on the terminal device 01 plays an audio signal, the abnormal sound detecting device 02 acquires the audio signal played by the sound output device 03 on the terminal device 01, and then performs the solution of the embodiment of the present application.
  • the terminal device in the embodiment of the present application may refer to an access terminal, a user terminal, a terminal, a wireless communication device, a user agent, a user device, or the like.
  • examples of user terminals include a smart phone, a smart watch, a personal computer, and the like.
  • the sound output device in the implementation of the present application may be a speaker, a receiver, etc., and the sound output device in the implementation of the present application may be disposed on the terminal device in the embodiment of the present application.
  • FIG. 2 is a first schematic flowchart of a method for detecting an abnormal sound according to an embodiment of the present application. As shown in FIG. 2, the method includes:
  • the description below assumes that the execution subject is an abnormal sound detecting device.
  • the sound output device of the terminal device plays the first voice signal, and the abnormal sound detecting device can then acquire the first voice signal played by the sound output device.
  • the abnormal sound detecting device acquires the first voice signal played by the sound output device as follows: the voice has been pre-stored in the terminal device, and the sound output device of the terminal device plays the first voice signal according to the voice stored locally in the terminal device; the abnormal sound detecting device can then capture the first voice signal.
  • for example, the first voice signal may be the female-voice prompt "For the emergency center, please dial 120" that is heard when dialing 112.
  • the sound output device plays the voice "For the emergency center, please dial 120" stored locally in the terminal device.
  • a female voice is preferred because its fundamental frequency is higher than that of a male voice, it covers a wider frequency band, and its frequency energy distribution along the time axis is more diverse.
  • the frequency sweep signal and the voice signal differ greatly.
  • the signal to be detected in the prior art is a frequency sweep signal, whose frequency changes from high to low or from low to high. Each frequency point in the sweep signal lasts only a short time; when a certain frequency point has not yet excited the higher harmonic energy, the sweep has already moved on to the next frequency point, so problems that may occur at that frequency point are not detected. The present application uses a voice signal instead.
  • the present application obtains the first voice signal played by the sound output device. The first voice signal contains audio information whose frequency changes irregularly: the duration of each frequency point in the signal varies, and so does the way the frequency changes. Throughout its playback, the first voice signal repeatedly excites the frequency band in which speech is actually concentrated, which helps reveal anomalies at problematic frequencies.
  • an abnormal sound is usually generated at a very narrow, individual resonance frequency. In the prior art, when a frequency sweep signal is used as the signal to be detected, the sweep advances in discrete steps and the frequency points are not continuous, so the true problem frequency is very likely to be missed during the sweep. In the present application, the voice signal itself contains the real frequency points to be detected, so the probability of missing a problem frequency is much smaller, which helps detect the frequencies at which abnormal sound occurs.
  • the abnormal sound detecting device has previously acquired the voice reference signal, wherein the voice content of the voice reference signal is the same as the voice content of the first voice signal.
  • for example, if the voice content of the first voice signal is "Hello, please dial 00", the voice content of the voice reference signal is also "Hello, please dial 00".
  • the abnormal sound detecting device uses the voice reference signal to perform adaptive filtering on the first voice signal to be detected. This removes the portion of the first voice signal that is consistent with the voice reference signal and retains the portion that differs from it; the retained portion is the residual signal.
  • the abnormal sound detecting device may also use another filtering method to filter the first voice signal to be detected according to the voice reference signal and obtain a residual signal.
  • the residual signal may include the portion of the first voice signal that differs from the voice reference signal; the residual signal may also include some signal information of the first voice signal, or some signal information of the voice reference signal.
  • FIG. 3 is a schematic diagram of an adaptive filtering method used in an abnormal sound detecting method according to an embodiment of the present invention.
  • x is a first voice signal
  • d is a voice reference signal
  • e is the residual signal.
  • the idea of adaptive filtering is to continually adjust the filter, driven by the value of e under some criterion, so that the filtered x (that is, y) approaches the voice reference signal d.
  • x(j) represents the value of the input first voice signal at time j; y(j) represents the value of the filtered first voice signal at time j; d(j) represents the value of the voice reference signal at time j.
  • the residual signal e(j) is the difference between d(j) and y(j).
  • the filtering parameters of the adaptive filter are controlled by the residual signal e(j): they are adjusted automatically according to the value of e(j), so that the value of y(j) output at the next moment is closer to the desired voice reference signal d(j).
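The update loop described above can be illustrated with a normalized LMS (NLMS) filter. The source only says the filter parameters are adjusted "by some criterion", so NLMS is a swapped-in, commonly used choice rather than the patent's stated algorithm; the tap count, step size, and function name are assumptions. The symbols x, d, y, and e follow the definitions given for FIG. 3.

```python
def nlms_residual(x, d, num_taps=8, step=0.5, eps=1e-8):
    """Return e(j) = d(j) - y(j) while adapting an FIR filter applied to x."""
    weights = [0.0] * num_taps
    history = [0.0] * num_taps  # most recent input sample first
    residual = []
    for x_j, d_j in zip(x, d):
        history = [x_j] + history[:-1]
        # y(j): the filtered first voice signal at time j.
        y_j = sum(w * h for w, h in zip(weights, history))
        # e(j): what the filter could not explain about the reference.
        e_j = d_j - y_j
        # NLMS update: move the weights along e(j) * input, scaled by input power.
        norm = sum(h * h for h in history) + eps
        weights = [w + step * e_j * h / norm for w, h in zip(weights, history)]
        residual.append(e_j)
    return residual
```

When d is a purely linear, short-delay function of x, the residual decays toward zero as the weights converge; content that the filter cannot model linearly remains in e(j), which is the part the later energy analysis inspects.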
  • S103 Determine, according to the residual signal, whether the first voice signal has an abnormal sound to determine whether the sound output device is abnormal.
  • the abnormal sound detecting device analyzes whether the obtained residual signal has an abnormal signal, and further determines whether the first voice signal has an abnormal sound. When it is determined that the first speech signal has an abnormal sound, it is determined that the sound output device is abnormal; when it is determined that the first speech signal does not have an abnormal sound, it is determined that the sound output device is normal.
  • FIG. 4 is a second schematic flowchart of a method for detecting an abnormal sound according to an embodiment of the present application. As shown in FIG. 4, the process includes:
  • the abnormal sound detecting device starts its recording function.
  • the abnormal sound detecting device activates its own recording function.
  • the sound output device of the terminal device plays the first voice signal, and the abnormal sound detecting device acquires the first voice signal played by the sound output device of the terminal device, where the first voice signal is stored locally in the terminal device.
  • the voice is pre-stored in the terminal device, and the sound output device of the terminal device plays the first voice signal according to the voice stored locally in the terminal device; the abnormal sound detecting device can then capture the first voice signal.
  • the process of this step can be referred to step S101 provided in FIG. 2, and the principle and process are the same as step S101.
  • the abnormal sound detecting device saves the first voice signal.
  • the abnormal sound detecting device saves the first voice signal that has been recorded.
  • the abnormal sound detecting device acquires a voice reference signal.
  • the abnormal sound detecting device acquires a voice reference signal, wherein the voice content of the voice reference signal is the same as the voice content of the first voice signal.
  • the abnormal sound detecting device runs an abnormal sound detecting algorithm.
  • the abnormal sound detecting device runs the abnormal sound detecting algorithm, whose process includes S102 and S103 shown in FIG. 2, and thereby determines whether there is an abnormal sound in the first voice signal, so as to determine whether the sound output device is abnormal.
  • the abnormal sound detecting device outputs the detection result.
  • the abnormal sound detecting device outputs the detection result obtained in S205: when it is determined that the first voice signal has an abnormal sound, the sound output device is determined to be abnormal; when it is determined that the first voice signal does not have an abnormal sound, the sound output device is determined to be normal.
  • one existing method has the sound output device play a frequency sweep signal, obtains the frequency sweep signal played by the sound output device, and calculates the 12th to 15th harmonic energy of the frequency sweep signal; according to that harmonic energy, it determines whether there is an abnormal sound in the frequency sweep signal, so as to determine whether the sound output device is abnormal.
  • the signal to be detected is still a frequency sweep signal.
  • the detection result is judged to be no abnormal sound, but when the terminal device is actually used to play the sound source, the user may hear the obvious abnormal sound.
  • another existing method acquires an audio signal transmitted over a communication network; acquires the frequency domain energy distribution parameter of the current frame of the audio signal and of each frame within a preset neighborhood of the current frame; acquires the pitch parameter of the current frame and of each frame within the preset neighborhood of the current frame; and determines, according to these pitch parameters, whether the current frame is in a voice segment. If the current frame is in a voice segment and, among all the frequency domain energy distribution parameters, the number of parameters falling within a preset voice-like noise frequency domain energy distribution parameter interval is greater than or equal to a first threshold, the current frame is determined to be voice-like noise.
  • first, the audio signal to be detected is an audio signal transmitted by the communication network; during transmission, packets of the audio signal may be lost, or external noise may be mixed into the audio signal. Thus, if this existing method detects voice-like noise, the noise may have been caused by packet loss or by mixed-in noise during transmission, so it cannot be determined whether the noise is caused by a defect of the sound output device itself, and the existing method is not accurate.
  • second, the existing method analyzes the frequency domain energy distribution parameter of the audio signal and compares it with a preset frequency domain energy distribution parameter interval to determine whether there is noise in the audio signal. This detection method targets one type of audio signal, but different types of terminal devices differ greatly in design process, assembly process, and electro-acoustic device selection, so the same type of audio signal played by different terminal devices also differs greatly in its frequency domain characteristics. This makes it very difficult to preset the frequency domain energy distribution parameter interval, and the resulting poor generality can also cause inaccurate detection results.
  • in the present application, the process of FIG. 2 or FIG. 4 is adopted. Since the signal to be detected is a voice signal, it can represent a real use scenario of the user, and throughout its playback the voice signal repeatedly excites the frequency band in which speech is actually concentrated, which helps to expose abnormalities at problematic frequency points; moreover, the speech signal itself covers the real frequency points that need to be detected, so the possibility of missing a problematic frequency point is much smaller, which helps to detect the frequency points at which abnormal sound occurs. Meanwhile, in the present application, the signal to be detected is a voice signal that is stored locally in the terminal device and played by the sound output device, not a signal transmitted from a communication network, thereby avoiding packet loss during transmission of the voice signal.
  • in addition, the residual signal may include the portion of the first voice signal that differs from the voice reference signal, and the residual signal is then examined to determine whether there is an abnormal sound in the first voice signal. Because the first voice signal has the same voice content as the voice reference signal, the detection is convenient and more versatile, and the detection result is more accurate than that of the method that analyzes noise using the frequency domain energy distribution parameter of the audio signal.
  • in this embodiment, the first voice signal played by the sound output device of the terminal device is acquired, wherein the first voice signal is stored locally in the terminal device and includes audio information whose frequency changes irregularly; a residual signal is obtained according to a pre-acquired voice reference signal and the first voice signal, wherein the residual signal includes the portion of the first voice signal that differs from the voice reference signal; and whether the first voice signal has an abnormal sound is determined according to the residual signal, so as to determine whether the sound output device is abnormal.
  • the voice signal can represent the real use scenario of the user, and the entire playback process of the voice signal is repeatedly triggered in the actual frequency band concentrated in the voice, thereby facilitating the discovery of the problematic frequency.
  • the signal to be detected is a voice signal stored locally in the terminal device played by the sound output device, thereby avoiding the problem that the voice signal is lost during the transmission process, or the noise is caused by the doping noise.
  • the residual signal includes the portion of the first voice signal that differs from the voice reference signal, and the residual signal is then examined to determine whether there is an abnormal sound in the first voice signal; because the first voice signal has the same voice content as the voice reference signal, the detection is convenient and versatile, and the accuracy of the detection result is improved.
  • FIG. 5 is a schematic flowchart diagram of still another abnormal sound detecting method according to an embodiment of the present application. As shown in FIG. 5, the method includes:
  • step S101 in the flowchart 1 of the abnormal sound detecting method provided in FIG. 2 and the step S202 in the flowchart 2 of the abnormal sound detecting method provided in FIG. 4 are referred to.
  • a plurality of normal sound output devices that can play sound normally can each be used to play the same second voice signal; the second voice signal played by each normal sound output device is stored in the terminal device corresponding to that sound output device, and the voice content of the second voice signal is the same as the voice content of the first voice signal.
  • the abnormal sound detecting device separately records the second voice signal played by each normal sound output device.
  • the abnormal sound detecting means performs signal superimposition processing on each of the second speech signals to obtain a voice reference signal, wherein the voice content of the voice reference signal and the voice content in the second voice signal are the same.
  • the signal superposition processing can be performed in any of the following modes.
  • the first mode is: the abnormal sound detecting device splices the second voice signals together to obtain a voice reference signal.
  • the second mode is that the abnormal sound detecting device superimposes each of the second voice signals in the time domain to obtain a voice reference signal.
  • the third mode is: the abnormal sound detecting device examines each second voice signal in each frequency band, filters out the frequency bands in which the signal exceeds a preset frequency range, and then synthesizes the filtered second voice signals to obtain a voice reference signal.
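  • the second mode above (time domain superposition) can be sketched as follows. This is a minimal illustration, not the method fixed by the application: it assumes the recordings share one sample rate and are already time-aligned, and it divides the sum by the number of recordings so that the reference keeps the level of a single recording. The function name `superpose_time_domain` and the sample values are hypothetical.

```python
def superpose_time_domain(recordings):
    """Average several recordings sample by sample.

    Assumption: all recordings use the same sample rate and start at the
    same instant. Dividing by the count is a normalization choice made
    for this sketch; the source only says the signals are superimposed.
    """
    if not recordings:
        raise ValueError("need at least one recording")
    length = min(len(r) for r in recordings)  # truncate to the shortest one
    count = len(recordings)
    return [sum(r[i] for r in recordings) / count for i in range(length)]


# Three hypothetical recordings of the same second voice signal.
reference = superpose_time_domain([
    [0.0, 1.0, 0.0, -1.0],
    [0.0, 0.8, 0.2, -1.0],
    [0.0, 1.2, -0.2, -1.0, 0.5],
])
```

Averaging several normal devices tends to suppress device-specific deviations, which is one plausible reason to build the reference from more than one recording.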
  • the first voice signal is delay-aligned with the voice reference signal in the time domain to generate a first voice signal aligned with the voice reference signal.
  • the abnormal sound detecting device performs delay alignment processing on the first voice signal and the voice reference signal in the time domain, so that the first voice signal is aligned with the voice reference signal in the time domain, yielding the first voice signal aligned with the voice reference signal.
  • a delay alignment algorithm may be used to align the first speech signal with the speech reference signal in the time domain, for example the Generalized Cross-Correlation (GCC) algorithm, the Least Mean Square (LMS) algorithm, subspace-based Eigen-Value Decomposition (EVD), or the ratio of Acoustic Transfer Functions (ATF-s ratio).
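  • delay alignment can be illustrated with a plain time domain cross-correlation search. This is a simplified stand-in for the algorithms named above (GCC additionally weights the cross-spectrum in the frequency domain), and the helper names `estimate_delay` and `align`, the search range, and the sample values are assumptions made for this sketch. The last line forms a residual by direct subtraction, which is only one possible way to obtain the part of the first voice signal that differs from the reference.

```python
def estimate_delay(signal, reference, max_lag):
    """Return the lag (in samples) at which `signal` best matches `reference`.

    A positive lag means `signal` starts later than `reference`.
    """
    best_lag, best_score = 0, float("-inf")
    for lag in range(-max_lag, max_lag + 1):
        score = 0.0
        for i, ref_sample in enumerate(reference):
            j = i + lag
            if 0 <= j < len(signal):
                score += ref_sample * signal[j]
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag


def align(signal, lag, length):
    """Shift `signal` by `lag` samples, padding with zeros to `length`."""
    return [signal[i + lag] if 0 <= i + lag < len(signal) else 0.0
            for i in range(length)]


reference = [0.0, 1.0, -2.0, 3.0, -1.0, 0.0]
recorded = [0.0, 0.0] + reference  # same content, delayed by 2 samples
lag = estimate_delay(recorded, reference, max_lag=4)
aligned = align(recorded, lag, len(reference))
residual = [a - r for a, r in zip(aligned, reference)]  # assumed: subtraction
```

The brute-force search costs on the order of `len(reference) * max_lag` multiplications; for long recordings an FFT-based correlation would be the usual choice.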
  • step S102 in the flow chart of the abnormal sound detecting method provided in FIG. 2 and the step S205 in the flow chart 2 of the abnormal sound detecting method provided in FIG. 4 are referred to.
  • S305 specifically includes: removing the voice main band energy from the residual signal to generate a residual signal with the voice main band energy removed, wherein the frequency of the voice main band energy is less than the first frequency value; and determining an energy value of the residual signal with the voice main band energy removed.
  • determining the energy value of the residual signal with the voice main band energy removed includes: determining, for the portion of that residual signal whose frequency is greater than a second frequency value, the energy value in each frame.
  • the abnormal sound detecting device first needs to calculate the energy value of the residual signal. The main energy of the speech in the residual signal lies at low frequencies, and the energy of this voice main band is greater than the energy of the high-frequency abnormal sound portion of the residual signal, so even slight fluctuations in the voice main band energy directly affect the judgment of the high-frequency abnormal sound energy in the residual signal. The voice main band energy therefore has to be filtered out of the residual signal: the abnormal sound detecting device first applies high-pass filtering to the residual signal to remove the voice main band energy, whose frequency is less than the first frequency value, and obtains a residual signal with the voice main band energy removed.
  • high-pass filtering is a filtering method in which high-frequency signals pass through the high-pass filter normally, while low-frequency signals below the set threshold are blocked and attenuated by the high-pass filter, so that the filter outputs the high-frequency signal.
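  • the behaviour of a high-pass filter can be illustrated with a first-order (RC-style) filter. The source does not say which filter design is used, so this is only a sketch; a practical implementation would probably use a steeper filter. The function name `high_pass`, the 48 kHz sample rate, and the test signals are assumptions.

```python
import math


def high_pass(samples, cutoff_hz, sample_rate_hz):
    """First-order high-pass filter: y[n] = a * (y[n-1] + x[n] - x[n-1])."""
    rc = 1.0 / (2.0 * math.pi * cutoff_hz)
    dt = 1.0 / sample_rate_hz
    alpha = rc / (rc + dt)
    out = []
    prev_in = prev_out = 0.0
    for x in samples:
        y = alpha * (prev_out + x - prev_in)
        out.append(y)
        prev_in, prev_out = x, y
    return out


# A constant (0 Hz) input is blocked; an input alternating at half the
# sample rate (the highest representable frequency) passes through.
dc_out = high_pass([1.0] * 200, 4000.0, 48000.0)
nyquist_in = [1.0 if i % 2 == 0 else -1.0 for i in range(200)]
nyquist_out = high_pass(nyquist_in, 4000.0, 48000.0)
```

After the initial transient, `dc_out` decays toward zero while `nyquist_out` keeps most of its amplitude, which matches the pass/block rule described above.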
  • the sampling rate of the sampled speech signal is 8 kHz.
  • the frequency of the main energy band of the speech in the sampled speech signal can be calculated to be below 4 kHz.
  • the energy of the main frequency band of the speech is much stronger than the energy of the higher harmonics.
  • the result of analyzing the speech spectrum of the speech reference signal is that the speech reference signal is very clean and the energy of the higher harmonics is hardly seen.
  • the portion of the energy of the higher harmonics represents the portion of the abnormal signal in the speech signal.
  • the residual signal can be analyzed in the same way: the energy of the voice main band portion of the residual signal is stronger than the energy of the higher harmonics. If the residual signal is not high-pass filtered, then in the frequency domain the energy of the higher harmonics accounts for only a small fraction of the total energy of the residual signal; slight fluctuations or changes in the voice main band energy are then likely to be larger than the changes caused by the higher harmonics, which seriously affects the judgment of whether higher harmonics are present in the residual signal, and thus the judgment of whether the residual signal has an abnormal sound.
  • the high-pass filter can therefore be used to filter out the voice main band energy whose frequency is less than the first frequency value; the energy left in the residual signal is then mainly the energy of the higher harmonics, that is, the energy of the abnormal sound portion of the signal.
  • the first frequency value can be set to 4 kHz.
  • the abnormal sound detecting means calculates the energy value for the residual signal from which the energy of the main band of the voice is removed.
  • specifically, the abnormal sound detecting means can calculate, for the portion of the residual signal with the voice main band energy removed whose frequency is greater than the second frequency value, the energy value in each frame.
  • the energy value of the residual signal with the voice main band energy removed is also called the out-of-band energy.
  • in one case, the high-pass filtered residual signal contains no signal whose frequency is less than the first frequency value, so the time domain energy of the high-pass filtered residual signal can be calculated directly in the time domain to obtain the energy value of the residual signal with the voice main band energy removed.
  • in another case, the high-pass filtered residual signal still contains signal components whose frequency is lower than the first frequency value, so the energy of the high-pass filtered residual signal has to be calculated in the frequency domain, which ensures that the energy of components whose frequency is less than the first frequency value is not counted.
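  • the frequency domain calculation can be sketched with a direct discrete Fourier transform that sums only the bins above a chosen frequency. The source does not specify the transform or its scaling; the function name `energy_above`, the Parseval-style scaling (so that the full-band result equals the time domain sum of squares), and the test tone are assumptions. A real implementation would use an FFT rather than this O(n²) loop.

```python
import cmath
import math


def energy_above(frame, sample_rate_hz, min_freq_hz):
    """Energy of one frame contained in DFT bins strictly above min_freq_hz."""
    n = len(frame)
    energy = 0.0
    for k in range(n // 2 + 1):
        freq = k * sample_rate_hz / n
        if freq <= min_freq_hz:
            continue  # skip the band that must not be counted
        coeff = sum(x * cmath.exp(-2j * math.pi * k * i / n)
                    for i, x in enumerate(frame))
        # Non-DC, non-Nyquist bins also stand for their mirrored negative bin.
        weight = 1.0 if (k == 0 or (n % 2 == 0 and k == n // 2)) else 2.0
        energy += weight * abs(coeff) ** 2 / n
    return energy


# Hypothetical frame: a 2 kHz tone sampled at 16 kHz, 16 samples long.
frame = [math.sin(2.0 * math.pi * 2.0 * i / 16.0) for i in range(16)]
above_1k = energy_above(frame, 16000.0, 1000.0)  # tone lies above 1 kHz
above_3k = energy_above(frame, 16000.0, 3000.0)  # tone lies below 3 kHz
```

With this scaling, `above_1k` matches the time domain sum of squares of the tone, while `above_3k` is close to zero because the tone is excluded.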
  • the abnormal sound detecting device performs the calculation on the portion of the residual signal with the voice main band energy removed whose frequency is greater than the second frequency value, where the second frequency value can be set equal to the first frequency value or, according to actual requirements, greater than the first frequency value. The abnormal sound detecting device calculates the energy value E_thr n of this portion in each frame, that is, one energy value E_thr n is obtained per frame, where the energy value of a frame is the sum of the squares of the amplitude values of the points in the frame; the abnormal sound detecting device then fits the energy values E_thr n to an energy curve, which is compared with a preset energy curve.
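  • the per-frame energy described above, the sum of the squares of the amplitude values in a frame, can be sketched as follows. The frame length, the use of non-overlapping frames, and the dropping of a trailing partial frame are assumptions; the source does not state them. The function name `frame_energies` is hypothetical.

```python
def frame_energies(samples, frame_len):
    """Return one energy value per full frame: the sum of squared amplitudes."""
    energies = []
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        energies.append(sum(x * x for x in frame))
    return energies


# Seven hypothetical samples, frames of 2 samples; the last sample is dropped.
energies = frame_energies([1.0, -1.0, 2.0, 0.0, 0.5, 0.5, 3.0], frame_len=2)
```

The resulting list plays the role of the sequence of E_thr n values from which the measured energy curve is formed.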
  • S306. Determine, according to the energy value, whether the first voice signal has an abnormal sound to determine whether the sound output device is abnormal.
  • S306 specifically includes: when it is determined that, among the energy values of the frames, there is not a preset number of energy values that are each less than the first energy threshold corresponding to that energy value, determining that the first voice signal has an abnormal sound and that the sound output device is abnormal; when it is determined that, among the energy values of the frames, there is a preset number of energy values that are each less than the first energy threshold corresponding to that energy value, determining that the first voice signal has no abnormal sound and that the sound output device is normal.
  • the abnormal sound detecting means compares the energy curve obtained from the energy values E_thr n with a preset energy curve, on which there is a first energy threshold for each energy value E_thr n. If the abnormal sound detecting device determines that the energy values E_thr n do not include a preset number of energy values that are each smaller than the corresponding first energy threshold, it may determine that the first voice signal has an abnormal sound.
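  • the comparison against the preset energy curve can be sketched as a count of frames whose energy stays below the corresponding threshold. The function name `is_abnormal_by_curve`, the parameter `required_below` (standing for the preset number), and all numeric values are assumptions made for illustration.

```python
def is_abnormal_by_curve(energies, thresholds, required_below):
    """Abnormal if fewer than `required_below` frames stay under their threshold."""
    below = sum(1 for e, t in zip(energies, thresholds) if e < t)
    return below < required_below


measured = [0.1, 0.2, 0.9, 0.1]   # hypothetical E_thr n values
preset = [0.5, 0.5, 0.5, 0.5]     # hypothetical first energy thresholds

strict = is_abnormal_by_curve(measured, preset, required_below=4)
lenient = is_abnormal_by_curve(measured, preset, required_below=3)
```

Setting the preset number equal to the number of frames requires every frame to stay under the curve; a smaller preset number tolerates a few outlier frames.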
  • FIG. 6 is an energy curve diagram of still another abnormal sound detecting method provided by an embodiment of the present application.
  • as shown in FIG. 6, the measured energy curve of the first voice signal is obtained by the method provided in this embodiment; the measured energy curve is the solid curve in FIG. 6, and the dotted curve is the preset energy curve. The measured energy curve is compared with the preset energy curve to determine whether each energy value E_thr n on the measured curve is smaller than the corresponding first energy threshold on the preset curve. FIG. 6 shows that the energy values E_thr n on the measured energy curve are not all smaller than their corresponding first energy thresholds, so it can be determined that the first voice signal has an abnormal sound and that the sound output device that plays the first voice signal is abnormal.
  • in this embodiment, the first voice signal played by the sound output device of the terminal device is acquired, wherein the first voice signal is stored locally in the terminal device and includes audio information whose frequency changes irregularly; a second voice signal played by at least one other sound output device is acquired, wherein the other sound output device plays sound normally and the voice content of the second voice signal is the same as that of the first voice signal; signal superposition processing is performed on the second voice signals to generate a voice reference signal; the first voice signal is delay-aligned with the voice reference signal in the time domain to generate a first voice signal aligned with the voice reference signal; a residual signal is obtained according to the voice reference signal and the aligned first voice signal, wherein the residual signal includes the portion of the first voice signal that differs from the voice reference signal; the voice main band energy, whose frequency is less than a first frequency value, is removed from the residual signal to generate a residual signal with the voice main band energy removed; for the portion of that residual signal whose frequency is greater than a second frequency value, the energy value in each frame is determined; and whether the first voice signal has an abnormal sound is determined according to the energy values, so as to determine whether the sound output device is abnormal.
  • the speech signal itself represents the real frequency point to be detected, so the possibility of missing the problem frequency point is much smaller, and it is advantageous to detect the frequency point with abnormal sound.
  • the signal to be detected is a voice signal stored locally in the terminal device played by the sound output device, thereby avoiding the problem that the voice signal is lost during the transmission process, or the noise is caused by the doping noise.
  • the residual signal includes the portion of the first voice signal that differs from the voice reference signal, and the residual signal is then examined to determine whether there is an abnormal sound in the first voice signal; because the first voice signal and the voice reference signal have the same voice content, the detection is convenient and versatile, and the accuracy of the detection result is improved.
  • FIG. 7 is a schematic flowchart diagram of another method for detecting an abnormal sound according to an embodiment of the present application. As shown in FIG. 7, the method includes:
  • Step S101 in the flowchart 1 of the abnormal sound detecting method provided in FIG. 2 referring to step S202 in the flowchart 2 of the abnormal sound detecting method provided in FIG. 4, Step S301 of the flow diagram of still another abnormal sound detecting method provided in FIG.
  • S402. Acquire a second voice signal played by at least one other sound output device, where the other sound output device is a sound output device that plays a normal sound, and the voice content in the second voice signal is the same as the voice content in the first voice signal. And performing signal superposition processing on each of the second speech signals to generate a speech reference signal.
  • this step refers to step S302 of the flowchart of the other abnormal sound detecting method provided in FIG. 5 .
  • this step refers to step S303 of the flowchart of the other abnormal sound detecting method provided in FIG. 5 .
  • the S405 specifically includes: removing the voice main band energy in the residual signal, and generating a residual signal with the voice main band energy removed, wherein the frequency of the voice main band energy is smaller than the first frequency value; determining that the voice is removed The energy value of the residual signal of the main band energy.
  • determining an energy value of the residual signal from which the energy of the main band of the voice is removed includes: determining a portion of the residual signal in which the energy of the main band of the voice is removed is greater than a second frequency value, and energy in each frame Value; the energy maximum is determined, wherein the energy maximum is the largest of the energy values of the frames.
  • after the abnormal sound detecting device obtains the energy value E_thr n of each frame, it takes the maximum of the energy values E_thr n over all frames to obtain the energy maximum.
  • S406. Determine, according to the energy value, whether the first voice signal has an abnormal sound to determine whether the sound output device is abnormal.
  • S406 specifically includes: when it is determined that the energy maximum is greater than or equal to the second energy threshold, determining that the first voice signal has an abnormal sound and that the sound output device is abnormal; when it is determined that the energy maximum is less than the second energy threshold, determining that the first voice signal has no abnormal sound and that the sound output device is normal.
  • the abnormal sound detecting device compares and analyzes the obtained energy maximum value with a second energy threshold value, and if the abnormal sound detecting device determines that the energy maximum value is greater than or equal to the second energy threshold value, determining the The first voice signal has an abnormal sound, and determines that the sound output device that plays the first voice signal is abnormal; if the abnormal sound detecting device determines that the energy maximum value is less than the second energy threshold, determining the first voice signal There is no abnormal sound in it, and it is determined that the sound output device that plays the first voice signal is normal.
  • alternatively, the energy values E_thr n of the frames may be averaged to obtain an energy average; in S406, the abnormal sound detecting device then compares the obtained energy average with a third energy threshold. If the abnormal sound detecting device determines that the energy average is greater than or equal to the third energy threshold, it determines that the first voice signal has an abnormal sound and that the sound output device that plays the first voice signal is abnormal; if it determines that the energy average is less than the third energy threshold, it determines that the first voice signal has no abnormal sound and that the sound output device that plays the first voice signal is normal.
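  • the two scalar decision rules of this embodiment, one based on the energy maximum and one based on the energy average, can be sketched as follows. The function names and all numeric values are hypothetical; the thresholds in practice would be calibrated for the device type under test.

```python
def is_abnormal_by_max(energies, second_threshold):
    """Abnormal if the largest per-frame energy reaches the second threshold."""
    return max(energies) >= second_threshold


def is_abnormal_by_mean(energies, third_threshold):
    """Abnormal if the average per-frame energy reaches the third threshold."""
    return sum(energies) / len(energies) >= third_threshold


measured = [0.1, 0.2, 0.9, 0.1]  # hypothetical E_thr n values
by_max = is_abnormal_by_max(measured, second_threshold=0.8)
by_mean = is_abnormal_by_mean(measured, third_threshold=0.5)
```

In this example a single loud frame trips the maximum rule but not the average rule, which shows that the maximum is more sensitive to short bursts of abnormal sound while the average responds to sustained abnormal sound.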
  • in this embodiment, the first voice signal played by the sound output device of the terminal device is acquired, wherein the first voice signal is stored locally in the terminal device and includes audio information whose frequency changes irregularly; a second voice signal played by at least one other sound output device is acquired, wherein the other sound output device plays sound normally and the voice content of the second voice signal is the same as that of the first voice signal; signal superposition processing is performed on the second voice signals to generate a voice reference signal; the first voice signal is delay-aligned with the voice reference signal in the time domain to generate a first voice signal aligned with the voice reference signal; a residual signal is obtained according to the voice reference signal and the aligned first voice signal, wherein the residual signal includes the portion of the first voice signal that differs from the voice reference signal; the voice main band energy, whose frequency is less than a first frequency value, is removed from the residual signal to generate a residual signal with the voice main band energy removed; for the portion of that residual signal whose frequency is greater than a second frequency value, the energy value in each frame is determined; the energy maximum, which is the largest of the energy values of the frames, is determined; and whether the first voice signal has an abnormal sound is determined according to the energy maximum, so as to determine whether the sound output device is abnormal.
  • the voice signal can represent the real use scenario of the user, and throughout its playback it repeatedly excites the frequency band in which speech is actually concentrated, which helps to expose abnormalities at problematic frequency points; moreover, in this application the speech signal itself covers the real frequency points that need to be detected, so the possibility of missing a problematic frequency point is much smaller, which helps to detect the frequency points at which abnormal sound occurs.
  • the signal to be detected is a voice signal stored locally in the terminal device played by the sound output device, thereby avoiding the problem that the voice signal is lost during the transmission process, or the noise is caused by the doping noise.
  • the residual signal includes the portion of the first voice signal that differs from the voice reference signal, and the residual signal is then examined to determine whether there is an abnormal sound in the first voice signal; because the first voice signal and the voice reference signal have the same voice content, the detection is convenient and versatile, and the accuracy of the detection result is improved.
  • FIG. 8 is a schematic flowchart diagram of still another abnormal sound detecting method according to an embodiment of the present application. As shown in FIG. 8, the method includes:
  • S505 Perform high-pass filtering on the residual signal to obtain a residual signal with the energy of the main band of the voice removed.
  • S506. Determine an energy value of the residual signal from which the energy of the main band of the voice is removed.
  • S508. Determine whether the energy value is greater than or equal to the energy threshold to determine whether the first voice signal has an abnormal sound to determine whether the sound output device is abnormal.
  • S5010 Determine that the sound output device is normal when the energy value is determined to be less than the energy threshold.
  • the steps of the flow schematic diagram of the other abnormal sound detecting method provided in FIG. 5 and the steps of the flow schematic diagram of another abnormal sound detecting method provided in FIG. 7 may be referred to in each step.
  • the principle and effect are the same as the principle and effect of the method provided by the above embodiments.
  • FIG. 9 is a schematic structural diagram of an abnormal sound detecting apparatus according to an embodiment of the present application. As shown in Figure 9, the device includes:
  • the acquiring unit 81 is configured to acquire a first voice signal that is played by the sound output device of the terminal device, where the first voice signal is locally stored in the terminal device, and the first voice signal includes audio information whose frequency changes randomly;
  • the calculating unit 82 is configured to obtain a residual signal according to the pre-acquired voice reference signal and the first voice signal, where the residual signal includes a portion of the first voice signal that is different from the signal of the voice reference signal;
  • the determining unit 83 is configured to determine, according to the residual signal, whether the first voice signal has an abnormal sound to determine whether the sound output device is abnormal.
  • the acquiring unit 81 may perform step S101 of the method shown in FIG. 2, or step S202 of the method shown in FIG. 4, or step S301 or step S401 of the methods described above.
  • the computing unit 82 may perform step S102 of the method illustrated in FIG. 2, or the computing unit 82 may perform step S205 of the method illustrated in FIG. 4, or the computing unit 82 may perform step S304 of the method illustrated in FIG. 5, or the computing unit 82 may perform Step S404 of the method shown in FIG.
  • the determining unit 83 may perform step S103 of the method illustrated in FIG. 2, or the determining unit 83 may perform step S205 of the method illustrated in FIG.
  • the abnormal sound detecting device of the embodiment shown in FIG. 9 can be used to perform the technical solution of the embodiment shown in FIG. 2 to FIG. 4 in the above method, and the implementation principle and technical effects are similar, and details are not described herein again.
  • FIG. 10 is a schematic structural diagram of still another abnormal sound detecting apparatus according to an embodiment of the present application.
  • the determining unit 83 includes:
  • the first determining module 831 is configured to determine an energy value of the residual signal.
  • the first determining module 831 can perform step S305 of the method shown in FIG. 5, or the first determining module 831 can perform step S405 of the method shown in FIG.
  • the second determining module 832 is configured to determine, according to the energy value, whether the first voice signal has an abnormal sound.
  • the second determining module 832 can perform step S306 of the method shown in FIG. 5, or the second determining module 832 can perform step S406 of the method shown in FIG.
  • the first determining module 831 includes:
  • the removal sub-module 8311 is configured to remove the voice main band energy in the residual signal, and generate a residual signal with the voice main band energy removed, wherein the frequency of the voice main band energy is smaller than the first frequency value.
  • the removal sub-module 8311 can perform the process of "removing the voice main band energy in the residual signal, and generating a residual signal with the voice main band energy removed, wherein the frequency of the voice main band energy is less than the first frequency value" in step S305 of the method shown in FIG. 5, or the corresponding process in step S405.
  • the determining sub-module 8312 is configured to determine an energy value of the residual signal from which the speech main band energy is removed.
  • the determining sub-module 8312 may perform the process of “determining the energy value of the residual signal with the voice mainband energy removed” in step S305 of the method shown in FIG. 5, or the determining sub-module 8312 may perform the process shown in FIG. 7.
  • the determining submodule 8312 is specifically configured to:
  • determine the energy value, in each frame, of the portion of the residual signal with the voice main band energy removed whose frequency is greater than the second frequency value.
  • the determining sub-module 8312 can perform the process of "determining the energy value of the residual signal with the voice main band energy removed" in step S305 of the method shown in FIG. 5.
  • the second determining module 832 is specifically configured to:
  • the second determining module 832 can perform step S306 of the method shown in FIG. 5.
  • the determining sub-module 8312 is specifically configured to:
  • the determining sub-module 8312 can perform the process of "determining the energy value of the residual signal with the voice main band energy removed" in step S405 of the method shown in FIG. 7.
  • the second determining module 832 is specifically configured to:
  • the second determining module 832 can perform step S406 of the method shown in FIG. 7.
  • the apparatus further includes:
  • the generating unit 91 is configured to: before the calculating unit 82 obtains the residual signal according to the pre-acquired voice reference signal and the first voice signal, acquire a second voice signal played by at least one other sound output device, wherein the other sound output device is a sound output device that plays a normal sound, and the voice content in the second voice signal is the same as the voice content in the first voice signal; and perform signal superposition processing on the second voice signals to generate the voice reference signal.
  • the generating unit 91 may perform step S302 of the method shown in FIG. 5, or the generating unit 91 may perform step S402 of the method shown in FIG. 7.
  • the aligning unit 92 is configured to: before the calculating unit 82 obtains the residual signal according to the pre-acquired voice reference signal and the first voice signal, perform time-delay alignment of the first voice signal with the voice reference signal in the time domain, to generate a first voice signal aligned with the voice reference signal.
  • the aligning unit 92 can perform step S303 of the method shown in FIG. 5, or the aligning unit 92 can perform step S403 of the method shown in FIG. 7.
  • the abnormal sound detecting device of the embodiment shown in FIG. 10 can be used to perform the technical solution of the embodiment shown in FIG. 5 to FIG. 8 in the above method, and the implementation principle and technical effects are similar, and details are not described herein again.
  • this embodiment does not depend on whether the embodiment shown in FIG. 9 is implemented, and the embodiment can be implemented independently.
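The reference generation, time-delay alignment, and residual computation described for the generating unit 91, the aligning unit 92, and the calculating unit 82 can be sketched as follows. This is only an illustration under stated assumptions, not the patented implementation: it assumes that "signal superposition" is a sample-wise average of equal-length recordings, that the delay is found by a brute-force cross-correlation search, and that the residual is a sample-wise difference. All function names and the `max_lag` parameter are hypothetical.

```python
from typing import List, Sequence


def make_reference(recordings: Sequence[Sequence[float]]) -> List[float]:
    """Superpose recordings from known-good devices (assumed here: sample-wise mean)."""
    n = len(recordings[0])
    if any(len(r) != n for r in recordings):
        raise ValueError("recordings must have equal length")
    return [sum(r[i] for r in recordings) / len(recordings) for i in range(n)]


def best_lag(signal: Sequence[float], reference: Sequence[float], max_lag: int) -> int:
    """Return the lag d in [-max_lag, max_lag] that maximizes the
    cross-correlation sum(signal[i + d] * reference[i])."""
    def score(d: int) -> float:
        return sum(signal[i + d] * reference[i]
                   for i in range(len(reference))
                   if 0 <= i + d < len(signal))
    return max(range(-max_lag, max_lag + 1), key=score)


def align_to_reference(signal: Sequence[float], reference: Sequence[float],
                       max_lag: int) -> List[float]:
    """Shift the signal by the best lag; samples shifted in from outside are zero."""
    d = best_lag(signal, reference, max_lag)
    return [signal[i + d] if 0 <= i + d < len(signal) else 0.0
            for i in range(len(reference))]


def residual(aligned: Sequence[float], reference: Sequence[float]) -> List[float]:
    """Part of the aligned signal that differs from the reference (assumed: difference)."""
    return [a - r for a, r in zip(aligned, reference)]
```

With a device under test whose recording is simply a delayed copy of the reference, the residual after alignment is zero everywhere; any click or rattle introduced by the device shows up as non-zero residual samples.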
  • FIG. 11 is a schematic structural diagram of another abnormal sound detecting apparatus according to an embodiment of the present application.
  • the network device includes a transmitter 261, a receiver 262, and a processor 263.
  • the receiver 262 is configured to acquire a first voice signal played by a sound output device of the terminal device, where the first voice signal is locally stored in the terminal device, and the first voice signal includes audio information whose frequency changes irregularly.
  • the processor 263 is configured to obtain a residual signal according to the pre-acquired voice reference signal and the first voice signal, where the residual signal includes the portion of the first voice signal that differs from the voice reference signal; and to determine, according to the residual signal, whether the first voice signal has an abnormal sound, so as to determine whether the sound output device is abnormal.
  • the receiver 262 can implement the function of the obtaining unit 81 in the apparatus shown in FIG. 9; further, the receiver 262 can perform step S101 of the method shown in FIG. 2, or step S202, or step S301 of the method shown in FIG. 5, or step S401 of the method shown in FIG. 7.
  • the processor 263 can implement the functions of the calculating unit 82 and the determining unit 83 in the apparatus shown in FIG. 9; further, the processor 263 can perform steps S102 and S103 of the method shown in FIG. 2, or the processor 263 can perform step S205.
  • the processor 263 is specifically configured to determine an energy value of the residual signal, and determine, according to the energy value, whether the first voice signal has an abnormal sound. In this case, the processor 263 can implement the functions of the first determining module 831 and the second determining module 832 in the apparatus shown in FIG. 10; further, the processor 263 can perform steps S305 and S306 of the method shown in FIG. 5, or the processor 263 can perform steps S405 and S406 of the method shown in FIG. 7.
  • the processor 263 is specifically configured to remove the voice main band energy in the residual signal, and generate a residual signal with the voice main band energy removed, wherein the frequency of the voice main band energy is less than the first frequency value; and to determine the energy value of the residual signal with the voice main band energy removed.
  • the processor 263 can implement the functions of the removal sub-module 8311 and the determining sub-module 8312 in the apparatus shown in FIG. 10; further, the processor 263 can perform step S305 of the method shown in FIG. 5, or the processor 263 can perform step S405 of the method shown in FIG. 7.
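The band removal and per-frame energy computation can be illustrated with a small sketch. It is not taken from the application: it assumes non-overlapping frames, a naive one-sided DFT per frame, and an energy defined as the sum of |X_k|^2 / N over the retained bins. The function name and the parameters `first_hz` and `second_hz` (standing in for the first and second frequency values) are hypothetical.

```python
import cmath
from typing import List, Sequence


def high_band_frame_energies(residual: Sequence[float], sample_rate: float,
                             frame_len: int, first_hz: float,
                             second_hz: float) -> List[float]:
    """Per-frame energy of the residual after discarding bins below first_hz
    (the voice main band) and keeping only bins above second_hz."""
    energies = []
    for start in range(0, len(residual) - frame_len + 1, frame_len):
        frame = residual[start:start + frame_len]
        energy = 0.0
        for k in range(frame_len // 2 + 1):
            freq = k * sample_rate / frame_len
            if freq < first_hz or freq <= second_hz:
                continue  # bin lies in the removed or ignored band
            # Naive DFT coefficient for bin k (O(N) per bin, fine for short frames).
            x_k = sum(x * cmath.exp(-2j * cmath.pi * k * n / frame_len)
                      for n, x in enumerate(frame))
            energy += abs(x_k) ** 2 / frame_len
        energies.append(energy)
    return energies
```

A frame that contains only low-frequency content yields an energy close to zero, while a frame with high-frequency content (for example a rattle) yields a clearly positive value. A production implementation would more likely use an FFT or a time-domain high-pass filter.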
  • the processor 263 is specifically configured to: determine the energy value, in each frame, of the portion of the residual signal with the voice main band energy removed whose frequency is greater than the second frequency value; when, among the energy values of the frames, fewer than a preset number of energy values are less than their corresponding first energy thresholds, determine that the first voice signal has an abnormal sound, and determine that the sound output device is abnormal; and when, among the energy values of the frames, at least the preset number of energy values are less than their corresponding first energy thresholds, determine that the first voice signal does not have an abnormal sound, and determine that the sound output device is normal.
  • the processor 263 can implement the functions of the determining sub-module 8312 and the second determining module 832 in the apparatus shown in FIG. 10; further, the processor 263 can perform the process of "determining the energy value of the residual signal with the voice main band energy removed" in step S305 of the method shown in FIG. 5, and step S306 of the method shown in FIG. 5.
  • the processor 263 is specifically configured to: determine the energy value, in each frame, of the portion of the residual signal with the voice main band energy removed whose frequency is greater than the second frequency value; determine an energy maximum, where the energy maximum is the maximum of the energy values of the frames; when the energy maximum is greater than or equal to the second energy threshold, determine that the first voice signal has an abnormal sound, and determine that the sound output device is abnormal; and when the energy maximum is less than the second energy threshold, determine that the first voice signal does not have an abnormal sound, and determine that the sound output device is normal.
  • the processor 263 can implement the functions of the determining sub-module 8312 and the second determining module 832 in the apparatus shown in FIG. 10; further, the processor 263 can perform the process of "determining the energy value of the residual signal with the voice main band energy removed" in step S405 of the method shown in FIG. 7, and step S406 of the method shown in FIG. 7.
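The two decision rules can be written down compactly. The sketch below reflects one reading of the translated text and is an assumption rather than the claimed logic: in the count-based rule the device is treated as normal only if at least a preset number of frame energies fall below their per-frame first energy thresholds, and in the peak-based rule it is treated as abnormal if the largest frame energy reaches the second energy threshold. Function and parameter names are hypothetical.

```python
from typing import Sequence


def abnormal_by_count(frame_energies: Sequence[float],
                      first_thresholds: Sequence[float],
                      preset_number: int) -> bool:
    """Count-based rule: abnormal when fewer than preset_number frames have an
    energy below their corresponding first energy threshold."""
    if len(frame_energies) != len(first_thresholds):
        raise ValueError("one threshold is required per frame energy")
    quiet_frames = sum(1 for e, t in zip(frame_energies, first_thresholds) if e < t)
    return quiet_frames < preset_number


def abnormal_by_peak(frame_energies: Sequence[float],
                     second_threshold: float) -> bool:
    """Peak-based rule: abnormal when the maximum frame energy is greater than
    or equal to the second energy threshold."""
    return max(frame_energies) >= second_threshold
```

For example, with frame energies `[0.1, 0.2, 5.0]` and thresholds of `1.0` for every frame, the count-based rule reports an abnormal sound when all three frames are required to be quiet, and the peak-based rule reports an abnormal sound for a second threshold of `5.0`.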
  • the receiver 262 is further configured to acquire a second voice signal played by at least one other sound output device, wherein the other sound output device is a sound output device that plays a normal sound, and the voice content in the second voice signal is the same as the voice content in the first voice signal.
  • the receiver 262 can implement a part of the functions of the generating unit 91 in the apparatus shown in FIG. 10; further, the receiver 262 can perform the process of "acquiring the second voice signal played by the at least one other sound output device" in step S302 of the method shown in FIG. 5, or the receiver 262 can perform the same process in step S402 of the method shown in FIG. 7.
  • the processor 263 is further configured to perform signal superposition processing on each of the second voice signals to generate a voice reference signal.
  • the processor 263 can implement a part of the functions of the generating unit 91 in the apparatus shown in FIG. 10; further, the processor 263 can perform the process of "performing signal superposition processing on each second voice signal to generate a voice reference signal" in step S302 of the method shown in FIG. 5, or the processor 263 can perform the same process in step S402 of the method shown in FIG. 7.
  • the processor 263 is further configured to perform time-delay alignment of the first voice signal with the voice reference signal in the time domain, to generate a first voice signal aligned with the voice reference signal.
  • the processor 263 can implement the function of the aligning unit 92 in the apparatus shown in FIG. 10; further, the processor 263 can perform step S303 of the method shown in FIG. 5, or the processor 263 can perform step S403 of the method shown in FIG. 7.
  • the abnormal sound detecting device of the embodiment shown in FIG. 11 can be used to execute the technical solutions of the above method embodiments, or the programs of the modules of the embodiments shown in FIG. 9 and FIG. 10; the processor 263 calls the programs to perform the operations of the above method embodiments, so as to implement the modules shown in FIG. 9 and FIG. 10.
  • the processor 263 may also be a controller, and is represented as "controller/processor 263" in FIG. 11.
  • the transmitter 261 and the receiver 262 are configured to support transmission and reception of information between the network device and the terminal device in the above embodiment, and to support radio communication between the terminal device and other terminal devices.
  • the processor 263 performs various functions for communicating with the terminal device.
  • the network device may further include a memory 264 for storing program codes and data of the network device.
  • the processor 263 may be, for example, a central processing unit (CPU), or may be one or more integrated circuits configured to implement the above method, for example, one or more application-specific integrated circuits (ASICs), one or more microprocessors such as digital signal processors (DSPs), or one or more field-programmable gate arrays (FPGAs).
  • the memory 264 can be a single memory or a collective name for a plurality of storage elements.
  • the transmitter 261 included in the abnormal sound detecting apparatus of FIG. 11 may perform the sending operation corresponding to the foregoing method embodiment, the processor 263 performs processing operations such as processing, determining, and acquiring, and the receiver 262 may perform the receiving operation.
  • the receiver 262 included in the abnormal sound detecting device of Fig. 11 corresponds to the operation of acquiring a voice signal in the above-described method embodiment.
  • the computer program product includes one or more computer instructions.
  • the computer can be a general purpose computer, a special purpose computer, a computer network, or other programmable device.
  • the computer instructions can be stored in a computer readable storage medium or transferred from one computer readable storage medium to another computer readable storage medium; for example, the computer instructions can be transmitted from one website, computer, server, or data center to another website, computer, server, or data center by wired means (e.g., coaxial cable, optical fiber, or Digital Subscriber Line (DSL)) or wireless means (e.g., infrared, wireless, or microwave).
  • the computer readable storage medium can be any available medium that can be accessed by a computer, or a data storage device, such as a server or a data center, that integrates one or more available media.
  • the available medium may be a magnetic medium (e.g., a floppy disk, a hard disk, or a magnetic tape), an optical medium (e.g., a DVD), a semiconductor medium (e.g., a Solid State Disk (SSD)), or the like.
  • the functions described in the embodiments of the present application may be implemented in hardware, software, firmware, or any combination thereof.
  • the functions may be stored in a computer readable medium or transmitted as one or more instructions or code on a computer readable medium.
  • Computer readable media includes both computer storage media and communication media including any medium that facilitates transfer of a computer program from one location to another.
  • a storage medium may be any available media that can be accessed by a general purpose or special purpose computer.

Landscapes

  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Otolaryngology (AREA)
  • Physics & Mathematics (AREA)
  • Engineering & Computer Science (AREA)
  • Acoustics & Sound (AREA)
  • Signal Processing (AREA)
  • Telephone Function (AREA)

Abstract

The present invention relates to an abnormal sound detection apparatus and method (02). The method comprises: acquiring a first voice signal played by a sound output device (03) of a terminal device (01), the first voice signal being stored locally in the terminal device (01), and the first voice signal comprising audio information whose frequency changes irregularly (101, 301, 401, 501); obtaining a residual signal according to a pre-acquired voice reference signal and the first voice signal, the residual signal comprising the part of the first voice signal that differs from the voice reference signal (102, 304, 404); and determining, according to the residual signal, whether there is an abnormal sound in the first voice signal, so as to determine whether the sound output device is abnormal (103). The voice signal represents a user's actual usage scenario, and frequency points within the real speech frequency band are triggered together repeatedly throughout playback of the voice signal, which makes a problematic frequency point easier to find. The voice signal itself represents the actual frequency points to be tested, and the probability of missing a problematic frequency point is greatly reduced. The detection method is convenient and universal, and gives an accurate detection result.
PCT/CN2017/082415 2017-01-20 2017-04-28 Abnormal sound detection apparatus and method WO2018133247A1 (fr)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201780009940.9A CN108605191B (zh) 2017-01-20 2017-04-28 Abnormal sound detection method and apparatus

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201710045605.6 2017-01-20
CN201710045605 2017-01-20

Publications (1)

Publication Number Publication Date
WO2018133247A1 (fr) 2018-07-26

Family

ID=62907582

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2017/082415 WO2018133247A1 (fr) 2017-01-20 2017-04-28 Abnormal sound detection apparatus and method

Country Status (2)

Country Link
CN (1) CN108605191B (fr)
WO (1) WO2018133247A1 (fr)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
TWI778437B (zh) * 2020-10-23 2022-09-21 財團法人資訊工業策進會 用於音頻裝置的瑕疵檢測裝置及瑕疵檢測方法

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113611327A (zh) * 2020-10-23 2021-11-05 深圳市冠旭电子股份有限公司 异音检测分析方法、装置、终端设备及可读存储介质
CN112969134B (zh) * 2021-02-07 2022-05-10 深圳市微纳感知计算技术有限公司 麦克风异常检测方法、装置、设备及存储介质

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20050253713A1 (en) * 2004-05-17 2005-11-17 Teppei Yokota Audio apparatus and monitoring method using the same
CN103546853A (zh) * 2013-09-18 2014-01-29 浙江中科电声研发中心 一种基于短时傅里叶变换的扬声器异常音检测方法
JP2014182092A (ja) * 2013-03-21 2014-09-29 Jx Nippon Oil & Energy Corp 異常検知方法及び異常検知装置
CN104168532A (zh) * 2013-05-15 2014-11-26 光宝光电(常州)有限公司 扬声器异音检测方法及装置
CN104363554A (zh) * 2014-09-29 2015-02-18 嘉善恩益迪电声技术服务有限公司 一种扬声器异常音检测方法
CN105163262A (zh) * 2015-09-30 2015-12-16 南京师范大学 一种扬声器异音检测方法及检测系统
CN105810213A (zh) * 2014-12-30 2016-07-27 浙江大华技术股份有限公司 一种典型异常声音检测方法及装置
CN106303876A (zh) * 2015-05-19 2017-01-04 比亚迪股份有限公司 语音系统、异音检测方法及电子装置

Family Cites Families (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN100571452C (zh) * 2006-04-07 2009-12-16 清华大学 扬声器纯音检测方法
CN100554916C (zh) * 2006-04-28 2009-10-28 孙盈军 一种数字产品的测试方法及其专用装置
CN101917735A (zh) * 2010-05-06 2010-12-15 王芸 一种移动终端音频校准方法及自动化测试系统
CN102324229B (zh) * 2011-09-08 2012-11-28 中国科学院自动化研究所 语音输入设备使用异常的检测方法及系统
CN106034272A (zh) * 2015-03-17 2016-10-19 钰太芯微电子科技(上海)有限公司 扬声器补偿系统及便携式移动终端
CN106488376B (zh) * 2016-10-28 2020-03-27 努比亚技术有限公司 一种对移动终端的音频元件进行故障诊断的方法和装置

Also Published As

Publication number Publication date
CN108605191A (zh) 2018-09-28
CN108605191B (zh) 2020-12-25

Similar Documents

Publication Publication Date Title
WO2015184893A1 (fr) Procédé et dispositif de réduction de bruit d'appel vocal pour terminal mobile
US9363596B2 (en) System and method of mixing accelerometer and microphone signals to improve voice quality in a mobile device
US9654874B2 (en) Systems and methods for feedback detection
WO2017185342A1 (fr) Procédé et appareil pour déterminer une anomalie d'entrée vocale, terminal et support d'informations
TWI628454B (zh) 基於聲波的空間狀態偵測裝置、系統與方法
WO2013107307A1 (fr) Procédé et dispositif de réduction du bruit
US9672843B2 (en) Apparatus and method for improving an audio signal in the spectral domain
WO2018133247A1 (fr) Appareil et procédé de détection de son anormal
WO2016184138A1 (fr) Procédé, terminal mobile et support de stockage informatique pour régler des paramètres audio
JP2011061422A (ja) 情報処理装置、情報処理方法およびプログラム
CN103152546A (zh) 基于模式识别和延迟前馈控制的视频会议回声抑制方法
JP2013527479A (ja) 破損したオーディオ信号の修復
TWI506620B (zh) 通訊裝置及其語音處理方法
US20140341386A1 (en) Noise reduction
CN112802486B (zh) 一种噪声抑制方法、装置及电子设备
WO2015085946A1 (fr) Procédé, appareil et serveur de traitement de signal vocal
CN110996238B (zh) 双耳同步信号处理助听系统及方法
WO2020125325A1 (fr) Procédé d'élimination d'écho et dispositif
WO2017045512A1 (fr) Procédé de reconnaissance vocale et appareil, terminal et dispositif de reconnaissance vocale
CN114584908B (zh) 助听器的声学测试方法、装置以及设备
US8615075B2 (en) Method and apparatus for removing noise signal from input signal
TWI790718B (zh) 會議終端及用於會議的回音消除方法
WO2022041485A1 (fr) Procédé de traitement de signal audio, dispositif électronique et support de stockage
US10623845B1 (en) Acoustic gesture detection for control of a hearable device
JP2010010856A (ja) ノイズキャンセル装置、ノイズキャンセル方法、ノイズキャンセルプログラム、ノイズキャンセルシステム、及び、基地局

Legal Events

Date Code Title Description
121 Ep: the EPO has been informed by WIPO that EP was designated in this application (Ref document number: 17893451; Country of ref document: EP; Kind code of ref document: A1)
NENP Non-entry into the national phase (Ref country code: DE)
122 Ep: PCT application non-entry in European phase (Ref document number: 17893451; Country of ref document: EP; Kind code of ref document: A1)