WO2019062751A1 - Method and device for detecting abnormalities of voice data - Google Patents

Method and device for detecting abnormalities of voice data

Info

Publication number
WO2019062751A1
Authority
WO
WIPO (PCT)
Prior art keywords
voice data
microphone
target voice
data
high frequency
Prior art date
Application number
PCT/CN2018/107572
Other languages
French (fr)
Chinese (zh)
Inventor
杨霖
韩晓
尹朝阳
苏俊峰
王建鹏
高骏鹏
Original Assignee
华为技术有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 华为技术有限公司
Publication of WO2019062751A1

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04M TELEPHONIC COMMUNICATION
    • H04M1/00 Substation equipment, e.g. for use by subscribers
    • H04M1/82 Line monitoring circuits for call progress or status discrimination
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/48 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use
    • G10L25/51 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04R LOUDSPEAKERS, MICROPHONES, GRAMOPHONE PICK-UPS OR LIKE ACOUSTIC ELECTROMECHANICAL TRANSDUCERS; DEAF-AID SETS; PUBLIC ADDRESS SYSTEMS
    • H04R29/00 Monitoring arrangements; Testing arrangements

Definitions

  • the present application relates to the field of voice technologies, and in particular, to an abnormality detection method and apparatus for voice data.
  • the voice call function is one of the basic applications of the mobile phone, and the quality of the voice call is directly related to the user's feeling of using the mobile phone.
  • The path along which voice data collected by the local mobile phone passes through sound effect processing and is transmitted to the peer mobile phone is called the uplink call path; conversely, the path along which voice data received by the local mobile phone from the peer mobile phone is played through the speaker or the earpiece is called the downlink call path.
  • The inventors of the present application found that, during an actual call, there are scenarios in which the time domain signal of the voice data is normal but the frequency domain signal is abnormal. Such a scenario may cause problems such as silence or intermittent sound during the call, but voice data with an abnormal frequency domain signal cannot be detected by existing time domain detection methods, so there is no way to avoid the abnormal call behavior that it causes.
  • the main purpose of the embodiment of the present application is to provide an abnormality detecting method and device for voice data, which can detect voice data with abnormal frequency domain.
  • the present application provides an abnormality detecting method for voice data, including:
  • Determining whether the high frequency energy in the target speech data is less than the high frequency energy in the normal speech data by analyzing the magnitude of the low frequency energy or the high frequency energy in the target speech data;
  • the determining, by analyzing the magnitude of the low-frequency energy in the target voice data, whether the high-frequency energy in the target voice data is less than the high-frequency energy in the normal voice data includes:
  • the low frequency energy ratio is a proportion of a total energy of the low frequency data in the target voice data to a total energy of the target voice data
  • if the low frequency energy ratio is greater than the low frequency occupancy threshold, determining that the high frequency energy in the target voice data is less than the high frequency energy in the normal voice data, wherein the low frequency occupancy threshold is the proportion of the total energy of the low frequency data in the normal voice data to the total energy of the normal voice data.
  • the determining, by analyzing the magnitude of the high frequency energy in the target voice data, whether the high frequency energy in the target voice data is less than the high frequency energy in the normal voice data includes:
  • the high frequency energy ratio is a proportion of a total energy of the high frequency data in the target voice data to a total energy of the target voice data
  • if the high frequency energy ratio is less than the high frequency occupancy threshold, determining that the high frequency energy in the target voice data is less than the high frequency energy in the normal voice data, wherein the high frequency occupancy threshold is the proportion of the total energy of the high frequency data in the normal voice data to the total energy of the normal voice data.
  • in a third possible implementation manner, after the determining that the target voice data is abnormal, the method also includes:
  • after an interval of a first duration, continuing to perform the acquiring of the target voice data transmitted through the uplink call path;
  • abnormal processing is performed according to the number of microphone channels of the microphone.
  • the acquiring the target voice data that is transmitted by using the uplink call channel includes:
  • the abnormal processing is performed according to the number of the microphone channels of the microphone, including:
  • if the microphone has only one microphone path, outputting a second prompt, wherein the second prompt is used to tell the user that the microphone path may be faulty;
  • if the microphone has at least two microphone paths, and the target voice data collected by some of the microphone paths is abnormal in the frequency domain, selecting another, normal microphone path for the voice call;
  • if the microphone has at least two microphone paths, and the target voice data collected by all the microphone paths is abnormal in the frequency domain, outputting a third prompt, wherein the third prompt is used to tell the user that all the microphone paths may be faulty.
  • the acquiring the target voice data that is transmitted by using the uplink call channel includes:
  • the method further includes:
  • if the target voice data acquired within the second duration is all abnormal in the frequency domain, performing exception processing, where the second duration is the current interval or at least two consecutive intervals including the current interval.
  • an anomaly detecting apparatus for voice data, the anomaly detecting apparatus comprising means for performing the method provided by the first aspect or any of the possible implementations of the first aspect.
  • an abnormality detecting apparatus for voice data, comprising: a processor, a memory, and a system bus; the processor and the memory are connected by the system bus; and the memory is used to store one or more programs, the one or more programs comprising instructions that, when executed by the abnormality detecting apparatus, cause the abnormality detecting apparatus to perform the method provided by the first aspect or any one of the possible implementations of the first aspect.
  • in a fourth aspect, a computer readable storage medium storing one or more programs, wherein, when the one or more programs are executed by the abnormality detecting apparatus, the abnormality detecting apparatus performs the method provided by the first aspect or any of the possible implementations of the first aspect.
  • a graphical user interface on an abnormality detecting apparatus, the abnormality detecting apparatus comprising a display, a memory, a plurality of applications, and one or more processors for executing one or more programs stored in the memory, the graphical user interface comprising a user interface displayed in accordance with the method provided by the first aspect or any one of the possible implementations of the first aspect, wherein the display comprises a touch-sensitive surface and a display screen.
  • the method and device for detecting anomalies of voice data provided by the present application first acquire target voice data transmitted through an uplink call path. Because normal voice data has a large proportion of low frequency energy and a small proportion of high frequency energy, analyzing the magnitude of the low-frequency energy or the high-frequency energy in the target voice data makes it possible to determine whether the high-frequency energy in the target voice data is less than the high-frequency energy in normal voice data. When the judgment result is yes, the high frequency energy in the target voice data has been lost or truncated, so it can be determined that the target voice data is abnormal in the frequency domain.
  • FIG. 1 is a schematic diagram of an uplink call path of a mobile phone according to an embodiment of the present application
  • FIG. 2 is a schematic flowchart of a method for detecting an abnormality of voice data according to an embodiment of the present disclosure
  • FIG. 3 is a schematic diagram of amplitude/frequency of normal voice data according to an embodiment of the present application.
  • FIG. 4 is a schematic flowchart of a method for detecting an abnormality of a voice data according to an embodiment of the present disclosure
  • FIG. 5 is a second schematic flowchart of a method for detecting an abnormality of a voice data according to an embodiment of the present disclosure
  • FIG. 6 is a schematic structural diagram of an abnormality detecting apparatus for voice data according to an embodiment of the present disclosure
  • FIG. 7 is a schematic structural diagram of hardware of an abnormality detecting apparatus for voice data according to an embodiment of the present disclosure.
  • Faults that can affect a call include a microphone (MIC) unit failure, a board-level connection failure, a user-operation failure, etc.
  • A MIC unit failure refers to impurities entering the diaphragm of the MIC unit of the call device and causing partial adhesion of the diaphragm.
  • A board-level connection failure refers to a momentary short circuit occurring in the audio path.
  • A user-operation failure refers to a user's incorrect operation during the call, for example a finger blocking the MIC hole.
  • In these cases the time domain signal of the voice data may be normal while the frequency domain signal is abnormal, and the abnormal frequency domain signal may cause silence or intermittent sound during the call.
  • The above faults may leave the time domain signal of the voice data largely unchanged while making the frequency domain signal abnormal.
  • However, existing voice call detection technologies all detect the time domain signal of the voice. There is no accurate, fast, and effective method for detecting whether voice data is abnormal in the frequency domain, so the causes of silence or intermittent sound that stem from frequency domain anomalies cannot be identified and eliminated on the basis of a frequency domain detection result.
  • To this end, the embodiment of the present application provides an abnormality detection method for voice data that can accurately, quickly, and effectively detect whether voice data is abnormal in the frequency domain and, after determining that it is, can investigate the possible causes of the frequency domain anomaly and handle the exception. It should be noted that the method can be applied to any type of voice call device, such as a mobile phone or a landline; the embodiment does not limit the type of voice call device.
  • FIG. 1 is a schematic diagram of an uplink call path of a mobile phone.
  • As shown in FIG. 1, the voice data of user 1 is collected by the MIC of the mobile phone and then passed to the coder-decoder (Codec) chip for A/D conversion, that is, the analog voice signal is converted into a digital voice signal. The voice data is then transmitted to the sound effect algorithm module for sound effect processing.
  • The processed voice data is protocol-encoded and transmitted to the modem, and the modem finally sends the encoded data to the mobile phone or landline of peer user 2.
  • In this embodiment, an abnormality detecting module may be added to the physical structure of an existing mobile phone and used to perform frequency domain abnormality detection on the voice data. The abnormality detecting module and the sound effect algorithm module may use the same or different Digital Signal Processing (DSP) chips to implement their functions.
  • Normal voice data has a large proportion of low frequency energy and a small proportion of high frequency energy, and the high frequency energy in the voice data may be lost or truncated because of the MIC unit failure, board-level connection failure, user-operation failure, and the like described above.
  • Therefore, before the sound effect algorithm module performs sound effect processing, it can be judged whether the high frequency energy in the collected voice data is less than the high frequency energy in normal voice data. If the judgment result is yes, the high frequency energy in the collected voice data has been lost or truncated, so the collected voice data can be determined to be abnormal in the frequency domain.
  • the collected voice data is a digital voice signal that is A/D converted by the Codec chip.
  • the collected voice data is hereinafter referred to as target voice data.
  • FIG. 2 is a schematic flowchart of a method for detecting an abnormality of voice data according to an embodiment of the present application, where the method includes the following steps S201-S202:
  • S201 Acquire target voice data transmitted through the uplink call path.
  • In this embodiment, the sound effect algorithm module may send the voice data it receives to the abnormality detecting module, or the Codec chip may transmit its output voice data directly to the abnormality detecting module. The abnormality detecting module then detects whether the voice data is abnormal in the frequency domain; the voice data being detected is the target voice data.
  • The target voice data may be voice data acquired over a short time (for example, 1 ms) or over a long time (for example, 1 s).
  • S202 Determine, by analyzing the magnitude of the low-frequency energy or the high-frequency energy in the target voice data, whether the high-frequency energy in the target voice data is less than the high-frequency energy in normal voice data; if yes, determine that the target voice data is abnormal.
  • FIG. 3 is a schematic diagram of the amplitude/frequency of normal voice data, in which the abscissa f represents frequency and the ordinate A represents amplitude.
  • In normal voice data, the low frequency energy has a large proportion and the high frequency energy has a small proportion. Therefore, the low frequency data in the target voice data can be obtained, the energy ratio of the low frequency data in the target voice data determined, and that ratio checked against the proportion of low frequency energy expected in normal voice data. Alternatively, the high frequency data in the target voice data can be obtained, its energy ratio determined, and that ratio checked against the proportion of high frequency energy expected in normal voice data. If the check fails, the high frequency signal in the target voice data has been lost or truncated, so the target voice data can be determined to be abnormal in the frequency domain.
  • step S202 can be implemented by using one of the following two implementation manners.
  • In the first implementation manner, S202 may specifically include:
  • S2021 Obtain low-frequency data in the target voice data by performing low-pass filtering on the target voice data.
  • A Finite Impulse Response (FIR) digital filter or an Infinite Impulse Response (IIR) digital filter may be set in advance in the abnormality detecting module shown in FIG. 1, configured as a low-pass filter, and given a low-pass frequency threshold f_Lp.
  • When the target voice data is low-pass filtered with the low-pass filter, data whose frequency is lower than the threshold f_Lp passes through the filter, and the data that passes is the low frequency data in the target voice data.
  • S2022 Calculate a low frequency energy ratio, wherein the low frequency energy ratio is a proportion of a total energy of the low frequency data in the target voice data to a total energy of the target voice data.
  • the proportion of low-frequency energy in normal speech data may not be a fixed value, but a range of values. Therefore, when setting the low-frequency occupancy threshold Kthreshold, it can be set to the proportion of low-frequency energy in normal speech data.
  • For example, after user 1 establishes a normal voice call between mobile phone 1 and mobile phone 2, and assuming that the sampling interval Tunit of the voice data is set to 1 ms, mobile phone 1 continuously collects the voice data of user 1 at intervals of 1 ms.
  • Each MIC path of mobile phone 1 can collect 48 voice data samples every 1 ms (equivalent to a 48 kHz sampling rate), and these 48 samples are the target voice data.
  • A 10th-order (or other order) FIR or IIR low-pass filter is applied to the 48 samples acquired every 1 ms. Assuming that the low-pass frequency threshold f_Lp is set to 4 kHz, the components of the voice data below 4 kHz pass through the low-pass filter, and the data that passes is the low frequency data in the 48 samples.
  • Suppose the voice data collected every 1 ms is data[0] ~ data[47], and the low frequency data corresponding to data[0] ~ data[47] is defined as data_Lp[0] ~ data_Lp[47].
  • Here, data[i] denotes the amplitude of the i-th of the 48 voice data samples.
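  • Assuming that the energy of a sequence of samples is the sum of its squared amplitudes (an assumption; the energy measure is not specified here), the low-frequency energy ratio defined in S2022 can be written for one 48-sample frame as:

```latex
\mathrm{Kactucal}
  = \frac{\sum_{i=0}^{47} \mathrm{data\_Lp}[i]^{2}}
         {\sum_{i=0}^{47} \mathrm{data}[i]^{2}}
```

  • Under this assumption the ratio lies close to 1 when almost all of the frame's energy sits below f_Lp, which is the condition that the comparison with Kthreshold is meant to catch.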
  • When the low-frequency energy ratio Kactucal exceeds the low-frequency occupancy threshold Kthreshold, it indicates that the high-frequency signal in the target voice data acquired in the unit time Tunit has been lost or truncated.
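  • The low-pass variant (S2021, S2022, and the threshold comparison) can be sketched as follows for a single 1 ms frame of 48 samples at 48 kHz. This is a minimal illustration, not the patented implementation: the windowed-sinc (Hamming) FIR design, the sum-of-squares energy measure, the handling of silent frames, and all function names are assumptions introduced here.

```python
import math

def lowpass_fir_taps(cutoff_hz, sample_rate_hz, order=10):
    """Windowed-sinc (Hamming) low-pass FIR taps; order 10 gives 11 taps."""
    fc = cutoff_hz / sample_rate_hz          # normalized cutoff (cycles/sample)
    mid = order / 2
    taps = []
    for n in range(order + 1):
        x = n - mid
        ideal = 2 * fc if x == 0 else math.sin(2 * math.pi * fc * x) / (math.pi * x)
        window = 0.54 - 0.46 * math.cos(2 * math.pi * n / order)
        taps.append(ideal * window)
    gain = sum(taps)
    return [t / gain for t in taps]          # normalize to unity gain at DC

def fir_filter(samples, taps):
    """Direct-form FIR convolution, assuming zero samples before the frame."""
    out = []
    for i in range(len(samples)):
        acc = 0.0
        for k, t in enumerate(taps):
            if i - k >= 0:
                acc += t * samples[i - k]
        out.append(acc)
    return out

def energy(samples):
    """Assumed energy measure: sum of squared amplitudes."""
    return sum(s * s for s in samples)

def low_freq_energy_ratio(frame, cutoff_hz=4000.0, sample_rate_hz=48000.0):
    """Energy of the low-pass-filtered frame divided by the frame's total energy.

    Returns None for a silent frame, where the ratio is undefined.
    """
    total = energy(frame)
    if total == 0.0:
        return None
    low = fir_filter(frame, lowpass_fir_taps(cutoff_hz, sample_rate_hz))
    return energy(low) / total

def is_low_band_abnormal(frame, k_threshold):
    """Abnormal when the low-frequency ratio exceeds the (hypothetical) threshold."""
    ratio = low_freq_energy_ratio(frame)
    return ratio is not None and ratio > k_threshold
```

  • Because an 11-tap filter has a wide transition band and a start-up transient within such a short frame, a practical threshold would have to be calibrated on real speech rather than taken from this sketch.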
  • In the second implementation manner, S202 may specifically include:
  • S2021 Obtain high frequency data in the target voice data by performing high-pass filtering on the target voice data.
  • An FIR digital filter or an IIR digital filter may be set in advance in the abnormality detecting module shown in FIG. 1, configured as a high-pass filter, and given a high-pass frequency threshold f_Hp.
  • When the target voice data is high-pass filtered with the high-pass filter, data whose frequency is higher than the threshold f_Hp passes through the filter, and the data that passes is the high frequency data in the target voice data.
  • S2022 Calculate a high frequency energy ratio, wherein the high frequency energy ratio is a proportion of a total energy of the high frequency data in the target voice data to a total energy of the target voice data.
  • If the high-frequency energy ratio Kactucal is lower than the high-frequency occupancy threshold Kthreshold, that is, Kactucal < Kthreshold, the proportion of high-frequency energy in the target voice data is too low, which indicates that the high frequency signal in the target voice data has been lost or truncated and that the target voice data is abnormal in the frequency domain.
  • the proportion of high-frequency energy in normal speech data may not be a fixed value, but a range. Therefore, when setting the high-frequency occupancy threshold Kthreshold, it can be set to the high-frequency energy in normal speech data.
  • For example, after user 1 establishes a normal voice call between mobile phone 1 and mobile phone 2, and assuming that the sampling interval Tunit of the voice data is set to 1 ms, mobile phone 1 continuously collects the voice data of user 1 at intervals of 1 ms.
  • Each MIC path of mobile phone 1 can collect 48 voice data samples every 1 ms, and these 48 samples are the target voice data.
  • A 10th-order (or other order) FIR or IIR high-pass filter is applied to the 48 samples collected every 1 ms. Assuming that the high-pass frequency threshold f_Hp is set to 6 kHz, the components of the voice data above 6 kHz pass through the high-pass filter, and the data that passes is the high frequency data in the 48 samples.
  • Suppose the voice data collected every 1 ms is data[0] ~ data[47], and the high frequency data corresponding to data[0] ~ data[47] is defined as data_Hp[0] ~ data_Hp[47].
  • Here, data[i] denotes the amplitude of the i-th of the 48 voice data samples.
  • When the high-frequency energy ratio Kactucal is lower than the high-frequency occupancy threshold Kthreshold, it indicates that the high-frequency signal in the target voice data acquired in the unit time Tunit has been lost or truncated.
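  • The high-pass variant can be sketched in the same spirit. For brevity this illustration swaps the 10th-order filter for a second-order IIR (biquad) high-pass using the widely published "Audio EQ Cookbook" coefficient form; that substitution, the sum-of-squares energy measure, the handling of silent frames, and all names are assumptions introduced here.

```python
import math

def highpass_biquad_coeffs(cutoff_hz, sample_rate_hz, q=1 / math.sqrt(2)):
    """Second-order high-pass coefficients, normalized so that a[0] == 1."""
    w0 = 2 * math.pi * cutoff_hz / sample_rate_hz
    alpha = math.sin(w0) / (2 * q)
    cos_w0 = math.cos(w0)
    a0 = 1 + alpha
    b = [(1 + cos_w0) / 2 / a0, -(1 + cos_w0) / a0, (1 + cos_w0) / 2 / a0]
    a = [1.0, -2 * cos_w0 / a0, (1 - alpha) / a0]
    return b, a

def iir_filter(samples, b, a):
    """Direct form I biquad, assuming zero filter state before the frame."""
    x1 = x2 = y1 = y2 = 0.0
    out = []
    for x0 in samples:
        y0 = b[0] * x0 + b[1] * x1 + b[2] * x2 - a[1] * y1 - a[2] * y2
        out.append(y0)
        x2, x1 = x1, x0
        y2, y1 = y1, y0
    return out

def high_freq_energy_ratio(frame, cutoff_hz=6000.0, sample_rate_hz=48000.0):
    """Energy of the high-pass-filtered frame divided by the frame's total energy.

    Returns None for a silent frame, where the ratio is undefined.
    """
    total = sum(s * s for s in frame)
    if total == 0.0:
        return None
    b, a = highpass_biquad_coeffs(cutoff_hz, sample_rate_hz)
    high = iir_filter(frame, b, a)
    return sum(s * s for s in high) / total

def is_high_band_abnormal(frame, k_threshold):
    """Abnormal when the high-frequency ratio falls below the (hypothetical) threshold."""
    ratio = high_freq_energy_ratio(frame)
    return ratio is not None and ratio < k_threshold
```

  • In a streaming implementation the filter state would normally be carried over from one 1 ms frame to the next instead of being reset, which removes the start-up transient that this per-frame sketch includes.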
  • As can be seen, this embodiment can detect within a short time whether the frequency domain signal of the voice data is abnormal, that is, detection efficiency is high. When the voice data is abnormal in the frequency domain, the problem can therefore be handled and avoided quickly, improving the user experience of the call device.
  • After step S202, the method may further include:
  • Step A Output a first prompt, wherein the first prompt is used to tell the user that the microphone may be blocked.
  • In this embodiment, when a call is abnormal because of the user's improper use (such as blocking the MIC hole), the user may not know the cause of the problem if no prompt is given. Therefore, after determining that the voice data is abnormal in the frequency domain, the device first checks whether the user's operation is proper: it can prompt the user to correct the operation by means of vibration or a prompt tone. For example, the first prompt can be output by voice, such as "Your finger may be blocking the MIC hole"; after hearing the prompt, the user will generally move the finger away.
  • Step B After the first prompt is output, wait for the first duration and then continue with step S201.
  • After the first prompt is output, a certain time for clearing the abnormality (that is, the first duration), for example 5 seconds, is reserved for the user, and the process then returns to step S201 to continue collecting voice data and performing abnormality detection.
  • Step C If the target voice data acquired after the first duration is still abnormal, perform abnormality processing according to the number of microphone paths of the microphone.
  • Once the user's finger no longer blocks the microphone, the voice data should return to normal. If the voice data is still abnormal, one or more MIC units of the microphone may have failed.
  • In existing devices, if the main MIC unit is physically damaged, the mobile phone becomes unusable for calls and the user must take it to a service outlet for repair.
  • In this embodiment, after an abnormality of the main MIC unit is detected, the device can automatically switch to a secondary MIC unit for the call to keep the call intact, and can prompt the user about which MIC units may be faulty.
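  • Steps A to C can be sketched as a small control flow. The four callables are hypothetical hooks supplied by the caller (the prompt output, the wait, the repeated S201/S202 check, and the step C handler); only the ordering of the steps and the 5-second example value come from the description above.

```python
def run_steps_a_to_c(output_prompt, wait, recheck_is_abnormal,
                     handle_by_mic_paths, first_duration_s=5.0):
    """Run steps A to C once; return True if step C was reached."""
    # Step A: tell the user that the microphone may be blocked.
    output_prompt("Your finger may be blocking the MIC hole")
    # Step B: give the user the first duration to clear the problem,
    # then acquire and check the target voice data again (S201/S202).
    wait(first_duration_s)
    if recheck_is_abnormal():
        # Step C: still abnormal, so handle it according to the MIC paths.
        handle_by_mic_paths()
        return True
    return False
```

  • On a device, `wait` could simply be `time.sleep`, while tests can pass a stub so that no real delay occurs.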
  • To be able to determine which MIC or MICs may be faulty, S201 may specifically include: acquiring the target voice data collected by each microphone path of the microphone.
  • The MIC array of the microphone can be detected in advance to determine how many MIC units it has, for example only one main MIC unit, or one main MIC unit and one or more secondary MIC units, where each MIC unit corresponds to one MIC path. Frequency domain anomaly detection is then performed separately on the target voice data collected by each MIC path; that is, the detection for each MIC path does not depend on the others.
  • Existing voice data detection algorithms rely mainly on the time domain signal. Analyzing the time domain signal collected by a single MIC path alone cannot accurately determine whether the voice data is abnormal, so voice data collected by multiple MIC paths is needed for an auxiliary comprehensive judgment. This multi-path comprehensive judgment also takes a long time and has low detection accuracy. By comparison, when determining whether voice data is abnormal, this embodiment does not need to rely on voice data collected by multiple MIC paths, takes less time for abnormality detection, and has higher detection accuracy.
  • In addition, because existing time domain detection technology cannot accurately and quickly detect whether voice data is abnormal, the performance and features of the call device cannot be fully exploited.
  • Existing time domain detection technology relies on multiple MIC paths for abnormality detection of voice data. It therefore cannot accurately detect whether an individual MIC path is faulty, and so cannot avoid call abnormalities caused by a MIC failure.
  • In this embodiment, the voice data collected by each MIC path can be checked for abnormality separately, so that the MIC paths that may be faulty can be determined from the detection results.
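  • Accordingly, performing abnormality processing according to the number of microphone paths of the microphone in step C above may include:
  • C1 If the microphone has only one microphone path, output a second prompt, wherein the second prompt is used to tell the user that the microphone path may be faulty.
  • The second prompt, such as a voice prompt or a vibration alert, reminds the user that the single MIC path of the call device may be faulty.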
  • C2 If the microphone has at least two microphone paths, and the target voice data collected by some of the microphone paths is abnormal in the frequency domain, select another, normal microphone path for the voice call.
  • For example, if the call device has multiple MIC paths and the main MIC path is abnormal, the MIC path with the best voice quality among the remaining secondary MIC paths is selected for the call; if the main MIC path and some of the secondary MIC paths are abnormal, the MIC path with the best voice quality among the remaining normal secondary MIC paths is selected.
  • C3 If the microphone has at least two microphone paths and the target voice data collected by all the microphone paths is abnormal in the frequency domain, output a third prompt, wherein the third prompt is used to tell the user that all the microphone paths may be faulty.
  • The third prompt, such as a voice prompt or a vibration alert, reminds the user that all MIC paths of the call device may be faulty.
  • In this embodiment, when an abnormality of one or more MIC paths is detected, the call device automatically switches to another normal MIC path for the voice call, which keeps the call intact, and prompts the user about which MIC paths may be faulty so that the user can arrange repairs in time.
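  • The C1 to C3 decision can be sketched as a pure function over per-path detection results. The path identifiers, the action labels, and the `quality_by_path` score are hypothetical; how "best voice quality" is measured is not specified here, so the score is simply supplied by the caller.

```python
def handle_mic_paths(abnormal_by_path, quality_by_path=None):
    """Choose an action from per-path frequency domain detection results.

    abnormal_by_path maps a path id to True if that path's target voice
    data is abnormal. Returns an (action, path) tuple.
    """
    if not abnormal_by_path:
        raise ValueError("at least one MIC path is required")
    quality_by_path = quality_by_path or {}
    normal = [p for p, abnormal in abnormal_by_path.items() if not abnormal]
    if len(normal) == len(abnormal_by_path):
        return ("no_action", None)          # nothing is abnormal
    if len(abnormal_by_path) == 1:
        return ("second_prompt", None)      # C1: the only path may be faulty
    if not normal:
        return ("third_prompt", None)       # C3: all paths may be faulty
    # C2: switch to the normal path with the highest (assumed) quality score.
    best = max(normal, key=lambda p: quality_by_path.get(p, 0.0))
    return ("switch_path", best)
```

  • For example, `handle_mic_paths({"main": True, "sub1": False})` selects `sub1`, while a single abnormal path yields the second prompt.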
  • An abnormality generally has to persist for some time before the human ear can clearly perceive it. Therefore, when the above steps detect that the target voice data is abnormal in the frequency domain and the sampling time corresponding to the target voice data is relatively short, for example 1 ms, abnormality processing need not be performed immediately. Instead, the frequency domain abnormal time is accumulated continuously; for example, the abnormal time accumulation threshold ACC is set to 100 ms, and when the accumulated frequency domain abnormal time reaches ACC, abnormality processing is performed using steps A to C above.
  • S201 may specifically include: acquiring target voice data transmitted through the uplink call path according to a preset time interval.
  • That is, the A/D converted digital voice data may be acquired at a certain time interval; for example, digital voice data is acquired once every 1 ms, and the digital voice data within that 1 ms is the target voice data.
  • If the target voice data acquired within the second duration is all abnormal, proceed to steps A to C, where the second duration is the current interval or at least two consecutive intervals including the current interval.
  • In this embodiment, an abnormal time accumulation threshold ACC (that is, the second duration) may be set, which is equal to or greater than the acquisition time corresponding to the target voice data. For example, when ACC is 100 ms, the voice data collected every 100 ms may be used as the target voice data, and abnormality processing is performed if the currently collected target voice data is abnormal in the frequency domain; alternatively, the voice data collected every 1 ms may be used as the target voice data, and abnormality processing is performed when the target voice data collected 100 consecutive times is abnormal in the frequency domain.
  • Existing abnormal voice detection technology relies mainly on time domain signals and suffers from low detection accuracy and a long detection period (generally 2-3 seconds). This embodiment is based on frequency domain signals and, compared with the prior art, offers high detection accuracy and a short detection period (generally 100-300 milliseconds), so that abnormality processing can be performed quickly.
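  • The accumulation against ACC can be sketched as a counter of consecutive abnormal frames. The class name and interface are hypothetical, and resetting the counter after a trigger is a design choice made for this illustration; the 1 ms and 100 ms defaults are the example values used above.

```python
import math

class AnomalyAccumulator:
    """Signal exception processing once abnormal frames span the ACC duration."""

    def __init__(self, frame_ms=1.0, acc_ms=100.0):
        # Number of consecutive abnormal frames needed to cover ACC.
        self.frames_needed = max(1, math.ceil(acc_ms / frame_ms))
        self.consecutive = 0

    def update(self, frame_is_abnormal):
        """Feed one per-frame detection result; return True when ACC is reached."""
        if not frame_is_abnormal:
            self.consecutive = 0      # a normal frame breaks the run
            return False
        self.consecutive += 1
        if self.consecutive >= self.frames_needed:
            self.consecutive = 0      # start counting afresh after triggering
            return True
        return False
```

  • With `frame_ms` equal to `acc_ms`, a single abnormal frame triggers immediately, which corresponds to the case where the second duration is just the current interval.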
  • the voice anomaly detection method provided in this embodiment is not affected by the age, tone, and the like of the user, and the accuracy of the detection result is more than 80%.
  • FIG. 6 is a schematic structural diagram of an abnormality detecting apparatus for voice data according to an embodiment of the present disclosure.
  • the abnormality detecting apparatus 600 includes:
  • the data obtaining unit 601 is configured to acquire target voice data transmitted through the uplink call path.
  • the abnormality detecting unit 602 is configured to determine, by analyzing the magnitude of the low frequency energy or the high frequency energy in the target voice data, whether the high frequency energy in the target voice data is less than the high frequency energy in the normal voice data, and if yes, to determine that the target voice data is abnormal.
  • the abnormality detecting unit 602 may include:
  • a low pass filtering subunit configured to acquire low frequency data in the target voice data by performing low pass filtering on the target voice data
  • a percentage calculation subunit for calculating a low frequency energy ratio, wherein the low frequency energy ratio is a proportion of a total energy of the low frequency data in the target voice data to a total energy of the target voice data;
  • an abnormality determining subunit configured to determine, if the low frequency energy ratio is greater than a low frequency occupancy threshold, that the high frequency energy in the target voice data is less than the high frequency energy in the normal voice data, wherein the low frequency occupancy threshold is the proportion of the total energy of the low frequency data in the normal voice data to the total energy of the normal voice data.
  • the abnormality detecting unit 602 may include:
  • a high-pass filtering subunit configured to acquire high frequency data in the target voice data by performing high-pass filtering on the target voice data;
  • a ratio calculating subunit configured to calculate a high frequency energy ratio, wherein the high frequency energy ratio is the proportion of the total energy of the high frequency data in the target voice data to the total energy of the target voice data;
  • an abnormality determining subunit configured to determine, if the high frequency energy ratio is less than a high frequency ratio threshold, that the high frequency energy in the target voice data is less than the high frequency energy in the normal voice data, wherein the high frequency ratio threshold is the proportion of the total energy of the high frequency data in the normal voice data to the total energy of the normal voice data.
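The high frequency energy ratio can also be estimated directly in the frequency domain. The sketch below swaps the high-pass filter described above for a sum over DFT bins at or above a cutoff, which is a different technique chosen here for illustration; the function names and the `cutoff_hz`, `sample_rate_hz`, and `high_ratio_threshold` parameters are assumptions and do not come from the application.

```python
import cmath
import math


def one_sided_power(samples):
    """Energy per DFT bin from 0 Hz to Nyquist (naive O(N^2) DFT, adequate for short frames)."""
    n = len(samples)
    powers = []
    for k in range(n // 2 + 1):
        acc = sum(x * cmath.exp(-2j * math.pi * k * i / n) for i, x in enumerate(samples))
        # Interior bins stand for a positive and a negative frequency, so they count twice.
        weight = 1 if k == 0 or (n % 2 == 0 and k == n // 2) else 2
        powers.append(weight * abs(acc) ** 2)
    return powers


def high_frequency_energy_ratio(samples, cutoff_hz, sample_rate_hz):
    """Share of the frame energy that lies in bins at or above cutoff_hz."""
    powers = one_sided_power(samples)
    total = sum(powers)
    if total == 0.0:
        return 0.0  # empty or silent frame
    bin_hz = sample_rate_hz / len(samples)
    high = sum(p for k, p in enumerate(powers) if k * bin_hz >= cutoff_hz)
    return high / total


def is_abnormal_high(samples, cutoff_hz, sample_rate_hz, high_ratio_threshold):
    """Abnormal when the high-frequency share falls below the share measured on normal voice.

    Silent frames yield a ratio of 0.0; a caller would probably screen them out first
    (an assumption, since the application does not discuss silent frames).
    """
    return high_frequency_energy_ratio(samples, cutoff_hz, sample_rate_hz) < high_ratio_threshold
```

For a frame made of a 250 Hz tone of amplitude 1 plus a 3000 Hz tone of amplitude 0.5, the energy split is 1 : 0.25, so a 2000 Hz cutoff gives a ratio of 0.2.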
  • the apparatus 600 may further include:
  • an abnormality prompting unit configured to output a first prompt if the abnormality detecting unit 602 determines that the target voice data is abnormal, wherein the first prompt is used to prompt that the microphone may be blocked by the user;
  • a clock timing unit configured to, after the first prompt is output and a first duration has elapsed, trigger the data acquiring unit 601 to acquire target voice data transmitted through the uplink call path;
  • an abnormality processing unit configured to perform abnormality processing according to the number of microphone paths of the microphone if the abnormality detecting unit 602 determines that the target voice data acquired after the first duration is abnormal.
  • the data acquiring unit 601 may be specifically configured to acquire the target voice data collected by each microphone path of the microphone;
  • the abnormality processing unit is specifically configured to, when the abnormality detecting unit 602 determines that the target voice data acquired after the first duration is abnormal in the frequency domain: if the microphone has only one microphone path, output a second prompt, wherein the second prompt is used to prompt the user that the microphone path may be faulty; if the microphone has at least two microphone paths and the target voice data collected by some of the microphone paths is abnormal in the frequency domain, select another, normal microphone path for the voice call; and if the microphone has at least two microphone paths and the target voice data collected by all of the microphone paths is abnormal in the frequency domain, output a third prompt, wherein the third prompt is used to prompt the user that all of the microphone paths may be faulty.
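The three-way branch performed by the abnormality processing unit can be written as a small decision function. This is a sketch under assumptions: the `Action` enum, the function name, and the representation of per-path results as a list of booleans are hypothetical, and the rule for picking which normal path to switch to (the first one found) is not stated in the application.

```python
from enum import Enum


class Action(Enum):
    NONE = "no action"
    SECOND_PROMPT = "prompt: the microphone path may be faulty"
    SWITCH_PATH = "switch to a normal microphone path"
    THIRD_PROMPT = "prompt: all microphone paths may be faulty"


def handle_abnormality(path_abnormal):
    """Choose an action from per-path results.

    path_abnormal holds one boolean per microphone path, True when the data
    collected by that path is abnormal in the frequency domain.
    Returns (action, index of the path to switch to, or None).
    """
    if not path_abnormal or not any(path_abnormal):
        return Action.NONE, None
    if len(path_abnormal) == 1:
        return Action.SECOND_PROMPT, None
    if all(path_abnormal):
        return Action.THIRD_PROMPT, None
    # Some paths are abnormal: pick the first normal one (an assumed selection rule).
    return Action.SWITCH_PATH, path_abnormal.index(False)
```

For example, `handle_abnormality([True, False])` selects path 1 for the call, while `handle_abnormality([True, True])` yields the third prompt.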
  • the data acquiring unit 601 is specifically configured to acquire target voice data transmitted through the uplink call path according to a preset time interval.
  • the abnormality processing unit is further configured to trigger the abnormality prompting unit to output the first prompt if the abnormality detecting unit 602 determines that all of the target voice data acquired within a second duration is abnormal, where the second duration is the current interval, or at least two consecutive intervals including the current interval.
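The "second duration" rule amounts to requiring that the most recent one or more consecutive intervals were all abnormal before the first prompt is raised. A minimal sketch follows, assuming a hypothetical class name and an `intervals_required` parameter; the application gives no concrete count beyond "the current interval, or at least two consecutive intervals".

```python
from collections import deque


class ConsecutiveAbnormalGate:
    """Reports True only when the last `intervals_required` intervals were all abnormal."""

    def __init__(self, intervals_required=2):
        if intervals_required < 1:
            raise ValueError("intervals_required must be at least 1")
        # A bounded deque keeps only the most recent results.
        self._recent = deque(maxlen=intervals_required)

    def update(self, abnormal):
        """Record the result for the current interval and say whether to output the first prompt."""
        self._recent.append(bool(abnormal))
        return len(self._recent) == self._recent.maxlen and all(self._recent)
```

With `intervals_required=1` the gate fires on the current interval alone; larger values trade a slower reaction for fewer false prompts caused by a single abnormal interval.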
  • FIG. 7 is a schematic diagram of a hardware structure of an abnormality detecting apparatus for voice data according to an embodiment of the present application.
  • the abnormality detecting apparatus 700 includes a memory 701, a receiver 702, and a processor 703 connected to the memory 701 and the receiver 702 respectively.
  • the memory 701 is configured to store a set of program instructions, and the processor 703 is configured to invoke the program instructions stored in the memory 701 to perform the following operations:
  • acquiring target voice data transmitted through the uplink call path;
  • determining, by analyzing the magnitude of the low frequency energy or the high frequency energy in the target voice data, whether the high frequency energy in the target voice data is less than the high frequency energy in normal voice data;
  • if so, determining that the target voice data is abnormal.
  • the processor 703 is further configured to invoke a program instruction stored by the memory 701 to perform the following operations:
  • acquiring low frequency data in the target voice data by performing low-pass filtering on the target voice data;
  • calculating a low frequency energy ratio, wherein the low frequency energy ratio is the proportion of the total energy of the low frequency data in the target voice data to the total energy of the target voice data;
  • if the low frequency energy ratio is greater than a low frequency ratio threshold, determining that the high frequency energy in the target voice data is less than the high frequency energy in the normal voice data, wherein the low frequency ratio threshold is the proportion of the total energy of the low frequency data in the normal voice data to the total energy of the normal voice data.
  • the processor 703 is further configured to invoke a program instruction stored by the memory 701 to perform the following operations:
  • acquiring high frequency data in the target voice data by performing high-pass filtering on the target voice data;
  • calculating a high frequency energy ratio, wherein the high frequency energy ratio is the proportion of the total energy of the high frequency data in the target voice data to the total energy of the target voice data;
  • if the high frequency energy ratio is less than a high frequency ratio threshold, determining that the high frequency energy in the target voice data is less than the high frequency energy in the normal voice data, wherein the high frequency ratio threshold is the proportion of the total energy of the high frequency data in the normal voice data to the total energy of the normal voice data.
  • the processor 703 is further configured to invoke a program instruction stored by the memory 701 to perform the following operations:
  • after determining that the target voice data is abnormal, outputting a first prompt, wherein the first prompt is used to prompt that the microphone may be blocked by the user;
  • after outputting the first prompt, waiting for a first duration and then continuing to acquire target voice data transmitted through the uplink call path;
  • if the target voice data acquired after the first duration is abnormal, performing abnormality processing according to the number of microphone paths of the microphone.
  • the processor 703 is further configured to invoke a program instruction stored by the memory 701 to perform the following operations:
  • if the microphone has only one microphone path, outputting a second prompt, wherein the second prompt is used to prompt the user that the microphone path may be faulty;
  • if the microphone has at least two microphone paths and the target voice data collected by some of the microphone paths is abnormal in the frequency domain, selecting another, normal microphone path for the voice call;
  • if the microphone has at least two microphone paths and the target voice data collected by all of the microphone paths is abnormal in the frequency domain, outputting a third prompt, wherein the third prompt is used to prompt the user that all of the microphone paths may be faulty.
  • the processor 703 is further configured to invoke a program instruction stored by the memory 701 to perform the following operations:
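  • the acquiring of the target voice data transmitted through the uplink call path includes: acquiring, according to a preset time interval, the target voice data transmitted through the uplink call path;
  • if all of the target voice data acquired within a second duration is abnormal, proceeding to the step of outputting the first prompt, where the second duration is the current interval, or at least two consecutive intervals including the current interval.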
  • the memory 701, the receiver 702, and the processor 703 included in the abnormality detecting apparatus 700 may be part of a mobile terminal, and the mobile terminal may include a mobile phone, a tablet, a PDA (Personal Digital Assistant), a POS (Point of Sales) terminal, an on-board computer, etc.
  • the memory 701 can be used to store software programs and modules, and the processor 703 executes various functional applications and data processing of the mobile terminal by running software programs and modules stored in the memory 701.
  • the memory 701 may mainly include a program storage area and a data storage area, wherein the program storage area may store an operating system, an application required for at least one function (such as a sound playing function, an image playing function, etc.), and the like; the data storage area may store data created according to the use of the mobile terminal (such as audio data, a phone book, etc.).
  • the memory 701 may include a high speed random access memory, and may also include a nonvolatile memory such as at least one magnetic disk storage device, flash memory device, or other volatile solid state storage device.
  • Receiver 702 can receive the user's voice.
  • receiver 702 can include a microphone or other structure that receives user speech.
  • the microphone can convert the collected sound signal into an electrical signal, which is received by the audio circuit and converted into audio data; the audio data is then output to an RF circuit for transmission to, for example, another mobile terminal, or output to the memory 701 for further processing.
  • the processor 703 is the control center of the mobile terminal. It connects the various parts of the entire mobile terminal using various interfaces and lines and, by running or executing software programs and/or modules stored in the memory 701 and calling data stored in the memory 701, performs the various functions of the mobile terminal and processes data, thereby monitoring the mobile terminal as a whole.
  • the processor 703 may include one or more processing units; preferably, the processor 703 may integrate an application processor and a modem processor, where the application processor mainly processes an operating system, a user interface, an application, and the like.
  • the modem processor primarily handles wireless communications. It can be understood that the above modem processor may not be integrated into the processor 703.
  • the abnormality detecting device 700 can further include a radio frequency circuit for receiving and transmitting the user's voice data.
  • the radio frequency circuit can receive and process the downlink voice data sent by the network device, or send the received uplink voice data to the network device, so as to perform services such as normal voice calls.
  • the abnormality detecting device 700 may include more or fewer hardware structures than those described above, and the specific structure of the abnormality detecting device 700 is not specifically limited in the embodiments of the present application.
  • the disclosed system, apparatus, and method may be implemented in other manners.
  • the device embodiments described above are merely illustrative.
  • the division into units is only a division by logical function.
  • in actual implementation there may be other ways of dividing; for example, multiple units or components may be combined or integrated into another system, or some features may be ignored or not executed.
  • the mutual coupling or direct coupling or communication connection shown or discussed may be an indirect coupling or communication connection through some interface, device or unit, and may be in an electrical, mechanical or other form.
  • the units described as separate components may or may not be physically separated, and the components displayed as units may or may not be physical units, that is, may be located in one place, or may be distributed to multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of the embodiment.
  • each functional unit in each embodiment of the present application may be integrated into one processing unit, or each unit may exist physically separately, or two or more units may be integrated into one unit.
  • the above integrated unit can be implemented in the form of hardware or in the form of a software functional unit.
  • if the integrated unit is implemented in the form of a software functional unit and sold or used as a standalone product, it may be stored in a computer readable storage medium.
  • the computer readable storage medium includes a number of instructions that cause a computer device (which may be a personal computer, a server, a network device, etc.) to perform all or part of the steps of the methods described in the embodiments of the present application.
  • the foregoing storage medium includes: a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, an optical disk, or the like.

Abstract

Disclosed in the present application are a method and a device for detecting abnormalities of voice data. In the method, target voice data transmitted via an uplink call path is first acquired. Because the proportion of low frequency energy in normal voice data is large and the proportion of high frequency energy is small, analyzing the magnitude of the low frequency energy or the high frequency energy in the target voice data makes it possible to determine whether the high frequency energy in the target voice data is less than the high frequency energy in the normal voice data. If it is, the high frequency energy in the target voice data has been lost or truncated, and the target voice data can therefore be determined to be abnormal.

Description

Method and device for detecting abnormality of voice data
This application claims priority to Chinese Patent Application No. 201710890904.X, filed with the Chinese Patent Office on September 27, 2017 and entitled "A Method and Terminal for Finding Icons", which is incorporated herein by reference in its entirety.
Technical Field
The present application relates to the field of voice technologies, and in particular, to a method and device for detecting abnormality of voice data.
Background
In everyday use of a mobile phone, the voice call function is one of the basic applications, and the quality of voice calls directly affects the user's experience of the phone. During a voice call, the path along which voice data collected by the local mobile phone is transmitted to the peer mobile phone after sound effect processing is called the uplink call path; conversely, the path along which voice data received by the local mobile phone from the peer mobile phone is played through the speaker or the earpiece is called the downlink call path.
At present, mobile phone manufacturers and open source organizations mainly develop algorithms for sound effect processing and pay little attention to detecting sound abnormalities. Although mobile phone manufacturers have developed some voice abnormality detection algorithms, the existing voice detection technologies all detect the time domain signal of the voice. Such time domain detection methods directly analyze the amplitude, activity, abnormal jumps, and the like of the collected voice signal, so the accuracy of the abnormality detection result is not ideal.
However, the inventors of the present application found that, during actual calls, there are scenarios in which the time domain signal of the voice data is normal but the frequency domain signal is abnormal. Such scenarios cause abnormal problems such as silence or intermittent sound during the call, but voice data with an abnormal frequency domain signal cannot be detected by the existing time domain detection methods, so call abnormalities caused by frequency domain signal abnormalities cannot be avoided.
Summary of the Invention
The main purpose of the embodiments of the present application is to provide a method and device for detecting abnormality of voice data, which can detect voice data that is abnormal in the frequency domain.
In a first aspect, the present application provides a method for detecting abnormality of voice data, including:
acquiring target voice data transmitted through an uplink call path;
determining, by analyzing the magnitude of the low frequency energy or the high frequency energy in the target voice data, whether the high frequency energy in the target voice data is less than the high frequency energy in normal voice data;
if so, determining that the target voice data is abnormal.
In a first possible implementation of the first aspect, the determining, by analyzing the magnitude of the low frequency energy in the target voice data, whether the high frequency energy in the target voice data is less than the high frequency energy in normal voice data includes:
acquiring low frequency data in the target voice data by performing low-pass filtering on the target voice data;
calculating a low frequency energy ratio, wherein the low frequency energy ratio is the proportion of the total energy of the low frequency data in the target voice data to the total energy of the target voice data;
if the low frequency energy ratio is greater than a low frequency ratio threshold, determining that the high frequency energy in the target voice data is less than the high frequency energy in the normal voice data, wherein the low frequency ratio threshold is the proportion of the total energy of the low frequency data in the normal voice data to the total energy of the normal voice data.
In a second possible implementation of the first aspect, the determining, by analyzing the magnitude of the high frequency energy in the target voice data, whether the high frequency energy in the target voice data is less than the high frequency energy in normal voice data includes:
acquiring high frequency data in the target voice data by performing high-pass filtering on the target voice data;
calculating a high frequency energy ratio, wherein the high frequency energy ratio is the proportion of the total energy of the high frequency data in the target voice data to the total energy of the target voice data;
if the high frequency energy ratio is less than a high frequency ratio threshold, determining that the high frequency energy in the target voice data is less than the high frequency energy in the normal voice data, wherein the high frequency ratio threshold is the proportion of the total energy of the high frequency data in the normal voice data to the total energy of the normal voice data.
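The two ratio tests above can be stated compactly. The symbols are notation introduced here for illustration and do not appear in the application: x[n] is the target voice data over N samples, x_L[n] and x_H[n] are its low-pass and high-pass filtered versions, and T_L and T_H are the low frequency and high frequency ratio thresholds, computed in the same way from normal voice data.

```latex
R_L = \frac{\sum_{n=0}^{N-1} x_L[n]^2}{\sum_{n=0}^{N-1} x[n]^2},
\qquad
R_H = \frac{\sum_{n=0}^{N-1} x_H[n]^2}{\sum_{n=0}^{N-1} x[n]^2},
\qquad
\text{abnormal if } R_L > T_L \ \text{(first implementation)} \ \text{or} \ R_H < T_H \ \text{(second implementation)}.
```

Either test alone is sufficient under the respective implementation; both express the same observation, namely that the high frequency share of the energy has dropped relative to normal voice.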
With reference to the first aspect, or the first or second possible implementation of the first aspect, in a third possible implementation, after the determining that the target voice data is abnormal, the method further includes:
outputting a first prompt, wherein the first prompt is used to prompt that the microphone may be blocked by the user;
after outputting the first prompt, waiting for a first duration and then continuing the acquiring of target voice data transmitted through the uplink call path;
if the target voice data acquired after the first duration is abnormal, performing abnormality processing according to the number of microphone paths of the microphone.
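The prompt-then-recheck sequence of the third implementation can be sketched as follows. All names here are hypothetical: `acquire`, `is_abnormal`, `prompt`, and `handle` are callables supplied by the caller, the prompt wording is invented for illustration, and `sleep` is injectable only so that the waiting step can be replaced in tests.

```python
import time


def prompt_then_recheck(acquire, is_abnormal, prompt, handle, first_duration_s, sleep=time.sleep):
    """Run one detect / prompt / wait / re-detect cycle and report what happened."""
    data = acquire()
    if not is_abnormal(data):
        return "normal"
    # First prompt: give the user a chance to unblock the microphone.
    prompt("The microphone may be blocked.")
    sleep(first_duration_s)
    # Re-acquire after the first duration and test again.
    data = acquire()
    if is_abnormal(data):
        handle()  # abnormality processing according to the number of microphone paths
        return "handled"
    return "recovered"
```

Waiting before escalating separates a transient cause, such as a finger over the microphone hole, from a persistent one that warrants path switching or a fault prompt.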
With reference to the third possible implementation of the first aspect, in a fourth possible implementation, the acquiring target voice data transmitted through the uplink call path includes:
acquiring the target voice data collected by each microphone path of the microphone;
and the performing abnormality processing according to the number of microphone paths of the microphone includes:
if the microphone has only one microphone path, outputting a second prompt, wherein the second prompt is used to prompt the user that the microphone path may be faulty;
if the microphone has at least two microphone paths and the target voice data collected by some of the microphone paths is abnormal in the frequency domain, selecting another, normal microphone path for the voice call;
if the microphone has at least two microphone paths and the target voice data collected by all of the microphone paths is abnormal in the frequency domain, outputting a third prompt, wherein the third prompt is used to prompt the user that all of the microphone paths may be faulty.
With reference to the third possible implementation of the first aspect, in a fifth possible implementation, the acquiring target voice data transmitted through the uplink call path includes:
acquiring, according to a preset time interval, the target voice data transmitted through the uplink call path;
and the method further includes:
if all of the target voice data acquired within a first duration is abnormal in the frequency domain, performing abnormality processing, wherein the first duration is the current interval, or at least two consecutive intervals including the current interval.
In a second aspect, an apparatus for detecting abnormality of voice data is provided, the abnormality detecting apparatus including units for performing the method provided by the first aspect or any possible implementation of the first aspect.
In a third aspect, an apparatus for detecting abnormality of voice data is provided, the abnormality detecting apparatus including a processor, a memory, and a bus system, wherein the processor and the memory are connected through the bus system; the memory is configured to store one or more programs, the one or more programs include instructions, and the instructions, when executed by the abnormality detecting apparatus, cause the abnormality detecting apparatus to perform the method provided by the first aspect or any possible implementation of the first aspect.
In a fourth aspect, a computer readable storage medium storing one or more programs is provided, wherein when the one or more programs are executed by the abnormality detecting apparatus, the abnormality detecting apparatus performs the method provided by the first aspect or any possible implementation of the first aspect.
In a fifth aspect, a graphical user interface on an abnormality detecting apparatus is provided, the abnormality detecting apparatus including a display, a memory, a plurality of applications, and one or more processors for executing one or more programs stored in the memory, the graphical user interface including a user interface displayed according to the method provided by the first aspect or any possible implementation of the first aspect, wherein the display includes a touch-sensitive surface and a display screen.
In the method and device for detecting abnormality of voice data provided by the present application, target voice data transmitted through the uplink call path is first acquired. Because low frequency energy accounts for a large share of normal voice data and high frequency energy for a small share, analyzing the magnitude of the low frequency energy or the high frequency energy in the target voice data makes it possible to determine whether the high frequency energy in the target voice data is less than the high frequency energy in normal voice data. If it is, the high frequency energy in the target voice data has been lost or truncated, and the target voice data can therefore be determined to be abnormal in the frequency domain.
Brief Description of the Drawings
FIG. 1 is a schematic diagram of an uplink call path of a mobile phone according to an embodiment of the present application;
FIG. 2 is a schematic flowchart of a method for detecting abnormality of voice data according to an embodiment of the present application;
FIG. 3 is a schematic diagram of the amplitude/frequency of normal voice data according to an embodiment of the present application;
FIG. 4 is a first detailed flowchart of the method for detecting abnormality of voice data according to an embodiment of the present application;
FIG. 5 is a second detailed flowchart of the method for detecting abnormality of voice data according to an embodiment of the present application;
FIG. 6 is a schematic structural diagram of an apparatus for detecting abnormality of voice data according to an embodiment of the present application;
FIG. 7 is a schematic diagram of the hardware structure of an apparatus for detecting abnormality of voice data according to an embodiment of the present application.
具体实施方式Detailed ways
下面结合附图,对本申请的实施例进行描述。Embodiments of the present application will be described below with reference to the accompanying drawings.
用户利用手机或者座机等通话设备进行通话时,存在着一些异常情况,比如,麦克风(microphone,简称MIC)单体故障、板级连接故障、用户使用故障等等,其中,所述MIC单体故障是指通话设备的MIC单体的振膜里进入杂质导致振膜局部粘连,所述板级连接故障是指在音频通路中出现瞬间短路的情况,所述用户使用故障是指通话过程中用户误操作导致手指堵住MIC孔的情况。通话过程中,当出现其中一种或多种故障时,可能导致语音数据中的时域信号正常但频域信号异常,而频域信号异常会导致通话过程中出现无声或者断续的通话问题。When a user makes a call using a mobile device such as a mobile phone or a landline, there are some abnormalities, such as a microphone (MIC), a board-level connection failure, a user failure, etc., wherein the MIC unit is faulty. It refers to the entry of impurities into the diaphragm of the MIC unit of the talking device, causing partial adhesion of the diaphragm. The board-level connection failure refers to a situation in which an instantaneous short circuit occurs in the audio path, and the user-used fault refers to a user error during the call. The operation causes the finger to block the MIC hole. During the call, when one or more faults occur, the time domain signal in the voice data may be normal but the frequency domain signal is abnormal, and the abnormal frequency domain signal may cause a silent or intermittent call problem during the call.
可见,在实际通话过程中,上述故障会导致语音数据的时域信号没有发生较大变化,但频域信号发生异常,然而,现有的语音通话检测技术,都是针对语音的时域信号进行检测,还没有准确、快速、有效的检测方法来检测语音数据在频域上是否异常,继而无法根据频域异常检测结果,来消除因频域异常导致通话过程中出现无声或者断续的故障原因。It can be seen that during the actual call, the above-mentioned faults may cause the time domain signal of the voice data to not change greatly, but the frequency domain signal is abnormal. However, the existing voice call detection technologies are all for the time domain signal of the voice. Detection, there is no accurate, fast and effective detection method to detect whether the voice data is abnormal in the frequency domain, and then can not eliminate the frequency domain anomaly detection result to eliminate the cause of the silent or intermittent failure during the call due to the frequency domain anomaly. .
为了能够检测出语音数据在频域上是否异常,本申请实施例提供了一种语音数据的异常检测方法,可以准确、快速、有效的检测出语音数据在频域上是否异常,还可以在确定频域异常后,对导致频域异常的可能原因进行排查并进行异常处理。需要说明的是,本申请实施例提供的方法可以应用于任何一种语音通话设备,比如手机或座机等,其不对语音通话设备的类型进行限制。In order to be able to detect whether the voice data is abnormal in the frequency domain, the embodiment of the present application provides an abnormality detection method for voice data, which can accurately, quickly, and effectively detect whether the voice data is abnormal in the frequency domain, and can also determine After the frequency domain is abnormal, the possible causes of the frequency domain anomaly are investigated and the exception is processed. It should be noted that the method provided by the embodiment of the present application can be applied to any type of voice call device, such as a mobile phone or a landline, which does not limit the type of the voice call device.
A specific application scenario is now given for the method provided by the embodiments of this application. FIG. 1 is a schematic diagram of the uplink call path of a mobile phone. When user 1 makes a voice call on the mobile phone, the voice data of user 1 is collected by the microphone (MIC) of the mobile phone and passed to a coder-decoder (Codec) chip for A/D conversion, that is, the analog voice signal is converted into a digital voice signal. The voice data is then passed to a sound effect algorithm module for sound effect processing, the processed voice data is protocol-encoded and passed to a modem, and the modem finally sends the encoded data to the mobile phone or landline of the peer user 2.
As shown in FIG. 1, this embodiment may add an abnormality detection module to the physical structure of an existing mobile phone and use it to perform frequency-domain abnormality detection on the voice data. The abnormality detection module and the sound effect algorithm module may be implemented on the same digital signal processing (DSP) chip or on different DSP chips. In normal voice data, low-frequency energy accounts for a large proportion and high-frequency energy for a small proportion. However, the MIC unit failures, board-level connection failures, user operation faults, and the like described above may cause the high-frequency energy in the voice data to be lost or truncated. Therefore, while the sound effect algorithm module performs sound effect processing, the magnitude of the low-frequency energy or the high-frequency energy in the collected voice data can be analyzed to determine whether the high-frequency energy in the collected voice data is less than that in normal voice data. If so, the high-frequency energy in the collected voice data has been lost or truncated, and the collected voice data can be determined to be abnormal in the frequency domain.
The collected voice data is the digital voice signal obtained after A/D conversion by the Codec chip. For ease of description, the collected voice data is hereinafter referred to as the target voice data.
FIG. 2 is a schematic flowchart of a method for detecting an abnormality of voice data according to an embodiment of this application. The method includes the following steps S201-S202:
S201: Acquire target voice data transmitted through the uplink call path.
In this embodiment, as shown in FIG. 1, after receiving the voice data sent by the Codec chip, the sound effect algorithm module may forward the received voice data to the abnormality detection module. Alternatively, the Codec chip may send its output voice data directly to the abnormality detection module. The abnormality detection module then detects whether the voice data is abnormal in the frequency domain; the voice data being detected is the target voice data.
The target voice data may be voice data acquired over a short period (for example, 1 ms) or over a longer period (for example, 1 s).
S202: By analyzing the magnitude of the low-frequency energy or the high-frequency energy in the target voice data, determine whether the high-frequency energy in the target voice data is less than the high-frequency energy in normal voice data; if so, determine that the target voice data is abnormal.
FIG. 3 is a schematic amplitude/frequency diagram of normal voice data, in which the abscissa f represents frequency and the ordinate A represents amplitude. In normal voice data, low-frequency energy accounts for a large proportion and high-frequency energy for a small proportion. Therefore, the low-frequency data in the target voice data can be obtained, its energy proportion in the target voice data determined, and that proportion checked against the low-frequency energy proportion required of normal voice data. Alternatively, the high-frequency data in the target voice data can be obtained, its energy proportion in the target voice data determined, and that proportion checked against the high-frequency energy proportion required of normal voice data. If the requirement is not met, the high-frequency signal in the target voice data has been lost or truncated, and the target voice data can be determined to be abnormal in the frequency domain.
Specifically, step S202 may be implemented in either of the following two implementations.
In the first implementation, referring to FIG. 4, S202 may specifically include:
S2021: Obtain the low-frequency data in the target voice data by low-pass filtering the target voice data.
A finite impulse response (FIR) digital filter or an infinite impulse response (IIR) digital filter may be provided in advance in the abnormality detection module shown in FIG. 1, configured as a low-pass filter, and given a low-pass frequency threshold f_Lp.
When the low-pass filter is applied to the target voice data, the components of the target voice data whose frequency is below the threshold f_Lp pass through the filter; the data that passes is the low-frequency data in the target voice data.
S2022: Calculate the low-frequency energy ratio, where the low-frequency energy ratio is the proportion of the total energy of the low-frequency data in the target voice data to the total energy of the target voice data.
Calculate the energy E_Lp of the low-frequency data in the target voice data and the total energy E_ALL of the target voice data, and then calculate the low-frequency energy ratio Kactucal = E_Lp/E_ALL.
S2023: If the low-frequency energy ratio is greater than a low-frequency ratio threshold, determine that the high-frequency energy in the target voice data is less than the high-frequency energy in normal voice data, where the low-frequency ratio threshold is the proportion of the total energy of the low-frequency data in the normal voice data to the total energy of the normal voice data.
After the low-frequency energy ratio Kactucal is calculated in S2022, if Kactucal exceeds the low-frequency ratio threshold Kthreshold, that is, Kactucal > Kthreshold, the proportion of low-frequency energy in the target voice data is too high. This indicates that the high-frequency signal in the target voice data has been lost or truncated, and hence that the target voice data is abnormal in the frequency domain.
In general, the low-frequency energy ratio of normal voice data may not be a fixed value but a range of values. Therefore, the low-frequency ratio threshold Kthreshold may be set to the maximum, the minimum, or the mean of the normal range of low-frequency energy ratios, and so on.
To facilitate understanding of steps S2021-S2023 of the first implementation, an example follows:
Taking a certain platform as an example, after user 1 establishes a normal voice call between mobile phone 1 and mobile phone 2, assume that the sampling interval Tunit of the voice data is set to 1 ms. Mobile phone 1 then continuously collects the voice data of user 1 at intervals of 1 ms. Each MIC path of mobile phone 1 can collect 48 voice data samples every 1 ms, and these 48 samples are the target voice data.
A low-pass filter is used to perform 10th-order (or other order) FIR or IIR low-pass filtering on the 48 voice data samples collected every 1 ms. Assuming the low-pass frequency threshold f_Lp is set to 4 kHz, the components below 4 kHz pass through the low-pass filter; the data that passes is the low-frequency data of the 48 samples.
Define the 48 voice data samples collected every 1 ms as data[0]~data[47], and define the corresponding low-frequency data as data_Lp[0]~data_Lp[47].
Calculate the low-frequency energy and the total energy of the 48 voice data samples data[0]~data[47], namely:
[Formula PCTCN2018107572-appb-000001: E_Lp and E_ALL, expressed in terms of the amplitude of the i-th of the 48 voice data samples.]
Then the low-frequency energy ratio is Kactucal = C*E_Lp/E_ALL, where C is a constant gain.
When the low-frequency energy ratio Kactucal exceeds the low-frequency ratio threshold Kthreshold, the target voice data acquired within the unit time Tunit has had its high-frequency signal lost or truncated.
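The low-pass example above can be sketched in Python as follows. This is only an illustrative sketch under stated assumptions, not the implementation of the embodiment: the 48 kHz sampling rate is inferred from 48 samples per 1 ms; the energy is assumed to be the sum of squared amplitudes; the Hamming windowed-sinc design, the threshold value 0.9, the test tones, and all function names are hypothetical choices made for the illustration.

```python
import math

FS_HZ = 48_000    # inferred: 48 samples per 1 ms frame
FRAME_LEN = 48    # samples per frame (Tunit = 1 ms)
ORDER = 10        # 10th-order FIR, i.e. 11 taps
F_LP_HZ = 4_000   # low-pass frequency threshold f_Lp from the example


def lowpass_fir_taps(order, cutoff_hz, fs_hz):
    """Hamming windowed-sinc low-pass taps, scaled to unity gain at DC."""
    mid = order / 2.0
    fc = cutoff_hz / fs_hz  # cutoff in cycles per sample
    taps = []
    for n in range(order + 1):
        x = n - mid
        ideal = 2.0 * fc if x == 0 else math.sin(2.0 * math.pi * fc * x) / (math.pi * x)
        window = 0.54 - 0.46 * math.cos(2.0 * math.pi * n / order)
        taps.append(ideal * window)
    total = sum(taps)
    return [t / total for t in taps]


def fir_filter(frame, taps, history=None):
    """Causal FIR; `history` is the last len(taps)-1 samples of the previous frame."""
    k = len(taps) - 1
    x = (list(history) if history is not None else [0.0] * k) + list(frame)
    return [sum(taps[j] * x[i + k - j] for j in range(k + 1)) for i in range(len(frame))]


def energy(samples):
    """Assumed energy definition: sum of squared amplitudes."""
    return sum(s * s for s in samples)


def low_freq_ratio(frame, taps, history=None, gain=1.0):
    """Kactucal = C * E_Lp / E_ALL; None for a silent frame (ratio undefined)."""
    e_all = energy(frame)
    if e_all == 0.0:
        return None
    return gain * energy(fir_filter(frame, taps, history)) / e_all


def is_abnormal(k_actual, k_threshold):
    """S2023: abnormal when the low-frequency ratio exceeds the threshold."""
    return k_actual is not None and k_actual > k_threshold


def tone(freq_hz, n, amp=1.0, fs_hz=FS_HZ):
    """Synthetic test tone (stand-in for real microphone samples)."""
    return [amp * math.sin(2.0 * math.pi * freq_hz * i / fs_hz) for i in range(n)]


def last_frame_ratio(signal, taps):
    """Ratio for the final frame, using the preceding samples as filter history."""
    k = len(taps) - 1
    frame = signal[-FRAME_LEN:]
    history = signal[-FRAME_LEN - k:-FRAME_LEN]
    return low_freq_ratio(frame, taps, history)


taps = lowpass_fir_taps(ORDER, F_LP_HZ, FS_HZ)
K_THRESHOLD = 0.9  # hypothetical; a real threshold would come from normal speech
speech_like = [a + b for a, b in zip(tone(1_000, 96), tone(8_000, 96, amp=0.5))]
truncated = tone(1_000, 96)  # the same signal with its high-frequency part missing
for name, sig in (("speech-like", speech_like), ("truncated", truncated)):
    k = last_frame_ratio(sig, taps)
    print(name, round(k, 3), is_abnormal(k, K_THRESHOLD))
```

A 10th-order filter has a wide transition band, so components somewhat above 4 kHz are attenuated only partially; any threshold would have to be calibrated against the filter actually used.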
In the second implementation, referring to FIG. 5, S202 may specifically include:
S2021: Obtain the high-frequency data in the target voice data by high-pass filtering the target voice data.
An FIR digital filter or an IIR digital filter may be provided in advance in the abnormality detection module shown in FIG. 1, configured as a high-pass filter, and given a high-pass frequency threshold f_Hp.
When the high-pass filter is applied to the target voice data, the components of the target voice data whose frequency is above the threshold f_Hp pass through the filter; the data that passes is the high-frequency data in the target voice data.
S2022: Calculate the high-frequency energy ratio, where the high-frequency energy ratio is the proportion of the total energy of the high-frequency data in the target voice data to the total energy of the target voice data.
Calculate the energy E_Hp of the high-frequency data in the target voice data and the total energy E_ALL of the target voice data, and then calculate the high-frequency energy ratio Kactucal = E_Hp/E_ALL.
S2023: If the high-frequency energy ratio is less than a high-frequency ratio threshold, determine that the high-frequency energy in the target voice data is less than the high-frequency energy in normal voice data, where the high-frequency ratio threshold is the proportion of the total energy of the high-frequency data in the normal voice data to the total energy of the normal voice data.
After the high-frequency energy ratio Kactucal of the high-frequency data is calculated in S2022, if Kactucal is lower than the high-frequency ratio threshold Kthreshold, that is, Kactucal < Kthreshold, the proportion of high-frequency energy in the target voice data is too low. This indicates that the high-frequency signal in the target voice data has been lost or truncated, and hence that the target voice data is abnormal in the frequency domain.
In general, the high-frequency energy ratio of normal voice data may not be a fixed value but a range. Therefore, the high-frequency ratio threshold Kthreshold may be set to the maximum, the minimum, or the mean of the normal range of high-frequency energy ratios, and so on.
To facilitate understanding of S2021-S2023 of the second implementation, an example follows:
Taking a certain platform as an example, after user 1 establishes a normal voice call between mobile phone 1 and mobile phone 2, assume that the sampling interval Tunit of the voice data is set to 1 ms. Mobile phone 1 then continuously collects the voice data of user 1 at intervals of 1 ms. Each voice path of mobile phone 1 can collect 48 voice data samples every 1 ms, and these 48 samples are the target voice data.
A high-pass filter is used to perform 10th-order (or other order) FIR or IIR high-pass filtering on the 48 voice data samples collected every 1 ms. Assuming the high-pass frequency threshold f_Hp is set to 6 kHz, the components above 6 kHz pass through the high-pass filter; the data that passes is the high-frequency data of the 48 samples.
Define the 48 voice data samples collected every 1 ms as data[0]~data[47], and define the corresponding high-frequency data as data_Hp[0]~data_Hp[47].
Calculate the high-frequency energy and the total energy of the 48 voice data samples data[0]~data[47], namely:
[Formula PCTCN2018107572-appb-000002: E_Hp and E_ALL, expressed in terms of the amplitude of the i-th of the 48 voice data samples.]
Then the high-frequency energy ratio is Kactucal = C*E_Hp/E_ALL, where C is a constant gain.
When the high-frequency energy ratio Kactucal is lower than the high-frequency ratio threshold Kthreshold, the target voice data acquired within the unit time Tunit has had its high-frequency signal lost or truncated.
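The high-pass variant can be sketched in the same way. Again this is a hedged illustration rather than the embodiment's implementation: the high-pass taps are obtained here by spectral inversion of a Hamming windowed-sinc low-pass filter (one possible design among many), the energy is assumed to be the sum of squared amplitudes, and the 48 kHz rate, the threshold 0.05, the test tones, and all names are assumptions.

```python
import math

FS_HZ, FRAME_LEN, ORDER, F_HP_HZ = 48_000, 48, 10, 6_000


def highpass_fir_taps(order, cutoff_hz, fs_hz):
    """High-pass taps by spectral inversion of a windowed-sinc low-pass.

    The order must be even so that the filter has a centre tap.
    """
    if order % 2:
        raise ValueError("order must be even")
    mid, fc = order // 2, cutoff_hz / fs_hz
    low = []
    for n in range(order + 1):
        x = n - mid
        ideal = 2.0 * fc if x == 0 else math.sin(2.0 * math.pi * fc * x) / (math.pi * x)
        low.append(ideal * (0.54 - 0.46 * math.cos(2.0 * math.pi * n / order)))
    total = sum(low)
    high = [-t / total for t in low]  # negate the unity-gain low-pass ...
    high[mid] += 1.0                  # ... and add a delayed unit impulse
    return high


def high_freq_ratio(frame, taps, history, gain=1.0):
    """Kactucal = C * E_Hp / E_ALL; None for a silent frame (ratio undefined)."""
    k = len(taps) - 1
    x = list(history) + list(frame)  # history: last k samples of the previous frame
    high = [sum(taps[j] * x[i + k - j] for j in range(k + 1)) for i in range(len(frame))]
    e_all = sum(s * s for s in frame)
    if e_all == 0.0:
        return None
    return gain * sum(s * s for s in high) / e_all


def is_abnormal(k_actual, k_threshold):
    """S2023: abnormal when the high-frequency ratio falls below the threshold."""
    return k_actual is not None and k_actual < k_threshold


def tone(freq_hz, n, amp=1.0):
    """Synthetic test tone (stand-in for real microphone samples)."""
    return [amp * math.sin(2.0 * math.pi * freq_hz * i / FS_HZ) for i in range(n)]


hp_taps = highpass_fir_taps(ORDER, F_HP_HZ, FS_HZ)
K_THRESHOLD = 0.05  # hypothetical; a real threshold would come from normal speech
speech_like = [a + b for a, b in zip(tone(1_000, 96), tone(12_000, 96, amp=0.5))]
truncated = tone(1_000, 96)  # high-frequency part missing
ratios = {}
for name, sig in (("speech-like", speech_like), ("truncated", truncated)):
    frame, history = sig[-FRAME_LEN:], sig[-FRAME_LEN - ORDER:-FRAME_LEN]
    ratios[name] = high_freq_ratio(frame, hp_taps, history)
    print(name, round(ratios[name], 4), is_abnormal(ratios[name], K_THRESHOLD))
```

Only the direction of the comparison differs from the low-pass case: a low-frequency ratio that is too high and a high-frequency ratio that is too low both point to missing high-frequency energy.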
Further, because this embodiment can detect within a short time whether the frequency-domain signal of the voice data is abnormal, that is, detection is efficient, a frequency-domain abnormality in the voice data can be handled and worked around quickly, which improves the user's experience of the call device.
Therefore, in an implementation of this application, the following may further be performed after step S202:
Step A: Output a first prompt, where the first prompt indicates that the microphone may be blocked by the user.
During a call, if the call becomes abnormal because of improper use (for example, blocking the MIC hole) and no corresponding prompt is given, the user will not know the cause of the problem. Therefore, after the voice data is determined to be abnormal in the frequency domain, whether the user is operating the device properly is checked first. Specifically, the user may be reminded to correct the improper operation by means of vibration or a prompt tone. For example, the first prompt may be output by voice, such as "Your finger may be blocking the MIC hole"; on hearing the prompt, the user will generally move the finger away.
Step B: After the first prompt is output, wait for a first duration and then continue with step S201.
After the first prompt is output, a certain time for clearing the abnormality (the first duration), for example 5 seconds, is reserved for the user. Step S201 is then performed again to continue collecting voice data and performing abnormality detection.
Step C: If the target voice data acquired after the first duration is abnormal, perform exception handling according to the number of microphone paths of the microphone.
If the voice data was abnormal because the user's finger blocked the MIC hole, the voice data should return to normal once the finger no longer blocks the hole. If the voice data is still abnormal, one or more MIC units of the microphone may have failed.
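Steps A-C amount to a prompt, a grace period, and a re-check. The following Python sketch shows that control flow only. The callback names (`recheck_abnormal`, `output_prompt`, `handle_fault`), the return values, and the injectable `sleep` are hypothetical, introduced so that the flow can be exercised without real hardware.

```python
import time


def run_steps_a_to_c(recheck_abnormal, output_prompt, handle_fault,
                     first_duration_s=5.0, sleep=time.sleep):
    """Prompt the user, wait, re-run detection, and escalate if still abnormal."""
    output_prompt("Your finger may be blocking the MIC hole")  # step A
    sleep(first_duration_s)                                     # step B: first duration
    if recheck_abnormal():                                      # step B: S201-S202 again
        handle_fault()                                          # step C
        return "fault_handling"
    return "recovered"


# Usage with stand-in callbacks; a real device would use its own prompt and detector.
log = []
result = run_steps_a_to_c(
    recheck_abnormal=lambda: True,
    output_prompt=log.append,
    handle_fault=lambda: log.append("step C"),
    sleep=lambda s: log.append(f"wait {s}s"),
)
print(result, log)
```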
With current mobile phones, physical damage to the main MIC unit makes the phone unusable, and the user must have it repaired at a service center before using it again. With this embodiment, after an abnormality of the main MIC unit is detected, the phone can automatically switch to a secondary MIC unit for the call, which keeps the call intact, and can tell the user which MIC units may have failed.
To determine which MIC unit or units may have failed, in an implementation of this application, S201 may specifically include: acquiring the target voice data collected by each microphone path of the microphone. In this implementation, the MIC array of the microphone may be detected in advance to determine how many MIC units it has, for example only one main MIC unit, or one main MIC unit and one or more secondary MIC units. Each MIC unit corresponds to one MIC path. Frequency-domain abnormality detection is then performed separately on the target voice data collected by each MIC path; that is, the detection on one MIC path need not depend on the others.
Existing voice data detection algorithms, by contrast, rely mainly on time-domain signals. Analyzing the time-domain signal collected by a single MIC path is not enough to determine accurately whether the voice data is abnormal; voice data collected by multiple MIC paths is needed for an auxiliary combined judgment, and this multi-path judgment also takes a long time and has low detection accuracy. Compared with the prior art, this embodiment therefore does not need to rely on voice data collected by multiple MIC paths when determining whether voice data is abnormal, takes less time for abnormality detection, and has higher detection accuracy.
It can be seen that when a call device has an abnormality in sending voice, existing time-domain detection techniques cannot detect accurately and quickly whether the voice data is abnormal, so the performance and features of the call device cannot be fully exploited. In addition, because existing time-domain detection relies on multiple MIC paths to detect abnormal voice data, it cannot accurately detect whether a given MIC path has failed, and so cannot avoid call abnormalities caused by MIC failure.
This embodiment, in contrast, can perform abnormality detection on the voice data collected by each MIC path, so it can determine from the detection results which MIC path may have failed. Specifically, "performing exception handling according to the number of microphone paths of the microphone" in step C may include:
C1: If the microphone has only one microphone path, output a second prompt, where the second prompt indicates to the user that the microphone path may be faulty.
If the call device has only one MIC path, the second prompt, such as a voice prompt or a vibration prompt, reminds the user that the single MIC path of the call device may be faulty.
C2: If the microphone has at least two microphone paths and the target voice data collected by some of the microphone paths is abnormal in the frequency domain, select another, normal microphone path for the voice call.
If the call device has multiple MIC paths and the main MIC path is abnormal, the secondary MIC path with the best voice quality is selected for the call. If the call device has multiple MIC paths and both the main MIC path and some of the secondary MIC paths are abnormal, the MIC path with the best voice quality among the remaining secondary MIC paths is selected for the call.
In addition, a voice prompt, a vibration prompt, or the like may be used to tell the user which MIC paths of the call device may be faulty.
C3: If the microphone has at least two microphone paths and the target voice data collected by all of the microphone paths is abnormal in the frequency domain, output a third prompt, where the third prompt indicates to the user that all of the microphone paths may be faulty.
If the call device has multiple MIC paths and all of them are abnormal, the third prompt, such as a voice prompt or a vibration prompt, reminds the user that all MIC paths of the call device may be faulty.
It can be seen that, with this embodiment, when one or more MIC paths are detected to be abnormal, the call device automatically switches to another, normal MIC path for the voice call. This keeps the call intact and tells the user which MIC paths may have failed, so that the user can have them repaired promptly.
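The case analysis C1-C3 can be summarized as a small decision function. The sketch below is illustrative only: the function name, the action labels, the dictionary inputs, and in particular the per-path quality score are assumptions, since the text does not say how "best voice quality" is measured.

```python
def handle_persistent_anomaly(abnormal_by_path, quality_by_path=None):
    """Decide the action for step C from per-path detection results.

    abnormal_by_path: dict mapping a MIC path id to True if that path is abnormal.
    quality_by_path:  optional dict of hypothetical voice-quality scores
                      (higher is better); missing scores count as 0.0.
    Returns (action, paths).
    """
    if not abnormal_by_path:
        raise ValueError("at least one MIC path is required")
    healthy = [p for p, bad in abnormal_by_path.items() if not bad]
    faulty = [p for p, bad in abnormal_by_path.items() if bad]
    if not faulty:
        return ("no_action", [])            # defensive: nothing is abnormal
    if len(abnormal_by_path) == 1:
        return ("second_prompt", faulty)    # C1: the single path may be faulty
    if healthy:                             # C2: switch to the best normal path
        scores = quality_by_path or {}
        best = max(healthy, key=lambda p: scores.get(p, 0.0))
        return ("switch_path", [best])
    return ("third_prompt", faulty)         # C3: all paths may be faulty


print(handle_persistent_anomaly({"main": True}))
print(handle_persistent_anomaly({"main": True, "sub1": False, "sub2": False},
                                {"sub1": 0.4, "sub2": 0.7}))
print(handle_persistent_anomaly({"main": True, "sub1": True}))
```

Because each path is checked independently, the same per-path results can also drive the prompt that tells the user which paths may be faulty.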
In general, during a normal voice call, the human ear clearly notices when the voice is interrupted or silent for more than 100 ms. Therefore, when the above steps detect that the target voice data is abnormal in the frequency domain and the sampling time of the target voice data is short, for example 1 ms, exception handling need not be performed immediately. Instead, the frequency-domain abnormal time may be accumulated continuously. For example, an abnormal time accumulation threshold ACC is set to 100 ms, and exception handling is performed using steps A-C only when the accumulated frequency-domain abnormal time exceeds 100 ms.
To this end, in an implementation of this application, S201 may specifically include: acquiring, at a preset time interval, the target voice data transmitted through the uplink call path. In this implementation, the A/D-converted digital voice data may be acquired at a fixed time interval, for example once every 1 ms; the digital voice data within each 1 ms is the target voice data.
S203: If the target voice data acquired within a second duration is abnormal, proceed to steps A-C, where the second duration is the current interval, or at least two consecutive intervals including the current interval.
In this implementation, the abnormal time accumulation threshold ACC (the second duration) and the acquisition time of the target voice data need to be set in advance. For example, when ACC is 100 ms, the voice data collected in each 100 ms may be used as the target voice data, and exception handling is performed if the currently collected target voice data is abnormal in the frequency domain. Alternatively, the voice data collected in each 1 ms may be used as the target voice data, and exception handling is performed when the target voice data collected 100 consecutive times is abnormal in the frequency domain.
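The accumulation of consecutive abnormal time against ACC can be sketched as a small counter. This is a minimal illustration under assumptions: the class and method names are hypothetical, a normal frame is assumed to reset the count (following the "100 consecutive times" wording), the comparison uses "reaches" rather than "exceeds", and resetting after a trigger is a design choice not stated in the text.

```python
class AbnormalTimeAccumulator:
    """Tracks consecutive frequency-domain abnormal time against the ACC threshold."""

    def __init__(self, frame_ms=1, acc_threshold_ms=100):
        self.frame_ms = frame_ms
        self.acc_threshold_ms = acc_threshold_ms
        self.accumulated_ms = 0

    def update(self, frame_abnormal):
        """Feed one frame result; return True when steps A-C should be started."""
        if frame_abnormal:
            self.accumulated_ms += self.frame_ms
        else:
            self.accumulated_ms = 0  # a normal frame breaks the consecutive run
        if self.accumulated_ms >= self.acc_threshold_ms:
            self.accumulated_ms = 0  # assumed: start counting afresh after a trigger
            return True
        return False


# 1 ms frames: the trigger comes on the 100th consecutive abnormal frame.
acc = AbnormalTimeAccumulator(frame_ms=1, acc_threshold_ms=100)
triggers = [acc.update(True) for _ in range(100)]
print(triggers.index(True) + 1)

# 100 ms frames: a single abnormal frame is enough.
print(AbnormalTimeAccumulator(frame_ms=100, acc_threshold_ms=100).update(True))
```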
It can be understood that existing abnormal-voice detection techniques rely mainly on time-domain signals and suffer from low detection accuracy and a long detection period (typically 2-3 seconds). This embodiment detects on the basis of frequency-domain signals and, compared with the prior art, offers high detection accuracy and a short detection period (typically 100-300 milliseconds), so exception handling can be performed quickly. In addition, tests of this embodiment in practice found that the accuracy of the voice abnormality detection method provided here is not affected by the user's age, pitch, and so on, and that the accuracy of the detection results exceeds 80%.
FIG. 6 is a schematic structural diagram of an apparatus for detecting an abnormality of voice data according to an embodiment of this application. The abnormality detection apparatus 600 includes:
a data acquisition unit 601, configured to acquire target voice data transmitted through the uplink call path; and
an abnormality detection unit 602, configured to determine, by analyzing the magnitude of the low-frequency energy or the high-frequency energy in the target voice data, whether the high-frequency energy in the target voice data is less than the high-frequency energy in normal voice data, and if so, determine that the target voice data is abnormal.
In an implementation of this application, the abnormality detection unit 602 may include:
a low-pass filtering subunit, configured to obtain the low-frequency data in the target voice data by low-pass filtering the target voice data;
a ratio calculation subunit, configured to calculate the low-frequency energy ratio, where the low-frequency energy ratio is the proportion of the total energy of the low-frequency data in the target voice data to the total energy of the target voice data; and
an abnormality determination subunit, configured to determine, if the low-frequency energy ratio is greater than the low-frequency ratio threshold, that the high-frequency energy in the target voice data is less than the high-frequency energy in normal voice data, where the low-frequency ratio threshold is the proportion of the total energy of the low-frequency data in the normal voice data to the total energy of the normal voice data.
In an implementation of this application, the abnormality detection unit 602 may include:
a high-pass filtering subunit, configured to obtain the high-frequency data in the target voice data by high-pass filtering the target voice data;
a ratio calculation subunit, configured to calculate the high-frequency energy ratio, where the high-frequency energy ratio is the proportion of the total energy of the high-frequency data in the target voice data to the total energy of the target voice data; and
an abnormality determination subunit, configured to determine, if the high-frequency energy ratio is less than the high-frequency ratio threshold, that the high-frequency energy in the target voice data is less than the high-frequency energy in normal voice data, where the high-frequency ratio threshold is the proportion of the total energy of the high-frequency data in the normal voice data to the total energy of the normal voice data.
In an implementation of this application, the apparatus 600 may further include:
an abnormality prompt unit, configured to output a first prompt if the abnormality detection unit 602 determines that the target voice data is abnormal, where the first prompt indicates that the microphone may be blocked by the user;
a timing unit, configured to trigger, a first duration after the first prompt is output, the data acquisition unit 601 to acquire target voice data transmitted through the uplink call path; and
an exception handling unit, configured to perform exception handling according to the number of microphone paths of the microphone if the abnormality detection unit 602 determines that the target voice data acquired after the first duration is abnormal.
In an implementation of this application, the data acquisition unit 601 may be specifically configured to acquire the target voice data collected by each microphone path of the microphone;
in which case the exception handling unit is specifically configured to, when the abnormality detection unit 602 determines that the target voice data acquired after the first duration is abnormal in the frequency domain: output a second prompt if the microphone has only one microphone path, where the second prompt indicates to the user that the microphone path may be faulty; select another, normal microphone path for the voice call if the microphone has at least two microphone paths and the target voice data collected by some of the microphone paths is abnormal in the frequency domain; and output a third prompt if the microphone has at least two microphone paths and the target voice data collected by all of the microphone paths is abnormal in the frequency domain, where the third prompt indicates to the user that all of the microphone paths may be faulty.
In an implementation of this application, the data acquisition unit 601 is specifically configured to acquire, at a preset time interval, the target voice data transmitted through the uplink call path;
in which case the exception handling unit is further configured to trigger the abnormality prompt unit to output the first prompt if the abnormality detection unit 602 determines that the target voice data acquired within a second duration is abnormal, where the second duration is the current interval, or at least two consecutive intervals including the current interval.
For a description of the features of the embodiment corresponding to FIG. 6, refer to the related description of the embodiment corresponding to FIG. 2; details are not repeated here.
FIG. 7 is a schematic diagram of the hardware structure of an apparatus for detecting abnormalities of voice data according to an embodiment of this application. The abnormality detection apparatus 700 includes a memory 701, a receiver 702, and a processor 703 connected to both the memory 701 and the receiver 702. The memory 701 is configured to store a set of program instructions, and the processor 703 is configured to invoke the program instructions stored in the memory 701 to perform the following operations:
acquiring target voice data transmitted over the uplink call path;
determining, by analyzing the magnitude of the low-frequency energy or the high-frequency energy in the target voice data, whether the high-frequency energy in the target voice data is less than the high-frequency energy in normal voice data; and
if so, determining that the target voice data is abnormal.
In an implementation of the present invention, the processor 703 is further configured to invoke the program instructions stored in the memory 701 to perform the following operations:
obtaining the low-frequency data in the target voice data by low-pass filtering the target voice data;
calculating a low-frequency energy ratio, where the low-frequency energy ratio is the proportion of the total energy of the low-frequency data in the target voice data to the total energy of the target voice data; and
if the low-frequency energy ratio is greater than a low-frequency ratio threshold, determining that the high-frequency energy in the target voice data is less than the high-frequency energy in normal voice data, where the low-frequency ratio threshold is the proportion of the total energy of the low-frequency data in the normal voice data to the total energy of the normal voice data.
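The low-frequency ratio test described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the implementation disclosed in the application: the one-pole filter, the function names, and the parameters `sample_rate`, `cutoff_hz`, and `low_ratio_threshold` are all hypothetical, because the source names neither a filter design nor a cutoff frequency.

```python
import math


def low_pass(samples, sample_rate, cutoff_hz):
    """One-pole IIR low-pass filter (an assumed, illustrative filter choice)."""
    dt = 1.0 / sample_rate
    rc = 1.0 / (2.0 * math.pi * cutoff_hz)
    alpha = dt / (rc + dt)
    filtered, y = [], 0.0
    for x in samples:
        y += alpha * (x - y)  # move the output a fraction alpha toward the input
        filtered.append(y)
    return filtered


def energy(samples):
    """Total energy, taken here as the sum of squared sample values."""
    return sum(x * x for x in samples)


def low_freq_energy_ratio(samples, sample_rate, cutoff_hz):
    """Energy of the low-pass-filtered data divided by the total energy."""
    total = energy(samples)
    if total == 0.0:
        return 0.0  # silent input: no ratio can be formed (assumed convention)
    return energy(low_pass(samples, sample_rate, cutoff_hz)) / total


def is_abnormal_by_low_ratio(samples, sample_rate, cutoff_hz, low_ratio_threshold):
    """Abnormal when the low-frequency share exceeds the threshold."""
    ratio = low_freq_energy_ratio(samples, sample_rate, cutoff_hz)
    return ratio > low_ratio_threshold
```

In this sketch the threshold is passed in directly; per the text it would be the same ratio measured on normal voice data.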
In an implementation of the present invention, the processor 703 is further configured to invoke the program instructions stored in the memory 701 to perform the following operations:
obtaining the high-frequency data in the target voice data by high-pass filtering the target voice data;
calculating a high-frequency energy ratio, where the high-frequency energy ratio is the proportion of the total energy of the high-frequency data in the target voice data to the total energy of the target voice data; and
if the high-frequency energy ratio is less than a high-frequency ratio threshold, determining that the high-frequency energy in the target voice data is less than the high-frequency energy in normal voice data, where the high-frequency ratio threshold is the proportion of the total energy of the high-frequency data in the normal voice data to the total energy of the normal voice data.
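The high-frequency variant, including the derivation of the threshold from normal voice data, might look like the sketch below. It is only an illustration: the first-order high-pass filter, the function names, and the idea of passing a recorded `normal_reference` signal are assumptions that do not come from the source.

```python
import math


def high_pass(samples, sample_rate, cutoff_hz):
    """First-order RC-style high-pass filter (an assumed, illustrative choice)."""
    dt = 1.0 / sample_rate
    rc = 1.0 / (2.0 * math.pi * cutoff_hz)
    a = rc / (rc + dt)
    filtered = []
    prev_x, prev_y = 0.0, 0.0
    for x in samples:
        y = a * (prev_y + x - prev_x)  # passes fast changes, suppresses slow ones
        filtered.append(y)
        prev_x, prev_y = x, y
    return filtered


def high_freq_energy_ratio(samples, sample_rate, cutoff_hz):
    """Energy of the high-pass-filtered data divided by the total energy."""
    total = sum(x * x for x in samples)
    if total == 0.0:
        return 0.0  # silent input: no ratio can be formed (assumed convention)
    filtered = high_pass(samples, sample_rate, cutoff_hz)
    return sum(y * y for y in filtered) / total


def is_abnormal_by_high_ratio(target, normal_reference, sample_rate, cutoff_hz):
    """Abnormal when the target's high-frequency share is below that of the
    normal reference, whose ratio serves as the threshold."""
    threshold = high_freq_energy_ratio(normal_reference, sample_rate, cutoff_hz)
    return high_freq_energy_ratio(target, sample_rate, cutoff_hz) < threshold
```

Note that a silent target yields a ratio of 0.0 and would be flagged here; a practical system would presumably skip frames without speech, which this sketch does not attempt.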
In an implementation of the present invention, the processor 703 is further configured to invoke the program instructions stored in the memory 701 to perform the following operations:
if the target voice data is determined to be abnormal, outputting a first prompt, where the first prompt indicates that the microphone may be blocked by the user;
after outputting the first prompt, waiting for a first duration and then acquiring again the target voice data transmitted over the uplink call path; and
if the target voice data acquired after the first duration is abnormal, performing exception handling according to the number of microphone paths of the microphone.
In an implementation of the present invention, the processor 703 is further configured to invoke the program instructions stored in the memory 701 to perform the following operations:
acquiring the target voice data collected by each microphone path of the microphone;
when the target voice data acquired after the first duration is abnormal in the frequency domain, outputting a second prompt if the microphone has only one microphone path, where the second prompt informs the user that the microphone path may be faulty;
if the microphone has at least two microphone paths and the target voice data collected by only some of the paths is abnormal in the frequency domain, selecting another, normal microphone path for the voice call; and
if the microphone has at least two microphone paths and the target voice data collected by all of the paths is abnormal in the frequency domain, outputting a third prompt, where the third prompt informs the user that all of the microphone paths may be faulty.
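The three-way decision on the number of microphone paths can be summarized as a small function. This is a hedged sketch: the function name, the boolean-per-path input, and the returned action labels are hypothetical and chosen only to mirror the second prompt, the path switch, and the third prompt described above.

```python
def handle_persistent_abnormality(path_abnormal):
    """Choose an action once the re-check after the first duration is done.

    path_abnormal holds one boolean per microphone path: True means the
    target voice data from that path is abnormal in the frequency domain.
    Returns (action, usable_paths); the action labels are illustrative.
    """
    if not path_abnormal:
        raise ValueError("at least one microphone path is required")

    normal_paths = [i for i, bad in enumerate(path_abnormal) if not bad]
    if len(normal_paths) == len(path_abnormal):
        return ("no_action", normal_paths)  # nothing abnormal to handle
    if len(path_abnormal) == 1:
        return ("second_prompt", [])  # the single path may be faulty
    if normal_paths:
        return ("switch_paths", normal_paths)  # continue the call on normal paths
    return ("third_prompt", [])  # all paths may be faulty
```

For example, `handle_persistent_abnormality([True, False])` selects path 1 for the call, while `handle_persistent_abnormality([True, True])` leads to the third prompt.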
In an implementation of the present invention, the processor 703 is further configured to invoke the program instructions stored in the memory 701 to perform the following operations:
the acquiring of the target voice data transmitted over the uplink call path includes:
acquiring, at a preset time interval, the target voice data transmitted over the uplink call path; and
if the target voice data acquired within a second duration is abnormal, proceeding to the step of outputting the first prompt, where the second duration is the current interval, or at least two consecutive intervals including the current interval.
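One way to read the second duration is as a count of consecutive acquisition intervals that must all be abnormal before the first prompt is output. The sketch below follows that reading; the class name `IntervalMonitor` and the parameter `intervals_required` are assumptions, with a value of 1 corresponding to "the current interval" and 2 or more to "at least two consecutive intervals including the current interval".

```python
from collections import deque


class IntervalMonitor:
    """Tracks per-interval detection results over the second duration."""

    def __init__(self, intervals_required=2):
        if intervals_required < 1:
            raise ValueError("intervals_required must be at least 1")
        # Only the most recent intervals_required results are retained.
        self._recent = deque(maxlen=intervals_required)

    def record(self, abnormal):
        """Store the result for the current interval and report whether
        the first prompt should be output now."""
        self._recent.append(bool(abnormal))
        window_full = len(self._recent) == self._recent.maxlen
        return window_full and all(self._recent)
```

Requiring several consecutive abnormal intervals trades a slower prompt for fewer false alarms from brief dips in high-frequency energy; the source leaves the choice of interval count open.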
In some implementations, the memory 701, the receiver 702, and the processor 703 included in the abnormality detection apparatus 700 may be components of a mobile terminal. The mobile terminal may be a mobile phone, a tablet computer, a PDA (Personal Digital Assistant), a POS (Point of Sales) terminal, an in-vehicle computer, or the like.
The memory 701 may be used to store software programs and modules. The processor 703 runs the software programs and modules stored in the memory 701 to execute the various functional applications and data processing of the mobile terminal. The memory 701 may mainly include a program storage area and a data storage area. The program storage area may store an operating system and the application programs required by at least one function (such as a sound playback function or an image playback function); the data storage area may store data created through use of the mobile terminal (such as audio data or a phone book). In addition, the memory 701 may include high-speed random access memory, and may also include non-volatile memory, such as at least one magnetic disk storage device, a flash memory device, or another non-volatile solid-state storage device.
The receiver 702 can receive the user's voice. For example, the receiver 702 may include a microphone or another structure that receives the user's voice. The microphone converts the collected sound signal into an electrical signal, which an audio circuit receives and converts into audio data; the audio data is then output to an RF circuit for transmission to, for example, another mobile terminal, or output to the memory 701 for further processing.
The processor 703 is the control center of the mobile terminal. It connects all parts of the mobile terminal through various interfaces and lines, and performs the various functions of the mobile terminal and processes data by running or executing the software programs and/or modules stored in the memory 701 and invoking the data stored in the memory 701, thereby monitoring the mobile terminal as a whole. Optionally, the processor 703 may include one or more processing units. Preferably, the processor 703 may integrate an application processor and a modem processor, where the application processor mainly handles the operating system, user interface, and application programs, and the modem processor mainly handles wireless communication. It can be understood that the modem processor may alternatively not be integrated into the processor 703.
It can be understood that the abnormality detection apparatus 700 may further include a radio frequency circuit for receiving and sending the user's voice data. For example, the radio frequency circuit may receive and process downlink voice data sent by a network device, or send received uplink voice data to the network device, so that services such as normal voice calls can be carried out.
The abnormality detection apparatus 700 may include more or fewer hardware structures than those described above; the embodiments of the present invention do not specifically limit the structure of the abnormality detection apparatus 700.
A person skilled in the art can clearly understand that, for convenience and brevity of description, the specific working processes of the systems, apparatuses, and units described above may be found in the corresponding processes of the foregoing method embodiments, and are not repeated here.
In the several embodiments provided in this application, it should be understood that the disclosed systems, apparatuses, and methods may be implemented in other ways. For example, the apparatus embodiments described above are merely illustrative. The division into units is only a division by logical function, and other divisions are possible in an actual implementation: multiple units or components may be combined or integrated into another system, or some features may be omitted or not performed. In addition, the mutual couplings, direct couplings, or communication connections shown or discussed may be indirect couplings or communication connections through interfaces, apparatuses, or units, and may be electrical, mechanical, or of another form.
The units described as separate components may or may not be physically separate, and the components shown as units may or may not be physical units; they may be located in one place or distributed across multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of an embodiment.
In addition, the functional units in the embodiments of this application may be integrated into one processing unit, each unit may exist physically on its own, or two or more units may be integrated into one unit. The integrated unit may be implemented in the form of hardware or in the form of a software functional unit.
If the integrated unit is implemented in the form of a software functional unit and sold or used as an independent product, it may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of this application, in essence, or the part that contributes to the prior art, or all or part of the technical solution, may be embodied in the form of a software product. The computer software product is stored in a storage medium and includes instructions that cause a computer device (which may be a personal computer, a server, a network device, or the like) to perform all or some of the steps of the methods described in the embodiments of this application. The storage medium includes any medium that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disc.
The foregoing embodiments are intended only to illustrate the technical solutions of this application, not to limit them. Although this application has been described in detail with reference to the foregoing embodiments, a person of ordinary skill in the art should understand that the technical solutions described in those embodiments may still be modified, or some of their technical features may be replaced by equivalents, and that such modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of this application.

Claims (15)

  1. A method for detecting abnormalities of voice data, characterized by comprising:
    acquiring target voice data transmitted over an uplink call path;
    determining, by analyzing the magnitude of the low-frequency energy or the high-frequency energy in the target voice data, whether the high-frequency energy in the target voice data is less than the high-frequency energy in normal voice data; and
    if so, determining that the target voice data is abnormal.
  2. The method according to claim 1, wherein the determining, by analyzing the magnitude of the low-frequency energy in the target voice data, whether the high-frequency energy in the target voice data is less than the high-frequency energy in normal voice data comprises:
    obtaining the low-frequency data in the target voice data by low-pass filtering the target voice data;
    calculating a low-frequency energy ratio, wherein the low-frequency energy ratio is the proportion of the total energy of the low-frequency data in the target voice data to the total energy of the target voice data; and
    if the low-frequency energy ratio is greater than a low-frequency ratio threshold, determining that the high-frequency energy in the target voice data is less than the high-frequency energy in normal voice data, wherein the low-frequency ratio threshold is the proportion of the total energy of the low-frequency data in the normal voice data to the total energy of the normal voice data.
  3. The method according to claim 1, wherein the determining, by analyzing the magnitude of the high-frequency energy in the target voice data, whether the high-frequency energy in the target voice data is less than the high-frequency energy in normal voice data comprises:
    obtaining the high-frequency data in the target voice data by high-pass filtering the target voice data;
    calculating a high-frequency energy ratio, wherein the high-frequency energy ratio is the proportion of the total energy of the high-frequency data in the target voice data to the total energy of the target voice data; and
    if the high-frequency energy ratio is less than a high-frequency ratio threshold, determining that the high-frequency energy in the target voice data is less than the high-frequency energy in normal voice data, wherein the high-frequency ratio threshold is the proportion of the total energy of the high-frequency data in the normal voice data to the total energy of the normal voice data.
  4. The method according to any one of claims 1 to 3, further comprising, after the determining that the target voice data is abnormal:
    outputting a first prompt, wherein the first prompt indicates that the microphone may be blocked by the user.
  5. The method according to claim 4, further comprising, after the outputting of the first prompt:
    acquiring, after an interval of a first duration, target voice data transmitted over the uplink call path; and
    if the target voice data acquired after the first duration is determined to be abnormal, performing exception handling according to the number of microphone paths of the microphone.
  6. The method according to claim 5, wherein the acquiring of the target voice data transmitted over the uplink call path comprises:
    acquiring the target voice data collected by each microphone path of the microphone;
    and the performing of exception handling according to the number of microphone paths of the microphone comprises:
    if the microphone has only one microphone path, outputting a second prompt, wherein the second prompt informs the user that the microphone path may be faulty;
    if the microphone has at least two microphone paths and the target voice data collected by only some of the paths is abnormal in the frequency domain, selecting another, normal microphone path for the voice call; and
    if the microphone has at least two microphone paths and the target voice data collected by all of the paths is abnormal in the frequency domain, outputting a third prompt, wherein the third prompt informs the user that all of the microphone paths may be faulty.
  7. The method according to claim 5, wherein the acquiring of the target voice data transmitted over the uplink call path comprises:
    acquiring, at a preset time interval, the target voice data transmitted over the uplink call path;
    and the method further comprises:
    if the target voice data acquired within a second duration is abnormal, proceeding to the step of outputting the first prompt, wherein the second duration is the current interval, or at least two consecutive intervals including the current interval.
  8. An apparatus for detecting abnormalities of voice data, characterized by comprising:
    a data acquisition unit, configured to acquire target voice data transmitted over an uplink call path; and
    an abnormality detection unit, configured to determine, by analyzing the magnitude of the low-frequency energy or the high-frequency energy in the target voice data, whether the high-frequency energy in the target voice data is less than the high-frequency energy in normal voice data, and if so, to determine that the target voice data is abnormal.
  9. The apparatus according to claim 8, wherein the abnormality detection unit comprises:
    a low-pass filtering subunit, configured to obtain the low-frequency data in the target voice data by low-pass filtering the target voice data;
    a ratio calculation subunit, configured to calculate a low-frequency energy ratio, wherein the low-frequency energy ratio is the proportion of the total energy of the low-frequency data in the target voice data to the total energy of the target voice data; and
    an abnormality determination subunit, configured to determine, if the low-frequency energy ratio is greater than a low-frequency ratio threshold, that the high-frequency energy in the target voice data is less than the high-frequency energy in normal voice data, wherein the low-frequency ratio threshold is the proportion of the total energy of the low-frequency data in the normal voice data to the total energy of the normal voice data.
  10. The apparatus according to claim 8, wherein the abnormality detection unit comprises:
    a high-pass filtering subunit, configured to obtain the high-frequency data in the target voice data by high-pass filtering the target voice data;
    a ratio calculation subunit, configured to calculate a high-frequency energy ratio, wherein the high-frequency energy ratio is the proportion of the total energy of the high-frequency data in the target voice data to the total energy of the target voice data; and
    an abnormality determination subunit, configured to determine, if the high-frequency energy ratio is less than a high-frequency ratio threshold, that the high-frequency energy in the target voice data is less than the high-frequency energy in normal voice data, wherein the high-frequency ratio threshold is the proportion of the total energy of the high-frequency data in the normal voice data to the total energy of the normal voice data.
  11. The apparatus according to any one of claims 8 to 10, further comprising:
    an abnormality prompt unit, configured to output a first prompt if the abnormality detection unit determines that the target voice data is abnormal, wherein the first prompt indicates that the microphone may be blocked by the user.
  12. The apparatus according to claim 11, further comprising:
    a timing unit, configured to trigger, after the first prompt is output and a first duration has elapsed, the data acquisition unit to acquire target voice data transmitted over the uplink call path; and
    an exception handling unit, configured to perform exception handling according to the number of microphone paths of the microphone if the abnormality detection unit determines that the target voice data acquired after the first duration is abnormal.
  13. The apparatus according to claim 11, wherein the data acquisition unit is specifically configured to acquire the target voice data collected by each microphone path of the microphone;
    and the exception handling unit is specifically configured to, when the abnormality detection unit determines that the target voice data acquired after the first duration is abnormal: output a second prompt if the microphone has only one microphone path, wherein the second prompt informs the user that the microphone path may be faulty; select another, normal microphone path for the voice call if the microphone has at least two microphone paths and the target voice data collected by only some of the paths is abnormal in the frequency domain; and output a third prompt if the microphone has at least two microphone paths and the target voice data collected by all of the paths is abnormal in the frequency domain, wherein the third prompt informs the user that all of the microphone paths may be faulty.
  14. The apparatus according to claim 11, wherein the data acquisition unit is specifically configured to acquire, at a preset time interval, the target voice data transmitted over the uplink call path;
    and the exception handling unit is further configured to trigger the abnormality prompt unit to output the first prompt if the abnormality detection unit determines that the target voice data acquired within a second duration is abnormal, wherein the second duration is the current interval, or at least two consecutive intervals including the current interval.
  15. An apparatus for detecting abnormalities of voice data, characterized by comprising a processor, a memory, and a system bus, wherein:
    the processor and the memory are connected through the system bus; and
    the memory is configured to store one or more programs, the one or more programs comprising instructions that, when executed by the abnormality detection apparatus, cause the abnormality detection apparatus to perform the method according to any one of claims 1 to 7.
PCT/CN2018/107572 2017-09-27 2018-09-26 Method and device for detecting abnormalities of voice data WO2019062751A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201710890904.X 2017-09-27
CN201710890904.XA CN109561222A (en) 2017-09-27 2017-09-27 A kind of method for detecting abnormality and device of voice data

Publications (1)

Publication Number Publication Date
WO2019062751A1 true WO2019062751A1 (en) 2019-04-04

Family

ID=65863980

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2018/107572 WO2019062751A1 (en) 2017-09-27 2018-09-26 Method and device for detecting abnormalities of voice data

Country Status (2)

Country Link
CN (1) CN109561222A (en)
WO (1) WO2019062751A1 (en)

Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112291421B (en) * 2019-07-24 2021-09-21 中国移动通信集团广东有限公司 Single-pass detection method and device based on voice communication, storage medium and electronic equipment
CN110536193B (en) * 2019-07-24 2020-12-22 华为技术有限公司 Audio signal processing method and device
CN111491061B (en) * 2020-04-21 2021-08-06 Oppo广东移动通信有限公司 Audio detection method and device for call scene and related equipment
CN112268688B (en) * 2020-09-04 2022-06-07 上海士翌测试技术有限公司 Error data identification method and device
CN113040771B (en) * 2021-03-01 2022-12-23 青岛歌尔智能传感器有限公司 Emotion recognition method, system, wearable device and storage medium

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102948169A (en) * 2010-06-23 2013-02-27 摩托罗拉移动有限责任公司 Microphone interference detection method and apparatus
US20130329895A1 (en) * 2012-06-08 2013-12-12 Apple Inc. Microphone occlusion detector
CN103578470A (en) * 2012-08-09 2014-02-12 安徽科大讯飞信息科技股份有限公司 Telephone recording data processing method and system
CN106911996A (en) * 2017-03-03 2017-06-30 广东欧珀移动通信有限公司 The detection method of microphone state, device and terminal device

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106878866B (en) * 2017-03-03 2020-01-10 Oppo广东移动通信有限公司 Audio signal processing method and device and terminal

Also Published As

Publication number Publication date
CN109561222A (en) 2019-04-02

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 18862993

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 18862993

Country of ref document: EP

Kind code of ref document: A1