WO2019062751A1

WO2019062751A1 - Method and device for detecting abnormalities of voice data

Info

Publication number: WO2019062751A1
Application number: PCT/CN2018/107572
Authority: WO
Inventors: 杨霖; 韩晓; 尹朝阳; 苏俊峰; 王建鹏; 高骏鹏
Original assignee: 华为技术有限公司
Priority date: 2017-09-27
Filing date: 2018-09-26
Publication date: 2019-04-04
Also published as: CN109561222A

Abstract

Disclosed in the present application are a method and a device for detecting abnormalities of voice data. In the method, firstly target voice data transmitted via an uplink call path is acquired. As in normal voice data, the proportion of low frequency energy is large, and the proportion of high frequency energy is small, it can be determined, by analyzing the magnitude of the low frequency energy or the high frequency energy in the target voice data, whether the high frequency energy in the target voice data is less than the high frequency energy in the normal voice data, and if the result of the determination is yes, it is indicated that the high frequency energy in the target voice data is lost or intercepted, and thus it can be determined that the target voice data is abnormal.

Description

Method and device for detecting abnormality of voice data

This application claims priority to Chinese Patent Application No. 201710890904.X, filed with the Chinese Patent Office on September 27, 2017 and entitled "A Method and Terminal for Finding Icons", the entire contents of which are incorporated herein by reference.

Technical field

The present application relates to the field of voice technologies, and in particular, to an abnormality detection method and apparatus for voice data.

Background technique

In everyday use of a mobile phone, the voice call function is one of the basic applications, and the quality of a voice call directly affects the user's experience of the phone. During a voice call, the path along which voice data collected by the local mobile phone is transmitted to the peer mobile phone after sound effect processing is called the uplink call path; conversely, the path along which voice data received by the local mobile phone from the peer mobile phone is played through the speaker or the earpiece is called the downlink call path.

At present, mobile phone manufacturers and open source organizations mainly focus on sound effect processing algorithms and pay little attention to detecting abnormal sound effects. Although mobile phone manufacturers have developed some speech anomaly detection algorithms, the existing speech detection technologies all detect the time domain signal of speech. This time domain detection method directly analyzes the amplitude, activity level, and abnormal jumps of the collected speech signal, and the accuracy of its abnormality detection results is unsatisfactory.

However, the inventors of the present application found that, during an actual call, there are scenarios in which the time domain signal of the voice data is normal but the frequency domain signal is abnormal. Such a scenario may cause problems such as silence or intermittent sound during the call, but voice data with an abnormal frequency domain signal cannot be detected by the existing time domain detection method, so the call abnormalities caused by abnormal frequency domain signals cannot be avoided.

Summary of the invention

The main purpose of the embodiments of the present application is to provide an abnormality detection method and device for voice data that can detect voice data that is abnormal in the frequency domain.

In a first aspect, the present application provides an abnormality detecting method for voice data, including:

Acquiring target voice data transmitted through the uplink call path;

Determining whether the high frequency energy in the target speech data is less than the high frequency energy in the normal speech data by analyzing the magnitude of the low frequency energy or the high frequency energy in the target speech data;

If yes, it is determined that the target voice data is abnormal.

In a first possible implementation manner of the first aspect, the determining, by analyzing the magnitude of the low frequency energy in the target voice data, whether the high frequency energy in the target voice data is less than the high frequency energy in the normal voice data includes:

Obtaining low frequency data in the target voice data by low pass filtering the target voice data;

Calculating a low frequency energy ratio, wherein the low frequency energy ratio is a proportion of a total energy of the low frequency data in the target voice data to a total energy of the target voice data;

If the low frequency energy ratio is greater than a low frequency occupancy threshold, determining that the high frequency energy in the target voice data is less than the high frequency energy in the normal voice data, wherein the low frequency occupancy threshold is the proportion of the total energy of the low frequency data in the normal voice data to the total energy of the normal voice data.

In a second possible implementation manner of the first aspect, the determining, by analyzing the magnitude of the high frequency energy in the target voice data, whether the high frequency energy in the target voice data is less than the high frequency energy in the normal voice data includes:

Obtaining high frequency data in the target voice data by performing high-pass filtering on the target voice data;

Calculating a high frequency energy ratio, wherein the high frequency energy ratio is a proportion of a total energy of the high frequency data in the target voice data to a total energy of the target voice data;

If the high frequency energy ratio is less than a high frequency occupancy threshold, determining that the high frequency energy in the target voice data is less than the high frequency energy in the normal voice data, wherein the high frequency occupancy threshold is the proportion of the total energy of the high frequency data in the normal voice data to the total energy of the normal voice data.

With reference to the first aspect, or the first possible implementation manner of the first aspect, or the second possible implementation manner of the first aspect, in a third possible implementation manner, after the determining that the target voice data is abnormal, the method further includes:

Outputting a first prompt, wherein the first prompt is used to prompt the user that the microphone is blocked;

After outputting the first prompt and waiting for a first duration, continuing to perform the acquiring of the target voice data transmitted through the uplink call path;

If the target voice data acquired after the first duration is abnormal, performing exception handling according to the number of microphone paths of the microphone.

With reference to the third possible implementation manner of the first aspect, in a fourth possible implementation manner, the acquiring the target voice data transmitted through the uplink call path includes:

Acquiring the target voice data collected by each microphone path of the microphone;

Then, the performing exception handling according to the number of microphone paths of the microphone includes:

If the microphone has only one microphone path, outputting a second prompt, wherein the second prompt is used to prompt the user that the microphone path may be faulty;

If the microphone has at least two microphone paths, and the target voice data collected by some of the microphone paths is abnormal in the frequency domain, selecting another, normal microphone path for the voice call;

If the microphone has at least two microphone paths, and the target voice data collected by all the microphone paths is abnormal in the frequency domain, outputting a third prompt, wherein the third prompt is used to prompt the user that all the microphone paths may be faulty.

With reference to the third possible implementation manner of the first aspect, in a fifth possible implementation manner, the acquiring the target voice data transmitted through the uplink call path includes:

Obtaining target voice data transmitted through the uplink call path according to a preset time interval;

Then, the method further includes:

If the target voice data acquired within the first duration is abnormal in the frequency domain, performing exception handling, where the first duration is the current time interval, or at least two consecutive time intervals including the current time interval.

In a second aspect, there is provided an anomaly detecting apparatus for voice data, the anomaly detecting apparatus comprising means for performing the method provided by the first aspect or any of the possible implementations of the first aspect.

In a third aspect, an abnormality detecting apparatus for voice data is provided, the abnormality detecting apparatus comprising a processor, a memory, and a bus system; the processor and the memory are connected by the bus system; and the memory is used to store one or more programs, the one or more programs comprising instructions that, when executed by the abnormality detecting apparatus, cause the abnormality detecting apparatus to perform the method provided by the first aspect or any one of the possible implementations of the first aspect.

In a fourth aspect, a computer readable storage medium is provided, which stores one or more programs comprising instructions that, when executed by the abnormality detecting device, cause the abnormality detecting device to perform the method provided by the first aspect or any one of the possible implementations of the first aspect.

In a fifth aspect, a graphical user interface on an abnormality detecting device is provided, the abnormality detecting device comprising a display, a memory, a plurality of applications, and one or more processors for executing one or more programs stored in the memory, the graphical user interface comprising a user interface displayed in accordance with the method provided by the first aspect or any one of the possible implementations of the first aspect, wherein the display comprises a touch-sensitive surface and a display screen.

The method and device for detecting abnormality of voice data provided by the present application first acquire target voice data transmitted through an uplink call path. Because normal voice data has a large proportion of low frequency energy and a small proportion of high frequency energy, analyzing the magnitude of the low frequency energy or the high frequency energy in the target voice data makes it possible to determine whether the high frequency energy in the target voice data is less than the high frequency energy in normal voice data. If so, the high frequency energy in the target voice data has been lost or truncated, and the target voice data can be determined to be abnormal in the frequency domain.

DRAWINGS

FIG. 1 is a schematic diagram of an uplink call path of a mobile phone according to an embodiment of the present application;

FIG. 2 is a schematic flowchart of a method for detecting an abnormality of voice data according to an embodiment of the present disclosure;

FIG. 3 is a schematic diagram of amplitude/frequency of normal voice data according to an embodiment of the present application;

FIG. 4 is a schematic flowchart of a method for detecting an abnormality of a voice data according to an embodiment of the present disclosure;

FIG. 5 is a second schematic flowchart of a method for detecting an abnormality of a voice data according to an embodiment of the present disclosure;

FIG. 6 is a schematic structural diagram of an abnormality detecting apparatus for voice data according to an embodiment of the present disclosure;

FIG. 7 is a schematic structural diagram of hardware of an abnormality detecting apparatus for voice data according to an embodiment of the present disclosure.

Detailed description

Embodiments of the present application will be described below with reference to the accompanying drawings.

When a user makes a call using a call device such as a mobile phone or a landline, some abnormalities may occur, such as a microphone (MIC) unit failure, a board-level connection failure, or a user-operation failure. A MIC unit failure refers to impurities entering the diaphragm of the MIC unit of the call device and causing partial adhesion of the diaphragm. A board-level connection failure refers to an instantaneous short circuit in the audio path. A user-operation failure refers to a user's incorrect operation during the call, such as a finger blocking the MIC hole. During a call, when one or more of these faults occur, the time domain signal of the voice data may be normal while the frequency domain signal is abnormal, and the abnormal frequency domain signal may cause silence or intermittent sound during the call.

It can be seen that, during an actual call, the above faults may leave the time domain signal of the voice data largely unchanged while the frequency domain signal becomes abnormal. However, the existing voice call detection technologies all detect the time domain signal of speech. There is no accurate, fast, and effective method for detecting whether the voice data is abnormal in the frequency domain, so a frequency domain anomaly detection result cannot be used to eliminate the causes of silence or intermittent sound that frequency domain anomalies produce during a call.

In order to detect whether voice data is abnormal in the frequency domain, the embodiments of the present application provide an abnormality detection method for voice data that can accurately, quickly, and effectively detect whether the voice data is abnormal in the frequency domain and, after a frequency domain abnormality is determined, can investigate its possible causes and handle the exception. It should be noted that the method provided by the embodiments of the present application can be applied to any type of voice call device, such as a mobile phone or a landline; the type of the voice call device is not limited.

A specific application scenario of the method provided by the embodiments of the present application is described below. FIG. 1 is a schematic diagram of the uplink call path of a mobile phone. When user 1 makes a voice call with the mobile phone, the voice data of user 1 is collected by the MIC of the mobile phone and passed to the coder-decoder (Codec) chip for A/D conversion, that is, the analog voice signal is converted into a digital voice signal. The voice data is then transmitted to the sound effect algorithm module for sound effect processing. The processed voice data is protocol-encoded and transmitted to the modem, and the modem finally sends the encoded data to the mobile phone or landline of the peer user 2.

As shown in FIG. 1, this embodiment may add an abnormality detecting module on top of the physical structure of an existing mobile phone and use it to perform frequency domain abnormality detection on the voice data. The abnormality detecting module and the sound effect algorithm module may implement their functions on the same Digital Signal Processing (DSP) chip or on different DSP chips. In normal voice data, low frequency energy accounts for a large proportion and high frequency energy for a small proportion, and the high frequency energy in the voice data may be lost or truncated due to the above MIC unit failure, board-level connection failure, user-operation failure, and the like. Therefore, while the sound effect algorithm module performs sound effect processing, the magnitude of the low frequency energy or the high frequency energy in the collected voice data can be analyzed to determine whether the high frequency energy in the collected voice data is less than the high frequency energy in normal voice data. If so, the high frequency energy in the collected voice data has been lost or truncated, and the collected voice data can be determined to be abnormal in the frequency domain.

The collected voice data is a digital voice signal that is A/D converted by the Codec chip. For convenience of description, the collected voice data is hereinafter referred to as target voice data.

FIG. 2 is a schematic flowchart of a method for detecting an abnormality of voice data according to an embodiment of the present application, where the method includes the following steps S201-S202:

S201: Acquire target voice data transmitted through the uplink call path.

In this embodiment, as shown in FIG. 1, after receiving the voice data sent by the Codec chip, the sound effect algorithm module sends the received voice data to the abnormality detecting module. Alternatively, the Codec chip can transmit its output voice data directly to the abnormality detecting module. The abnormality detecting module detects whether the voice data is abnormal in the frequency domain, and the voice data being detected is the target voice data.

The target voice data may be voice data acquired over a short time (for example, 1 ms) or over a longer time (for example, 1 s).

S202: determining, by analyzing the magnitude of the low-frequency energy or the high-frequency energy in the target voice data, whether the high-frequency energy in the target voice data is less than the high-frequency energy in the normal voice data; if yes, determining the target The voice data is abnormal.

Referring to FIG. 3, which is a schematic diagram of the amplitude/frequency of normal voice data, the abscissa f represents frequency and the ordinate A represents amplitude. In normal voice data, low frequency energy accounts for a large proportion and high frequency energy for a small proportion. Therefore, the low frequency data in the target voice data can be obtained, its energy ratio in the target voice data determined, and that ratio checked against the proportion of low frequency energy in normal voice data; or the high frequency data in the target voice data can be obtained, its energy ratio in the target voice data determined, and that ratio checked against the proportion of high frequency energy in normal voice data. If the requirement is not satisfied, the high frequency signal in the target voice data has been lost or truncated, and the target voice data can be determined to be abnormal in the frequency domain.

Specifically, step S202 can be implemented by using one of the following two implementation manners.

In the first embodiment, referring to FIG. 4, S202 may specifically include:

S2021: Obtain low-frequency data in the target voice data by performing low-pass filtering on the target voice data.

A Finite Impulse Response (FIR) digital filter or an Infinite Impulse Response (IIR) digital filter may be provided in advance in the abnormality detecting module shown in FIG. 1, configured as a low-pass filter, and given a low-pass frequency threshold f_Lp.

When the target voice data is low-pass filtered by the low-pass filter, the data in the target voice data whose frequency is lower than the threshold f_Lp passes through the low-pass filter, and the data that passes through is the low frequency data in the target voice data.

S2022: Calculate a low frequency energy ratio, wherein the low frequency energy ratio is a proportion of a total energy of the low frequency data in the target voice data to a total energy of the target voice data.

Calculate the energy E_Lp of the low frequency data in the target voice data and the total energy E_ALL of the target voice data, and then calculate the low frequency energy ratio Kactucal = E_Lp / E_ALL.

S2023: If the low frequency energy ratio is greater than the low frequency occupancy threshold, determine that the high frequency energy in the target voice data is less than the high frequency energy in the normal voice data, wherein the low frequency occupancy threshold is the proportion of the total energy of the low frequency data in the normal voice data to the total energy of the normal voice data.

After the low frequency energy ratio Kactucal is calculated in S2022, if Kactucal exceeds the low frequency occupancy threshold Kthreshold, that is, Kactucal > Kthreshold, the proportion of low frequency energy in the target voice data is too high, which indicates that the high frequency signal in the target voice data has been lost or truncated and that the target voice data is abnormal in the frequency domain.

Normally, the proportion of low frequency energy in normal voice data is not a fixed value but a range of values. Therefore, the low frequency occupancy threshold Kthreshold can be set to the maximum, the minimum, or the mean of the normal range of the low frequency energy proportion, and so on.
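As a minimal sketch of this threshold choice, the Python snippet below derives Kthreshold from a list of low frequency energy ratios measured on normal voice data. The function name `select_threshold`, the `strategy` parameter, and the idea of passing the normal range as a list of measured ratios are illustrative assumptions, not part of the application.

```python
def select_threshold(normal_ratios, strategy="max"):
    """Pick Kthreshold from energy ratios observed on normal voice data.

    `normal_ratios` and `strategy` are hypothetical names used only for
    this illustration.
    """
    if not normal_ratios:
        raise ValueError("need at least one ratio from normal voice data")
    if strategy == "max":
        return max(normal_ratios)
    if strategy == "min":
        return min(normal_ratios)
    if strategy == "mean":
        return sum(normal_ratios) / len(normal_ratios)
    raise ValueError(f"unknown strategy: {strategy}")
```

For the low frequency test (Kactucal > Kthreshold), choosing the maximum flags the fewest frames and choosing the minimum flags the most, so the strategy trades missed detections against false alarms.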

In order to facilitate the understanding of steps S2021-S2023 of the above first embodiment, an example is illustrated:

Taking a certain platform as an example, after user 1 uses mobile phone 1 to establish a normal voice call with mobile phone 2, mobile phone 1 continuously collects the voice data of user 1. Assuming that the sampling interval Tunit of the voice data is set to 1 ms, each MIC path of mobile phone 1 collects 48 voice data every 1 ms, and these 48 voice data are the target voice data.

Using a low-pass filter, 10th-order (or other order) FIR or IIR low-pass filtering is performed on the 48 voice data collected every 1 ms. Assuming that the low-pass frequency threshold f_Lp is set to 4 kHz, the components of the voice data below 4 kHz pass through the low-pass filter, and the data that passes through is the low frequency data of the 48 voice data.

The 48 voice data collected every 1 ms are defined as data[0]~data[47], and the corresponding low frequency data are defined as data_Lp[0]~data_Lp[47].

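Calculate the low frequency energy and the total energy of the 48 voice data data[0]~data[47], namely:

$$E_{Lp} = \sum_{i=0}^{47} \mathrm{data\_Lp}[i]^2, \qquad E_{ALL} = \sum_{i=0}^{47} \mathrm{data}[i]^2$$

where data[i] denotes the amplitude of the i-th of the 48 voice data.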

Then, the low frequency energy ratio is Kactucal = C*E_Lp/E_ALL, where C is a constant gain.

When the low frequency energy ratio Kactucal exceeds the low frequency occupancy threshold Kthreshold, the high frequency signal in the target voice data acquired in the unit time Tunit has been lost or truncated.
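The first embodiment can be sketched in Python as follows. This is an illustration under stated assumptions rather than the implementation of the application: the 48 kHz sample rate is inferred from 48 voice data per 1 ms, the filter is a 10th-order Hamming-windowed-sinc FIR (one possible FIR design), energy is taken as the sum of squared amplitudes, and the synthetic test frames and the threshold value 0.8 are invented for the demonstration.

```python
import math

SAMPLE_RATE_HZ = 48_000.0  # assumption: 48 voice data per 1 ms


def fir_lowpass_taps(order, cutoff_hz, sample_rate_hz):
    """Hamming-windowed-sinc low-pass FIR with order + 1 taps (even order)."""
    fc = cutoff_hz / sample_rate_hz  # normalized cutoff, cycles per sample
    mid = order / 2
    taps = []
    for n in range(order + 1):
        x = n - mid
        ideal = 2 * fc if x == 0 else math.sin(2 * math.pi * fc * x) / (math.pi * x)
        window = 0.54 - 0.46 * math.cos(2 * math.pi * n / order)
        taps.append(ideal * window)
    dc_gain = sum(taps)
    return [t / dc_gain for t in taps]  # normalize to unity gain at 0 Hz


def fir_filter(taps, samples):
    """Direct-form FIR filtering with a zero initial state."""
    out = []
    for i in range(len(samples)):
        acc = 0.0
        for k, tap in enumerate(taps):
            if i - k >= 0:
                acc += tap * samples[i - k]
        out.append(acc)
    return out


def energy(samples):
    """Assumed energy definition: sum of squared amplitudes."""
    return sum(s * s for s in samples)


def low_frequency_ratio(data, cutoff_hz=4_000.0, order=10, gain_c=1.0):
    """Kactucal = C * E_Lp / E_ALL for one frame of target voice data."""
    data_lp = fir_filter(fir_lowpass_taps(order, cutoff_hz, SAMPLE_RATE_HZ), data)
    e_all = energy(data)
    return gain_c * energy(data_lp) / e_all if e_all > 0.0 else 0.0


def is_abnormal_low(data, k_threshold):
    """S2023: abnormal when the low frequency ratio exceeds Kthreshold."""
    return low_frequency_ratio(data) > k_threshold


def tone(freq_hz, amplitude=1.0, n=48):
    """Synthetic sine frame, used only to exercise the sketch."""
    return [amplitude * math.sin(2 * math.pi * freq_hz * i / SAMPLE_RATE_HZ)
            for i in range(n)]


# Hypothetical frames: one with a 12 kHz component, one with it removed.
normal_frame = [a + b for a, b in zip(tone(1_000.0), tone(12_000.0, 0.6))]
truncated_frame = tone(1_000.0)
```

With an illustrative Kthreshold of 0.8, `is_abnormal_low(truncated_frame, 0.8)` flags the frame whose high frequency component was removed, while `normal_frame` stays below the threshold. A 1 ms frame is shorter than the settling time of a sharp filter, so the ratio of a single frame is only a rough estimate.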

In the second embodiment, referring to FIG. 5, S202 may specifically include:

S2021: Obtain high frequency data in the target voice data by performing high-pass filtering on the target voice data.

An FIR digital filter or an IIR digital filter may be provided in advance in the abnormality detecting module shown in FIG. 1, configured as a high-pass filter, and given a high-pass frequency threshold f_Hp.

When the target voice data is high-pass filtered by the high-pass filter, the data in the target voice data whose frequency is higher than the threshold f_Hp passes through the high-pass filter, and the data that passes through is the high frequency data in the target voice data.

S2022: Calculate a high frequency energy ratio, wherein the high frequency energy ratio is a proportion of a total energy of the high frequency data in the target voice data to a total energy of the target voice data.

Calculate the energy E_Hp of the high frequency data in the target voice data and the total energy E_ALL of the target voice data, and then calculate the high frequency energy ratio Kactucal = E_Hp / E_ALL.

S2023: If the high frequency energy ratio is less than the high frequency occupancy threshold, determine that the high frequency energy in the target voice data is less than the high frequency energy in the normal voice data, wherein the high frequency occupancy threshold is the proportion of the total energy of the high frequency data in the normal voice data to the total energy of the normal voice data.

After the high frequency energy ratio Kactucal of the high frequency data is calculated in S2022, if Kactucal is lower than the high frequency occupancy threshold Kthreshold, that is, Kactucal < Kthreshold, the proportion of high frequency energy in the target voice data is too low, which indicates that the high frequency signal in the target voice data has been lost or truncated and that the target voice data is abnormal in the frequency domain.

Normally, the proportion of high frequency energy in normal voice data is not a fixed value but a range of values. Therefore, the high frequency occupancy threshold Kthreshold can be set to the maximum, the minimum, or the mean of the normal range of the high frequency energy proportion, and so on.

To facilitate understanding of S2021-S2023 of the second embodiment described above, an example is illustrated:

Taking a certain platform as an example, after user 1 uses mobile phone 1 to establish a normal voice call with mobile phone 2, mobile phone 1 continuously collects the voice data of user 1. Assuming that the sampling interval Tunit of the voice data is set to 1 ms, each MIC path of mobile phone 1 collects 48 voice data every 1 ms, and these 48 voice data are the target voice data.

Using the high-pass filter, 10th-order (or other order) FIR or IIR high-pass filtering is performed on the 48 voice data collected every 1 ms. Assuming that the high-pass frequency threshold f_Hp is set to 6 kHz, the components of the voice data above 6 kHz pass through the high-pass filter, and the data that passes through is the high frequency data of the 48 voice data.

The 48 voice data collected every 1 ms are defined as data[0]~data[47], and the corresponding high frequency data are defined as data_Hp[0]~data_Hp[47].

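Calculate the high frequency energy and the total energy of the 48 voice data data[0]~data[47], namely:

$$E_{Hp} = \sum_{i=0}^{47} \mathrm{data\_Hp}[i]^2, \qquad E_{ALL} = \sum_{i=0}^{47} \mathrm{data}[i]^2$$

where data[i] denotes the amplitude of the i-th of the 48 voice data.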

Then, the high frequency energy ratio is Kactucal = C*E_Hp/E_ALL, where C is a constant gain.

When the high frequency energy ratio Kactucal is lower than the high frequency occupancy threshold Kthreshold, the high frequency signal in the target voice data acquired in the unit time Tunit has been lost or truncated.
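The second embodiment can be sketched in the same way. To keep the example short it swaps the 10th-order filter of the example for a first-order RC-style IIR high-pass, which the text allows ("or other order") but which separates the bands far less sharply. The 48 kHz sample rate, the sum-of-squares energy, the synthetic frames, and the threshold value 0.08 are assumptions made for illustration only.

```python
import math

SAMPLE_RATE_HZ = 48_000.0  # assumption: 48 voice data per 1 ms


def iir_highpass(samples, cutoff_hz, sample_rate_hz=SAMPLE_RATE_HZ):
    """First-order RC-style high-pass: y[i] = a * (y[i-1] + x[i] - x[i-1])."""
    if not samples:
        return []
    rc = 1.0 / (2.0 * math.pi * cutoff_hz)
    dt = 1.0 / sample_rate_hz
    alpha = rc / (rc + dt)
    out = [samples[0]]
    for i in range(1, len(samples)):
        out.append(alpha * (out[-1] + samples[i] - samples[i - 1]))
    return out


def energy(samples):
    """Assumed energy definition: sum of squared amplitudes."""
    return sum(s * s for s in samples)


def high_frequency_ratio(data, cutoff_hz=6_000.0, gain_c=1.0):
    """Kactucal = C * E_Hp / E_ALL for one frame of target voice data."""
    e_all = energy(data)
    if e_all == 0.0:
        return 0.0
    return gain_c * energy(iir_highpass(data, cutoff_hz)) / e_all


def is_abnormal_high(data, k_threshold):
    """S2023: abnormal when the high frequency ratio is below Kthreshold."""
    return high_frequency_ratio(data) < k_threshold


def tone(freq_hz, amplitude=1.0, n=48):
    """Synthetic sine frame, used only to exercise the sketch."""
    return [amplitude * math.sin(2 * math.pi * freq_hz * i / SAMPLE_RATE_HZ)
            for i in range(n)]


# Hypothetical frames: one with a 12 kHz component, one with it removed.
normal_frame = [a + b for a, b in zip(tone(1_000.0), tone(12_000.0, 0.6))]
truncated_frame = tone(1_000.0)
```

Note that the comparison direction is reversed relative to the first embodiment: here a frame is abnormal when its ratio falls below the threshold, so `is_abnormal_high(truncated_frame, 0.08)` is true and `is_abnormal_high(normal_frame, 0.08)` is false for these synthetic frames.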

Further, this embodiment can detect within a short time whether the frequency domain signal of the voice data is abnormal, that is, the detection efficiency is high. Therefore, when the voice data is abnormal in the frequency domain, the problem can be quickly handled and avoided, thereby improving the user experience of the call device.

Therefore, in an implementation manner of the present application, after step S202 the method may further include:

Step A: Output a first prompt, wherein the first prompt is used to prompt the user that the microphone is blocked.

During a call, when the call becomes abnormal because of non-standard use (such as blocking the MIC hole), the user may not know the cause of the problem if no prompt is given. Therefore, after determining that the voice data is abnormal in the frequency domain, it is first checked whether the user's operation is standard, and the user can be prompted to correct the operation by means of phone vibration or a prompt tone. For example, the first prompt can be output by voice, such as "Your finger may block the MIC hole"; after hearing the prompt, the user will generally move the finger away.

Step B: After the first prompt is output, wait for the first duration and then continue with step S201.

After the first prompt is output, a certain amount of time (the first duration), for example 5 seconds, is reserved for the user to eliminate the abnormality, and then the process returns to step S201 to continue collecting voice data and performing abnormality detection.

Step C: If the target voice data acquired after the first duration is abnormal, perform exception handling according to the number of microphone paths of the microphone.

If the voice data is abnormal because the user's finger blocks the MIC hole, the voice data should return to normal once the finger no longer blocks it. If the voice data is still abnormal, one or more MIC units of the microphone may be faulty.
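Steps A to C can be sketched as a small control flow. The callables `acquire`, `is_abnormal`, `prompt`, and `handle_exception` are hypothetical hooks standing in for S201, S202, the prompt output, and the exception handling; the default 5-second wait comes from the example above, and `sleep` is injectable so the flow can be exercised without waiting.

```python
import time


def detect_with_retry(acquire, is_abnormal, prompt, handle_exception,
                      first_duration_s=5.0, sleep=time.sleep):
    """Run S201/S202 once, then steps A to C if the frame is abnormal.

    All four callables are illustrative placeholders, not APIs defined by
    the application.
    """
    if not is_abnormal(acquire()):
        return "normal"
    # Step A: tell the user the microphone may be blocked.
    prompt("Your finger may block the MIC hole")
    # Step B: give the user the first duration to react, then re-acquire.
    sleep(first_duration_s)
    if is_abnormal(acquire()):
        # Step C: still abnormal, so a MIC unit may be faulty.
        handle_exception()
        return "handled"
    return "recovered"
```

Returning a status string keeps the sketch easy to test; a real device would more likely drive a state machine in the DSP or audio framework.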

With current mobile phones, if the main MIC unit is physically damaged, the phone cannot be used for calls and the user must take it to a service outlet for repair. In this embodiment, however, after an abnormality of the main MIC unit is detected, the phone can automatically switch to a secondary MIC unit for the call to ensure the integrity of the call, and can prompt the user which MIC units may be faulty.

In an embodiment of the present application, in order to determine which MIC unit or units may be faulty, S201 may specifically include: acquiring the target voice data collected by each microphone path of the microphone. In this embodiment, the MIC array of the microphone can be detected in advance to determine how many MIC units it has, for example only one main MIC unit, or one main MIC unit and one or more secondary MIC units, where each MIC unit corresponds to one MIC path. Thereafter, frequency domain anomaly detection is performed separately on the target voice data collected by each MIC path, that is, the frequency domain anomaly detection of the voice data of one MIC path does not depend on that of another.

By contrast, the existing voice data detection algorithms mainly rely on the time domain signal. Analyzing only the time domain signal collected by a single MIC path cannot accurately determine whether the voice data is abnormal, so voice data collected by multiple MIC paths is needed for an auxiliary comprehensive judgment; moreover, such a multi-path comprehensive judgment takes a long time and has low detection accuracy. It can be seen that, compared with the prior art, this embodiment does not need to rely on voice data collected by multiple MIC paths when determining whether the voice data is abnormal, takes less time for abnormality detection, and has higher detection accuracy.

It can be seen that, when the call device is abnormal, the existing time domain detection technology cannot accurately and quickly detect whether the voice data is abnormal, so the performance and characteristics of the call device cannot be fully exploited. In addition, the existing time domain detection technology relies on multiple MIC paths for abnormality detection of voice data, so it cannot accurately detect which MIC path is faulty, and call abnormalities caused by MIC failure therefore cannot be avoided.

In this embodiment, abnormality detection can be performed on the voice data collected by each MIC path, so that the MIC path that may be faulty can be determined from the abnormality detection result. Specifically, "performing exception handling according to the number of microphone paths of the microphone" in step C above may include:

C1: If the microphone has only one microphone path, outputting a second prompt, wherein the second prompt is used to prompt the user that the microphone path may be faulty.

If the call device has only one MIC path, the second prompt, such as a voice prompt or a vibration alert, reminds the user that the single MIC path of the call device may be faulty.

C2: If the microphone has at least two microphone paths, and the target voice data collected by some of the microphone paths is abnormal in the frequency domain, selecting another, normal microphone path for the voice call.

If the call device has multiple MIC paths and the main MIC path is abnormal, the MIC path with the best voice quality among the remaining sub-MIC paths is selected for the call. If the call device has multiple MIC paths and the main MIC path and some of the sub-MIC paths are abnormal, the MIC path with the best voice quality among the remaining normal sub-MIC paths is selected for the call.

In addition, the user can be reminded by a voice prompt or a vibration alert which MIC paths of the call device may be faulty.

C3: If the microphone has at least two microphone paths, and the target voice data collected by all the microphone paths is abnormal in the frequency domain, outputting a third prompt, wherein the third prompt is used to prompt the user that all the microphone paths may be faulty.

If the call device has multiple MIC paths and all the MIC paths are abnormal, the third prompt, such as a voice prompt or a vibration alert, reminds the user that all MIC paths of the call device may be faulty.

It can be seen that, in this embodiment, when an abnormality of one or more MIC paths is detected, the call device automatically switches to another normal MIC path for the voice call, thereby ensuring the integrity of the call, and prompts the user which MIC paths may be faulty so that the user can arrange repairs in time.
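The branching of steps C1 to C3 can be illustrated with a minimal Python sketch. It is only one possible reading of the steps: the names `Action` and `handle_by_path_count` and the per-path `quality` score are hypothetical, and this embodiment does not specify how "best voice quality" is measured.

```python
from enum import Enum
from typing import Dict, Optional, Tuple


class Action(Enum):
    NONE = "none"                    # no MIC path is abnormal
    SECOND_PROMPT = "second_prompt"  # C1: the single MIC path may be faulty
    SWITCH_PATH = "switch_path"      # C2: use a normal MIC path for the call
    THIRD_PROMPT = "third_prompt"    # C3: all MIC paths may be faulty


def handle_by_path_count(
    abnormal: Dict[str, bool],
    quality: Optional[Dict[str, float]] = None,
) -> Tuple[Action, Optional[str]]:
    """Return the action for steps C1-C3 and, for C2, the path to use.

    abnormal maps each MIC path name to its frequency domain detection result.
    quality is a hypothetical voice-quality score per path (higher is better);
    how such a score would be obtained is an assumption of this sketch.
    """
    if not abnormal:
        raise ValueError("at least one MIC path is required")
    normal_paths = [name for name, bad in abnormal.items() if not bad]
    if len(normal_paths) == len(abnormal):
        return Action.NONE, None
    if len(abnormal) == 1:
        return Action.SECOND_PROMPT, None   # C1: only one path, and it is abnormal
    if not normal_paths:
        return Action.THIRD_PROMPT, None    # C3: every path is abnormal
    scores = quality or {}
    # C2: pick the normal path with the best (assumed) quality score.
    best = max(normal_paths, key=lambda name: scores.get(name, 0.0))
    return Action.SWITCH_PATH, best
```

For example, with an abnormal main path and two normal sub-paths, the sketch returns `Action.SWITCH_PATH` together with the sub-path that has the higher score.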

Generally, in a normal voice call, the human ear clearly perceives it when the voice is intermittent or a silence lasts longer than 100 ms. Therefore, when the above steps detect that the target voice data is abnormal in the frequency domain but the sampling time corresponding to the target voice data is relatively short, for example 1 ms, the abnormality processing need not be performed immediately. Instead, the frequency domain abnormal time is accumulated continuously: for example, the abnormal time accumulation threshold ACC is set to 100 ms, and when the accumulated frequency domain abnormal time exceeds 100 ms, the abnormality processing is performed using steps A to C above.

To this end, in an implementation manner of the present application, S201 may specifically include: acquiring target voice data transmitted through the uplink call path at a preset time interval. In this embodiment, the A/D converted digital voice data may be acquired at a certain time interval; for example, digital voice data is acquired once every 1 ms, and the digital voice data within that 1 ms is the target voice data.

S203: If the target voice data acquired within a second duration is abnormal, proceed to steps A to C, where the second duration is the current interval, or at least two consecutive intervals including the current interval.

In this embodiment, it is necessary to set an abnormal time accumulation threshold ACC (that is, the second duration) and the acquisition time corresponding to the target voice data. For example, when the ACC is 100 ms, the voice data collected every 100 ms may be used as the target voice data, and abnormality processing is performed if the currently collected target voice data is abnormal in the frequency domain. As another example, the voice data collected every 1 ms may be used as the target voice data, and abnormality processing is performed when the target voice data collected 100 consecutive times is abnormal in the frequency domain.
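The accumulation described above can be sketched as follows. This is an illustrative sketch only: the class name `AbnormalTimeAccumulator` is hypothetical, and it assumes that a single normal interval resets the accumulated time and that handling starts once the accumulated time reaches the ACC, matching the "100 consecutive times" example.

```python
class AbnormalTimeAccumulator:
    """Accumulate consecutive frequency domain abnormal time up to the ACC."""

    def __init__(self, interval_ms: int = 1, acc_threshold_ms: int = 100) -> None:
        if interval_ms <= 0 or acc_threshold_ms <= 0:
            raise ValueError("durations must be positive")
        self.interval_ms = interval_ms            # acquisition interval of the target voice data
        self.acc_threshold_ms = acc_threshold_ms  # abnormal time accumulation threshold (ACC)
        self.accumulated_ms = 0

    def update(self, is_abnormal: bool) -> bool:
        """Feed one interval's detection result.

        Returns True when abnormality processing should be performed.
        """
        if not is_abnormal:
            # Assumption: a normal interval breaks the run, so the count restarts.
            self.accumulated_ms = 0
            return False
        self.accumulated_ms += self.interval_ms
        return self.accumulated_ms >= self.acc_threshold_ms
```

With a 1 ms interval and a 100 ms ACC, `update(True)` first returns `True` on the 100th consecutive abnormal interval; with a 100 ms interval and a 100 ms ACC, it returns `True` on the first abnormal interval.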

It can be understood that the existing abnormal voice detection technology mainly relies on time domain signals and suffers from low detection accuracy and a long detection period (generally 2 to 3 seconds). In contrast, this embodiment is based on frequency domain signals and offers higher detection accuracy and a shorter detection period (generally 100 to 300 milliseconds), so that abnormality processing can be performed quickly. In addition, it has been found that the voice anomaly detection method provided in this embodiment is not affected by the age, tone, and the like of the user, and that the accuracy of the detection result is more than 80%.

FIG. 6 is a schematic structural diagram of an abnormality detecting apparatus for voice data according to an embodiment of the present disclosure. The abnormality detecting apparatus 600 includes:

The data obtaining unit 601 is configured to acquire target voice data transmitted through the uplink call path.

The abnormality detecting unit 602 is configured to determine, by analyzing the magnitude of the low frequency energy or the high frequency energy in the target voice data, whether the high frequency energy in the target voice data is less than the high frequency energy in normal voice data, and if yes, to determine that the target voice data is abnormal.

In an embodiment of the present application, the abnormality detecting unit 602 may include:

a low pass filtering subunit, configured to acquire low frequency data in the target voice data by performing low pass filtering on the target voice data;

a percentage calculation subunit for calculating a low frequency energy ratio, wherein the low frequency energy ratio is a proportion of a total energy of the low frequency data in the target voice data to a total energy of the target voice data;

An abnormality determining subunit, configured to determine, if the low frequency energy ratio is greater than a low frequency occupancy threshold, that the high frequency energy in the target voice data is less than the high frequency energy in the normal voice data, wherein the low frequency occupancy threshold is the proportion of the total energy of the low frequency data in the normal voice data to the total energy of the normal voice data.

a high-pass filtering sub-unit, configured to acquire high-frequency data in the target voice data by performing high-pass filtering on the target voice data;

a ratio calculating subunit for calculating a high frequency energy ratio, wherein the high frequency energy ratio is a proportion of a total energy of the high frequency data in the target voice data to a total energy of the target voice data;

An abnormality determining subunit, configured to determine, if the high frequency energy ratio is less than a high frequency occupancy threshold, that the high frequency energy in the target voice data is less than the high frequency energy in the normal voice data, wherein the high frequency occupancy threshold is the proportion of the total energy of the high frequency data in the normal voice data to the total energy of the normal voice data.
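The low-pass variant of the subunits above can be sketched in Python as follows. The sketch makes several assumptions that are not part of this embodiment: a first-order IIR low-pass filter stands in for whatever filter an implementation would actually use, and the 2000 Hz cutoff, the 16 kHz sampling rate, and the threshold passed by the caller are illustrative values only. All function names are hypothetical.

```python
import math
from typing import List, Sequence


def low_pass(samples: Sequence[float], cutoff_hz: float, sample_rate_hz: float) -> List[float]:
    """First-order IIR low-pass filter (an assumed stand-in for the real filter)."""
    rc = 1.0 / (2.0 * math.pi * cutoff_hz)
    dt = 1.0 / sample_rate_hz
    alpha = dt / (rc + dt)
    out: List[float] = []
    prev = 0.0
    for x in samples:
        prev = prev + alpha * (x - prev)
        out.append(prev)
    return out


def energy(samples: Sequence[float]) -> float:
    """Total energy as the sum of squared sample values."""
    return sum(x * x for x in samples)


def low_frequency_energy_ratio(
    samples: Sequence[float], cutoff_hz: float, sample_rate_hz: float
) -> float:
    """Proportion of the low frequency energy to the total energy of the data."""
    total = energy(samples)
    if total == 0.0:
        return 0.0  # silent input: no ratio can be formed, report 0
    return energy(low_pass(samples, cutoff_hz, sample_rate_hz)) / total


def is_abnormal(
    samples: Sequence[float],
    low_ratio_threshold: float,
    cutoff_hz: float = 2000.0,       # assumed cutoff, not given in the source
    sample_rate_hz: float = 16000.0, # assumed sampling rate, not given in the source
) -> bool:
    """Abnormal if the low frequency energy ratio exceeds the occupancy threshold."""
    return low_frequency_energy_ratio(samples, cutoff_hz, sample_rate_hz) > low_ratio_threshold
```

A signal that has lost its high frequency content yields a ratio close to 1 and is flagged, while a signal with substantial high frequency content yields a lower ratio. The high-pass variant is analogous, with the comparison reversed: the data is treated as abnormal when the high frequency energy ratio falls below the high frequency occupancy threshold.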

In an embodiment of the present application, the apparatus 600 may further include:

An abnormality prompting unit, configured to output a first prompt if the abnormality detecting unit 602 determines that the target voice data is abnormal, wherein the first prompt is used to prompt the user that the microphone is blocked;

a clock timing unit, configured to trigger, after an interval of a first duration following the output of the first prompt, the data acquiring unit 601 to acquire target voice data transmitted through the uplink call path;

An abnormality processing unit, configured to perform abnormality processing according to the number of microphone paths of the microphone if the abnormality detecting unit 602 determines that the target voice data acquired after the first duration is abnormal.
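The cooperation of the three units above (prompt, wait for the first duration, detect again, then process) can be sketched as follows. This is a simplified illustration under stated assumptions: `recheck_after_prompt` and its callable parameters are hypothetical names, detection and prompting are injected as callables, and the returned strings are labels introduced only for this sketch.

```python
import time
from typing import Callable


def recheck_after_prompt(
    detect: Callable[[], bool],
    prompt_first: Callable[[], None],
    process_by_path_count: Callable[[], None],
    first_duration_s: float,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Prompt that the microphone may be blocked, wait, then detect again.

    detect returns True when the target voice data is abnormal.
    sleep is injectable so the flow can be exercised without real waiting.
    """
    if not detect():
        return "normal"
    prompt_first()               # first prompt: the microphone is blocked
    sleep(first_duration_s)      # give the user time to unblock the microphone
    if detect():
        process_by_path_count()  # still abnormal: process by number of MIC paths
        return "handled"
    return "recovered"
```

Injecting `sleep` keeps the sketch testable; a real implementation would more likely be driven by a timer such as the clock timing unit rather than by a blocking wait.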

In an embodiment of the present application, the data acquiring unit 601 may be specifically configured to acquire target voice data collected by each microphone path of the microphone;

The abnormality processing unit is specifically configured to, when the abnormality detecting unit 602 determines that the target voice data acquired after the first duration is abnormal in the frequency domain: if the microphone has only one microphone path, output a second prompt, wherein the second prompt is used to prompt the user that the microphone path may be faulty; if the microphone has at least two microphone paths, and the target voice data collected by some of the microphone paths is abnormal in the frequency domain, select another, normal microphone path for the voice call; and if the microphone has at least two microphone paths, and the target voice data collected by all the microphone paths is abnormal in the frequency domain, output a third prompt, wherein the third prompt is used to prompt the user that all the microphone paths may be faulty.

In an embodiment of the present application, the data acquiring unit 601 is specifically configured to acquire target voice data transmitted through the uplink call path at a preset time interval.

The abnormality processing unit is further configured to trigger the abnormality prompting unit to output the first prompt if the abnormality detecting unit 602 determines that the target voice data acquired within a second duration is abnormal, where the second duration is the current interval, or at least two consecutive intervals including the current interval.

For the description of the features in the corresponding embodiment of FIG. 6, reference may be made to the related description of the corresponding embodiment in FIG. 2, and details are not described herein again.

FIG. 7 is a schematic diagram of a hardware structure of an abnormality detecting apparatus for voice data according to an embodiment of the present application. The abnormality detecting apparatus 700 includes a memory 701, a receiver 702, and a processor 703 connected to the memory 701 and the receiver 702, respectively. The memory 701 is configured to store a set of program instructions, and the processor 703 is configured to invoke the program instructions stored in the memory 701 to perform the following operations:

Acquiring target voice data transmitted through the uplink call path;

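Determining, by analyzing the magnitude of the low frequency energy or the high frequency energy in the target voice data, whether the high frequency energy in the target voice data is less than the high frequency energy in normal voice data;

If yes, determining that the target voice data is abnormal.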

In an embodiment of the present invention, the processor 703 is further configured to invoke a program instruction stored by the memory 701 to perform the following operations:

If it is determined that the target voice data is abnormal, outputting a first prompt, wherein the first prompt is used to prompt the user that the microphone is blocked;

And when the target voice data acquired after the second duration is abnormal in the frequency domain, if the microphone has only one microphone path, outputting a second prompt, wherein the second prompt is used to prompt the user that the microphone path may be faulty;

The acquiring the target voice data transmitted through the uplink call path includes:

And if the target voice data acquired within the second duration is abnormal, continuing to perform the step of outputting the first prompt, where the second duration is the current interval, or at least two consecutive intervals including the current interval.

In some embodiments, the memory 701, the receiver 702, and the processor 703 included in the abnormality detecting apparatus 700 may be part of a mobile terminal, and the mobile terminal may include a mobile phone, a tablet, a PDA (Personal Digital Assistant), a POS (Point of Sales) terminal, an on-board computer, and the like.

The memory 701 can be used to store software programs and modules, and the processor 703 executes various functional applications and data processing of the mobile terminal by running the software programs and modules stored in the memory 701. The memory 701 may mainly include a program storage area and a data storage area, wherein the program storage area may store an operating system, an application required for at least one function (such as a sound playing function, an image playing function, etc.), and the like, and the data storage area may store data created according to the use of the mobile terminal (such as audio data, a phone book, etc.). Further, the memory 701 may include a high speed random access memory, and may also include a nonvolatile memory such as at least one magnetic disk storage device, flash memory device, or other volatile solid state storage device.

The receiver 702 can receive the user's voice. For example, the receiver 702 can include a microphone or another structure that receives user speech. The microphone can convert the collected sound signal into an electrical signal, which is received by the audio circuit and converted into audio data. The audio data is then output to an RF circuit for transmission to, for example, another mobile terminal, or is output to the memory 701 for further processing.

The processor 703 is the control center of the mobile terminal. It connects the various parts of the entire mobile terminal using various interfaces and lines, and performs the various functions of the mobile terminal and processes data by running or executing the software programs and/or modules stored in the memory 701 and calling the data stored in the memory 701, thereby monitoring the mobile terminal as a whole. Optionally, the processor 703 may include one or more processing units; preferably, the processor 703 may integrate an application processor and a modem processor, where the application processor mainly handles the operating system, the user interface, applications, and the like, and the modem processor mainly handles wireless communications. It can be understood that the above modem processor may also not be integrated into the processor 703.

It can be understood that the abnormality detecting device 700 can further include a radio frequency circuit for receiving and transmitting the user's voice data. For example, the radio frequency circuit can receive and process the downlink voice data sent by the network device, or send the received uplink voice data to the network device, so as to perform services such as normal voice calls.

The abnormality detecting device 700 may include more or fewer hardware structures than those described above, and the specific structure of the abnormality detecting device 700 is not specifically limited in the embodiment of the present invention.

A person skilled in the art can clearly understand that, for the convenience and brevity of the description, the specific working process of the system, the device and the unit described above can refer to the corresponding process in the foregoing method embodiment, and details are not described herein again.

In the several embodiments provided by the present application, it should be understood that the disclosed system, apparatus, and method may be implemented in other manners. For example, the device embodiments described above are merely illustrative. For example, the division of the units is only a logical function division; in actual implementation, there may be another division manner, for example, multiple units or components may be combined or integrated into another system, or some features may be ignored or not executed. In addition, the mutual coupling or direct coupling or communication connection shown or discussed may be an indirect coupling or communication connection through some interface, device or unit, and may be in an electrical, mechanical or other form.

The units described as separate components may or may not be physically separated, and the components displayed as units may or may not be physical units, that is, may be located in one place, or may be distributed to multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of the embodiment.

In addition, each functional unit in each embodiment of the present application may be integrated into one processing unit, or each unit may exist physically separately, or two or more units may be integrated into one unit. The above integrated unit can be implemented in the form of hardware or in the form of a software functional unit.

The integrated unit, if implemented in the form of a software functional unit and sold or used as a standalone product, may be stored in a computer readable storage medium. Based on such understanding, the technical solution of the present application in essence, or the part that contributes to the prior art, or all or part of the technical solution, may be embodied in the form of a software product. The software product is stored in a storage medium and includes a number of instructions to cause a computer device (which may be a personal computer, a server, a network device, or the like) to perform all or part of the steps of the methods described in the various embodiments of the present application. The foregoing storage medium includes: a U disk, a mobile hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, an optical disk, and the like.

The above embodiments are only used to explain the technical solutions of the present application and are not intended to limit them. Although the present application has been described in detail with reference to the foregoing embodiments, those skilled in the art should understand that the technical solutions described in the foregoing embodiments can still be modified, or some of their technical features can be replaced by equivalents, and that such modifications and substitutions do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of the present application.

Claims

An abnormality detecting method for voice data, comprising:

Acquiring target voice data transmitted through the uplink call path;

Determining whether the high frequency energy in the target speech data is less than the high frequency energy in the normal speech data by analyzing the magnitude of the low frequency energy or the high frequency energy in the target speech data;

If yes, it is determined that the target voice data is abnormal.
The method according to claim 1, wherein the determining, by analyzing the magnitude of the low frequency energy in the target speech data, whether the high frequency energy in the target speech data is less than the high frequency energy in the normal speech data comprises:

Obtaining low frequency data in the target voice data by low pass filtering the target voice data;

Calculating a low frequency energy ratio, wherein the low frequency energy ratio is a proportion of a total energy of the low frequency data in the target voice data to a total energy of the target voice data;

If the low frequency energy ratio is greater than a low frequency occupancy threshold, determining that the high frequency energy in the target voice data is less than the high frequency energy in the normal voice data, wherein the low frequency occupancy threshold is the proportion of the total energy of the low frequency data in the normal voice data to the total energy of the normal voice data.
The method according to claim 1, wherein the determining, by analyzing the magnitude of the high frequency energy in the target speech data, whether the high frequency energy in the target speech data is less than the high frequency energy in the normal speech data comprises:

Obtaining high frequency data in the target voice data by performing high-pass filtering on the target voice data;

Calculating a high frequency energy ratio, wherein the high frequency energy ratio is a proportion of a total energy of the high frequency data in the target voice data to a total energy of the target voice data;

If the high frequency energy ratio is less than a high frequency occupancy threshold, determining that the high frequency energy in the target voice data is less than the high frequency energy in the normal voice data, wherein the high frequency occupancy threshold is the proportion of the total energy of the high frequency data in the normal voice data to the total energy of the normal voice data.
The method according to any one of claims 1 to 3, wherein after the determining the abnormality of the target voice data, the method further comprises:

Outputting a first prompt, wherein the first prompt is used to prompt the user that the microphone is blocked.
The method according to claim 4, wherein after the outputting the first prompt, the method further comprises:

Acquiring, after an interval of a first duration, the target voice data transmitted through the uplink call path;

If it is determined that the target voice data acquired after the first duration is abnormal, performing abnormality processing according to the number of microphone paths of the microphone.
The method according to claim 5, wherein the acquiring the target voice data transmitted via the uplink call path comprises:

Acquiring target voice data collected by each microphone path of the microphone;

Then, the performing abnormality processing according to the number of microphone paths of the microphone comprises:

If the microphone has only one microphone path, outputting a second prompt, wherein the second prompt is used to prompt the user that the microphone path may be faulty;

If the microphone has at least two microphone paths, and the target voice data collected by some of the microphone paths is abnormal in the frequency domain, selecting another, normal microphone path for the voice call;

If the microphone has at least two microphone paths, and the target voice data collected by all the microphone paths is abnormal in the frequency domain, outputting a third prompt, wherein the third prompt is used to prompt the user that all the microphone paths may be faulty.
The method according to claim 5, wherein the acquiring the target voice data transmitted via the uplink call path comprises:

Obtaining target voice data transmitted through the uplink call path according to a preset time interval;

Then, the method further includes:

If the target voice data acquired within a second duration is abnormal, continuing to perform the step of outputting the first prompt, where the second duration is the current interval, or at least two consecutive intervals including the current interval.
An abnormality detecting device for voice data, comprising:

a data acquiring unit, configured to acquire target voice data transmitted through the uplink call path;

An abnormality detecting unit, configured to determine, by analyzing the magnitude of the low frequency energy or the high frequency energy in the target voice data, whether the high frequency energy in the target voice data is less than the high frequency energy in normal voice data, and if yes, to determine that the target voice data is abnormal.
The device according to claim 8, wherein the abnormality detecting unit comprises:

a low pass filtering subunit, configured to acquire low frequency data in the target voice data by performing low pass filtering on the target voice data;

a percentage calculation subunit for calculating a low frequency energy ratio, wherein the low frequency energy ratio is a proportion of a total energy of the low frequency data in the target voice data to a total energy of the target voice data;

An abnormality determining subunit, configured to determine, if the low frequency energy ratio is greater than a low frequency occupancy threshold, that the high frequency energy in the target voice data is less than the high frequency energy in the normal voice data, wherein the low frequency occupancy threshold is the proportion of the total energy of the low frequency data in the normal voice data to the total energy of the normal voice data.
The device according to claim 8, wherein the abnormality detecting unit comprises:

a high-pass filtering sub-unit, configured to acquire high-frequency data in the target voice data by performing high-pass filtering on the target voice data;

a ratio calculating subunit for calculating a high frequency energy ratio, wherein the high frequency energy ratio is a proportion of a total energy of the high frequency data in the target voice data to a total energy of the target voice data;

An abnormality determining subunit, configured to determine, if the high frequency energy ratio is less than a high frequency occupancy threshold, that the high frequency energy in the target voice data is less than the high frequency energy in the normal voice data, wherein the high frequency occupancy threshold is the proportion of the total energy of the high frequency data in the normal voice data to the total energy of the normal voice data.
The device according to any one of claims 8 to 10, wherein the device further comprises:

An abnormality prompting unit, configured to output a first prompt if the abnormality detecting unit determines that the target voice data is abnormal, wherein the first prompt is used to prompt the user that the microphone is blocked.
The device according to claim 11, wherein the device further comprises:

a clock timing unit, configured to trigger, after an interval of a first duration following the output of the first prompt, the data acquiring unit to acquire target voice data transmitted through the uplink call path;

An abnormality processing unit, configured to perform abnormality processing according to the number of microphone paths of the microphone when the abnormality detecting unit determines that the target voice data acquired after the first duration is abnormal.
The device according to claim 11, wherein the data acquiring unit is specifically configured to acquire target voice data collected by each microphone path of the microphone;

The abnormality processing unit is specifically configured to, when the abnormality detecting unit determines that the target voice data acquired after the first duration is abnormal: if the microphone has only one microphone path, output a second prompt, wherein the second prompt is used to prompt the user that the microphone path may be faulty; if the microphone has at least two microphone paths, and the target voice data collected by some of the microphone paths is abnormal in the frequency domain, select another, normal microphone path for the voice call; and if the microphone has at least two microphone paths, and the target voice data collected by all the microphone paths is abnormal in the frequency domain, output a third prompt, wherein the third prompt is used to prompt the user that all the microphone paths may be faulty.
The device according to claim 11, wherein the data acquiring unit is configured to acquire target voice data transmitted through the uplink call path at a preset time interval;

The abnormality processing unit is further configured to trigger the abnormality prompting unit to output the first prompt if the abnormality detecting unit determines that the target voice data acquired within a second duration is abnormal, where the second duration is the current interval, or at least two consecutive intervals including the current interval.
An abnormality detecting device for voice data, comprising: a processor, a memory, and a system bus;

The processor and the memory are connected by the system bus;

The memory is configured to store one or more programs, the one or more programs including instructions that, when executed by the abnormality detecting device, cause the abnormality detecting device to perform the method according to any one of claims 1 to 7.