CN111787476B

CN111787476B - Audio detection method and device, equipment and storage medium

Info

Publication number: CN111787476B
Application number: CN202010534670.7A
Authority: CN
Inventors: 白金; 严锋贵
Original assignee: Guangdong Oppo Mobile Telecommunications Corp Ltd
Current assignee: Guangdong Oppo Mobile Telecommunications Corp Ltd
Priority date: 2020-06-12
Filing date: 2020-06-12
Publication date: 2022-02-01
Anticipated expiration: 2040-06-12
Also published as: CN111787476A

Abstract

The embodiments of the application disclose an audio detection method, an audio detection apparatus, a device and a storage medium. The method comprises: acquiring first test data, the first test data being audio data obtained after original audio is processed by an uplink channel of a first terminal to be tested; and comparing the first test data with acquired first reference data to obtain a first audio detection result, the first audio detection result being an audio detection result of the uplink channel of the first terminal to be tested.

Description

Audio detection method and device, equipment and storage medium

Technical Field

The embodiments of the present application relate to electronic technology, and in particular, but not exclusively, to an audio detection method, an audio detection apparatus, a device and a storage medium.

Background

In recent years, with the development of the technology industry, terminals such as notebook computers, tablet computers and smart phones have become common in daily life. For manufacturers, how to detect the audio quality of a terminal on the production line has always been an important issue.

At present, audio detection in the prior art mainly measures conventional audio quality indicators, such as frequency response, output power, and signal-to-noise ratio. These indicators are not sensitive to noise that occurs sporadically in the factory, so the detection rate is low. Meanwhile, most prior-art audio detection schemes are not suitable for terminals with a call function and are limited to detecting the audio processing of a local device.

Disclosure of Invention

In view of this, embodiments of the present application provide an audio detection method and apparatus, a device, and a storage medium.

The technical scheme of the embodiment of the application is realized as follows:

in a first aspect, an embodiment of the present application provides an audio detection method, where the method includes:

acquiring first test data; the first test data is audio data of original audio processed by an uplink channel of the first terminal to be tested;

and comparing the first test data with the acquired first reference data to obtain a first audio detection result, wherein the first audio detection result is an audio detection result of an uplink channel of the first terminal to be detected.

In a second aspect, an embodiment of the present application provides an audio detection apparatus, including:

a first obtaining unit configured to obtain first test data; the first test data is audio data of original audio processed by an uplink channel of the first terminal to be tested;

and the first comparison unit is used for comparing the first test data with the acquired first reference data to obtain a first audio detection result, wherein the first audio detection result is an audio detection result of an uplink channel of the first terminal to be detected.

In a third aspect, an embodiment of the present application provides an electronic device, which includes a memory and a processor, where the memory stores a computer program that is executable on the processor, and the processor implements the steps in the audio detection method when executing the program.

In a fourth aspect, the present application provides a computer-readable storage medium, on which a computer program is stored, where the computer program, when executed by a processor, implements the steps in the audio detection method described above.

The embodiments of the present application provide an audio detection method, an audio detection apparatus, a device and a storage medium. First test data is acquired, the first test data being audio data obtained after original audio is processed by an uplink channel of a first terminal to be tested; the first test data is compared with acquired first reference data to obtain a first audio detection result, the first audio detection result being an audio detection result of the uplink channel of the first terminal to be tested. In this way, manual listening can be replaced, automatic testing of terminal devices can be completed, and various test scenarios can be covered.

Drawings

Fig. 1 is a first schematic flow chart of an implementation of an audio detection method according to an embodiment of the present application;

Fig. 2 is a second schematic flow chart of an implementation of the audio detection method according to an embodiment of the present application;

Fig. 3 is a third schematic flow chart of an implementation of the audio detection method according to an embodiment of the present application;

Fig. 4 is a fourth schematic flow chart of an implementation of the audio detection method according to an embodiment of the present application;

Fig. 5A is a schematic structural diagram of an audio detection system according to an embodiment of the present application;

Fig. 5B is a schematic flow chart of an implementation of the uplink audio detection method according to an embodiment of the present application;

Fig. 5C is a schematic flow chart of an implementation of the downlink audio detection method according to an embodiment of the present application;

Fig. 5D is a schematic flow chart of an implementation of the audio noise detection method according to an embodiment of the present application;

Fig. 5E is a schematic flow chart of an implementation of the audio noise determination method according to an embodiment of the present application;

Fig. 5F is a schematic diagram of the implementation principle of an echo cancellation noise determination method according to an embodiment of the present application;

Fig. 6 is a schematic structural diagram of an audio detection apparatus according to an embodiment of the present application;

Fig. 7 is a schematic diagram of a hardware entity of an electronic device according to an embodiment of the present application.

Detailed Description

The technical solution of the present application is further elaborated below with reference to the drawings and the embodiments. It should be apparent that the described embodiments are only some of the embodiments of the present application, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments of the present application without making any creative effort, shall fall within the protection scope of the present application.

In the following description, reference is made to "some embodiments" which describe a subset of all possible embodiments, but it is understood that "some embodiments" may be the same subset or different subsets of all possible embodiments, and may be combined with each other without conflict.

In the following description, suffixes such as "module", "component", or "unit" used to denote elements are used only for the convenience of description of the present application, and have no specific meaning by themselves. Thus, "module", "component" or "unit" may be used mixedly.

It should be noted that the terms "first \ second \ third" referred to in the embodiments of the present application are only used for distinguishing similar objects, and do not represent a specific ordering for the objects.

The embodiment of the application provides an audio detection method applied to an electronic device. The functions realized by the method can be implemented by a processor in the electronic device calling program code, and the program code can be stored in a storage medium of the electronic device. Fig. 1 is a schematic flow chart of an implementation of the audio detection method according to an embodiment of the present application. As shown in Fig. 1, the method includes:

Step S101, acquiring first test data; the first test data is audio data of original audio processed by an uplink channel of the first terminal to be tested;

Here, the electronic device may be any of various types of devices having information processing capability, such as a smart phone, a PDA (Personal Digital Assistant), a navigator, a digital phone, a video phone, a smart watch, a smart band, a wearable device, a tablet computer, a kiosk, and the like.

For example, the uplink channel of the first terminal to be tested may perform operations such as noise reduction, encoding, and gain control on the original audio to obtain the first test data.

Step S102, comparing the first test data with the acquired first reference data to obtain a first audio detection result, wherein the first audio detection result is an audio detection result of an uplink channel of the first terminal to be detected.

In some embodiments, the step S102 of comparing the first test data with the acquired first reference data to obtain a first audio detection result may be implemented by the following steps:

step S1021, determining whether noise exists in the first test data according to the first test data and the acquired first reference data;

step S1022, if there is a noise in the first test data, determining the category and time point information of the noise in the first test data.

In the embodiment of the application, first test data is acquired, the first test data being audio data of original audio processed by an uplink channel of the first terminal to be tested; the first test data is then compared with the acquired first reference data to obtain a first audio detection result, which is the audio detection result of the uplink channel of the first terminal to be tested. In this way, manual listening can be replaced, automatic testing of terminal devices can be completed, and various test scenarios can be covered.
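The detection result described in steps S1021 and S1022 can be pictured as a small record: whether noise exists, and, only when it does, the category and time point. The following is a minimal sketch under that reading; the names `DetectionResult` and `make_result` and the example category string are hypothetical and do not come from the application.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Tuple


@dataclass(frozen=True)
class DetectionResult:
    """Outcome of comparing test data with reference data (hypothetical type)."""
    has_noise: bool
    category: Optional[str] = None   # only set when noise exists
    time_s: Optional[float] = None   # time point of the noise, in seconds

    @property
    def passed(self) -> bool:
        # A result without noise is treated as a pass.
        return not self.has_noise


def make_result(
    has_noise: bool,
    locate: Optional[Callable[[], Tuple[str, float]]] = None,
) -> DetectionResult:
    """has_noise is the S1021 decision; locate() stands in for S1022."""
    if not has_noise:
        return DetectionResult(has_noise=False)
    # Category and time point are determined only when noise exists.
    category, time_s = locate() if locate else (None, None)
    return DetectionResult(True, category, time_s)
```

Keeping the category and time point optional mirrors the conditional wording of step S1022, which runs only when noise has been found.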

Based on the foregoing embodiments, an embodiment of the present application further provides an audio detection method, and fig. 2 is a schematic flow chart illustrating an implementation of the audio detection method according to the embodiment of the present application, where as shown in fig. 2, the method includes:

step S201, acquiring first test data; the first test data is audio data of original audio processed by an uplink channel of the first terminal to be tested;

step S202, comparing the first test data with the acquired first reference data to obtain a first audio detection result, wherein the first audio detection result is an audio detection result of an uplink channel of the first terminal to be detected;

in some embodiments, the method further comprises:

step S21a, acquiring original audio; the original audio is audio played through a standard speaker in a preset environment;

step S22a, determining first reference data according to the original audio; the first reference data is audio data obtained after the original audio is processed by an uplink path of a first specific terminal.

Step S203, acquiring intermediate data under the condition that the first audio detection result is passed; the intermediate data is first test data transmitted through a communication network;

step S204, acquiring second test data according to the intermediate data; the second test data is audio data of the intermediate data processed by a downlink channel of a second terminal to be tested;

step S205, comparing the second test data with the obtained second reference data to obtain a second audio detection result, where the second audio detection result is an audio detection result of a downlink channel of the second terminal to be tested.

In some embodiments, the method further comprises:

step S21b, acquiring call audio; the call audio is the audio data obtained when, during a call between the first specific terminal and the second specific terminal, the first reference data is transmitted to the second specific terminal through a communication network;

step S22b, determining second reference data according to the call audio; and the second reference data is audio data obtained after the call audio is processed by a downlink channel of the second specific terminal.

Here, steps S201 to S205 cover two scenarios: audio detection in a call scenario (including uplink and downlink audio processing) and detection of local device audio processing. For example, when the first terminal to be tested and the second terminal to be tested are mobile terminals with a call function, steps S201 to S202 perform audio detection on the uplink channel of the first terminal to be tested, and steps S203 to S205 perform audio detection on the downlink channel of the second terminal to be tested. For a local device, after audio is recorded with the device's own microphone, the device's audio processing can be detected through steps S201 to S202.

In the embodiment of the application, first test data is acquired, the first test data being audio data of original audio processed by an uplink channel of the first terminal to be tested; the first test data is compared with the acquired first reference data to obtain a first audio detection result, which is the audio detection result of the uplink channel of the first terminal to be tested; when the first audio detection result is a pass, intermediate data is acquired, the intermediate data being the first test data transmitted through a communication network; second test data is acquired according to the intermediate data, the second test data being audio data of the intermediate data processed by a downlink channel of a second terminal to be tested; and the second test data is compared with the acquired second reference data to obtain a second audio detection result, which is the audio detection result of the downlink channel of the second terminal to be tested. In this way, manual listening can be replaced, automatic testing of terminal devices can be completed, and various test scenarios can be covered.

Based on the foregoing embodiments, an embodiment of the present application further provides an audio detection method, and fig. 3 is a schematic flow chart illustrating an implementation of the audio detection method according to the embodiment of the present application, where as shown in fig. 3, the method includes:

Step S301, acquiring first test data; the first test data is audio data of original audio processed by an uplink channel of the first terminal to be tested;

step S302, determining whether noise exists in the first test data according to the first test data and the acquired first reference data;

step S303, if the first test data has noise, determining the category and time point information of the noise in the first test data;

step S304, determining whether noise exists and the category and time point information of the noise as a first audio detection result; the first audio detection result is an audio detection result of an uplink channel of the first terminal to be detected;

step S305, acquiring intermediate data under the condition that the first audio detection result is passed; the intermediate data is first test data transmitted through a communication network;

Here, the first audio detection result is a pass when it is determined that no noise exists in the first test data.

Step S306, acquiring second test data according to the intermediate data; the second test data is audio data of the intermediate data processed by a downlink channel of a second terminal to be tested;

step S307, comparing the second test data with the obtained second reference data to obtain a second audio detection result, where the second audio detection result is an audio detection result of a downlink channel of the second terminal to be tested.

In the embodiment of the application, first test data is acquired, the first test data being audio data of original audio processed by an uplink channel of the first terminal to be tested; whether noise exists in the first test data is determined according to the first test data and the acquired first reference data; if noise exists, the category and time point information of the noise in the first test data are determined; whether noise exists, together with the category and time point information of the noise, is taken as a first audio detection result, which is the audio detection result of the uplink channel of the first terminal to be tested; when the first audio detection result is a pass, intermediate data is acquired, the intermediate data being the first test data transmitted through a communication network; second test data is acquired according to the intermediate data, the second test data being audio data of the intermediate data processed by a downlink channel of a second terminal to be tested; and the second test data is compared with the acquired second reference data to obtain a second audio detection result, which is the audio detection result of the downlink channel of the second terminal to be tested. In this way, manual listening can be replaced, automatic testing of terminal devices can be completed, various test scenarios can be covered, and the existence, category and time point of occurrence of a noise signal can be detected.

Based on the foregoing embodiments, an embodiment of the present application further provides an audio detection method, where the method includes:

step S311, acquiring first test data; the first test data is audio data of original audio processed by an uplink channel of the first terminal to be tested;

step S312, determining whether noise exists in the first test data according to the first test data and the acquired first reference data;

step S313, if the first test data has noise, determining the category of the noise in the first test data through a deep neural network;

step S314, matching the first test data with a preset noise template library to determine the time point information of noise occurrence in the first test data;

Here, the noise template library may contain sample data for different noise types.

Step S315, determining whether noise exists and the category and time point information of the noise as a first audio detection result; the first audio detection result is an audio detection result of an uplink channel of the first terminal to be detected;

step S316, acquiring intermediate data under the condition that the first audio detection result is passed; the intermediate data is first test data transmitted through a communication network;

Step S317, second test data is obtained according to the intermediate data; the second test data is audio data of the intermediate data processed by a downlink channel of a second terminal to be tested;

step S318, comparing the second test data with the obtained second reference data to obtain a second audio detection result, where the second audio detection result is an audio detection result of a downlink channel of the second terminal to be tested.
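Step S314 does not state how the test data is matched against the noise template library. One common choice is a sliding normalized cross-correlation, sketched below as an assumption rather than as the method of the application. The function names, the `min_score` threshold and the dictionary layout of the template library are all hypothetical. In this sketch the best-matching template also supplies a category label, whereas in step S313 the category comes from a deep neural network, which is not shown here.

```python
import math


def _ncc(a, b):
    """Normalized cross-correlation of two equal-length sequences, in [-1, 1]."""
    ma = sum(a) / len(a)
    mb = sum(b) / len(b)
    num = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    den = math.sqrt(sum((x - ma) ** 2 for x in a) * sum((y - mb) ** 2 for y in b))
    return num / den if den > 0 else 0.0


def locate_noise(test, templates, sample_rate, min_score=0.8):
    """Return (category, time_in_seconds, score) of the best match, or None.

    templates maps a category name to a list of samples (hypothetical layout).
    """
    best = None
    for category, tpl in templates.items():
        n = len(tpl)
        # Slide the template over the test data one sample at a time.
        for start in range(len(test) - n + 1):
            score = _ncc(test[start:start + n], tpl)
            if best is None or score > best[2]:
                best = (category, start / sample_rate, score)
    if best is None or best[2] < min_score:
        return None
    return best
```

The exhaustive sample-by-sample search is quadratic and only suitable for short signals; a production implementation would more likely use an FFT-based correlation.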

Based on the foregoing embodiment, an embodiment of the present application further provides an audio detection method, and fig. 4 is a schematic diagram of an implementation flow of the audio detection method according to the embodiment of the present application, as shown in fig. 4, the method includes:

Step S401, acquiring first test data; the first test data is audio data of original audio processed by an uplink channel of the first terminal to be tested;

step S402, performing frame windowing processing on the first test data and the acquired first reference data respectively;

step S403, respectively extracting the pitch features of the processed first test data and the processed first reference data;

step S404, determining Euclidean distance between the pitch feature of the first test data and the pitch feature of the first reference data;

step S405, if the Euclidean distance is larger than or equal to a first preset value, determining that noise exists in the first test data;

step S406, if the first test data has noise, determining the category and time point information of the noise in the first test data;

step S407, determining whether noise exists and the type and time point information of the noise as a first audio detection result; the first audio detection result is an audio detection result of an uplink channel of the first terminal to be detected;

step S408, acquiring intermediate data under the condition that the first audio detection result is passed; the intermediate data is first test data transmitted through a communication network;

Step S409, acquiring second test data according to the intermediate data; the second test data is audio data of the intermediate data processed by a downlink channel of a second terminal to be tested;

step S410, comparing the second test data with the obtained second reference data to obtain a second audio detection result, where the second audio detection result is an audio detection result of a downlink channel of the second terminal to be tested.
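Steps S402 to S405 can be sketched as follows. The application does not specify the window, the pitch estimator or the threshold, so this example assumes a Hamming window, an autocorrelation-based pitch estimate per frame, and a hypothetical `first_preset` of 50.0; all function names are illustrative.

```python
import math


def frames(signal, size=256, hop=128):
    """Split a signal into overlapping frames and apply a Hamming window (S402)."""
    window = [0.54 - 0.46 * math.cos(2 * math.pi * i / (size - 1)) for i in range(size)]
    return [
        [s * w for s, w in zip(signal[start:start + size], window)]
        for start in range(0, len(signal) - size + 1, hop)
    ]


def pitch_hz(frame, sample_rate, fmin=80.0, fmax=400.0):
    """Estimate pitch from the autocorrelation peak (S403); 0.0 for a silent frame."""
    if sum(x * x for x in frame) == 0:
        return 0.0
    lo, hi = int(sample_rate / fmax), int(sample_rate / fmin)
    best_lag, best_val = lo, float("-inf")
    for lag in range(lo, min(hi, len(frame) - 1) + 1):
        val = sum(frame[i] * frame[i + lag] for i in range(len(frame) - lag))
        if val > best_val:
            best_lag, best_val = lag, val
    return sample_rate / best_lag


def pitch_distance(test, reference, sample_rate):
    """Euclidean distance between the per-frame pitch tracks (S404)."""
    p_test = [pitch_hz(f, sample_rate) for f in frames(test)]
    p_ref = [pitch_hz(f, sample_rate) for f in frames(reference)]
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p_test, p_ref)))


def has_noise_by_pitch(test, reference, sample_rate, first_preset=50.0):
    """S405: noise is reported when the distance reaches the preset value."""
    return pitch_distance(test, reference, sample_rate) >= first_preset
```

Because both signals pass through the same framing and estimator, identical inputs give a distance of exactly zero, and the threshold only has to absorb the differences that a healthy uplink channel introduces.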

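Based on the foregoing embodiments, an embodiment of the present application further provides an audio detection method, where the method includes:

Step S411, acquiring first test data; the first test data is audio data of original audio processed by an uplink channel of the first terminal to be tested;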

step S412, respectively carrying out Fourier transform on the first test data and the acquired first reference data;

step S413, determining a coherence coefficient of the transformed first test data and the transformed first reference data in a frequency domain;

step S414, if the coherence coefficient is less than or equal to a second preset value, determining that noise exists in the first test data;

step S415, if the first test data has noise, determining the category and time point information of the noise in the first test data;

step S416, determining whether noise exists, and the type and time point information of the noise as a first audio detection result; the first audio detection result is an audio detection result of an uplink channel of the first terminal to be detected;

step S417, under the condition that the first audio detection result is passed, acquiring intermediate data; the intermediate data is first test data transmitted through a communication network;

Step S418, acquiring second test data according to the intermediate data; the second test data is audio data of the intermediate data processed by a downlink channel of a second terminal to be tested;

Step S419, comparing the second test data with the obtained second reference data to obtain a second audio detection result, where the second audio detection result is an audio detection result of a downlink channel of the second terminal to be tested.
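Steps S412 to S414 can be illustrated with a magnitude-squared coherence estimate. The application does not say how the coherence coefficient is computed, so the sketch below makes several assumptions: spectra are averaged over overlapping Hann-windowed frames (a single frame would always yield a coherence of 1), the per-bin values are averaged into one number, and the `second_preset` of 0.9 is hypothetical. A naive DFT is used only to keep the example free of dependencies.

```python
import cmath
import math


def dft(frame):
    """Naive O(N^2) DFT returning bins 0..N/2 (adequate for short frames)."""
    n = len(frame)
    return [
        sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        for k in range(n // 2 + 1)
    ]


def mean_coherence(test, reference, size=64, hop=32):
    """Magnitude-squared coherence, averaged over frequency bins (S412, S413)."""
    window = [0.5 - 0.5 * math.cos(2 * math.pi * i / size) for i in range(size)]
    bins = size // 2 + 1
    pxx, pyy, pxy = [0.0] * bins, [0.0] * bins, [0j] * bins
    for start in range(0, min(len(test), len(reference)) - size + 1, hop):
        x = dft([test[start + i] * window[i] for i in range(size)])
        y = dft([reference[start + i] * window[i] for i in range(size)])
        for k in range(bins):
            pxx[k] += abs(x[k]) ** 2
            pyy[k] += abs(y[k]) ** 2
            pxy[k] += x[k] * y[k].conjugate()
    values = [
        abs(pxy[k]) ** 2 / (pxx[k] * pyy[k])
        for k in range(bins)
        if pxx[k] * pyy[k] > 0
    ]
    return sum(values) / len(values) if values else 0.0


def has_noise_by_coherence(test, reference, second_preset=0.9):
    """S414: noise is reported when coherence falls to the preset value or below."""
    return mean_coherence(test, reference) <= second_preset
```

Averaging over all bins is only one way to reduce the per-frequency coherence to a single coefficient; restricting the average to the speech band would be an equally plausible choice.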

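Based on the foregoing embodiments, an embodiment of the present application further provides an audio detection method, where the method includes:

Step S421, acquiring first test data; the first test data is audio data of original audio processed by an uplink channel of the first terminal to be tested;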

step S422, determining the acquired first reference data as a reference signal in echo cancellation;

step S423, determining the first test data as a microphone signal in echo cancellation;

step S424, the reference signal and the microphone signal are subjected to cancellation processing through a filter, and a residual signal is obtained;

step S425, if the residual signal is larger than or equal to a third preset value, determining that noise exists in the first test data;

step S426, if the first test data has noise, determining the category and time point information of the noise in the first test data;

step S427, determining whether noise exists, and the type and time point information of the noise as a first audio detection result; the first audio detection result is an audio detection result of an uplink channel of the first terminal to be detected;

Step S428, acquiring intermediate data when the first audio detection result is passed; the intermediate data is first test data transmitted through a communication network;

Step S429, acquiring second test data according to the intermediate data; the second test data is audio data of the intermediate data processed by a downlink channel of a second terminal to be tested;

step S430, comparing the second test data with the obtained second reference data to obtain a second audio detection result, where the second audio detection result is an audio detection result of a downlink channel of the second terminal to be tested.
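Steps S422 to S425 borrow the structure of an echo canceller: the reference data plays the role of the far-end reference signal, the test data plays the role of the microphone signal, and whatever the adaptive filter cannot predict remains as the residual. The application does not name the filter, so the sketch below assumes a normalized LMS (NLMS) filter and measures the residual by its mean power after a convergence period. The tap count, step size, `skip` length and the `third_preset` of 0.01 are hypothetical.

```python
def nlms_residual(reference, mic, taps=8, mu=0.5, eps=1e-8):
    """Adaptively filter `reference` to predict `mic`; return the residual (S424)."""
    w = [0.0] * taps
    residual = []
    for n in range(len(mic)):
        # Most recent `taps` reference samples, newest first, zero-padded at the start.
        x = [reference[n - k] if n - k >= 0 else 0.0 for k in range(taps)]
        y = sum(wk * xk for wk, xk in zip(w, x))
        e = mic[n] - y
        norm = sum(xk * xk for xk in x) + eps
        # NLMS update: step size normalized by the input energy.
        w = [wk + mu * e * xk / norm for wk, xk in zip(w, x)]
        residual.append(e)
    return residual


def residual_level(residual, skip):
    """Mean power of the residual after `skip` samples of convergence."""
    tail = residual[skip:]
    return sum(e * e for e in tail) / len(tail)


def has_noise_by_residual(test, reference, third_preset=0.01, skip=500):
    """S425: noise is reported when the residual level reaches the preset value."""
    return residual_level(nlms_residual(reference, test), skip) >= third_preset
```

The idea is that linear processing in the uplink channel can be modeled by the filter and cancels out, while added noise is uncorrelated with the reference and survives in the residual.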

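Based on the foregoing embodiments, an embodiment of the present application further provides an audio detection method, where the method includes:

Step S431, acquiring first test data; the first test data is audio data of original audio processed by an uplink channel of the first terminal to be tested;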

step S432, when any two of the following exist, determining that there is a noise in the first test data:

the Euclidean distance between the pitch feature of the first test data and the pitch feature of the acquired first reference data is greater than or equal to the first preset value;

the coherence coefficient between the first test data and the first reference data is less than or equal to the second preset value;

a residual signal in an echo cancellation result of the first test data and the first reference data is greater than or equal to the third preset value;

step S433, if there is a noise in the first test data, determining the category and time point information of the noise in the first test data;

step S434, determining whether there is a noise, and the category and time point information of the noise as a first audio detection result; the first audio detection result is an audio detection result of an uplink channel of the first terminal to be detected;

step S435, acquiring intermediate data under the condition that the first audio detection result is passed; the intermediate data is first test data transmitted through a communication network;

Step S436, acquiring second test data according to the intermediate data; the second test data is audio data of the intermediate data processed by a downlink channel of a second terminal to be tested;

step S437, comparing the second test data with the obtained second reference data to obtain a second audio detection result, where the second audio detection result is an audio detection result of a downlink channel of the second terminal to be detected.

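Based on the foregoing embodiments, an embodiment of the present application further provides an audio detection method, where the method includes:

Step S441, acquiring first test data; the first test data is audio data of original audio processed by an uplink channel of the first terminal to be tested;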

step S442, if all three of the following hold, determining that noise exists in the first test data: the Euclidean distance between the pitch feature of the first test data and the pitch feature of the acquired first reference data is greater than or equal to the first preset value; the coherence coefficient between the first test data and the first reference data is less than or equal to the second preset value; and the residual signal in the echo cancellation result of the first test data and the first reference data is greater than or equal to the third preset value;

step S443, if there is a noise in the first test data, determining the category and time point information of the noise in the first test data;

step S444, determining whether noise exists and the category and time point information of the noise as a first audio detection result; the first audio detection result is an audio detection result of an uplink channel of the first terminal to be detected;

step S445, acquiring intermediate data under the condition that the first audio detection result is passed; the intermediate data is first test data transmitted through a communication network;

Step S446, acquiring second test data according to the intermediate data; the second test data is audio data of the intermediate data processed by a downlink channel of a second terminal to be tested;

step S447, comparing the second test data with the obtained second reference data to obtain a second audio detection result, where the second audio detection result is an audio detection result of a downlink channel of the second terminal to be tested.
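The two combination rules above (any two of the three conditions in step S432, and all three conditions in step S442) differ only in how many indicators must agree. A minimal sketch, with hypothetical function names and with the three measurements assumed to be computed elsewhere:

```python
def noise_indicators(distance, coherence, residual,
                     first_preset, second_preset, third_preset):
    """Evaluate the three conditions; each entry is True when it signals noise."""
    return (
        distance >= first_preset,    # pitch-feature Euclidean distance
        coherence <= second_preset,  # frequency-domain coherence coefficient
        residual >= third_preset,    # echo-cancellation residual
    )


def has_noise(indicators, min_votes):
    """min_votes=2 gives the any-two rule (S432); min_votes=3 requires all (S442)."""
    return sum(indicators) >= min_votes
```

Requiring more agreeing indicators lowers the false alarm rate at the cost of missing noise that only one measure is sensitive to.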

Based on the foregoing embodiments, an audio detection system is provided in an embodiment of the present application, where the audio detection system can process two scenarios including call scenario audio detection and local device audio detection.

Fig. 5A is a schematic structural diagram of an audio detection system according to an embodiment of the present application. As shown in Fig. 5A, the audio detection system 50 includes: a mobile device 51, a mobile device 52, a standard speaker 53, a standard microphone 54, a mute box 55, a mute box 56, and a software detection device 57. The mobile device 51 includes at least an uplink 511, and the mobile device 52 includes at least a downlink 521; the mobile device 51 and the standard speaker 53 are arranged in the mute box 55, and the mobile device 52 and the standard microphone 54 are arranged in the mute box 56.

Correspondingly, the audio detection system 50 may collect the reference signal by:

step S501, the mobile device 51 places a call to the mobile device 52, where the content of the call is the standard sound source played by the standard speaker 53;

here, the mobile device 51 and the mobile device 52 are standard mobile devices having a call function.

Step S502, recording the standard sound source through a microphone of the mobile device 51;

step S503, sending the recorded audio data to the uplink 511 for processing, so as to obtain first audio data;

step S504, if the first audio data is not abnormal, the first audio data is determined as a first reference signal;

step S505, sending the first audio data transmitted through the communication network to the downlink 521 for processing, and playing the processed first audio data through a speaker carried by the mobile device 52;

step S506, recording the played first audio data with the standard microphone 54 to obtain second audio data;

step S507, if the second audio data is not abnormal, determining the second audio data as a second reference signal.

Thus, after the first reference signal is obtained by the above method, audio detection of the uplink channel may be performed on the device to be detected. Fig. 5B is a schematic flow chart of an implementation of the uplink audio detection method according to the embodiment of the present application. As shown in Fig. 5B, the method includes:

step S511, controlling the standard speaker 53 to play a standard sound source;

here, the standard sound source in step S511 and the standard sound source in step S501 are the same fixed standard sound source.

Step S512, controlling a microphone of the mobile device 51 to record the standard sound source, so as to obtain an input signal of the uplink 511;

here, the mobile device 51 is a mobile device to be detected having a call function.

Step S513, controlling the uplink path 511 to process the input signal to obtain a first signal to be tested;

step S514, determining an audio detection result of the uplink path 511 according to the first signal to be tested and the first reference signal.

Similarly, after the second reference signal is obtained by the above method, downlink audio detection may be performed on the device to be detected. Fig. 5C is a schematic flow chart of an implementation of the downlink audio detection method according to the embodiment of the present application. As shown in Fig. 5C, the method includes:

step S521, controlling the mobile device 52 to receive the audio data output by the mobile device 51 to obtain an input signal of the downlink 521;

here, the mobile device 52 is a mobile device to be detected having a call function.

In this embodiment, the mobile device 51 may be a standard mobile device or a mobile device to be tested. When the mobile device 51 is a standard mobile device, the audio data output by the mobile device 51 is the first audio data transmitted through the communication network in step S505. When the mobile device 51 is a mobile device to be tested, the audio data output by the mobile device 51 is the first signal to be tested of step S513 after transmission through the communication network, where the first signal to be tested is a test signal that has passed the detection.

Step S522, controlling the downlink 521 to process the input signal to obtain a second signal to be tested;

step S523, controlling the standard microphone 54 to record the second signal to be tested;

step S524, determining an audio detection result of the downlink 521 according to the recorded second signal to be tested and the second reference signal.

Therefore, automatic testing of the audio quality of the terminal device can be realized by the above audio detection system and audio detection method. The automatic test flow is as follows: steps S511 to S514 are carried out under software control, and the first signal to be tested and the first reference signal are sent to a software detection device 57 for detection. If an abnormality is detected, the test is stopped, it is determined that a problem exists in the audio uplink processing path of the device to be tested, and the relevant records are saved. If no abnormality is detected in the audio uplink processing path, steps S521 to S524 are then carried out under software control, and the second signal to be tested and the second reference signal are sent to the software detection device 57 for detection. If an abnormality is detected, the test is stopped, it is determined that a problem exists in the audio downlink processing path of the device to be tested, and the relevant records are saved.
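The control flow of this automatic test can be sketched as follows. This is only an illustrative outline: `run_uplink`, `run_downlink` and `detect` are hypothetical callables standing in for the software control of steps S511 to S513, steps S521 to S523, and the software detection device 57, and the record format is an assumption.

```python
def run_audio_test(run_uplink, run_downlink, detect, first_ref, second_ref):
    """Test the uplink path first; test the downlink path only if uplink passes.

    run_uplink:   hypothetical callable, returns the first signal to be tested.
    run_downlink: hypothetical callable, takes the first signal to be tested
                  and returns the recorded second signal to be tested.
    detect:       hypothetical callable (test_signal, reference) -> True
                  when an abnormality is detected.
    """
    records = []

    first_test = run_uplink()
    if detect(first_test, first_ref):
        # Stop here: the audio uplink processing path has a problem.
        records.append({"path": "uplink", "result": "abnormal"})
        return records
    records.append({"path": "uplink", "result": "pass"})

    second_test = run_downlink(first_test)
    if detect(second_test, second_ref):
        # The audio downlink processing path has a problem.
        records.append({"path": "downlink", "result": "abnormal"})
        return records
    records.append({"path": "downlink", "result": "pass"})
    return records
```

Returning early on an uplink abnormality mirrors the described behaviour that the downlink test only runs on a first signal to be tested that has passed detection.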

Fig. 5D is a schematic view of an implementation flow of the audio noise detection method according to the embodiment of the present application, and as shown in fig. 5D, the method includes:

step S531, acquiring a signal to be tested and a reference signal;

step S532, comparing the signal to be tested with a reference signal and determining whether noise exists in the signal to be tested;

here, if there is a noise in the signal to be tested, step S533 is executed, otherwise, the detection procedure is ended.

Step S533, determining the category of the noise in the signal to be tested through a deep neural network;

step S534, the signal to be tested is matched with a preset noise template library so as to determine the time point of the noise occurrence;

and step S535, saving the category and the occurrence time point of the noise.

In the embodiment of the present application, when the signal to be tested is the first signal to be tested in fig. 5B, the reference signal is the first reference signal in step S504. When the signal to be tested is the second signal to be tested in fig. 5C, the reference signal is the second reference signal in step S507.
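As an illustration of step S534, the matching against a noise template library might be sketched with a sliding normalized cross-correlation. The embodiment does not specify the matching algorithm, so this choice, the function names, the `min_score` threshold and the dictionary-based template library are all assumptions. The deep neural network classification of step S533 is not sketched here.

```python
import math

def locate_template(signal, template, fs):
    """Return (time_s, score) of the best normalized cross-correlation match."""
    m = len(template)
    t_norm = math.sqrt(sum(v * v for v in template))
    best_score, best_idx = -1.0, 0
    for i in range(len(signal) - m + 1):
        seg = signal[i:i + m]
        s_norm = math.sqrt(sum(v * v for v in seg))
        if s_norm == 0.0 or t_norm == 0.0:
            continue  # skip silent segments to avoid division by zero
        score = sum(a * b for a, b in zip(seg, template)) / (s_norm * t_norm)
        if score > best_score:
            best_score, best_idx = score, i
    return best_idx / fs, best_score

def detect_noise_events(signal, template_library, fs, min_score=0.8):
    """Report the best match of each template whose score reaches min_score."""
    events = []
    for name, template in template_library.items():
        time_s, score = locate_template(signal, template, fs)
        if score >= min_score:
            events.append({"template": name, "time_s": time_s, "score": score})
    return events
```

Each returned event carries the matched template name and the occurrence time in seconds, which corresponds to the information saved in step S535. The sketch reports only the single best position per template.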

Fig. 5E is a schematic view of an implementation flow of the audio noise determination method according to the embodiment of the present application, and as shown in fig. 5E, the method includes:

step S541, inputting a signal to be tested and a reference signal;

step S542, respectively extracting pitch characteristics of the signal to be tested and the reference signal, and comparing to obtain a first judgment result;

step S543, determining coherence between the signal to be tested and the reference signal to obtain a second determination result;

step S544, echo cancellation is carried out on the signal to be tested and the reference signal, and a third judgment result is determined according to a residual signal obtained by the cancellation;

step S545, determining whether there is a noise in the signal to be tested according to the first determination result, the second determination result, and the third determination result.

In some embodiments, step S542, in which the pitch features of the signal to be tested and the reference signal are respectively extracted and compared to obtain a first judgment result, may be implemented as follows. Assuming that the reference signal is x and the signal to be tested is y, frame windowing is performed on the reference signal and the signal to be tested respectively; typically, the frame length may be 20 ms (milliseconds) and the frame overlap 10 ms. A feature extraction operation is then performed on the processed signals. The pitch feature of the signal to be tested is extracted to obtain a feature vector P_y = {t₁, t₂, t₃, ..., t_m}, where t is a feature in the pitch feature vector corresponding to the signal to be tested and m is the number of frames. Likewise, the pitch feature of the reference signal is extracted to obtain a feature vector P_x = {f₁, f₂, f₃, ..., f_m}, where f is a feature in the pitch feature vector corresponding to the reference signal and m is the number of frames. Further, the Euclidean distance ρ between P_x and P_y can be determined using formula (1) and compared with a preset threshold to judge, based on the pitch features, whether noise exists:

$$\rho = \sqrt{\sum_{i=1}^{m} (t_i - f_i)^2} \tag{1}$$

where t_i and f_i in formula (1) represent the pitch features extracted from the i-th frame of the signal to be tested and of the reference signal, respectively.

Here, when the euclidean distance is greater than a preset threshold value, it may be determined that noise exists in the signal to be tested.
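The pitch-based judgment can be sketched as below. The embodiment does not name a pitch extraction algorithm, so the autocorrelation peak estimator, the Hann window, the 60 Hz to 400 Hz search range and all function names are assumptions made for illustration. Only the standard library is used so that the sketch runs without extra dependencies.

```python
import math

def frame_signal(x, frame_len, hop):
    """Split x into overlapping frames and apply a Hann window to each."""
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / (frame_len - 1))
              for n in range(frame_len)]
    return [[x[start + n] * window[n] for n in range(frame_len)]
            for start in range(0, len(x) - frame_len + 1, hop)]

def pitch_of_frame(frame, fs, f_min=60.0, f_max=400.0):
    """Estimate pitch in Hz from the autocorrelation peak (assumed method)."""
    lag_min = int(fs / f_max)
    lag_max = min(int(fs / f_min), len(frame) - 1)
    best_lag, best_val = 0, 0.0
    for lag in range(lag_min, lag_max + 1):
        val = sum(frame[n] * frame[n + lag] for n in range(len(frame) - lag))
        if val > best_val:
            best_lag, best_val = lag, val
    return fs / best_lag if best_lag else 0.0  # 0.0 marks "no pitch found"

def pitch_distance(x, y, fs, frame_ms=20, hop_ms=10):
    """Euclidean distance between the per-frame pitch vectors P_x and P_y."""
    frame_len = fs * frame_ms // 1000
    hop = fs * hop_ms // 1000
    p_x = [pitch_of_frame(f, fs) for f in frame_signal(x, frame_len, hop)]
    p_y = [pitch_of_frame(f, fs) for f in frame_signal(y, frame_len, hop)]
    return math.sqrt(sum((t - f) ** 2 for t, f in zip(p_y, p_x)))
```

A caller would compare the returned distance with the preset threshold; the threshold value itself is not given in the embodiment and would have to be calibrated on known-good recordings.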

In some embodiments, the step S543 of determining coherence between the signal to be tested and the reference signal to obtain the second determination result may be implemented by: assuming that the reference signal is x and the signal to be tested is y, respectively performing Fast Fourier Transform (FFT) on the signal to be tested and the reference signal, calculating coherence of the two signals in a frequency domain through a formula (2), and calculating an average value of coherence coefficients of the signals in the frequency domain through a formula (3).

Wherein N represents the length of the FFT operation and k is the frequency index; the remaining terms of formula (2) are the cross-power spectrum of the reference signal and the signal to be tested, the self-power spectrum of the reference signal, and the self-power spectrum of the signal to be tested.

Here, when the average value of the coherence coefficients is less than a preset threshold value, it may be determined that there is a noise in the signal to be tested.
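A minimal sketch of the coherence judgment follows. Formulas (2) and (3) are not reproduced in this text, so the sketch assumes the common magnitude-squared form C(k) = |P_xy(k)|² / (P_xx(k)·P_yy(k)) and a plain average of C(k) over the N frequency bins, where P_xy, P_xx and P_yy are assumed names for the cross-power and self-power spectra. The frame averaging, the Hann window, the small radix-2 FFT helper and the function names are likewise assumptions.

```python
import cmath
import math

def fft(a):
    """Recursive radix-2 FFT; len(a) must be a power of two."""
    n = len(a)
    if n == 1:
        return list(a)
    even, odd = fft(a[0::2]), fft(a[1::2])
    out = [0j] * n
    for k in range(n // 2):
        tw = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + tw
        out[k + n // 2] = even[k] - tw
    return out

def mean_coherence(x, y, n_fft=64):
    """Average magnitude-squared coherence of x and y over all FFT bins."""
    hop = n_fft // 2
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / n_fft) for n in range(n_fft)]
    p_xy = [0j] * n_fft   # accumulated cross-power spectrum
    p_xx = [0.0] * n_fft  # accumulated self-power spectrum of x
    p_yy = [0.0] * n_fft  # accumulated self-power spectrum of y
    for start in range(0, min(len(x), len(y)) - n_fft + 1, hop):
        fx = fft([x[start + n] * window[n] for n in range(n_fft)])
        fy = fft([y[start + n] * window[n] for n in range(n_fft)])
        for k in range(n_fft):
            p_xy[k] += fx[k].conjugate() * fy[k]
            p_xx[k] += abs(fx[k]) ** 2
            p_yy[k] += abs(fy[k]) ** 2
    eps = 1e-12  # guards against division by zero in empty bins
    coh = [abs(p_xy[k]) ** 2 / (p_xx[k] * p_yy[k] + eps) for k in range(n_fft)]
    return sum(coh) / n_fft
```

The spectra are accumulated over several frames before the ratio is taken because a coherence estimate computed from a single frame equals 1 in every bin regardless of the signals, which would make the threshold comparison meaningless.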

In some embodiments, step S544 can be implemented as shown in fig. 5F. Fig. 5F is a schematic diagram illustrating the implementation principle of the echo cancellation noise determination method according to the embodiment of the present application. As shown in fig. 5F, the reference signal in step S541 is used as the reference signal of an echo canceller 551, and the signal to be tested in step S541 is used as the microphone signal of the echo canceller. The reference signal and the signal to be tested are subjected to cancellation processing by an adaptive filter 552 to obtain a residual signal, and the residual signal is compared with a preset threshold to determine, based on echo cancellation, whether noise exists. The coefficient updating module 553 is configured to update the coefficients of the adaptive filter 552.

Here, when the residual signal is greater than a preset threshold, it may be determined that there is a noise in the signal to be tested.
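The echo cancellation judgment can be illustrated with a normalized least-mean-squares (NLMS) adaptive filter. The embodiment only states that an adaptive filter with a coefficient updating module is used, so NLMS, the filter length, the step size, and the use of mean residual power over the later part of the signal as the quantity compared with the threshold are all assumptions.

```python
def nlms_residual(reference, mic, taps=16, mu=0.5, eps=1e-8):
    """Cancel the reference from the mic signal; return the residual samples."""
    w = [0.0] * taps    # adaptive filter coefficients
    buf = [0.0] * taps  # most recent reference samples, newest first
    residual = []
    for r, d in zip(reference, mic):
        buf = [r] + buf[:-1]
        y_hat = sum(wi * bi for wi, bi in zip(w, buf))  # filter output
        e = d - y_hat                                   # residual sample
        norm = sum(b * b for b in buf) + eps
        # Coefficient update, normalized by the reference power in the buffer.
        w = [wi + mu * e * bi / norm for wi, bi in zip(w, buf)]
        residual.append(e)
    return residual

def residual_power(residual, tail=0.5):
    """Mean power over the last `tail` fraction, after the filter has adapted."""
    start = int(len(residual) * (1 - tail))
    seg = residual[start:]
    return sum(e * e for e in seg) / len(seg)
```

If the signal to be tested is only a filtered copy of the reference, the filter converges and the residual power becomes very small. Components that cannot be predicted from the reference, such as clicks, remain in the residual and raise its power above the threshold.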

In the embodiment of the application, the audio detection system and the related audio detection method can realize automatic testing of audio, replace manual listening, and complete the aging test of the device. They also readily cover multiple test scenarios, including audio uplink and downlink processing scenarios and local device audio processing scenarios. Meanwhile, the existence, the category and the occurrence time point of noise signals can be detected.

It should be noted that the application of the technical solution in the embodiment of the present application is not limited to mobile devices; it may be applied to any device that requires audio detection. If the device to be tested does not have a microphone and a loudspeaker, the scheme can be applied by externally connecting a loudspeaker and a microphone.

In addition, whether noise is present may also be determined by a logical combination of the results of one or two of the three methods in fig. 5E, namely pitch feature determination, coherence determination, and echo cancellation.
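One way to express such a logical combination is a simple vote over the three judgment results. The comparison directions follow the apparatus embodiments (distance and residual at or above their preset values, coherence at or below its preset value); the function name, the `mode` names and the dictionary layout of the thresholds are hypothetical.

```python
def noise_present(pitch_distance, mean_coherence, residual_level,
                  thresholds, mode="any_two"):
    """Combine the three judgment results into one noise decision."""
    votes = [
        pitch_distance >= thresholds["pitch"],      # first judgment result
        mean_coherence <= thresholds["coherence"],  # second judgment result
        residual_level >= thresholds["residual"],   # third judgment result
    ]
    required = {"any_one": 1, "any_two": 2, "all": 3}[mode]
    return sum(votes) >= required
```

Requiring more votes lowers the false alarm rate at the cost of missing noise that only one method can see, so the mode would be chosen per production line.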

Based on the foregoing embodiments, an audio detection apparatus is provided in an embodiment of the present application, where the apparatus includes units, the modules included in the units, and the components included in the modules, which may be implemented by a processor in an electronic device or, of course, by a specific logic circuit. In the implementation process, the processor may be a CPU (Central Processing Unit), an MPU (Microprocessor Unit), a DSP (Digital Signal Processor), an FPGA (Field Programmable Gate Array), or the like.

Fig. 6 is a schematic structural diagram of an audio detection apparatus according to an embodiment of the present application, and as shown in fig. 6, the apparatus 600 includes:

a first obtaining unit 601 configured to obtain first test data; the first test data is audio data of original audio processed by an uplink channel of the first terminal to be tested;

a first comparing unit 602, configured to compare the first test data with the obtained first reference data to obtain a first audio detection result, where the first audio detection result is an audio detection result of an uplink channel of the first terminal to be detected.

In some embodiments, the apparatus further comprises:

the second acquisition unit is used for acquiring intermediate data under the condition that the first audio detection result is passed; the intermediate data is first test data transmitted through a communication network;

a third obtaining unit, configured to obtain second test data according to the intermediate data; the second test data is audio data of the intermediate data processed by a downlink channel of a second terminal to be tested;

and the second comparison unit is used for comparing the second test data with the acquired second reference data to obtain a second audio detection result, wherein the second audio detection result is an audio detection result of a downlink channel of the second terminal to be detected.

In some embodiments, the first comparing unit 602 includes:

the first determining module is used for determining whether noise exists in the first test data according to the first test data and the acquired first reference data;

and the second determining module is used for determining the category and the time point information of the noise in the first test data if the noise exists in the first test data.

In some embodiments, the second determining module comprises:

a first determining component for determining a category of a noise in the first test data by a deep neural network;

and the second determining component is used for matching the first test data with a preset noise template library so as to determine the time point information of the noise occurrence in the first test data.

In some embodiments, the first determining module comprises:

a frame windowing component, configured to perform frame windowing on the first test data and the first reference data respectively;

a feature extraction component, configured to extract pitch features of the processed first test data and the processed first reference data, respectively;

a distance determining component, configured to determine a Euclidean distance between the pitch feature of the first test data and the pitch feature of the first reference data;

and a third determining component, configured to determine that a noise exists in the first test data if the Euclidean distance is greater than or equal to a first preset value.

In some embodiments, the first determining module comprises:

a Fourier transform component, configured to perform Fourier transform on the first test data and the first reference data, respectively;

a coherence determining component, configured to determine a coherence coefficient of the transformed first test data and the transformed first reference data in the frequency domain;

and a fourth determining component, configured to determine that a noise exists in the first test data if the coherence coefficient is less than or equal to a second preset value.

In some embodiments, the first determining module comprises:

an echo determination component, configured to determine the first reference data as a reference signal in echo cancellation;

the echo determination component is further configured to determine the first test data as a microphone signal in echo cancellation;

an echo cancellation component, configured to perform cancellation processing on the reference signal and the microphone signal through a filter to obtain a residual signal;

and a fifth determining component, configured to determine that a noise exists in the first test data if the residual signal is greater than or equal to a third preset value.

In some embodiments, the first determining module comprises:

a sixth determining component, configured to determine that a noise is present in the first test data when any two of the following conditions are satisfied:

the Euclidean distance between the pitch feature of the first test data and the pitch feature of the first reference data is greater than or equal to the first preset value;

the coherence coefficient between the first test data and the first reference data is less than or equal to the second preset value;

and the residual signal in the echo cancellation result of the first test data and the first reference data is greater than or equal to the third preset value.

In some embodiments, the first determining module comprises:

a seventh determining component, configured to determine that there is a noise in the first test data if the Euclidean distance between the pitch feature of the first test data and the pitch feature of the first reference data is greater than or equal to the first preset value, the coherence coefficient between the first test data and the first reference data is less than or equal to the second preset value, and the residual signal in the echo cancellation result of the first test data and the first reference data is greater than or equal to the third preset value.

The above description of the apparatus embodiments, similar to the above description of the method embodiments, has similar beneficial effects as the method embodiments. For technical details not disclosed in the embodiments of the apparatus of the present application, reference is made to the description of the embodiments of the method of the present application for understanding.

It should be noted that, in the embodiment of the present application, if the audio detection method is implemented in the form of a software functional module and sold or used as a standalone product, it may also be stored in a computer readable storage medium. Based on such understanding, the technical solutions of the embodiments of the present application, in essence or in the part contributing over the prior art, may be embodied in the form of a software product that is stored in a storage medium and includes several instructions for causing an electronic device (which may be a personal computer, a server, etc.) to execute all or part of the methods described in the embodiments of the present application. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a ROM (Read Only Memory), a magnetic disk, or an optical disk. Thus, embodiments of the present application are not limited to any specific combination of hardware and software.

Correspondingly, an embodiment of the present application provides an electronic device, which includes a memory and a processor, where the memory stores a computer program that can be run on the processor, and the processor executes the computer program to implement the steps in the audio detection method provided in the foregoing embodiment.

Correspondingly, the embodiment of the present application provides a readable storage medium, on which a computer program is stored, and the computer program, when executed by a processor, implements the steps in the audio detection method.

Here, it should be noted that: the above description of the storage medium and device embodiments is similar to the description of the method embodiments above, with similar advantageous effects as the method embodiments. For technical details not disclosed in the embodiments of the storage medium and apparatus of the present application, reference is made to the description of the embodiments of the method of the present application for understanding.

Fig. 7 is a schematic diagram of a hardware entity of an electronic device according to an embodiment of the present application. As shown in fig. 7, the hardware entity of the electronic device 700 includes a processor 701, a communication interface 702, and a memory 703, wherein:

The processor 701 generally controls the overall operation of the electronic device 700.

The communication interface 702 may enable the electronic device 700 to communicate with other terminals or servers via a network.

The memory 703 is configured to store instructions and applications executable by the processor 701, and may also buffer data (e.g., image data, audio data, voice communication data, and video communication data) to be processed or already processed by the processor 701 and the modules in the electronic device 700; it may be implemented by a flash memory (FLASH) or a Random Access Memory (RAM).

In the several embodiments provided in the present application, it should be understood that the disclosed apparatus and method may be implemented in other ways. The above-described device embodiments are merely illustrative, for example, the division of the unit is only a logical functional division, and there may be other division ways in actual implementation, such as: multiple units or components may be combined, or may be integrated into another system, or some features may be omitted, or not implemented. In addition, the coupling, direct coupling or communication connection between the components shown or discussed may be through some interfaces, and the indirect coupling or communication connection between the devices or units may be electrical, mechanical or other forms.

The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, that is, may be located in one place, or may be distributed on a plurality of network units; some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.

In addition, all functional units in the embodiments of the present application may be integrated into one processing module, or each unit may be separately regarded as one unit, or two or more units may be integrated into one unit; the integrated unit can be realized in a form of hardware, or in a form of hardware plus a software functional unit. Those of ordinary skill in the art will understand that: all or part of the steps for implementing the method embodiments may be implemented by hardware related to program instructions, and the program may be stored in a computer readable storage medium, and when executed, the program performs the steps including the method embodiments; and the aforementioned storage medium includes: a removable storage device, a ROM, a RAM, a magnetic or optical disk, or various other media that can store program code.

The methods disclosed in the several method embodiments provided in the present application may be combined arbitrarily without conflict to obtain new method embodiments. Features disclosed in several of the product embodiments provided in the present application may be combined in any combination to yield new product embodiments without conflict. The features disclosed in the several method or apparatus embodiments provided in the present application may be combined arbitrarily, without conflict, to arrive at new method embodiments or apparatus embodiments.

The above description covers only specific embodiments of the present application, but the protection scope of the present application is not limited thereto. Any change or substitution that a person skilled in the art can readily conceive within the technical scope disclosed in the present application shall be covered by the protection scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.

Claims

1. A method for audio detection, the method comprising:

comparing the first test data with the acquired first reference data to obtain a first audio detection result, wherein the first audio detection result is an audio detection result of an uplink channel of the first terminal to be detected; the first reference data is audio data obtained after the original audio is processed by an uplink channel of a first specific terminal;

acquiring intermediate data under the condition that the first audio detection result is passed; the intermediate data is first test data transmitted through a communication network;

acquiring second test data according to the intermediate data; the second test data is audio data of the intermediate data processed by a downlink channel of a second terminal to be tested;

comparing the second test data with the acquired second reference data to obtain a second audio detection result, wherein the second audio detection result is an audio detection result of a downlink channel of the second terminal to be detected; the second reference data is audio data obtained after the call audio is processed through a downlink channel of a second specific terminal; and the call audio is the audio data obtained when, during a call between the first specific terminal and the second specific terminal, the first reference data is transmitted to the second specific terminal through a communication network.

2. The method of claim 1, wherein comparing the first test data with the obtained first reference data to obtain a first audio detection result comprises:

determining whether noise exists in the first test data or not according to the first test data and the acquired first reference data;

and if the first test data has the noise, determining the category and the time point information of the noise in the first test data.

3. The method of claim 2, wherein said determining the category and time point information of the noise in the first test data if the noise exists in the first test data comprises:

determining, by a deep neural network, a category of a noise in the first test data;

and matching the first test data with a preset noise template library to determine the time point information of the noise occurrence in the first test data.

4. The method of claim 2, wherein said determining whether a noise is present in said first test data based on said first test data and said obtained first reference data comprises:

respectively performing frame windowing on the first test data and the first reference data;

respectively extracting the pitch characteristics of the processed first test data and the processed first reference data;

determining a Euclidean distance between a pitch feature of the first test data and a pitch feature of the first reference data;

and if the Euclidean distance is larger than or equal to a first preset value, determining that noise exists in the first test data.

5. The method of claim 2, wherein said determining whether a noise is present in said first test data based on said first test data and said obtained first reference data comprises:

performing Fourier transform on the first test data and the first reference data respectively;

determining the coherence coefficient of the transformed first test data and the transformed first reference data in the frequency domain;

and if the coherence coefficient is less than or equal to a second preset value, determining that noise exists in the first test data.

6. The method of claim 2, wherein said determining whether a noise is present in said first test data based on said first test data and said obtained first reference data comprises:

determining the first reference data as a reference signal in echo cancellation;

determining the first test data as a microphone signal in echo cancellation;

performing cancellation processing on the reference signal and the microphone signal through a filter to obtain a residual signal;

and if the residual signal is greater than or equal to a third preset value, determining that noise exists in the first test data.

7. The method of claim 2, wherein said determining whether a noise is present in said first test data based on said first test data and said obtained first reference data comprises:

determining that a noise is present in the first test data when any two of the following conditions are satisfied:

the Euclidean distance between the pitch characteristic of the first test data and the pitch characteristic of the first reference data is greater than or equal to a first preset value;

the coherence coefficient between the first test data and the first reference data is less than or equal to a second preset value;

and residual signals in the echo cancellation results of the first test data and the first reference data are greater than or equal to a third preset value.

8. The method of claim 2, wherein said determining whether a noise is present in said first test data based on said first test data and said obtained first reference data comprises:

and if the Euclidean distance between the pitch feature of the first test data and the pitch feature of the first reference data is greater than or equal to a first preset value, the coherence coefficient between the first test data and the first reference data is less than or equal to a second preset value, and a residual signal in the echo cancellation result of the first test data and the first reference data is greater than or equal to a third preset value, determining that the first test data has noise.

9. An audio detection apparatus, characterized in that the apparatus comprises:

the first comparison unit is used for comparing the first test data with the acquired first reference data to obtain a first audio detection result, wherein the first audio detection result is an audio detection result of an uplink channel of the first terminal to be detected; the first reference data is audio data obtained after the original audio is processed by an uplink channel of a first specific terminal;

the second comparison unit is used for comparing the second test data with the acquired second reference data to obtain a second audio detection result, wherein the second audio detection result is an audio detection result of a downlink channel of the second terminal to be detected; the second reference data is audio data obtained after the call audio is processed through a downlink channel of a second specific terminal; and the call audio is the audio data obtained when, during a call between the first specific terminal and the second specific terminal, the first reference data is transmitted to the second specific terminal through a communication network.

10. An electronic device comprising a memory and a processor, the memory storing a computer program operable on the processor, the processor implementing the steps in the audio detection method of any of claims 1 to 8 when executing the program.

11. A computer-readable storage medium, on which a computer program is stored which, when being executed by a processor, carries out the steps of the audio detection method of any one of claims 1 to 8.