CN111770427A - Microphone array detection method, device, equipment and storage medium - Google Patents

Info

Publication number
CN111770427A
Authority
CN
China
Prior art keywords
signal
audio
microphone array
detected
audio signals
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202010588457.4A
Other languages
Chinese (zh)
Other versions
CN111770427B (en)
Inventor
陈扬坤
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hangzhou Hikvision Digital Technology Co Ltd
Original Assignee
Hangzhou Hikvision Digital Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hangzhou Hikvision Digital Technology Co Ltd
Priority to CN202010588457.4A
Publication of CN111770427A
Application granted
Publication of CN111770427B
Legal status: Active
Anticipated expiration

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04R LOUDSPEAKERS, MICROPHONES, GRAMOPHONE PICK-UPS OR LIKE ACOUSTIC ELECTROMECHANICAL TRANSDUCERS; DEAF-AID SETS; PUBLIC ADDRESS SYSTEMS
    • H04R29/00 Monitoring arrangements; Testing arrangements
    • H04R29/004 Monitoring arrangements; Testing arrangements for microphones
    • H04R29/005 Microphone arrays
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00 Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/02 Speech enhancement, e.g. noise reduction or echo cancellation
    • G10L21/0208 Noise filtering
    • G10L21/0216 Noise filtering characterised by the method used for estimating noise
    • G10L21/0224 Processing in the time domain
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00 Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/02 Speech enhancement, e.g. noise reduction or echo cancellation
    • G10L21/0208 Noise filtering
    • G10L21/0216 Noise filtering characterised by the method used for estimating noise
    • G10L21/0232 Processing in the frequency domain
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/03 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters
    • G10L25/18 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters the extracted parameters being spectral information of each sub-band
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/03 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters
    • G10L25/24 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters the extracted parameters being the cepstrum
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/27 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the analysis technique
    • G10L25/30 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the analysis technique using neural networks
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/45 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of analysis window

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Health & Medical Sciences (AREA)
  • Signal Processing (AREA)
  • Multimedia (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Computational Linguistics (AREA)
  • Quality & Reliability (AREA)
  • Spectroscopy & Molecular Physics (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Otolaryngology (AREA)
  • Circuit For Audible Band Transducer (AREA)

Abstract

The application provides a detection method, a detection device, equipment and a storage medium for a microphone array. The method comprises: determining, according to audio features of audio signals collected by a microphone array, whether the audio signals contain illegal sound signals; when the audio signals contain illegal sound signals, extracting features of the audio signals to obtain time-frequency features of each frame of audio signal, wherein the time-frequency features indicate frequency domain amplitude features and time domain energy features of the audio signals; and inputting the time-frequency features of each frame of audio signal into a pre-trained microphone anomaly detection model to obtain a detection result indicating whether each microphone in the microphone array is abnormal. This improves the detection accuracy for a microphone array used to collect illegal sounds.

Description

Microphone array detection method, device, equipment and storage medium
Technical Field
The present disclosure relates to the field of audio device detection technologies, and in particular, to a method, an apparatus, a device, and a storage medium for detecting a microphone array.
Background
With the continuous development of monitoring equipment, the monitoring equipment is applied to a plurality of fields, and not only can provide video image monitoring but also audio monitoring.
Monitoring equipment for audio monitoring generally uses a microphone array to collect audio data. In practical application scenarios, taking the capture of illegal sounds on a road network as an example, the microphone array is affected during use by factors such as outdoor wind and rain, dust, and electromagnetic interference. As a result, some microphones easily become unable to collect sound normally, and the performance of the microphone array in capturing illegal sounds degrades rapidly. The state of the microphone array therefore needs to be detected to ensure that the microphones in the array work normally. In the prior art, a microphone array is detected by calculating a feature difference value for each path of audio signal; when the feature difference value is greater than a preset threshold, the microphone corresponding to that path of audio signal is determined to be abnormal.
However, the detection method provided in the prior art is generally used for microphone arrays that collect short-distance sounds, such as the microphone array in a mobile phone. When the microphone array needs to collect long-distance sounds, or when the sounds in the surrounding environment are complex, determining the state of the microphone array only from the magnitude of the feature difference value easily causes false detection, resulting in poor detection accuracy.
Disclosure of Invention
The application provides a detection method, a device, equipment and a storage medium of a microphone array, which can detect the microphone array applied to a complex environment and improve the detection accuracy.
In a first aspect, an embodiment of the present application provides a method for detecting a microphone array, including:
determining whether the audio signals contain illegal sound signals according to the audio characteristics of the audio signals collected by the microphone array;
when the audio signals contain illegal sound signals, performing feature extraction on the audio signals to obtain time-frequency features of each frame of audio signals, wherein the time-frequency features are used for indicating frequency domain amplitude features and time domain energy features of the audio signals;
and inputting the time-frequency characteristics of each frame of audio signal into a microphone abnormality detection model obtained by pre-training to obtain a detection result of whether each microphone in the microphone array is abnormal or not.
In a second aspect, an embodiment of the present application provides a detection apparatus for a microphone array, where the apparatus includes:
the first processing module is used for determining whether the audio signals contain illegal sound signals according to the audio characteristics of the audio signals collected by the microphone array;
the second processing module is used for extracting the characteristics of the audio signals when the audio signals contain illegal sound signals to obtain the time-frequency characteristics of each frame of audio signals, and the time-frequency characteristics are used for indicating the frequency domain amplitude characteristics and the time domain energy characteristics of the audio signals;
and the third processing module is used for inputting the time-frequency characteristics of each frame of audio signal into the microphone anomaly detection model obtained by pre-training to obtain the detection result of whether each microphone in the microphone array is abnormal or not.
In a third aspect, an embodiment of the present application provides a monitoring device, including: a microphone array, a memory, and a processor;
the microphone array collects audio signals in a target area;
the memory stores computer-executable instructions;
the processor executes the computer-executable instructions stored by the memory, so that the processor executes the microphone array detection method according to any one of the first aspect.
In a fourth aspect, an embodiment of the present application provides a server, including: a memory and a processor;
the memory stores computer-executable instructions;
the processor executes the computer-executable instructions stored by the memory, so that the processor executes the microphone array detection method according to any one of the first aspect.
According to the embodiments of the application, the microphone array is detected when the multi-channel audio signals collected by the microphone array contain illegal sound signals. When features are extracted from the audio signal collected by each microphone, the frequency domain amplitude features and the time domain energy features are extracted together, so that the combined time-frequency features reflect the characteristics of the audio signal as comprehensively as possible. The state of each microphone is then detected using a microphone anomaly detection model trained in advance on a large number of samples, which improves the detection accuracy of the microphone array in scenes where the microphone array collects illegal sounds.
Drawings
In order to more clearly illustrate the embodiments of the present application or the technical solutions in the prior art, the drawings needed to be used in the description of the embodiments or the prior art will be briefly introduced below, and it is obvious that the drawings in the following description are some embodiments of the present application, and for those skilled in the art, other drawings can be obtained according to these drawings without inventive exercise.
Fig. 1 is a schematic view of an application scenario provided in an embodiment of the present application;
Fig. 2 is a schematic flowchart of a detection method of a microphone array according to an embodiment of the present disclosure;
Fig. 3 is a schematic diagram of magnitude spectrum data of an audio signal according to an embodiment of the present application;
Fig. 4 is a schematic diagram of a binarized magnitude spectrum of an audio signal provided in an embodiment of the present application;
Fig. 5 is a schematic flowchart of a microphone anomaly detection model training process according to an embodiment of the present disclosure;
Fig. 6 is a schematic flowchart of a detection method of a microphone array according to an embodiment of the present disclosure;
Fig. 7 is a schematic flowchart of a detection method of a microphone array according to an embodiment of the present disclosure;
Fig. 8 is a schematic diagram of a feature extraction process provided in an embodiment of the present application;
Fig. 9 is a schematic flowchart of illegal sound signal identification according to an embodiment of the present disclosure;
Fig. 10 is a schematic flowchart of an illegal sound signal recognition model training process according to an embodiment of the present disclosure;
Fig. 11 is a schematic flowchart illustrating an optimization of an illegal sound signal identification model according to an embodiment of the present application;
Fig. 12 is a schematic flowchart of a detection method of a microphone array according to an embodiment of the present disclosure;
Fig. 13 is a schematic flowchart illustrating a process of determining the effectiveness of picking up a signal according to an embodiment of the present application;
Fig. 14 is a schematic structural diagram of a detection apparatus of a microphone array according to an embodiment of the present disclosure;
Fig. 15 is a schematic structural diagram of another detection apparatus for a microphone array according to an embodiment of the present disclosure;
Fig. 16 is a schematic structural diagram of another detection apparatus for a microphone array according to an embodiment of the present disclosure;
Fig. 17 is a schematic hardware structure diagram of a monitoring device according to an embodiment of the present application;
Fig. 18 is a schematic hardware structure diagram of a server according to an embodiment of the present application.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present application clearer, the technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are some embodiments of the present application, but not all embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
First, a scenario in which the embodiments of the present application are applied will be briefly described. A monitoring device, taking an Internet Protocol camera (IP Camera, IPC) as an example, collects audio signals in a target area through a microphone array and makes a record when the audio signals contain illegal sounds, for example by performing image capture on the target area, so as to realize audio monitoring of the target area. It should be understood that illegal sounds are defined differently for different target areas. For example, when the target area is a certain traffic road section, the illegal sounds may include whistling sounds or engine sounds (such as street car sounds) exceeding a preset decibel level; when the target area is an examination room, the illegal sounds may include human voices exceeding a preset decibel level. In the prior art, in a scene where a microphone array collects illegal sounds, the microphone array is detected by playing a source audio used for testing, having the microphone array collect a test audio based on the source audio, and comparing the source audio with the test audio to judge whether a microphone in the microphone array is abnormal.
In order to detect the microphone array accurately and in real time while it is in use, the present application detects the state of the microphones in real time from the audio signals that are collected by the microphone array in the target area and that contain illegal sound signals, thereby improving the accuracy of microphone detection in scenes where illegal sounds are monitored.
The embodiments of the present application can be applied to the detection scenario of the microphone array described above, and are specifically described in the following embodiments.
Fig. 1 is a schematic view of an application scenario provided in an embodiment of the present application, and as shown in fig. 1, a monitoring device 10 includes a microphone array 101 and an audio detection device 102, where the microphone array 101 and the audio detection device 102 are connected in a wireless or wired manner to achieve communication.
The microphone array 101 collects audio signals in a target area through each microphone, it should be understood that the collected audio signals should be multiple paths of audio signals, the microphone array 101 outputs the audio signals to the audio detection device 102, and the audio detection device 102 obtains a detection result of whether each microphone in the microphone array 101 is an abnormal microphone by detecting the audio signals.
As an example, the monitoring device 10 is connected with the server 20 through the audio detection device 102 in a wireless or wired manner, so that the audio detection device can transmit the detection result to the server 20; as another example, the audio detection device 102 is integrated with the server 20 (not shown in the figure), the monitoring device 10 is connected with the server 20 in a wireless or wired manner, the monitoring device 10 transmits the audio signals collected by the microphone array 101 to the server 20, and the server 20 detects the state of each microphone in the microphone array according to the audio signals.
The following describes a detection method of a microphone array provided by the present application with several embodiments.
Fig. 2 is a schematic flowchart of a detection method of a microphone array according to an embodiment of the present disclosure. With reference to Fig. 2, the method includes: when the audio signals contain illegal sound signals, performing feature extraction on the audio signals collected by the microphone array to obtain time-frequency features of each frame of audio signal; and inputting the time-frequency features of each frame of audio signal into a pre-trained microphone anomaly detection model to obtain a detection result indicating whether each microphone in the microphone array is abnormal.
Further, in order to improve the detection accuracy of detecting the state of the microphone when the microphone is used for collecting the illegal sound, the embodiment of the present application first needs to determine whether the audio signal includes the illegal sound signal according to the audio feature of the audio signal collected by the microphone array.
In the scheme, in order to detect whether the state of each microphone is abnormal when the microphone array collects the illegal sound signals, only the audio signals collected by the microphone array and containing the illegal sound signals are detected, and the audio signals collected by the microphone array and not containing the illegal sound signals are not detected.
In the implementation of the scheme, feature extraction is first performed on the audio signals collected by each microphone in the microphone array to obtain the time-frequency features of each frame of audio signal in each path of audio signals. The time-frequency features that can reflect the state of a microphone include the frequency domain amplitude features and the time domain energy features of the audio signal, so both need to be extracted for each frame of audio signal in the path of audio signals collected by each microphone. The extracted frequency domain amplitude features and time domain energy features of each frame of audio signal are then input into the microphone anomaly detection model obtained by training in advance. For example, the time-frequency features of the same frame of audio signal collected by a plurality of microphones may be input into the model simultaneously, or the time-frequency features of multiple frames of audio signals collected by a plurality of microphones may be input into the model simultaneously. The model outputs a detection result, which realizes anomaly detection for the microphone that output the audio signal and yields a detection result indicating whether each microphone in the microphone array is abnormal.
This scheme accurately detects the state of the microphone array while it collects illegal sounds in three ways. First, the microphone array is detected when the audio signals it collects contain illegal sound signals. Second, when features are extracted from the audio signal collected by each microphone, the frequency domain amplitude features and the time domain energy features are extracted together, so that the combined time-frequency features reflect the characteristics of the audio signal as comprehensively as possible. Third, the state of each microphone is detected using a microphone anomaly detection model trained in advance on a large number of samples, which makes the detection result more accurate.
In the process of extracting the characteristics of the multiple audio signals collected by the microphone array to obtain the time-frequency characteristics of each frame of audio signal, the following steps can be exemplarily implemented:
S101: N frames of audio signals are obtained from each path of audio signals collected by each microphone of the microphone array, where N is a positive integer greater than or equal to 1.
The N frames of audio signals may be all audio signals or a part of audio signals included in the channel of audio signals, and the embodiment of the present application is not specifically limited herein. When the N frames of audio signals are all the audio signals included in the channel of audio signals, it is indicated that the time-frequency characteristics of the channel of audio signals are determined according to all the audio signals included in the channel of audio signals, and thus, the determined time-frequency characteristics of the channel of audio signals can be matched with the characteristics of the channel of audio signals as much as possible. When the N frames of audio signals are part of the audio signals included in the channel of audio signals, it is indicated that the time-frequency characteristics of the channel of audio signals are determined according to the part of the audio signals included in the channel of audio signals, so that the efficiency of determining the time-frequency characteristics of the channel of audio signals can be improved.
For example, the audio signal is audio with a duration of 1 minute, wherein 1 second of audio includes 25 frames of audio signal. The N frames of audio may be all 25 × 60 frames of audio signals included in the audio signal or may be 25 × 30 frames of audio signals included in the first 30 seconds of audio in the audio signal.
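The frame counts in this example can be checked with a short sketch. It is only an illustration under stated assumptions: the 8000 Hz sample rate, the non-overlapping framing, and the helper names `split_into_frames` and `select_frames` are hypothetical and do not come from the application.

```python
def split_into_frames(samples, sample_rate, frames_per_second=25):
    """Split one path of audio samples into non-overlapping frames.

    Assumption: frames do not overlap and a trailing partial frame is dropped.
    """
    frame_len = sample_rate // frames_per_second
    n_frames = len(samples) // frame_len
    return [samples[i * frame_len:(i + 1) * frame_len] for i in range(n_frames)]


def select_frames(frames, seconds=None, frames_per_second=25):
    """Return all frames, or only the frames of the first `seconds` seconds."""
    if seconds is None:
        return frames
    return frames[:seconds * frames_per_second]


sample_rate = 8000                       # assumed sample rate in Hz
signal = [0.0] * (sample_rate * 60)      # one minute of (silent) audio
frames = split_into_frames(signal, sample_rate)

all_frames = select_frames(frames)               # N = 25 * 60
first_half = select_frames(frames, seconds=30)   # N = 25 * 30
print(len(all_frames), len(first_half))          # 1500 750
```

Using all frames matches the signal as closely as possible, while using only the first 30 seconds halves the amount of data to process.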
S102: Extract the frequency domain amplitude feature and the time domain energy feature of each frame of audio signal.
In a specific implementation manner, the time domain energy feature of each frame of audio signal is extracted, and specifically, the audio energy of each frame of audio signal may be calculated to obtain the audio energy P1, P2, P3 … PN of the N frames of audio signal.
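A minimal sketch of this step follows. The application does not state how the audio energy is computed, so the sum of squared samples used here is an assumption, and the function names are hypothetical.

```python
def frame_energy(frame):
    """Energy of one frame, assumed here to be the sum of squared samples."""
    return sum(x * x for x in frame)


def time_domain_energy(frames):
    """Return [P1, P2, ..., PN] for a list of N frames."""
    return [frame_energy(frame) for frame in frames]


frames = [
    [1.0, -1.0, 1.0, -1.0],   # loud frame
    [0.5, 0.5, 0.5, 0.5],     # quieter frame
    [0.0, 0.0, 0.0, 0.0],     # silent frame
]
energies = time_domain_energy(frames)
print(energies)  # [4.0, 1.0, 0.0]
```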
In a specific implementation manner, extracting frequency domain amplitude features of each frame of audio signals includes: converting each frame of audio signal into a frequency domain signal through Fourier transform to obtain a magnitude spectrum of each frame of audio signal, wherein the magnitude spectrum comprises the magnitude of the frequency domain signal corresponding to the audio signal on K frequency points, K is a positive integer greater than or equal to 1, carrying out binarization processing on the magnitude spectrum of each frame of audio signal, and combining each element in the magnitude spectrum after binarization processing to obtain the frequency domain magnitude characteristic of each frame of audio signal.
Each frame of audio signal can be understood as a segment of time domain signal. Performing Fourier transform processing on each frame of audio signal therefore means converting the time domain signal corresponding to that frame into a superposition of K basic time domain signals. Each basic time domain signal may be a signal such as a sine wave signal or a cosine wave signal, and each basic time domain signal corresponds to one frequency. The magnitude spectrum of each frame of audio signal is obtained by combining the amplitudes of the basic time domain signals; that is, the magnitude spectrum of each frame of audio signal includes K amplitude values.
Referring to Fig. 3, a schematic diagram of magnitude spectrum data of an audio signal according to an embodiment of the present application, the K frequency points are respectively labeled F1, F2, F3 …, Fk. The magnitude spectrum of the first frame of the N frames of audio signals can then be represented as S1(1), S1(2), S1(3) …, S1(K), arranged from bottom to top in the first column of Fig. 3, where S1(1) is the amplitude of the basic time domain signal corresponding to the frequency F1, S1(2) is the amplitude of the basic time domain signal corresponding to the frequency F2, S1(3) is the amplitude of the basic time domain signal corresponding to the frequency F3, and S1(K) is the amplitude of the basic time domain signal corresponding to the frequency Fk. By analogy, the magnitude spectrum of the second frame of audio signal can be represented as S2(1), S2(2), S2(3) …, S2(K), the magnitude spectrum of the third frame of audio signal can be represented as S3(1), S3(2), S3(3) …, S3(K), and so on, and the magnitude spectrum of the Nth frame of audio signal can be represented as Sn(1), Sn(2), Sn(3) …, Sn(K).
The Fourier transform may be FFT (Fast Fourier transform), or may be Fourier transforms of other forms, and the embodiment of the present application is not specifically limited herein.
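The magnitude spectrum of a single frame can be sketched with a direct discrete Fourier transform. This is only an illustration: it uses a plain DFT rather than an FFT for brevity, and the choice of K as half the frame length plus one (the non-redundant bins of a real signal) is an assumption, since the application does not fix K. The function name is hypothetical.

```python
import cmath
import math


def magnitude_spectrum(frame):
    """Return the amplitudes S(1)..S(K) of one frame.

    Assumption: K = len(frame) // 2 + 1, the non-redundant bins of a real signal.
    """
    n = len(frame)
    k_points = n // 2 + 1
    spectrum = []
    for k in range(k_points):
        acc = 0j
        for t, x in enumerate(frame):
            # Correlate the frame with the basic signal at frequency point k.
            acc += x * cmath.exp(-2j * cmath.pi * k * t / n)
        spectrum.append(abs(acc))
    return spectrum


# A frame of 8 samples holding a cosine that completes 2 cycles per frame.
frame = [math.cos(2 * math.pi * 2 * t / 8) for t in range(8)]
mags = magnitude_spectrum(frame)
peak = max(range(len(mags)), key=lambda k: mags[k])
print(len(mags), peak)  # 5 2
```

The largest amplitude falls on the frequency point that matches the cosine in the frame, which is the behavior the magnitude spectrum is meant to capture.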
In addition, the binarization processing of the amplitude spectrum of each frame of audio signal may be implemented as follows: an average of all amplitudes present in the amplitude spectrum of each frame of the audio signal is determined, and for any amplitude in the amplitude spectrum of each frame of the audio signal, such as a first amplitude, the first amplitude is set to a first value if the first amplitude is greater than the average, and the first amplitude is set to a second value if the first amplitude is less than or equal to the average. Only two types of elements can be included in the amplitude spectrum of each frame of the audio signal by the binarization process.
For example, the first value may be 1 and the second value may be -1, such that only two types of elements are included in the amplitude spectrum of each frame of the audio signal: 1 and -1. Of course, the first value and the second value may also be set to other values, for example, the first value is set to 1, and the second value is set to 0, and the embodiment of the present application is not particularly limited herein.
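The binarization rule described above can be sketched as follows. The function name `binarize_spectrum` is hypothetical; the thresholding against the mean and the example values 1, -1 and 0 follow the text.

```python
def binarize_spectrum(magnitudes, first_value=1, second_value=-1):
    """Binarize one frame's magnitude spectrum against its own mean.

    An amplitude greater than the mean becomes `first_value`;
    an amplitude less than or equal to the mean becomes `second_value`.
    """
    mean = sum(magnitudes) / len(magnitudes)
    return [first_value if m > mean else second_value for m in magnitudes]


mags = [1.0, 3.0, 5.0, 3.0]            # mean is 3.0
print(binarize_spectrum(mags))         # [-1, -1, 1, -1]
print(binarize_spectrum(mags, 1, 0))   # [0, 0, 1, 0]
```

Note that the amplitudes equal to the mean (3.0) map to the second value, as the rule specifies.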
TABLE 1 (table image not reproduced: magnitude spectrum of each frame of audio signal before and after binarization)
After the binarization processing is performed on the magnitude spectrum of each frame of audio signal, each element included in the magnitude spectrum after the binarization processing may be combined to obtain the frequency domain magnitude characteristic of each frame of audio signal. As shown in table 1, when the amplitude spectrum of the first frame audio signal is represented as S1(1), S1(2), S1(3) …, S1(K), the amplitude spectrum after the binarization process can be represented as: s1(1) ', S1 (2)', S1(3) '…, S1 (K)', at which time the frequency domain amplitude features of the audio signal of the first frame are obtained as (S1(1) ', S1 (2)', S1(3) '…, S1 (K)'). When the amplitude spectrum of the second frame audio signal is represented as S2(1), S2(2), S2(3) …, S2(K), the amplitude spectrum after the binarization process can be represented as: s2(1) ', S2 (2)', S2(3) '…, S2 (K)', at which time the frequency domain amplitude features of the second frame audio signal are obtained as (S2(1) ', S2 (2)', S2(3) '…, S2 (K)'). When the amplitude spectrum of the third frame audio signal is represented as S3(1), S3(2), S3(3) …, S3(K), the amplitude spectrum after the binarization process may be represented as: s3(1) ', S3 (2)', S3(3) '…, S3 (K)', at which time the frequency domain amplitude features of the audio signal of the third frame are obtained as (S3(1) ', S3 (2)', S3(3) '…, S3 (K)'). When the amplitude spectrum of the nth frame audio signal is represented by Sn (1), Sn (2), Sn (3) …, Sn (k), the amplitude spectrum after the binarization process may be represented by: sn (1) ', Sn (2)', Sn (3) '…, Sn (k)', where the frequency domain amplitude features of the first frame audio signal are obtained as (Sn (1) ', Sn (2)', Sn (3) '…, Sn (k)').
S103: combining the frequency domain amplitude characteristic and the time domain energy characteristic of each frame of audio signal in the N frames of audio signals to obtain the time-frequency characteristic of the audio signal.
In a possible implementation manner, fig. 4 is a schematic diagram of a binarized amplitude spectrum of an audio signal provided in an embodiment of the present application. As shown in fig. 4, the frequency domain amplitude features of each frame of audio signal in the N frames of audio signals are each taken as one column to form an N-column matrix, the elements of the matrix are acquired sequentially along a specified route, and the vector formed by the acquired elements is taken as the frequency domain amplitude feature of the audio signal.
When the frequency domain amplitude feature of each frame of audio signal includes K elements, the N-column matrix is a matrix of N columns and K rows. When the frequency domain amplitude feature of each frame of audio signal is the frequency domain amplitude feature shown in table 1 above, the matrix of N columns and K rows may be represented as the square grid shown in fig. 4. As shown in fig. 4, each row of squares corresponds to one row of elements of the matrix, and each column of squares corresponds to one column of elements of the matrix. Specifically, the first column of squares corresponds to the elements of the frequency domain amplitude feature (S1(1)', S1(2)', S1(3)', …, S1(K)') of the first frame of audio signal, the second column of squares corresponds to the elements of the frequency domain amplitude feature (S2(1)', S2(2)', S2(3)', …, S2(K)') of the second frame of audio signal, the third column of squares corresponds to the elements of the frequency domain amplitude feature (S3(1)', S3(2)', S3(3)', …, S3(K)') of the third frame of audio signal, and so on, and the N-th column of squares corresponds to the elements of the frequency domain amplitude feature (SN(1)', SN(2)', SN(3)', …, SN(K)') of the N-th frame of audio signal.
Sequentially acquiring each element in the matrix according to the specified route may be: starting from the first element in the upper left corner of the matrix, acquiring each element in the matrix in turn along an S-shaped route, where the S-shaped route may be the S-shaped route shown in fig. 4. Of course, the specified route may be another type of route, as long as every element in the matrix is acquired.
The acquired elements are taken as the K × N frequency domain amplitude features of each path of audio signal and denoted S_r(j), where j = 1, 2, …, KN. Combining these K × N frequency domain amplitude features with the N acquired time domain energy features P1, P2, P3, …, PN yields (K + 1) × N time-frequency features.
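By way of illustration only, the assembly of the (K + 1) × N time-frequency features can be sketched as follows. The exact S-shaped route of fig. 4 is not reproduced in the text, so this sketch assumes a column-wise route (down the first column, up the second, and so on); the function names `s_route_flatten` and `time_frequency_feature` are hypothetical.

```python
import numpy as np


def s_route_flatten(matrix: np.ndarray) -> np.ndarray:
    """Read a K x N matrix along an assumed column-wise S-shaped route.

    Columns with an even index are read top-to-bottom and columns with an
    odd index bottom-to-top, starting from the upper-left element.
    """
    columns = []
    for n in range(matrix.shape[1]):
        column = matrix[:, n]
        columns.append(column if n % 2 == 0 else column[::-1])
    return np.concatenate(columns)


def time_frequency_feature(freq_feats: np.ndarray, energies: np.ndarray) -> np.ndarray:
    """Combine K x N frequency domain features with N energy features.

    freq_feats: K x N matrix, one binarized amplitude spectrum per column.
    energies:   vector of the N time domain energy features P1..PN.
    Returns a vector with (K + 1) * N elements.
    """
    if freq_feats.shape[1] != energies.shape[0]:
        raise ValueError("one energy feature is required per frame")
    s_r = s_route_flatten(freq_feats)  # S_r(j), j = 1..K*N
    return np.concatenate([s_r, energies])
```

For a matrix with K = 3 rows and N = 2 columns, the result has (3 + 1) × 2 = 8 elements: six frequency domain amplitude features followed by two time domain energy features.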
In a specific implementation manner, as shown in fig. 2, after the detection result of each microphone in the microphone array is obtained, the present solution raises an alarm when an abnormal microphone exists. Specifically, if the detection result indicates that an abnormal microphone exists, corresponding first alarm information is generated, where the first alarm information is used to indicate at least one of the following: that the microphone array is abnormal, the number of abnormal microphones in the microphone array, and the identifier of each abnormal microphone. If the detection result indicates that no abnormal microphone exists, the first alarm information is not generated and the detection process ends.
Optionally, the monitoring device may send the first alarm information to the server, or the monitoring device may display the first alarm information through a connected display.
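By way of illustration only, the generation of the first alarm information can be sketched as follows. The data layout (a mapping from microphone identifier to an "abnormal" flag) and the names `FirstAlarm` and `build_first_alarm` are assumptions made for this sketch and are not defined by the present application.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class FirstAlarm:
    array_abnormal: bool     # the microphone array is abnormal
    abnormal_count: int      # number of abnormal microphones
    abnormal_ids: List[str]  # identifiers of the abnormal microphones


def build_first_alarm(results: Dict[str, bool]) -> Optional[FirstAlarm]:
    """Build the first alarm information from per-microphone results.

    results maps a microphone identifier to True when that microphone was
    detected as abnormal. Returns None when no microphone is abnormal, in
    which case no alarm is generated and detection simply ends.
    """
    abnormal = sorted(mic for mic, is_abnormal in results.items() if is_abnormal)
    if not abnormal:
        return None
    return FirstAlarm(True, len(abnormal), abnormal)
```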
Fig. 5 is a schematic flowchart of a microphone anomaly detection model training process according to an embodiment of the present application. Before the time-frequency features of each frame of audio signal are input into the pre-trained microphone anomaly detection model, an initial network model needs to be trained to obtain the microphone anomaly detection model.
As shown in fig. 5, audio training signals containing illegal sound signals are first acquired through a microphone array, and each audio training signal is subjected to sample processing to obtain a plurality of training samples. Each training sample includes the time-frequency feature and the state label of an audio training signal, where the state label marks the state of the microphone corresponding to the audio training signal as normal or abnormal. The plurality of training samples are then input into an initial network model for iterative training to obtain the microphone anomaly detection model.
The sample processing of each audio training signal includes feature extraction and sample labeling. The specific process of extracting the frequency domain amplitude feature and the time domain energy feature of each audio training signal is similar to steps S102 to S103 and is not described again here. Sample labeling assigns each audio training signal the state label of its corresponding microphone; whether the microphone is normal or abnormal may be determined by any algorithm or any hardware detection method, which is not limited in this scheme.
When training the initial network model, illustratively, the plurality of training samples are input into the initial network model for iterative training until the loss function converges to less than a preset value, at which point the microphone anomaly detection model is obtained and the model training process ends.
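By way of illustration only, the stopping rule "iterate until the loss falls below a preset value" can be sketched as follows. A logistic regression trained by gradient descent stands in for the initial network model purely to keep the sketch short; the present application does not prescribe this model, and the name `train_until_converged`, the learning rate, and the iteration cap are assumptions.

```python
import numpy as np


def train_until_converged(features, labels, loss_threshold=0.1, lr=0.5, max_iters=10000):
    """Train a stand-in binary classifier until the loss is below a preset value.

    features: array of shape (num_samples, num_features), e.g. time-frequency features.
    labels:   array of 0 (normal) / 1 (abnormal) state labels.
    Returns the weights, the bias, and the final loss.
    """
    x = np.asarray(features, dtype=float)
    y = np.asarray(labels, dtype=float)
    w = np.zeros(x.shape[1])
    b = 0.0
    loss = float("inf")
    for _ in range(max_iters):
        p = 1.0 / (1.0 + np.exp(-(x @ w + b)))  # predicted probability of "abnormal"
        # Binary cross-entropy loss; the small constant guards against log(0).
        loss = float(-np.mean(y * np.log(p + 1e-12) + (1 - y) * np.log(1 - p + 1e-12)))
        if loss < loss_threshold:  # preset value reached: stop training
            break
        grad = p - y
        w -= lr * (x.T @ grad) / len(y)
        b -= lr * float(grad.mean())
    return w, b, loss
```

The iteration cap is a practical safeguard for the case where the loss never reaches the preset value.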
In order to improve the accuracy of detecting the state of the microphones when the microphones are used for collecting illegal sounds, fig. 6 and fig. 7 are schematic flow diagrams of a detection method of a microphone array provided by an embodiment of the present application.
For example, as shown in fig. 6, to prevent performance differences between microphones from affecting illegal sound recognition, the audio energy of each path of audio signal is calculated, and the path of audio signal with the largest audio energy is selected from the multiple paths of audio signals collected by the microphone array as the signal to be detected. It is then determined whether the signal to be detected contains an illegal sound signal; if it does, the multiple paths of audio signals are considered to contain the illegal sound signal.
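By way of illustration only, the selection of the signal to be detected can be sketched as follows. The text does not define how audio energy is computed, so this sketch assumes the sum of squared samples; the function name `select_signal_to_detect` is hypothetical.

```python
from typing import Tuple

import numpy as np


def select_signal_to_detect(channels: np.ndarray) -> Tuple[int, np.ndarray]:
    """Pick the path of audio signal with the largest audio energy.

    channels: array of shape (num_microphones, num_samples), one row per path.
    Returns the index of the selected microphone and its signal.
    """
    # Assumed energy definition: sum of squared samples of each path.
    energies = np.sum(channels.astype(float) ** 2, axis=1)
    index = int(np.argmax(energies))
    return index, channels[index]
```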
For example, as shown in fig. 7, in the present embodiment, determining whether the signal to be detected contains an illegal sound signal includes: dividing the signal to be detected into a plurality of signal segments by framing and windowing, extracting audio features for each signal segment, and combining the audio features corresponding to the signal segments into the audio features of the signal to be detected. The audio features include, but are not limited to, one or a combination of the Fbank feature, the Fourier transform spectrum (FFT-spectrum) feature, and the Mel-Frequency Cepstrum Coefficient (MFCC) feature. Illustratively, when a filter bank algorithm is adopted to extract the Fbank feature of the signal to be detected, the signal to be detected is framed and windowed (for example, with a Hamming window) and thereby divided into a plurality of signal segments, the filter bank algorithm is applied to each signal segment to extract its Fbank feature, and all the extracted Fbank features are combined to obtain the Fbank feature of the signal to be detected.
And further, inputting the audio features of the signal to be detected into a pre-trained illegal sound signal recognition model, and recognizing whether the signal to be detected contains an illegal sound signal to obtain a recognition result.
Fig. 8 is a schematic diagram of a feature extraction process provided in an embodiment of the present application. As shown in fig. 8, the FFT-spectrum feature is extracted after framing and windowing, pre-emphasis, and Fast Fourier Transform (FFT) are performed on the audio signal. After the FFT, triangular filtering is further performed on the audio signal, for example through a Mel-domain filter, and the logarithm of the filtered signal is taken, thereby extracting the Fbank feature. For the MFCC feature, a Discrete Cosine Transform (DCT) is further applied, and the MFCC feature is extracted after the DCT.
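By way of illustration only, the chain FFT-spectrum, then Fbank, then MFCC can be sketched with NumPy as follows. All numeric parameters (16 kHz sample rate, 400-sample frames, 160-sample hop, 512-point FFT, 26 filters, 13 coefficients, 0.97 pre-emphasis) are common defaults assumed for this sketch, not values given by the present application. The sketch applies pre-emphasis to the whole signal before framing, which is a common ordering, and the function names are hypothetical.

```python
import numpy as np


def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)


def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def mel_filterbank(n_filters, n_fft, sample_rate):
    """Triangular filters spaced evenly on the Mel scale."""
    mel_points = np.linspace(hz_to_mel(0.0), hz_to_mel(sample_rate / 2.0), n_filters + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_points) / sample_rate).astype(int)
    fbank = np.zeros((n_filters, n_fft // 2 + 1))
    for m in range(1, n_filters + 1):
        left, center, right = bins[m - 1], bins[m], bins[m + 1]
        for k in range(left, center):      # rising edge of the triangle
            fbank[m - 1, k] = (k - left) / max(center - left, 1)
        for k in range(center, right):     # falling edge of the triangle
            fbank[m - 1, k] = (right - k) / max(right - center, 1)
    return fbank


def extract_features(signal, sample_rate=16000, frame_len=400, hop=160,
                     n_fft=512, n_filters=26, n_mfcc=13):
    """Return (FFT-spectrum, Fbank, MFCC), one row per frame."""
    signal = np.asarray(signal, dtype=float)
    # Pre-emphasis: boost high frequencies with a first-order difference.
    emphasized = np.append(signal[0], signal[1:] - 0.97 * signal[:-1])
    # Framing and Hamming windowing.
    n_frames = 1 + (len(emphasized) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    frames = emphasized[idx] * np.hamming(frame_len)
    # FFT-spectrum feature: magnitude of the FFT of each frame.
    spectrum = np.abs(np.fft.rfft(frames, n_fft))
    # Fbank feature: log of the Mel triangular filter outputs.
    power = spectrum ** 2 / n_fft
    fbank = np.log(power @ mel_filterbank(n_filters, n_fft, sample_rate).T + 1e-10)
    # MFCC feature: DCT-II of the Fbank feature, keeping the first n_mfcc terms.
    n = np.arange(n_filters)
    basis = np.cos(np.pi * np.outer(np.arange(n_mfcc), 2 * n + 1) / (2 * n_filters))
    mfcc = fbank @ basis.T
    return spectrum, fbank, mfcc
```

The per-frame rows of any of the three outputs can then be combined (for example, concatenated) into the audio feature of the signal to be detected.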
Fig. 9 is a schematic flowchart of illegal sound signal identification according to an embodiment of the present disclosure. As shown in fig. 9, because the scheme is based on deep learning, detection need not be limited to a single category of illegal sound; in the traffic field, for example, whistling sounds, street-firing sounds, braking sounds, and the like may all be categories of sound to be detected. Given sufficient supporting data, the unified detection scheme can detect all illegal sounds of interest to the user at the same time.
Fig. 10 is a schematic flow chart of training an illegal sound signal recognition model according to an embodiment of the present application. In this embodiment, a pre-collected sample set is used to train an initial network model. Specifically, the audio feature of each sample and its preset state label are input into the initial network model, and the initial network model is iteratively trained until the recognition accuracy of the model meets the requirement, or until the loss function converges to less than a preset value. The illegal sound signal recognition model is then obtained and the training process ends.
In a specific implementation manner, before training an initial network model, samples need to be collected to form a sample set, where the sample set includes a sample containing an illegal sound and a sample not containing the illegal sound, and further, frame division and windowing are performed on each sample, each sample is divided into a plurality of signal segments, an audio feature is extracted for each signal segment, and all audio features of each sample are combined to obtain an audio feature of each sample. Optionally, the audio features include, but are not limited to, one or a combination of Fbank features, FFT-spectra features, MFCC features.
Illustratively, the initial network model is built based on a deep learning network, such as Deep Neural Networks (DNN), DFSMN, Long Short-Term Memory Networks (LSTM), Convolutional Neural Networks (CNN), and the like.
Fig. 11 is a schematic flowchart of optimizing an illegal sound signal identification model according to an embodiment of the present application. As shown in fig. 11, in this embodiment, when data is insufficient in the early stage of application, a transfer learning strategy may be used: a model pre-trained on another large data set is borrowed as the initial network model, and its parameters are fine-tuned with a small amount of initial training data. In the later stage, illegal sound samples continue to be collected as training data for the next round of optimization training. By iteratively updating the model in this way with a continuously growing training set, the accuracy of the illegal sound signal identification model becomes higher and higher.
In order to improve the detection efficiency of the microphone array, before microphone anomaly identification is performed on the audio signals, whether the microphone array is abnormal is first determined from one path of audio signal among the multiple paths of audio signals. If that one path of audio signal already shows that the microphone array is abnormal, anomaly detection is not performed on the multiple paths of audio signals, which improves the detection efficiency.
Fig. 12 is a schematic flow chart of a detection method for a microphone array according to an embodiment of the present disclosure. In this embodiment, before determining whether the signal to be detected contains an illegal sound signal, the state of the microphone array may be preliminarily checked by detecting whether the signal to be detected is a valid pickup signal. If the signal to be detected is determined to be a valid pickup signal, illegal sound identification is performed. If the signal to be detected is determined not to be a valid pickup signal, the microphone array can be determined to be abnormal and second alarm information is generated; illegal sound identification and further detection of the multiple paths of audio signals are then unnecessary, which improves the detection efficiency.
For example, fig. 13 is a schematic flowchart of determining whether a signal is a valid pickup signal according to an embodiment of the present application. As shown in fig. 13, the audio energy of the signal to be detected is calculated and compared with a preset threshold. If the audio energy of the signal to be detected is smaller than the preset threshold, the signal to be detected is determined to be an invalid pickup signal; otherwise, whether the signal to be detected is a valid pickup signal is further determined according to the spectral peak stability of the signal to be detected.
Further, to continue determining whether the signal to be detected is a valid pickup signal according to its spectral peak stability, the spectral peak stability of the signal to be detected first needs to be extracted. For example, N frames of audio signals are obtained from the signal to be detected, where N is a positive integer greater than or equal to 1; the specific implementation is similar to step S101 and is not repeated here. Each frame of audio signal of the signal to be detected is converted into a frequency domain signal through Fourier transform, and the position of the maximum among all amplitudes of each frame in the frequency domain, namely the spectral peak position index(i), is obtained. The spectral peak stability T_Dif of the signal to be detected is then calculated by the formula
[Formula for T_Dif: given as images in the original publication and not reproduced here.]
where i is any frame in the N frames of audio signals.
Further, the spectral peak stability of the signal to be detected is compared with a preset stability threshold. If the spectral peak stability of the signal to be detected is smaller than the preset stability threshold, the signal to be detected is determined to be an invalid pickup signal; if it is greater than or equal to the preset stability threshold, the signal to be detected is determined to be a valid pickup signal.
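By way of illustration only, the two-stage validity check (audio energy first, then spectral peak stability) can be sketched as follows. The formula for T_Dif is not available in this text, so the stability measure is passed in as a function; `mean_peak_shift` is a purely hypothetical stand-in (the mean absolute change of the spectral peak position between consecutive frames) and is not the formula of the present application. The energy definition (sum of squared samples) and all names are likewise assumptions.

```python
import numpy as np


def spectral_peak_indices(frames: np.ndarray) -> np.ndarray:
    """index(i): position of the largest amplitude in the spectrum of each frame."""
    return np.argmax(np.abs(np.fft.rfft(frames, axis=1)), axis=1)


def mean_peak_shift(indices: np.ndarray) -> float:
    """HYPOTHETICAL stand-in for T_Dif, not the formula of the application.

    Mean absolute change of the peak position between consecutive frames.
    """
    if len(indices) < 2:
        return 0.0
    return float(np.mean(np.abs(np.diff(indices))))


def is_valid_pickup(frames, energy_threshold, stability_threshold,
                    stability_fn=mean_peak_shift) -> bool:
    """Decide whether the signal to be detected is a valid pickup signal.

    frames: array of shape (N, frame_length) holding the N frames.
    """
    frames = np.asarray(frames, dtype=float)
    energy = float(np.sum(frames ** 2))  # assumed energy definition
    if energy < energy_threshold:
        return False                     # invalid pickup: too little energy
    t_dif = stability_fn(spectral_peak_indices(frames))
    return bool(t_dif >= stability_threshold)
```

Passing the stability measure as a parameter keeps the decision logic separate from the formula, so the actual T_Dif computation can be substituted without changing the flow.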
Fig. 14 is a schematic structural diagram of a detection apparatus of a microphone array according to an embodiment of the present disclosure, and as shown in fig. 14, the detection apparatus 10 of a microphone array includes:
the first processing module 11 is configured to determine whether the audio signal includes an illegal sound signal according to an audio feature of an audio signal acquired by a microphone array;
the second processing module 12 is configured to, when the audio signal includes an illegal sound signal, perform feature extraction on the audio signal acquired by the microphone array to obtain a time-frequency feature of each frame of audio signal, where the time-frequency feature is used to indicate a frequency-domain amplitude feature and a time-domain energy feature of the audio signal;
and the third processing module 13 is configured to input the time-frequency characteristics of each frame of audio signal to a microphone anomaly detection model obtained through pre-training, so as to obtain a detection result of whether each microphone in the microphone array is anomalous.
The detection device 10 for the microphone array provided by this embodiment comprises a first processing module 11, a second processing module 12, and a third processing module 13. It accurately detects the state of the microphone array when an illegal sound is collected in three ways. First, the microphone array is detected only when the audio signals collected by the microphone array contain illegal sound signals. Second, when features are extracted from the audio signal collected by each microphone, the frequency domain amplitude features and the time domain energy features are extracted together, so that the combined time-frequency features reflect the characteristics of the audio signal as comprehensively as possible. Third, the state of each microphone is detected by a microphone anomaly detection model trained in advance on a large number of samples, so that the detection result is more accurate.
In one possible design, the second processing module 12 is specifically configured to:
acquiring N frames of audio signals from each path of audio signal acquired by each microphone of a microphone array, wherein N is a positive integer greater than or equal to 1;
extracting frequency domain amplitude characteristics and time domain energy characteristics of each frame of audio signal;
and combining the frequency domain amplitude characteristic and the time domain energy characteristic of each frame of audio signal in the N frames of audio signals to obtain the time-frequency characteristic of the audio signal.
In one possible design, the second processing module 12 is specifically configured to:
converting the audio signal into a frequency domain signal through Fourier transform to obtain an amplitude spectrum of the audio signal; the amplitude spectrum comprises amplitudes of frequency domain signals corresponding to the audio signals at K frequency points; k is a positive integer greater than or equal to 1;
carrying out binarization processing on the amplitude spectrum of the audio signal;
and combining all elements in the amplitude spectrum after the binarization processing to obtain the frequency domain amplitude characteristic of the audio signal.
In one possible design, the second processing module 12 is specifically configured to:
determining an average of all amplitudes present in the amplitude spectrum of the audio signal;
comparing a first amplitude in the audio signal with the average value; the first amplitude is any amplitude in an amplitude spectrum of the audio signal;
if the first amplitude is greater than the average value, setting the first amplitude as a first value;
and if the first amplitude is smaller than or equal to the average value, setting the first amplitude as a second value.
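By way of illustration only, the binarization of the amplitude spectrum against its average value can be sketched as follows. The sketch assumes that the K frequency points are the bins of a real FFT of one frame and uses 1 and -1 as the first and second values; the function name `binarize_amplitude_spectrum` is hypothetical.

```python
import numpy as np


def binarize_amplitude_spectrum(frame, first_value=1, second_value=-1):
    """Binarize the amplitude spectrum of one frame against its average value.

    Amplitudes greater than the average become first_value; amplitudes less
    than or equal to the average become second_value.
    """
    amplitude = np.abs(np.fft.rfft(np.asarray(frame, dtype=float)))  # K frequency points
    average = amplitude.mean()
    return np.where(amplitude > average, first_value, second_value)
```

The returned vector is the frequency domain amplitude feature of the frame.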
Fig. 15 is a schematic structural diagram of another detection apparatus for a microphone array according to an embodiment of the present disclosure, and as shown in fig. 15, the detection apparatus 10 for a microphone array further includes:
the fourth processing module 14 is configured to obtain a plurality of training samples according to the collected multiple audio training signals including the violation sound signals; the training samples include the time-frequency features and state labels of each audio training signal; the state label is used for marking the state of the microphone corresponding to the audio training signal as normal or abnormal;
and the model training module 15 is configured to input the multiple training samples into an initial network model for iterative training, so as to obtain the microphone anomaly detection model.
An alarm module 16, configured to generate first alarm information if the detection result indicates that an abnormal microphone exists, where the first alarm information is used to indicate at least one of the following information:
the microphone array is abnormal, the number of abnormal microphones in the microphone array, and the identification of the abnormal microphones.
In one possible design, the first processing module 11 is specifically configured to:
selecting one path of audio signal with the largest audio energy from the audio signals collected by the microphone array as a signal to be detected;
dividing the signal to be detected into a plurality of signal segments by adopting frame division and windowing processing;
extracting the audio features of the signal to be detected according to the plurality of signal segments;
and inputting the audio features into a pre-trained illegal sound signal recognition model to obtain a recognition result of whether the signal to be detected contains the illegal sound signal.
In a possible design, the fourth processing module 14 is further configured to divide each sample in a pre-collected sample set into a plurality of signal segments by using frame windowing, where each sample is provided with a corresponding status label, and the status label is used to mark whether the sample contains an illegal sound;
the fourth processing module 14 is further configured to extract an audio feature for each signal segment obtained by dividing each sample;
the fourth processing module 14 is further configured to combine all the audio features extracted for each sample to obtain an audio feature of the sample;
the model training module 15 is further configured to input the audio feature of each sample and the state label of the sample to the initial network model for iterative training, so as to obtain the violation sound signal identification model.
Fig. 16 is a schematic structural diagram of another detection apparatus for a microphone array according to an embodiment of the present disclosure, and as shown in fig. 16, the detection apparatus 10 for a microphone array further includes:
a fifth processing module 17 configured to:
calculating the audio energy of the signal to be detected;
if the audio energy of the signal to be detected is smaller than a preset threshold value, determining that the signal to be detected is an invalid pickup signal;
if the audio energy of the signal to be detected is greater than or equal to a preset threshold value, extracting the spectral peak stability of the signal to be detected; if the stability of the frequency spectrum peak value is smaller than a preset stability threshold value, determining that the signal to be detected is an invalid pickup signal; if the stability of the frequency spectrum peak value is greater than or equal to a preset stability threshold value, determining that the signal to be detected is an effective pickup signal;
if the signal to be detected is an effective pickup signal, executing the step of determining whether the signal to be detected contains an illegal sound signal;
and if the signal to be detected is not a valid pickup signal, generating second alarm information, wherein the second alarm information is used for indicating that the microphone array is abnormal.
In a possible implementation manner, the fifth processing module 17 is specifically configured to:
acquiring N frames of audio signals from the signal to be detected, wherein N is a positive integer greater than or equal to 1;
converting each frame of audio signal of the signal to be detected into a frequency domain signal through Fourier transform;
determining a spectral peak position index (i) in a frequency domain signal corresponding to each frame of audio signal;
and calculating the spectral peak stability T_Dif of the signal to be detected by the formula
[Formula for T_Dif: given as an image in the original publication and not reproduced here.]
The detection apparatus of the microphone array provided in the above embodiment may implement the technical solutions of the above method embodiments, and the implementation principles and technical effects are similar, which are not described herein again.
The embodiment of the present application is described with reference to fig. 17 only as an example, which does not mean that the present application is limited thereto.
Fig. 17 is a schematic hardware structure diagram of a monitoring device according to an embodiment of the present application. As shown in fig. 17, in general, the monitoring apparatus 600 includes: a processor 601 and a memory 602.
The processor 601 may include one or more processing cores, such as a 4-core processor, an 8-core processor, and so on. The processor 601 may be implemented in at least one hardware form of a DSP (Digital Signal Processing), an FPGA (Field-Programmable Gate Array), and a PLA (Programmable Logic Array). The processor 601 may also include a main processor and a coprocessor, where the main processor is a processor for processing data in an awake state, and is also called a Central Processing Unit (CPU); a coprocessor is a low power processor for processing data in a standby state. In some embodiments, the processor 601 may be integrated with a GPU (Graphics Processing Unit), which is responsible for rendering and drawing the content required to be displayed on the display screen. In some embodiments, processor 601 may also include an AI (Artificial Intelligence) processor for processing computational operations related to machine learning.
The memory 602 may include one or more computer-readable storage media, which may be non-transitory. The memory 602 may also include high-speed random access memory, as well as non-volatile memory, such as one or more magnetic disk storage devices, flash memory storage devices. In some embodiments, a non-transitory computer-readable storage medium in the memory 602 is used to store at least one instruction for execution by the processor 601 to implement the method for detecting a microphone array provided by the method embodiments of the present application.
In some embodiments, the monitoring device 600 may further optionally include: a peripheral interface 603 and at least one peripheral. The processor 601, memory 602, and peripheral interface 603 may be connected by buses or signal lines. Various peripheral devices may be connected to the peripheral interface 603 via a bus, signal line, or circuit board. Specifically, the peripheral device includes: at least one of a radio frequency circuit 604, a display screen 605, a camera assembly 606, a microphone array 607, a positioning component 608, and a power supply 609.
The peripheral interface 603 may be used to connect at least one peripheral related to I/O (Input/Output) to the processor 601 and the memory 602. In some embodiments, the processor 601, memory 602, and peripheral interface 603 are integrated on the same chip or circuit board; in some other embodiments, any one or two of the processor 601, the memory 602, and the peripheral interface 603 may be implemented on a separate chip or circuit board, which is not limited in this embodiment.
The radio frequency circuit 604 is used for receiving and transmitting RF (Radio Frequency) signals, also called electromagnetic signals. The radio frequency circuit 604 communicates with communication networks and other communication devices via electromagnetic signals. The radio frequency circuit 604 converts an electrical signal into an electromagnetic signal for transmission, or converts a received electromagnetic signal into an electrical signal. Optionally, the radio frequency circuit 604 comprises: an antenna system, an RF transceiver, one or more amplifiers, a tuner, an oscillator, a digital signal processor, a codec chipset, a subscriber identity module card, and so forth. The radio frequency circuit 604 may communicate with other terminals via at least one wireless communication protocol. The wireless communication protocols include, but are not limited to: metropolitan area networks, the various generations of mobile communication networks (2G, 3G, 4G, and 5G), wireless local area networks, and/or WiFi (Wireless Fidelity) networks. In some embodiments, the radio frequency circuit 604 may further include NFC (Near Field Communication) related circuits, which are not limited in this application.
The display screen 605 is used to display a UI (User Interface). The UI may include graphics, text, icons, video, and any combination thereof. When the display screen 605 is a touch display screen, the display screen 605 also has the ability to capture touch signals on or above its surface. The touch signal may be input to the processor 601 as a control signal for processing. In this case, the display screen 605 may also be used to provide virtual buttons and/or a virtual keyboard, also referred to as soft buttons and/or a soft keyboard. In some embodiments, there may be one display screen 605, disposed on the front panel of the monitoring device 600; in other embodiments, there may be at least two display screens 605, respectively disposed on different surfaces of the monitoring device 600 or in a folded design; in still other embodiments, the display screen 605 may be a flexible display screen disposed on a curved surface or on a folded surface of the monitoring device 600. The display screen 605 may even be arranged in a non-rectangular irregular shape, i.e., a shaped screen. The display screen 605 may be made of materials such as LCD (Liquid Crystal Display) or OLED (Organic Light-Emitting Diode).
The camera assembly 606 is used to capture images or video. Optionally, the camera assembly 606 includes a front camera and a rear camera. Generally, the front camera is disposed on the front panel of the monitoring device 600, and the rear camera is disposed on the rear surface of the monitoring device 600. In some embodiments, there are at least two rear cameras, each being any one of a main camera, a depth-of-field camera, a wide-angle camera, and a telephoto camera, so that the main camera and the depth-of-field camera can be fused to realize a background blurring function, and the main camera and the wide-angle camera can be fused to realize panoramic shooting and VR (Virtual Reality) shooting functions or other fused shooting functions. In some embodiments, the camera assembly 606 may also include a flash. The flash may be a single-color-temperature flash or a dual-color-temperature flash. A dual-color-temperature flash is a combination of a warm-light flash and a cold-light flash and can be used for light compensation at different color temperatures.
The microphone array 607 is used for collecting sound waves of users and the environment, converting the sound waves into electric signals, and inputting the electric signals to the processor 601 for processing, or inputting the electric signals to the radio frequency circuit 604 for realizing voice communication. For stereo capture or noise reduction purposes, the microphones may be multiple and located at different locations of the monitoring device 600.
The positioning component 608 is used to locate the current geographic location of the monitoring device 600 to implement navigation or LBS (Location Based Service). The positioning component 608 may be a positioning component based on the GPS (Global Positioning System) of the United States, the BeiDou system of China, the GLONASS system of Russia, or the Galileo system of the European Union.
The power supply 609 is used to supply power to various components in the monitoring device 600. The power supply 609 may be an alternating current supply, a direct current supply, a disposable battery, or a rechargeable battery. When the power supply 609 includes a rechargeable battery, the rechargeable battery may support wired or wireless charging. The rechargeable battery may also support fast charge technology.
Those skilled in the art will appreciate that the configuration shown in fig. 17 does not limit the monitoring device 600, which may include more or fewer components than those shown, may combine some components, or may use a different arrangement of components.
Fig. 18 is a schematic hardware structure diagram of a server according to an embodiment of the present application. As shown in fig. 18, the server 700 provided in this embodiment may include: a memory 701 and a processor 702; optionally, a bus 703 may also be included. The bus 703 is used to realize connection between the elements.
The memory 701 stores computer-executable instructions;
the processor 702 executes computer-executable instructions stored in the memory 701, so that the processor 702 executes the method for detecting a microphone array provided by any one of the foregoing embodiments.
The memory 701 and the processor 702 are electrically connected, directly or indirectly, to realize data transmission or interaction. For example, these components may be electrically connected to each other via one or more communication buses or signal lines, such as the bus 703. The memory 701 stores computer-executable instructions for implementing the method for detecting a microphone array, including at least one software functional module that can be stored in the memory 701 in the form of software or firmware. The processor 702 executes various functional applications and data processing by running the software programs and modules stored in the memory 701.
The memory 701 may be, but is not limited to, a Random Access Memory (RAM), a Read-Only Memory (ROM), a Programmable Read-Only Memory (PROM), an Erasable Programmable Read-Only Memory (EPROM), an Electrically Erasable Programmable Read-Only Memory (EEPROM), and the like. The memory 701 is used for storing programs, and the processor 702 executes the programs after receiving execution instructions. Further, the software programs and modules within the memory 701 may also include an operating system, which may include various software components and/or drivers for managing system tasks (e.g., memory management, storage device control, power management, etc.), and may communicate with various hardware or software components to provide an operating environment for other software components.
The processor 702 may be an integrated circuit chip having signal processing capabilities. The processor 702 may be a general-purpose processor, such as a Central Processing Unit (CPU) or a Network Processor (NP), and may implement or perform the various methods, steps, and logic blocks disclosed in the embodiments of the present application. A general-purpose processor may be a microprocessor or any conventional processor.
The embodiments of the present application also provide a non-transitory computer-readable storage medium. When instructions in the storage medium are executed by a processor of a mobile terminal, the mobile terminal is enabled to execute the method for detecting a microphone array provided by the above embodiments.
The computer-readable storage medium in this embodiment may be any available medium that can be accessed by a computer or a data storage device such as a server, a data center, etc. that is integrated with one or more available media, and the available media may be magnetic media (e.g., floppy disks, hard disks, magnetic tapes), optical media (e.g., DVDs), or semiconductor media (e.g., SSDs), etc.
Those of ordinary skill in the art will understand that: all or a portion of the steps of implementing the above-described method embodiments may be performed by hardware associated with program instructions. The program may be stored in a computer-readable storage medium. When executed, the program performs steps comprising the method embodiments described above; and the aforementioned storage medium includes: various media that can store program codes, such as ROM, RAM, magnetic or optical disks.
The embodiment of the present application further provides a computer program product containing instructions, which when run on a computer, causes the computer to execute the method for detecting a microphone array provided by the above embodiment.
It will be understood by those skilled in the art that all or part of the steps for implementing the above embodiments may be implemented by hardware, or may be implemented by a program instructing relevant hardware, where the program may be stored in a computer-readable storage medium, and the above-mentioned storage medium may be a read-only memory, a magnetic disk or an optical disk, etc.
The above description is only exemplary of the present application and should not be taken as limiting the present application, as any modification, equivalent replacement, or improvement made within the spirit and principle of the present application should be included in the protection scope of the present application.

Claims (12)

1. A method for detecting a microphone array, comprising:
determining whether the audio signals contain illegal sound signals according to the audio characteristics of the audio signals collected by the microphone array;
when the audio signals contain illegal sound signals, performing feature extraction on the audio signals to obtain time-frequency features of each frame of audio signals, wherein the time-frequency features are used for indicating frequency domain amplitude features and time domain energy features of the audio signals;
and inputting the time-frequency characteristics of each frame of audio signal into a microphone abnormality detection model obtained by pre-training to obtain a detection result of whether each microphone in the microphone array is abnormal or not.
2. The method of claim 1, wherein the extracting the features of the audio signals to obtain the time-frequency features of each frame of audio signals comprises:
converting the audio signal into a frequency domain signal through Fourier transform to obtain an amplitude spectrum of the audio signal; the amplitude spectrum comprises amplitudes of the frequency domain signal corresponding to the audio signal at K frequency points; K is a positive integer greater than or equal to 1;
carrying out binarization processing on the amplitude spectrum of the audio signal;
and combining all elements in the amplitude spectrum after the binarization processing to obtain the frequency domain amplitude characteristic of the audio signal.
3. The method according to claim 2, wherein the binarization processing is performed on the amplitude spectrum of the audio signal, and comprises:
determining an average of all amplitudes present in the amplitude spectrum of the audio signal;
comparing a first amplitude in the audio signal with the average value; the first amplitude is any amplitude in an amplitude spectrum of the audio signal;
if the first amplitude is greater than the average value, setting the first amplitude as a first value;
and if the first amplitude is smaller than or equal to the average value, setting the first amplitude as a second value.
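As an illustration only, the amplitude-spectrum binarization described in claims 2 and 3 could be sketched as follows. This is a minimal sketch under stated assumptions, not the patented implementation: the function names `amplitude_spectrum` and `binarize` are hypothetical, the first and second values are assumed to be 1 and 0, a naive DFT stands in for whatever Fourier transform an implementation would use, and the 8-sample test frame is invented for the example.

```python
import cmath
import math


def amplitude_spectrum(frame):
    """Return the amplitudes at K = len(frame) // 2 + 1 frequency points (naive DFT)."""
    n = len(frame)
    spectrum = []
    for k in range(n // 2 + 1):
        acc = sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(frame))
        spectrum.append(abs(acc))
    return spectrum


def binarize(spectrum, first_value=1, second_value=0):
    """Set amplitudes above the mean to first_value and all others to second_value.

    The values 1 and 0 are assumptions; the claims only name a first and a second value.
    """
    mean = sum(spectrum) / len(spectrum)
    return [first_value if a > mean else second_value for a in spectrum]


# Hypothetical frame: a pure tone that falls exactly on frequency point 2.
frame = [math.sin(2 * math.pi * 2 * t / 8) for t in range(8)]

# Combining all binarized elements gives the frequency domain amplitude feature.
feature = binarize(amplitude_spectrum(frame))
print(feature)
```

Comparing against the mean of the same spectrum makes the feature insensitive to overall gain, which is presumably why a per-frame average rather than a fixed threshold is used.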
4. The method according to any one of claims 1 to 3, wherein before inputting the time-frequency features of each frame of audio signal into the pre-trained microphone anomaly detection model, the method further comprises:
obtaining a plurality of training samples according to a plurality of collected audio training signals containing illegal sound signals; the training samples include the time-frequency features and state labels of each audio training signal; the state label is used for marking the state of the microphone corresponding to the audio training signal as normal or abnormal;
and inputting the training samples into an initial network model for iterative training to obtain the microphone anomaly detection model.
5. The method of any one of claims 1 to 3, wherein determining whether the audio signal contains an illegal sound signal according to the audio characteristics of the audio signal acquired by the microphone array comprises:
selecting one path of audio signal with the largest audio energy from the audio signals collected by the microphone array as a signal to be detected;
dividing the signal to be detected into a plurality of signal segments by adopting frame division and windowing processing;
extracting the audio features of the signal to be detected according to the plurality of signal segments;
and inputting the audio features into a pre-trained illegal sound signal recognition model to obtain a recognition result of whether the signal to be detected contains the illegal sound signal.
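The first two steps of claim 5 (choosing the channel with the largest audio energy, then framing and windowing it) could look roughly like the sketch below. The claim does not specify the energy definition, window type, frame length, or hop size, so the sum of squares, the Hann window, and the values `frame_len=4` and `hop=2` are all assumptions, as are the function names and the toy three-channel input.

```python
import math


def energy(signal):
    """Audio energy, assumed here to be the sum of squared samples."""
    return sum(x * x for x in signal)


def select_loudest_channel(channels):
    """Return the index of the channel with the largest audio energy."""
    return max(range(len(channels)), key=lambda i: energy(channels[i]))


def frame_and_window(signal, frame_len, hop):
    """Split a signal into overlapping frames and apply a Hann window to each."""
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / (frame_len - 1))
              for n in range(frame_len)]
    frames = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        frames.append([signal[start + n] * window[n] for n in range(frame_len)])
    return frames


# Hypothetical three-microphone capture of the same 8 samples at different gains.
channels = [
    [0.1, -0.1, 0.1, -0.1, 0.1, -0.1, 0.1, -0.1],
    [0.5, -0.5, 0.5, -0.5, 0.5, -0.5, 0.5, -0.5],
    [0.0] * 8,
]

best = select_loudest_channel(channels)  # signal to be detected
segments = frame_and_window(channels[best], frame_len=4, hop=2)
print(best, len(segments))
```

The resulting signal segments would then feed the audio feature extraction and the pre-trained recognition model, neither of which is sketched here.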
6. The method of claim 5, wherein before inputting the audio features into the pre-trained illegal sound signal recognition model, the method further comprises:
dividing each sample in a pre-collected sample set into a plurality of signal segments by adopting frame-division windowing, wherein each sample is provided with a corresponding state label, and the state label is used for marking whether the sample contains illegal sound;
extracting audio features for each signal segment obtained by dividing each sample;
combining all the audio features extracted for each sample to obtain the audio features of the sample;
and inputting the audio features of each sample and the state labels of the samples into an initial network model for iterative training to obtain the illegal sound signal recognition model.
7. The method of claim 5, wherein before selecting one of the audio signals collected from the microphone array with the largest audio energy as a signal to be detected, the method further comprises:
calculating the audio energy of the signal to be detected;
if the audio energy of the signal to be detected is smaller than a preset threshold value, determining that the signal to be detected is an invalid pickup signal;
if the audio energy of the signal to be detected is greater than or equal to a preset threshold value, extracting the spectral peak stability of the signal to be detected; if the stability of the frequency spectrum peak value is smaller than a preset stability threshold value, determining that the signal to be detected is an invalid pickup signal; if the stability of the frequency spectrum peak value is greater than or equal to a preset stability threshold value, determining that the signal to be detected is an effective pickup signal;
if the signal to be detected is an effective pickup signal, executing the step of determining whether the signal to be detected contains an illegal sound signal;
and if the signal to be detected is not a valid pickup signal, generating second alarm information, wherein the second alarm information is used for indicating that the microphone array is abnormal.
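The two-stage validity check of claim 7 can be read as a short decision procedure, sketched below under stated assumptions. The names `is_valid_pickup` and `next_step`, the returned strings, the thresholds, and the sum-of-squares energy are hypothetical; the stability measure is passed in as a callable (`stability_fn`) because its formula is not given in this text.

```python
def is_valid_pickup(signal, energy_threshold, stability_threshold, stability_fn):
    """Check energy first, then spectral peak stability, as in claim 7."""
    audio_energy = sum(x * x for x in signal)  # assumed energy definition
    if audio_energy < energy_threshold:
        return False  # too little energy: invalid pickup signal
    # Stability is only extracted once the energy check has passed.
    return stability_fn(signal) >= stability_threshold


def next_step(signal, energy_threshold, stability_threshold, stability_fn):
    """Route a signal to illegal-sound detection or to the second alarm."""
    if is_valid_pickup(signal, energy_threshold, stability_threshold, stability_fn):
        return "detect_illegal_sound"
    return "second_alarm"  # indicates that the microphone array is abnormal


# Hypothetical stand-in for the stability measure; it records each call.
calls = []


def fake_stability(signal):
    calls.append(1)
    return 0.9


quiet = [0.001] * 16
loud = [0.5] * 16
quiet_result = next_step(quiet, 1.0, 0.5, fake_stability)
loud_result = next_step(loud, 1.0, 0.5, fake_stability)
print(quiet_result, loud_result)
```

Checking energy before stability means the Fourier-transform work is skipped for near-silent input, which matches the order of the steps in the claim.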
8. The method according to claim 7, wherein the extracting the spectral peak stability of the signal to be detected comprises:
acquiring N frames of audio signals from the signal to be detected, wherein N is a positive integer greater than or equal to 1;
converting each frame of audio signal of the signal to be detected into a frequency domain signal through Fourier transform;
determining a spectral peak position index (i) in a frequency domain signal corresponding to each frame of audio signal;
and calculating the spectral peak stability T_Dif of the signal to be detected by the formula [formula image FDA0002555527170000031].
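The steps of claim 8 up to the spectral peak position index(i) could be sketched as follows. Because the formula for T_Dif is only available as an image in this text, the sketch deliberately stops at the per-frame peak indices and does not guess at the stability formula. The function names, the naive DFT, non-overlapping frames, and the two-tone test signal are assumptions made for illustration.

```python
import cmath
import math


def peak_index(frame):
    """Return the frequency point with the largest amplitude in one frame (naive DFT)."""
    n = len(frame)
    mags = [
        abs(sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(frame)))
        for k in range(n // 2 + 1)
    ]
    return max(range(len(mags)), key=mags.__getitem__)


def peak_indices(signal, frame_len, n_frames):
    """Compute index(i) for N consecutive, non-overlapping frames (assumed layout)."""
    return [
        peak_index(signal[i * frame_len:(i + 1) * frame_len])
        for i in range(n_frames)
    ]


# Hypothetical signal: frame 0 is a tone at frequency point 1, frame 1 at point 3.
signal = (
    [math.cos(2 * math.pi * 1 * t / 8) for t in range(8)]
    + [math.cos(2 * math.pi * 3 * t / 8) for t in range(8)]
)

indices = peak_indices(signal, frame_len=8, n_frames=2)
print(indices)
```

A stability measure would then be computed from how these indices behave across the N frames, according to the formula in the original patent document.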
9. An apparatus for detecting a microphone array, the apparatus comprising:
the first processing module is used for determining whether the audio signals contain illegal sound signals according to the audio characteristics of the audio signals collected by the microphone array;
the second processing module is used for extracting the characteristics of the audio signals when the audio signals contain illegal sound signals to obtain the time-frequency characteristics of each frame of audio signals, and the time-frequency characteristics are used for indicating the frequency domain amplitude characteristics and the time domain energy characteristics of the audio signals;
and the third processing module is used for inputting the time-frequency characteristics of each frame of audio signal into the microphone anomaly detection model obtained by pre-training to obtain the detection result of whether each microphone in the microphone array is abnormal or not.
10. A monitoring device, comprising: a microphone array, a memory, and a processor;
the microphone array collects audio signals in a target area;
the memory stores computer-executable instructions;
the processor executes the computer-executable instructions stored in the memory, so that the processor performs the detection method of the microphone array according to any one of claims 1 to 8.
11. A server, comprising: a memory and a processor;
the memory stores computer-executable instructions;
the processor executes the computer-executable instructions stored in the memory, so that the processor performs the detection method of the microphone array according to any one of claims 1 to 8.
12. A storage medium, comprising: a readable storage medium and a computer program, the computer program being used to implement the detection method of the microphone array according to any one of claims 1 to 8.
CN202010588457.4A 2020-06-24 2020-06-24 Microphone array detection method, device, equipment and storage medium Active CN111770427B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010588457.4A CN111770427B (en) 2020-06-24 2020-06-24 Microphone array detection method, device, equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010588457.4A CN111770427B (en) 2020-06-24 2020-06-24 Microphone array detection method, device, equipment and storage medium

Publications (2)

Publication Number Publication Date
CN111770427A true CN111770427A (en) 2020-10-13
CN111770427B CN111770427B (en) 2023-01-24

Family

ID=72722277

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010588457.4A Active CN111770427B (en) 2020-06-24 2020-06-24 Microphone array detection method, device, equipment and storage medium

Country Status (1)

Country Link
CN (1) CN111770427B (en)

Patent Citations (27)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20040136539A1 (en) * 2003-01-09 2004-07-15 Uhi William Walter Audio-conditioned acoustics-based diagnostics
US20090028349A1 (en) * 2007-07-25 2009-01-29 Samsung Electronics Co., Ltd. Method and apparatus for detecting malfunctioning speaker
CN101426168A (en) * 2008-11-27 2009-05-06 嘉兴中科声学科技有限公司 Sounding body abnormal sound detection method and system
CN102163427A (en) * 2010-12-20 2011-08-24 北京邮电大学 Method for detecting audio exceptional event based on environmental model
CN102163427B (en) * 2010-12-20 2012-09-12 北京邮电大学 Method for detecting audio exceptional event based on environmental model
CN102324229A (en) * 2011-09-08 2012-01-18 中国科学院自动化研究所 Method and system for detecting abnormal use of voice input equipment
CN103198838A (en) * 2013-03-29 2013-07-10 苏州皓泰视频技术有限公司 Abnormal sound monitoring method and abnormal sound monitoring device used for embedded system
CN103327433A (en) * 2013-05-27 2013-09-25 腾讯科技(深圳)有限公司 Audio input interface detection method and system thereof
CN103546853A (en) * 2013-09-18 2014-01-29 浙江中科电声研发中心 Speaker abnormal sound detecting method based on short-time Fourier transformation
CN103929707A (en) * 2014-04-08 2014-07-16 深圳市中兴移动通信有限公司 Method for detecting conditions of microphone voice-grade channels and terminal
CN108885133A (en) * 2016-04-01 2018-11-23 日本电信电话株式会社 Abnormal sound detects learning device, sound characteristic amount extraction device, abnormal sound sampling apparatus, its method and program
CN107548007A (en) * 2016-06-23 2018-01-05 杭州海康威视数字技术股份有限公司 A kind of detection method and device of audio signal sample equipment
CN106303804A (en) * 2016-07-28 2017-01-04 维沃移动通信有限公司 The control method of a kind of mike and mobile terminal
CN106297770A (en) * 2016-08-04 2017-01-04 杭州电子科技大学 The natural environment sound identification method extracted based on time-frequency domain statistical nature
CN106769048A (en) * 2017-01-17 2017-05-31 苏州大学 Self adaptation depth confidence network Method for Bearing Fault Diagnosis based on Nesterov momentum methods
CN109688527A (en) * 2017-10-19 2019-04-26 英特尔公司 Loudspeaker faults are detected using acoustic echo
CN108469109A (en) * 2018-03-01 2018-08-31 广东美的制冷设备有限公司 Detection method, device, system, air conditioner and the storage medium of unit exception
US10405115B1 (en) * 2018-03-29 2019-09-03 Motorola Solutions, Inc. Fault detection for microphone array
CN109120779A (en) * 2018-07-24 2019-01-01 Oppo(重庆)智能科技有限公司 Microphone blocks based reminding method and relevant apparatus
CN109104684A (en) * 2018-07-26 2018-12-28 Oppo广东移动通信有限公司 Microphone plug-hole detection method and Related product
CN110798790A (en) * 2018-08-01 2020-02-14 杭州海康威视数字技术股份有限公司 Microphone abnormality detection method, device storage medium
CN111174370A (en) * 2018-11-09 2020-05-19 珠海格力电器股份有限公司 Fault detection method and device, storage medium and electronic device
CN109473120A (en) * 2018-11-14 2019-03-15 辽宁工程技术大学 A kind of abnormal sound signal recognition method based on convolutional neural networks
CN109817227A (en) * 2018-12-06 2019-05-28 洛阳语音云创新研究院 A kind of the abnormal sound monitoring method and system of farm
CN110338092A (en) * 2019-07-01 2019-10-18 河南牧业经济学院 A kind of pig Activity recognition method and system based on sound
CN110991289A (en) * 2019-11-25 2020-04-10 达闼科技成都有限公司 Abnormal event monitoring method and device, electronic equipment and storage medium
CN111263284A (en) * 2020-01-09 2020-06-09 河南讯飞智元信息科技有限公司 Microphone fault detection method and device, electronic equipment and storage medium

Cited By (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112289341A (en) * 2020-11-03 2021-01-29 国网智能科技股份有限公司 Sound abnormity identification method and system for transformer substation equipment
CN112598027A (en) * 2020-12-09 2021-04-02 深圳市优必选科技股份有限公司 Equipment abnormity identification method and device, terminal equipment and storage medium
CN112969134A (en) * 2021-02-07 2021-06-15 深圳市微纳感知计算技术有限公司 Microphone abnormality detection method, device, equipment and storage medium
CN113518202A (en) * 2021-04-07 2021-10-19 华北电力大学扬中智能电气研究中心 Security monitoring method and device, electronic equipment and storage medium
CN113219450A (en) * 2021-04-29 2021-08-06 深圳市恒天伟焱科技股份有限公司 Ranging positioning method, ranging device and readable storage medium
CN113219450B (en) * 2021-04-29 2024-04-19 深圳市恒天伟焱科技股份有限公司 Ranging positioning method, ranging device and readable storage medium
CN113593251A (en) * 2021-07-22 2021-11-02 世邦通信股份有限公司 Quick screening method and system for street frying vehicle
CN113543010B (en) * 2021-09-15 2022-02-22 阿里巴巴达摩院(杭州)科技有限公司 Detection method and device for microphone equipment, storage medium and processor
CN113543010A (en) * 2021-09-15 2021-10-22 阿里巴巴达摩院(杭州)科技有限公司 Detection method and device for microphone equipment, storage medium and processor
CN114220457A (en) * 2021-10-29 2022-03-22 成都中科信息技术有限公司 Audio data processing method and device of dual-channel communication link and storage medium
CN114040312A (en) * 2021-11-29 2022-02-11 四川虹美智能科技有限公司 Microphone detection method and system of voice air conditioner
CN114040312B (en) * 2021-11-29 2023-08-22 四川虹美智能科技有限公司 Microphone detection method and system of voice air conditioner
CN114827821A (en) * 2022-04-25 2022-07-29 世邦通信股份有限公司 Pickup control method and system for pickup, pickup apparatus, and storage medium
CN114859194A (en) * 2022-07-07 2022-08-05 杭州兆华电子股份有限公司 Non-contact-based partial discharge detection method and device

Also Published As

Publication number Publication date
CN111770427B (en) 2023-01-24

Similar Documents

Publication Publication Date Title
CN111770427B (en) Microphone array detection method, device, equipment and storage medium
CN110147705B (en) Vehicle positioning method based on visual perception and electronic equipment
CN102843547B (en) Intelligent tracking method and system for suspected target
CN111914812B (en) Image processing model training method, device, equipment and storage medium
CN111933112B (en) Awakening voice determination method, device, equipment and medium
US20160187453A1 (en) Method and device for a mobile terminal to locate a sound source
CN110706371B (en) Block chain-based driving safety management method, system and storage medium
CN111262887B (en) Network risk detection method, device, equipment and medium based on object characteristics
CN109558512A (en) A kind of personalized recommendation method based on audio, device and mobile terminal
CN108491816A (en) The method and apparatus for carrying out target following in video
CN111325699B (en) Image restoration method and training method of image restoration model
CN111368811B (en) Living body detection method, living body detection device, living body detection equipment and storage medium
KR20130022626A (en) Method and apparatus for security monitoring using augmented reality
CN106297184A (en) The monitoring method of mobile terminal surrounding, device and mobile terminal
CN109243488A (en) Audio-frequency detection, device and storage medium
CN111081275B (en) Terminal processing method and device based on sound analysis, storage medium and terminal
US11576188B2 (en) External interference radar
US20150161198A1 (en) Computer ecosystem with automatically curated content using searchable hierarchical tags
CN108052506B (en) Natural language processing method, device, storage medium and electronic equipment
CN113724189A (en) Image processing method, device, equipment and storage medium
CN111310595B (en) Method and device for generating information
CN112269939A (en) Scene search method, device, terminal, server and medium for automatic driving
CN105683959A (en) Information processing device, information processing method, and information processing system
CN114333881B (en) Audio transmission noise reduction method, device and medium based on environment self-adaptation
CN115294648A (en) Man-machine gesture interaction method and device, mobile terminal and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant