CN112397055A - Abnormal sound detection method and device and electronic equipment - Google Patents

Abnormal sound detection method and device and electronic equipment

Info

Publication number
CN112397055A
Authority
CN
China
Prior art keywords
sound
sound signal
frequency point
frequency
abnormal sound
Prior art date
Legal status
Granted
Application number
CN202110068478.8A
Other languages
Chinese (zh)
Other versions
CN112397055B (en)
Inventor
张量
许振斌
Current Assignee
Beijing Family Intelligent Technology Co Ltd
Original Assignee
Beijing Family Intelligent Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Beijing Family Intelligent Technology Co Ltd
Priority to CN202110068478.8A
Publication of CN112397055A
Application granted
Publication of CN112397055B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/02 Feature extraction for speech recognition; Selection of recognition unit
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/08 Speech classification or search
    • G10L15/16 Speech classification or search using artificial neural networks
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/27 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the analysis technique
    • G10L25/30 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the analysis technique using neural networks

Landscapes

  • Engineering & Computer Science (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Signal Processing (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Measurement Of Mechanical Vibrations Or Ultrasonic Waves (AREA)

Abstract

The invention provides an abnormal sound detection method, an abnormal sound detection device and electronic equipment.

Description

Abnormal sound detection method and device and electronic equipment
Technical Field
The invention relates to the technical field of computers, in particular to an abnormal sound detection method and device and electronic equipment.
Background
Sound is an important source of information. Many scenarios in daily production and life require collecting sound information and using it for detection and alarms. For example, in poultry farming the health of the birds is monitored through their calls, and in the security field sound serves as an important means of covering the blind spots of video surveillance. A method for detecting abnormal sounds in such scenarios is therefore needed.
Disclosure of Invention
In order to solve the above problem, an object of the embodiments of the present invention is to provide an abnormal sound detection method, an abnormal sound detection apparatus, and an electronic device.
In a first aspect, an embodiment of the present invention provides an abnormal sound detection method, including:
acquiring, by an abnormal sound detection device, ambient environment information when a sound signal of the surrounding environment is collected;
when it is determined from the acquired ambient environment information that the abnormal sound detection device is in a severe environment, performing a Fourier transform on the sound signal to obtain the frequency components of the sound signal, wherein the frequency components of the sound signal comprise: the frequency range of the sound signal, the amplitude of each frequency point within the frequency range, and the initial phase of each frequency point;
calculating, for each frequency point, the amplitude difference between that frequency point and each of its adjacent frequency points;
when the amplitude differences between a frequency point and its adjacent frequency points on both sides are all greater than or equal to an amplitude difference threshold, determining the sound signal corresponding to that frequency point to be a narrow-band signal, and jumping to the step of converting each frequency within the frequency range of the sound signal into a Mel frequency of the sound signal;
when the amplitude difference between a frequency point and its adjacent frequency point on one side is greater than or equal to the amplitude difference threshold, and the amplitude difference between the frequency point and its adjacent frequency point on the other side is less than the amplitude difference threshold, taking the adjacent frequency point on the other side as the frequency point to be detected;
taking the direction from the frequency point towards its adjacent frequency point on the other side as the detection direction;
calculating the difference between the amplitude of the frequency point adjacent to the frequency point to be detected in the detection direction and the amplitude of the frequency point to be detected;
judging whether that difference is less than the amplitude difference threshold; if so, taking the frequency point adjacent to the frequency point to be detected in the detection direction as the new frequency point to be detected, and returning to the step of calculating the difference between the amplitude of the frequency point adjacent to the frequency point to be detected in the detection direction and the amplitude of the frequency point to be detected; if not, counting the number of frequency points to be detected;
when the number of frequency points to be detected is less than or equal to a number threshold, determining the sound signal covering the counted frequency points to be detected along the detection direction to be a narrow-band signal;
when the sound signal is determined to be a narrow-band signal, converting each frequency within the frequency range of the sound signal into a Mel frequency of the sound signal;
inputting the Mel frequencies of the sound signal into a Mel filter bank to obtain a filtering result of the sound signal in the Mel domain;
performing a logarithm operation and a discrete cosine transform on the filtering result of the sound signal in the Mel domain to obtain the sound features of the sound signal;
inputting the obtained sound features of the sound signal into a trained convolutional neural network model, and detecting whether the sound signal is an abnormal sound; wherein the convolutional neural network model comprises N convolutional layers, each followed by an activation function and a pooling layer, and M fully connected layers; the result of applying convolution, activation and pooling to the sound features of the sound signal is called a feature map and is denoted by F; the convolution kernel size of the n-th convolutional layer is K_n, its stride is S_n, and its number of convolution kernels is 2n, where 1 ≤ n ≤ N; the last fully connected layer has 2 neurons and each of the other fully connected layers has m neurons, where 1 ≤ m ≤ M; the activation function after each convolutional layer is Max-Feature-Map, expressed as follows:
$$A^{k}_{ij} = \max\left(F^{k}_{ij},\; F^{k+n}_{ij}\right)$$
where 1 ≤ k ≤ n;
$$F^{k} \in \mathbb{R}^{w \times h}$$
and the number of feature maps F is 2n;
the derivative of A is then:
$$\frac{\partial A^{k}_{ij}}{\partial F^{k}_{ij}} = \begin{cases} 1, & F^{k}_{ij} \ge F^{k+n}_{ij} \\ 0, & \text{otherwise} \end{cases}$$
where w is the width of the feature map; h is the height of the feature map; i is the column index of the feature map, 0 ≤ i < w; j is the row index of the feature map, 0 ≤ j < h; 1 ≤ k ≤ n; and R is the real number space;
$F^{k}$ denotes the k-th of the 2n feature maps generated by the n-th convolutional layer;
$F^{k}_{ij}$ denotes the feature value at column i, row j of the k-th feature map;
$F^{k+n}_{ij}$ denotes the feature value at column i, row j of the (k+n)-th of the 2n feature maps generated by the n-th convolutional layer.
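As an illustration only, the Max-Feature-Map activation described above, which keeps the larger of the feature values at the same position in the k-th and (k+n)-th feature maps, might be sketched as follows. The function name and the nested-list representation of feature maps are assumptions made for this sketch, not part of the described method.

```python
def max_feature_map(feature_maps):
    """Element-wise maximum of map k and map k+n, given 2n equally sized 2-D maps.

    `feature_maps` is a list of 2n maps, each a list of rows (hypothetical layout).
    The result contains n maps.
    """
    if len(feature_maps) % 2:
        raise ValueError("expected an even number of feature maps")
    n = len(feature_maps) // 2
    return [
        [[max(a, b) for a, b in zip(row_k, row_kn)]
         for row_k, row_kn in zip(feature_maps[k], feature_maps[k + n])]
        for k in range(n)
    ]
```

The activation halves the number of feature maps, which is why each convolutional layer produces an even number of them.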
In a second aspect, an embodiment of the present invention further provides an abnormal sound detection apparatus, including:
an acquisition module, configured to acquire ambient environment information when a sound signal of the surrounding environment is collected;
a first processing module, configured to perform a Fourier transform on the sound signal to obtain the frequency components of the sound signal when it is determined from the acquired ambient environment information that the abnormal sound detection device is in a severe environment, wherein the frequency components of the sound signal comprise: the frequency range of the sound signal, the amplitude of each frequency point within the frequency range, and the initial phase of each frequency point;
a calculation module, configured to calculate, for each frequency point, the amplitude difference between that frequency point and each of its adjacent frequency points;
a second processing module, configured to determine the sound signal corresponding to a frequency point to be a narrow-band signal when the amplitude differences between that frequency point and its adjacent frequency points on both sides are all greater than or equal to an amplitude difference threshold, and then to trigger the function of the conversion module;
a third processing module, configured to take the adjacent frequency point on the other side of a frequency point as the frequency point to be detected when the amplitude difference between the frequency point and its adjacent frequency point on one side is greater than or equal to the amplitude difference threshold and the amplitude difference between the frequency point and its adjacent frequency point on the other side is less than the amplitude difference threshold;
a fourth processing module, configured to take the direction from the frequency point towards its adjacent frequency point on the other side as the detection direction;
a difference calculation module, configured to calculate the difference between the amplitude of the frequency point adjacent to the frequency point to be detected in the detection direction and the amplitude of the frequency point to be detected;
a judging module, configured to judge whether that difference is less than the amplitude difference threshold; if so, to take the frequency point adjacent to the frequency point to be detected in the detection direction as the new frequency point to be detected and return to the function of the difference calculation module; if not, to count the number of frequency points to be detected;
a determining module, configured to determine the sound signal covering the counted frequency points to be detected along the detection direction to be a narrow-band signal when the number of frequency points to be detected is less than or equal to a number threshold;
a conversion module, configured to convert each frequency within the frequency range of the sound signal into a Mel frequency of the sound signal when the sound signal is determined to be a narrow-band signal;
a filtering module, configured to input the Mel frequencies of the sound signal into a Mel filter bank to obtain a filtering result of the sound signal in the Mel domain;
a fifth processing module, configured to perform a logarithm operation and a discrete cosine transform on the filtering result of the sound signal in the Mel domain to obtain the sound features of the sound signal;
a detection module, configured to input the obtained sound features of the sound signal into a trained convolutional neural network model and detect whether the sound signal is an abnormal sound; wherein the convolutional neural network model comprises N convolutional layers, each followed by an activation function and a pooling layer, and M fully connected layers; the result of applying convolution, activation and pooling to the sound features of the sound signal is called a feature map and is denoted by F; the convolution kernel size of the n-th convolutional layer is K_n, its stride is S_n, and its number of convolution kernels is 2n, where 1 ≤ n ≤ N; the last fully connected layer has 2 neurons and each of the other fully connected layers has m neurons, where 1 ≤ m ≤ M; the activation function after each convolutional layer is Max-Feature-Map, expressed as follows:
$$A^{k}_{ij} = \max\left(F^{k}_{ij},\; F^{k+n}_{ij}\right)$$
where 1 ≤ k ≤ n;
$$F^{k} \in \mathbb{R}^{w \times h}$$
and the number of feature maps F is 2n;
the derivative of A is then:
$$\frac{\partial A^{k}_{ij}}{\partial F^{k}_{ij}} = \begin{cases} 1, & F^{k}_{ij} \ge F^{k+n}_{ij} \\ 0, & \text{otherwise} \end{cases}$$
where w is the width of the feature map; h is the height of the feature map; i is the column index of the feature map, 0 ≤ i < w; j is the row index of the feature map, 0 ≤ j < h; 1 ≤ k ≤ n; and R is the real number space;
$F^{k}$ denotes the k-th of the 2n feature maps generated by the n-th convolutional layer;
$F^{k}_{ij}$ denotes the feature value at column i, row j of the k-th feature map;
$F^{k+n}_{ij}$ denotes the feature value at column i, row j of the (k+n)-th of the 2n feature maps generated by the n-th convolutional layer.
In a third aspect, the present invention further provides a computer-readable storage medium, where a computer program is stored on the computer-readable storage medium, and when the computer program is executed by a processor, the computer program performs the steps of the method in the first aspect.
In a fourth aspect, embodiments of the present invention also provide an electronic device, which includes a memory, a processor, and one or more programs, where the one or more programs are stored in the memory and configured to be executed by the processor to perform the steps of the method according to the first aspect.
In the solutions provided by the first to fourth aspects of the embodiments of the present invention, when the abnormal sound detection device is determined to be in a severe environment according to the acquired ambient environment information, it is first determined whether the acquired sound signal is a wide-band signal or a narrow-band signal. If the sound signal is a narrow-band signal, its sound features are extracted and input into a trained convolutional neural network model to detect whether the sound signal is an abnormal sound. Compared with abnormal sound detection in the related art, the detection approach can be adjusted according to the ambient environment information, which improves the efficiency of abnormal sound detection under different environmental conditions. Determining whether the acquired sound signal is wide-band or narrow-band before detection distinguishes ambient sound from sound that is emitted by an object entering the environment and needs to be detected; the abnormal sound detection stage is entered only when the sound signal is determined not to be ambient sound. This improves the efficiency of abnormal sound detection and helps prevent false detections.
In order to make the aforementioned and other objects, features and advantages of the present invention comprehensible, preferred embodiments accompanied with figures are described in detail below.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, it is obvious that the drawings in the following description are only some embodiments of the present invention, and for those skilled in the art, other drawings can be obtained according to the drawings without creative efforts.
Fig. 1 is a flowchart illustrating an abnormal sound detection method according to embodiment 1 of the present invention;
fig. 2a is a schematic diagram 1 illustrating a corresponding relationship between a frequency point and a frequency amplitude of an acoustic signal in the abnormal sound detection method provided in embodiment 1 of the present invention;
fig. 2b is a schematic diagram 2 showing a corresponding relationship between a frequency point and a frequency amplitude of an acoustic signal in the abnormal sound detection method provided in embodiment 1 of the present invention;
fig. 2c is a schematic diagram 3 showing a corresponding relationship between a frequency point and a frequency amplitude of an acoustic signal in the abnormal sound detection method provided in embodiment 1 of the present invention;
fig. 3 is a schematic structural diagram illustrating an abnormal sound detection apparatus according to embodiment 2 of the present invention;
fig. 4 shows a schematic structural diagram of an electronic device provided in embodiment 3 of the present invention.
Detailed Description
In the description of the present invention, it is to be understood that the terms "center", "longitudinal", "lateral", "length", "width", "thickness", "upper", "lower", "front", "rear", "left", "right", "vertical", "horizontal", "top", "bottom", "inner", "outer", "clockwise", "counterclockwise", and the like, indicate orientations and positional relationships based on those shown in the drawings, and are used only for convenience of description and simplicity of description, and do not indicate or imply that the device or element being referred to must have a particular orientation, be constructed and operated in a particular orientation, and thus, should not be considered as limiting the present invention.
Furthermore, the terms "first" and "second" are used for descriptive purposes only and are not to be construed as indicating or implying relative importance or implicitly indicating the number of technical features indicated. Thus, a feature defined as "first" or "second" may explicitly or implicitly include one or more of that feature. In the description of the present invention, "a plurality" means two or more unless specifically defined otherwise.
In the present invention, unless otherwise expressly specified or limited, the terms "mounted," "connected," "secured," and the like are to be construed broadly and can, for example, be fixedly connected, detachably connected, or integrally connected; can be mechanically or electrically connected; they may be connected directly or indirectly through intervening media, or they may be interconnected between two elements. The specific meanings of the above terms in the present invention can be understood by those skilled in the art according to specific situations.
Sound is an important source of information. Many scenarios in daily production and life require collecting sound information and using it for detection and alarms. For example, in poultry farming the health of the birds is monitored through their calls, and in the security field sound serves as an important means of covering the blind spots of video surveillance. A method for detecting abnormal sounds in such scenarios is therefore needed.
Based on this, embodiments of the present application provide an abnormal sound detection method, an abnormal sound detection apparatus, and an electronic device. When the abnormal sound detection device is determined to be in a severe environment according to the acquired ambient environment information, it is first determined whether the acquired sound signal is a wide-band signal or a narrow-band signal. If it is a narrow-band signal, its sound features are extracted and input into a trained convolutional neural network model to detect whether the sound signal is an abnormal sound. Because the detection approach can be adjusted according to the ambient environment information, the efficiency of abnormal sound detection under different environmental conditions is improved.
In order to make the aforementioned objects, features and advantages of the present application more comprehensible, the present application is described in further detail with reference to the accompanying drawings and the detailed description.
Example 1
Referring to a flowchart of an abnormal sound detection method shown in fig. 1, the present embodiment provides an abnormal sound detection method, including the following specific steps:
Step 100: when a sound signal of the surrounding environment is collected, the abnormal sound detection device acquires ambient environment information.
In step 100, the abnormal sound detection device is a device that has a processor and is configured to detect abnormal sounds, such as cries for help and cursing, occurring around the device.
The abnormal sound detection device is networked with other abnormal sound detection devices to form a blockchain system for storing the abnormal sounds that have been determined. The abnormal sound detection device is registered on the blockchain system and acts as an ordinary node in it.
The ambient environment information includes, but is not limited to: weather information, location information of the abnormal sound detection device, time information, and geographical location information of the abnormal sound detection device.
The weather information is acquired from a weather information website by the abnormal sound detection equipment according to the geographical position information and the time information of the abnormal sound detection equipment.
The location information of the abnormal sound detection device itself may be an image of the surroundings captured by an image acquisition device that is connected to and installed together with the abnormal sound detection device; that is, this image serves as the location information of the abnormal sound detection device.
The time information is the current system time indicated by the system clock of the abnormal sound detection device itself.
The geographical position information of the abnormal sound detection device is preset in the abnormal sound detection device and is used for indicating the position and the area of the abnormal sound detection device.
After the ambient environment information is acquired, whether the abnormal sound detection device is in a severe environment or not can be determined according to the ambient environment information.
The severe environment may be: an environment under weather conditions such as a sandstorm or rainstorm, as indicated by the weather information; a late-night environment corresponding to the period from 10 p.m. to 5 a.m., as indicated by the time information; or an area with dense, noisy crowds, as indicated by the location information and the geographical position information of the abnormal sound detection device.
This embodiment does not discuss the abnormal sound detection algorithm used in a non-severe environment; in that case the abnormal sound detection device may use any existing sound detection algorithm to detect abnormal sounds in the surrounding environment, and the specific process is not described here.
Before step 102 is executed, it must be determined from the ambient environment information whether the abnormal sound detection device is in a severe environment. As follows from the description of the severe environment above, the specific determination process is prior art and is not detailed here.
When it is determined that the abnormal sound detection apparatus is in a severe environment based on the surrounding environment information, the following step 102 is continuously performed.
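The description treats this determination as prior art and gives no procedure. As a minimal sketch only, the three conditions listed above (severe weather, the 10 p.m. to 5 a.m. period, and a crowded, noisy area) could be combined as follows. The class and field names, the weather labels, and the `crowded_area` flag are hypothetical, and treating 5 a.m. itself as outside the late-night window is an assumption.

```python
from dataclasses import dataclass
from datetime import time

# Hypothetical weather labels; the text names sandstorm and rainstorm as examples.
SEVERE_WEATHER = {"sandstorm", "rainstorm"}


@dataclass
class AmbientInfo:
    weather: str        # label obtained from a weather information website
    local_time: time    # current system time of the device
    crowded_area: bool  # assumed flag derived from location / geographical information


def is_severe_environment(info: AmbientInfo) -> bool:
    """Return True if any one of the three conditions described in the text holds."""
    late_night = info.local_time >= time(22, 0) or info.local_time < time(5, 0)
    return info.weather in SEVERE_WEATHER or late_night or info.crowded_area
```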
Step 102: when it is determined from the acquired ambient environment information that the abnormal sound detection device is in a severe environment, a Fourier transform is performed on the sound signal to obtain the frequency components of the sound signal.
Wherein the frequency components of the sound signal comprise: the frequency range of the sound signal, the amplitude of the frequency points in the frequency range and the initial phase of each frequency point.
The specific process of performing fourier transform on the sound signal to obtain the frequency component of the sound signal is the prior art, and is not described herein again.
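For readers who want a concrete picture of the frequency components named in step 102, the following is a minimal sketch using a naive discrete Fourier transform in plain Python. The function name, the one-sided amplitude scaling, and the tuple layout are assumptions of this sketch; a practical implementation would more likely use an FFT routine.

```python
import cmath
import math


def frequency_components(samples, sample_rate):
    """Naive DFT of a real signal.

    Returns one (frequency_hz, amplitude, initial_phase) tuple per
    non-negative frequency point.
    """
    n = len(samples)
    components = []
    for k in range(n // 2 + 1):
        coeff = sum(x * cmath.exp(-2j * math.pi * k * t / n)
                    for t, x in enumerate(samples))
        # One-sided spectrum: double every bin except DC and (for even n) Nyquist.
        scale = 1 if k == 0 or (n % 2 == 0 and k == n // 2) else 2
        components.append((k * sample_rate / n,
                           scale * abs(coeff) / n,
                           cmath.phase(coeff)))
    return components
```

The list of frequencies gives the frequency range, and the second and third tuple entries give the amplitude and initial phase of each frequency point.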
Step 104: for each frequency point, calculate the amplitude difference between that frequency point and each of its adjacent frequency points.
Step 106: when the amplitude differences between a frequency point and its adjacent frequency points on both sides are all greater than or equal to the amplitude difference threshold, determine the sound signal corresponding to that frequency point to be a narrow-band signal, and jump to step 122.
In step 106, the amplitude difference threshold is preset in the abnormal sound detection device.
In one embodiment, referring to the schematic diagrams of the correspondence between frequency points and amplitudes of the sound signal shown in Figs. 2a to 2c, the frequency points are examined in order from left to right to determine whether each corresponds to a narrow-band signal. Step 106 corresponds to the case shown in Fig. 2a, in which the sound signal at a single frequency point is determined to be a narrow-band signal.
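Steps 104 and 106 can be sketched roughly as below, scanning the amplitude spectrum from left to right. The function name is hypothetical, and two choices are assumptions not stated in the text: the difference is taken as an absolute value, and the first and last frequency points are skipped because they have only one neighbour.

```python
def isolated_narrowband_bins(amplitudes, diff_threshold):
    """Indices whose amplitude differs from BOTH neighbours by at least diff_threshold."""
    hits = []
    for i in range(1, len(amplitudes) - 1):  # edge bins lack one neighbour; skipped here
        left = abs(amplitudes[i] - amplitudes[i - 1])
        right = abs(amplitudes[i] - amplitudes[i + 1])
        if left >= diff_threshold and right >= diff_threshold:
            hits.append(i)
    return hits
```

A single sharp spike in an otherwise flat spectrum is the only kind of frequency point this check reports; wider plateaus are handled by steps 108 to 120.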
Step 108: when the amplitude difference between a frequency point and its adjacent frequency point on one side is greater than or equal to the amplitude difference threshold, and the amplitude difference between the frequency point and its adjacent frequency point on the other side is less than the amplitude difference threshold, take the adjacent frequency point on the other side as the frequency point to be detected.
Step 110: take the direction from the frequency point towards its adjacent frequency point on the other side as the detection direction.
Step 112: calculate the difference between the amplitude of the frequency point adjacent to the frequency point to be detected in the detection direction and the amplitude of the frequency point to be detected.
Step 114: judge whether that difference is less than the amplitude difference threshold; if so, execute step 116; if not, execute step 118.
Step 116: take the frequency point adjacent to the frequency point to be detected in the detection direction as the new frequency point to be detected, and return to step 112.
Step 118: count the number of frequency points to be detected.
Step 120: when the number of frequency points to be detected is less than or equal to a number threshold, determine the sound signal covering the counted frequency points to be detected along the detection direction to be a narrow-band signal.
In step 120 above, the number threshold is preset in the abnormal sound detection device.
Steps 108 to 120 correspond to the case shown in fig. 2b, in which the noise signal spanning from the frequency point through the counted frequency points to be detected in the detection direction is determined to be a narrow-band signal.
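The decision logic of steps 106 to 120, together with the wideband case of fig. 2c, can be sketched as follows. This is a minimal illustration, not the patented implementation: the function name `classify_point`, the use of absolute differences, and the "undetermined" result for cases the text does not specify (spectrum edges, or a run longer than the number threshold) are all assumptions.

```python
from typing import List


def classify_point(amps: List[float], idx: int,
                   diff_threshold: float, count_threshold: int) -> str:
    """Classify the frequency point at `idx` as 'narrow', 'wide' or 'undetermined'.

    `amps` holds one amplitude per frequency point, ordered left to right.
    Absolute differences are an assumption; the text only says "difference".
    """
    if idx <= 0 or idx >= len(amps) - 1:
        return "undetermined"  # assumption: an edge point lacks one neighbour

    left = abs(amps[idx] - amps[idx - 1])
    right = abs(amps[idx] - amps[idx + 1])

    if left >= diff_threshold and right >= diff_threshold:
        return "narrow"  # step 106, the isolated spike of fig. 2a
    if left < diff_threshold and right < diff_threshold:
        return "wide"    # fig. 2c, treated as environmental sound

    # Steps 108-110: walk toward the side whose difference is below the threshold.
    step = 1 if right < diff_threshold else -1
    current = idx + step  # first frequency point to be detected
    count = 1
    while True:
        nxt = current + step
        if nxt < 0 or nxt >= len(amps):
            return "undetermined"  # assumption: ran off the spectrum
        # Steps 112-114: compare the next point with the current one.
        if abs(amps[nxt] - amps[current]) < diff_threshold:
            current = nxt          # step 116
            count += 1
        else:
            break                  # step 118: counting is finished

    # Step 120: a short plateau counts as narrow-band (fig. 2b).
    # The text does not say what happens otherwise (assumption).
    return "narrow" if count <= count_threshold else "undetermined"
```

For example, with amplitudes `[1, 1, 9, 9, 9, 1, 1]`, a difference threshold of 5 and a number threshold of 2, the point at index 2 starts a plateau of two further points before the amplitude drops, so it is classified as narrow-band.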
Step 122: when the sound signal is determined to be a narrow-band signal, convert each frequency in the frequency range of the sound signal into a Mel frequency of the sound signal.
Step 124: input the Mel frequencies of the sound signal into a Mel filter bank to obtain the filtering result of the sound signal in the Mel domain.
Step 126: perform a logarithm calculation and a discrete cosine transform on the filtering result in the Mel domain to obtain the sound features of the sound signal.
The process of obtaining the sound features described in steps 122 to 126 is similar to the prior-art extraction of Mel-Frequency Cepstral Coefficients (MFCCs) from a sound signal, and is not repeated here.
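Steps 122 to 126 can be illustrated with a small standard-library sketch. Everything specific here is an assumption rather than something stated in the text: the Mel conversion formula 2595·log10(1 + f/700) is one widely used convention, the filters are triangular and evenly spaced on the Mel scale, the filter input is the squared amplitude, and the names `hz_to_mel`, `mel_filter_bank`, `dct_ii` and `sound_features` are hypothetical.

```python
import math
from typing import List


def hz_to_mel(f_hz: float) -> float:
    """Step 122: convert a frequency in Hz to the Mel scale (assumed formula)."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)


def mel_to_hz(mel: float) -> float:
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def mel_filter_bank(freqs_hz: List[float], n_filters: int) -> List[List[float]]:
    """Triangular filters evenly spaced in Mel over the range of `freqs_hz`."""
    lo, hi = hz_to_mel(freqs_hz[0]), hz_to_mel(freqs_hz[-1])
    edges = [lo + (hi - lo) * i / (n_filters + 1) for i in range(n_filters + 2)]
    bank = []
    for m in range(1, n_filters + 1):
        left, centre, right = edges[m - 1], edges[m], edges[m + 1]
        row = []
        for f in freqs_hz:
            mel = hz_to_mel(f)
            if left < mel <= centre:
                row.append((mel - left) / (centre - left))    # rising edge
            elif centre < mel < right:
                row.append((right - mel) / (right - centre))  # falling edge
            else:
                row.append(0.0)
        bank.append(row)
    return bank


def dct_ii(values: List[float]) -> List[float]:
    """Unnormalised DCT-II, the transform commonly used for MFCCs."""
    n = len(values)
    return [sum(v * math.cos(math.pi * k * (2 * i + 1) / (2 * n))
                for i, v in enumerate(values))
            for k in range(n)]


def sound_features(freqs_hz: List[float], amplitudes: List[float],
                   n_filters: int = 8, eps: float = 1e-10) -> List[float]:
    """Steps 124-126: Mel filtering, logarithm, then discrete cosine transform."""
    bank = mel_filter_bank(freqs_hz, n_filters)
    energies = [sum(w * a * a for w, a in zip(row, amplitudes)) for row in bank]
    log_energies = [math.log(e + eps) for e in energies]  # eps avoids log(0)
    return dct_ii(log_energies)
```

Calling `sound_features(freqs, amps)` with the frequency points and amplitudes obtained from the Fourier transform returns one coefficient per Mel filter; a production system would more likely rely on an established audio library than on this sketch.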
Step 128: input the obtained sound features of the sound signal into a trained convolutional neural network model to detect whether the sound signal is an abnormal sound.
The trained convolutional neural network model is obtained in advance by training with normal sounds (conversation, singing and the like) and abnormal sounds. The specific training process is prior art and is not described here.
The convolutional neural network model includes N convolutional layers, each followed by an activation function and a pooling layer, and M fully connected layers. The result of convolving, activating and pooling the sound features of the sound signal is called a feature map and is denoted by F. For the n-th convolutional layer, where 1 ≤ n ≤ N, the convolution kernel size is K_n, the stride is S_n and the number of convolution kernels is 2n. The last fully connected layer has 2 neurons, and each of the other fully connected layers has m neurons, where 1 ≤ m ≤ M. The activation function after each convolution is Max-Feature-Map, expressed as follows:
$$a^{k}_{ij} = \max\left(F^{k}_{ij},\ F^{k+n}_{ij}\right)$$
wherein 1 ≤ k ≤ n and
$$F^{k} \in \mathbb{R}^{h \times w}$$
since the number of feature maps generated by a convolutional layer equals the number of convolution kernels of that layer, the number of generated feature maps F is 2n;
the derivative of a is then:
$$\frac{\partial a^{k}_{ij}}{\partial F^{k}_{ij}} = \begin{cases} 1, & F^{k}_{ij} \ge F^{k+n}_{ij} \\ 0, & \text{otherwise} \end{cases}$$
wherein w is the width of the feature map; h is the height of the feature map; i is the column index of the feature map, 0 ≤ i < w; j is the row index of the feature map, 0 ≤ j < h; 1 ≤ k ≤ n; and R is the real number space;
$F^{k}$ denotes the k-th feature map among the 2n feature maps generated by the n-th convolutional layer;
$F^{k}_{ij}$ denotes the feature value in the i-th column and j-th row of the k-th feature map;
$F^{k+n}_{ij}$ denotes the feature value in the i-th column and j-th row of the (k+n)-th feature map among the 2n feature maps generated by the n-th convolutional layer.
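As an illustration of the Max-Feature-Map activation, the sketch below assumes the common reading in which the k-th output map is the element-wise maximum of the k-th and (k+n)-th of the 2n input maps. The function names, the nested-list representation and the choice to route the gradient to the first map on ties are assumptions; indices are 0-based in the code.

```python
from typing import List, Tuple

FeatureMap = List[List[float]]  # h rows, each with w columns


def max_feature_map(maps: List[FeatureMap]) -> List[FeatureMap]:
    """Reduce 2n feature maps to n maps by an element-wise maximum of map k and map k+n."""
    if len(maps) % 2 != 0:
        raise ValueError("Max-Feature-Map needs an even number of feature maps")
    n = len(maps) // 2
    return [
        [[max(a, b) for a, b in zip(row_k, row_kn)]
         for row_k, row_kn in zip(maps[k], maps[k + n])]
        for k in range(n)
    ]


def mfm_gradient_masks(maps: List[FeatureMap]) -> Tuple[List[FeatureMap], List[FeatureMap]]:
    """Return 0/1 masks: the derivative of each output w.r.t. map k and map k+n.

    Assumption: when both values are equal, the gradient goes to map k.
    """
    n = len(maps) // 2
    first, second = [], []
    for k in range(n):
        mask_k = [[1.0 if a >= b else 0.0 for a, b in zip(row_k, row_kn)]
                  for row_k, row_kn in zip(maps[k], maps[k + n])]
        first.append(mask_k)
        second.append([[1.0 - v for v in row] for row in mask_k])
    return first, second
```

Because only the larger of each pair of values passes through, the activation halves the number of feature maps while acting as a learned feature selector.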
The abnormal sound detection method further includes: when the amplitude differences between the frequency point and each of its adjacent frequency points are all smaller than the amplitude difference threshold, determining the sound signal corresponding to the frequency point to be a wideband signal, and thereby determining that the collected sound signal is environmental sound for which no abnormal sound detection is required.
The wideband signal corresponds to the relationship between the frequency points and amplitudes of the sound signal shown in fig. 2c.
The process by which the trained convolutional neural network model detects whether the sound signal is an abnormal sound is prior art and is not described here.
Since an abnormal sound may serve as important legal evidence, tampering with or altering it should be prevented as far as possible. Therefore, when the sound signal is determined to be an abnormal sound, the abnormal sound detection method proposed in this embodiment may further include the following steps (1) to (3):
(1) assigning an abnormal sound identifier to the sound signal when the sound signal is determined to be an abnormal sound;
(2) storing the sound signal in the block chain system in which the abnormal sound detection device is located, and obtaining the block address, fed back by the block chain system, at which the sound signal is stored;
(3) performing a hash calculation on the block address at which the sound signal is stored to obtain a block address hash value, generating a correspondence between the abnormal sound identifier and the block address hash value, and sending the block address hash value to the block chain system, so that the block chain system, on receiving the block address hash value, generates and stores a correspondence between the block address at which the sound signal is stored and the received block address hash value.
In step (1) above, the abnormal sound identifier assigned to the sound signal is generated by the abnormal sound detection device and is set in the sound signal.
In step (2) above, after storing the sound signal, the block chain system obtains the block address at which the sound signal is stored and sends that block address to the abnormal sound detection device.
As can be seen from steps (1) to (3), the sound signal identified as an abnormal sound is stored in the block chain system, the block address at which it is stored is hashed to obtain a block address hash value, and that hash value is also stored in the block chain system. The block chain system thus stores the abnormal sound and manages both the abnormal sound and its block address, and the traceable and unchangeable nature of the block chain system ensures that the abnormal sound is not tampered with, thereby ensuring the accuracy of the abnormal sound.
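Steps (1) to (3) can be sketched with an in-memory stand-in for the block chain system. The class `InMemoryChain`, its methods, the use of SHA-256 as the hash function and the use of a random UUID as the abnormal sound identifier are all hypothetical choices made for illustration; the text does not name a hash algorithm, an identifier format or a block chain API.

```python
import hashlib
import uuid
from typing import Dict


class InMemoryChain:
    """Hypothetical stand-in for the block chain system; not a real client."""

    def __init__(self) -> None:
        self._blocks: Dict[str, bytes] = {}
        self._hash_to_address: Dict[str, str] = {}

    def store_signal(self, signal: bytes) -> str:
        """Step (2): store the signal and feed back its block address."""
        address = f"block-{len(self._blocks)}"
        self._blocks[address] = signal
        return address

    def register_hash(self, address_hash: str) -> None:
        """Step (3), chain side: map the received hash to the matching block address.

        Assumption: the chain finds the address by hashing its own addresses.
        """
        for address in self._blocks:
            if hashlib.sha256(address.encode("utf-8")).hexdigest() == address_hash:
                self._hash_to_address[address_hash] = address
                return
        raise KeyError("no stored block matches this hash")

    def read_by_hash(self, address_hash: str) -> bytes:
        return self._blocks[self._hash_to_address[address_hash]]


def archive_abnormal_sound(chain: InMemoryChain, signal: bytes,
                           id_to_hash: Dict[str, str]) -> str:
    """Device side of steps (1)-(3); returns the abnormal sound identifier."""
    sound_id = uuid.uuid4().hex                      # step (1)
    address = chain.store_signal(signal)             # step (2)
    address_hash = hashlib.sha256(address.encode("utf-8")).hexdigest()  # step (3)
    id_to_hash[sound_id] = address_hash              # identifier -> hash
    chain.register_hash(address_hash)                # chain keeps address <-> hash
    return sound_id
```

In this sketch the detection device keeps only the identifier-to-hash table, so the raw block address never needs to be held outside the block chain system.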
When an abnormal sound needs to be queried, the abnormal sound detection method provided by this embodiment further includes the following steps (11) to (15):
(11) acquiring abnormal sound query information, wherein the abnormal sound query information comprises: the user identification of the user sending the abnormal sound query information, the query time and the abnormal sound identification of the abnormal sound needing to be queried;
(12) when the block address hash value corresponding to the abnormal sound identifier can be inquired by using the abnormal sound identifier in the abnormal sound inquiry information, generating a sound signal inquiry instruction by using the inquired block address hash value corresponding to the abnormal sound identifier;
(13) sending the generated sound signal query instruction to the block chain system, so that the block chain system looks up the block address corresponding to the block address hash value carried in the received sound signal query instruction, reads the sound signal from that block address, and feeds the read sound signal back to the abnormal sound detection device;
(14) receiving a sound signal fed back by the block chain system, and feeding back the received sound signal to a user sending the abnormal sound query information;
(15) sending the abnormal sound query information to the blockchain system, so that the blockchain system stores the abnormal sound query information into a query log, wherein the query log is arranged in the blockchain system.
As can be seen from steps (11) to (15) above, after the query is completed, the abnormal sound detection device sends the abnormal sound query information to the block chain system, which stores it in the query log. Recording the abnormal sound query information in this way facilitates permission and security management of the data: if an abnormal sound is leaked or lost, an investigation can be carried out based on the recorded query information.
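The query flow of steps (11) to (15) might look like the following sketch. The names `QueryInfo`, `ChainStub` and `query_abnormal_sound` are hypothetical, the block chain system is replaced by an in-memory stub, and returning `None` without logging when no hash value is found is an assumption, since the text does not describe that case.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class QueryInfo:
    """Step (11): the abnormal sound query information."""
    user_id: str
    query_time: str
    sound_id: str


class ChainStub:
    """Hypothetical stand-in for the block chain system."""

    def __init__(self, hash_to_address: Dict[str, str],
                 blocks: Dict[str, bytes]) -> None:
        self.hash_to_address = hash_to_address
        self.blocks = blocks
        self.query_log: List[QueryInfo] = []

    def read_signal(self, address_hash: str) -> bytes:
        """Step (13): resolve the hash to a block address and read the signal."""
        return self.blocks[self.hash_to_address[address_hash]]

    def append_query_log(self, info: QueryInfo) -> None:
        """Step (15): record who queried which abnormal sound, and when."""
        self.query_log.append(info)


def query_abnormal_sound(info: QueryInfo, id_to_hash: Dict[str, str],
                         chain: ChainStub) -> Optional[bytes]:
    address_hash = id_to_hash.get(info.sound_id)  # step (12)
    if address_hash is None:
        return None  # assumption: unknown identifiers are neither served nor logged
    signal = chain.read_signal(address_hash)      # step (13)
    chain.append_query_log(info)                  # step (15)
    return signal                                 # step (14): fed back to the user
```

Keeping the query log inside the block chain system means the audit trail benefits from the same tamper resistance as the stored sound.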
To sum up, this embodiment provides an abnormal sound detection method. When the abnormal sound detection device is determined, from the acquired ambient environment information, to be in a harsh environment, the method first determines whether the acquired sound signal is a wideband signal or a narrow-band signal. If the sound signal is a narrow-band signal, its sound features are extracted and input into a trained convolutional neural network model to detect whether the sound signal is an abnormal sound. Compared with abnormal sound detection in the related art, the detection mode can be adjusted according to the ambient environment information, which improves the efficiency of abnormal sound detection under different environmental conditions. Determining whether the sound signal is wideband or narrow-band before abnormal sound detection distinguishes environmental sound from the sound that needs to be detected, namely sound emitted by an object entering the environment; the abnormal sound detection stage is entered only when the sound signal is determined not to be environmental sound. This improves the detection efficiency for abnormal sounds and helps prevent false detections.
Example 2
The present embodiment proposes an abnormal sound detection apparatus for performing the above abnormal sound detection method.
Referring to a schematic structural diagram of an abnormal sound detection apparatus shown in fig. 3, the present embodiment provides an abnormal sound detection apparatus, including:
a first obtaining module 200, configured to obtain ambient environment information when a sound signal of an ambient environment is collected;
a first processing module 202, configured to, when it is determined from the obtained ambient environment information that the abnormal sound detection device is in a severe environment, perform a Fourier transform on the sound signal to obtain a Fourier-transformed sound signal, where the Fourier-transformed sound signal includes the frequency components of the sound signal; wherein the frequency components of the sound signal comprise: the frequency range of the sound signal, the amplitudes of the frequency points in the frequency range, and the initial phase of each frequency point;
a calculating module 204, configured to calculate a difference between amplitudes of each frequency point and an adjacent frequency point of each frequency point;
a second processing module 206, configured to, when the amplitude differences between the frequency point and each of its adjacent frequency points are all greater than or equal to the amplitude difference threshold, determine the sound signal corresponding to the frequency point to be a narrow-band signal and jump to the function of the conversion module;
a third processing module 208, configured to, when a difference between the amplitudes of the frequency point and an adjacent frequency point on one side of the frequency point is greater than or equal to an amplitude difference threshold and a difference between the amplitude of the frequency point and an adjacent frequency point on the other side of the frequency point is smaller than the amplitude difference threshold, take the adjacent frequency point on the other side of the frequency point as a frequency point to be detected;
a fourth processing module 210, configured to use a direction in which the frequency point reaches an adjacent frequency point on the other side of the frequency point as a detection direction;
a difference value calculating module 212, configured to calculate a difference value between the amplitude of the frequency point adjacent to the frequency point to be detected in the detection direction and the amplitude of the frequency point to be detected;
a judging module 214, configured to judge whether the difference between the amplitude of the frequency point adjacent to the frequency point to be detected in the detection direction and the amplitude of the frequency point to be detected is smaller than the amplitude difference threshold; if so, take the frequency point adjacent to the frequency point to be detected in the detection direction as the frequency point to be detected and return to the function of the difference calculating module; if not, count the number of frequency points to be detected;
a determining module 216, configured to, when the number of frequency points to be detected is less than or equal to a number threshold, determine the sound signal spanning from the frequency point through the counted frequency points to be detected in the detection direction to be a narrow-band signal;
a conversion module 218, configured to convert each frequency in the frequency range of the sound signal into a Mel frequency of the sound signal when the sound signal is determined to be a narrow-band signal;
the filtering module 220 is configured to input the Mel frequency of the sound signal into a Mel filter bank to obtain a filtering result of the sound signal in the Mel domain;
a fifth processing module 222, configured to perform logarithm calculation and discrete cosine transform on the filtering result of the obtained sound signal in the Mel domain to obtain the sound characteristic of the sound signal;
a detection module 224, configured to input the obtained sound features of the sound signal into a trained convolutional neural network model and detect whether the sound signal is an abnormal sound; wherein the convolutional neural network model includes N convolutional layers, each followed by an activation function and a pooling layer, and M fully connected layers. The result of convolving, activating and pooling the sound features of the sound signal is called a feature map and is denoted by F. For the n-th convolutional layer, where 1 ≤ n ≤ N, the convolution kernel size is K_n, the stride is S_n and the number of convolution kernels is 2n. The last fully connected layer has 2 neurons, and each of the other fully connected layers has m neurons, where 1 ≤ m ≤ M. The activation function after each convolution is Max-Feature-Map, expressed as follows:
$$a^{k}_{ij} = \max\left(F^{k}_{ij},\ F^{k+n}_{ij}\right)$$
wherein 1 ≤ k ≤ n and
$$F^{k} \in \mathbb{R}^{h \times w}$$
the number of feature maps F is 2n;
the derivative of a is then:
$$\frac{\partial a^{k}_{ij}}{\partial F^{k}_{ij}} = \begin{cases} 1, & F^{k}_{ij} \ge F^{k+n}_{ij} \\ 0, & \text{otherwise} \end{cases}$$
wherein w is the width of the feature map; h is the height of the feature map; i is the column index of the feature map, 0 ≤ i < w; j is the row index of the feature map, 0 ≤ j < h; 1 ≤ k ≤ n; and R is the real number space;
$F^{k}$ denotes the k-th feature map among the 2n feature maps generated by the n-th convolutional layer;
$F^{k}_{ij}$ denotes the feature value in the i-th column and j-th row of the k-th feature map;
$F^{k+n}_{ij}$ denotes the feature value in the i-th column and j-th row of the (k+n)-th feature map among the 2n feature maps generated by the n-th convolutional layer.
Further, the abnormal sound detection apparatus according to the present embodiment further includes:
a sixth processing module, configured to, when the amplitude differences between the frequency point and each of its adjacent frequency points are all smaller than the amplitude difference threshold, determine the sound signal corresponding to the frequency point to be a wideband signal, and thereby determine that the collected sound signal is environmental sound for which no abnormal sound detection is required.
Further, the abnormal sound detection apparatus according to the present embodiment further includes:
an assigning module for assigning an abnormal sound identifier to the sound signal when it is determined that the sound signal is an abnormal sound;
a storage module, configured to store the sound signal in the block chain system in which the abnormal sound detection device is located, and obtain the block address, fed back by the block chain system, at which the sound signal is stored;
a hash calculation module, configured to perform a hash calculation on the block address at which the sound signal is stored to obtain a block address hash value, generate a correspondence between the abnormal sound identifier and the block address hash value, and send the block address hash value to the block chain system, so that the block chain system, on receiving the block address hash value, generates and stores a correspondence between the block address at which the sound signal is stored and the received block address hash value.
Further, the abnormal sound detection apparatus according to the present embodiment further includes:
a second obtaining module, configured to obtain abnormal sound query information, where the abnormal sound query information includes: the user identification of the user sending the abnormal sound query information, the query time and the abnormal sound identification of the abnormal sound needing to be queried;
a third obtaining module, configured to, when a block address hash value corresponding to the abnormal sound identifier can be queried by using the abnormal sound identifier in the abnormal sound query information, generate a sound signal query instruction by using the queried block address hash value corresponding to the abnormal sound identifier;
a fourth obtaining module, configured to send the generated sound signal query instruction to a block chain system, so that the block chain system queries, according to a block address hash value carried in the received sound signal query instruction, a block address corresponding to the block address hash value carried in the received sound signal query instruction, reads a sound signal from the queried block address, and feeds the read sound signal back to the abnormal sound detection device;
a feedback module, configured to receive the sound signal fed back by the block chain system and feed the received sound signal back to the user who sent the abnormal sound query information;
a sending module, configured to send the abnormal sound query information to the blockchain system, so that the blockchain system stores the abnormal sound query information in a query log, where the query log is set in the blockchain system.
In summary, this embodiment provides an abnormal sound detection apparatus. When the abnormal sound detection device is determined, from the acquired ambient environment information, to be in a harsh environment, the apparatus first determines whether the acquired sound signal is a wideband signal or a narrow-band signal. If the sound signal is a narrow-band signal, its sound features are extracted and input into a trained convolutional neural network model to detect whether the sound signal is an abnormal sound. Compared with abnormal sound detection in the related art, the detection mode can be adjusted according to the ambient environment information, which improves the efficiency of abnormal sound detection under different environmental conditions. Determining whether the sound signal is wideband or narrow-band before abnormal sound detection distinguishes environmental sound from the sound that needs to be detected, namely sound emitted by an object entering the environment; the abnormal sound detection stage is entered only when the sound signal is determined not to be environmental sound. This improves the detection efficiency for abnormal sounds and helps prevent false detections.
Example 3
The present embodiment proposes a computer-readable storage medium having stored thereon a computer program which, when executed by a processor, performs the steps of the abnormal sound detection method described in embodiment 1 above. For specific implementation, refer to method embodiment 1, which is not described herein again.
In addition, referring to the schematic structural diagram of an electronic device shown in fig. 4, this embodiment also provides an electronic device, which includes a bus 51, a processor 52, a transceiver 53, a bus interface 54, a memory 55, and a user interface 56.
In this embodiment, the electronic device further includes: one or more programs stored on the memory 55 and executable on the processor 52, configured for execution by the processor to perform the steps of:
when sound signals of the surrounding environment are collected, the abnormal sound detection equipment acquires surrounding environment information;
when the abnormal sound detection device is determined, from the acquired ambient environment information, to be in a severe environment, performing a Fourier transform on the sound signal to obtain a Fourier-transformed sound signal, wherein the Fourier-transformed sound signal comprises the frequency components of the sound signal; wherein the frequency components of the sound signal comprise: the frequency range of the sound signal, the amplitudes of the frequency points in the frequency range, and the initial phase of each frequency point;
respectively calculating the difference value of the amplitude value between each frequency point and the adjacent frequency point of each frequency point;
when the difference values of the amplitudes between the frequency points and the adjacent frequency points of the frequency points are larger than or equal to the amplitude difference value threshold, determining the sound signals corresponding to the frequency points as narrow-band signals, and skipping to the step of converting all frequencies in the frequency range of the sound signals into Mel frequencies of the sound signals when the sound signals are determined as the narrow-band signals;
when the difference value of the amplitude between the frequency point and the adjacent frequency point on one side of the frequency point is more than or equal to an amplitude difference threshold value and the difference value of the amplitude between the frequency point and the adjacent frequency point on the other side of the frequency point is less than the amplitude difference threshold value, taking the adjacent frequency point on the other side of the frequency point as a frequency point to be detected;
taking the direction of the frequency point reaching the adjacent frequency point on the other side of the frequency point as a detection direction;
calculating the difference value between the amplitude of the frequency point adjacent to the frequency point to be detected in the detection direction and the amplitude of the frequency point to be detected;
judging whether the difference between the amplitude of the frequency point adjacent to the frequency point to be detected in the detection direction and the amplitude of the frequency point to be detected is smaller than the amplitude difference threshold; if so, taking the frequency point adjacent to the frequency point to be detected in the detection direction as the frequency point to be detected, and returning to the step of calculating the difference between the amplitude of the frequency point adjacent to the frequency point to be detected in the detection direction and the amplitude of the frequency point to be detected; if not, counting the number of frequency points to be detected;
when the number of frequency points to be detected is less than or equal to a number threshold, determining the sound signal spanning from the frequency point through the counted frequency points to be detected in the detection direction to be a narrow-band signal;
converting each frequency in the frequency range of the sound signal into a Mel frequency of the sound signal when the sound signal is determined to be a narrow-band signal;
inputting the Mel frequency of the sound signal into a Mel filter bank to obtain a filtering result of the sound signal in the Mel domain;
carrying out logarithmic calculation and discrete cosine transform on the filtering result of the obtained sound signal in the Mel domain to obtain the sound characteristic of the sound signal;
inputting the obtained sound features of the sound signal into a trained convolutional neural network model, and detecting whether the sound signal is an abnormal sound; wherein the convolutional neural network model includes N convolutional layers, each followed by an activation function and a pooling layer, and M fully connected layers. The result of convolving, activating and pooling the sound features of the sound signal is called a feature map and is denoted by F. For the n-th convolutional layer, where 1 ≤ n ≤ N, the convolution kernel size is K_n, the stride is S_n and the number of convolution kernels is 2n. The last fully connected layer has 2 neurons, and each of the other fully connected layers has m neurons, where 1 ≤ m ≤ M. The activation function after each convolution is Max-Feature-Map, expressed as follows:
$$a^{k}_{ij} = \max\left(F^{k}_{ij},\ F^{k+n}_{ij}\right)$$
wherein 1 ≤ k ≤ n and
$$F^{k} \in \mathbb{R}^{h \times w}$$
n is a preset value;
the derivative of a is then:
$$\frac{\partial a^{k}_{ij}}{\partial F^{k}_{ij}} = \begin{cases} 1, & F^{k}_{ij} \ge F^{k+n}_{ij} \\ 0, & \text{otherwise} \end{cases}$$
wherein w is the width of the feature map; h is the height of the feature map; i is the column index of the feature map, 0 ≤ i < w; j is the row index of the feature map, 0 ≤ j < h; 1 ≤ k ≤ n; and R is the real number space;
$F^{k}$ denotes the k-th feature map among the 2n feature maps generated by the n-th convolutional layer;
$F^{k}_{ij}$ denotes the feature value in the i-th column and j-th row of the k-th feature map;
$F^{k+n}_{ij}$ denotes the feature value in the i-th column and j-th row of the (k+n)-th feature map among the 2n feature maps generated by the n-th convolutional layer.
A transceiver 53 for receiving and transmitting data under the control of the processor 52.
Where a bus architecture (represented by bus 51) is used, bus 51 may include any number of interconnected buses and bridges, with bus 51 linking together various circuits including one or more processors, represented by processor 52, and memory, represented by memory 55. The bus 51 may also link various other circuits such as peripherals, voltage regulators, power management circuits, and the like, which are well known in the art, and therefore, will not be described any further in this embodiment. A bus interface 54 provides an interface between the bus 51 and the transceiver 53. The transceiver 53 may be one element or may be multiple elements, such as multiple receivers and transmitters, providing a means for communicating with various other apparatus over a transmission medium. For example: the transceiver 53 receives external data from other devices. The transceiver 53 is used for transmitting data processed by the processor 52 to other devices. Depending on the nature of the computing system, a user interface 56, such as a keypad, display, speaker, microphone, joystick, may also be provided.
The processor 52 is responsible for managing the bus 51 and for general processing, including running a general-purpose operating system. The memory 55 may be used to store data used by the processor 52 in performing operations.
Alternatively, processor 52 may be, but is not limited to: a central processing unit, a singlechip, a microprocessor or a programmable logic device.
It will be appreciated that the memory 55 in embodiments of the invention may be either volatile memory or non-volatile memory, or may include both. The non-volatile memory may be a Read-Only Memory (ROM), a Programmable ROM (PROM), an Erasable PROM (EPROM), an Electrically Erasable PROM (EEPROM), or a flash memory. The volatile memory may be a Random Access Memory (RAM), which acts as an external cache. By way of illustration and not limitation, many forms of RAM are available, such as Static RAM (SRAM), Dynamic RAM (DRAM), Synchronous DRAM (SDRAM), Double Data Rate SDRAM (DDR SDRAM), Enhanced SDRAM (ESDRAM), Synchlink DRAM (SLDRAM), and Direct Rambus RAM (DRRAM). The memory 55 of the systems and methods described in this embodiment is intended to include, without being limited to, these and any other suitable types of memory.
In some embodiments, the memory 55 stores the following elements, executable modules or data structures, or a subset or an extended set thereof: an operating system 551 and application programs 552.
The operating system 551 includes various system programs, such as a framework layer, a core library layer, a driver layer, and the like, for implementing various basic services and processing hardware-based tasks. The application 552 includes various applications, such as a Media Player (Media Player), a Browser (Browser), and the like, for implementing various application services. A program implementing the method of an embodiment of the present invention may be included in the application 552.
In summary, this embodiment provides a computer-readable storage medium and an electronic device. When the abnormal sound detection device is determined, according to the acquired ambient environment information, to be in a severe environment, it is first determined whether the acquired sound signal is a wide-band signal or a narrow-band signal. If the sound signal is a narrow-band signal, sound features are extracted from it and input into a trained convolutional neural network model to detect whether the sound signal is an abnormal sound. Compared with abnormal sound detection in the related art, the detection mode can be adjusted according to the ambient environment information, which improves the efficiency of abnormal sound detection under different environmental conditions. Determining whether the acquired sound signal is wide-band or narrow-band before detection distinguishes environmental sound from sound that is emitted by an object entering the environment and needs to be detected. The abnormal sound detection stage is entered only when the sound signal is determined not to be environmental sound, which improves detection efficiency and helps prevent false detections.
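As an illustration of the feature-extraction stage summarized above (Mel conversion, Mel filter bank, logarithm and discrete cosine transform, as recited in claim 1), the following is a minimal Python sketch. The Mel mapping constants (2595 and 700), the triangular filter shape, the filter and coefficient counts, the naive DFT and all function names are assumptions made for illustration; the patent does not specify them.

```python
import cmath
import math


def hz_to_mel(f_hz):
    # A commonly used Mel mapping (assumption; the patent gives no formula).
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)


def mel_to_hz(mel):
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def power_spectrum(frame):
    # Naive O(n^2) DFT so that the sketch needs only the standard library.
    n = len(frame)
    spectrum = []
    for k in range(n // 2 + 1):
        acc = sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n)
                  for t in range(n))
        spectrum.append(abs(acc) ** 2 / n)
    return spectrum


def mel_filterbank(num_filters, n_fft, sample_rate):
    # Triangular filters whose edges are equally spaced on the Mel scale.
    top = hz_to_mel(sample_rate / 2.0)
    mel_points = [top * i / (num_filters + 1) for i in range(num_filters + 2)]
    bins = [int(math.floor((n_fft + 1) * mel_to_hz(m) / sample_rate))
            for m in mel_points]
    bank = []
    for m in range(1, num_filters + 1):
        left, centre, right = bins[m - 1], bins[m], bins[m + 1]
        row = [0.0] * (n_fft // 2 + 1)
        for k in range(left, centre):
            row[k] = (k - left) / (centre - left)
        for k in range(centre, right):
            row[k] = (right - k) / (right - centre)
        bank.append(row)
    return bank


def mfcc(frame, sample_rate, num_filters=10, num_coeffs=5):
    spec = power_spectrum(frame)
    bank = mel_filterbank(num_filters, len(frame), sample_rate)
    # Filtering result in the Mel domain, then logarithm (epsilon avoids log 0).
    energies = [sum(w * p for w, p in zip(row, spec)) for row in bank]
    log_e = [math.log(e + 1e-10) for e in energies]
    # DCT-II of the log filter-bank energies gives the sound features.
    return [
        sum(log_e[m] * math.cos(math.pi * c * (m + 0.5) / num_filters)
            for m in range(num_filters))
        for c in range(num_coeffs)
    ]
```

Calling `mfcc(frame, 8000)` on a 64-sample frame returns a list of five coefficients; in practice such per-frame vectors would be stacked into a two-dimensional input for the convolutional neural network.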
The above description covers only specific embodiments of the present invention, but the scope of protection of the present invention is not limited thereto. Any change or substitution that a person skilled in the art can readily conceive within the technical scope of the present invention shall fall within the scope of protection of the present invention. Therefore, the scope of protection of the present invention shall be subject to the scope of protection of the appended claims.

Claims (10)

1. An abnormal sound detection method, comprising:
acquiring, by an abnormal sound detection device, ambient environment information when a sound signal of the surrounding environment is collected;
when the abnormal sound detection device is determined, according to the acquired ambient environment information, to be in a severe environment, performing a Fourier transform on the sound signal to obtain a Fourier-transformed sound signal, wherein the Fourier-transformed sound signal contains frequency components of the sound signal, and the frequency components of the sound signal comprise: the frequency range of the sound signal, the amplitude of each frequency point within the frequency range, and the initial phase of each frequency point;
calculating, for each frequency point, the difference in amplitude between the frequency point and each of its adjacent frequency points;
when the amplitude differences between a frequency point and its adjacent frequency points are all greater than or equal to an amplitude difference threshold, determining the sound signal corresponding to the frequency point to be a narrow-band signal, and jumping to the step of converting each frequency within the frequency range of the sound signal into a Mel frequency of the sound signal when the sound signal is determined to be a narrow-band signal;
when the amplitude difference between a frequency point and its adjacent frequency point on one side is greater than or equal to the amplitude difference threshold, and the amplitude difference between the frequency point and its adjacent frequency point on the other side is less than the amplitude difference threshold, taking the adjacent frequency point on the other side as a frequency point to be detected;
taking the direction from the frequency point toward the adjacent frequency point on the other side as a detection direction;
calculating the difference between the amplitude of the frequency point adjacent to the frequency point to be detected in the detection direction and the amplitude of the frequency point to be detected;
judging whether the difference between the amplitude of the frequency point adjacent to the frequency point to be detected in the detection direction and the amplitude of the frequency point to be detected is less than the amplitude difference threshold; if so, taking the frequency point adjacent to the frequency point to be detected in the detection direction as the frequency point to be detected, and returning to the step of calculating the difference between the amplitude of the frequency point adjacent to the frequency point to be detected in the detection direction and the amplitude of the frequency point to be detected; if not, counting the number of frequency points to be detected;
when the number of frequency points to be detected is less than or equal to a number threshold, determining the sound signal between the first frequency point to be detected and the last frequency point to be detected, as indicated by the counted number of frequency points to be detected in the detection direction, to be a narrow-band signal;
converting each frequency within the frequency range of the sound signal into a Mel frequency of the sound signal when the sound signal is determined to be a narrow-band signal;
inputting the Mel frequency of the sound signal into a Mel filter bank to obtain a filtering result of the sound signal in the Mel domain;
performing a logarithm operation and a discrete cosine transform on the obtained filtering result of the sound signal in the Mel domain to obtain sound features of the sound signal;
inputting the obtained sound features of the sound signal into a trained convolutional neural network model, and detecting whether the sound signal is an abnormal sound; wherein the convolutional neural network model comprises N convolutional layers, with an activation function and a pooling layer arranged after the convolutional layers, and M fully connected layers; the result of applying convolution, activation and pooling to the sound features of the sound signal is called a feature map and is denoted F; the convolution kernel size of the n-th convolutional layer is K_n, its stride is S_n, and its number of convolution kernels is 2n, where 1 ≤ n ≤ N; the last fully connected layer has 2 neurons, and each other fully connected layer has m neurons, where 1 ≤ m ≤ M; the activation function after each convolutional layer is Max-Feature-Map, expressed as follows:
$$a^{k}_{ij} = \max\left(F^{k}_{ij},\; F^{k+n}_{ij}\right)$$
wherein 1 ≤ k ≤ n, $F^{k} \in \mathbb{R}^{h \times w}$, and the number of feature maps F is 2n;
the derivative of a is then:
$$\frac{\partial a^{k}_{ij}}{\partial F^{k}_{ij}} = \begin{cases} 1, & F^{k}_{ij} \ge F^{k+n}_{ij} \\ 0, & \text{otherwise} \end{cases}$$
wherein w is the width of the feature map; h is the height of the feature map; i is the column index of the feature map, 0 ≤ i < w; j is the row index of the feature map, 0 ≤ j < h; 1 ≤ k ≤ n; and R is the real number space;
$F^{k}$ is the k-th feature map of the 2n feature maps generated by the n-th convolutional layer;
$F^{k}_{ij}$ represents the feature value at the i-th column and j-th row of the k-th feature map;
and $F^{k+n}_{ij}$ represents the feature value at the i-th column and j-th row of the (k+n)-th feature map of the 2n feature maps generated by the n-th convolutional layer.
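The Max-Feature-Map activation recited in claim 1 can be illustrated with a short Python sketch. It operates on plain nested lists rather than on a deep-learning framework, uses 0-based indices, and resolves ties in favour of the first map of each pair; these choices and the function names are assumptions for illustration only.

```python
def max_feature_map(feature_maps):
    """Reduce 2n feature maps to n maps by an element-wise maximum.

    Output map k is the element-wise max of input maps k and k + n.
    Each map is a list of rows, and each row is a list of numbers.
    """
    if len(feature_maps) % 2 != 0:
        raise ValueError("Max-Feature-Map needs an even number of maps")
    n = len(feature_maps) // 2
    out = []
    for k in range(n):
        first, second = feature_maps[k], feature_maps[k + n]
        out.append([
            [max(x, y) for x, y in zip(row_a, row_b)]
            for row_a, row_b in zip(first, second)
        ])
    return out


def max_feature_map_grad(feature_maps):
    """Return one 0/1 mask per input map.

    A mask entry is 1 where that input map supplied the maximum, so the
    gradient flows only through the winning element of each pair.
    Ties are assigned to the first map of the pair (assumption).
    """
    n = len(feature_maps) // 2
    grads = [None] * (2 * n)
    for k in range(n):
        first, second = feature_maps[k], feature_maps[k + n]
        grads[k] = [
            [1 if x >= y else 0 for x, y in zip(row_a, row_b)]
            for row_a, row_b in zip(first, second)
        ]
        grads[k + n] = [
            [0 if x >= y else 1 for x, y in zip(row_a, row_b)]
            for row_a, row_b in zip(first, second)
        ]
    return grads
```

Because each output element copies exactly one input element, the activation halves the number of feature maps while keeping the gradient sparse.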
2. The method of claim 1, further comprising: when the amplitude differences between each frequency point and its adjacent frequency points are all less than the amplitude difference threshold, determining the sound signal corresponding to the frequency points to be a wide-band signal, thereby determining that the collected sound signal is environmental sound and that abnormal sound detection is not required.
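The amplitude-difference test of claims 1 and 2 can be sketched in Python as follows. This is a simplified reading, not the patented implementation: the differences are taken as absolute values, only interior frequency points are examined, and a run of similar amplitudes that reaches the edge of the spectrum is not treated as narrow-band. The function name and both thresholds are hypothetical.

```python
def is_narrowband(amplitudes, diff_threshold, count_threshold):
    """Return True if the amplitude spectrum contains a narrow-band component."""
    n = len(amplitudes)
    for i in range(1, n - 1):
        left = abs(amplitudes[i] - amplitudes[i - 1])
        right = abs(amplitudes[i] - amplitudes[i + 1])
        left_big = left >= diff_threshold
        right_big = right >= diff_threshold
        if left_big and right_big:
            # Both neighbours differ strongly: an isolated spectral line.
            return True
        if left_big != right_big:
            # Walk toward the side whose difference is below the threshold.
            step = 1 if left_big else -1
            j = i + step   # first frequency point to be detected
            count = 1      # number of frequency points to be detected
            while (0 <= j + step < n and
                   abs(amplitudes[j + step] - amplitudes[j]) < diff_threshold):
                j += step
                count += 1
            # Assumption: the run must end at a large difference,
            # not at the edge of the spectrum.
            bounded = 0 <= j + step < n
            if bounded and count <= count_threshold:
                return True
    # All differences are small (claim 2) or every run is too wide: wide-band.
    return False
```

With `diff_threshold=0.5` and `count_threshold=3`, a flat spectrum is classified as wide-band and a single sharp peak as narrow-band; only the narrow-band case proceeds to Mel feature extraction.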
3. The method of claim 1, further comprising:
assigning an abnormal sound identifier to the sound signal when it is determined that the sound signal is an abnormal sound;
storing the sound signal in a blockchain system in which the abnormal sound detection device is located, and obtaining the block address, fed back by the blockchain system, at which the sound signal is stored;
performing a hash calculation on the block address at which the sound signal is stored to obtain a block address hash value, generating a correspondence between the abnormal sound identifier and the block address hash value, and sending the block address hash value to the blockchain system, so that the blockchain system, upon receiving the block address hash value, generates and stores a correspondence between the block address at which the sound signal is stored and the received block address hash value.
4. The method of claim 3, further comprising:
acquiring abnormal sound query information, wherein the abnormal sound query information comprises: the user identifier of the user who sent the abnormal sound query information, the query time, and the abnormal sound identifier of the abnormal sound to be queried;
when a block address hash value corresponding to the abnormal sound identifier in the abnormal sound query information can be found, generating a sound signal query instruction using the found block address hash value;
sending the generated sound signal query instruction to the blockchain system, so that the blockchain system looks up the block address corresponding to the block address hash value carried in the received sound signal query instruction, reads the sound signal from that block address, and feeds the read sound signal back to the abnormal sound detection device;
receiving the sound signal fed back by the blockchain system, and feeding the received sound signal back to the user who sent the abnormal sound query information;
sending the abnormal sound query information to the blockchain system, so that the blockchain system stores the abnormal sound query information into a query log, wherein the query log is arranged in the blockchain system.
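The storage and query flow of claims 3 and 4 can be sketched in Python with an in-memory stand-in for the blockchain system. The classes `InMemoryBlockchain` and `Detector`, the address format, the choice of SHA-256, passing the block address together with its hash, and logging every query (including unsuccessful ones) are all assumptions for illustration; the claims do not specify them.

```python
import hashlib


class InMemoryBlockchain:
    """Hypothetical stand-in for the blockchain system; not a real ledger."""

    def __init__(self):
        self._blocks = {}           # block address -> stored sound signal
        self._hash_to_address = {}  # block address hash -> block address
        self.query_log = []         # query records kept on the chain side

    def store(self, signal):
        address = f"block-{len(self._blocks)}"
        self._blocks[address] = signal
        return address

    def register_hash(self, address, address_hash):
        self._hash_to_address[address_hash] = address

    def read(self, address_hash):
        return self._blocks[self._hash_to_address[address_hash]]


class Detector:
    """Device-side bookkeeping for abnormal sounds."""

    def __init__(self, chain):
        self.chain = chain
        self._id_to_hash = {}  # abnormal sound identifier -> address hash

    def record_abnormal_sound(self, sound_id, signal):
        address = self.chain.store(signal)
        address_hash = hashlib.sha256(address.encode("utf-8")).hexdigest()
        self._id_to_hash[sound_id] = address_hash
        self.chain.register_hash(address, address_hash)

    def query(self, user_id, query_time, sound_id):
        address_hash = self._id_to_hash.get(sound_id)
        signal = None
        if address_hash is not None:
            signal = self.chain.read(address_hash)
        # The query information is stored in the chain-side query log.
        self.chain.query_log.append(
            {"user": user_id, "time": query_time, "sound_id": sound_id}
        )
        return signal
```

In this sketch the device only ever holds the hash of the block address, so a lookup must go through the blockchain system, which also records who queried which abnormal sound and when.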
5. An abnormal sound detection apparatus, comprising:
a first obtaining module, configured to acquire ambient environment information when a sound signal of the surrounding environment is collected;
a first processing module, configured to perform a Fourier transform on the sound signal to obtain a Fourier-transformed sound signal when the abnormal sound detection device is determined, according to the acquired ambient environment information, to be in a severe environment, wherein the Fourier-transformed sound signal contains frequency components of the sound signal, and the frequency components of the sound signal comprise: the frequency range of the sound signal, the amplitude of each frequency point within the frequency range, and the initial phase of each frequency point;
a calculation module, configured to calculate, for each frequency point, the difference in amplitude between the frequency point and each of its adjacent frequency points;
a second processing module, configured to determine the sound signal corresponding to a frequency point to be a narrow-band signal, and to proceed to the function of the conversion module, when the amplitude differences between the frequency point and its adjacent frequency points are all greater than or equal to an amplitude difference threshold;
a third processing module, configured to take the adjacent frequency point on the other side of a frequency point as a frequency point to be detected when the amplitude difference between the frequency point and its adjacent frequency point on one side is greater than or equal to the amplitude difference threshold and the amplitude difference between the frequency point and its adjacent frequency point on the other side is less than the amplitude difference threshold;
a fourth processing module, configured to take the direction from the frequency point toward the adjacent frequency point on the other side as a detection direction;
a difference calculation module, configured to calculate the difference between the amplitude of the frequency point adjacent to the frequency point to be detected in the detection direction and the amplitude of the frequency point to be detected;
a judging module, configured to judge whether the difference between the amplitude of the frequency point adjacent to the frequency point to be detected in the detection direction and the amplitude of the frequency point to be detected is less than the amplitude difference threshold; if so, to take the frequency point adjacent to the frequency point to be detected in the detection direction as the frequency point to be detected and return to the function of the difference calculation module; if not, to count the number of frequency points to be detected;
a determining module, configured to determine the sound signal between the first frequency point to be detected and the last frequency point to be detected, as indicated by the counted number of frequency points to be detected in the detection direction, to be a narrow-band signal when the number of frequency points to be detected is less than or equal to a number threshold;
a conversion module, configured to convert each frequency within the frequency range of the sound signal into a Mel frequency of the sound signal when the sound signal is determined to be a narrow-band signal;
a filtering module, configured to input the Mel frequency of the sound signal into a Mel filter bank to obtain a filtering result of the sound signal in the Mel domain;
a fifth processing module, configured to perform a logarithm operation and a discrete cosine transform on the obtained filtering result of the sound signal in the Mel domain to obtain sound features of the sound signal;
a detection module, configured to input the obtained sound features of the sound signal into a trained convolutional neural network model and detect whether the sound signal is an abnormal sound; wherein the convolutional neural network model comprises N convolutional layers, with an activation function and a pooling layer arranged after the convolutional layers, and M fully connected layers; the result of applying convolution, activation and pooling to the sound features of the sound signal is called a feature map and is denoted F; the convolution kernel size of the n-th convolutional layer is K_n, its stride is S_n, and its number of convolution kernels is 2n, where 1 ≤ n ≤ N; the last fully connected layer has 2 neurons, and each other fully connected layer has m neurons, where 1 ≤ m ≤ M; the activation function after each convolutional layer is Max-Feature-Map, expressed as follows:
$$a^{k}_{ij} = \max\left(F^{k}_{ij},\; F^{k+n}_{ij}\right)$$
wherein 1 ≤ k ≤ n, $F^{k} \in \mathbb{R}^{h \times w}$, and the number of feature maps F is 2n;
the derivative of a is then:
$$\frac{\partial a^{k}_{ij}}{\partial F^{k}_{ij}} = \begin{cases} 1, & F^{k}_{ij} \ge F^{k+n}_{ij} \\ 0, & \text{otherwise} \end{cases}$$
wherein w is the width of the feature map; h is the height of the feature map; i is the column index of the feature map, 0 ≤ i < w; j is the row index of the feature map, 0 ≤ j < h; 1 ≤ k ≤ n; and R is the real number space;
$F^{k}$ is the k-th feature map of the 2n feature maps generated by the n-th convolutional layer;
$F^{k}_{ij}$ represents the feature value at the i-th column and j-th row of the k-th feature map;
and $F^{k+n}_{ij}$ represents the feature value at the i-th column and j-th row of the (k+n)-th feature map of the 2n feature maps generated by the n-th convolutional layer.
6. The apparatus of claim 5, further comprising:
a sixth processing module, configured to determine the sound signal corresponding to the frequency points to be a wide-band signal when the amplitude differences between each frequency point and its adjacent frequency points are all less than the amplitude difference threshold, thereby determining that the collected sound signal is environmental sound and that abnormal sound detection is not required.
7. The apparatus of claim 5, further comprising:
an assigning module for assigning an abnormal sound identifier to the sound signal when it is determined that the sound signal is an abnormal sound;
a storage module, configured to store the sound signal in a blockchain system in which the abnormal sound detection device is located, and to obtain the block address, fed back by the blockchain system, at which the sound signal is stored;
a hash calculation module, configured to perform a hash calculation on the block address at which the sound signal is stored to obtain a block address hash value, generate a correspondence between the abnormal sound identifier and the block address hash value, and send the block address hash value to the blockchain system, so that the blockchain system, upon receiving the block address hash value, generates and stores a correspondence between the block address at which the sound signal is stored and the received block address hash value.
8. The apparatus of claim 7, further comprising:
a second obtaining module, configured to obtain abnormal sound query information, wherein the abnormal sound query information comprises: the user identifier of the user who sent the abnormal sound query information, the query time, and the abnormal sound identifier of the abnormal sound to be queried;
a third obtaining module, configured to generate a sound signal query instruction using the found block address hash value when a block address hash value corresponding to the abnormal sound identifier in the abnormal sound query information can be found;
a fourth obtaining module, configured to send the generated sound signal query instruction to the blockchain system, so that the blockchain system looks up the block address corresponding to the block address hash value carried in the received sound signal query instruction, reads the sound signal from that block address, and feeds the read sound signal back to the abnormal sound detection device;
a feedback module, configured to receive the sound signal fed back by the blockchain system and feed the received sound signal back to the user who sent the abnormal sound query information;
a sending module, configured to send the abnormal sound query information to the blockchain system, so that the blockchain system stores the abnormal sound query information in a query log, where the query log is set in the blockchain system.
9. A computer-readable storage medium on which a computer program is stored, wherein the computer program, when executed by a processor, carries out the steps of the method according to any one of claims 1 to 4.
10. An electronic device comprising a memory, a processor, and one or more programs, wherein the one or more programs are stored in the memory and configured to be executed by the processor to perform the steps of the method of any of claims 1-4.
CN202110068478.8A 2021-01-19 2021-01-19 Abnormal sound detection method and device and electronic equipment Active CN112397055B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110068478.8A CN112397055B (en) 2021-01-19 2021-01-19 Abnormal sound detection method and device and electronic equipment

Publications (2)

Publication Number Publication Date
CN112397055A true CN112397055A (en) 2021-02-23
CN112397055B CN112397055B (en) 2021-07-27

Family

ID=74624975

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110068478.8A Active CN112397055B (en) 2021-01-19 2021-01-19 Abnormal sound detection method and device and electronic equipment

Country Status (1)

Country Link
CN (1) CN112397055B (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108885133A (en) * 2016-04-01 2018-11-23 日本电信电话株式会社 Abnormal sound detects learning device, sound characteristic amount extraction device, abnormal sound sampling apparatus, its method and program
CN109785857A (en) * 2019-02-28 2019-05-21 桂林电子科技大学 Abnormal sound event recognition method based on MFCC+MP fusion feature
JP2019085237A (en) * 2017-11-07 2019-06-06 株式会社日立製作所 Abnormality detection device and elevator apparatus
CN111259921A (en) * 2019-12-19 2020-06-09 杭州安脉盛智能技术有限公司 Transformer sound anomaly detection method based on improved wavelet packet and deep learning
CN112116924A (en) * 2019-06-21 2020-12-22 株式会社日立制作所 Abnormal sound detection system, pseudo sound generation system, and pseudo sound generation method

Also Published As

Publication number Publication date
CN112397055B (en) 2021-07-27

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant