CN113671031B - Wall hollowing detection method and device - Google Patents
- Publication number: CN113671031B (application number CN202110958298.7A)
- Authority
- CN
- China
- Legal status: Active (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis)
Classifications
- G—PHYSICS; G01—MEASURING; TESTING; G01N—INVESTIGATING OR ANALYSING MATERIALS BY DETERMINING THEIR CHEMICAL OR PHYSICAL PROPERTIES
- G01N29/045—Analysing solids by imparting shocks to the workpiece and detecting the vibrations or the acoustic waves caused by the shocks
- G01N29/4472—Processing the detected response signal, e.g. electronic circuits specially adapted therefor; mathematical theories or simulation
- G01N29/4481—Processing the detected response signal, e.g. electronic circuits specially adapted therefor; neural networks
- G01N2291/0289—Indexing code associated with group G01N29/00; internal structure, e.g. defects, grain size, texture
Abstract
The application provides a wall hollowing detection method and device, wherein the method comprises the following steps: acquiring an audio signal generated by knocking a wall; acquiring a first feature and a second feature of the audio signal; splicing the first feature and the second feature to obtain a spliced feature; acquiring a recognition result corresponding to the spliced feature based on a preset audio detection network model; and determining whether the wall is in a hollowing state according to the recognition result. The method can improve the accuracy of wall hollowing detection while reducing labor cost.
Description
Technical Field
The embodiment of the disclosure relates to a wall hollowing detection method and device.
Background
The facing layer of a building structure system is subjected to natural forces and thermal action over a long period, which causes aging, damage and loosening of the facing-layer material, its connecting material and the supporting structure, and leads to whole or partial debonding, loosening, hollowing and even falling off of the facing layer, easily resulting in personal injury and property damage.
At present, the hollowing detection of walls is mainly realized by manual knocking with identification by ear, infrared imaging, and similar methods.
In the process of realizing the application, the inventors found that the manual knocking-and-listening method has a high labor cost and is inaccurate because it is affected by human factors, while infrared imaging detection is susceptible to external interference such as noise and heat sources, which also makes detection inaccurate.
Disclosure of Invention
In view of the above, the application provides a method and a device for detecting wall hollowing, which can improve the accuracy of wall hollowing detection on the premise of reducing labor cost.
In order to solve the technical problems, the technical scheme of the application is realized as follows:
In one embodiment, a wall hollowing detection method is provided, the method comprising:
acquiring an audio signal generated by knocking a wall;
acquiring a first characteristic and a second characteristic of the audio signal;
splicing the first characteristic and the second characteristic to obtain a spliced characteristic;
acquiring an identification result corresponding to the splicing characteristic based on a preset audio detection network model;
and determining whether the wall body is in a hollowing state according to the identification result.
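The five claimed steps can be sketched as a small pipeline. All function and component names below (detect_hollowing, the extractor callables, the dummy model) are illustrative assumptions, not part of the claimed implementation:

```python
import numpy as np

def detect_hollowing(audio, extract_first, extract_second, model):
    """Illustrative pipeline for the five claimed steps.

    audio          -- tapped-wall audio signal (1-D array)
    extract_first  -- callable returning the fundamental-frequency feature
    extract_second -- callable returning the second (speech/decoding/NN) feature
    model          -- callable mapping the spliced feature to a recognition result
    """
    first = extract_first(audio)               # step 2: first feature
    second = extract_second(audio)             # step 2: second feature
    spliced = np.concatenate([first, second])  # step 3: splice the features
    result = model(spliced)                    # step 4: model recognition
    return result == "hollow"                  # step 5: hollowing decision

# Dummy components just to exercise the flow
is_hollow = detect_hollowing(
    np.zeros(16000),
    extract_first=lambda a: np.ones(5),
    extract_second=lambda a: np.ones(3),
    model=lambda f: "hollow" if f.sum() > 4 else "solid",
)
```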
In another embodiment, there is provided a wall hollowing detection device, the device including: the device comprises a storage unit, a first acquisition unit, a second acquisition unit, a third acquisition unit and a determination unit;
the storage unit is used for storing a preset audio detection network model;
the first acquisition unit is used for acquiring an audio signal generated by knocking the wall body;
The second acquisition unit is used for acquiring the first characteristic and the second characteristic of the audio signal acquired by the first acquisition unit; splicing the first characteristic and the second characteristic to obtain a spliced characteristic;
The third obtaining unit is used for obtaining the recognition result corresponding to the splicing characteristic obtained by the second obtaining unit based on the preset audio detection network model stored by the storage unit;
The determining unit is used for determining whether the wall body is in a hollowing state or not according to the identification result acquired by the third acquiring unit.
In another embodiment, an electronic device is provided that includes a memory, a processor, and a computer program stored on the memory and executable on the processor, the processor implementing the steps of the wall hollowing detection method when executing the program.
In another embodiment, a computer readable storage medium is provided, on which a computer program is stored which, when executed by a processor, implements the steps of the wall hollowing detection method.
In another embodiment, a computer program product is provided, comprising a computer program which, when executed by a processor, implements the wall hollowing detection method.
As can be seen from the above technical solutions, in the above embodiments, by performing feature acquisition on an acquired audio signal, and splicing the acquired features as overall features of the audio signal, an identification result corresponding to the spliced features is acquired based on a preset audio detection network model; and determining whether the wall body is in a hollowing state according to the identification result. According to the scheme, the accuracy of wall hollowing detection can be improved on the premise of reducing labor cost.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings that are needed in the description of the embodiments will be briefly described below, it being obvious that the drawings in the following description are only some embodiments of the present application, and that other drawings may be obtained according to these drawings without inventive effort to a person skilled in the art.
FIG. 1 is a schematic flow chart of acquiring a preset audio detection network model in an embodiment of the present application;
FIG. 2 is a schematic diagram of a residual CRNN-GLU in an embodiment of the application;
FIG. 3 is a schematic diagram of a wall hollowing detection process according to an embodiment of the application;
FIG. 4 is a schematic diagram of an audio signal acquisition device;
FIG. 5 is a schematic diagram of a wall hollowing detection process in a second embodiment of the present application;
FIG. 6 is a schematic diagram of a wall hollowing detection device according to an embodiment of the present application;
Fig. 7 is a schematic diagram of an entity structure of an electronic device according to an embodiment of the present invention.
Detailed Description
The following description of the embodiments of the present application will be made clearly and completely with reference to the accompanying drawings, in which it is apparent that the embodiments described are only some embodiments of the present application, but not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the application without making any inventive effort, are intended to be within the scope of the application.
The terms "first," "second," "third," "fourth" and the like in the description and in the claims and in the above drawings, if any, are used for distinguishing between similar objects and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used may be interchanged where appropriate such that the embodiments of the invention described herein may be implemented, for example, in sequences other than those illustrated or otherwise described herein. Furthermore, the terms "comprise" and "have," as well as any variations thereof, are intended to cover a non-exclusive inclusion. For example, a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those elements but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
The technical scheme of the invention is described in detail below by specific examples. The following embodiments may be combined with each other, and some embodiments may not be repeated for the same or similar concepts or processes.
The embodiment of the application provides a wall hollowing detection method, which comprises the steps of obtaining characteristics of an acquired audio signal, splicing the obtained characteristics to be used as the overall characteristics of the audio signal, and obtaining an identification result corresponding to the spliced characteristics based on a preset audio detection network model; and determining whether the wall body is in a hollowing state according to the identification result. According to the scheme, the accuracy of wall hollowing detection can be improved on the premise of reducing labor cost.
The preset audio detection network model in the embodiment of the application can be built in advance for storage, and can be directly used when needed, or can be built and used when needed.
The method for establishing the preset audio detection network model specifically comprises the following steps:
referring to fig. 1, fig. 1 is a flowchart illustrating a process of acquiring a preset audio detection network model according to an embodiment of the present application. The method comprises the following specific steps:
step 101, obtaining a training sample.
The obtained training samples can be unlabeled, partially labeled or fully labeled, and the initial audio detection network model can be used for labeling unlabeled samples as training samples.
Step 102, acquiring a first feature and a second feature of a training sample, and splicing the first feature and the second feature to obtain a spliced feature.
Wherein the first feature is a fundamental frequency feature;
The fundamental frequency features are used to represent pitch characteristics, and several dimensions of them can be selected and extracted according to the practical application; for example, the following five fundamental frequency features can be selected:
1) The base frequency value of the voice frame;
2) The first-order difference value of the fundamental frequency of the adjacent frames;
3) Fundamental frequency second order difference values of adjacent frames;
4) The length of the current continuous base band;
5) The difference between the fundamental frequency value of the current frame and the average fundamental frequency of the last N frames of the previous continuous voiced segment; N is typically chosen to be 10.
The fundamental frequency characteristics of the five dimensions can obtain better characteristic combinations, but the method involved in the implementation of the application is not limited to the five-dimensional characteristic combinations, and the increase or decrease of the number of the characteristics does not affect the application of the application.
The method for extracting the fundamental frequency features is an example implementation manner, and the embodiment of the application is not limited to the implementation manner.
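As one hedged realization, the five fundamental-frequency features above can be stacked per frame as below. The frame layout (0 marking unvoiced frames) and the reading of feature 5 as "last N frames of the previous voiced segment" are assumptions for illustration; pitch tracking itself is assumed to be done elsewhere:

```python
import numpy as np

def f0_features(f0, n=10):
    """Sketch of the five per-frame fundamental-frequency features.

    f0 -- per-frame F0 values, with 0 marking unvoiced frames.
    """
    d1 = np.diff(f0, prepend=f0[0])      # 2) first-order F0 difference
    d2 = np.diff(d1, prepend=d1[0])      # 3) second-order F0 difference

    run = np.zeros_like(f0)              # 4) length of current voiced run
    prev_tail = np.zeros_like(f0)        # 5) diff vs. mean of last n frames
    last_band = []                       #    of the previous voiced segment
    cur = 0
    for i, v in enumerate(f0):
        if v > 0:
            cur += 1
        else:
            if cur:                      # a voiced segment just ended
                last_band = list(f0[i - min(cur, n):i])
            cur = 0
        run[i] = cur
        if last_band and v > 0:
            prev_tail[i] = v - np.mean(last_band)
    return np.stack([f0, d1, d2, run, prev_tail], axis=1)
```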
The second characteristic is any one of the following characteristics or characteristics obtained by splicing any at least two of the following characteristics:
speech features, decoding features, neural network features.
That is, the second feature may be a speech feature, a decoding feature or a neural network feature;
or a feature obtained by splicing the speech feature and the decoding feature;
or a feature obtained by splicing the speech feature and the neural network feature;
or a feature obtained by splicing the decoding feature and the neural network feature;
or a feature obtained by splicing the speech feature, the decoding feature and the neural network feature.
When the second feature is a spliced feature, the constituent features are first acquired separately and then spliced to obtain the spliced feature. For example, if the second feature is obtained by splicing the decoding feature and the neural network feature, the decoding feature and the neural network feature of the sample are acquired respectively and then spliced.
Wherein,
the speech features are, for example: Filter Bank (Fbank) features, Mel-frequency cepstral coefficients (Mel-Frequency Cepstral Coefficients, MFCC), etc.;
the neural network features are, for example: bottleneck features, DNN features, CNN features, LSTM features, etc.;
the decoding features are, for example: threshold, maximum, minimum, confidence, lattice distribution statistics, etc. of sentences.
If any one of the acquired voice features, decoding features and neural network features has multiple features, the features of the same type can be spliced.
The method for acquiring the voice characteristics, the decoding characteristics and the neural network characteristics is not limited in the embodiment of the application, and the method for acquiring the voice characteristics, the decoding characteristics and the neural network characteristics is not exemplified.
In the embodiment of the application, when the first feature and the second feature are spliced, the order of the two features is not limited. For example, if the first feature is [A1, A2, …, An] and the second feature is [B1, B2, …, Bm], the spliced feature can be [A1, A2, …, An, B1, B2, …, Bm] or [B1, B2, …, Bm, A1, A2, …, An].
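A minimal sketch of the splicing step; the concrete feature values are placeholders:

```python
import numpy as np

first = np.array([1.0, 2.0, 3.0])   # [A1, A2, ..., An]
second = np.array([10.0, 20.0])     # [B1, B2, ..., Bm]

spliced_ab = np.concatenate([first, second])  # [A..., B...]
spliced_ba = np.concatenate([second, first])  # [B..., A...]
```

Either order is usable, as long as the same order is used consistently when training the preset model and when detecting.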
And step 103, establishing an initial audio detection network model.
The initial audio detection network model is built based on any one of the following neural networks: a deep neural network (Deep Neural Networks, DNN), a convolutional recurrent neural network (Convolutional Recurrent Neural Network, CRNN), a convolutional neural network (Convolutional Neural Network, CNN), or a long short-term memory (Long Short-Term Memory, LSTM) network, i.e., a recurrent neural network;
When the neural network used to build the initial audio detection network model is a CRNN, the CRNN adopts a residual convolutional recurrent network (CRNN)-gated linear unit (Gated Linear Unit, GLU) structure;
when residual connection is performed in the residual convolutional recurrent network-gated linear unit structure, the network input before the gated linear unit is added to the result of the gated linear unit operation over the two convolutional networks, and the result after residual connection is used as the input of the next CNN network.
In the embodiment of the present application, taking the case that the convolutional recurrent neural network includes two residual connections, the residual convolutional recurrent network-gated linear unit structure includes: the system comprises a first residual connecting network, a second residual connecting network, a first convolutional neural network, a second convolutional neural network, a long-short-term memory (LSTM) network, a full-connection layer and an output layer;
The first residual error connection network comprises a third convolutional neural network, a fourth convolutional neural network and a first gating linear unit, wherein the third convolutional neural network and the fourth convolutional neural network are used for receiving network inputs, the outputs of the third convolutional neural network and the fourth convolutional neural network are input to the first gating linear unit, and the output of the first gating linear unit is added with the network inputs to be used as the output of the first residual error connection network;
the first convolutional neural network is in signal connection with the first residual error connection network and is used for receiving the output of the first residual error connection network;
the second residual error connection network is in signal connection with the first convolution neural network and is used for receiving the output of the first convolution neural network, the second residual error connection network comprises a fifth convolution neural network, a sixth convolution neural network and a second gating linear unit, the fifth convolution neural network and the sixth convolution neural network are used for receiving the output of the first convolution neural network, the output of the fifth convolution neural network and the output of the sixth convolution neural network are input to the second gating linear unit, and the output of the second gating linear unit is added with the output of the first convolution neural network to be used as the output of the second residual error connection network;
the second convolution neural network is in signal connection with the second residual error connection network and receives the output of the second residual error connection network, and the output of the second convolution neural network is used as the network input of the long-term and short-term memory network;
The network output of the long-short-period memory network is used as the input of the full-connection layer;
The output of the full connection layer is used as the input of the output layer;
The output of the output layer is taken as the output of the residual convolution recursive network-gated linear unit structure.
The first convolutional neural network to the sixth convolutional neural network may be the same network or may be different networks, which is not limited in the embodiment of the present application.
The network structure of a convolutional recurrent neural network is illustrated below in conjunction with the figures.
Referring to fig. 2, fig. 2 is a schematic diagram of a residual CRNN-GLU structure according to an embodiment of the application. From top to bottom in fig. 2, the first gated linear unit corresponds to the first residual connection network and the second gated linear unit corresponds to the second residual connection network; the two CNNs in the first residual connection network are the third and fourth convolutional neural networks; the two CNNs in the second residual connection network are the fifth and sixth convolutional neural networks; the CNN between the first and second residual connection networks is the first convolutional neural network; and the CNN between the second residual connection network and the long short-term memory (LSTM) network is the second convolutional neural network.
The following four parts of realization processes are provided for the structure:
a first part:
Loss function: because there may be overlap between events, embodiments of the present application may use a binary cross-entropy (BCE) loss function (LOSS), defined as follows:

LOSS = -(1/N) * sum_{n=1..N} [ O_n * log(P_n) + (1 - O_n) * log(1 - P_n) ]

wherein P_n represents the predicted signal obtained through the residual CRNN-GLU structure at the n-th frame, O_n represents the correct signal corresponding to the audio at the n-th frame, and N represents the total number of frames in one sample.
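A minimal numpy sketch of this frame-wise BCE loss (the clipping constant is an implementation detail added here for numerical safety, not part of the patent):

```python
import numpy as np

def bce_loss(P, O, eps=1e-12):
    """Frame-wise binary cross-entropy over predictions P and labels O."""
    P = np.clip(P, eps, 1.0 - eps)  # guard the logarithms against 0 and 1
    return -np.mean(O * np.log(P) + (1.0 - O) * np.log(1.0 - P))
```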
A second part:
GLU operation: in the structure of the network, a gating operation GLU (Gated Linear Unit) is used, defined as follows:

O_next = (W * X + b) ⊙ σ(V * X + c)

wherein O_next represents the output of the gating; W and V represent the convolution kernels of the two CNNs (the convolution kernels of the two CNN networks can be the same or different); b and c represent the offsets; X represents the input feature before the GLU; σ is the nonlinear sigmoid function; and ⊙ is the matrix operator that multiplies the values at corresponding positions of the matrices.
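The GLU operation can be sketched as follows; plain matrix products stand in for the two CNN convolutions, which is a simplification for illustration rather than the patented implementation:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def glu(X, W, b, V, c):
    """O_next = (W*X + b) ⊙ σ(V*X + c), with dense products as stand-in CNNs."""
    linear = X @ W + b           # content path (first CNN)
    gate = sigmoid(X @ V + c)    # gate path (second CNN), values in (0, 1)
    return linear * gate         # element-wise product ⊙
```

Driving the gate bias c strongly negative closes the gate (output near 0); driving it strongly positive passes the content path through almost unchanged.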
Third section: when implementing the residual connection (Residual Connection, ResNet) in the embodiment of the present application, as shown in fig. 2, the network input before the GLU is added to the result of the GLU operation over the two convolutional neural networks, specifically as follows:

RES_i = I_i + O_i

wherein I_i denotes the feature input to the i-th GLU, and O_i denotes the output of the i-th GLU.
The result RES_i of the residual connection is taken as the input of the next CNN network.
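The residual connection around one GLU can be sketched as below, again with dense matrix products standing in for the CNNs (an illustrative assumption, not the patented network). Driving the gate toward zero shows why the connection helps deep stacks: the block degenerates to the identity:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def residual_glu_block(I, W, b, V, c):
    """RES_i = I_i + O_i, where O_i is the output of the i-th GLU."""
    O = (I @ W + b) * sigmoid(I @ V + c)  # GLU output
    return I + O                          # residual connection

I = np.random.default_rng(0).normal(size=(4, 6))
# With the gate driven to ~0, the GLU contributes nothing and the
# block reduces to the identity mapping on its input.
identity_like = residual_glu_block(I, np.eye(6), 0.0, np.zeros((6, 6)), -50.0)
```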
As shown in fig. 2, the embodiment of the present application adopts a residual convolutional recurrent network-gated linear unit structure with two residual connections, but the implementation is not limited to two residual connections;
if the real-time requirement is not very high, more residual parts can be added and the network can be designed to be very deep, which increases accuracy.
Fourth part:
After the CNN networks, a Pooling dimension-reduction operation may be applied; the result is then fed into an LSTM (a Bi-LSTM can be used in non-real-time scenarios), followed by a fully connected (FNN) layer, and finally the output layer (OUTPUT); the loss function is then calculated, and the parameters are updated and adjusted by the SGD (stochastic gradient descent) method.
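The loss-then-gradient-update cycle can be illustrated with a tiny logistic-regression stand-in for the full residual CRNN-GLU model. The toy data, learning rate and step count are all assumptions, and full-batch gradient steps are used here instead of true stochastic mini-batches:

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy spliced features: class 1 ("hollow") shifted away from class 0 ("solid").
X = np.vstack([rng.normal(0, 1, (50, 8)), rng.normal(2, 1, (50, 8))])
y = np.concatenate([np.zeros(50), np.ones(50)])

w, b, lr = np.zeros(8), 0.0, 0.1

def forward(X, w, b):
    return 1.0 / (1.0 + np.exp(-(X @ w + b)))   # sigmoid output in (0, 1)

def bce(p, y, eps=1e-12):
    p = np.clip(p, eps, 1 - eps)
    return -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))

loss_before = bce(forward(X, w, b), y)
for _ in range(200):                  # repeated gradient steps
    p = forward(X, w, b)
    w -= lr * X.T @ (p - y) / len(y)  # gradient of BCE w.r.t. w
    b -= lr * np.mean(p - y)          # gradient of BCE w.r.t. b
loss_after = bce(forward(X, w, b), y)
```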
In the embodiment of the present application, the implementation manners of the first, second and fourth parts are not limited. When residual connection is implemented in the third part, the network input before the GLU is added to the result of the GLU operation over the two convolutions, and the residual connection result RES_i is used as the input of the next CNN network. This scheme can improve the accuracy of network recognition.
The output of the model is either a hollowing state or a non-hollowing state.
And 104, training the initial audio detection network model by using the spliced features and labels of the corresponding training samples to obtain a preset audio detection network model.
Thus, the establishment of the preset audio detection network model is completed.
The following describes the wall hollowing detection process in detail by combining the drawings.
Example 1
Referring to fig. 3, fig. 3 is a schematic diagram of a wall hollowing detection flow in accordance with an embodiment of the present application. The method comprises the following specific steps:
Step 301, an audio signal generated by knocking a wall is obtained.
The audio signal of knocking the wall may be collected on the spot, or collected and stored in advance.
The acquisition device for acquiring audio signals is not limited, and may be, but not limited to, the acquisition device shown in fig. 4, and fig. 4 is a schematic diagram of an audio signal acquisition device.
The small circle parts in fig. 4 are microphone pickup devices; 1, 2, 4, 6 or 8 microphones can be selected according to cost. The collection device in fig. 4 is illustrated with 8 microphones.
The acquisition equipment is provided with a liquid crystal display screen, and can display the judging result and other matters which can assist in judging in real time.
The computing function of the acquisition device can be selected to be nested in a device chip or partially adopt cloud computing and return a result.
The acquisition equipment is provided with a handle, so that the audio signal can be conveniently acquired.
Step 302, extracting a first feature and a second feature of the audio signal.
The first feature is a fundamental frequency feature;
the second characteristic is any one of the following characteristics or characteristics obtained by splicing any at least two of the following characteristics:
speech features, decoding features, neural network features.
And step 303, splicing the first feature and the second feature to obtain a spliced feature.
In the embodiment of the application, when the first feature and the second feature are spliced, the sequence of the two features is not limited, and if the first feature is [ A 1,A2,……,An ] and the second feature is [ B 1,B2,……,Bm ], the spliced feature can be [ A 1,A2,……,An,B1,B2,……,Bm ] or [ B 1,B2,……,Bm,A1,A2,……,An ]; but needs to be consistent with the sequence of feature stitching when the preset audio detection network model is established.
The first characteristic and the second characteristic are spliced to be used as characteristic information of the audio signal, the multidimensional characteristic of the audio signal is considered, and the detection result can be acquired more accurately.
And step 304, acquiring a recognition result corresponding to the splicing characteristic based on a preset audio detection network model.
Wherein, the recognition result is: in a hollowing state or not in a hollowing state.
And step 305, determining whether the wall body is in a hollowing state according to the identification result.
When the recognition result is whether the wall is in a hollowing state, determining whether the wall is in a hollowing state according to the recognition result includes:
when the recognition result is the hollowing state, determining that the wall is in the hollowing state;
and when the recognition result is the non-hollowing state, determining that the wall is not in the hollowing state.
In the embodiment of the application, the first feature and the second feature of the acquired audio signal are extracted and spliced as the overall feature of the audio signal, and the recognition result corresponding to the spliced feature is obtained based on a preset audio detection network model; whether the wall is in a hollowing state is then determined according to the recognition result. According to the scheme, the accuracy of wall hollowing detection can be improved while reducing labor cost.
Example two
Referring to fig. 5, fig. 5 is a schematic diagram of a wall hollowing detection flow in a second embodiment of the application. The method comprises the following specific steps:
Step 501, an audio signal generated by knocking a wall is obtained.
The audio signal of knocking the wall may be collected on the spot, or collected and stored in advance.
The embodiment of the application does not limit the acquisition equipment for acquiring the audio signals.
Step 502, performing endpoint detection on the audio signal and performing noise reduction processing.
Endpoint detection, also called voice activity detection (Voice Activity Detection, VAD), aims to distinguish speech regions from non-speech regions. Endpoint detection accurately locates the starting point and ending point of speech in noisy audio, removes the silent part and the noise part, and finds the truly effective content of a piece of audio.
VAD algorithms can be roughly divided into three categories: threshold-based VAD, VAD as a classifier, and model-based VAD.
Threshold-based VAD: time-domain features (short-time energy, short-time zero-crossing rate, etc.) or frequency-domain features (MFCC, spectral entropy, etc.) are extracted, and a reasonably set threshold separates speech from non-speech. This is the conventional VAD method.
VAD as a classifier: speech detection can be treated as a speech/non-speech binary classification problem, and a classifier is then trained with machine-learning methods to detect speech.
Model-based VAD: a complete acoustic model (the modeling unit may be coarse-grained) distinguishes speech segments from non-speech segments on the basis of decoding, using global information.
The embodiment of the application is not limited to a specific implementation mode of endpoint detection.
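As an illustration of the threshold-based category above, a minimal short-time-energy VAD can be sketched as follows. The frame length, hop size, and threshold rule are illustrative assumptions, not values taken from this application:

```python
import numpy as np

def energy_vad(signal, frame_len=256, hop=128, threshold=None):
    """Threshold-based VAD sketch: frame the signal, compute the
    short-time energy of each frame, and mark frames whose energy
    exceeds the threshold as active (speech / knock)."""
    frames = [signal[i:i + frame_len]
              for i in range(0, len(signal) - frame_len + 1, hop)]
    energy = np.array([float(np.sum(np.square(f, dtype=float))) for f in frames])
    if threshold is None:
        # Illustrative rule: a fixed fraction of the peak frame energy.
        threshold = 0.1 * energy.max()
    return energy > threshold
```

The endpoints would then be the first and last active frames; a practical implementation would add hangover smoothing and possibly a zero-crossing-rate check.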
In a specific implementation, the processing may be noise reduction or dereverberation.
In the specific implementation of the application, performing noise reduction or dereverberation after endpoint detection removes external interference from the audio signal to a greater extent.
The method further comprises the steps of:
and if the signal intensity of the audio signal is smaller than a preset value, performing enhancement processing on the audio signal.
That is, when the signal strength of the audio signal is relatively small, enhancement processing is performed.
Salient Fbank features and fundamental-frequency features can be readily extracted from the preprocessed audio signal, which facilitates recognition of the final audio signal.
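A minimal sketch of the conditional enhancement step above; the RMS thresholds are illustrative assumptions, since the application does not specify the preset value:

```python
import numpy as np

def enhance_if_weak(signal, min_rms=0.05, target_rms=0.1):
    """If the signal's RMS level is below the preset value, scale it
    up to a target level; otherwise return it unchanged. Both levels
    here are assumed values for illustration only."""
    rms = float(np.sqrt(np.mean(np.square(signal, dtype=float))))
    if 0.0 < rms < min_rms:
        return signal * (target_rms / rms)
    return signal
```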
Step 503, extracting the first feature and the second feature of the audio signal.
The first feature is a fundamental frequency feature;
the second characteristic is any one of the following characteristics or characteristics obtained by splicing any at least two of the following characteristics:
speech features, decoding features, neural network features.
Step 504, splicing the first feature and the second feature to obtain a spliced feature.
In this embodiment of the application, the order of the two features during splicing is not limited: if the first feature is [A1, A2, …, An] and the second feature is [B1, B2, …, Bm], the spliced feature may be [A1, A2, …, An, B1, B2, …, Bm] or [B1, B2, …, Bm, A1, A2, …, An]; however, the order must be consistent with the feature-splicing order used when the preset audio detection network model was established.
Splicing the first feature and the second feature to form the feature information of the audio signal takes the multidimensional characteristics of the audio signal into account, so the detection result can be acquired more accurately.
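The splicing in step 504 is plain vector concatenation. A sketch with illustrative values (the dimensions n and m depend on the actual feature extractors, and the numbers below are not real extractor output):

```python
import numpy as np

# Illustrative per-frame features (assumed values, not real extractor output):
first = np.array([220.0, 230.5, 225.1])       # fundamental-frequency feature [A1..An]
second = np.array([0.12, 0.48, 0.33, 0.27])   # e.g. Fbank feature [B1..Bm]

# Either order is valid, but it must match the order used when the
# preset audio detection network model was trained.
spliced = np.concatenate([first, second])     # [A1..An, B1..Bm]
```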
Step 505, acquiring the recognition result corresponding to the spliced feature based on a preset audio detection network model.
Wherein the recognition result is: the hollowing state or the non-hollowing state.
Step 506, determining whether the wall is in a hollowing state according to the recognition result.
When the recognition result is either the hollowing state or the non-hollowing state, determining whether the wall is in a hollowing state according to the recognition result comprises the following steps:
when the recognition result is the hollowing state, determining that the wall is in the hollowing state;
and when the recognition result is the non-hollowing state, determining that the wall is not in the hollowing state.
In this embodiment of the application, the acquired audio signal is preprocessed, the first feature and the second feature are extracted from the preprocessed audio signal and spliced to serve as the overall feature of the audio signal, the recognition result corresponding to the spliced feature is obtained based on a preset audio detection network model, and whether the wall is in a hollowing state is determined according to the recognition result. This scheme can improve the accuracy of wall hollowing detection while reducing labor cost.
Based on the same inventive concept, the embodiment of the application also provides a wall hollowing detection device. Referring to fig. 6, fig. 6 is a schematic structural diagram of a wall hollowing detection device according to an embodiment of the present application. The device comprises: a storage unit 601, a first acquisition unit 602, a second acquisition unit 603, a third acquisition unit 604, and a determination unit 605;
A storage unit 601, configured to store a preset audio detection network model;
a first obtaining unit 602, configured to obtain an audio signal generated by knocking a wall;
A second acquisition unit 603 for extracting the first feature and the second feature of the audio signal acquired by the first acquisition unit 602; splicing the first characteristic and the second characteristic to obtain a spliced characteristic;
A third obtaining unit 604, configured to obtain, based on the preset audio detection network model stored in the storage unit 601, a recognition result corresponding to the splicing feature obtained by the second obtaining unit 603;
A determining unit 605 is configured to determine whether the wall is in a hollowing state according to the identification result acquired by the third acquiring unit 604.
In a further embodiment of the present invention,
The first feature is a fundamental frequency feature;
the second characteristic is any one of the following characteristics or characteristics obtained by splicing any at least two of the following characteristics:
speech features, decoding features, neural network features.
In another embodiment, the apparatus comprises: a processing unit 606;
The processing unit 606 is configured to perform endpoint detection on the audio signal after the first obtaining unit 602 obtains the audio signal generated by knocking the wall, and before the second obtaining unit 603 obtains the first feature and the second feature of the audio signal, and perform noise reduction processing.
In a further embodiment of the present invention,
The storage unit is specifically configured to acquire training samples; acquire the first feature and the second feature of the audio signal in each training sample, and splice the first feature and the second feature to obtain a spliced feature; establish an initial audio detection network model, the initial audio detection network model being established based on any one of the following neural networks: deep neural network (DNN), convolutional recurrent neural network (CRNN), convolutional neural network (CNN), and long short-term memory network (LSTM); and train the initial audio detection network model with the spliced features and the labels of the corresponding training samples to obtain the preset audio detection network model.
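The training step can be illustrated with a deliberately simplified stand-in. The application trains a DNN/CRNN/CNN/LSTM; a logistic classifier over spliced features is used here only because it shows the same fit-then-predict flow in a few lines. All data below is synthetic:

```python
import numpy as np

def train_detector(feats, labels, lr=0.5, epochs=300):
    """Fit a binary classifier on spliced features with
    hollowing (1) / not-hollowing (0) labels via gradient descent.
    A stand-in for the neural-network training in the application."""
    w = np.zeros(feats.shape[1])
    b = 0.0
    for _ in range(epochs):
        p = 1.0 / (1.0 + np.exp(-(feats @ w + b)))  # predicted probability
        grad = (p - labels) / len(labels)            # logistic-loss gradient
        w -= lr * feats.T @ grad
        b -= lr * grad.sum()
    return w, b

def predict(feats, w, b):
    """Label a batch of spliced features: True = hollowing."""
    return (feats @ w + b) > 0.0
```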
In a further embodiment of the present invention,
When the neural network used to establish the initial audio detection network model is a CRNN, the CRNN adopts a residual convolutional recurrent network-gated linear unit structure;
when residual connection is performed in the residual convolutional recurrent network-gated linear unit structure, the network input ahead of the two convolutional networks is added to the output of the gated linear unit they feed, and the result after residual connection is used as the input of the next CNN network.
In another embodiment, the residual convolutional recurrent network-gated linear unit structure comprises: a first residual connection network, a second residual connection network, a first convolutional neural network, a second convolutional neural network, a long short-term memory (LSTM) network, a fully connected layer, and an output layer;
The first residual connection network comprises a third convolutional neural network, a fourth convolutional neural network, and a first gated linear unit; the third convolutional neural network and the fourth convolutional neural network receive the network input, their outputs are input to the first gated linear unit, and the output of the first gated linear unit is added to the network input to serve as the output of the first residual connection network;
the first convolutional neural network is in signal connection with the first residual connection network and receives the output of the first residual connection network;
the second residual connection network is in signal connection with the first convolutional neural network and receives the output of the first convolutional neural network; the second residual connection network comprises a fifth convolutional neural network, a sixth convolutional neural network, and a second gated linear unit; the fifth and sixth convolutional neural networks receive the output of the first convolutional neural network, their outputs are input to the second gated linear unit, and the output of the second gated linear unit is added to the output of the first convolutional neural network to serve as the output of the second residual connection network;
the second convolutional neural network is in signal connection with the second residual connection network and receives the output of the second residual connection network, and the output of the second convolutional neural network serves as the network input of the long short-term memory network;
the network output of the long short-term memory network serves as the input of the fully connected layer;
the output of the fully connected layer serves as the input of the output layer;
the output of the output layer serves as the output of the residual convolutional recurrent network-gated linear unit structure.
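The residual GLU connection described above (outputs of two parallel convolutional networks gated, then added back to the block input) can be sketched in NumPy. For brevity the two "convolutional networks" are reduced to pointwise (1×1) transforms, which keeps the gating and residual arithmetic while omitting real convolution kernels, the LSTM, and the fully connected/output layers:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def residual_glu_block(x, w_a, w_b):
    """One residual connection network: two parallel transforms
    (stand-ins for the two convolutional neural networks) feed a
    gated linear unit, and the block input is added back to the
    gate's output; the result becomes the input of the next network."""
    a = x @ w_a                  # first branch (values)
    g = x @ w_b                  # second branch (gate)
    return x + a * sigmoid(g)    # gated linear unit plus residual connection
```

Stacking two such blocks with intermediate convolutions, followed by an LSTM, a fully connected layer, and an output layer, mirrors the structure of the first and second residual connection networks described above.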
In a further embodiment of the present invention,
The determining unit 605 is specifically configured to, when the recognition result is either the hollowing state or the non-hollowing state, determine whether the wall is in a hollowing state according to the recognition result, which includes: when the recognition result is the hollowing state, determining that the wall is in the hollowing state; and when the recognition result is the non-hollowing state, determining that the wall is not in the hollowing state.
The units of the above embodiments may be integrated or deployed separately; they may be combined into one unit or further split into multiple sub-units.
In another embodiment, there is also provided an electronic device including a memory, a processor, and a computer program stored on the memory and executable on the processor, the processor implementing the steps of the wall hollowing detection method when executing the program.
In another embodiment, a computer readable storage medium having stored thereon computer instructions which when executed by a processor perform the steps of the wall hollowing detection method is also provided.
In another embodiment, a computer program product is provided, comprising a computer program which, when executed by a processor, implements the wall hollowing detection method.
Fig. 7 is a schematic diagram of an entity structure of an electronic device according to an embodiment of the present invention. As shown in fig. 7, the electronic device may include: processor (Processor) 710, communication interface (Communications Interface) 720, memory (Memory) 730, and communication bus 740, wherein Processor 710, communication interface 720, memory 730 communicate with each other via communication bus 740. Processor 710 may call logic instructions in memory 730 to perform the following method:
acquiring an audio signal generated by knocking a wall;
acquiring a first characteristic and a second characteristic of the audio signal;
splicing the first characteristic and the second characteristic to obtain a spliced characteristic;
acquiring an identification result corresponding to the splicing characteristic based on a preset audio detection network model;
and determining whether the wall body is in a hollowing state according to the identification result.
Further, the logic instructions in the memory 730 may be implemented in the form of software functional units and, when sold or used as a standalone product, may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present invention, in essence or in the part contributing to the prior art, may be embodied in the form of a software product stored in a storage medium and comprising several instructions for causing a computer device (which may be a personal computer, a server, a network device, etc.) to perform all or part of the steps of the method according to the embodiments of the present invention. The aforementioned storage medium includes: a USB flash drive, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, an optical disk, or other media capable of storing program code.
The apparatus embodiments described above are merely illustrative; the units described as separate components may or may not be physically separate, and the components displayed as units may or may not be physical units; they may be located in one place or distributed over multiple network nodes. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment. Those of ordinary skill in the art can understand and implement the present invention without undue burden.
From the above description of the embodiments, it will be apparent to those skilled in the art that the embodiments may be implemented by means of software plus necessary general hardware platforms, or of course may be implemented by means of hardware. Based on this understanding, the foregoing technical solution may be embodied essentially or in a part contributing to the prior art in the form of a software product, which may be stored in a computer readable storage medium, such as ROM/RAM, a magnetic disk, an optical disk, etc., including several instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) to execute the method described in the respective embodiments or some parts of the embodiments.
The foregoing are only preferred embodiments of the present invention and are not intended to limit it; any modification, equivalent replacement, improvement, or the like made within the spirit and principles of the present invention shall fall within its scope of protection.
Claims (8)
1. A method for detecting wall hollowness, the method comprising:
acquiring an audio signal generated by knocking a wall;
acquiring a first characteristic and a second characteristic of the audio signal;
splicing the first characteristic and the second characteristic to obtain a spliced characteristic;
acquiring an identification result corresponding to the splicing characteristic based on a preset audio detection network model;
Determining whether the wall body is in a hollowing state according to the identification result;
The method further comprises the steps of:
obtaining a training sample;
acquiring first characteristics and second characteristics of the audio signals in the training samples, and splicing the first characteristics and the second characteristics to obtain spliced characteristics;
Establishing an initial audio detection network model;
training the initial audio detection network model by using the splicing characteristics and labels of corresponding training samples to acquire the preset audio detection network model;
when the neural network used to establish the initial audio detection network model is a convolutional recurrent neural network, the convolutional recurrent neural network adopts a residual convolutional recurrent network-gated linear unit structure;
when residual connection is performed in the residual convolutional recurrent network-gated linear unit structure, the network input ahead of the two convolutional networks is added to the output of the gated linear unit they feed, and the result after residual connection is used as the input of the next convolutional neural network.
2. The method according to claim 1, wherein,
The first feature is a fundamental frequency feature;
the second characteristic is any one of the following characteristics or characteristics obtained by splicing any at least two of the following characteristics:
speech features, decoding features, neural network features.
3. The method according to claim 1, wherein after the acquiring of the audio signal generated by knocking the wall and before the acquiring of the first feature and the second feature of the audio signal, the method further comprises: performing endpoint detection and noise reduction processing on the audio signal.
4. The method according to claim 1, wherein,
The residual convolutional recurrent network-gated linear unit structure comprises: a first residual connection network, a second residual connection network, a first convolutional neural network, a second convolutional neural network, a long short-term memory network, a fully connected layer, and an output layer;
wherein the first residual connection network comprises a third convolutional neural network, a fourth convolutional neural network, and a first gated linear unit; the third convolutional neural network and the fourth convolutional neural network are configured to receive a network input, the outputs of the third convolutional neural network and the fourth convolutional neural network are input to the first gated linear unit, and the output of the first gated linear unit is added to the network input as the output of the first residual connection network;
the first convolutional neural network is in signal connection with the first residual connection network and is configured to receive the output of the first residual connection network;
the second residual connection network is in signal connection with the first convolutional neural network and is configured to receive the output of the first convolutional neural network; the second residual connection network comprises a fifth convolutional neural network, a sixth convolutional neural network, and a second gated linear unit; the fifth convolutional neural network and the sixth convolutional neural network are configured to receive the output of the first convolutional neural network, the outputs of the fifth convolutional neural network and the sixth convolutional neural network are input to the second gated linear unit, and the output of the second gated linear unit is added to the output of the first convolutional neural network as the output of the second residual connection network;
the second convolutional neural network is in signal connection with the second residual connection network and receives the output of the second residual connection network, and the output of the second convolutional neural network serves as the network input of the long short-term memory network;
the network output of the long short-term memory network serves as the input of the fully connected layer;
the output of the fully connected layer serves as the input of the output layer;
and the output of the output layer serves as the output of the residual convolutional recurrent network-gated linear unit structure.
5. The method according to any one of claims 1 to 4, wherein,
when the recognition result is either the hollowing state or the non-hollowing state, determining whether the wall is in a hollowing state according to the recognition result comprises:
when the recognition result is the hollowing state, determining that the wall is in the hollowing state;
and when the recognition result is the non-hollowing state, determining that the wall is not in the hollowing state.
6. An electronic device comprising a memory, a processor, and a computer program stored on the memory and executable on the processor, wherein the processor implements the method of any one of claims 1-5 when executing the computer program.
7. A computer readable storage medium, on which a computer program is stored, characterized in that the computer program, when being executed by a processor, implements the method of any of claims 1-5.
8. A computer program product comprising a computer program, characterized in that the computer program, when executed by a processor, implements the method of any of claims 1-5.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110958298.7A CN113671031B (en) | 2021-08-20 | 2021-08-20 | Wall hollowing detection method and device |
Publications (2)
Publication Number | Publication Date |
---|---|
CN113671031A CN113671031A (en) | 2021-11-19 |
CN113671031B true CN113671031B (en) | 2024-06-21 |
Family
ID=78544181
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202110958298.7A Active CN113671031B (en) | 2021-08-20 | 2021-08-20 | Wall hollowing detection method and device |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN113671031B (en) |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN106841403A (en) * | 2017-01-23 | 2017-06-13 | 天津大学 | A kind of acoustics glass defect detection method based on neutral net |
CN110992959A (en) * | 2019-12-06 | 2020-04-10 | 北京市科学技术情报研究所 | Voice recognition method and system |
CN111798840A (en) * | 2020-07-16 | 2020-10-20 | 中移在线服务有限公司 | Voice keyword recognition method and device |
CN112034044A (en) * | 2020-09-01 | 2020-12-04 | 南京邮电大学 | Intelligent recognition and detection device for hollowing of ceramic tile based on neural network |
Family Cites Families (29)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP2002318155A (en) * | 2001-04-24 | 2002-10-31 | Fuji Xerox Co Ltd | Signal determination device |
US20170140240A1 (en) * | 2015-07-27 | 2017-05-18 | Salesforce.Com, Inc. | Neural network combined image and text evaluator and classifier |
CN107704924B (en) * | 2016-07-27 | 2020-05-19 | 中国科学院自动化研究所 | Construction method of synchronous self-adaptive space-time feature expression learning model and related method |
CN107180628A (en) * | 2017-05-19 | 2017-09-19 | 百度在线网络技术(北京)有限公司 | Set up the method, the method for extracting acoustic feature, device of acoustic feature extraction model |
CN107545902B (en) * | 2017-07-14 | 2020-06-02 | 清华大学 | Article material identification method and device based on sound characteristics |
US11504034B2 (en) * | 2017-07-27 | 2022-11-22 | Vita-Course Digital Technologies (Tsingtao) Co., Ltd. | Systems and methods for determining blood pressure of a subject |
US10962509B2 (en) * | 2017-08-02 | 2021-03-30 | The United States Of America As Represented By The Secretary Of The Navy | System and method for detecting failed electronics using acoustics |
EA202090867A1 (en) * | 2017-10-11 | 2020-09-04 | Бп Эксплорейшн Оперейтинг Компани Лимитед | DETECTING EVENTS USING FEATURES IN THE AREA OF ACOUSTIC FREQUENCIES |
US11250314B2 (en) * | 2017-10-27 | 2022-02-15 | Cognizant Technology Solutions U.S. Corporation | Beyond shared hierarchies: deep multitask learning through soft layer ordering |
CN108241024A (en) * | 2018-01-25 | 2018-07-03 | 上海众材工程检测有限公司 | A kind of hollowing detection method and system based on wall |
JP6895908B2 (en) * | 2018-02-26 | 2021-06-30 | 日立Astemo株式会社 | Tapping sound inspection device |
WO2019172655A1 (en) * | 2018-03-06 | 2019-09-12 | 주식회사 엘지화학 | Device for diagnosing cracks in battery pack, and battery pack and vehicle comprising same |
CN112005095A (en) * | 2018-03-29 | 2020-11-27 | 太阳诱电株式会社 | Sensing system, information processing apparatus, program, and information collection method |
US11315570B2 (en) * | 2018-05-02 | 2022-04-26 | Facebook Technologies, Llc | Machine learning-based speech-to-text transcription cloud intermediary |
CN109086892B (en) * | 2018-06-15 | 2022-02-18 | 中山大学 | General dependency tree-based visual problem reasoning model and system |
CN208505984U (en) * | 2018-08-10 | 2019-02-15 | 平湖市佳诚房地产评估事务所(普通合伙) | A kind of hollowing detector for Real Estate Appraisal |
CN109142523A (en) * | 2018-08-14 | 2019-01-04 | 浙江核力建筑特种技术有限公司 | A kind of metope hollowing recognition quantitative analytic approach based on acoustic imaging |
WO2020093042A1 (en) * | 2018-11-02 | 2020-05-07 | Deep Lens, Inc. | Neural networks for biomedical image analysis |
CN110070882B (en) * | 2019-04-12 | 2021-05-11 | 腾讯科技(深圳)有限公司 | Voice separation method, voice recognition method and electronic equipment |
KR102221617B1 (en) * | 2019-10-01 | 2021-03-03 | 주식회사 에스아이웨어 | Handheld ultrasound scanner for defect detection of weldments |
CN112927696A (en) * | 2019-12-05 | 2021-06-08 | 中国科学院深圳先进技术研究院 | System and method for automatically evaluating dysarthria based on voice recognition |
US20210216874A1 (en) * | 2020-01-10 | 2021-07-15 | Facebook Technologies, Llc | Radioactive data generation |
AU2020101229A4 (en) * | 2020-07-02 | 2020-08-06 | South China University Of Technology | A Text Line Recognition Method in Chinese Scenes Based on Residual Convolutional and Recurrent Neural Networks |
CN111916101B (en) * | 2020-08-06 | 2022-01-21 | 大象声科(深圳)科技有限公司 | Deep learning noise reduction method and system fusing bone vibration sensor and double-microphone signals |
CN112684012A (en) * | 2020-12-02 | 2021-04-20 | 青岛科技大学 | Equipment key force-bearing structural part fault diagnosis method based on multi-parameter information fusion |
CN113125556A (en) * | 2021-03-05 | 2021-07-16 | 南京智慧基础设施技术研究院有限公司 | Structural damage detection system and method based on voiceprint recognition |
CN113075296A (en) * | 2021-04-01 | 2021-07-06 | 湖南翰坤实业有限公司 | Method and device for detecting safety of outer wall structure based on sound wave detection and BIM model |
CN113111804B (en) * | 2021-04-16 | 2024-06-04 | 贝壳找房(北京)科技有限公司 | Face detection method and device, electronic equipment and storage medium |
CN113220933A (en) * | 2021-05-12 | 2021-08-06 | 北京百度网讯科技有限公司 | Method and device for classifying audio segments and electronic equipment |
2021-08-20: application CN202110958298.7A filed; granted as patent CN113671031B (Active)
Also Published As
Publication number | Publication date |
---|---|
CN113671031A (en) | 2021-11-19 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN107799126B (en) | Voice endpoint detection method and device based on supervised machine learning | |
CN106504768B (en) | Phone testing audio frequency classification method and device based on artificial intelligence | |
CN110570873B (en) | Voiceprint wake-up method and device, computer equipment and storage medium | |
CN108597496A (en) | Voice generation method and device based on generation type countermeasure network | |
CN106601230B (en) | Logistics sorting place name voice recognition method and system based on continuous Gaussian mixture HMM model and logistics sorting system | |
US9799333B2 (en) | System and method for processing speech to identify keywords or other information | |
CN110288975B (en) | Voice style migration method and device, electronic equipment and storage medium | |
CN109147774B (en) | Improved time-delay neural network acoustic model | |
CN113223536B (en) | Voiceprint recognition method and device and terminal equipment | |
CN108922543A (en) | Model library method for building up, audio recognition method, device, equipment and medium | |
CN113327575A (en) | Speech synthesis method, device, computer equipment and storage medium | |
CN110136726A (en) | A kind of estimation method, device, system and the storage medium of voice gender | |
CN111933148A (en) | Age identification method and device based on convolutional neural network and terminal | |
CN109065026B (en) | Recording control method and device | |
CN112309398B (en) | Method and device for monitoring working time, electronic equipment and storage medium | |
CN117762372A (en) | Multi-mode man-machine interaction system | |
CN117935789A (en) | Speech recognition method, system, equipment and storage medium | |
CN113671031B (en) | Wall hollowing detection method and device | |
CN112420079A (en) | Voice endpoint detection method and device, storage medium and electronic equipment | |
CN111785302A (en) | Speaker separation method and device and electronic equipment | |
CN111326161B (en) | Voiceprint determining method and device | |
CN107437414A (en) | Parallelization visitor's recognition methods based on embedded gpu system | |
CN114236469A (en) | Robot voice recognition positioning method and system | |
CN114155845A (en) | Service determination method and device, electronic equipment and storage medium | |
CN113903328A (en) | Speaker counting method, device, equipment and storage medium based on deep learning |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
TA01 | Transfer of patent application right | Effective date of registration: 20240521. Address after: Room 102, floor 1, building 1, No. 2, Chuangye Road, Haidian District, Beijing 100085. Applicant after: Seashell Housing (Beijing) Technology Co.,Ltd. (China). Address before: 101399 room 24, 62 Farm Road, Erjie village, Yangzhen, Shunyi District, Beijing. Applicant before: Beijing fangjianghu Technology Co.,Ltd. (China) ||
GR01 | Patent grant | ||