CN113671031B - Wall hollowing detection method and device - Google Patents
- Publication number: CN113671031B (application number CN202110958298.7A)
- Authority
- CN
- China
- Legal status: Active (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis)
Classifications
- G—PHYSICS; G01—MEASURING; TESTING; G01N—INVESTIGATING OR ANALYSING MATERIALS BY DETERMINING THEIR CHEMICAL OR PHYSICAL PROPERTIES
- G01N29/045—Analysing solids by imparting shocks to the workpiece and detecting the vibrations or the acoustic waves caused by the shocks
- G01N29/4472—Processing the detected response signal, e.g. electronic circuits specially adapted therefor; mathematical theories or simulation
- G01N29/4481—Processing the detected response signal, e.g. electronic circuits specially adapted therefor; neural networks
- G01N2291/0289—Indexing code associated with group G01N29/00; internal structure, e.g. defects, grain size, texture
Abstract
The application provides a wall hollowing detection method and device, wherein the method comprises the following steps: acquiring an audio signal generated by knocking a wall; acquiring a first feature and a second feature of the audio signal; splicing the first feature and the second feature to obtain a spliced feature; acquiring a recognition result corresponding to the spliced feature based on a preset audio detection network model; and determining whether the wall is in a hollowing state according to the recognition result. The method can improve the accuracy of wall hollowing detection while reducing labor cost.
Description
Technical Field
The embodiment of the disclosure relates to a wall hollowing detection method and device.
Background
The facing layer of a building structure system is subjected to natural forces and thermal action over a long period, which causes aging, damage and loosening of the facing-layer material, its connecting material and the supporting structure, and leads to whole or partial debonding, loosening, hollowing and even falling off of the facing layer, easily resulting in personal injury and property damage.
At present, the hollowing detection of walls is mainly realized by manual knocking with identification by ear, infrared imaging, and similar methods.
In the process of realizing the application, the inventors found that the manual knocking-and-listening method has a high labor cost and is inaccurate because it is affected by human factors, while infrared imaging detection is susceptible to external interference such as noise and heat sources, which also makes detection inaccurate.
Disclosure of Invention
In view of the above, the application provides a method and a device for detecting wall hollowing, which can improve the accuracy of wall hollowing detection on the premise of reducing labor cost.
In order to solve the technical problems, the technical scheme of the application is realized as follows:
In one embodiment, a wall hollowing detection method is provided, the method comprising:
acquiring an audio signal generated by knocking a wall;
acquiring a first characteristic and a second characteristic of the audio signal;
splicing the first characteristic and the second characteristic to obtain a spliced characteristic;
acquiring an identification result corresponding to the splicing characteristic based on a preset audio detection network model;
and determining whether the wall body is in a hollowing state according to the identification result.
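The five claimed steps can be sketched as a small pipeline. All function and component names below (detect_hollowing, the extractor callables, the dummy model) are illustrative assumptions, not part of the claimed implementation:

```python
import numpy as np

def detect_hollowing(audio, extract_first, extract_second, model):
    """Illustrative pipeline for the five claimed steps.

    audio          -- tapped-wall audio signal (1-D array)
    extract_first  -- callable returning the fundamental-frequency feature
    extract_second -- callable returning the second (speech/decoding/NN) feature
    model          -- callable mapping the spliced feature to a recognition result
    """
    first = extract_first(audio)               # step 2: first feature
    second = extract_second(audio)             # step 2: second feature
    spliced = np.concatenate([first, second])  # step 3: splice the features
    result = model(spliced)                    # step 4: model recognition
    return result == "hollow"                  # step 5: hollowing decision

# Dummy components just to exercise the flow
is_hollow = detect_hollowing(
    np.zeros(16000),
    extract_first=lambda a: np.ones(5),
    extract_second=lambda a: np.ones(3),
    model=lambda f: "hollow" if f.sum() > 4 else "solid",
)
```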
In another embodiment, there is provided a wall hollowing detection device, the device including: the device comprises a storage unit, a first acquisition unit, a second acquisition unit, a third acquisition unit and a determination unit;
the storage unit is used for storing a preset audio detection network model;
the first acquisition unit is used for acquiring an audio signal generated by knocking the wall body;
The second acquisition unit is used for acquiring the first characteristic and the second characteristic of the audio signal acquired by the first acquisition unit; splicing the first characteristic and the second characteristic to obtain a spliced characteristic;
The third obtaining unit is used for obtaining the recognition result corresponding to the splicing characteristic obtained by the second obtaining unit based on the preset audio detection network model stored by the storage unit;
The determining unit is used for determining whether the wall body is in a hollowing state or not according to the identification result acquired by the third acquiring unit.
In another embodiment, an electronic device is provided that includes a memory, a processor, and a computer program stored on the memory and executable on the processor, the processor implementing the steps of the wall hollowing detection method when executing the program.
In another embodiment, a computer readable storage medium is provided, on which a computer program is stored which, when executed by a processor, implements the steps of the wall hollowing detection method.
In another embodiment, a computer program product is provided, comprising a computer program which, when executed by a processor, implements the wall hollowing detection method.
As can be seen from the above technical solutions, in the above embodiments, by performing feature acquisition on an acquired audio signal, and splicing the acquired features as overall features of the audio signal, an identification result corresponding to the spliced features is acquired based on a preset audio detection network model; and determining whether the wall body is in a hollowing state according to the identification result. According to the scheme, the accuracy of wall hollowing detection can be improved on the premise of reducing labor cost.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings that are needed in the description of the embodiments will be briefly described below, it being obvious that the drawings in the following description are only some embodiments of the present application, and that other drawings may be obtained according to these drawings without inventive effort to a person skilled in the art.
FIG. 1 is a schematic flow chart of acquiring a preset audio detection network model in an embodiment of the present application;
FIG. 2 is a schematic diagram of a residual CRNN-GLU in an embodiment of the application;
FIG. 3 is a schematic diagram of a wall hollowing detection process according to an embodiment of the application;
FIG. 4 is a schematic diagram of an audio signal acquisition device;
FIG. 5 is a schematic diagram of a wall hollowing detection process in a second embodiment of the present application;
FIG. 6 is a schematic diagram of a wall hollowing detection device according to an embodiment of the present application;
Fig. 7 is a schematic diagram of an entity structure of an electronic device according to an embodiment of the present invention.
Detailed Description
The following description of the embodiments of the present application will be made clearly and completely with reference to the accompanying drawings, in which it is apparent that the embodiments described are only some embodiments of the present application, but not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the application without making any inventive effort, are intended to be within the scope of the application.
The terms "first," "second," "third," "fourth" and the like in the description and in the claims and in the above drawings, if any, are used for distinguishing between similar objects and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used may be interchanged where appropriate such that the embodiments of the invention described herein may be implemented, for example, in sequences other than those illustrated or otherwise described herein. Furthermore, the terms "comprise" and "have," as well as any variations thereof, are intended to cover a non-exclusive inclusion. For example, a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those elements but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
The technical scheme of the invention is described in detail below by specific examples. The following embodiments may be combined with each other, and some embodiments may not be repeated for the same or similar concepts or processes.
The embodiment of the application provides a wall hollowing detection method, which comprises the steps of obtaining characteristics of an acquired audio signal, splicing the obtained characteristics to be used as the overall characteristics of the audio signal, and obtaining an identification result corresponding to the spliced characteristics based on a preset audio detection network model; and determining whether the wall body is in a hollowing state according to the identification result. According to the scheme, the accuracy of wall hollowing detection can be improved on the premise of reducing labor cost.
The preset audio detection network model in the embodiment of the application can be built in advance for storage, and can be directly used when needed, or can be built and used when needed.
The method for establishing the preset audio detection network model specifically comprises the following steps:
referring to fig. 1, fig. 1 is a flowchart illustrating a process of acquiring a preset audio detection network model according to an embodiment of the present application. The method comprises the following specific steps:
step 101, obtaining a training sample.
The obtained training samples can be unlabeled, partially labeled or fully labeled, and the initial audio detection network model can be used for labeling unlabeled samples as training samples.
Step 102, acquiring a first feature and a second feature of a training sample, and splicing the first feature and the second feature to obtain a spliced feature.
Wherein the first feature is a fundamental frequency feature;
The fundamental frequency features are used to represent pitch characteristics, and several dimensions of them can be selected and extracted according to the practical application; for example, the following five fundamental frequency features can be selected:
1) The base frequency value of the voice frame;
2) The first-order difference value of the fundamental frequency of the adjacent frames;
3) Fundamental frequency second order difference values of adjacent frames;
4) The length of the current continuous base band;
5) The difference between the fundamental frequency value of the current frame and the average fundamental frequency of the last N frames of the previous continuous voiced segment; N is typically chosen to be 10.
The fundamental frequency characteristics of the five dimensions can obtain better characteristic combinations, but the method involved in the implementation of the application is not limited to the five-dimensional characteristic combinations, and the increase or decrease of the number of the characteristics does not affect the application of the application.
The method for extracting the fundamental frequency features is an example implementation manner, and the embodiment of the application is not limited to the implementation manner.
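As one hedged realization, the five fundamental-frequency features above can be stacked per frame as below. The frame layout (0 marking unvoiced frames) and the reading of feature 5 as "last N frames of the previous voiced segment" are assumptions for illustration; pitch tracking itself is assumed to be done elsewhere:

```python
import numpy as np

def f0_features(f0, n=10):
    """Sketch of the five per-frame fundamental-frequency features.

    f0 -- per-frame F0 values, with 0 marking unvoiced frames.
    """
    d1 = np.diff(f0, prepend=f0[0])      # 2) first-order F0 difference
    d2 = np.diff(d1, prepend=d1[0])      # 3) second-order F0 difference

    run = np.zeros_like(f0)              # 4) length of current voiced run
    prev_tail = np.zeros_like(f0)        # 5) diff vs. mean of last n frames
    last_band = []                       #    of the previous voiced segment
    cur = 0
    for i, v in enumerate(f0):
        if v > 0:
            cur += 1
        else:
            if cur:                      # a voiced segment just ended
                last_band = list(f0[i - min(cur, n):i])
            cur = 0
        run[i] = cur
        if last_band and v > 0:
            prev_tail[i] = v - np.mean(last_band)
    return np.stack([f0, d1, d2, run, prev_tail], axis=1)
```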
The second characteristic is any one of the following characteristics or characteristics obtained by splicing any at least two of the following characteristics:
speech features, decoding features, neural network features.
That is, the second feature may be a speech feature, a decoding feature or a neural network feature;
or a feature obtained by splicing the speech feature and the decoding feature;
or a feature obtained by splicing the speech feature and the neural network feature;
or a feature obtained by splicing the decoding feature and the neural network feature;
or a feature obtained by splicing the speech feature, the decoding feature and the neural network feature.
When the second feature is a spliced feature, the constituent features are first acquired separately and then spliced to obtain the spliced feature. For example, if the second feature is obtained by splicing the decoding feature and the neural network feature, the decoding feature and the neural network feature of the sample are acquired respectively and then spliced.
Wherein,
the speech features are, for example: Filter Bank (Fbank) features, Mel-frequency cepstral coefficients (Mel-Frequency Cepstral Coefficients, MFCC), etc.;
the neural network features are, for example: bottleneck features, DNN features, CNN features, LSTM features, etc.;
the decoding features are, for example: threshold, maximum, minimum, confidence, lattice distribution statistics, etc. of sentences.
If any one of the acquired voice features, decoding features and neural network features has multiple features, the features of the same type can be spliced.
The method for acquiring the voice characteristics, the decoding characteristics and the neural network characteristics is not limited in the embodiment of the application, and the method for acquiring the voice characteristics, the decoding characteristics and the neural network characteristics is not exemplified.
In the embodiment of the application, when the first feature and the second feature are spliced, the order of the two features is not limited. For example, if the first feature is [A1, A2, …, An] and the second feature is [B1, B2, …, Bm], the spliced feature can be [A1, A2, …, An, B1, B2, …, Bm] or [B1, B2, …, Bm, A1, A2, …, An].
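A minimal sketch of the splicing step; the concrete feature values are placeholders:

```python
import numpy as np

first = np.array([1.0, 2.0, 3.0])   # [A1, A2, ..., An]
second = np.array([10.0, 20.0])     # [B1, B2, ..., Bm]

spliced_ab = np.concatenate([first, second])  # [A..., B...]
spliced_ba = np.concatenate([second, first])  # [B..., A...]
```

Either order is usable, as long as the same order is used consistently when training the preset model and when detecting.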
And step 103, establishing an initial audio detection network model.
The initial audio detection network model is built based on any one of the following neural networks: a deep neural network (Deep Neural Networks, DNN), a convolutional recurrent neural network (Convolutional Recurrent Neural Network, CRNN), a convolutional neural network (Convolutional Neural Network, CNN), or a long short-term memory (Long Short-Term Memory, LSTM) network, i.e., a recurrent neural network;
When the neural network used to build the initial audio detection network model is a CRNN, the CRNN adopts a residual convolutional recurrent network (CRNN)-gated linear unit (Gated Linear Unit, GLU) structure;
when residual connection is performed in the residual convolutional recurrent network-gated linear unit structure, the network input before the gated linear unit is added to the result of the gated linear unit operation over the two convolutional networks, and the result after residual connection is used as the input of the next CNN network.
In the embodiment of the present application, taking the case that the convolutional recurrent neural network includes two residual connections, the residual convolutional recurrent network-gated linear unit structure includes: the system comprises a first residual connecting network, a second residual connecting network, a first convolutional neural network, a second convolutional neural network, a long-short-term memory (LSTM) network, a full-connection layer and an output layer;
The first residual error connection network comprises a third convolutional neural network, a fourth convolutional neural network and a first gating linear unit, wherein the third convolutional neural network and the fourth convolutional neural network are used for receiving network inputs, the outputs of the third convolutional neural network and the fourth convolutional neural network are input to the first gating linear unit, and the output of the first gating linear unit is added with the network inputs to be used as the output of the first residual error connection network;
the first convolutional neural network is in signal connection with the first residual error connection network and is used for receiving the output of the first residual error connection network;
the second residual error connection network is in signal connection with the first convolution neural network and is used for receiving the output of the first convolution neural network, the second residual error connection network comprises a fifth convolution neural network, a sixth convolution neural network and a second gating linear unit, the fifth convolution neural network and the sixth convolution neural network are used for receiving the output of the first convolution neural network, the output of the fifth convolution neural network and the output of the sixth convolution neural network are input to the second gating linear unit, and the output of the second gating linear unit is added with the output of the first convolution neural network to be used as the output of the second residual error connection network;
the second convolution neural network is in signal connection with the second residual error connection network and receives the output of the second residual error connection network, and the output of the second convolution neural network is used as the network input of the long-term and short-term memory network;
The network output of the long-short-period memory network is used as the input of the full-connection layer;
The output of the full connection layer is used as the input of the output layer;
The output of the output layer is taken as the output of the residual convolution recursive network-gated linear unit structure.
The first convolutional neural network to the sixth convolutional neural network may be the same network or may be different networks, which is not limited in the embodiment of the present application.
The network structure of a convolutional recurrent neural network is illustrated below in conjunction with the figures.
Referring to fig. 2, fig. 2 is a schematic diagram of a residual CRNN-GLU structure according to an embodiment of the application. From top to bottom in fig. 2, the first gated linear unit corresponds to the first residual connection network and the second gated linear unit corresponds to the second residual connection network; the two CNNs in the first residual connection network are the third and fourth convolutional neural networks; the two CNNs in the second residual connection network are the fifth and sixth convolutional neural networks; the CNN between the first and second residual connection networks is the first convolutional neural network; and the CNN between the second residual connection network and the long short-term memory (LSTM) network is the second convolutional neural network.
The following four parts of realization processes are provided for the structure:
a first part:
Loss function: because there may be overlap between events, embodiments of the present application may use a binary cross-entropy (BCE) loss function (LOSS), defined as follows:

LOSS = -(1/N) * sum_{n=1..N} [ O_n * log(P_n) + (1 - O_n) * log(1 - P_n) ]

wherein P_n represents the predicted signal obtained through the residual CRNN-GLU structure at the n-th frame, O_n represents the correct signal corresponding to the audio at the n-th frame, and N represents the total number of frames in one sample.
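A minimal numpy sketch of this frame-wise BCE loss (the clipping constant is an implementation detail added here for numerical safety, not part of the patent):

```python
import numpy as np

def bce_loss(P, O, eps=1e-12):
    """Frame-wise binary cross-entropy over predictions P and labels O."""
    P = np.clip(P, eps, 1.0 - eps)  # guard the logarithms against 0 and 1
    return -np.mean(O * np.log(P) + (1.0 - O) * np.log(1.0 - P))
```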
A second part:
GLU operation: in the structure of the network, a gating operation GLU (Gated Linear Unit) is used, defined as follows:

O_next = (W * X + b) ⊙ σ(V * X + c)

wherein O_next represents the output of the gating; W and V represent the convolution kernels of the two CNNs (the convolution kernels of the two CNN networks can be the same or different); b and c represent the offsets; X represents the input feature before the GLU; σ is the nonlinear sigmoid function; and ⊙ is the matrix operator that multiplies the values at corresponding positions of the matrices.
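The GLU operation can be sketched as follows; plain matrix products stand in for the two CNN convolutions, which is a simplification for illustration rather than the patented implementation:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def glu(X, W, b, V, c):
    """O_next = (W*X + b) ⊙ σ(V*X + c), with dense products as stand-in CNNs."""
    linear = X @ W + b           # content path (first CNN)
    gate = sigmoid(X @ V + c)    # gate path (second CNN), values in (0, 1)
    return linear * gate         # element-wise product ⊙
```

Driving the gate bias c strongly negative closes the gate (output near 0); driving it strongly positive passes the content path through almost unchanged.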
Third section: when implementing the residual connection (Residual Connection, ResNet) in the embodiment of the present application, as shown in fig. 2, the network input before the GLU is added to the result of the GLU operation over the two convolutional neural networks, specifically as follows:

RES_i = I_i + O_i

wherein I_i denotes the feature input to the i-th GLU, and O_i denotes the output of the i-th GLU.
The result RES_i of the residual connection is taken as the input of the next CNN network.
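The residual connection around one GLU can be sketched as below, again with dense matrix products standing in for the CNNs (an illustrative assumption, not the patented network). Driving the gate toward zero shows why the connection helps deep stacks: the block degenerates to the identity:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def residual_glu_block(I, W, b, V, c):
    """RES_i = I_i + O_i, where O_i is the output of the i-th GLU."""
    O = (I @ W + b) * sigmoid(I @ V + c)  # GLU output
    return I + O                          # residual connection

I = np.random.default_rng(0).normal(size=(4, 6))
# With the gate driven to ~0, the GLU contributes nothing and the
# block reduces to the identity mapping on its input.
identity_like = residual_glu_block(I, np.eye(6), 0.0, np.zeros((6, 6)), -50.0)
```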
As shown in fig. 2, the embodiment of the present application adopts a residual convolutional recurrent network-gated linear unit structure with two residual connections, but the implementation is not limited to two residual connections;
if the real-time requirement is not very high, more residual parts can be added and the network can be designed to be very deep, which increases accuracy.
Fourth part:
After the CNN networks, a Pooling dimension-reduction operation may be applied; the result is then fed into an LSTM (a Bi-LSTM can be used in non-real-time scenarios), followed by a fully connected (FNN) layer, and finally the output layer (OUTPUT); the loss function is then calculated, and the parameters are updated and adjusted by the SGD (stochastic gradient descent) method.
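The loss-then-gradient-update cycle can be illustrated with a tiny logistic-regression stand-in for the full residual CRNN-GLU model. The toy data, learning rate and step count are all assumptions, and full-batch gradient steps are used here instead of true stochastic mini-batches:

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy spliced features: class 1 ("hollow") shifted away from class 0 ("solid").
X = np.vstack([rng.normal(0, 1, (50, 8)), rng.normal(2, 1, (50, 8))])
y = np.concatenate([np.zeros(50), np.ones(50)])

w, b, lr = np.zeros(8), 0.0, 0.1

def forward(X, w, b):
    return 1.0 / (1.0 + np.exp(-(X @ w + b)))   # sigmoid output in (0, 1)

def bce(p, y, eps=1e-12):
    p = np.clip(p, eps, 1 - eps)
    return -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))

loss_before = bce(forward(X, w, b), y)
for _ in range(200):                  # repeated gradient steps
    p = forward(X, w, b)
    w -= lr * X.T @ (p - y) / len(y)  # gradient of BCE w.r.t. w
    b -= lr * np.mean(p - y)          # gradient of BCE w.r.t. b
loss_after = bce(forward(X, w, b), y)
```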
In the embodiment of the present application, the implementation manners of the first, second and fourth parts are not limited. When residual connection is implemented in the third part, the network input before the GLU is added to the result of the GLU operation over the two convolutions, and the residual connection result RES_i is used as the input of the next CNN network. This scheme can improve the accuracy of network recognition.
The output of the model is either a hollowing state or a non-hollowing state.
And 104, training the initial audio detection network model by using the spliced features and labels of the corresponding training samples to obtain a preset audio detection network model.
Thus, the establishment of the preset audio detection network model is completed.
The following describes the wall hollowing detection process in detail by combining the drawings.
Example 1
Referring to fig. 3, fig. 3 is a schematic diagram of a wall hollowing detection flow in accordance with an embodiment of the present application. The method comprises the following specific steps:
Step 301, an audio signal generated by knocking a wall is obtained.
The audio signal of knocking the wall may be collected on the spot, or collected and stored in advance.
The acquisition device for acquiring audio signals is not limited, and may be, but not limited to, the acquisition device shown in fig. 4, and fig. 4 is a schematic diagram of an audio signal acquisition device.
The small circle parts in fig. 4 are microphone pickup devices; 1, 2, 4, 6 or 8 microphones can be selected according to cost. The collection device in fig. 4 is illustrated with 8 microphones.
The acquisition equipment is provided with a liquid crystal display screen, and can display the judging result and other matters which can assist in judging in real time.
The computing function of the acquisition device can be selected to be nested in a device chip or partially adopt cloud computing and return a result.
The acquisition equipment is provided with a handle, so that the audio signal can be conveniently acquired.
Step 302, extracting a first feature and a second feature of the audio signal.
The first feature is a fundamental frequency feature;
the second characteristic is any one of the following characteristics or characteristics obtained by splicing any at least two of the following characteristics:
speech features, decoding features, neural network features.
And step 303, splicing the first feature and the second feature to obtain a spliced feature.
In the embodiment of the application, when the first feature and the second feature are spliced, the sequence of the two features is not limited, and if the first feature is [ A 1,A2,……,An ] and the second feature is [ B 1,B2,……,Bm ], the spliced feature can be [ A 1,A2,……,An,B1,B2,……,Bm ] or [ B 1,B2,……,Bm,A1,A2,……,An ]; but needs to be consistent with the sequence of feature stitching when the preset audio detection network model is established.
The first characteristic and the second characteristic are spliced to be used as characteristic information of the audio signal, the multidimensional characteristic of the audio signal is considered, and the detection result can be acquired more accurately.
And step 304, acquiring a recognition result corresponding to the splicing characteristic based on a preset audio detection network model.
Wherein, the recognition result is: in a hollowing state or not in a hollowing state.
And step 305, determining whether the wall body is in a hollowing state according to the identification result.
When the recognition result is whether the wall is in a hollowing state, determining whether the wall is in a hollowing state according to the recognition result includes:
when the recognition result is the hollowing state, determining that the wall is in the hollowing state;
and when the recognition result is the non-hollowing state, determining that the wall is not in the hollowing state.
In the embodiment of the application, the first feature and the second feature of the acquired audio signal are extracted and spliced as the overall feature of the audio signal, and the recognition result corresponding to the spliced feature is obtained based on a preset audio detection network model; whether the wall is in a hollowing state is then determined according to the recognition result. According to the scheme, the accuracy of wall hollowing detection can be improved while reducing labor cost.
Example two
Referring to fig. 5, fig. 5 is a schematic diagram of a wall hollowing detection flow in a second embodiment of the application. The method comprises the following specific steps:
Step 501, an audio signal generated by knocking a wall is obtained.
The audio signal of knocking the wall may be collected on the spot, or collected and stored in advance.
The embodiment of the application does not limit the acquisition equipment for acquiring the audio signals.
Step 502, performing endpoint detection on the audio signal and performing noise reduction processing.
Endpoint detection, also called voice activity detection (Voice Activity Detection, VAD), aims to distinguish speech regions from non-speech regions. Endpoint detection accurately locates the starting point and ending point of speech in noisy audio, removes the silent part and the noise part, and finds the truly effective content of a piece of audio.
VAD algorithms can be roughly divided into three categories: threshold-based VAD, VAD as a classifier, and model-based VAD.
Threshold-based VAD: time-domain features (short-time energy, short-time zero-crossing rate, etc.) or frequency-domain features (MFCC, spectral entropy, etc.) are extracted, and a reasonably set threshold separates speech from non-speech. This is the conventional VAD method.
VAD as a classifier: speech detection can be treated as a speech/non-speech binary classification problem, and a classifier is then trained with machine-learning methods to detect speech.
Model-based VAD: a complete acoustic model (the modeling unit may be coarse-grained) distinguishes speech segments from non-speech segments on the basis of decoding, using global information.
The embodiment of the application is not limited to a specific implementation mode of endpoint detection.
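As an illustration of the threshold-based category above, a minimal short-time-energy VAD can be sketched as follows. The frame length, hop size, and threshold rule are illustrative assumptions, not values taken from this application:

```python
import numpy as np

def energy_vad(signal, frame_len=256, hop=128, threshold=None):
    """Threshold-based VAD sketch: frame the signal, compute the
    short-time energy of each frame, and mark frames whose energy
    exceeds the threshold as active (speech / knock)."""
    frames = [signal[i:i + frame_len]
              for i in range(0, len(signal) - frame_len + 1, hop)]
    energy = np.array([float(np.sum(np.square(f, dtype=float))) for f in frames])
    if threshold is None:
        # Illustrative rule: a fixed fraction of the peak frame energy.
        threshold = 0.1 * energy.max()
    return energy > threshold
```

The endpoints would then be the first and last active frames; a practical implementation would add hangover smoothing and possibly a zero-crossing-rate check.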
In a specific implementation, the processing may be noise reduction or dereverberation.
In the specific implementation of the application, performing noise reduction or dereverberation after endpoint detection removes external interference from the audio signal to a greater extent.
The method further comprises the steps of:
and if the signal intensity of the audio signal is smaller than a preset value, performing enhancement processing on the audio signal.
That is, when the signal strength of the audio signal is relatively small, enhancement processing is performed.
Salient Fbank features and fundamental-frequency features can be readily extracted from the preprocessed audio signal, which facilitates recognition of the final audio signal.
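A minimal sketch of the conditional enhancement step above; the RMS thresholds are illustrative assumptions, since the application does not specify the preset value:

```python
import numpy as np

def enhance_if_weak(signal, min_rms=0.05, target_rms=0.1):
    """If the signal's RMS level is below the preset value, scale it
    up to a target level; otherwise return it unchanged. Both levels
    here are assumed values for illustration only."""
    rms = float(np.sqrt(np.mean(np.square(signal, dtype=float))))
    if 0.0 < rms < min_rms:
        return signal * (target_rms / rms)
    return signal
```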
Step 503, extracting the first feature and the second feature of the audio signal.
The first feature is a fundamental frequency feature;
the second characteristic is any one of the following characteristics or characteristics obtained by splicing any at least two of the following characteristics:
speech features, decoding features, neural network features.
Step 504, splicing the first feature and the second feature to obtain a spliced feature.
In this embodiment of the application, the order of the two features during splicing is not limited: if the first feature is [A1, A2, …, An] and the second feature is [B1, B2, …, Bm], the spliced feature may be [A1, A2, …, An, B1, B2, …, Bm] or [B1, B2, …, Bm, A1, A2, …, An]; however, the order must be consistent with the feature-splicing order used when the preset audio detection network model was established.
Splicing the first feature and the second feature to form the feature information of the audio signal takes the multidimensional characteristics of the audio signal into account, so the detection result can be acquired more accurately.
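The splicing in step 504 is plain vector concatenation. A sketch with illustrative values (the dimensions n and m depend on the actual feature extractors, and the numbers below are not real extractor output):

```python
import numpy as np

# Illustrative per-frame features (assumed values, not real extractor output):
first = np.array([220.0, 230.5, 225.1])       # fundamental-frequency feature [A1..An]
second = np.array([0.12, 0.48, 0.33, 0.27])   # e.g. Fbank feature [B1..Bm]

# Either order is valid, but it must match the order used when the
# preset audio detection network model was trained.
spliced = np.concatenate([first, second])     # [A1..An, B1..Bm]
```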
Step 505, acquiring the recognition result corresponding to the spliced feature based on a preset audio detection network model.
Wherein the recognition result is: the hollowing state or the non-hollowing state.
Step 506, determining whether the wall is in a hollowing state according to the recognition result.
When the recognition result is either the hollowing state or the non-hollowing state, determining whether the wall is in a hollowing state according to the recognition result comprises the following steps:
when the recognition result is the hollowing state, determining that the wall is in the hollowing state;
and when the recognition result is the non-hollowing state, determining that the wall is not in the hollowing state.
In this embodiment of the application, the acquired audio signal is preprocessed, the first feature and the second feature are extracted from the preprocessed audio signal and spliced to serve as the overall feature of the audio signal, the recognition result corresponding to the spliced feature is obtained based on a preset audio detection network model, and whether the wall is in a hollowing state is determined according to the recognition result. This scheme can improve the accuracy of wall hollowing detection while reducing labor cost.
Based on the same inventive concept, the embodiment of the application also provides a wall hollowing detection device. Referring to fig. 6, fig. 6 is a schematic structural diagram of a wall hollowing detection device according to an embodiment of the present application. The device comprises: a storage unit 601, a first acquisition unit 602, a second acquisition unit 603, a third acquisition unit 604, and a determination unit 605;
A storage unit 601, configured to store a preset audio detection network model;
a first obtaining unit 602, configured to obtain an audio signal generated by knocking a wall;
A second acquisition unit 603 for extracting the first feature and the second feature of the audio signal acquired by the first acquisition unit 602; splicing the first characteristic and the second characteristic to obtain a spliced characteristic;
A third obtaining unit 604, configured to obtain, based on the preset audio detection network model stored in the storage unit 601, a recognition result corresponding to the splicing feature obtained by the second obtaining unit 603;
A determining unit 605 is configured to determine whether the wall is in a hollowing state according to the identification result acquired by the third acquiring unit 604.
In a further embodiment of the present invention,
The first feature is a fundamental frequency feature;
the second characteristic is any one of the following characteristics or characteristics obtained by splicing any at least two of the following characteristics:
speech features, decoding features, neural network features.
In another embodiment, the apparatus comprises: a processing unit 606;
The processing unit 606 is configured to perform endpoint detection on the audio signal after the first obtaining unit 602 obtains the audio signal generated by knocking the wall, and before the second obtaining unit 603 obtains the first feature and the second feature of the audio signal, and perform noise reduction processing.
In a further embodiment of the present invention,
The storage unit is specifically configured to acquire training samples; acquire the first feature and the second feature of the audio signal in each training sample, and splice the first feature and the second feature to obtain a spliced feature; establish an initial audio detection network model, the initial audio detection network model being established based on any one of the following neural networks: deep neural network (DNN), convolutional recurrent neural network (CRNN), convolutional neural network (CNN), and long short-term memory network (LSTM); and train the initial audio detection network model with the spliced features and the labels of the corresponding training samples to obtain the preset audio detection network model.
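The training step can be illustrated with a deliberately simplified stand-in. The application trains a DNN/CRNN/CNN/LSTM; a logistic classifier over spliced features is used here only because it shows the same fit-then-predict flow in a few lines. All data below is synthetic:

```python
import numpy as np

def train_detector(feats, labels, lr=0.5, epochs=300):
    """Fit a binary classifier on spliced features with
    hollowing (1) / not-hollowing (0) labels via gradient descent.
    A stand-in for the neural-network training in the application."""
    w = np.zeros(feats.shape[1])
    b = 0.0
    for _ in range(epochs):
        p = 1.0 / (1.0 + np.exp(-(feats @ w + b)))  # predicted probability
        grad = (p - labels) / len(labels)            # logistic-loss gradient
        w -= lr * feats.T @ grad
        b -= lr * grad.sum()
    return w, b

def predict(feats, w, b):
    """Label a batch of spliced features: True = hollowing."""
    return (feats @ w + b) > 0.0
```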
In a further embodiment of the present invention,
When the neural network used to establish the initial audio detection network model is a CRNN, the CRNN adopts a residual convolutional recurrent network-gated linear unit structure;
when residual connection is performed in the residual convolutional recurrent network-gated linear unit structure, the network input ahead of the two convolutional networks is added to the output of the gated linear unit they feed, and the result after residual connection is used as the input of the next CNN network.
In another embodiment, the residual convolutional recurrent network-gated linear unit structure comprises: a first residual connection network, a second residual connection network, a first convolutional neural network, a second convolutional neural network, a long short-term memory (LSTM) network, a fully connected layer, and an output layer;
The first residual connection network comprises a third convolutional neural network, a fourth convolutional neural network, and a first gated linear unit; the third convolutional neural network and the fourth convolutional neural network receive the network input, their outputs are input to the first gated linear unit, and the output of the first gated linear unit is added to the network input to serve as the output of the first residual connection network;
the first convolutional neural network is in signal connection with the first residual connection network and receives the output of the first residual connection network;
the second residual connection network is in signal connection with the first convolutional neural network and receives the output of the first convolutional neural network; the second residual connection network comprises a fifth convolutional neural network, a sixth convolutional neural network, and a second gated linear unit; the fifth and sixth convolutional neural networks receive the output of the first convolutional neural network, their outputs are input to the second gated linear unit, and the output of the second gated linear unit is added to the output of the first convolutional neural network to serve as the output of the second residual connection network;
the second convolutional neural network is in signal connection with the second residual connection network and receives the output of the second residual connection network, and the output of the second convolutional neural network serves as the network input of the long short-term memory network;
the network output of the long short-term memory network serves as the input of the fully connected layer;
the output of the fully connected layer serves as the input of the output layer;
the output of the output layer serves as the output of the residual convolutional recurrent network-gated linear unit structure.
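The residual GLU connection described above (outputs of two parallel convolutional networks gated, then added back to the block input) can be sketched in NumPy. For brevity the two "convolutional networks" are reduced to pointwise (1×1) transforms, which keeps the gating and residual arithmetic while omitting real convolution kernels, the LSTM, and the fully connected/output layers:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def residual_glu_block(x, w_a, w_b):
    """One residual connection network: two parallel transforms
    (stand-ins for the two convolutional neural networks) feed a
    gated linear unit, and the block input is added back to the
    gate's output; the result becomes the input of the next network."""
    a = x @ w_a                  # first branch (values)
    g = x @ w_b                  # second branch (gate)
    return x + a * sigmoid(g)    # gated linear unit plus residual connection
```

Stacking two such blocks with intermediate convolutions, followed by an LSTM, a fully connected layer, and an output layer, mirrors the structure of the first and second residual connection networks described above.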
In a further embodiment of the present invention,
The determining unit 605 is specifically configured to, when the recognition result is either the hollowing state or the non-hollowing state, determine whether the wall is in a hollowing state according to the recognition result, which includes: when the recognition result is the hollowing state, determining that the wall is in the hollowing state; and when the recognition result is the non-hollowing state, determining that the wall is not in the hollowing state.
The units of the above embodiments may be integrated or deployed separately; they may be combined into one unit or further split into multiple sub-units.
In another embodiment, there is also provided an electronic device including a memory, a processor, and a computer program stored on the memory and executable on the processor, the processor implementing the steps of the wall hollowing detection method when executing the program.
In another embodiment, a computer readable storage medium having stored thereon computer instructions which when executed by a processor perform the steps of the wall hollowing detection method is also provided.
In another embodiment, a computer program product is provided, comprising a computer program which, when executed by a processor, implements the wall hollowing detection method.
Fig. 7 is a schematic diagram of an entity structure of an electronic device according to an embodiment of the present invention. As shown in fig. 7, the electronic device may include: processor (Processor) 710, communication interface (Communications Interface) 720, memory (Memory) 730, and communication bus 740, wherein Processor 710, communication interface 720, memory 730 communicate with each other via communication bus 740. Processor 710 may call logic instructions in memory 730 to perform the following method:
acquiring an audio signal generated by knocking a wall;
acquiring a first characteristic and a second characteristic of the audio signal;
splicing the first characteristic and the second characteristic to obtain a spliced characteristic;
acquiring an identification result corresponding to the splicing characteristic based on a preset audio detection network model;
and determining whether the wall body is in a hollowing state according to the identification result.
Further, the logic instructions in the memory 730 may be implemented in the form of software functional units and, when sold or used as a standalone product, may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present invention, in essence or in the part contributing to the prior art, may be embodied in the form of a software product stored in a storage medium and comprising several instructions for causing a computer device (which may be a personal computer, a server, a network device, etc.) to perform all or part of the steps of the method according to the embodiments of the present invention. The aforementioned storage medium includes: a USB flash drive, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, an optical disk, or other media capable of storing program code.
The apparatus embodiments described above are merely illustrative; the units described as separate components may or may not be physically separate, and the components displayed as units may or may not be physical units; they may be located in one place or distributed over multiple network nodes. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment. Those of ordinary skill in the art can understand and implement the present invention without undue burden.
From the above description of the embodiments, it will be apparent to those skilled in the art that the embodiments may be implemented by means of software plus necessary general hardware platforms, or of course may be implemented by means of hardware. Based on this understanding, the foregoing technical solution may be embodied essentially or in a part contributing to the prior art in the form of a software product, which may be stored in a computer readable storage medium, such as ROM/RAM, a magnetic disk, an optical disk, etc., including several instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) to execute the method described in the respective embodiments or some parts of the embodiments.
The foregoing are only preferred embodiments of the present invention and are not intended to limit it; any modification, equivalent replacement, improvement, or the like made within the spirit and principles of the present invention shall fall within its scope of protection.
Claims (8)
1. A method for detecting wall hollowness, the method comprising:
acquiring an audio signal generated by knocking a wall;
acquiring a first characteristic and a second characteristic of the audio signal;
splicing the first characteristic and the second characteristic to obtain a spliced characteristic;
acquiring an identification result corresponding to the splicing characteristic based on a preset audio detection network model;
Determining whether the wall body is in a hollowing state according to the identification result;
The method further comprises the steps of:
obtaining a training sample;
acquiring first characteristics and second characteristics of the audio signals in the training samples, and splicing the first characteristics and the second characteristics to obtain spliced characteristics;
Establishing an initial audio detection network model;
training the initial audio detection network model by using the splicing characteristics and labels of corresponding training samples to acquire the preset audio detection network model;
when the neural network used to establish the initial audio detection network model is a convolutional recurrent neural network, the convolutional recurrent neural network adopts a residual convolutional recurrent network-gated linear unit structure;
when residual connection is performed in the residual convolutional recurrent network-gated linear unit structure, the network input ahead of the two convolutional networks is added to the output of the gated linear unit they feed, and the result after residual connection is used as the input of the next convolutional neural network.
2. The method according to claim 1, wherein,
The first feature is a fundamental frequency feature;
the second characteristic is any one of the following characteristics or characteristics obtained by splicing any at least two of the following characteristics:
speech features, decoding features, neural network features.
3. The method according to claim 1, wherein after the acquiring of the audio signal generated by knocking the wall and before the acquiring of the first feature and the second feature of the audio signal, the method further comprises: performing endpoint detection and noise reduction processing on the audio signal.
4. The method according to claim 1, wherein,
The residual convolutional recurrent network-gated linear unit structure comprises: a first residual connection network, a second residual connection network, a first convolutional neural network, a second convolutional neural network, a long short-term memory network, a fully connected layer, and an output layer;
wherein the first residual connection network comprises a third convolutional neural network, a fourth convolutional neural network, and a first gated linear unit; the third convolutional neural network and the fourth convolutional neural network are configured to receive a network input, the outputs of the third convolutional neural network and the fourth convolutional neural network are input to the first gated linear unit, and the output of the first gated linear unit is added to the network input as the output of the first residual connection network;
the first convolutional neural network is in signal connection with the first residual connection network and is configured to receive the output of the first residual connection network;
the second residual connection network is in signal connection with the first convolutional neural network and is configured to receive the output of the first convolutional neural network; the second residual connection network comprises a fifth convolutional neural network, a sixth convolutional neural network, and a second gated linear unit; the fifth convolutional neural network and the sixth convolutional neural network are configured to receive the output of the first convolutional neural network, the outputs of the fifth convolutional neural network and the sixth convolutional neural network are input to the second gated linear unit, and the output of the second gated linear unit is added to the output of the first convolutional neural network as the output of the second residual connection network;
the second convolutional neural network is in signal connection with the second residual connection network and receives the output of the second residual connection network, and the output of the second convolutional neural network serves as the network input of the long short-term memory network;
the network output of the long short-term memory network serves as the input of the fully connected layer;
the output of the fully connected layer serves as the input of the output layer;
and the output of the output layer serves as the output of the residual convolutional recurrent network-gated linear unit structure.
5. The method according to any one of claims 1 to 4, wherein,
when the recognition result is either the hollowing state or the non-hollowing state, determining whether the wall is in a hollowing state according to the recognition result comprises:
when the recognition result is the hollowing state, determining that the wall is in the hollowing state;
and when the recognition result is the non-hollowing state, determining that the wall is not in the hollowing state.
6. An electronic device comprising a memory, a processor, and a computer program stored on the memory and executable on the processor, wherein the processor implements the method of any one of claims 1-5 when executing the computer program.
7. A computer readable storage medium, on which a computer program is stored, characterized in that the computer program, when being executed by a processor, implements the method of any of claims 1-5.
8. A computer program product comprising a computer program, characterized in that the computer program, when executed by a processor, implements the method of any of claims 1-5.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110958298.7A CN113671031B (en) | 2021-08-20 | 2021-08-20 | Wall hollowing detection method and device |
Publications (2)
Publication Number | Publication Date |
---|---|
CN113671031A CN113671031A (en) | 2021-11-19 |
CN113671031B true CN113671031B (en) | 2024-06-21 |
Family
ID=78544181
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202110958298.7A Active CN113671031B (en) | 2021-08-20 | 2021-08-20 | Wall hollowing detection method and device |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN113671031B (en) |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN106841403A (en) * | 2017-01-23 | 2017-06-13 | 天津大学 | A kind of acoustics glass defect detection method based on neutral net |
CN110992959A (en) * | 2019-12-06 | 2020-04-10 | 北京市科学技术情报研究所 | Voice recognition method and system |
CN111798840A (en) * | 2020-07-16 | 2020-10-20 | 中移在线服务有限公司 | Voice keyword recognition method and device |
CN112034044A (en) * | 2020-09-01 | 2020-12-04 | 南京邮电大学 | Intelligent recognition and detection device for hollowing of ceramic tile based on neural network |
Family Cites Families (29)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP2002318155A (en) * | 2001-04-24 | 2002-10-31 | Fuji Xerox Co Ltd | Signal determination device |
US20170140240A1 (en) * | 2015-07-27 | 2017-05-18 | Salesforce.Com, Inc. | Neural network combined image and text evaluator and classifier |
CN107704924B (en) * | 2016-07-27 | 2020-05-19 | 中国科学院自动化研究所 | Construction method of synchronous self-adaptive space-time feature expression learning model and related method |
CN107180628A (en) * | 2017-05-19 | 2017-09-19 | 百度在线网络技术(北京)有限公司 | Set up the method, the method for extracting acoustic feature, device of acoustic feature extraction model |
CN107545902B (en) * | 2017-07-14 | 2020-06-02 | 清华大学 | Article material identification method and device based on sound characteristics |
US11504034B2 (en) * | 2017-07-27 | 2022-11-22 | Vita-Course Digital Technologies (Tsingtao) Co., Ltd. | Systems and methods for determining blood pressure of a subject |
US10962509B2 (en) * | 2017-08-02 | 2021-03-30 | The United States Of America As Represented By The Secretary Of The Navy | System and method for detecting failed electronics using acoustics |
EA202090867A1 (en) * | 2017-10-11 | 2020-09-04 | Бп Эксплорейшн Оперейтинг Компани Лимитед | DETECTING EVENTS USING FEATURES IN THE AREA OF ACOUSTIC FREQUENCIES |
US11250314B2 (en) * | 2017-10-27 | 2022-02-15 | Cognizant Technology Solutions U.S. Corporation | Beyond shared hierarchies: deep multitask learning through soft layer ordering |
CN108241024A (en) * | 2018-01-25 | 2018-07-03 | 上海众材工程检测有限公司 | A kind of hollowing detection method and system based on wall |
JP6895908B2 (en) * | 2018-02-26 | 2021-06-30 | 日立Astemo株式会社 | Tapping sound inspection device |
WO2019172655A1 (en) * | 2018-03-06 | 2019-09-12 | 주식회사 엘지화학 | Device for diagnosing cracks in battery pack, and battery pack and vehicle comprising same |
CN112005095A (en) * | 2018-03-29 | 2020-11-27 | 太阳诱电株式会社 | Sensing system, information processing apparatus, program, and information collection method |
US11315570B2 (en) * | 2018-05-02 | 2022-04-26 | Facebook Technologies, Llc | Machine learning-based speech-to-text transcription cloud intermediary |
CN109086892B (en) * | 2018-06-15 | 2022-02-18 | 中山大学 | General dependency tree-based visual problem reasoning model and system |
CN208505984U (en) * | 2018-08-10 | 2019-02-15 | 平湖市佳诚房地产评估事务所(普通合伙) | A kind of hollowing detector for Real Estate Appraisal |
CN109142523A (en) * | 2018-08-14 | 2019-01-04 | 浙江核力建筑特种技术有限公司 | A kind of metope hollowing recognition quantitative analytic approach based on acoustic imaging |
WO2020093042A1 (en) * | 2018-11-02 | 2020-05-07 | Deep Lens, Inc. | Neural networks for biomedical image analysis |
CN110070882B (en) * | 2019-04-12 | 2021-05-11 | 腾讯科技(深圳)有限公司 | Voice separation method, voice recognition method and electronic equipment |
KR102221617B1 (en) * | 2019-10-01 | 2021-03-03 | 주식회사 에스아이웨어 | Handheld ultrasound scanner for defect detection of weldments |
CN112927696A (en) * | 2019-12-05 | 2021-06-08 | 中国科学院深圳先进技术研究院 | System and method for automatically evaluating dysarthria based on voice recognition |
US20210216874A1 (en) * | 2020-01-10 | 2021-07-15 | Facebook Technologies, Llc | Radioactive data generation |
AU2020101229A4 (en) * | 2020-07-02 | 2020-08-06 | South China University Of Technology | A Text Line Recognition Method in Chinese Scenes Based on Residual Convolutional and Recurrent Neural Networks |
CN111916101B (en) * | 2020-08-06 | 2022-01-21 | 大象声科(深圳)科技有限公司 | Deep learning noise reduction method and system fusing bone vibration sensor and double-microphone signals |
CN112684012A (en) * | 2020-12-02 | 2021-04-20 | 青岛科技大学 | Equipment key force-bearing structural part fault diagnosis method based on multi-parameter information fusion |
CN113125556A (en) * | 2021-03-05 | 2021-07-16 | 南京智慧基础设施技术研究院有限公司 | Structural damage detection system and method based on voiceprint recognition |
CN113075296A (en) * | 2021-04-01 | 2021-07-06 | 湖南翰坤实业有限公司 | Method and device for detecting safety of outer wall structure based on sound wave detection and BIM model |
CN113111804B (en) * | 2021-04-16 | 2024-06-04 | 贝壳找房(北京)科技有限公司 | Face detection method and device, electronic equipment and storage medium |
CN113220933A (en) * | 2021-05-12 | 2021-08-06 | 北京百度网讯科技有限公司 | Method and device for classifying audio segments and electronic equipment |
2021-08-20: application CN202110958298.7A filed; granted as patent CN113671031B (Active)
Also Published As
Publication number | Publication date |
---|---|
CN113671031A (en) | 2021-11-19 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN107799126B (en) | Voice endpoint detection method and device based on supervised machine learning | |
CN106504768B (en) | Phone testing audio frequency classification method and device based on artificial intelligence | |
CN110570873B (en) | Voiceprint wake-up method and device, computer equipment and storage medium | |
CN108597496A (en) | Voice generation method and device based on generation type countermeasure network | |
CN106601230B (en) | Logistics sorting place name voice recognition method and system based on continuous Gaussian mixture HMM model and logistics sorting system | |
US9799333B2 (en) | System and method for processing speech to identify keywords or other information | |
CN110288975B (en) | Voice style migration method and device, electronic equipment and storage medium | |
CN109147774B (en) | Improved time-delay neural network acoustic model | |
CN113223536B (en) | Voiceprint recognition method and device and terminal equipment | |
CN108922543A (en) | Model library method for building up, audio recognition method, device, equipment and medium | |
CN113327575A (en) | Speech synthesis method, device, computer equipment and storage medium | |
CN110136726A (en) | A kind of estimation method, device, system and the storage medium of voice gender | |
CN111933148A (en) | Age identification method and device based on convolutional neural network and terminal | |
CN109065026B (en) | Recording control method and device | |
CN112309398B (en) | Method and device for monitoring working time, electronic equipment and storage medium | |
CN117762372A (en) | Multi-mode man-machine interaction system | |
CN117935789A (en) | Speech recognition method, system, equipment and storage medium | |
CN113671031B (en) | Wall hollowing detection method and device | |
CN112420079A (en) | Voice endpoint detection method and device, storage medium and electronic equipment | |
CN111785302A (en) | Speaker separation method and device and electronic equipment | |
CN111326161B (en) | Voiceprint determining method and device | |
CN107437414A (en) | Parallelization visitor's recognition methods based on embedded gpu system | |
CN114236469A (en) | Robot voice recognition positioning method and system | |
CN114155845A (en) | Service determination method and device, electronic equipment and storage medium | |
CN113903328A (en) | Speaker counting method, device, equipment and storage medium based on deep learning |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
TA01 | Transfer of patent application right | Effective date of registration: 20240521. Address after: Room 102, floor 1, building 1, No. 2, Chuangye Road, Haidian District, Beijing 100085. Applicant after: Seashell Housing (Beijing) Technology Co.,Ltd. (China). Address before: 101399 room 24, 62 Farm Road, Erjie village, Yangzhen, Shunyi District, Beijing. Applicant before: Beijing fangjianghu Technology Co.,Ltd. (China) ||
GR01 | Patent grant | ||