CN113671031A - Wall hollowing detection method and device - Google Patents

Wall hollowing detection method and device

Info

Publication number
CN113671031A
CN113671031A
Authority
CN
China
Prior art keywords
network
neural network
feature
output
convolutional neural
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202110958298.7A
Other languages
Chinese (zh)
Other versions
CN113671031B (en)
Inventor
解传栋
李先刚
邹伟
汤志远
赵帅江
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Seashell Housing Beijing Technology Co Ltd
Original Assignee
Beijing Fangjianghu Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Fangjianghu Technology Co Ltd filed Critical Beijing Fangjianghu Technology Co Ltd
Priority to CN202110958298.7A priority Critical patent/CN113671031B/en
Publication of CN113671031A publication Critical patent/CN113671031A/en
Application granted granted Critical
Publication of CN113671031B publication Critical patent/CN113671031B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G01MEASURING; TESTING
    • G01NINVESTIGATING OR ANALYSING MATERIALS BY DETERMINING THEIR CHEMICAL OR PHYSICAL PROPERTIES
    • G01N29/00Investigating or analysing materials by the use of ultrasonic, sonic or infrasonic waves; Visualisation of the interior of objects by transmitting ultrasonic or sonic waves through the object
    • G01N29/04Analysing solids
    • G01N29/045Analysing solids by imparting shocks to the workpiece and detecting the vibrations or the acoustic waves caused by the shocks
    • GPHYSICS
    • G01MEASURING; TESTING
    • G01NINVESTIGATING OR ANALYSING MATERIALS BY DETERMINING THEIR CHEMICAL OR PHYSICAL PROPERTIES
    • G01N29/00Investigating or analysing materials by the use of ultrasonic, sonic or infrasonic waves; Visualisation of the interior of objects by transmitting ultrasonic or sonic waves through the object
    • G01N29/44Processing the detected response signal, e.g. electronic circuits specially adapted therefor
    • G01N29/4472Mathematical theories or simulation
    • GPHYSICS
    • G01MEASURING; TESTING
    • G01NINVESTIGATING OR ANALYSING MATERIALS BY DETERMINING THEIR CHEMICAL OR PHYSICAL PROPERTIES
    • G01N29/00Investigating or analysing materials by the use of ultrasonic, sonic or infrasonic waves; Visualisation of the interior of objects by transmitting ultrasonic or sonic waves through the object
    • G01N29/44Processing the detected response signal, e.g. electronic circuits specially adapted therefor
    • G01N29/4481Neural networks
    • GPHYSICS
    • G01MEASURING; TESTING
    • G01NINVESTIGATING OR ANALYSING MATERIALS BY DETERMINING THEIR CHEMICAL OR PHYSICAL PROPERTIES
    • G01N2291/00Indexing codes associated with group G01N29/00
    • G01N2291/02Indexing codes associated with the analysed material
    • G01N2291/028Material parameters
    • G01N2291/0289Internal structure, e.g. defects, grain size, texture

Landscapes

  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • Immunology (AREA)
  • Pathology (AREA)
  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Chemical & Material Sciences (AREA)
  • Analytical Chemistry (AREA)
  • Biochemistry (AREA)
  • Engineering & Computer Science (AREA)
  • Signal Processing (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Computation (AREA)
  • Algebra (AREA)
  • Mathematical Analysis (AREA)
  • Mathematical Optimization (AREA)
  • Mathematical Physics (AREA)
  • Pure & Applied Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Machine Translation (AREA)
  • Circuit For Audible Band Transducer (AREA)

Abstract

The application provides a wall hollowing detection method and device. The method comprises the following steps: acquiring an audio signal generated by knocking a wall; acquiring a first feature and a second feature of the audio signal; splicing the first feature and the second feature to obtain a spliced feature; acquiring an identification result corresponding to the spliced feature based on a preset audio detection network model; and determining whether the wall is in a hollow state according to the identification result. The method can improve the accuracy of wall hollowing detection while reducing labor cost.

Description

Wall hollowing detection method and device
Technical Field
The embodiment of the disclosure relates to a wall hollowing detection method and device.
Background
The facing layer of a building structure is subjected to natural and thermal forces over long periods, which causes problems such as aging, damage and loosening of the facing material and of its connecting materials and supporting structure, leading to overall or local debonding, loosening, hollowing and even shedding of the facing layer, which can easily cause personal injury and property damage.
At present, the hollowing detection of the wall body is mainly realized by methods such as manual knocking, ear identification or infrared imaging.
In the process of developing the present application, the inventors found that identification by manual knocking and listening has a high labor cost and is made inaccurate by human factors, while infrared imaging detection is easily disturbed by external interference such as noise and heat sources, which also makes detection inaccurate.
Disclosure of Invention
In view of this, the present application provides a wall hollowing detection method and device, which can improve the accuracy of wall hollowing detection while reducing labor cost.
To solve this technical problem, the technical solution of the present application is implemented as follows:
in one embodiment, a wall hollowing detection method is provided, the method comprising:
acquiring an audio signal generated by knocking a wall;
acquiring a first feature and a second feature of the audio signal;
splicing the first feature and the second feature to obtain a spliced feature;
acquiring an identification result corresponding to the spliced feature based on a preset audio detection network model;
and determining whether the wall is in a hollow state according to the identification result.
In another embodiment, there is provided a wall hollowing detection apparatus, the apparatus comprising: the device comprises a storage unit, a first acquisition unit, a second acquisition unit, a third acquisition unit and a determination unit;
the storage unit is used for storing a preset audio detection network model;
the first acquisition unit is used for acquiring an audio signal generated by knocking a wall;
the second obtaining unit is used for obtaining the first feature and the second feature of the audio signal obtained by the first acquisition unit, and splicing the first feature and the second feature to obtain a spliced feature;
the third obtaining unit is configured to obtain, based on the preset audio detection network model stored in the storage unit, an identification result corresponding to the spliced feature obtained by the second obtaining unit;
the determining unit is configured to determine whether the wall is in a hollow state according to the identification result obtained by the third obtaining unit.
In another embodiment, an electronic device is provided, comprising a memory, a processor and a computer program stored on the memory and executable on the processor, the processor implementing the steps of the wall hollowing detection method when executing the program.
In another embodiment, a computer-readable storage medium is provided, on which a computer program is stored which, when being executed by a processor, carries out the steps of the wall hollowing detection method.
In another embodiment, a computer program product is provided, comprising a computer program which, when being executed by a processor, carries out the wall hollowing detection method.
According to the above technical solution, features of the collected audio signal are obtained and spliced to serve as the overall feature of the audio signal; an identification result corresponding to the spliced feature is obtained based on a preset audio detection network model; and whether the wall is in a hollow state is determined according to the identification result. This scheme can improve the accuracy of wall hollowing detection while reducing labor cost.
Drawings
In order to more clearly illustrate the technical solutions in the embodiments of the present application, the drawings needed in the description of the embodiments are briefly introduced below. The drawings described below are only some embodiments of the present application; other drawings can be obtained from them by those skilled in the art without inventive effort.
Fig. 1 is a schematic flow chart illustrating a process of acquiring a preset audio detection network model in an embodiment of the present application;
FIG. 2 is a schematic diagram of a residual CRNN-GLU structure in an embodiment of the present application;
FIG. 3 is a schematic view illustrating a process of detecting hollowing of a wall according to an embodiment of the present disclosure;
FIG. 4 is a schematic diagram of an audio signal acquisition device;
FIG. 5 is a schematic view illustrating a process of detecting hollowing of a wall according to a second embodiment of the present application;
FIG. 6 is a schematic structural diagram of a wall hollowing detection device in an embodiment of the present application;
fig. 7 is a schematic physical structure diagram of an electronic device according to an embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only a part of the embodiments of the present application, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
The terms "first," "second," "third," "fourth," and the like in the description and in the claims, as well as in the drawings, if any, are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used is interchangeable under appropriate circumstances such that the embodiments of the invention described herein are, for example, capable of operation in sequences other than those illustrated or otherwise described herein. Furthermore, the terms "comprising" and "having," as well as any variations thereof, are intended to cover non-exclusive inclusions. For example, a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements explicitly listed, but may include other steps or elements not explicitly listed or inherent to such process, method, article, or apparatus.
The technical solution of the present invention will be described in detail with specific examples. Several of the following embodiments may be combined with each other and some details of the same or similar concepts or processes may not be repeated in some embodiments.
The embodiment of the application provides a wall hollowing detection method: features of the collected audio signal are obtained and spliced to serve as the overall feature of the audio signal; an identification result corresponding to the spliced feature is obtained based on a preset audio detection network model; and whether the wall is in a hollow state is determined according to the identification result. This scheme can improve the accuracy of wall hollowing detection while reducing labor cost.
The preset audio detection network model in the embodiment of the application may be established and stored in advance so that it can be used directly when needed, or it may be established at the time it is needed.
The method for establishing the preset audio detection network model specifically comprises the following steps:
referring to fig. 1, fig. 1 is a schematic flowchart of acquiring a preset audio detection network model in the embodiment of the present application. The method comprises the following specific steps:
Step 101, obtaining a training sample.
The obtained training samples may be unlabeled, partially labeled or fully labeled; unlabeled samples can be labeled using the initial audio detection network model before being used as training samples.
Step 102, acquiring a first feature and a second feature of the training sample, and splicing the first feature and the second feature to obtain a spliced feature.
Wherein the first feature is a fundamental frequency feature;
the fundamental frequency features are used for representing tone characteristics, and the multi-dimensional fundamental frequency features can be selected and extracted according to practical application, for example, the following five fundamental frequency features can be selected:
1) the base frequency value of the voice frame;
2) the fundamental frequency first-order difference value of the adjacent frames;
3) the fundamental frequency second-order difference value of the adjacent frames;
4) the length of the current continuous base frequency band;
5) the difference between the current frame fundamental frequency value and the average value of the next N frame fundamental frequencies of the previous continuous fundamental frequency band, where N is usually selected to be 10.
These five fundamental frequency dimensions form a good feature combination, but the concrete implementation of the application is not limited to this five-dimensional combination; increasing or decreasing the number of features does not affect the application of the scheme.
The above method for extracting the fundamental frequency feature is an exemplary implementation manner, and the embodiment of the present application is not limited to the implementation manner.
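As an illustration of the five-dimensional fundamental frequency feature described above, the following is a minimal Python sketch. The use of librosa's pyin estimator, the pitch search range, and the reading of the fifth feature as the difference to the mean of the last N frames of the previous voiced segment are assumptions; the patent does not prescribe a particular F0 extractor.

```python
import numpy as np
import librosa

def f0_features(y, sr, n_avg=10):
    """Per-frame 5-dim F0 features: value, 1st/2nd-order differences,
    current voiced-run length, and difference to the mean of the last
    n_avg frames of the previous voiced run."""
    f0, voiced, _ = librosa.pyin(y, fmin=50, fmax=2000, sr=sr)
    f0 = np.nan_to_num(f0)                    # unvoiced frames -> 0
    d1 = np.diff(f0, prepend=f0[0])           # 2) first-order difference
    d2 = np.diff(d1, prepend=d1[0])           # 3) second-order difference
    run_len = np.zeros_like(f0)               # 4) length of current voiced run
    prev_diff = np.zeros_like(f0)             # 5) diff to previous run's trailing mean
    last_run, run = [], 0
    for i, v in enumerate(voiced):
        if v:
            run += 1
        else:
            if run:
                last_run = list(f0[i - run:i])  # remember the finished voiced run
            run = 0
        run_len[i] = run
        if last_run:
            prev_diff[i] = f0[i] - np.mean(last_run[-n_avg:])
    return np.stack([f0, d1, d2, run_len, prev_diff], axis=1)   # (frames, 5)
```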
The second feature is any one of the following features, or a feature obtained by splicing at least two of them:
speech features, decoding features, neural network features.
That is, the second feature may be:
a speech feature, a decoding feature, or a neural network feature;
or a feature obtained by splicing the speech feature and the decoding feature;
or a feature obtained by splicing the speech feature and the neural network feature;
or a feature obtained by splicing the decoding feature and the neural network feature;
or a feature obtained by splicing the speech feature, the decoding feature and the neural network feature.
When the second feature is a spliced feature, the features to be spliced are acquired separately and then spliced to obtain the spliced feature. For example, if the second feature is the splice of the decoding feature and the neural network feature, the decoding feature and the neural network feature of the sample are acquired separately and spliced to obtain the spliced feature.
Wherein:
speech features include: Filter Bank (Fbank) features, Mel-Frequency Cepstral Coefficients (MFCC), etc.;
neural network features include: bottleneck features, DNN features, CNN features, LSTM features, etc.;
decoding features include: threshold, maximum, minimum, confidence, lattice distribution statistics, etc. of the utterance.
If any one of the acquired speech features, decoding features or neural network features contains multiple features, features of the same type can also be spliced together.
In the embodiment of the present application, the manner of obtaining the speech feature, the decoding feature, and the neural network feature is not limited, and here, the manner of obtaining is not illustrated.
In the embodiment of the present application, when the first feature and the second feature are spliced, the order of the two features is not limited. For example, if the first feature is [A1, A2, …, An] and the second feature is [B1, B2, …, Bm], the spliced feature may be [A1, A2, …, An, B1, B2, …, Bm] or [B1, B2, …, Bm, A1, A2, …, An].
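For illustration, a minimal sketch of the splicing step, assuming MFCC as the second feature and the f0_features sketch above as the first; the file name and feature dimensions are hypothetical.

```python
import numpy as np
import librosa

y, sr = librosa.load("knock.wav", sr=16000)                # hypothetical recording
first = f0_features(y, sr)                                 # (frames, 5)
second = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13).T     # (frames, 13)

n = min(len(first), len(second))                           # align frame counts
spliced = np.concatenate([first[:n], second[:n]], axis=1)  # (frames, 18)
# [second, first] would be equally valid, as long as training and
# inference use the same order.
```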
Step 103, establishing an initial audio detection network model.
The initial audio detection network model is established based on any one of the following neural networks: a Deep Neural Network (DNN), a Convolutional Recurrent Neural Network (CRNN), a Convolutional Neural Network (CNN), or a Long Short-Term Memory (LSTM) network, a type of recurrent neural network;
when the neural network used to establish the initial audio detection network model is a CRNN, the CRNN adopts a residual Convolutional Recurrent Network (CRNN)-Gated Linear Unit (GLU) structure;
when a residual connection is made in the residual convolutional recurrent network-gated linear unit structure, the network input before the gated linear unit is added to the result of passing two convolutional networks through the gated linear unit operation, and the result of the residual connection is used as the input of the next CNN.
In the embodiment of the present application, taking as an example a convolutional recurrent neural network that includes two residual connections, the residual convolutional recurrent network-gated linear unit structure includes: a first residual connection network, a second residual connection network, a first convolutional neural network, a second convolutional neural network, a Long Short-Term Memory (LSTM) network, a fully connected layer and an output layer;
wherein the first residual connection network comprises a third convolutional neural network, a fourth convolutional neural network and a first gated linear unit; the third and fourth convolutional neural networks receive the network input, their outputs are fed to the first gated linear unit, and the output of the first gated linear unit is added to the network input to form the output of the first residual connection network;
the first convolutional neural network is in signal connection with the first residual connection network and receives the output of the first residual connection network;
the second residual connection network is in signal connection with the first convolutional neural network and receives its output; it comprises a fifth convolutional neural network, a sixth convolutional neural network and a second gated linear unit; the fifth and sixth convolutional neural networks receive the output of the first convolutional neural network, their outputs are fed to the second gated linear unit, and the output of the second gated linear unit is added to the output of the first convolutional neural network to form the output of the second residual connection network;
the second convolutional neural network is in signal connection with the second residual connection network and receives its output; the output of the second convolutional neural network serves as the network input of the long short-term memory network;
the network output of the long short-term memory network serves as the input of the fully connected layer;
the output of the fully connected layer serves as the input of the output layer;
and the output of the output layer serves as the output of the residual convolutional recurrent network-gated linear unit structure.
The first convolutional neural network to the sixth convolutional neural network may be the same network or different networks, which is not limited in this embodiment of the present application.
The network structure of the convolutional recurrent neural network is illustrated below with reference to the drawings.
Referring to fig. 2, fig. 2 is a schematic structural diagram of the residual CRNN-GLU in the embodiment of the present application. In fig. 2, from top to bottom, the first gated linear unit corresponds to the first residual connection network and the second gated linear unit corresponds to the second residual connection network; the two CNNs in the first residual connection network are the third and fourth convolutional neural networks, the two CNNs in the second residual connection network are the fifth and sixth convolutional neural networks, the CNN between the first and second residual connection networks is the first convolutional neural network, and the CNN between the second residual connection network and the Long Short-Term Memory (LSTM) network is the second convolutional neural network.
The structure is described below in four parts:
a first part:
loss function: since there may be overlap between events, a LOSS function (LOSS) may be used in embodiments of the present application: BCE (binary cross-entry) loss function, defined as follows:
Figure BDA0003221151080000071
wherein, PnRepresents a prediction signal obtained by a residual error CRNN-GLU structure on the nth frame, OnIndicating that, at the nth frame, the audio corresponds to the correct signal,n denotes the total number of frames of one sample.
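A minimal PyTorch sketch of this frame-level BCE loss; the batch shape and the use of torch's built-in binary cross-entropy (averaged over frames) are assumptions consistent with the definition above.

```python
import torch
import torch.nn.functional as F

P = torch.rand(1, 100)                     # P_n: per-frame predictions in (0, 1)
O = torch.randint(0, 2, (1, 100)).float()  # O_n: per-frame ground-truth labels
loss = F.binary_cross_entropy(P, O)        # BCE averaged over the N = 100 frames
```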
The second part:
GLU operation: in the network structure, $\odot$ denotes the gating operation. The GLU (Gated Linear Unit) is defined as follows:

$$O_{\text{next}} = (W * X + b) \odot \sigma(V * X + c)$$

where $O_{\text{next}}$ denotes the gated output; $W$ and $V$ denote the convolution kernels of the two CNNs, which may be the same or different; $b$ and $c$ denote biases; $X$ denotes the input feature before the GLU; $\sigma$ is the nonlinear sigmoid function; $\odot$ is the elementwise matrix product, multiplying the values at corresponding positions; and $*$ denotes the convolution operation.
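A minimal PyTorch sketch of this GLU operation with two parallel convolutions; the kernel size and channel counts are illustrative assumptions.

```python
import torch
import torch.nn as nn

class GLUConv(nn.Module):
    """Gated linear unit over two parallel convolutions: (W*X+b) ⊙ σ(V*X+c)."""
    def __init__(self, channels, kernel_size=3):
        super().__init__()
        pad = kernel_size // 2
        self.conv_w = nn.Conv2d(channels, channels, kernel_size, padding=pad)  # W*X + b
        self.conv_v = nn.Conv2d(channels, channels, kernel_size, padding=pad)  # V*X + c

    def forward(self, x):
        return self.conv_w(x) * torch.sigmoid(self.conv_v(x))  # elementwise gating
```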
The third part: as shown in fig. 2, when a residual connection is made, the network input before the GLU is added to the result of passing the two convolutional networks through the GLU operation, specifically:

$$\mathrm{RES}_i = I_i + O_i$$

where $I_i$ denotes the feature input before the i-th GLU and $O_i$ denotes the output of the i-th gate. The residual connection result $\mathrm{RES}_i$ is used as the input of the next CNN.
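Building on the GLUConv sketch above, a minimal sketch of this residual connection, RES_i = I_i + GLU_i(I_i), whose output feeds the next CNN.

```python
import torch.nn as nn

class ResidualGLU(nn.Module):
    """Residual connection around the GLUConv sketch above."""
    def __init__(self, channels):
        super().__init__()
        self.glu = GLUConv(channels)   # two CNNs plus sigmoid gate (previous sketch)

    def forward(self, x):
        return x + self.glu(x)         # RES_i = I_i + O_i, fed to the next CNN
```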
As shown in fig. 2, the embodiment of the present application performs two residual connections in the residual convolutional recurrent network-gated linear unit structure, but the implementation is not limited to two residual connections;
if the real-time requirement is not strict, more residual parts can be added and the network can be designed deeper, which can increase accuracy.
The fourth part:
After the CNN, a pooling dimensionality-reduction operation can be carried out; the result is then sent to an LSTM (a Bi-LSTM can be adopted in non-real-time scenarios), followed by a fully connected (FNN) layer and finally an output layer (OUTPUT). The loss function is then computed, and the parameters are iteratively updated and optimized by SGD (stochastic gradient descent).
In the embodiment of the present application, the implementation manners of the first, second and fourth parts are not limited. For the third part, when the residual connection is implemented, the network input before the GLU is added to the result of passing the two convolutions through the GLU operation, and the residual connection result $\mathrm{RES}_i$ is used as the input of the next CNN. This scheme can improve the accuracy of network identification.
The output of the model is either a hollow state or a non-hollow state.
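Putting the four parts together, the following is a minimal sketch of the residual CRNN-GLU structure of fig. 2, under the assumptions of the previous sketches. The channel count, the 18-dimensional input (5 fundamental-frequency plus 13 MFCC dimensions), the pooling shape and the hidden size are all illustrative.

```python
import torch
import torch.nn as nn

class ResidualCRNNGLU(nn.Module):
    def __init__(self, feat_dim=18, channels=16, hidden=64):
        super().__init__()
        self.stem = nn.Conv2d(1, channels, 3, padding=1)
        self.res1 = ResidualGLU(channels)           # first residual connection
        self.cnn1 = nn.Conv2d(channels, channels, 3, padding=1)
        self.res2 = ResidualGLU(channels)           # second residual connection
        self.cnn2 = nn.Conv2d(channels, channels, 3, padding=1)
        self.pool = nn.MaxPool2d((1, 2))            # pooling along the feature axis
        self.lstm = nn.LSTM(channels * (feat_dim // 2), hidden, batch_first=True)
        self.fc = nn.Linear(hidden, 1)              # fully connected layer

    def forward(self, x):                           # x: (batch, frames, feat_dim)
        h = x.unsqueeze(1)                          # -> (batch, 1, frames, feat_dim)
        h = self.cnn1(self.res1(self.stem(h)))
        h = self.pool(self.cnn2(self.res2(h)))
        b, c, t, f = h.shape
        h = h.permute(0, 2, 1, 3).reshape(b, t, c * f)   # back to a frame sequence
        h, _ = self.lstm(h)
        return torch.sigmoid(self.fc(h)).squeeze(-1)     # per-frame hollow probability
```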
Step 104, training the initial audio detection network model using the spliced features and the labels corresponding to the training samples, to obtain the preset audio detection network model.
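A minimal sketch of this training step with BCE loss and SGD as described above; the DataLoader yielding (spliced features, frame labels) pairs is a hypothetical stand-in for the sample preparation of steps 101 and 102.

```python
model = ResidualCRNNGLU()
opt = torch.optim.SGD(model.parameters(), lr=0.01)
loss_fn = nn.BCELoss()

for features, labels in train_loader:   # hypothetical DataLoader: (batch, frames, 18) / (batch, frames)
    opt.zero_grad()
    pred = model(features)              # (batch, frames) hollow probabilities
    loss = loss_fn(pred, labels)        # frame-level BCE as defined above
    loss.backward()
    opt.step()                          # SGD parameter update
```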
At this point, the establishment of the preset audio detection network model is completed.
The process for detecting the wall hollowing in the embodiment of the application is described in detail below with reference to the accompanying drawings.
Example one
Referring to fig. 3, fig. 3 is a schematic view illustrating a process of detecting hollowing of a wall body according to an embodiment of the present application. The method comprises the following specific steps:
Step 301, acquiring an audio signal generated by knocking a wall.
The audio signal may be collected while the wall is currently being knocked, or a previously stored audio signal may be acquired.
The embodiment does not limit the capturing device used to collect the audio signal; one example is the device shown in fig. 4, a schematic diagram of an audio signal acquisition device.
The small circles in fig. 4 are microphone pickups; 1, 2, 4, 6 or 8 of them can be selected according to cost, and the acquisition device in fig. 4 takes 8 microphones as an example.
The acquisition device is provided with a liquid crystal display screen that can display the judgment result and other auxiliary information in real time.
The computing function of the acquisition device can be embedded in a device chip, or can partially adopt cloud computing that returns the result.
The acquisition device is provided with a handle for convenient handheld acquisition of the audio signal.
Step 302, extracting a first feature and a second feature of the audio signal.
The first feature is a fundamental frequency feature;
the second characteristic is any one of the following characteristics or a characteristic obtained by splicing at least two of the following characteristics:
speech features, decoding features, neural network features.
Step 303, splicing the first feature and the second feature to obtain a spliced feature.
In the embodiment of the present application, when the first feature and the second feature are spliced, the order of the two features is not limited. For example, if the first feature is [A1, A2, …, An] and the second feature is [B1, B2, …, Bm], the spliced feature may be [A1, A2, …, An, B1, B2, …, Bm] or [B1, B2, …, Bm, A1, A2, …, An]; however, the order must be consistent with the feature splicing order used when the preset audio detection network model was established.
Splicing the first feature and the second feature to serve as the feature information of the audio signal takes the multi-dimensional characteristics of the audio signal into account, so a more accurate detection result can be obtained.
Step 304, acquiring an identification result corresponding to the spliced feature based on a preset audio detection network model.
The identification result is: hollow state or non-hollow state.
Step 305, determining whether the wall is in a hollow state according to the identification result.
When the identification result is hollow state or non-hollow state, determining whether the wall is in the hollow state according to the identification result includes:
when the identification result is the hollow state, determining that the wall is in the hollow state;
and when the identification result is the non-hollow state, determining that the wall is not in the hollow state.
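A minimal end-to-end sketch of steps 301 to 305, reusing the earlier sketches; the recording file, the averaging of frame probabilities and the 0.5 decision threshold are assumptions, since the patent only states that the model outputs hollow or non-hollow.

```python
import numpy as np
import torch
import librosa

y, sr = librosa.load("knock.wav", sr=16000)                # step 301 (hypothetical file)
first = f0_features(y, sr)                                 # step 302: first feature
second = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13).T     # step 302: second feature
n = min(len(first), len(second))
spliced = np.concatenate([first[:n], second[:n]], axis=1)  # step 303

model.eval()                                               # trained ResidualCRNNGLU
with torch.no_grad():                                      # step 304
    prob = model(torch.tensor(spliced, dtype=torch.float32).unsqueeze(0))
is_hollow = prob.mean().item() > 0.5                       # step 305 (assumed rule)
print("hollow" if is_hollow else "not hollow")
```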
In the embodiment of the present application, the first feature and the second feature are extracted from the collected audio signal and spliced to serve as the overall feature of the audio signal; the identification result corresponding to the spliced feature is obtained based on a preset audio detection network model, and whether the wall is in a hollow state is determined according to the identification result. This scheme can improve the accuracy of wall hollowing detection while reducing labor cost.
Example two
Referring to fig. 5, fig. 5 is a schematic view illustrating a process of detecting hollowing of a wall body according to a second embodiment of the present application. The method comprises the following specific steps:
Step 501, obtaining an audio signal generated by knocking a wall.
The audio signal may be collected while the wall is currently being knocked, or a previously stored audio signal may be acquired.
The embodiment of the present application does not limit the acquisition device for acquiring audio signals.
Step 502, performing endpoint detection on the audio signal, and performing noise reduction processing.
Endpoint detection, also called Voice Activity Detection (VAD), aims to distinguish speech regions from non-speech regions: it accurately locates the start and end points of speech within a noisy signal, removes the silent and noise parts, and finds the genuinely effective content of the speech.
VAD algorithms can be roughly divided into three categories: threshold-based VAD, VAD as a classifier, and model-based VAD.
Threshold-based VAD: speech and non-speech are distinguished by extracting time-domain features (short-time energy, short-time zero-crossing rate, etc.) or frequency-domain features (MFCC, spectral entropy, etc.) and setting a reasonable threshold. This is the traditional VAD method.
VAD as a classifier: speech detection can be treated as a speech/non-speech binary classification problem, and a classifier is trained by machine learning to perform the detection.
Model-based VAD: a complete acoustic model (the granularity of the modeling unit can be very coarse) is used, on the basis of decoding, to distinguish speech segments from non-speech segments via global information.
The embodiment of the present application does not limit the specific implementation manner of endpoint detection.
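For illustration, a minimal sketch of the threshold-based VAD described above, using short-time energy; the frame size, hop and threshold are assumptions.

```python
import numpy as np

def energy_vad(y, frame_len=512, hop=256, threshold=1e-4):
    """Mark a frame as active when its short-time energy exceeds the threshold."""
    frames = [y[i:i + frame_len] for i in range(0, len(y) - frame_len + 1, hop)]
    energy = np.array([np.mean(f ** 2) for f in frames])   # short-time energy
    return energy > threshold                              # True = speech/knock frame
```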
The specific implementation may use noise reduction or dereverberation.
In a specific implementation of the method, noise reduction or dereverberation is performed after endpoint detection, so that external interference in the audio signal can be removed to a greater extent.
The method further comprises the following steps:
and if the signal intensity of the audio signal is smaller than a preset value, performing enhancement processing on the audio signal.
I.e. when the signal strength of the audio signal is relatively small, enhancement processing is performed.
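A minimal sketch of this enhancement step; treating "signal intensity" as the RMS level and scaling up to a target level are assumptions about what the enhancement processing does.

```python
import numpy as np

def enhance_if_weak(y, preset_rms=0.01, target_rms=0.05):
    """Scale the signal up when its RMS level falls below the preset value."""
    rms = np.sqrt(np.mean(y ** 2))
    return y * (target_rms / rms) if 0 < rms < preset_rms else y
```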
Meaningful Fbank features and fundamental frequency features can easily be extracted from the preprocessed audio signal, which facilitates the final identification of the audio signal.
Step 503, extracting a first feature and a second feature of the audio signal.
The first feature is a fundamental frequency feature;
the second characteristic is any one of the following characteristics or a characteristic obtained by splicing at least two of the following characteristics:
speech features, decoding features, neural network features.
Step 504, splicing the first feature and the second feature to obtain a spliced feature.
In the embodiment of the present application, when the first feature and the second feature are spliced, the order of the two features is not limited. For example, if the first feature is [A1, A2, …, An] and the second feature is [B1, B2, …, Bm], the spliced feature may be [A1, A2, …, An, B1, B2, …, Bm] or [B1, B2, …, Bm, A1, A2, …, An]; however, the order must be consistent with the feature splicing order used when the preset audio detection network model was established.
Splicing the first feature and the second feature to serve as the feature information of the audio signal takes the multi-dimensional characteristics of the audio signal into account, so a more accurate detection result can be obtained.
Step 505, acquiring an identification result corresponding to the spliced feature based on a preset audio detection network model.
The identification result is: hollow state or non-hollow state.
Step 506, determining whether the wall is in a hollow state according to the identification result.
When the identification result is hollow state or non-hollow state, determining whether the wall is in the hollow state according to the identification result includes:
when the identification result is the hollow state, determining that the wall is in the hollow state;
and when the identification result is the non-hollow state, determining that the wall is not in the hollow state.
In the embodiment of the present application, the collected audio signal is preprocessed, the first feature and the second feature of the preprocessed audio signal are extracted and spliced to serve as the overall feature of the audio signal, the identification result corresponding to the spliced feature is obtained based on a preset audio detection network model, and whether the wall is in a hollow state is determined according to the identification result. This scheme can improve the accuracy of wall hollowing detection while reducing labor cost.
Based on the same inventive concept, the embodiment of the application also provides a wall hollowing detection device. Referring to fig. 6, fig. 6 is a schematic structural diagram of a wall hollowing detection device in the embodiment of the present application. The device comprises: a storage unit 601, a first acquisition unit 602, a second acquisition unit 603, a third acquisition unit 604, and a determination unit 605;
the storage unit 601 is used for storing a preset audio detection network model;
a first obtaining unit 602, configured to obtain an audio signal generated by tapping a wall;
a second obtaining unit 603, configured to extract the first feature and the second feature of the audio signal obtained by the first obtaining unit 602, and splice the first feature and the second feature to obtain a spliced feature;
a third obtaining unit 604, configured to obtain, based on the preset audio detection network model stored in the storage unit 601, an identification result corresponding to the spliced feature obtained by the second obtaining unit 603;
a determining unit 605, configured to determine whether the wall is in a hollow state according to the identification result obtained by the third obtaining unit 604.
In another embodiment,
the first feature is a fundamental frequency feature;
the second characteristic is any one of the following characteristics or a characteristic obtained by splicing at least two of the following characteristics:
speech features, decoding features, neural network features.
In another embodiment, the apparatus further comprises: a processing unit 606;
the processing unit 606 is configured to, after the first obtaining unit 602 obtains the audio signal generated by tapping the wall, perform endpoint detection on the audio signal before the second obtaining unit 603 obtains the first feature and the second feature of the audio signal, and perform noise reduction processing on the audio signal.
In another embodiment,
the storage unit is specifically used for acquiring training samples; acquiring a first feature and a second feature of the audio signal in the training sample, and splicing the first feature and the second feature to obtain a spliced feature; establishing an initial audio detection network model; wherein the initial audio detection network model is established based on any one of the following neural networks: a deep neural network DNN, a convolution recurrent neural network CRNN, a convolution neural network CNN and a long-short term memory network LSTM; and training the initial audio detection network model by using the splicing characteristics and the labels corresponding to the training samples to obtain the preset audio detection network model.
In another embodiment,
when the neural network used to establish the initial audio detection network model is a CRNN, the CRNN adopts a residual convolutional recurrent network-gated linear unit structure;
when a residual connection is made in the residual convolutional recurrent network-gated linear unit structure, the network input before the gated linear unit is added to the result of passing two convolutional networks through the gated linear unit operation, and the result of the residual connection is used as the input of the next CNN.
In another embodiment, the residual convolutional recurrent network-gated linear unit structure comprises: a first residual connection network, a second residual connection network, a first convolutional neural network, a second convolutional neural network, a Long Short-Term Memory (LSTM) network, a fully connected layer and an output layer;
wherein the first residual connection network comprises a third convolutional neural network, a fourth convolutional neural network and a first gated linear unit; the third and fourth convolutional neural networks receive the network input, their outputs are fed to the first gated linear unit, and the output of the first gated linear unit is added to the network input to form the output of the first residual connection network;
the first convolutional neural network is in signal connection with the first residual connection network and receives the output of the first residual connection network;
the second residual connection network is in signal connection with the first convolutional neural network and receives its output; it comprises a fifth convolutional neural network, a sixth convolutional neural network and a second gated linear unit; the fifth and sixth convolutional neural networks receive the output of the first convolutional neural network, their outputs are fed to the second gated linear unit, and the output of the second gated linear unit is added to the output of the first convolutional neural network to form the output of the second residual connection network;
the second convolutional neural network is in signal connection with the second residual connection network and receives its output; the output of the second convolutional neural network serves as the network input of the long short-term memory network;
the network output of the long short-term memory network serves as the input of the fully connected layer;
the output of the fully connected layer serves as the input of the output layer;
and the output of the output layer serves as the output of the residual convolutional recurrent network-gated linear unit structure.
In another embodiment,
the determining unit 605 is specifically configured to determine, when the identification result is hollow state or non-hollow state, whether the wall is in the hollow state according to the identification result, including: when the identification result is the hollow state, determining that the wall is in the hollow state; and when the identification result is the non-hollow state, determining that the wall is not in the hollow state.
The units of the above embodiments may be integrated into one body, or may be separately deployed; may be combined into one unit or further divided into a plurality of sub-units.
In another embodiment, an electronic device is also provided, which includes a memory, a processor, and a computer program stored on the memory and executable on the processor, wherein the processor implements the steps of the wall hollowing detection method when executing the program.
In another embodiment, a computer readable storage medium is also provided, having stored thereon computer instructions, which when executed by a processor, may implement the steps in the wall hollowing detection method.
In another embodiment, a computer program product is provided, comprising a computer program which, when being executed by a processor, carries out the wall hollowing detection method.
Fig. 7 is a schematic physical structure diagram of an electronic device according to an embodiment of the present invention. As shown in fig. 7, the electronic device may include: a Processor (Processor)710, a communication Interface 720, a Memory (Memory)730 and a communication bus 740, wherein the Processor 710, the communication Interface 720 and the Memory 730 communicate with each other via the communication bus 740. Processor 710 may call logic instructions in memory 730 to perform the following method:
acquiring an audio signal generated by knocking a wall;
acquiring a first feature and a second feature of the audio signal;
splicing the first feature and the second feature to obtain a spliced feature;
acquiring an identification result corresponding to the spliced feature based on a preset audio detection network model;
and determining whether the wall is in a hollow state according to the identification result.
In addition, the logic instructions in the memory 730 can be implemented in the form of software functional units and stored in a computer readable storage medium when the software functional units are sold or used as independent products. Based on such understanding, the technical solution of the present invention may be embodied in the form of a software product, which is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present invention. And the aforementioned storage medium includes: a U-disk, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk or an optical disk, and other various media capable of storing program codes.
The above-described embodiments of the apparatus are merely illustrative, and the units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the present embodiment. One of ordinary skill in the art can understand and implement it without inventive effort.
Through the above description of the embodiments, those skilled in the art will clearly understand that each embodiment can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware. With this understanding in mind, the above-described technical solutions may be embodied in the form of a software product, which can be stored in a computer-readable storage medium such as ROM/RAM, magnetic disk, optical disk, etc., and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) to execute the methods described in the embodiments or some parts of the embodiments.
The above description is only for the purpose of illustrating the preferred embodiments of the present invention and is not to be construed as limiting the invention, and any modifications, equivalents, improvements and the like made within the spirit and principle of the present invention should be included in the scope of the present invention.

Claims (10)

1. A wall hollowing detection method is characterized by comprising the following steps:
acquiring an audio signal generated by knocking a wall;
acquiring a first feature and a second feature of the audio signal;
splicing the first feature and the second feature to obtain a spliced feature;
acquiring an identification result corresponding to the spliced feature based on a preset audio detection network model;
and determining whether the wall is in a hollow state according to the identification result.
2. The method of claim 1,
the first feature is a fundamental frequency feature;
the second characteristic is any one of the following characteristics or a characteristic obtained by splicing at least two of the following characteristics:
speech features, decoding features, neural network features.
3. The method of claim 1, wherein after obtaining the audio signal generated by tapping the wall and before obtaining the first and second features of the audio signal, the method further comprises:
and carrying out endpoint detection on the audio signal and carrying out noise reduction processing.
4. The method of claim 1, further comprising:
obtaining a training sample;
acquiring a first feature and a second feature of the audio signal in the training sample, and splicing the first feature and the second feature to obtain a spliced feature;
establishing an initial audio detection network model; wherein the initial audio detection network model is established based on any one of the following neural networks: a deep neural network, a convolution recurrent neural network, a convolution neural network and a long-short term memory network;
and training the initial audio detection network model by using the splicing characteristics and the labels corresponding to the training samples to obtain the preset audio detection network model.
5. The method of claim 4,
when the neural network for establishing the initial audio detection network model is a convolution recurrent neural network, the convolution recurrent neural network adopts a residual convolution recurrent network-gate control linear unit structure;
when residual connection is carried out in the residual convolution recursive network-gate control linear unit structure, the network input before the gate control linear unit is added with the results of two convolution networks through gate control linear unit operation, and the results after residual connection are used as the input of the next convolution neural network.
6. The method of claim 5,
the residual convolution recursive network-gated linear unit structure comprises: the system comprises a first residual error connecting network, a second residual error connecting network, a first convolutional neural network, a second convolutional neural network, a long-term and short-term memory network, a full connecting layer and an output layer;
wherein the first residual connecting network comprises a third convolutional neural network, a fourth convolutional neural network and a first gated linear unit, the third convolutional neural network and the fourth convolutional neural network are used for receiving network input and the output of the third convolutional neural network and the fourth convolutional neural network is input to the first gated linear unit, the output of the first gated linear unit is added with the network input as the output of the first residual connecting network;
the first convolutional neural network is in signal connection with the first residual connecting network and is used for receiving the output of the first residual connecting network;
the second residual connecting network is in signal connection with the first convolutional neural network and is used for receiving the output of the first convolutional neural network, the second residual connecting network comprises a fifth convolutional neural network, a sixth convolutional neural network and a second gated linear unit, the fifth convolutional neural network and the sixth convolutional neural network are used for receiving the output of the first convolutional neural network, the output of the fifth convolutional neural network and the output of the sixth convolutional neural network are input to the second gated linear unit, and the output of the second gated linear unit is added with the output of the first convolutional neural network to be used as the output of the second residual connecting network;
the second convolutional neural network is in signal connection with the second residual error connection network and receives the output of the second residual error connection network, and the output of the second convolutional neural network is used as the network input of the long-short term memory network;
the network output of the long-short term memory network is used as the input of the full connection layer;
the output of the full connection layer is used as the input of the output layer;
and the output of the output layer is used as the output of the residual convolution recursive network-gated linear unit structure.
7. The method according to any one of claims 1 to 6,
when the identification result is hollow state or non-hollow state, determining whether the wall is in the hollow state according to the identification result comprises:
when the identification result is the hollow state, determining that the wall is in the hollow state;
and when the identification result is the non-hollow state, determining that the wall is not in the hollow state.
8. An electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, characterized in that the processor implements the method according to any of claims 1-7 when executing the program.
9. A computer-readable storage medium, on which a computer program is stored, which program, when being executed by a processor, is adapted to carry out the method of any one of claims 1 to 7.
10. A computer program product comprising a computer program, characterized in that the computer program realizes the method of any of claims 1-7 when executed by a processor.
CN202110958298.7A 2021-08-20 2021-08-20 Wall hollowing detection method and device Active CN113671031B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110958298.7A CN113671031B (en) 2021-08-20 2021-08-20 Wall hollowing detection method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110958298.7A CN113671031B (en) 2021-08-20 2021-08-20 Wall hollowing detection method and device

Publications (2)

Publication Number Publication Date
CN113671031A (en) 2021-11-19
CN113671031B CN113671031B (en) 2024-06-21

Family

ID=78544181

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110958298.7A Active CN113671031B (en) 2021-08-20 2021-08-20 Wall hollowing detection method and device

Country Status (1)

Country Link
CN (1) CN113671031B (en)

Citations (33)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2002318155A (en) * 2001-04-24 2002-10-31 Fuji Xerox Co Ltd Signal determination device
US20170140240A1 (en) * 2015-07-27 2017-05-18 Salesforce.Com, Inc. Neural network combined image and text evaluator and classifier
CN106841403A (en) * 2017-01-23 2017-06-13 天津大学 A kind of acoustics glass defect detection method based on neutral net
CN107545902A (en) * 2017-07-14 2018-01-05 清华大学 A kind of article Material Identification method and device based on sound characteristic
CN107704924A (en) * 2016-07-27 2018-02-16 中国科学院自动化研究所 Synchronous self-adapting space-time characteristic expresses the construction method and correlation technique of learning model
CN108241024A (en) * 2018-01-25 2018-07-03 上海众材工程检测有限公司 A kind of hollowing detection method and system based on wall
US20180336889A1 (en) * 2017-05-19 2018-11-22 Baidu Online Network Technology (Beijing) Co., Ltd . Method and Apparatus of Building Acoustic Feature Extracting Model, and Acoustic Feature Extracting Method and Apparatus
CN109086892A (en) * 2018-06-15 2018-12-25 中山大学 It is a kind of based on the visual problem inference pattern and system that typically rely on tree
CN109142523A (en) * 2018-08-14 2019-01-04 浙江核力建筑特种技术有限公司 A kind of metope hollowing recognition quantitative analytic approach based on acoustic imaging
CN208505984U (en) * 2018-08-10 2019-02-15 平湖市佳诚房地产评估事务所(普通合伙) A kind of hollowing detector for Real Estate Appraisal
US20190130257A1 (en) * 2017-10-27 2019-05-02 Sentient Technologies (Barbados) Limited Beyond Shared Hierarchies: Deep Multitask Learning Through Soft Layer Ordering
JP2019148445A (en) * 2018-02-26 2019-09-05 日立オートモティブシステムズ株式会社 Hammering sound inspection device
US20190341052A1 (en) * 2018-05-02 2019-11-07 Simon Says, Inc. Machine Learning-Based Speech-To-Text Transcription Cloud Intermediary
CN110491410A (en) * 2019-04-12 2019-11-22 腾讯科技(深圳)有限公司 Speech separating method, audio recognition method and relevant device
CN110992959A (en) * 2019-12-06 2020-04-10 北京市科学技术情报研究所 Voice recognition method and system
WO2020093042A1 (en) * 2018-11-02 2020-05-07 Deep Lens, Inc. Neural networks for biomedical image analysis
AU2020101229A4 (en) * 2020-07-02 2020-08-06 South China University Of Technology A Text Line Recognition Method in Chinese Scenes Based on Residual Convolutional and Recurrent Neural Networks
US20200256834A1 (en) * 2017-10-11 2020-08-13 Bp Exploration Operating Company Limited Detecting Events Using Acoustic Frequency Domain Features
CN111798840A (en) * 2020-07-16 2020-10-20 中移在线服务有限公司 Voice keyword recognition method and device
CN111916101A (en) * 2020-08-06 2020-11-10 大象声科(深圳)科技有限公司 Deep learning noise reduction method and system fusing bone vibration sensor and double-microphone signals
US20200355649A1 (en) * 2017-08-02 2020-11-12 United States Of America As Represented By The Secretary Of The Navy System and Method for Detecting Failed Electronics Using Acoustics
US20200375512A1 (en) * 2017-07-27 2020-12-03 Vita-Course Technologies (Hainan) Co., Ltd. Method and system for detecting the oxygen saturation within the blood
CN112034044A (en) * 2020-09-01 2020-12-04 南京邮电大学 Intelligent recognition and detection device for hollowing of ceramic tile based on neural network
US20200408718A1 (en) * 2018-03-06 2020-12-31 Lg Chem, Ltd. Apparatus for Diagnosing Crack in Battery Pack and Battery Pack and Vehicle Including the Same
US20210003543A1 (en) * 2018-03-29 2021-01-07 Taiyo Yuden Co., Ltd. Sensing system and information processing apparatus
KR102221617B1 (en) * 2019-10-01 2021-03-03 주식회사 에스아이웨어 Handheld ultrasound scanner for defect detection of weldments
CN112684012A (en) * 2020-12-02 2021-04-20 青岛科技大学 Fault diagnosis method for key load-bearing structural parts of equipment based on multi-parameter information fusion
CN112927696A (en) * 2019-12-05 2021-06-08 中国科学院深圳先进技术研究院 System and method for automatically evaluating dysarthria based on voice recognition
CN113075296A (en) * 2021-04-01 2021-07-06 湖南翰坤实业有限公司 Method and device for detecting safety of outer wall structure based on sound wave detection and BIM model
CN113111804A (en) * 2021-04-16 2021-07-13 北京房江湖科技有限公司 Face detection method and device, electronic equipment and storage medium
US20210216874A1 (en) * 2020-01-10 2021-07-15 Facebook Technologies, Llc Radioactive data generation
CN113125556A (en) * 2021-03-05 2021-07-16 南京智慧基础设施技术研究院有限公司 Structural damage detection system and method based on voiceprint recognition
CN113220933A (en) * 2021-05-12 2021-08-06 北京百度网讯科技有限公司 Method and device for classifying audio segments and electronic equipment

Also Published As

Publication number Publication date
CN113671031B (en) 2024-06-21

Similar Documents

Publication Title
CN111179975B (en) Voice endpoint detection method for emotion recognition, electronic device and storage medium
CN107369439B (en) Voice awakening method and device
CN102543073B Shanghai dialect speech recognition information processing method
CN106601230B Logistics sorting place-name speech recognition method and system based on continuous Gaussian mixture HMM, and logistics sorting system
US9799333B2 (en) System and method for processing speech to identify keywords or other information
CN110444202B (en) Composite voice recognition method, device, equipment and computer readable storage medium
JPH06222791A (en) Method and apparatus for measuring similarity between sound samples
Huang et al. Intelligent feature extraction and classification of anuran vocalizations
CN111048071A (en) Voice data processing method and device, computer equipment and storage medium
CN113327626A (en) Voice noise reduction method, device, equipment and storage medium
CN110428853A Voice activity detection method, voice activity detection device and electronic equipment
WO2019232833A1 (en) Speech differentiating method and device, computer device and storage medium
CN114333865A (en) Model training and tone conversion method, device, equipment and medium
CN111933148A (en) Age identification method and device based on convolutional neural network and terminal
CN114996489A (en) Method, device and equipment for detecting violation of news data and storage medium
CN111540368A (en) Stable bird sound extraction method and device and computer readable storage medium
CN112309398B (en) Method and device for monitoring working time, electronic equipment and storage medium
CN106340310B (en) Speech detection method and device
CN116884435A (en) Voice event detection method and device based on audio prompt learning
CN113671031B (en) Wall hollowing detection method and device
Dua et al. Gujarati language automatic speech recognition using integrated feature extraction and hybrid acoustic model
CN113327596B (en) Training method of voice recognition model, voice recognition method and device
CN107437414A Parallelized visitor recognition method based on embedded GPU system
CN111326161B (en) Voiceprint determining method and device
CN113903328A (en) Speaker counting method, device, equipment and storage medium based on deep learning

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
TA01 Transfer of patent application right

Effective date of registration: 20240521

Address after: Room 102, floor 1, building 1, No. 2, Chuangye Road, Haidian District, Beijing 100085

Applicant after: Seashell Housing (Beijing) Technology Co.,Ltd.

Country or region after: China

Address before: 101399 room 24, 62 Farm Road, Erjie village, Yangzhen, Shunyi District, Beijing

Applicant before: Beijing fangjianghu Technology Co.,Ltd.

Country or region before: China

GR01 Patent grant