CN113192489A - Paint spraying robot voice recognition method based on multi-scale enhancement BiLSTM model - Google Patents

Paint spraying robot voice recognition method based on multi-scale enhancement BiLSTM model Download PDF

Info

Publication number
CN113192489A
CN113192489A (application number CN202110531117.2A)
Authority
CN
China
Prior art keywords
layer
signal
frequency spectrum
spraying
voice recognition
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110531117.2A
Other languages
Chinese (zh)
Inventor
杨亦琛
李娟
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Jinling Institute of Technology
Original Assignee
Jinling Institute of Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Jinling Institute of Technology filed Critical Jinling Institute of Technology
Priority to CN202110531117.2A priority Critical patent/CN113192489A/en
Publication of CN113192489A publication Critical patent/CN113192489A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS › G10 MUSICAL INSTRUMENTS; ACOUSTICS › G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L 15/02: Feature extraction for speech recognition; selection of recognition unit
    • G10L 15/16: Speech classification or search using artificial neural networks
    • G10L 15/22: Procedures used during a speech recognition process, e.g. man-machine dialogue
    • G10L 19/26: Pre-filtering or post-filtering (coding using predictive techniques)
    • G10L 21/0232: Noise filtering with processing in the frequency domain
    • G10L 2015/223: Execution procedure of a spoken command

Landscapes

  • Engineering & Computer Science (AREA)
  • Human Computer Interaction (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Quality & Reliability (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Manipulator (AREA)

Abstract

A voice recognition method for a paint-spraying robot based on a multi-scale enhanced BiLSTM model. 1) Collect common spraying voice commands with a signal acquisition system; the data acquisition card is an NI-9234. 2) Add white Gaussian noise to the collected audio signal 100 times, compute the Mel spectrum sequence of each noisy signal, and then take the average of the 100 Mel spectrum sequences. 3) Extract features from the averaged Mel spectrum sequence with a multi-scale convolution filter bank, then mine the extracted features further with a BiLSTM model to obtain the corresponding outputs. 4) Splice the BiLSTM outputs together, feed them into a fully connected layer and a Softmax layer, and finally realize speech recognition with the CTC algorithm. 5) Embed the model trained in steps 1-4 in the spraying robot so that the corresponding spraying tasks are carried out intelligently. The model of the invention gives the spraying robot an intelligent voice recognition capability and has high practical application value.

Description

Paint spraying robot voice recognition method based on multi-scale enhancement BiLSTM model
Technical Field
The invention relates to the field of intelligent spraying robots, and in particular to a voice recognition method for a spraying robot based on a multi-scale enhanced BiLSTM model.
Background
With the current rapid development of the domestic construction industry, the closely related decoration industry also has a large market prospect. However, most decoration work is still done manually; wall surfaces, for example, are sprayed with hand-held spraying machines, so the spraying results vary from worker to worker and construction quality and efficiency are hard to guarantee.
Manual spraying imposes high labor intensity on workers; the spraying distance and speed are hard to control, and errors in coating thickness easily lead to rework or even failure to meet quality requirements. Paint contains heavy metals, radioactive substances, toxic organic solvents and the like, and must be atomized during spraying; the atomized paint is easily inhaled into the lungs of on-site workers, so the harsh construction environment does great harm to sprayers' health. Aiming at these problems, this patent proposes a voice recognition method for a paint-spraying robot based on a multi-scale enhanced BiLSTM model, which helps the robot spray house wall surfaces automatically, so that adaptive spraying operations replace disordered manual spraying. This improves the working environment, reduces workers' labor intensity, greatly raises spraying efficiency, and ensures construction quality.
A related domestic patent is "An intelligent spraying robot system and a spraying method thereof" (201910960106.9), which realizes intelligent spraying by designing a scanning and modeling unit, an off-line programming unit, a drive control unit, a robot body and a thickness detection unit, effectively reducing quality problems caused by spraying-trajectory and process-parameter errors. Another, "A building outer wall spraying method based on an intelligent spraying robot" (202011419313.2), lets the robot body spray the outer wall of a building automatically along a wavy trajectory by controlling a retraction assembly; the controller replenishes paint automatically according to the amount remaining in the paint box, so little manual intervention is required, construction cost is low, and no personnel are put at risk. In both patents, however, the spraying robot merely executes tasks that were preset in advance and has no adaptability. In reality, the spraying robot needs to respond to changing conditions rather than execute tasks mechanically, so giving the spraying robot a voice recognition function through which it can complete the corresponding spraying tasks adaptively has important practical significance.
Disclosure of Invention
In order to solve the above problems, the invention provides a paint-spraying robot voice recognition method based on a multi-scale enhanced BiLSTM model, built on a Convolutional Neural Network (CNN) and a bidirectional Long Short-Term Memory network (BiLSTM). First, considering the influence of the noise contained in collected signals on recognition accuracy, this patent proposes an ensemble denoising algorithm: repeated ensemble averaging largely removes the noise and enhances the characteristics of the voice signal. Second, because the features of a voice signal are not easy to mine, a multi-scale convolution filter bank is designed: four convolution kernels of different effective lengths mine the features present in the signal along multiple scales, which greatly helps the model extract the features of the voice signal and improves recognition accuracy. Finally, a BiLSTM model further extracts the voice-signal features, and a fully connected layer, a Softmax layer and the CTC algorithm are added to the model to realize voice recognition. To this end, the invention provides a paint-spraying robot voice recognition method based on a multi-scale enhanced BiLSTM model with the following specific steps:
Step 1, command signal acquisition: collect common spraying voice commands with a signal acquisition system; the data acquisition card is an NI-9234;
Step 2, ensemble denoising preprocessing: add white Gaussian noise to the collected audio signal 100 times, compute the Mel spectrum sequence of each noisy signal, and then take the average sequence of the 100 Mel spectrum sequences;
Step 3, multi-scale feature extraction: extract features from the averaged Mel spectrum sequence with a multi-scale convolution filter bank, then mine the extracted features further with a BiLSTM model to obtain the corresponding outputs;
Step 4, feature fusion and recognition: splice the outputs of the BiLSTM model together, feed them into a fully connected layer and a Softmax layer, and finally realize voice recognition with the CTC algorithm;
Step 5, application on the spraying robot: embed the model trained in steps 1-4 in the spraying robot so that the corresponding spraying tasks are carried out intelligently.
Further, the process of preprocessing the audio signal with the ensemble denoising of step 2 can be expressed as follows:
Assume the collected audio signal is x(t), consisting of an effective signal c(t) and an ambient noise signal n(t), i.e. x(t) = c(t) + n(t). White Gaussian noise g_i(t) is added to x(t) 100 times, producing noisy signals s_i(t); the Mel spectrum sequence of each s_i(t) is computed, and finally the average sequence Ms_ave of the 100 Mel spectrum sequences is taken:
Ms_ave = (1/100) Σ_{i=1}^{100} Ms(x(t) + g_i(t))
where Ms(·) denotes the Mel spectrum computation. Since white Gaussian noise has zero mean, when the number of noise additions is large enough the averaged noise term (1/100) Σ_{i=1}^{100} g_i(t) is close to 0, so the ambient noise signal n(t) in the collected signal is filtered out and the characteristics of the effective signal c(t) are greatly enhanced.
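The ensemble averaging described above can be sketched in a few lines of NumPy. Note the hedges: a plain FFT magnitude spectrum stands in for the true Mel spectrum (a real implementation would apply a Mel filter bank), and the noise level, trial count and test tone are illustrative assumptions, not values from the patent.

```python
import numpy as np

def mel_like_spectrum(signal, n_fft=256):
    # Stand-in for the Mel spectrum: magnitude of the real FFT.
    # (A real implementation would additionally apply a Mel filter bank.)
    return np.abs(np.fft.rfft(signal, n=n_fft))

def ensemble_denoised_spectrum(x, n_trials=100, noise_std=0.05, seed=0):
    """Add white Gaussian noise n_trials times, take the spectrum of each
    noisy copy, and return the element-wise average of the spectra."""
    rng = np.random.default_rng(seed)
    spectra = [mel_like_spectrum(x + rng.normal(0.0, noise_std, x.shape))
               for _ in range(n_trials)]
    return np.mean(spectra, axis=0)

# A clean 10 Hz tone sampled at 256 Hz: after ensemble averaging, the
# dominant spectral peak matches that of the noise-free spectrum.
t = np.linspace(0, 1, 256, endpoint=False)
clean = np.sin(2 * np.pi * 10 * t)
avg = ensemble_denoised_spectrum(clean)
ref = mel_like_spectrum(clean)
print(int(np.argmax(avg)), int(np.argmax(ref)))
```

The averaged spectrum preserves the tonal peak while the contribution of any single noise realization is diluted across the ensemble.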
Further, in step 3, the specific steps for multi-scale feature extraction from the averaged Mel spectrum sequence Ms_ave obtained in step 2 are as follows:
Step 3.1, design four one-dimensional convolution kernels of different scales to filter Ms_ave; the lengths of the four kernels are each defined as a fraction of L (the exact expressions are given in a formula figure), where L is the length of Ms_ave;
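A minimal NumPy sketch of step 3.1. Since the exact kernel lengths appear only in a formula figure, the lengths L//2, L//4, L//8 and L//16 used here are assumptions, and simple moving-average kernels stand in for the learned convolution filters:

```python
import numpy as np

def multiscale_filter(seq, scales=(2, 4, 8, 16)):
    """Filter a 1-D sequence with one kernel per scale.
    Kernel length for scale s is len(seq) // s (an assumed choice)."""
    L = len(seq)
    outputs = []
    for s in scales:
        k = max(L // s, 1)
        kernel = np.ones(k) / k              # illustrative moving-average kernel
        outputs.append(np.convolve(seq, kernel, mode='valid'))
    return outputs

seq = np.sin(np.linspace(0, 4 * np.pi, 64))  # toy stand-in for Ms_ave
outs = multiscale_filter(seq)
print([o.shape[0] for o in outs])  # [33, 49, 57, 61]
```

Each scale yields a filtered view of the sequence ('valid' convolution of length L - k + 1); the four views are what the BiLSTM branches consume in step 3.2.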
Step 3.2, carry out further processing with a BiLSTM model; the specific steps can be expressed as follows:
Step 3.2.1, build a BiLSTM network with a forward propagation layer and a backward propagation layer;
Step 3.2.2, use the forward propagation layer to process Ms_ave and obtain the forward hidden state h_t^f at time t, computed as:
h_t^f = H(W_xh^f x_t + W_hh^f h_(t-1)^f + b_h^f)
where H is the hidden-layer activation function (a sigmoid activation function is selected in this patent), x_t is the input data (Ms_ave), W_xh^f is the connection weight matrix between the input layer and the forward hidden layer, W_hh^f is the connection weight matrix between forward hidden layers, h_(t-1)^f is the forward hidden-layer state at time t-1, and b_h^f is the bias of the forward hidden layer.
Step 3.2.3, use the backward propagation layer to process Ms_ave and obtain the backward hidden state h_t^b at time t, computed as:
h_t^b = H(W_xh^b x_t + W_hh^b h_(t+1)^b + b_h^b)
where W_xh^b is the connection weight matrix between the input layer and the backward hidden layer, W_hh^b is the connection weight matrix between backward hidden layers, h_(t+1)^b is the backward hidden-layer state at time t+1, and b_h^b is the bias of the backward hidden layer.
Step 3.2.4, compute the output vector y_t of the output layer:
y_t = W_hy^f h_t^f + W_hy^b h_t^b + b_0
where W_hy^f is the connection weight matrix between the forward hidden layer and the output layer, W_hy^b is the connection weight matrix between the backward hidden layer and the output layer, and b_0 is the bias of the output layer.
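Steps 3.2.2 to 3.2.4 can be sketched as the following NumPy recurrence. This follows the simplified equations as written, with H = sigmoid over a single affine recurrence per direction; a full BiLSTM additionally has input, forget and output gates, which the patent's formulas elide. All dimensions and weights below are illustrative:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def bidirectional_pass(xs, W_xf, W_hf, b_f, W_xb, W_hb, b_b, W_fy, W_by, b_o):
    """Forward and backward hidden-state recurrences plus the output layer,
    as in steps 3.2.2-3.2.4 (simplified, gate-free recurrence)."""
    T, H = len(xs), b_f.shape[0]
    h_fwd = np.zeros((T, H))
    h_bwd = np.zeros((T, H))
    prev = np.zeros(H)
    for t in range(T):                       # forward layer: left to right
        prev = sigmoid(W_xf @ xs[t] + W_hf @ prev + b_f)
        h_fwd[t] = prev
    nxt = np.zeros(H)
    for t in reversed(range(T)):             # backward layer: right to left
        nxt = sigmoid(W_xb @ xs[t] + W_hb @ nxt + b_b)
        h_bwd[t] = nxt
    # output layer combines both directions at every time step
    return np.array([W_fy @ h_fwd[t] + W_by @ h_bwd[t] + b_o for t in range(T)])

rng = np.random.default_rng(0)
D, H, O, T = 3, 4, 2, 5                      # toy input/hidden/output/time sizes
xs = rng.normal(size=(T, D))
y = bidirectional_pass(xs,
                       rng.normal(size=(H, D)), rng.normal(size=(H, H)), np.zeros(H),
                       rng.normal(size=(H, D)), rng.normal(size=(H, H)), np.zeros(H),
                       rng.normal(size=(O, H)), rng.normal(size=(O, H)), np.zeros(O))
print(y.shape)
```

The point of the two passes is that y_t at every step sees context from both earlier and later frames, which is what distinguishes the bidirectional model from a plain LSTM.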
Step 3.2.5, splice the outputs of the several BiLSTM branches together and input the result into a fully connected layer, followed by a Softmax layer;
Step 3.2.6, decode the output of the Softmax layer with the CTC algorithm to realize voice recognition.
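Step 3.2.6's decoding can be illustrated with best-path (greedy) CTC decoding: take the argmax label per frame, collapse consecutive repeats, then drop blanks. The patent does not specify the decoding variant, so greedy decoding is an assumption here, and the label set and frame probabilities are made up for illustration:

```python
import numpy as np

def ctc_greedy_decode(probs, blank=0):
    """Best-path CTC decoding of a (frames x labels) posterior matrix."""
    best = np.argmax(probs, axis=1)   # most likely label per frame
    out, prev = [], None
    for b in best:
        if b != prev and b != blank:  # collapse repeats, skip blanks
            out.append(int(b))
        prev = b
    return out

# Frame-wise posteriors over labels {0: blank, 1, 2}
probs = np.array([[0.10, 0.80, 0.10],   # 1
                  [0.10, 0.80, 0.10],   # 1 (repeat, collapsed)
                  [0.90, 0.05, 0.05],   # blank
                  [0.10, 0.10, 0.80],   # 2
                  [0.10, 0.80, 0.10]])  # 1
print(ctc_greedy_decode(probs))  # [1, 2, 1]
```

The blank label is what lets CTC represent genuine repeated commands: two identical labels separated by a blank survive the collapse.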
The paint-spraying robot voice recognition method based on the multi-scale enhanced BiLSTM model disclosed by the invention has the following beneficial technical effects:
1. Considering the influence of the noise contained in collected signals on recognition accuracy, the invention proposes an ensemble denoising algorithm; repeated ensemble averaging largely removes the noise and enhances the characteristics of the voice signal, improving the robustness of the network model;
2. Because the features of a voice signal are not easy to mine, the invention designs a multi-scale convolution filter bank; four convolution kernels of different effective lengths mine the features present in the signal along multiple scales, which greatly helps the model extract the features of the voice signal and improves recognition accuracy;
3. The invention adopts a BiLSTM model to further extract voice-signal features, and adds a fully connected layer, a Softmax layer and the CTC algorithm to the model, finally realizing voice recognition with the newly designed model.
Drawings
FIG. 1 is a flow chart of the present invention;
FIG. 2 is a network structure diagram of the multi-scale enhanced BiLSTM model according to the present invention.
Detailed Description
The invention is described in further detail below with reference to the accompanying drawings:
The invention provides a voice recognition method for a paint-spraying robot based on a multi-scale enhanced BiLSTM model, which aims to help the spraying robot recognize voice intelligently so as to complete the corresponding spraying tasks. FIG. 1 is the flow chart of the invention; the steps of the invention are described in detail below in conjunction with this flow chart.
Step 1, command signal acquisition: collect common spraying voice commands with a signal acquisition system; the data acquisition card is an NI-9234;
Step 2, ensemble denoising preprocessing: add white Gaussian noise to the collected audio signal 100 times, compute the Mel spectrum sequence of each noisy signal, and then take the average sequence of the 100 Mel spectrum sequences;
the process of preprocessing the audio signal by using the set denoising preprocessing in the step 2 can be expressed as follows:
assuming that the collected audio signal is x (t), which includes a valid signal c (t) and an ambient noise signal n (t), that is: x (t) ═ c (t) + n (t), adding 100 white gaussian noise g (t) to x (t) to generate noise signal s (t), solving the Mel frequency spectrum sequence of s (t), finally solving the average sequence Ms of 100 Mel frequency spectrum sequencesave
Figure BDA0003067910390000041
In the formula, Ms (·) represents the calculation solution of the mel-frequency spectrum sequence, and since the mean value of white gaussian noise is 0, when the number of times of adding white gaussian noise is large enough
Figure BDA0003067910390000042
Is close to 0, which enables the environmental noise signal n (t) in the collected signal to be filtered out, thereby greatly enhancing the characteristics of the effective signal c (t).
Step 3, multi-scale feature extraction: extract features from the averaged Mel spectrum sequence with a multi-scale convolution filter bank, then mine the extracted features further with a BiLSTM model to obtain the corresponding outputs;
In step 3, the specific steps for multi-scale feature extraction from the averaged Mel spectrum sequence Ms_ave obtained in step 2 are as follows:
Step 3.1, design four one-dimensional convolution kernels of different scales to filter Ms_ave; the lengths of the four kernels are each defined as a fraction of L (the exact expressions are given in a formula figure), where L is the length of Ms_ave;
Step 3.2, carry out further processing with a BiLSTM model; the specific steps can be expressed as follows:
Step 3.2.1, build a BiLSTM network with a forward propagation layer and a backward propagation layer;
Step 3.2.2, use the forward propagation layer to process Ms_ave and obtain the forward hidden state h_t^f at time t, computed as:
h_t^f = H(W_xh^f x_t + W_hh^f h_(t-1)^f + b_h^f)
where H is the hidden-layer activation function (a sigmoid activation function is selected in this patent), x_t is the input data (Ms_ave), W_xh^f is the connection weight matrix between the input layer and the forward hidden layer, W_hh^f is the connection weight matrix between forward hidden layers, h_(t-1)^f is the forward hidden-layer state at time t-1, and b_h^f is the bias of the forward hidden layer.
Step 3.2.3, use the backward propagation layer to process Ms_ave and obtain the backward hidden state h_t^b at time t, computed as:
h_t^b = H(W_xh^b x_t + W_hh^b h_(t+1)^b + b_h^b)
where W_xh^b is the connection weight matrix between the input layer and the backward hidden layer, W_hh^b is the connection weight matrix between backward hidden layers, h_(t+1)^b is the backward hidden-layer state at time t+1, and b_h^b is the bias of the backward hidden layer.
Step 3.2.4, compute the output vector y_t of the output layer:
y_t = W_hy^f h_t^f + W_hy^b h_t^b + b_0
where W_hy^f is the connection weight matrix between the forward hidden layer and the output layer, W_hy^b is the connection weight matrix between the backward hidden layer and the output layer, and b_0 is the bias of the output layer.
Step 3.2.5, splice the outputs of the several BiLSTM branches together and input the result into a fully connected layer, followed by a Softmax layer;
Step 3.2.6, decode the output of the Softmax layer with the CTC algorithm to realize voice recognition.
Step 4, feature fusion and recognition: splice the outputs of the BiLSTM model together, feed them into a fully connected layer and a Softmax layer, and finally realize voice recognition with the CTC algorithm;
Step 5, application on the spraying robot: embed the model trained in steps 1-4 in the spraying robot so that the corresponding spraying tasks are carried out intelligently.
FIG. 2 is the network structure diagram of the multi-scale enhanced BiLSTM model proposed by the invention. As the diagram shows, 100 groups of white Gaussian noise are added to the collected voice signal, the Mel spectrum sequence of each noise-added signal is computed, and the arithmetic average of these sequences gives the averaged Mel spectrum sequence; that is, ensemble noise addition filters out the noise interference in the original voice signal and enhances the characteristics of the effective signal. Four convolution filter banks of different scales are then designed, and their outputs are fed into BiLSTM models, so that the features of the original signal are learned at multiple scales. The outputs of the BiLSTM models are spliced together, and intelligent voice recognition is finally realized through a fully connected layer, a Softmax layer and the CTC decoding algorithm. In addition, as the BiLSTM structure shows, a BiLSTM consists of forward and backward LSTM models; through the connections between the forward and backward hidden layers, the features contained in the sound signal can be mined more accurately than with a one-directional LSTM.
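The fusion stage of the structure just described (splice the branch outputs, apply a fully connected layer, normalize with Softmax) can be sketched as follows. The branch count, feature sizes and class count are illustrative assumptions, and in practice the dense-layer weights would be learned rather than random:

```python
import numpy as np

def fuse_and_classify(branch_outputs, W, b):
    """Concatenate per-scale branch features, apply a dense layer,
    and normalize the logits with a numerically stable softmax."""
    fused = np.concatenate(branch_outputs)   # splice the branch features
    logits = W @ fused + b                   # fully connected layer
    e = np.exp(logits - np.max(logits))      # subtract max for stability
    return e / e.sum()

rng = np.random.default_rng(1)
branches = [rng.normal(size=8) for _ in range(4)]  # 4 scales x 8 features each
W, b = rng.normal(size=(5, 32)), np.zeros(5)       # 5 hypothetical command classes
p = fuse_and_classify(branches, W, b)
print(p.shape, float(p.sum()))
```

The Softmax output is a proper probability distribution over the command classes, which is exactly the per-frame posterior matrix the CTC decoder consumes.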
The above description is only a preferred embodiment of the present invention and is not intended to limit the invention in any way; any modification or equivalent variation made according to the technical spirit of the present invention falls within the protection scope claimed by the invention.

Claims (3)

1. A paint-spraying robot voice recognition method based on a multi-scale enhanced BiLSTM model, comprising the following specific steps:
step 1, command signal acquisition: collect common spraying voice commands with a signal acquisition system; the data acquisition card is an NI-9234;
step 2, ensemble denoising preprocessing: add white Gaussian noise to the collected audio signal 100 times, compute the Mel spectrum sequence of each noisy signal, and then take the average sequence of the 100 Mel spectrum sequences;
step 3, multi-scale feature extraction: extract features from the averaged Mel spectrum sequence with a multi-scale convolution filter bank, then mine the extracted features further with a BiLSTM model to obtain the corresponding outputs;
step 4, feature fusion and recognition: splice the outputs of the BiLSTM model together, feed them into a fully connected layer and a Softmax layer, and finally realize voice recognition with the CTC algorithm;
step 5, application on the spraying robot: embed the model trained in steps 1-4 in the spraying robot so that the corresponding spraying tasks are carried out intelligently.
2. The paint-spraying robot voice recognition method based on the multi-scale enhanced BiLSTM model according to claim 1, characterized in that the ensemble denoising preprocessing of the audio signal in step 2 can be expressed as follows:
assume the collected audio signal is x(t), consisting of an effective signal c(t) and an ambient noise signal n(t), i.e. x(t) = c(t) + n(t); white Gaussian noise g_i(t) is added to x(t) 100 times, producing noisy signals s_i(t); the Mel spectrum sequence of each s_i(t) is computed, and finally the average sequence Ms_ave of the 100 Mel spectrum sequences is taken:
Ms_ave = (1/100) Σ_{i=1}^{100} Ms(x(t) + g_i(t))
where Ms(·) denotes the Mel spectrum computation; since white Gaussian noise has zero mean, when the number of noise additions is large enough the averaged noise term (1/100) Σ_{i=1}^{100} g_i(t) is close to 0, so the ambient noise signal n(t) in the collected signal is filtered out and the characteristics of the effective signal c(t) are greatly enhanced.
3. The paint-spraying robot voice recognition method based on the multi-scale enhanced BiLSTM model according to claim 1, characterized in that the specific steps in step 3 for multi-scale feature extraction from the averaged Mel spectrum sequence Ms_ave obtained in step 2 are as follows:
step 3.1, design four one-dimensional convolution kernels of different scales to filter Ms_ave; the lengths of the four kernels are each defined as a fraction of L (the exact expressions are given in a formula figure), where L is the length of Ms_ave;
step 3.2, carry out further processing with a BiLSTM model; the specific steps can be expressed as follows:
step 3.2.1, build a BiLSTM network with a forward propagation layer and a backward propagation layer;
step 3.2.2, use the forward propagation layer to process Ms_ave and obtain the forward hidden state h_t^f at time t, computed as:
h_t^f = H(W_xh^f x_t + W_hh^f h_(t-1)^f + b_h^f)
where H is the hidden-layer activation function (a sigmoid activation function is selected in this patent), x_t is the input data (Ms_ave), W_xh^f is the connection weight matrix between the input layer and the forward hidden layer, W_hh^f is the connection weight matrix between forward hidden layers, h_(t-1)^f is the forward hidden-layer state at time t-1, and b_h^f is the bias of the forward hidden layer;
step 3.2.3, use the backward propagation layer to process Ms_ave and obtain the backward hidden state h_t^b at time t, computed as:
h_t^b = H(W_xh^b x_t + W_hh^b h_(t+1)^b + b_h^b)
where W_xh^b is the connection weight matrix between the input layer and the backward hidden layer, W_hh^b is the connection weight matrix between backward hidden layers, h_(t+1)^b is the backward hidden-layer state at time t+1, and b_h^b is the bias of the backward hidden layer;
step 3.2.4, compute the output vector y_t of the output layer:
y_t = W_hy^f h_t^f + W_hy^b h_t^b + b_0
where W_hy^f is the connection weight matrix between the forward hidden layer and the output layer, W_hy^b is the connection weight matrix between the backward hidden layer and the output layer, and b_0 is the bias of the output layer;
step 3.2.5, splice the outputs of the several BiLSTM branches together and input the result into a fully connected layer, followed by a Softmax layer;
step 3.2.6, decode the output of the Softmax layer with the CTC algorithm to realize voice recognition.
CN202110531117.2A 2021-05-16 2021-05-16 Paint spraying robot voice recognition method based on multi-scale enhancement BiLSTM model Pending CN113192489A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110531117.2A CN113192489A (en) 2021-05-16 2021-05-16 Paint spraying robot voice recognition method based on multi-scale enhancement BiLSTM model

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110531117.2A CN113192489A (en) 2021-05-16 2021-05-16 Paint spraying robot voice recognition method based on multi-scale enhancement BiLSTM model

Publications (1)

Publication Number Publication Date
CN113192489A true CN113192489A (en) 2021-07-30

Family

ID=76981842

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110531117.2A Pending CN113192489A (en) 2021-05-16 2021-05-16 Paint spraying robot voice recognition method based on multi-scale enhancement BiLSTM model

Country Status (1)

Country Link
CN (1) CN113192489A (en)

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102542151A (en) * 2011-11-30 2012-07-04 重庆大学 Rotary machine axis track purification method based on ensemble empirical mode decomposition
CN103226649A (en) * 2013-03-25 2013-07-31 西安交通大学 Ensemble noise-reconstructed EMD (empirical mode decomposition) method for early and compound faults of machinery
CN103839197A (en) * 2014-03-19 2014-06-04 国家电网公司 Method for judging abnormal electricity consumption behaviors of users based on EEMD method
CN110211574A (en) * 2019-06-03 2019-09-06 哈尔滨工业大学 Speech recognition modeling method for building up based on bottleneck characteristic and multiple dimensioned bull attention mechanism
CN111653275A (en) * 2020-04-02 2020-09-11 武汉大学 Method and device for constructing voice recognition model based on LSTM-CTC tail convolution and voice recognition method
CN112593680A (en) * 2020-12-07 2021-04-02 李朝阳 Building outer wall spraying method based on intelligent spraying robot
CN112642619A (en) * 2019-10-10 2021-04-13 中国科学院重庆绿色智能技术研究院 Intelligent spraying robot system and spraying method thereof


Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Fan Yong (范勇): "Research on Fault Diagnosis of Automaton Drive Mechanisms Based on Improved EMD and SOM Neural Networks" *

Similar Documents

Publication Publication Date Title
CN109664300B (en) Robot multi-style calligraphy copying method based on force sense learning
CN110543878B (en) Pointer instrument reading identification method based on neural network
CN111192237B (en) Deep learning-based glue spreading detection system and method
CN109858406B (en) Key frame extraction method based on joint point information
CN105931218A (en) Intelligent sorting method of modular mechanical arm
Liu et al. Recognition methods for coal and coal gangue based on deep learning
CN101441776B (en) Three-dimensional human body motion editing method driven by demonstration show based on speedup sensor
CN113538486B (en) Method for improving identification and positioning accuracy of automobile sheet metal workpiece
CN106514667A (en) Human-computer cooperation system based on Kinect skeletal tracking and uncalibrated visual servo
CN111261183A (en) Method and device for denoising voice
CN106406518A (en) Gesture control device and gesture recognition method
CN105389570A (en) Face angle determination method and system
CN112507859B (en) Visual tracking method for mobile robot
CN114851209B (en) Industrial robot working path planning optimization method and system based on vision
CN113192489A (en) Paint spraying robot voice recognition method based on multi-scale enhancement BiLSTM model
CN109087646B (en) Method for leading-in artificial intelligence ultra-deep learning for voice image recognition
CN111681649B (en) Speech recognition method, interaction system and achievement management system comprising system
CN103530857B (en) Based on multiple dimensioned Kalman filtering image denoising method
TW202001871A (en) Voice actuated industrial machine control system
CN112965487A (en) Mobile robot trajectory tracking control method based on strategy iteration
Luo et al. Robot artist performs cartoon style facial portrait painting
CN110472691A (en) Target locating module training method, device, robot and storage medium
CN114715363A (en) Navigation method and system for submarine stratum space drilling robot and electronic equipment
Binder et al. Utilizing an enterprise architecture framework for model-based industrial systems engineering
CN112507940A (en) Skeleton action recognition method based on difference guidance representation learning network

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20210730