CN113192489A - Paint spraying robot voice recognition method based on multi-scale enhancement BiLSTM model - Google Patents

Paint spraying robot voice recognition method based on multi-scale enhancement BiLSTM model Download PDF

Info

Publication number
CN113192489A
CN113192489A (application number CN202110531117.2A)
Authority
CN
China
Prior art keywords
layer
signal
frequency spectrum
spraying
voice recognition
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110531117.2A
Other languages
Chinese (zh)
Inventor
杨亦琛
李娟
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Jinling Institute of Technology
Original Assignee
Jinling Institute of Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Jinling Institute of Technology filed Critical Jinling Institute of Technology
Priority to CN202110531117.2A priority Critical patent/CN113192489A/en
Publication of CN113192489A publication Critical patent/CN113192489A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS › G10 MUSICAL INSTRUMENTS; ACOUSTICS › G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L 15/02: Feature extraction for speech recognition; selection of recognition unit
    • G10L 15/16: Speech classification or search using artificial neural networks
    • G10L 15/22: Procedures used during a speech recognition process, e.g. man-machine dialogue
    • G10L 19/26: Pre-filtering or post-filtering (coding using predictive techniques)
    • G10L 21/0232: Noise filtering with processing in the frequency domain
    • G10L 2015/223: Execution procedure of a spoken command

Landscapes

  • Engineering & Computer Science (AREA)
  • Human Computer Interaction (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Quality & Reliability (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Manipulator (AREA)

Abstract

A voice recognition method for a paint-spraying robot based on a multi-scale enhanced BiLSTM model. 1) Collect common spraying voice commands with a signal acquisition system; the data acquisition card is an NI-9234. 2) Add white Gaussian noise to the collected audio signal 100 times, compute the Mel spectrum sequence of each noisy signal, and then take the average of the 100 Mel spectrum sequences. 3) Extract features from the averaged Mel spectrum sequence with a multi-scale convolution filter bank, then mine the extracted features further with a BiLSTM model to obtain the corresponding outputs. 4) Splice the BiLSTM outputs together, feed them into a fully connected layer and a Softmax layer, and finally realize speech recognition with the CTC algorithm. 5) Embed the model trained in steps 1-4 in the spraying robot so that the corresponding spraying tasks are carried out intelligently. The model of the invention gives the spraying robot an intelligent voice recognition capability and has high practical application value.

Description

Paint spraying robot voice recognition method based on multi-scale enhancement BiLSTM model
Technical Field
The invention relates to the field of intelligent spraying robots, and in particular to a voice recognition method for a spraying robot based on a multi-scale enhanced BiLSTM model.
Background
With the current rapid development of the domestic construction industry, the closely related decoration industry also has a large market prospect. However, most decoration work is still done manually; wall surfaces, for example, are sprayed with hand-held spraying machines, so the spraying results vary from worker to worker and construction quality and efficiency are hard to guarantee.
Manual spraying imposes high labor intensity on workers; the spraying distance and speed are hard to control, and errors in coating thickness easily lead to rework or even failure to meet quality requirements. Paint contains heavy metals, radioactive substances, toxic organic solvents and the like, and must be atomized during spraying; the atomized paint is easily inhaled into the lungs of on-site workers, so the harsh construction environment does great harm to sprayers' health. Aiming at these problems, this patent proposes a voice recognition method for a paint-spraying robot based on a multi-scale enhanced BiLSTM model, which helps the robot spray house wall surfaces automatically, so that adaptive spraying operations replace disordered manual spraying. This improves the working environment, reduces workers' labor intensity, greatly raises spraying efficiency, and ensures construction quality.
A related domestic patent is "An intelligent spraying robot system and a spraying method thereof" (201910960106.9), which realizes intelligent spraying by designing a scanning and modeling unit, an off-line programming unit, a drive control unit, a robot body and a thickness detection unit, effectively reducing quality problems caused by spraying-trajectory and process-parameter errors. Another, "A building outer wall spraying method based on an intelligent spraying robot" (202011419313.2), lets the robot body spray the outer wall of a building automatically along a wavy trajectory by controlling a retraction assembly; the controller replenishes paint automatically according to the amount remaining in the paint box, so little manual intervention is required, construction cost is low, and no personnel are put at risk. In both patents, however, the spraying robot merely executes tasks that were preset in advance and has no adaptability. In reality, the spraying robot needs to respond to changing conditions rather than execute tasks mechanically, so giving the spraying robot a voice recognition function through which it can complete the corresponding spraying tasks adaptively has important practical significance.
Disclosure of Invention
In order to solve the above problems, the invention provides a paint-spraying robot voice recognition method based on a multi-scale enhanced BiLSTM model, built on a Convolutional Neural Network (CNN) and a bidirectional Long Short-Term Memory network (BiLSTM). First, considering the influence of the noise contained in collected signals on recognition accuracy, this patent proposes an ensemble denoising algorithm: repeated ensemble averaging largely removes the noise and enhances the characteristics of the voice signal. Second, because the features of a voice signal are not easy to mine, a multi-scale convolution filter bank is designed: four convolution kernels of different effective lengths mine the features present in the signal along multiple scales, which greatly helps the model extract the features of the voice signal and improves recognition accuracy. Finally, a BiLSTM model further extracts the voice-signal features, and a fully connected layer, a Softmax layer and the CTC algorithm are added to the model to realize voice recognition. To this end, the invention provides a paint-spraying robot voice recognition method based on a multi-scale enhanced BiLSTM model with the following specific steps:
Step 1, command signal acquisition: collect common spraying voice commands with a signal acquisition system; the data acquisition card is an NI-9234;
Step 2, ensemble denoising preprocessing: add white Gaussian noise to the collected audio signal 100 times, compute the Mel spectrum sequence of each noisy signal, and then take the average sequence of the 100 Mel spectrum sequences;
Step 3, multi-scale feature extraction: extract features from the averaged Mel spectrum sequence with a multi-scale convolution filter bank, then mine the extracted features further with a BiLSTM model to obtain the corresponding outputs;
Step 4, feature fusion and recognition: splice the outputs of the BiLSTM model together, feed them into a fully connected layer and a Softmax layer, and finally realize voice recognition with the CTC algorithm;
Step 5, application on the spraying robot: embed the model trained in steps 1-4 in the spraying robot so that the corresponding spraying tasks are carried out intelligently.
Further, the process of preprocessing the audio signal with the ensemble denoising of step 2 can be expressed as follows:
Assume the collected audio signal is x(t), consisting of an effective signal c(t) and an ambient noise signal n(t), i.e. x(t) = c(t) + n(t). White Gaussian noise g_i(t) is added to x(t) 100 times, producing noisy signals s_i(t); the Mel spectrum sequence of each s_i(t) is computed, and finally the average sequence Ms_ave of the 100 Mel spectrum sequences is taken:
Ms_ave = (1/100) Σ_{i=1}^{100} Ms(x(t) + g_i(t))
where Ms(·) denotes the Mel spectrum computation. Since white Gaussian noise has zero mean, when the number of noise additions is large enough the averaged noise term (1/100) Σ_{i=1}^{100} g_i(t) is close to 0, so the ambient noise signal n(t) in the collected signal is filtered out and the characteristics of the effective signal c(t) are greatly enhanced.
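The ensemble averaging described above can be sketched in a few lines of NumPy. Note the hedges: a plain FFT magnitude spectrum stands in for the true Mel spectrum (a real implementation would apply a Mel filter bank), and the noise level, trial count and test tone are illustrative assumptions, not values from the patent.

```python
import numpy as np

def mel_like_spectrum(signal, n_fft=256):
    # Stand-in for the Mel spectrum: magnitude of the real FFT.
    # (A real implementation would additionally apply a Mel filter bank.)
    return np.abs(np.fft.rfft(signal, n=n_fft))

def ensemble_denoised_spectrum(x, n_trials=100, noise_std=0.05, seed=0):
    """Add white Gaussian noise n_trials times, take the spectrum of each
    noisy copy, and return the element-wise average of the spectra."""
    rng = np.random.default_rng(seed)
    spectra = [mel_like_spectrum(x + rng.normal(0.0, noise_std, x.shape))
               for _ in range(n_trials)]
    return np.mean(spectra, axis=0)

# A clean 10 Hz tone sampled at 256 Hz: after ensemble averaging, the
# dominant spectral peak matches that of the noise-free spectrum.
t = np.linspace(0, 1, 256, endpoint=False)
clean = np.sin(2 * np.pi * 10 * t)
avg = ensemble_denoised_spectrum(clean)
ref = mel_like_spectrum(clean)
print(int(np.argmax(avg)), int(np.argmax(ref)))
```

The averaged spectrum preserves the tonal peak while the contribution of any single noise realization is diluted across the ensemble.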
Further, in step 3, the specific steps for multi-scale feature extraction from the averaged Mel spectrum sequence Ms_ave obtained in step 2 are as follows:
Step 3.1, design four one-dimensional convolution kernels of different scales to filter Ms_ave; the lengths of the four kernels are each defined as a fraction of L (the exact expressions are given in a formula figure), where L is the length of Ms_ave;
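A minimal NumPy sketch of step 3.1. Since the exact kernel lengths appear only in a formula figure, the lengths L//2, L//4, L//8 and L//16 used here are assumptions, and simple moving-average kernels stand in for the learned convolution filters:

```python
import numpy as np

def multiscale_filter(seq, scales=(2, 4, 8, 16)):
    """Filter a 1-D sequence with one kernel per scale.
    Kernel length for scale s is len(seq) // s (an assumed choice)."""
    L = len(seq)
    outputs = []
    for s in scales:
        k = max(L // s, 1)
        kernel = np.ones(k) / k              # illustrative moving-average kernel
        outputs.append(np.convolve(seq, kernel, mode='valid'))
    return outputs

seq = np.sin(np.linspace(0, 4 * np.pi, 64))  # toy stand-in for Ms_ave
outs = multiscale_filter(seq)
print([o.shape[0] for o in outs])  # [33, 49, 57, 61]
```

Each scale yields a filtered view of the sequence ('valid' convolution of length L - k + 1); the four views are what the BiLSTM branches consume in step 3.2.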
Step 3.2, carry out further processing with a BiLSTM model; the specific steps can be expressed as follows:
Step 3.2.1, build a BiLSTM network with a forward propagation layer and a backward propagation layer;
Step 3.2.2, use the forward propagation layer to process Ms_ave and obtain the forward hidden state h_t^f at time t, computed as:
h_t^f = H(W_xh^f x_t + W_hh^f h_(t-1)^f + b_h^f)
where H is the hidden-layer activation function (a sigmoid activation function is selected in this patent), x_t is the input data (Ms_ave), W_xh^f is the connection weight matrix between the input layer and the forward hidden layer, W_hh^f is the connection weight matrix between forward hidden layers, h_(t-1)^f is the forward hidden-layer state at time t-1, and b_h^f is the bias of the forward hidden layer.
Step 3.2.3, use the backward propagation layer to process Ms_ave and obtain the backward hidden state h_t^b at time t, computed as:
h_t^b = H(W_xh^b x_t + W_hh^b h_(t+1)^b + b_h^b)
where W_xh^b is the connection weight matrix between the input layer and the backward hidden layer, W_hh^b is the connection weight matrix between backward hidden layers, h_(t+1)^b is the backward hidden-layer state at time t+1, and b_h^b is the bias of the backward hidden layer.
Step 3.2.4, compute the output vector y_t of the output layer:
y_t = W_hy^f h_t^f + W_hy^b h_t^b + b_0
where W_hy^f is the connection weight matrix between the forward hidden layer and the output layer, W_hy^b is the connection weight matrix between the backward hidden layer and the output layer, and b_0 is the bias of the output layer.
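Steps 3.2.2 to 3.2.4 can be sketched as the following NumPy recurrence. This follows the simplified equations as written, with H = sigmoid over a single affine recurrence per direction; a full BiLSTM additionally has input, forget and output gates, which the patent's formulas elide. All dimensions and weights below are illustrative:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def bidirectional_pass(xs, W_xf, W_hf, b_f, W_xb, W_hb, b_b, W_fy, W_by, b_o):
    """Forward and backward hidden-state recurrences plus the output layer,
    as in steps 3.2.2-3.2.4 (simplified, gate-free recurrence)."""
    T, H = len(xs), b_f.shape[0]
    h_fwd = np.zeros((T, H))
    h_bwd = np.zeros((T, H))
    prev = np.zeros(H)
    for t in range(T):                       # forward layer: left to right
        prev = sigmoid(W_xf @ xs[t] + W_hf @ prev + b_f)
        h_fwd[t] = prev
    nxt = np.zeros(H)
    for t in reversed(range(T)):             # backward layer: right to left
        nxt = sigmoid(W_xb @ xs[t] + W_hb @ nxt + b_b)
        h_bwd[t] = nxt
    # output layer combines both directions at every time step
    return np.array([W_fy @ h_fwd[t] + W_by @ h_bwd[t] + b_o for t in range(T)])

rng = np.random.default_rng(0)
D, H, O, T = 3, 4, 2, 5                      # toy input/hidden/output/time sizes
xs = rng.normal(size=(T, D))
y = bidirectional_pass(xs,
                       rng.normal(size=(H, D)), rng.normal(size=(H, H)), np.zeros(H),
                       rng.normal(size=(H, D)), rng.normal(size=(H, H)), np.zeros(H),
                       rng.normal(size=(O, H)), rng.normal(size=(O, H)), np.zeros(O))
print(y.shape)
```

The point of the two passes is that y_t at every step sees context from both earlier and later frames, which is what distinguishes the bidirectional model from a plain LSTM.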
Step 3.2.5, splice the outputs of the several BiLSTM branches together and input the result into a fully connected layer, followed by a Softmax layer;
Step 3.2.6, decode the output of the Softmax layer with the CTC algorithm to realize voice recognition.
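Step 3.2.6's decoding can be illustrated with best-path (greedy) CTC decoding: take the argmax label per frame, collapse consecutive repeats, then drop blanks. The patent does not specify the decoding variant, so greedy decoding is an assumption here, and the label set and frame probabilities are made up for illustration:

```python
import numpy as np

def ctc_greedy_decode(probs, blank=0):
    """Best-path CTC decoding of a (frames x labels) posterior matrix."""
    best = np.argmax(probs, axis=1)   # most likely label per frame
    out, prev = [], None
    for b in best:
        if b != prev and b != blank:  # collapse repeats, skip blanks
            out.append(int(b))
        prev = b
    return out

# Frame-wise posteriors over labels {0: blank, 1, 2}
probs = np.array([[0.10, 0.80, 0.10],   # 1
                  [0.10, 0.80, 0.10],   # 1 (repeat, collapsed)
                  [0.90, 0.05, 0.05],   # blank
                  [0.10, 0.10, 0.80],   # 2
                  [0.10, 0.80, 0.10]])  # 1
print(ctc_greedy_decode(probs))  # [1, 2, 1]
```

The blank label is what lets CTC represent genuine repeated commands: two identical labels separated by a blank survive the collapse.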
The paint-spraying robot voice recognition method based on the multi-scale enhanced BiLSTM model disclosed by the invention has the following beneficial technical effects:
1. Considering the influence of the noise contained in collected signals on recognition accuracy, the invention proposes an ensemble denoising algorithm; repeated ensemble averaging largely removes the noise and enhances the characteristics of the voice signal, improving the robustness of the network model;
2. Because the features of a voice signal are not easy to mine, the invention designs a multi-scale convolution filter bank; four convolution kernels of different effective lengths mine the features present in the signal along multiple scales, which greatly helps the model extract the features of the voice signal and improves recognition accuracy;
3. The invention adopts a BiLSTM model to further extract voice-signal features, and adds a fully connected layer, a Softmax layer and the CTC algorithm to the model, finally realizing voice recognition with the newly designed model.
Drawings
FIG. 1 is a flow chart of the present invention;
FIG. 2 is a network structure diagram of the multi-scale enhanced BiLSTM model according to the present invention.
Detailed Description
The invention is described in further detail below with reference to the accompanying drawings:
The invention provides a voice recognition method for a paint-spraying robot based on a multi-scale enhanced BiLSTM model, which aims to help the spraying robot recognize voice intelligently so as to complete the corresponding spraying tasks. FIG. 1 is the flow chart of the invention; the steps of the invention are described in detail below in conjunction with this flow chart.
Step 1, command signal acquisition: collect common spraying voice commands with a signal acquisition system; the data acquisition card is an NI-9234;
Step 2, ensemble denoising preprocessing: add white Gaussian noise to the collected audio signal 100 times, compute the Mel spectrum sequence of each noisy signal, and then take the average sequence of the 100 Mel spectrum sequences;
the process of preprocessing the audio signal by using the set denoising preprocessing in the step 2 can be expressed as follows:
assuming that the collected audio signal is x (t), which includes a valid signal c (t) and an ambient noise signal n (t), that is: x (t) ═ c (t) + n (t), adding 100 white gaussian noise g (t) to x (t) to generate noise signal s (t), solving the Mel frequency spectrum sequence of s (t), finally solving the average sequence Ms of 100 Mel frequency spectrum sequencesave
Figure BDA0003067910390000041
In the formula, Ms (·) represents the calculation solution of the mel-frequency spectrum sequence, and since the mean value of white gaussian noise is 0, when the number of times of adding white gaussian noise is large enough
Figure BDA0003067910390000042
Is close to 0, which enables the environmental noise signal n (t) in the collected signal to be filtered out, thereby greatly enhancing the characteristics of the effective signal c (t).
Step 3, multi-scale feature extraction: extract features from the averaged Mel spectrum sequence with a multi-scale convolution filter bank, then mine the extracted features further with a BiLSTM model to obtain the corresponding outputs;
In step 3, the specific steps for multi-scale feature extraction from the averaged Mel spectrum sequence Ms_ave obtained in step 2 are as follows:
Step 3.1, design four one-dimensional convolution kernels of different scales to filter Ms_ave; the lengths of the four kernels are each defined as a fraction of L (the exact expressions are given in a formula figure), where L is the length of Ms_ave;
Step 3.2, carry out further processing with a BiLSTM model; the specific steps can be expressed as follows:
Step 3.2.1, build a BiLSTM network with a forward propagation layer and a backward propagation layer;
Step 3.2.2, use the forward propagation layer to process Ms_ave and obtain the forward hidden state h_t^f at time t, computed as:
h_t^f = H(W_xh^f x_t + W_hh^f h_(t-1)^f + b_h^f)
where H is the hidden-layer activation function (a sigmoid activation function is selected in this patent), x_t is the input data (Ms_ave), W_xh^f is the connection weight matrix between the input layer and the forward hidden layer, W_hh^f is the connection weight matrix between forward hidden layers, h_(t-1)^f is the forward hidden-layer state at time t-1, and b_h^f is the bias of the forward hidden layer.
Step 3.2.3, use the backward propagation layer to process Ms_ave and obtain the backward hidden state h_t^b at time t, computed as:
h_t^b = H(W_xh^b x_t + W_hh^b h_(t+1)^b + b_h^b)
where W_xh^b is the connection weight matrix between the input layer and the backward hidden layer, W_hh^b is the connection weight matrix between backward hidden layers, h_(t+1)^b is the backward hidden-layer state at time t+1, and b_h^b is the bias of the backward hidden layer.
Step 3.2.4, compute the output vector y_t of the output layer:
y_t = W_hy^f h_t^f + W_hy^b h_t^b + b_0
where W_hy^f is the connection weight matrix between the forward hidden layer and the output layer, W_hy^b is the connection weight matrix between the backward hidden layer and the output layer, and b_0 is the bias of the output layer.
Step 3.2.5, splice the outputs of the several BiLSTM branches together and input the result into a fully connected layer, followed by a Softmax layer;
Step 3.2.6, decode the output of the Softmax layer with the CTC algorithm to realize voice recognition.
Step 4, feature fusion and recognition: splice the outputs of the BiLSTM model together, feed them into a fully connected layer and a Softmax layer, and finally realize voice recognition with the CTC algorithm;
Step 5, application on the spraying robot: embed the model trained in steps 1-4 in the spraying robot so that the corresponding spraying tasks are carried out intelligently.
FIG. 2 is the network structure diagram of the multi-scale enhanced BiLSTM model proposed by the invention. As the diagram shows, 100 groups of white Gaussian noise are added to the collected voice signal, the Mel spectrum sequence of each noise-added signal is computed, and the arithmetic average of these sequences gives the averaged Mel spectrum sequence; that is, ensemble noise addition filters out the noise interference in the original voice signal and enhances the characteristics of the effective signal. Four convolution filter banks of different scales are then designed, and their outputs are fed into BiLSTM models, so that the features of the original signal are learned at multiple scales. The outputs of the BiLSTM models are spliced together, and intelligent voice recognition is finally realized through a fully connected layer, a Softmax layer and the CTC decoding algorithm. In addition, as the BiLSTM structure shows, a BiLSTM consists of forward and backward LSTM models; through the connections between the forward and backward hidden layers, the features contained in the sound signal can be mined more accurately than with a one-directional LSTM.
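The fusion stage of the structure just described (splice the branch outputs, apply a fully connected layer, normalize with Softmax) can be sketched as follows. The branch count, feature sizes and class count are illustrative assumptions, and in practice the dense-layer weights would be learned rather than random:

```python
import numpy as np

def fuse_and_classify(branch_outputs, W, b):
    """Concatenate per-scale branch features, apply a dense layer,
    and normalize the logits with a numerically stable softmax."""
    fused = np.concatenate(branch_outputs)   # splice the branch features
    logits = W @ fused + b                   # fully connected layer
    e = np.exp(logits - np.max(logits))      # subtract max for stability
    return e / e.sum()

rng = np.random.default_rng(1)
branches = [rng.normal(size=8) for _ in range(4)]  # 4 scales x 8 features each
W, b = rng.normal(size=(5, 32)), np.zeros(5)       # 5 hypothetical command classes
p = fuse_and_classify(branches, W, b)
print(p.shape, float(p.sum()))
```

The Softmax output is a proper probability distribution over the command classes, which is exactly the per-frame posterior matrix the CTC decoder consumes.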
The above description is only a preferred embodiment of the present invention and is not intended to limit the invention in any way; any modification or equivalent variation made according to the technical spirit of the present invention falls within the protection scope claimed by the invention.

Claims (3)

1. A paint-spraying robot voice recognition method based on a multi-scale enhanced BiLSTM model, comprising the following specific steps:
step 1, command signal acquisition: collect common spraying voice commands with a signal acquisition system; the data acquisition card is an NI-9234;
step 2, ensemble denoising preprocessing: add white Gaussian noise to the collected audio signal 100 times, compute the Mel spectrum sequence of each noisy signal, and then take the average sequence of the 100 Mel spectrum sequences;
step 3, multi-scale feature extraction: extract features from the averaged Mel spectrum sequence with a multi-scale convolution filter bank, then mine the extracted features further with a BiLSTM model to obtain the corresponding outputs;
step 4, feature fusion and recognition: splice the outputs of the BiLSTM model together, feed them into a fully connected layer and a Softmax layer, and finally realize voice recognition with the CTC algorithm;
step 5, application on the spraying robot: embed the model trained in steps 1-4 in the spraying robot so that the corresponding spraying tasks are carried out intelligently.
2. The paint-spraying robot voice recognition method based on the multi-scale enhanced BiLSTM model according to claim 1, characterized in that the ensemble denoising preprocessing of the audio signal in step 2 can be expressed as follows:
assume the collected audio signal is x(t), consisting of an effective signal c(t) and an ambient noise signal n(t), i.e. x(t) = c(t) + n(t); white Gaussian noise g_i(t) is added to x(t) 100 times, producing noisy signals s_i(t); the Mel spectrum sequence of each s_i(t) is computed, and finally the average sequence Ms_ave of the 100 Mel spectrum sequences is taken:
Ms_ave = (1/100) Σ_{i=1}^{100} Ms(x(t) + g_i(t))
where Ms(·) denotes the Mel spectrum computation; since white Gaussian noise has zero mean, when the number of noise additions is large enough the averaged noise term (1/100) Σ_{i=1}^{100} g_i(t) is close to 0, so the ambient noise signal n(t) in the collected signal is filtered out and the characteristics of the effective signal c(t) are greatly enhanced.
3. The paint-spraying robot voice recognition method based on the multi-scale enhanced BiLSTM model according to claim 1, characterized in that the specific steps in step 3 for multi-scale feature extraction from the averaged Mel spectrum sequence Ms_ave obtained in step 2 are as follows:
step 3.1, design four one-dimensional convolution kernels of different scales to filter Ms_ave; the lengths of the four kernels are each defined as a fraction of L (the exact expressions are given in a formula figure), where L is the length of Ms_ave;
step 3.2, carry out further processing with a BiLSTM model; the specific steps can be expressed as follows:
step 3.2.1, build a BiLSTM network with a forward propagation layer and a backward propagation layer;
step 3.2.2, use the forward propagation layer to process Ms_ave and obtain the forward hidden state h_t^f at time t, computed as:
h_t^f = H(W_xh^f x_t + W_hh^f h_(t-1)^f + b_h^f)
where H is the hidden-layer activation function (a sigmoid activation function is selected in this patent), x_t is the input data (Ms_ave), W_xh^f is the connection weight matrix between the input layer and the forward hidden layer, W_hh^f is the connection weight matrix between forward hidden layers, h_(t-1)^f is the forward hidden-layer state at time t-1, and b_h^f is the bias of the forward hidden layer;
step 3.2.3, use the backward propagation layer to process Ms_ave and obtain the backward hidden state h_t^b at time t, computed as:
h_t^b = H(W_xh^b x_t + W_hh^b h_(t+1)^b + b_h^b)
where W_xh^b is the connection weight matrix between the input layer and the backward hidden layer, W_hh^b is the connection weight matrix between backward hidden layers, h_(t+1)^b is the backward hidden-layer state at time t+1, and b_h^b is the bias of the backward hidden layer;
step 3.2.4, compute the output vector y_t of the output layer:
y_t = W_hy^f h_t^f + W_hy^b h_t^b + b_0
where W_hy^f is the connection weight matrix between the forward hidden layer and the output layer, W_hy^b is the connection weight matrix between the backward hidden layer and the output layer, and b_0 is the bias of the output layer;
step 3.2.5, splice the outputs of the several BiLSTM branches together and input the result into a fully connected layer, followed by a Softmax layer;
step 3.2.6, decode the output of the Softmax layer with the CTC algorithm to realize voice recognition.
CN202110531117.2A 2021-05-16 2021-05-16 Paint spraying robot voice recognition method based on multi-scale enhancement BiLSTM model Pending CN113192489A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110531117.2A CN113192489A (en) 2021-05-16 2021-05-16 Paint spraying robot voice recognition method based on multi-scale enhancement BiLSTM model

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110531117.2A CN113192489A (en) 2021-05-16 2021-05-16 Paint spraying robot voice recognition method based on multi-scale enhancement BiLSTM model

Publications (1)

Publication Number Publication Date
CN113192489A true CN113192489A (en) 2021-07-30

Family

ID=76981842

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110531117.2A Pending CN113192489A (en) 2021-05-16 2021-05-16 Paint spraying robot voice recognition method based on multi-scale enhancement BiLSTM model

Country Status (1)

Country Link
CN (1) CN113192489A (en)

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102542151A (en) * 2011-11-30 2012-07-04 重庆大学 Rotary machine axis track purification method based on ensemble empirical mode decomposition
CN103226649A (en) * 2013-03-25 2013-07-31 西安交通大学 Ensemble noise-reconstructed EMD (empirical mode decomposition) method for early and compound faults of machinery
CN103839197A (en) * 2014-03-19 2014-06-04 国家电网公司 Method for judging abnormal electricity consumption behaviors of users based on EEMD method
CN110211574A (en) * 2019-06-03 2019-09-06 哈尔滨工业大学 Speech recognition modeling method for building up based on bottleneck characteristic and multiple dimensioned bull attention mechanism
CN111653275A (en) * 2020-04-02 2020-09-11 武汉大学 Method and device for constructing voice recognition model based on LSTM-CTC tail convolution and voice recognition method
CN112593680A (en) * 2020-12-07 2021-04-02 李朝阳 Building outer wall spraying method based on intelligent spraying robot
CN112642619A (en) * 2019-10-10 2021-04-13 中国科学院重庆绿色智能技术研究院 Intelligent spraying robot system and spraying method thereof


Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Fan Yong (范勇): "Research on Fault Diagnosis of Automaton Drive Mechanisms Based on Improved EMD and SOM Neural Networks" *

Similar Documents

Publication Publication Date Title
CN109664300B (en) Robot multi-style calligraphy copying method based on force sense learning
CN110543878B (en) Pointer instrument reading identification method based on neural network
CN111192237B (en) Deep learning-based glue spreading detection system and method
CN109858406B (en) Key frame extraction method based on joint point information
CN105931218A (en) Intelligent sorting method of modular mechanical arm
Liu et al. Recognition methods for coal and coal gangue based on deep learning
CN101441776B (en) Three-dimensional human body motion editing method driven by demonstration show based on speedup sensor
CN113538486B (en) Method for improving identification and positioning accuracy of automobile sheet metal workpiece
CN106514667A (en) Human-computer cooperation system based on Kinect skeletal tracking and uncalibrated visual servo
CN111261183A (en) Method and device for denoising voice
CN106406518A (en) Gesture control device and gesture recognition method
CN105389570A (en) Face angle determination method and system
CN112507859B (en) Visual tracking method for mobile robot
CN114851209B (en) Industrial robot working path planning optimization method and system based on vision
CN113192489A (en) Paint spraying robot voice recognition method based on multi-scale enhancement BiLSTM model
CN109087646B (en) Method for leading-in artificial intelligence ultra-deep learning for voice image recognition
CN111681649B (en) Speech recognition method, interaction system and achievement management system comprising system
CN103530857B (en) Based on multiple dimensioned Kalman filtering image denoising method
TW202001871A (en) Voice actuated industrial machine control system
CN112965487A (en) Mobile robot trajectory tracking control method based on strategy iteration
Luo et al. Robot artist performs cartoon style facial portrait painting
CN110472691A (en) Target locating module training method, device, robot and storage medium
CN114715363A (en) Navigation method and system for submarine stratum space drilling robot and electronic equipment
Binder et al. Utilizing an enterprise architecture framework for model-based industrial systems engineering
CN112507940A (en) Skeleton action recognition method based on difference guidance representation learning network

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20210730