CN110956154A

CN110956154A - Vibration information terrain classification and identification method based on CNN-LSTM

Info

Publication number: CN110956154A
Application number: CN201911268073.8A
Authority: CN
Inventors: 白成超; 郭继峰; 刘天航; 郑红星
Original assignee: Harbin Gauss Touch Technology Co Ltd
Current assignee: Harbin Gauss Touch Technology Co Ltd
Priority date: 2019-12-11
Filing date: 2019-12-11
Publication date: 2020-04-03

Abstract

The invention provides a vibration information terrain classification and recognition method based on CNN-LSTM. The vibration data are first segmented to form 1 × 192-dimensional vectors, and each vibration vector is standardized to a mean of 0 and a standard deviation of 1. Feature vectors in the frequency domain are then obtained by fast Fourier transform, learning and training are carried out with a multilayer perceptron neural network, and the trained network model is finally used for online detection and classification. The invention combines the characteristics of deep neural networks and long short-term memory (LSTM), designs a CNN-LSTM deep neural network that takes both into account, and provides a test analysis of five terrain environments of different hardness. The analysis result shows that the CNN-LSTM-based terrain classification effect is satisfactory for use in practical applications.

Description

Vibration information terrain classification and identification method based on CNN-LSTM

Technical Field

The invention belongs to the technical field of terrain environment recognition, and particularly relates to a vibration information terrain classification recognition method based on CNN-LSTM.

Background

Whether for extraterrestrial planetary rovers or for ground robots, terrain classification and identification has mainly been realized with vision and lidar. With improvements in sensor capability, however, many new approaches have appeared, and research on terrain classification and identification can be divided into five types according to the sensor used: vision-based, lidar-based, sound-based, inertial-measurement-based, and multi-sensor-fusion-based.

As early as 1975, J. S. Weszka et al. of the University of Maryland comparatively analyzed three methods of automatic texture classification, finding that features based on the Fourier power spectrum were less effective, while features based on second-order and first-order grayscale statistics were comparable. Otte et al. proposed a new method for terrain classification using feature sequences generated on recurring images. The method learns the generated feature sequences with a recurrent neural network, and by comparison with existing methods such as standard RNN, LSTM and DCM, the authors verified that its results are superior to those of the other learning frameworks in real time. Zeltner proposed a terrain classification and identification method based on unsupervised learning, in which feature vectors obtained after image segmentation are used to train a deep convolutional neural network. The feature vectors comprise color, depth and infrared data, which reduces the dependence on training data labels and improves the adaptability of the system to unknown terrain categories. Rothrock et al. then proposed a vision-based terrain classification software system for extraterrestrial bodies, SPOC (Soil Property and Object Classification), which uses a deep neural network to learn from a small number of examples given by human experts and obtains a model applicable to large-scale data analysis; it was verified in the Mars 2020 Rover Mission and the Mars Science Laboratory mission. In 2017, Chu He et al. proposed a hierarchical classification method that combines a multi-layer Bayesian network and a conditional random field with prior knowledge to classify SAR images, obtaining higher classification accuracy. Wu et al. proposed a faster and more effective visual terrain classification method by improving the bag-of-visual-words framework with different fusion algorithms. Whereas existing work differs mainly in the feature extraction method, the authors start from the overall bag-of-visual-words framework, describe the characteristics of fusion at different stages, and finally evaluate the fusion algorithms using an average linear kernel and a multiplicative linear kernel. In 2018, P. Kozlowski and K. Walas took the RGB, depth and infrared information collected by an RGBD camera as input and used a newly designed deep neural network to effectively recognize terrain of various materials.

To meet the demand for safe navigation in terrain containing vegetation, J.-F. Lalonde et al. proposed in 2006 a terrain classification and identification method based on three-dimensional lidar point cloud data, in which a "scatter" class represents porous volumes such as grass and tree canopy, a "linear" class characterizes thin objects such as wires or branches, and a "surface" class characterizes solid objects such as ground and rock. Suger et al. proposed a semi-supervised learning terrain classification and recognition method in 2015, which learns a model for ground traversability analysis from partially labeled 3D lidar data and was verified on different platforms.

J. Libby et al. of Carnegie Mellon University first proposed using sound features to classify and identify terrain, and designed a multi-class classifier based on a support vector machine. After that, A. Valada et al. applied deep feature learning to sound-based terrain classification, designed a new convolutional neural network structure for deep feature learning on various sound data, and verified through a large number of experiments that the proposed algorithm is robust to changing noise: the classification effect remains good even when a low-quality recorder is used in a strongly noisy environment. To address the shortcomings of sensing modes such as vision in classification accuracy, robustness and computational efficiency, A. Valada and W. Burgard proposed a recurrent model based on deep long short-term memory that uses the sound signals of the wheel-terrain interaction process. The model learns deep spatial features with a new convolutional neural network architecture and learns temporal dynamics with a long short-term memory (LSTM) model, thereby identifying terrain effectively at both the temporal and spatial levels.

In 2017, S. Khaleghian and S. Taheri proposed an intelligent tire design in which a three-axis accelerometer is mounted on the inner side of the tire to sense changes in acceleration and wheel slip is measured with a wheel odometer. The power of the acceleration signal and the slip ratio of the wheel are thus obtained, and four different terrains are classified and identified by a fuzzy logic algorithm that takes the power as input. Oliveira et al. proposed a new classifier that can cluster different terrains based on acceleration data provided by an inertial measurement unit.

The adaptability and accuracy of a single sensor are often limited, so more and more researchers combine multiple sensing modes that complement each other to achieve more accurate recognition and classification. For obstacle identification and terrain classification in off-road environments, Manduchi et al. proposed a scheme combining binocular vision and a single-axis lidar: binocular vision is used for obstacle detection, a purpose-designed color-based classification system labels the detected obstacles, and the target type is finally distinguished by analyzing the lidar data. This method provides a reference for subsequent semantic navigation. Navarro et al. proposed methods for offline and online terrain classification, in which online obstacle detection is achieved with a time-of-flight (TOF) camera, and the terrain classes of traversable regions are trained offline with a Gaussian process using 3D lidar point cloud information. Kai Zhao et al. then used sound and vibration data from the wheel-terrain interaction to classify terrain. In contrast to traditional hand-crafted features, their article combines the ReliefF and mRMR algorithms into a two-step method for selecting an optimal feature subset, and finally classifies the different data effectively by combining multiple classifiers. In 2018, J. Park et al. designed a terrain classification and identification network based on long short-term memory units and deep ensemble learning, and proposed a feature selection method that excludes unnecessary sensor data without loss of performance, obtaining higher identification accuracy in different physical environments. R. D. Rosenfeld et al. proposed a multi-supervised learning terrain classification method that distinguishes different surfaces using 2D lidar data and RGBD camera data, based on density-based spatial clustering of applications with noise (DBSCAN). The authors consider both the adaptability of the algorithm and the driving control of the mobile platform, so that the running state of the platform can be controlled effectively across different terrains.

It can be seen that terrain classification and identification has been studied in considerable depth. During the exploration of extraterrestrial bodies, however, the operating environment is highly uncertain and sensors are prone to failure: under strong illumination, visual detection becomes unreliable, and the changeable dust weather of the Mars environment is unfavorable for lidar. Moreover, given the cost of extraterrestrial exploration, the sensor configuration is limited by mass and volume, so a full set of sensor payloads and backups cannot be carried. A sensing mode that is robust to the environment is therefore needed to classify and identify terrain under different conditions. The vibration sensor is a very good choice: on the one hand, vibration information reflects changes in terrain well through feature representation; on the other hand, the vibration sensor is not easily affected by environmental changes and has a certain robustness.

At present, research on terrain classification based on vibration information is still developing, and the related results come mainly from the team of K. Iagnemma at MIT and the team of C. Weiss at the University of Tübingen, Germany. In 2002, K. Iagnemma et al. proposed an online terrain parameter estimation method that estimates the magnitude and variation of the cohesion and the internal friction angle in real time by solving simplified equations for the two quantities with a linear least-squares estimator. On this basis, a vision/vibration-based rover terrain classification and identification method was first proposed in 2004. The method measures the sinkage of the rover wheels with vision, estimates terrain parameters online with tactile sensing, and classifies and identifies the terrain from vibration feedback. Building on the work of K. Iagnemma, Brooks et al. refined the vibration-based terrain classification method, using labeled vibration data for offline training of a classifier and linear discriminant analysis for online classification and identification of terrain. In 2007, C. Brooks and K. Iagnemma proposed a new self-supervised terrain classification method that effectively combines vision and vibration sensing: vibration classification is first performed on vibration information collected during the wheel-terrain interaction, and the terrain labels obtained from vibration are then learned together with terrain images collected by the camera, so that in actual use the camera can estimate the type of terrain ahead.

In 2006, Weiss et al. proposed a terrain classification method based on a support vector machine with a radial basis function kernel, which computes eight features from the raw acceleration data and normalizes them to form a feature vector; at the same time, combining these results with the feature vectors proposed by C. Brooks gives a new feature representation. C. Weiss et al. then effectively improved classification performance by using vibration acceleration data along three axes. The results show that feature vectors in the vertical direction classify more accurately than those in the tangential directions, but that fusing the three gives the best classification result. Subsequently, C. Weiss et al. carried out terrain classification tests based on a support vector machine while analyzing the influence of different vibration measurement directions on the results, thereby providing a simple and effective acquisition mode. In 2008, C. Weiss et al. proposed a combined vision/vibration terrain classification method similar in idea to the method proposed by C. Brooks in 2007: the terrain ahead is classified from visual information, and when the mobile platform reaches the position photographed at the previous moment, the vibration classification result is used to verify the earlier prediction, so that the prediction can be corrected effectively. In 2009, C. Weiss et al. proposed a classification method based on Bayesian filtering that takes multiple measurements into account, and compared it with single-measurement classification based on a support vector machine; the results show that the recognition effect is clearly improved. To handle untrained terrain types that may be encountered when moving in unknown terrain, C. Weiss and A. Zell proposed a terrain classification and identification system based on vibration sensing. The system uses a Gaussian mixture model for detection, and after the mobile platform has collected enough data on the unknown terrain, the new type is added to the classification model online. This widens the range of application and gives the mobile platform self-learning and autonomous operation capabilities in an unknown environment.

In addition to these two teams, other scholars have conducted related studies. In 2008, E. Collins et al. proposed a response-based terrain input classification method. Compared with existing vibration methods, this method reduces the dependence on the speed at which the terrain is traversed. It uses an AGV vibration transfer function to map the vibration output to the terrain input, and is verified in simulation with surface profiles from real terrain. Tick et al. proposed a multi-layer classifier terrain classification method based on angular velocity in 2012. Its innovation is that features are represented using acceleration and angular velocity measurements in all reference directions, screened by sequential forward floating feature selection, and classified with a linear Bayesian normal classifier; different feature sets are generated for different speed conditions, and the classifier is switched according to speed, which brings the method closer to actual operation. Based on the experimental finding that the spatial frequency response amplitude of the terrain can effectively represent terrain features, E. M. DuPont et al. used a probabilistic neural network in 2008 to realize online terrain classification and identification from vibration measurements. In the same year, E. M. DuPont et al. realized terrain classification and identification for a mobile platform at different speeds with a feature space manifold method. The vibration measurement unit of the mobile platform is used for data acquisition, principal component analysis is used for feature extraction and dimension reduction, and the principal component transform coefficients are then used to construct manifold curves; when the speed of the platform changes, the unknown coefficients of the terrain are interpolated from the known coefficients.

Disclosure of Invention

The invention aims to solve the problem that existing perception modes cannot effectively classify and identify terrain types, and provides a vibration information terrain classification and identification method based on CNN-LSTM. The invention gives a wheel-terrain interaction system the ability to recognize its terrain environment and to distinguish physical attributes such as the terrain material and its degree of hardness, thereby improving the judgment of terrain traversability and reducing the probability of becoming trapped in terrain that looks flat but is actually soft. The invention can be applied directly to interaction platforms such as extraterrestrial rover systems and ground off-road vehicles, improving their awareness of the terrain in their operating environment.

The invention is realized by the following technical scheme, and provides a vibration information terrain classification and identification method based on CNN-LSTM, which comprises the following steps:

step one: acquiring raw vibration data from a vibration sensor in different terrain environments;

step two: dividing the raw vibration data acquired in step one into a plurality of segments, each segment containing 64 acquired vibration samples, to form vectors of size 3 × 64;

step three: labelling each vector obtained from the segmentation in step two with a terrain type, each vector corresponding to one terrain label T_j,

where j = 1, 2, 3, 4, 5 correspond to masonry, fine sand, flat ground, cement ground and earth ground, respectively;

step four: converting the vector obtained after the segmentation processing into a frequency domain;

step five: carrying out learning training on the vector converted into the frequency domain by using a CNN-LSTM deep neural network to obtain a trained CNN-LSTM deep terrain classification and identification network;

step six: and acquiring vibration data online in real time, executing the second step to the fourth step, and performing online classification and identification by using the CNN-LSTM deep terrain classification and identification network trained in the fifth step to obtain the terrain type.
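Steps two and three can be sketched as follows. This is a minimal illustration only, not the patented implementation: it assumes non-overlapping windows, drops any incomplete trailing window, and assumes each recording covers a single terrain. The names `segment_recording`, `WINDOW` and `TERRAINS` are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple

WINDOW = 64  # samples per segment, as stated in step two

# Label order taken from the definition of T_j in step three.
TERRAINS: Dict[int, str] = {
    1: "masonry",
    2: "fine sand",
    3: "flat ground",
    4: "cement ground",
    5: "earth ground",
}

Segment = List[List[float]]  # 3 rows (x, y, z), 64 columns each


def segment_recording(
    samples: Sequence[Tuple[float, float, float]], label: int
) -> List[Tuple[Segment, int]]:
    """Split one single-terrain recording into labelled 3 x 64 segments.

    Assumption: windows do not overlap, and trailing samples that do not
    fill a whole window are dropped.
    """
    if label not in TERRAINS:
        raise ValueError(f"unknown terrain label: {label}")
    segments = []
    for start in range(0, len(samples) - WINDOW + 1, WINDOW):
        window = samples[start:start + WINDOW]
        # Transpose 64 (x, y, z) triples into three rows of 64 values.
        rows = [[sample[axis] for sample in window] for axis in range(3)]
        segments.append((rows, label))
    return segments
```

Flattening the three rows of one segment end to end gives the 1 × 192 vector referred to in step four.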

Further, step four is specifically as follows: the raw vibration data are divided into 1 × 192-dimensional vectors, and each vibration vector is standardized to a mean of 0 and a standard deviation of 1. The standardized data are then transformed by FFT to obtain the front-back [F(V_x)_{1:64}], left-right [F(V_y)_{1:64}] and up-down [F(V_z)_{1:64}] vectors. Finally, each value is normalized to the interval [0, 1], and the transformed three-axis data are used as the feature vector, giving the frequency-domain feature vector used for actual training:

F^* = [F(V_x)_{1:64}, F(V_y)_{1:64}, F(V_z)_{1:64}].
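The feature construction of step four can be sketched as below. Several points are assumptions, since the text does not fix them: standardization and [0, 1] scaling are applied per axis, the magnitude of each of the 64 transform bins is used, and a direct discrete Fourier transform stands in for the FFT (it yields the same values, only more slowly). All function names are hypothetical.

```python
import cmath
import math
from typing import List, Sequence


def standardize(values: Sequence[float]) -> List[float]:
    """Scale values to mean 0 and standard deviation 1."""
    n = len(values)
    mean = sum(values) / n
    std = math.sqrt(sum((v - mean) ** 2 for v in values) / n)
    if std == 0.0:
        return [0.0] * n  # constant signal: nothing to scale
    return [(v - mean) / std for v in values]


def dft_magnitude(values: Sequence[float]) -> List[float]:
    """Magnitudes of the discrete Fourier transform (same values as an FFT)."""
    n = len(values)
    return [
        abs(sum(v * cmath.exp(-2j * math.pi * k * t / n)
                for t, v in enumerate(values)))
        for k in range(n)
    ]


def min_max(values: Sequence[float]) -> List[float]:
    """Rescale values linearly to the interval [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0] * len(values)
    return [(v - lo) / (hi - lo) for v in values]


def frequency_features(segment: Sequence[Sequence[float]]) -> List[float]:
    """Build F* = [F(V_x), F(V_y), F(V_z)] from one 3 x 64 segment."""
    features: List[float] = []
    for axis in segment:  # x (front-back), y (left-right), z (up-down)
        features.extend(min_max(dft_magnitude(standardize(axis))))
    return features
```

For a 3 × 64 segment the result has 3 × 64 = 192 entries, matching the 1 × 192 dimension given in the text.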

Furthermore, the CNN-LSTM deep neural network is a seven-layer deep neural network: the first, third and fifth layers are convolutional layers, the second and fourth layers are pooling layers, the sixth layer is an LSTM layer, and a fully connected layer predicts the output at the final stage of the network. The activation function of the convolutional and pooling layers is the ReLU function, and the activation function of the fully connected layer is the Softmax function.

Further, the seven-layer deep neural network contains three convolutional layers with stride 1. Each convolution is performed by shifting the kernel over the input vector one sample at a time, multiplying the overlapping entries and summing the products. To halve the size of the input representation, the feature maps output by the first and third layers (both convolutional) are passed through max pooling with stride 2. The LSTM layer then extracts temporal information from the features: the features extracted by the convolution and pooling process are decomposed into sequential components and fed to the recurrent LSTM units for temporal analysis, and only the output of the last LSTM step is fed into the fully connected layer for terrain category prediction.
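The convolution and pooling operations described above can be illustrated with a small sketch. The kernel width (3), the pooling width (2), the use of a single filter, the absence of padding and the single-channel length-192 input are all assumptions made for illustration; the actual design parameters are those of Table 1. The LSTM and fully connected layers are omitted.

```python
from typing import List, Sequence


def conv1d_relu(signal: Sequence[float], kernel: Sequence[float]) -> List[float]:
    """Stride-1 convolution without padding, followed by ReLU.

    The kernel is shifted one sample at a time; overlapping entries are
    multiplied and summed (cross-correlation, as is usual in deep learning).
    """
    k = len(kernel)
    out = []
    for start in range(len(signal) - k + 1):
        total = sum(signal[start + i] * kernel[i] for i in range(k))
        out.append(max(0.0, total))
    return out


def max_pool1d(signal: Sequence[float], width: int = 2, stride: int = 2) -> List[float]:
    """Max pooling with stride 2, which roughly halves the length."""
    return [
        max(signal[start:start + width])
        for start in range(0, len(signal) - width + 1, stride)
    ]


def conv_stack(features: Sequence[float], kernels: Sequence[Sequence[float]]) -> List[float]:
    """Layers one to five: conv, pool, conv, pool, conv."""
    x = list(features)
    for index, kernel in enumerate(kernels):
        x = conv1d_relu(x, kernel)
        if index < len(kernels) - 1:  # no pooling after the last convolution
            x = max_pool1d(x)
    return x
```

Under these assumptions a length-192 input shrinks as 192 → 190 → 95 → 93 → 46 → 44, and the resulting sequence is what would be split into time steps for the LSTM layer.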

Further, in step one, the sampling frequency is 100 Hz.
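As a worked illustration of what this sampling frequency implies for one segment, assume N = 64 samples per axis (from step two) and f_s = 100 Hz. The symbols T_seg, Δf and f_max are introduced here only for this calculation.

```latex
\begin{align*}
T_{\text{seg}} &= \frac{N}{f_s} = \frac{64}{100\ \text{Hz}} = 0.64\ \text{s}
  && \text{(duration of one segment)} \\
\Delta f &= \frac{f_s}{N} = \frac{100\ \text{Hz}}{64} = 1.5625\ \text{Hz}
  && \text{(spacing of the transform bins)} \\
f_{\max} &= \frac{f_s}{2} = 50\ \text{Hz}
  && \text{(highest resolvable frequency)}
\end{align*}
```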

Further, a number of neurons are randomly dropped in the LSTM layer to prevent the neurons from adapting too strongly to the training data; when a neuron is dropped, its connection weights are excluded from the update.

The method uses vibration data in the vertical direction and collects multi-dimensional vibration data for processing. The vibration data are first segmented to form 1 × 192-dimensional vectors, and each vibration vector is standardized to a mean of 0 and a standard deviation of 1. Feature vectors in the frequency domain are then obtained by fast Fourier transform, learning and training are carried out with a multilayer perceptron neural network, and the trained network model is finally used for online detection and classification.

Drawings

FIG. 1 is a schematic diagram of a CNN-LSTM-based deep neural network design;

FIG. 2 is a schematic diagram of a CNN-LSTM-based deep neural network structure;

FIG. 3 is a comparison graph of prediction accuracy for different categories;

FIG. 4 is a confusion matrix chart of five experiments based on the CNN-LSTM deep neural network.

Detailed Description

The technical solutions in the embodiments of the present invention will be described clearly and completely with reference to the accompanying drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.

With reference to fig. 1 and fig. 2, the present invention provides a CNN-LSTM-based vibration information terrain classification and identification method, which includes the following steps:

step one: acquiring raw vibration data from a vibration sensor in different terrain environments, at a sampling frequency of 100 Hz.

Step four is specifically as follows: the raw vibration data are divided into 1 × 192-dimensional vectors, standardized to a mean of 0 and a standard deviation of 1, and then transformed by FFT to obtain the front-back [F(V_x)_{1:64}], left-right [F(V_y)_{1:64}] and up-down [F(V_z)_{1:64}] vectors. Finally, each value is normalized to the interval [0, 1], and the transformed three-axis data are used as the feature vector, giving the frequency-domain feature vector used for actual training:

F^* = [F(V_x)_{1:64}, F(V_y)_{1:64}, F(V_z)_{1:64}].

The CNN-LSTM deep neural network is a seven-layer deep neural network: the first, third and fifth layers are convolutional layers, the second and fourth layers are pooling layers, the sixth layer is an LSTM layer, and a fully connected layer predicts the output at the final stage of the network. The convolutional and pooling layers are effective at extracting spatial features, and the subsequent LSTM layer can capture the temporal characteristics present in those features, so that the network can adapt to terrain applications with different features in different states and offers more choices for the design of terrain features. The activation function of the convolutional and pooling layers is the ReLU function, and the activation function of the fully connected layer is the Softmax function. Detailed network design parameters are given in Table 1:

TABLE 1 CNN-LSTM-based deep neural network design parameters

The seven-layer deep neural network contains three convolutional layers with stride 1. Each convolution is performed by shifting the kernel over the input vector one sample at a time, multiplying the overlapping entries and summing the products. To halve the size of the input representation, the feature maps output by the first and third layers (both convolutional) are passed through max pooling with stride 2. The LSTM layer then extracts temporal information from the features: the features extracted by the convolution and pooling process are decomposed into sequential components and fed to the recurrent LSTM units for temporal analysis, and only the output of the last LSTM step is fed into the fully connected layer for terrain category prediction.

Overfitting of the network model during training is a problem that must be considered, especially when the number of terrain features is small. To prevent overfitting during training, the invention adopts dropout regularization, that is, a number of neurons are randomly dropped in the LSTM layer; the dropout rate is set to 20% in the invention. The idea of randomly dropping part of the network during the training phase is to prevent the neurons from adapting too strongly to the training data. When a neuron is dropped, its connection weights are excluded from the update, which forces the network to learn from imperfect patterns and improves the generalization ability of the model. This completes the design of the CNN-LSTM-based terrain classification and identification network.
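The dropout idea can be sketched on a plain list of activations. This is an illustration only: the rescaling of surviving activations by 1 / (1 − rate), known as inverted dropout, is a common convention assumed here and is not stated in the text, and the function name `dropout` is hypothetical.

```python
import random
from typing import List, Optional, Sequence


def dropout(
    activations: Sequence[float],
    rate: float = 0.2,  # 20%, the value given in the description
    rng: Optional[random.Random] = None,
    training: bool = True,
) -> List[float]:
    """Randomly zero a fraction `rate` of the activations during training.

    Assumption: survivors are scaled by 1 / (1 - rate) so that the expected
    activation is unchanged (inverted dropout). A zeroed unit contributes
    no gradient, so its weights are not updated in that training step.
    """
    if not training or rate == 0.0:
        return list(activations)  # dropout is disabled at inference time
    rng = rng or random.Random()
    keep = 1.0 - rate
    return [a / keep if rng.random() < keep else 0.0 for a in activations]
```

During online classification the function would be called with `training=False`, so that every neuron takes part in the prediction.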

The invention utilizes a ground robot to collect experimental data. It has a hard rubber wheel, which can generate clear vibration signal. The method utilizes vibration signals in multiple directions to improve the accuracy of terrain classification. The terrain classification and identification method has two stages: training and classifying. Training has relatively high requirements for computation and is therefore typically an off-line step. The classification phase is very fast and can be directly called by using a trained model.

The invention combines the characteristics of deep neural networks and long short-term memory (LSTM), designs a CNN-LSTM deep neural network that takes both into account, and provides a test analysis of five terrain environments of different hardness.

Based on the trained network model, testing is carried out in terrain environments of five different materials. To ensure that no other interference is introduced, the mobile platform runs at a constant speed of v = 0.2 m/s. For convenience of representation, brick terrain, flat ground, cement ground, fine sand ground and earth ground, which differ in hardness, are denoted by the numbers 1, 2, 3, 4 and 5, respectively. The classification accuracy for the 5 types of terrain obtained from 5 experiments is shown in Table 2.

TABLE 2 prediction accuracy for different terrain categories

As can be seen from Table 2, the invention has higher recognition accuracy for softer and soft terrain, and lower classification accuracy for medium-hardness, harder and hard terrain, which indicates that the network is not ideal for terrain recognition and classification in the experimental environment. As can be seen from Fig. 3, the overall classification accuracy is nevertheless maintained at about 75%. The classification characteristics and the degree of confusion are analyzed below to further identify the cause of the classification errors for the five types of terrain.

In Fig. 4, panels (a), (b), (c), (d) and (e) are the confusion matrices of the five experiments. The confusion matrices show that medium-hardness terrain is easily misclassified as harder terrain or hard terrain, and that harder terrain and hard terrain are also mistaken for each other.
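The quantities read off a confusion matrix in this kind of analysis can be computed as in the sketch below. It assumes rows index the true terrain class and columns the predicted class; the function names are hypothetical, and no values from Table 2 or Fig. 4 are reproduced.

```python
from typing import List, Sequence, Tuple


def per_class_accuracy(confusion: Sequence[Sequence[int]]) -> List[float]:
    """Fraction of each true class (row) that was predicted correctly."""
    return [
        row[i] / sum(row) if sum(row) else 0.0
        for i, row in enumerate(confusion)
    ]


def overall_accuracy(confusion: Sequence[Sequence[int]]) -> float:
    """Correct predictions (diagonal) divided by all predictions."""
    total = sum(sum(row) for row in confusion)
    correct = sum(confusion[i][i] for i in range(len(confusion)))
    return correct / total if total else 0.0


def most_confused_pairs(
    confusion: Sequence[Sequence[int]], top: int = 3
) -> List[Tuple[int, int, int]]:
    """Return (true class, predicted class, count) for the largest
    off-diagonal entries, using 1-based class numbers as in the text."""
    n = len(confusion)
    pairs = [
        (confusion[t][p], t + 1, p + 1)
        for t in range(n)
        for p in range(n)
        if t != p
    ]
    pairs.sort(reverse=True)
    return [(t, p, count) for count, t, p in pairs[:top] if count > 0]
```

Applied to each of the five confusion matrices, `most_confused_pairs` would list which terrain numbers are most often mistaken for one another.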

The method for classifying and identifying the terrain based on the vibration information of the CNN-LSTM, which is provided by the invention, is described in detail, a specific example is applied in the text to explain the principle and the implementation mode of the invention, and the description of the embodiment is only used for helping to understand the method and the core idea of the invention; meanwhile, for a person skilled in the art, according to the idea of the present invention, there may be variations in the specific embodiments and the application scope, and in summary, the content of the present specification should not be construed as a limitation to the present invention.

Claims

1. A vibration information terrain classification and identification method based on CNN-LSTM is characterized in that: the method comprises the following steps:

2. The method of claim 1, wherein: step four is specifically as follows: the raw vibration data are divided into 1 × 192-dimensional vectors, standardized to a mean of 0 and a standard deviation of 1, and then transformed by FFT to obtain the front-back [F(V_x)_{1:64}], left-right [F(V_y)_{1:64}] and up-down [F(V_z)_{1:64}] vectors; finally, each value is normalized to the interval [0, 1], and the transformed three-axis data are used as the feature vector, giving the frequency-domain feature vector used for actual training:

F^* = [F(V_x)_{1:64}, F(V_y)_{1:64}, F(V_z)_{1:64}].

3. The method of claim 2, wherein: the CNN-LSTM deep neural network is a seven-layer deep neural network, the first, third and fifth layers are convolutional layers, the second and fourth layers are pooling layers, the sixth layer is an LSTM layer, and a fully connected layer predicts the output at the final stage of the network; the activation function of the convolutional and pooling layers is the ReLU function, and the activation function of the fully connected layer is the Softmax function.

4. The method of claim 3, wherein: the seven-layer deep neural network contains three convolutional layers with stride 1; each convolution is performed by shifting the kernel over the input vector one sample at a time, multiplying the overlapping entries and summing the products; to halve the size of the input representation, the feature maps output by the first and third layers (both convolutional) are passed through max pooling with stride 2; the LSTM layer then extracts temporal information from the features, so that the features extracted by the convolution and pooling process are decomposed into sequential components and fed to the recurrent LSTM units for temporal analysis, and only the output of the last LSTM step is fed into the fully connected layer for terrain category prediction.

5. The method of claim 1, wherein: in step one, the sampling frequency is 100 Hz.

6. The method of claim 4, wherein: a number of neurons are randomly dropped in the LSTM layer to prevent the neurons from adapting too strongly to the training data, and when a neuron is dropped, its connection weights are excluded from the update.