CN110210371B

CN110210371B - In-air handwriting inertial sensing signal generation method based on deep confrontation learning

Info

Publication number: CN110210371B
Application number: CN201910454780.XA
Authority: CN
Inventors: 薛洋; 徐松斌
Original assignee: South China University of Technology SCUT
Current assignee: South China University of Technology SCUT
Priority date: 2019-05-29
Filing date: 2019-05-29
Publication date: 2021-01-19
Anticipated expiration: 2039-05-29
Also published as: CN110210371A

Abstract

The invention discloses a method for generating an aerial handwriting inertial sensing signal based on deep confrontation learning, which comprises the following steps of: filtering, smoothing and denoising the acquired aerial handwritten inertial sensing signal sequence to be used as a training sample set; designing a depth convolution conditional countermeasure generation network based on the time sequence feature map position coding; training the generating network with a training sample set; and inputting the established sample length, sample class labels and random noise vectors into a trained countermeasure generation network, and taking the input of a generator as a generated aerial handwriting inertial sensing signal sequence. The invention can generate the aerial handwriting inertial sensing signal samples with certain diversity and good enough quality, and the effective length and the class of the generated samples are controllable.

Description

In-air handwriting inertial sensing signal generation method based on deep confrontation learning

Technical Field

The invention relates to the technical field of deep learning and artificial intelligence, in particular to a method for generating an aerial handwriting inertial sensing signal based on deep confrontation learning.

Background

In-air handwriting recognition based on inertial sensors (accelerometers and gyroscopes) is one of the leading research directions emerging in the field of computers in recent years, and has wide application in smart home, automatic driving, education, medical treatment, industrial production, auxiliary communication and the like. The aerial handwriting recognition model often needs sufficient training samples to obtain better generalization capability, but the currently disclosed aerial handwriting data set based on the inertial sensor is few, and the main reasons of data scarcity are three. First, the process of sensor data acquisition is time consuming and labor intensive. Secondly, inertial sensor data readability is poor, resulting in data cleaning and labeling difficulties. Third, the handwritten data is related to the personal privacy of the person being collected and is not easily disclosed. The scarcity of data presents difficulties for model training.

At present, there are two main methods for adding training samples, namely manual data enhancement and automatic generation by using a model. General data enhancement methods such as rotation, flipping, masking, scaling and noising, when used for inertial sensor signal data for in-air handwriting, tend to distort the physical meaning of the data, which is unreasonable. On the other hand, the advent of deep learning and countermeasure generation networks has made it possible to generate more in-air handwritten character samples directly using depth models. However, most of currently, the generation-resistant network structures are specially designed for image data, and when the network structures are directly applied to handwritten data in the air, problems such as poor generation effect and mode collapse (mode collapse) are easily caused.

Disclosure of Invention

The invention aims to provide a method for generating an aerial handwritten inertial sensing signal based on deep confrontation learning aiming at the defects of the prior art, and provides a brand-new method for enhancing an aerial handwritten character sample based on an inertial sensor by designing and training a deep convolution condition confrontation generation network based on time sequence characteristic diagram position coding. The method can well relieve the common mode collapse problem of the anti-generation network, can generate the samples with controllable character types and controllable sequence lengths, has good quality and diversity of the generated samples, and has good application and research values.

The purpose of the invention can be realized by the following technical scheme:

a method for generating an aerial handwriting inertial sensing signal based on deep confrontation learning, comprising the following steps of:

s1, acquiring an aerial handwritten inertial sensing signal sequence and a character class label, and performing data preprocessing;

s2, designing and training a deep convolution condition confrontation generation network based on the time sequence feature map position coding, wherein the deep convolution condition confrontation generation network comprises a generator network and a discriminator network, and the method comprises the following specific steps:

s21, designing a generator network, using the random noise vector as input, assigning length and class labels, and generating inertial sensing signal sequence samples of the aerial handwritten characters; the generator network comprises a time sequence characteristic diagram position coding module, a noise mapping module, a characteristic diagram up-sampling module and an amplitude adjusting module, and the specific operation steps are as follows:

s21(a), a time sequence characteristic diagram position coding step: the length of the characteristic diagram is specified, a time sequence characteristic diagram position coding module is used for calculating and outputting a characteristic diagram position coding sequence, and the formula of the time sequence characteristic diagram position coding is as follows:

wherein pos (T) is a symbol of the position code of the time sequence characteristic diagram, T represents the time, T_fIs the length of the significant part, T, in the specified feature map_LIs the length of the zero-filled part in the feature map, T_f+T_LIs a fixed value, operator

Represents rounding down;

s21(b), noise mapping step: inputting noise vectors, category labels and time sequence feature map position codes, utilizing a noise mapping module to query weights, and respectively carrying out full-connection mapping on noise, wherein the specific steps are as follows:

(1) input d_zNormalized normal noise vector of dimension, input d_cClass label one-hot encoding of dimension, spliced into d_z+d_cDimension vector, and stack T_f+T_LThen, it becomes a size of (T)_f+T_L)×(d_z+d_c) The timing noise matrix of (a);

(2) initialization (T)_f+T_L) Each size is (d)_z+d_c)×d₁The noise mapping weight matrix takes out the pos (t) noise mapping weight matrix to carry out full-connection operation on the noise vector at the time t in the time sequence noise matrix, and updates the time sequence noise momentArraying; then, initialize (T)_f+T_L) Each size is d₁×d₂The hidden layer weight matrix takes out the Pos (t) th hidden layer weight matrix to carry out full-connection operation on the noise vector at the time t in the time sequence noise matrix, and updates the time sequence noise matrix;

s21(c), a characteristic diagram upsampling step: inputting the time sequence noise matrix into an up-sampling module of the characteristic diagram, and performing up-sampling by using the deconvolution layer to obtain the value of alpha (T)_f+T_L) A xd inertial sensing signal sequence, where α represents the up-sampling magnification, d is the dimensionality of the inertial sensing signal, and the sequence is smoothed by one-dimensional convolution;

s21(d), sample amplitude adjustment step: inputting the smoothed sequence into an amplitude adjusting module, and adjusting the amplitude of the sample sequence according to the specified effective length and the specified character type, wherein the adjusting formula is as follows:

wherein d is_cIs the number of character classes, j and T_fRespectively the effective length, G, of the assigned class and feature map_jIs the smoothed sequence samples output by the feature map upsampling module,

is the sample after amplitude adjustment, beta_jIs a trainable parameter for a given category j;

s22, designing a discriminator network, inputting a real sample and a class label, or generating a sample and a specified class label, and outputting a truth prediction about the sample, wherein the concrete operation steps are as follows:

s22(a), inputting the one-hot coding of the real sample and the class label, splicing the sample and the label on the channel dimension, then inputting the spliced sample and the label into a depth residual error network, extracting the characteristic through the convolution layer and calculating the full connection, and outputting the truth prediction l related to the real sample_r；

S22(b), input by generator networkConnecting the pseudo samples generated under the condition of the specified category and the one-hot coding of the labels of the specified category, splicing the pseudo samples and the labels on the channel dimension, then inputting the spliced pseudo samples and the labels into a depth residual error network, and outputting a truth prediction l related to the pseudo samples through the extraction characteristics of the convolutional layer and full-connection calculation_g；

When the discriminator network discriminates the true sample and the false sample, the same set of network parameters are used;

s23, training the generator network and the discriminator network simultaneously by using a training sample set; calculating the countermeasure loss by using a truth predicted value output by the discriminator, calculating the feature matching (feature matching) loss by using the output of the second last layer full connection of the discriminator, reversely propagating errors and optimizing network parameters;

and S3, inputting the one-hot codes and the standard normal noise vectors with the specified lengths and the specified types into a trained generator network, wherein the sequence output by the generator network is the generated inertial sensing signal sequence sample of the handwritten character in the air.

Further, the specific steps of the data acquisition and preprocessing process in step S1 are as follows:

s11, acquiring an inertial sensing signal sequence of the handwritten character in the air and a character class label from the public data set 6DMG, and performing one-hot encoding on the class label; the 6-dimensional inertial sensing signals comprise 3-dimensional acceleration signals and 3-dimensional angular velocity signals, and the category labels are 62 types and comprise 10 Arabic numerals, 26 capital English letters and 26 lowercase English letters;

and S12, carrying out moving average filtering denoising with the fixed window length of 5 on the obtained inertial sensing signal sequence.

Further, the loss calculations in step S23 are performed, where the penalty loss includes an arbiter loss and a generator loss, and the specific formula is:

wherein L is_DIs the sign of the penalty of the arbiter_GIs the sign of the generator against losses, L_fmIs the sign of the loss of feature matching, n is the number of training samples of the same Batch (Batch);

representing the prediction of the truth of the arbiter with respect to the ith real sample,

representing a prediction of the truth of the arbiter with respect to the ith generated sample;

representing the output of the last full-link layer of the discriminator with respect to the ith real sample,

representing the output of the last fully connected layer of the discriminator with respect to the ith generated sample.

Further, the network training process of step S23 first incorporates the feature matching loss into the generator countermeasure loss to obtain the feature matching loss

Then training the generator and the discriminator in an alternate training mode; in each iteration of the batch training, 1 time is performed

And corresponding error feedback and network parameter update, and then L is performed k times_DThe calculation of (a) and the corresponding error feedback and network parameter updating; when return loss occurs, local network parameter updating and other parameter fixing are adoptedTraining in a non-trainable mode; in particular, backhaul

Updating only the parameters of the generator network at any time, passing back L_DOnly the parameters of the arbiter network are updated at that time.

Further, in the network training process of step S23, the maximum Mean difference mmd (maximum Mean variance) is used as a measure for generating the similarity between the sample distribution and the real sample distribution; firstly, when a training set sample is traversed once each time, a generator network is used for generating a false sample set, and the total sample number and the sample number of each character type of the false sample set are required to be consistent with the training sample set; and then, calculating and recording the MMD of the false sample set and the training sample set, and taking the model corresponding to the minimum MMD value which can be reached in the training process as a final model.

Further, in the network training process of step S23, spectrum normalization is performed when training the arbiter network; specifically, after updating the network parameters of the discriminator each time, the network parameters are updated layer by using spectrum standardization, so that the discriminator meets the Lipschiz continuous condition, and the network convergence is accelerated; the spectral normalization is realized by a power iteration method, and the specific formula is as follows:

σ(W)＝u^TWv

where W is a parameter matrix for a layer of the discriminator network, v and u are temporary variables for power iterations, σ (W) is the sign of the spectral norm of W,

is a network parameter matrix after spectrum standardization; the operator | · |)₂Indicating the computation of the L2 norm and operator T denotes transposition.

Further, the total length T of the timing characteristic diagram_f+T_L32; the up-sampling multiplying power alpha is 8; the data dimension d of the inertial sensing signal is 6; the noise vector dimension d_z100; the character category number and category label one-hot encoding dimension d_c62; the noise mapping weight matrix dimension d₁32, the hidden layer weight matrix dimension d₂64; in the alternative training L_DAnd the times k of calculation, return and parameter updating is 3.

Compared with the prior art, the invention has the following advantages and beneficial effects:

1. the invention provides a method for generating an aerial handwritten inertial sensing signal based on deep confrontation learning.

2. The invention provides a method for generating an in-air handwritten inertial sensing signal based on deep confrontation learning.

3. According to the method for generating the air handwriting inertia sensing signal based on the deep countermeasure learning, provided by the invention, in a generator network, the time sequence characteristic graph position code and the noise mapping layer which is used for carrying out weight query according to the code can enable the loss from different local parts of a generated sample to be correctly transmitted back to different part weights of the noise mapping layer, so that the problem of mode collapse is relieved.

4. According to the method for generating the aerial handwriting inertial sensing signal based on the deep confrontation learning, the samples generated by the confrontation generation network are good in diversity and quality.

5. According to the method for generating the aerial handwriting inertial sensing signal based on the deep confrontation learning, provided by the invention, in a generator network, the last one-dimensional convolution smoothing of a characteristic up-sampling module and an amplitude adjusting module can improve the authenticity of a generated sample.

6. The invention provides a method for generating an aerial handwritten inertial sensing signal based on deep confrontation learning.

7. The method for generating the aerial handwritten inertial sensing signal based on the deep confrontation learning introduces the characteristic matching loss during training, so that the discriminator can discriminate the authenticity of the sample from the characteristic space, and the discrimination performance of the discriminator is improved.

8. The method for generating the aerial handwriting inertial sensing signal based on the deep confrontation learning, provided by the invention, uses MMD (matrix-mass-distance) guidance during training, and can avoid model under-fitting or over-fitting.

Drawings

Fig. 1 is a flowchart of a training process of a method for generating an air handwriting inertia sensing signal based on deep confrontation learning according to an embodiment of the present invention.

Fig. 2 is a flowchart of generating an aerial handwritten inertial sensing signal sample by using a trained deep convolution condition based on time series feature map position coding to resist generation network according to an embodiment of the present invention.

Fig. 3 is an exemplary diagram of an arabic digital sample of an aerial handwritten inertial sensing signal generated by a run-of-time feature map location coding-based robust generation network according to an embodiment of the present invention.

Fig. 4 is an exemplary diagram of an air handwriting inertia sensing signal capital letter sample generated by a countermeasure generation network based on a deep convolution condition of time sequence feature map position coding according to an embodiment of the present invention.

Fig. 5 is an exemplary diagram of a sample of a lowercase letter of an aerial handwritten inertial sensing signal generated by a countermeasure generation network based on a depth convolution condition of a position code of a time sequence feature diagram according to an embodiment of the present invention.

Detailed Description

The present invention will be described in further detail with reference to examples and drawings, but the present invention is not limited thereto.

Example (b):

referring to fig. 1 and fig. 2, the embodiment discloses a method for generating an aerial handwritten inertial sensing signal based on deep confrontation learning, which includes the following specific steps:

s1, acquiring an aerial handwritten inertial sensing signal sequence and a character type label, and preprocessing data, wherein the steps are as follows:

and S11, acquiring an inertial sensing signal sequence of the handwritten character in the air and a character class label from the public data set 6DMG, and carrying out one-hot coding on the class label. The 6-dimensional inertial sensing signals comprise 3-dimensional acceleration signals and 3-dimensional angular velocity signals, and the category labels are 62 types and comprise 10 Arabic numerals, 26 capital English letters and 26 lowercase English letters;

s12, carrying out moving average filtering denoising with the window length of 5 on the obtained inertial sensing signal sequence;

and S21, designing a generator network, taking the random noise vector as input, assigning a length and a class label, and generating inertial sensing signal sequence samples of the aerial handwritten characters. The generator network comprises a time sequence characteristic diagram position coding module, a noise mapping module, a characteristic diagram up-sampling module and an amplitude adjusting module, and the specific operation steps are as follows:

and S21(a), a time sequence characteristic diagram position coding step. The length of the characteristic diagram is specified, a time sequence characteristic diagram position coding module is used for calculating and outputting a characteristic diagram position coding sequence, and the formula of the time sequence characteristic diagram position coding is as follows:

wherein pos (T) is a symbol of the position code of the time sequence characteristic diagram, T represents the time, T_fIs the length of the significant part, T, in the specified feature map_LIs the length of the zero-filled part in the feature map, T_f+T_LIs a fixed value (set to T in this embodiment)_f+T_L32), operator

Represents rounding down;

and S21(b), a noise mapping step. Inputting noise vectors, category labels and time sequence feature map position codes, utilizing a noise mapping module to query weights, and respectively carrying out full-connection mapping on noise, wherein the specific steps are as follows:

(1) input d_zA normalized normal noise vector of dimension (100 in this embodiment) is input as d_cClass label one-hot coding of dimension (62 in this embodiment) and splicing into d_z+d_cDimension vector, and stack T_f+T_LThen, it becomes a size of (T)_f+T_L)×(d_z+d_c) The timing noise matrix of (a);

(2) initialization (T)_f+T_L) Each size is (d)_z+d_c)×d₁(in this embodiment, d₁32), extracting pos (t) noise mapping weight matrix from the noise vector at time t in the time sequence noise matrix, performing full-join operation, and updating the time sequence noise matrix. Then, initialize (T)_f+T_L) Each size is d₁×d₂(in this embodiment, d₂64), taking out the pos (t) hidden layer weight matrix from the noise vector at the time t in the time sequence noise matrix, performing full-connection operation, and updating the time sequence noise matrix;

and S21(c), a characteristic diagram upsampling step. Inputting the time sequence noise matrix into an up-sampling module of the characteristic diagram, and performing up-sampling by using the deconvolution layer to obtain the value of alpha (T)_f+T_L) X d inertial sensing signal sequence and flattening the sequence with one-dimensional convolutionAnd (4) slipping. α represents an up-sampling magnification (α is 8 in this embodiment), and d is a dimension of the inertial sensing signal (d is 6 in this embodiment);

and S21(d) sample amplitude adjustment. Inputting the smoothed sequence into an amplitude adjusting module, and adjusting the amplitude of the sample sequence according to the specified effective length and the specified character type, wherein the adjusting formula is as follows:

S22(b), inputting the false samples generated by the generator network under the condition of the specified category and the one-hot coding of the labels of the specified category, splicing the false samples and the labels on the channel dimension, then inputting the spliced false samples and the labels into a depth residual error network, extracting features through a convolutional layer and calculating through full connection, and outputting the truth prediction l related to the false samples_g；

The discriminator network uses the same set of network parameters when discriminating between true and false samples.

Parameters are updated using spectral normalization while training the discriminator network. Specifically, after updating the network parameters of the discriminator each time, the network parameters are updated layer by using spectrum standardization, so that the discriminator meets the Lipschiz continuous condition, and the network convergence is accelerated. The spectral normalization is realized by a power iteration method, and the specific formula is as follows:

σ(W)＝u^TWv

is a network parameter matrix after spectrum standardization; the operator | · |)₂Indicating the calculation of the L2 norm, operator T indicates transposition;

and S23, training the generator network and the discriminator network simultaneously by using the training sample set. And calculating the countermeasure loss by using the truth predicted value output by the discriminator, calculating the feature matching (feature matching) loss by using the output of the second last layer full connection of the discriminator, reversely propagating the error and optimizing the network parameters. The countermeasure loss comprises discriminator loss and generator loss, and the specific formula is as follows:

During actual training, firstly, the feature matching loss is merged into a generator to resist the loss to obtain

The generator and the arbiter are then trained in an alternating training manner. In each iteration of the batch training, 1 time is performed

And corresponding error feedback and network parameter update, then k times (3 times in this embodiment) L_DAnd corresponding error feedback and network parameter updates. And when the return loss occurs, the training is carried out by adopting a mode of updating local network parameters and fixing other parameters without training. In particular, backhaul

In the training process of the model, the MMD (maximum Mean variance) is used as a measure for generating the similarity degree of the sample distribution and the real sample distribution. Firstly, when a training set sample is traversed once each time, a generator network is used for generating a false sample set, and the total sample number and the sample number of each character type of the false sample set are required to be consistent with the training sample set; then, calculating and recording the MMD of the dummy sample set and the training sample set, and taking a model corresponding to the minimum MMD value which can be reached in the training process as a final model;

As shown in fig. 3, 4, and 5, the samples generated by the present invention have higher similarity to the samples of the same character type, but the same samples do not appear, and no pattern collapse phenomenon is found. Meanwhile, the obvious difference can be directly seen among different types of samples.

In conclusion, the invention is mainly used for solving the problem of scarcity of aerial handwritten character samples based on the inertial sensor. The invention relates to a time sequence feature map position coding-based deep convolution condition confrontation generation network, which utilizes the feature map position coding and a noise mapping layer for carrying out weight query according to the position coding, thereby realizing the controllable length of a generated sample and avoiding the problem of mode collapse. Meanwhile, the character type label is introduced to the network input end of the discriminator and the generator to carry out conditional antagonism, so that the type controllability of the generated sample is realized. The convolution smoothing and amplitude adjustment operations of the generator network can make the generated samples more realistic. Therefore, the method can generate the aerial handwritten character samples with controllable types and lengths, and the samples are good in reality degree and diversity and worthy of popularization.

The above description is only for the preferred embodiments of the present invention, but the protection scope of the present invention is not limited thereto, and any person skilled in the art can substitute or change the technical solution of the present invention and the inventive concept within the scope of the present invention, which is disclosed by the present invention, and the equivalent or change thereof belongs to the protection scope of the present invention.

Claims

1. A method for generating an aerial handwriting inertial sensing signal based on deep confrontation learning is characterized by comprising the following steps:

Represents rounding down;

(2) initialization (T)_f+T_L) Each size is (d)_z+d_c)×d₁The noise mapping weight matrix takes out the Pos (t) th noise mapping weight matrix to carry out full-connection operation on the noise vector at the time t in the time sequence noise matrix, and updates the time sequence noise matrix; then, initialize (T)_f+T_L) Each size is d₁×d₂The hidden layer weight matrix takes out the Pos (t) th hidden layer weight matrix to carry out full-connection operation on the noise vector at the time t in the time sequence noise matrix, and updates the time sequence noise matrix; wherein d is₁Mapping the dimension of the weight matrix for noise, d₂Is the dimension of the weight matrix of the hidden layer;

s23, training the generator network and the discriminator network simultaneously by using a training sample set; calculating the countermeasure loss by using the truth prediction value output by the discriminator, calculating the characteristic matching loss by using the output of the second last layer full connection of the discriminator, reversely propagating errors and optimizing network parameters;

2. The method for generating an air handwriting inertia sensing signal based on deep confrontation learning according to claim 1, wherein the data acquisition and preprocessing process in step S1 includes the following steps:

and S12, carrying out moving average filtering denoising with fixed window length on the obtained inertial sensing signal sequence.

3. The method for generating an in-air handwritten inertial sensing signal based on deep confrontation learning of claim 1, wherein the items of loss calculation of step S23 are that the confrontation loss includes discriminator loss and generator loss, and the specific formula is:

wherein L is_DIs the sign of the penalty of the arbiter_GIs the sign of the generator against losses, L_fmIs the sign of the loss of feature matching, n is the number of training samples of the same batch;

4. The method for generating the aerial handwriting inertia sensing signal based on the deep confrontation learning as claimed in claim 3, wherein: in the network training process of step S23, the feature matching loss is first incorporated into the generator countermeasure loss to obtain

And corresponding error feedback and network parameter update, and then L is performed k times_DThe calculation of (a) and the corresponding error feedback and network parameter updating; when the return loss occurs, a mode of updating local network parameters and fixing other parameters without training is adopted for training; in particular, backhaul

5. The method for generating the aerial handwriting inertia sensing signal based on the deep confrontation learning as claimed in claim 1, wherein: in the network training process of step S23, the maximum mean difference MMD is used as a measure for generating the similarity between the sample distribution and the true sample distribution; firstly, when a training set sample is traversed once each time, a generator network is used for generating a false sample set, and the total sample number and the sample number of each character type of the false sample set are required to be consistent with the training sample set; and then, calculating and recording the MMD of the false sample set and the training sample set, and taking the model corresponding to the minimum MMD value which can be reached in the training process as a final model.

6. The method for generating the aerial handwriting inertia sensing signal based on the deep confrontation learning as claimed in claim 1, wherein: in the network training process of step S23, spectrum normalization is performed when training the arbiter network; specifically, after updating the network parameters of the discriminator each time, the network parameters are updated layer by using spectrum standardization, so that the discriminator meets the Lipschiz continuous condition, and the network convergence is accelerated; the spectral normalization is realized by a power iteration method, and the specific formula is as follows:

σ(W)＝u^TWv

is a network parameter matrix after spectrum standardization; the operator | · | non-conducting phosphor₂Indicating the computation of the L2 norm and operator T denotes transposition.

7. The method for generating the aerial handwriting inertia sensing signal based on the deep confrontation learning as claimed in claim 4, wherein: total length T of the timing characteristic diagram_f+T_L32; the up-sampling multiplying power alpha is 8; the data dimension d of the inertial sensing signal is 6; the noise vector dimension d_z100; the character category number and category label one-hot encoding dimension d_c62; the noise mapping weight matrix dimension d₁32, the hidden layer weight matrix dimension d₂64; in the alternative training L_DAnd the times k of calculation, return and parameter updating is 3.