CN108875592A

CN108875592A - A convolutional neural network optimization method based on attention

Info

Publication number: CN108875592A
Application number: CN201810519139.5A
Authority: CN
Inventors: 王红滨; 王勇军; 何鸣; 王念滨; 周连科; 陈田田; 秦帅; 赵昱杰; 李秀明; 薛冬梅
Original assignee: Harbin Engineering University
Current assignee: Harbin Engineering University
Priority date: 2018-04-13
Filing date: 2018-05-28
Publication date: 2018-11-23

Abstract

The present invention provides a convolutional neural network optimization method based on an attention model. The noise data of an underwater target is first segmented, and the MFCC of each noise segment is extracted in order to convert the target noise data into fixed-length vectorized data. The fixed-length vectorized data are then spliced according to the arrangement positions of the hydrophones in the experiment and their temporal order, forming a hydrophone-array feature for a complete time period. The resulting hydrophone-array feature is converted into a corresponding image and fed into the training network as the input data set. By analyzing the experimental results of the model in use and by modifying and optimizing the model, deep learning achieves a 10%-15% improvement in the underwater target recognition rate.

Description

A convolutional neural network optimization method based on attention

Technical field

The present invention relates to an underwater target recognition method.

Background technique

Underwater acoustic signal processing and sonar technology form a rapidly developing discipline with strong demand-driven momentum and extremely broad application prospects. It is an important component of modern sonar systems and underwater acoustic electronic warfare and has long attracted great attention from scholars and engineers. Every underwater target has its own distinctive noise characteristics, which can be used in application fields such as target identification, classification, and condition monitoring. Because of the complexity of the marine environment and the particularity of the underwater acoustic channel, extracting from the target noise signal an effective feature representation that both reflects the essential characteristics of the target and satisfies the requirements of long-range underwater detection has always been a difficult problem in this field.

Traditional underwater target recognition solutions mostly start from the low-frequency information of the underwater target and determine the target's category by analyzing its DEMON and LOFAR spectra.

Summary of the invention

The purpose of the present invention is to provide a convolutional neural network optimization method based on an attention model that can quickly perform dimensionality reduction on the model so as to reduce its risk of overfitting.

The object of the present invention is achieved as follows:

(1) Segmentation of noise data

The noise data of the underwater target is segmented in units of 100 ms;

(2) Extraction of data features

Data feature extraction consists of extracting the MFCC of each noise segment. The MFCC extraction process is given by the formula:

In the formula, f is the target noise frequency and mel(f) is the mel cepstral coefficient extracted from the target noise;

(3) Multichannel data splicing

After the underwater target noise data has been segmented and its features extracted, the features are combined with the array positions of the hydrophones to form the corresponding vector matrices, while data from different time steps form different batches. The MFCC obtained for each noise segment in the previous step is spliced according to the arrangement positions of the multichannel hydrophones and the temporal order of the noise segments, forming the input data set of the deep learning model;

(4) Accelerating the pooling operation based on the attention model

The underwater target noise data is processed with a convolutional model. In the convolutional processing method, attention is defined as the feature map obtained after the convolution operation in the corresponding convolutional layer. The dimensionality-reduction direction of the feature map is determined by principal component analysis, and pooling is performed on the basis of the features extracted by the convolution kernels;

(5) Attention-weighted connection

Using the convolution kernel features, the features extracted by the convolutional neural network, i.e. the results of the last layer, are weighted on the basis of the attention model.

After features are obtained by convolution in a convolutional neural network, they can be used for classification. In theory, a classifier could be trained directly on all of the extracted features, but doing so would face an enormous computational burden. Drawing inspiration from biology, researchers exploit the "stationarity" property that humans rely on in image recognition and classification and compute aggregate statistics over features at different locations. These summary statistics not only have a lower dimension but also help prevent overfitting. The problem with traditional pooling is that the pooling strategy is fixed: during model training, the pooling operation merely performs a crude dimensionality reduction on the convolution results of the previous layer. The present invention proposes a pooling and splicing structure that combines a convolutional neural network with an attention model, divided into an inter-layer attention pooling model and an attention splicing model for feature splicing toward the fully connected layer. It can quickly perform dimensionality reduction on the model and thereby reduce its risk of overfitting.

When performing a pooling operation, the inter-layer attention pooling model fully considers the characteristics of the current convolution kernel and pools effectively according to the data characteristics of the convolution kernel and the feature map. This pooling operation can substantially increase the pooling scale, so that the feature map is reduced in dimension quickly, and it can also prevent the overfitting that deep learning models are prone to on small data sets. At the same time, using the data characteristics of the convolution kernel and the feature map as the attention for the pooling operation keeps the features of the image from being lost during the rapid dimensionality reduction.

Traditional underwater target recognition solutions mostly start from the low-frequency information of the underwater target and determine the target's category by analyzing its DEMON and LOFAR spectra. The present invention uses an attention model to optimize the pooling layer of the convolutional neural network model and uses the convolutional neural network to analyze and recognize the audio information of underwater targets. A sound feature extraction method serves as the vectorization method for the model's input data, and the resulting vectorized audio data are converted into heat maps of the audio information that serve as the input data for model training. By analyzing the experimental results of the model in use and by modifying and optimizing the model, deep learning achieves a 10%-15% improvement in the underwater target recognition rate.

The present invention compares the classification results of the method that splices the MFCC vectors of the multichannel hydrophone inputs with those obtained using MFCC alone together with other classifiers. The method that fuses the multichannel hydrophones by MFCC vector splicing performs clearly better than the method that describes the sound using MFCC alone. Compared with the lowest recognition accuracy, the MFCC vector splicing method used in the present invention improves on the accuracy of the conventional method by nearly 16.1%.

Brief description of the drawings

Fig. 1 shows the application framework of the convolutional neural network based on the attention model;

Fig. 2 shows the inter-layer pooling operation based on the attention model;

Fig. 3 shows the fully connected layer operation based on the attention model;

Fig. 4 shows the structure of the convolutional neural network based on the attention model;

Fig. 5 shows a comparison of results for different convolution kernel sizes under the attention model;

Fig. 6 shows the data set preprocessing process;

Fig. 7 shows an example of the data set extraction results.

Specific embodiment

The present invention is described in more detail below by way of example.

The training data set is preprocessed first. Unlike previous approaches, which apply mel-frequency feature extraction directly to the noise data, the present invention first segments the noise data of the underwater target in units of 100 ms and extracts the MFCC of each segment, in order to convert the target noise data into fixed-length vectorized data. The fixed-length vectorized data are then spliced according to the arrangement positions of the hydrophones in the experiment and their temporal order, forming a hydrophone-array feature for a complete time period. The resulting hydrophone-array feature is then converted into a corresponding image and fed into the training network as the input data set.
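The final preprocessing step, turning the spliced hydrophone-array feature into a picture, might look like the following minimal sketch. The min-max scaling to 8-bit gray levels, the function name `feature_to_image`, and the 16 x 128 input size are assumptions made for illustration; the text does not specify how the image is produced.

```python
import numpy as np

def feature_to_image(feature: np.ndarray) -> np.ndarray:
    """Scale a 2-D feature matrix to an 8-bit grayscale image (assumed mapping)."""
    lo, hi = feature.min(), feature.max()
    if hi == lo:
        # A constant matrix carries no contrast; return a black image.
        return np.zeros(feature.shape, dtype=np.uint8)
    scaled = (feature - lo) / (hi - lo)          # values in [0, 1]
    return np.round(scaled * 255).astype(np.uint8)

# Hypothetical array feature: 16 hydrophones x 128 MFCC-derived values.
rng = np.random.default_rng(0)
array_feature = rng.normal(size=(16, 128))
image = feature_to_image(array_feature)
```

Any monotone mapping to pixel intensities would serve the same purpose; min-max scaling is simply the most direct choice.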

(1) Segmentation of noise data

The noise data obtained from the experimental environment is the underwater target noise measured by a multichannel hydrophone array whose elements are placed at different positions and angles. Taking a 16-channel hydrophone array as an example, each hydrophone measures for 6 min with a frequency band of 25.6 kHz and a sampling rate of 65536 Hz, and all acquired signals are voltage values. Because the data of a single hydrophone are not suited to direct processing by a convolutional neural network, and in view of the structural characteristics of convolutional networks, the present invention processes the noise data acquired by the hydrophone groups in a targeted way. Taking into account both the positions at which the hydrophones acquire data and the temporal order of the data themselves, the data are first cut and then integrated by position and temporal order. The cutting unit should fully account for the hydrophone characteristics. Hereafter the segmentation unit is set to 100 ms, so as to form the input data set for underwater target recognition with a convolutional structure.
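The 100 ms cutting step can be sketched as follows. The function name `segment_signal`, the truncation of 6553.6 samples to 6553 per segment, and the dropping of any trailing partial segment are assumptions; the text only fixes the 100 ms unit and the 65536 Hz sampling rate.

```python
def segment_signal(samples, sample_rate_hz, segment_ms=100):
    """Split a 1-D sample sequence into consecutive, equal-length segments."""
    # Whole samples per segment; 65536 Hz * 0.1 s = 6553.6, truncated to 6553.
    seg_len = int(sample_rate_hz * segment_ms / 1000)
    if seg_len <= 0:
        raise ValueError("segment length must be at least one sample")
    n_segments = len(samples) // seg_len
    # Samples left over after the last full segment are dropped (assumption).
    return [samples[i * seg_len:(i + 1) * seg_len] for i in range(n_segments)]

SAMPLE_RATE_HZ = 65536
one_second = [0.0] * SAMPLE_RATE_HZ      # stand-in for measured voltage values
segments = segment_signal(one_second, SAMPLE_RATE_HZ)
```

Because the sampling rate is not a multiple of 10, a real implementation would have to decide how to handle the fractional sample, for instance by overlapping segments slightly instead of truncating.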

(2) Extraction of data features (taking MFCC as an example)

The present invention can incorporate a variety of noise feature extraction methods, such as LPCC and PLP. MFCC is used as the example below; its extraction process is given by formula (4). When the method is applied, its parameters are set according to the acquisition characteristics of the underwater noise acquisition equipment used. In this example each noise segment is framed at 512 points, and 31 triangular filters are used. While extracting the first-order MFCC (16 filters), the invention takes into account that MFCC captures only the static characteristics of the noise and not its dynamic characteristics, so in addition to the first-order MFCC result it also extracts the first-order difference MFCC of the target noise (15 scale points). Using the first-order difference MFCC as the dynamic feature of the target noise lets the invention extract the features of the underwater target noise data at a finer granularity. Finally, all the features are combined linearly, yielding a 128 × 1 feature vector for each cut noise segment.

In the formula, f is the target noise frequency and mel(f) is the mel cepstral coefficient extracted from the target noise.
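The formula referred to here is not reproduced in this text. For orientation only, the mel scale that underlies MFCC extraction is commonly written as mel(f) = 2595 * log10(1 + f / 700); whether the patent's formula uses exactly these constants is an assumption. The sketch below implements that common mapping and uses it to place triangular filter centers; the function names and the 0 to 25.6 kHz span are hypothetical choices.

```python
import math

def hz_to_mel(f_hz: float) -> float:
    """Common mel-scale mapping (assumed form, not taken from the patent)."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)

def mel_to_hz(mel: float) -> float:
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)

def triangular_filter_centers(f_min_hz, f_max_hz, n_filters):
    """Center frequencies (Hz) of filters spaced evenly on the mel scale."""
    mel_min, mel_max = hz_to_mel(f_min_hz), hz_to_mel(f_max_hz)
    step = (mel_max - mel_min) / (n_filters + 1)
    return [mel_to_hz(mel_min + step * (i + 1)) for i in range(n_filters)]

# 31 filters as in the example above; the frequency span is an assumed choice.
centers = triangular_filter_centers(0.0, 25600.0, 31)
```

Even spacing on the mel scale places the filters densely at low frequencies and sparsely at high frequencies, which is the property MFCC relies on.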

(3) Multichannel data splicing

Data splicing is an important step in which the present invention draws on the actual experiment; in practice it is the process of combining hydrophone positions with the temporal order of the target noise. After the underwater target noise data has been segmented and processed by the chosen feature extraction method, the features are combined with the array positions of the specific hydrophones to form the corresponding vector matrices. Data from different time steps form different batches: for example, the data of all hydrophones for the first 100 ms form one batch (Batch), and the data of all hydrophones for the second 100 ms form another Batch. The MFCC obtained for each noise segment in the previous step is spliced according to the arrangement positions of the multichannel hydrophones and the temporal order of the noise segments, forming the input data set of the deep learning model.
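The splicing described above can be sketched as a simple stacking operation. The function name `splice_channels` and the array sizes (16 hydrophones, 5 segments, 128 features) are assumptions for illustration, and the hydrophones are assumed to be listed in their array order.

```python
import numpy as np

def splice_channels(per_channel_features):
    """Stack per-hydrophone feature sequences into per-time-step matrices.

    per_channel_features: list ordered by hydrophone position in the array;
    each entry has shape (n_segments, feature_dim).
    Returns an array of shape (n_segments, n_channels, feature_dim), so that
    index t holds the features of all hydrophones for the t-th 100 ms segment.
    """
    stacked = np.stack(per_channel_features, axis=0)    # (channels, time, dim)
    return np.transpose(stacked, (1, 0, 2))             # (time, channels, dim)

# Hypothetical sizes: 16 hydrophones, 5 segments, 128 features per segment.
rng = np.random.default_rng(0)
features = [rng.normal(size=(5, 128)) for _ in range(16)]
batches = splice_channels(features)
```

Each `batches[t]` is then one 16 x 128 matrix, corresponding to the batch formed by all hydrophones for one 100 ms interval.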

(4) Accelerating the pooling operation based on the attention model

The present invention uses an attention model to accelerate the dimensionality reduction performed by the pooling operation. The attention mechanism is a method that frees the encoder-decoder architecture from its fixed-length internal representation. The attention operation defined in the present invention refers to maintaining, in the current operation, the operating trend of the model in the preceding steps. The invention processes underwater target noise data with a convolutional model; in the convolutional processing method, attention is defined as the feature map obtained after the convolution operation in the corresponding convolutional layer, and the dimensionality-reduction direction of the feature map is determined by principal component analysis, providing effective guidance for the adjacent pooling operation. When pooling, the invention fully considers the effect of the convolution kernels of the previous layer and pools on the basis of the features extracted by those kernels.
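As a rough illustration of letting principal component analysis determine the reduction direction of a feature map, the sketch below projects an (H, W) feature map onto its top-k principal directions. This is generic PCA, not the patent's own pooling formula, which is not reproduced in this text; the function name `pca_pool` and all sizes are hypothetical.

```python
import numpy as np

def pca_pool(feature_map: np.ndarray, k: int) -> np.ndarray:
    """Reduce an (H, W) feature map to (H, k) along its top-k principal directions."""
    # Treat each row as one observation and center the columns.
    centered = feature_map - feature_map.mean(axis=0, keepdims=True)
    cov = centered.T @ centered / max(feature_map.shape[0] - 1, 1)   # (W, W)
    eigvals, eigvecs = np.linalg.eigh(cov)     # eigenvalues in ascending order
    top = eigvecs[:, ::-1][:, :k]              # k directions of largest variance
    return centered @ top                      # (H, k)

rng = np.random.default_rng(0)
fmap = rng.normal(size=(32, 8))
fmap[:, 0] *= 10.0          # make one column dominate the variance
pooled = pca_pool(fmap, k=2)
```

Unlike fixed max or average pooling, the reduction here depends on the data in the feature map itself, which is the general idea the paragraph above describes.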

(5) Attention-weighted connection

The present invention uses the convolution kernel features to weight the features extracted by the convolutional neural network, i.e. the results of the last layer. The fully connected layer acts as the "classifier" of the whole convolutional neural network. If operations such as convolutional layers, pooling layers, and activation function layers map the raw data into a hidden-layer feature space, the fully connected layer maps the learned "distributed feature representation" into the sample label space. Convolutional layers imitate the human visual pathway to extract features, and the fully connected layer is generally responsible for classification or regression; the fully connected layer, however, loses some feature position information. The core operation of a fully connected layer is a matrix-vector product, which in essence is a linear transformation from one feature space to another. Every dimension of the target space is considered to be influenced by every dimension of the source space. The present invention therefore uses the convolution kernel features to weight, on the basis of the attention model, the features extracted by the convolutional neural network, i.e. the results of the last layer, so that the characteristics of the convolution processing are taken into account while the position information of the features is preserved.

The main features and content of the invention are as follows:

(1) Accelerating the pooling operation based on the attention model

The encoder-decoder structure has achieved state-of-the-art results in many fields, but it encodes the input sequence as a fixed-length internal representation. This limits the length of the input sequence and degrades the model's performance on very long input sequences. An attention model frees the focus from a fixed preceding n-item sequence, so that attention can be directed to whichever n preceding items are of interest. The attention mechanism is a method that frees the encoder-decoder structure from its fixed-length internal representation. By retaining the model's intermediate outputs for each step of the input sequence, the model is trained to learn how to attend selectively to the inputs and to relate them to items in the output sequence. When people look at an image, they do not in fact take in every pixel of the whole image at once; mostly they focus their attention on specific parts of the image as needed. Humans also learn from previously observed images where to concentrate attention when observing later images. The present invention uses an attention model to accelerate the dimensionality reduction performed by the pooling operation.

(2) The fully connected layer (fully connected layers, FC) acts as the "classifier" of the whole convolutional neural network. If operations such as convolutional layers, pooling layers, and activation function layers map the raw data into a hidden-layer feature space, the fully connected layer maps the learned "distributed feature representation" into the sample label space. In practice, a fully connected layer can be implemented as a 1 × 1 convolution in order to compute results quickly. A 1 × 1 convolution kernel also aggregates information across channels, so it can further serve to reduce (or increase) dimensionality and thereby reduce the number of parameters. Convolutional layers imitate the human visual pathway to extract features, and the fully connected layer is generally responsible for classification or regression; the fully connected layer, however, loses some feature position information.

The core operation of a fully connected layer is a matrix-vector product, which in essence is a linear transformation from one feature space to another. Every dimension of the target space is considered to be influenced by every dimension of the source space. The present invention uses the convolution kernel features to weight, on the basis of the attention model, the features extracted by the convolutional neural network, i.e. the results of the last layer, so that the characteristics of the convolution processing are taken into account while the position information of the features is preserved.
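The relationship between a 1 × 1 convolution and the matrix-vector product of a fully connected layer can be checked numerically. The sketch below is illustrative only: the function name `conv1x1` and the tensor sizes are assumptions, and the code shows the general equivalence, not the network used in the invention.

```python
import numpy as np

def conv1x1(x: np.ndarray, weight: np.ndarray, bias: np.ndarray) -> np.ndarray:
    """1 x 1 convolution: x is (C_in, H, W), weight is (C_out, C_in), bias is (C_out,)."""
    # Contract over the input-channel axis at every spatial position.
    return np.einsum("oc,chw->ohw", weight, x) + bias[:, None, None]

rng = np.random.default_rng(0)
x = rng.normal(size=(8, 4, 4))     # 8 input channels on a 4 x 4 grid
w = rng.normal(size=(3, 8))        # reduce 8 channels to 3
b = rng.normal(size=3)
y = conv1x1(x, w, b)

# The same result at one position, written as a fully connected layer.
fc_at_pixel = w @ x[:, 1, 2] + b
```

The output keeps its 4 x 4 spatial layout while the channel dimension shrinks from 8 to 3, which is the cross-channel aggregation and dimensionality reduction mentioned above.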

For accelerating the pooling operation based on the attention model, the calculation method is as follows.

The model is denoted AoC_L, and its calculation method is given by formula (1).

In the formula, L_i denotes the i-th pooling result of layer L, eigVector(k_i) denotes the feature vector extracted by the k_i-th convolution kernel, and Area(k_i) denotes the region covered by the k_i-th convolution kernel.

IoC_i denotes the attention model influenced by the convolution kernels; its calculation method is given by formula (2).

In the formula, w_i is a weight matrix that records the proportion of each feature dimension in the whole feature map, the remaining term is the model's processing result before the fully connected layer, and m is the number of convolution kernels.

w_i is computed with a multilayer perceptron model, as shown in formula (3).

w_i = f(k_i, y_i) (3)

In the formula, f(·) denotes a feed-forward neural network. The feed-forward network takes the convolution kernel as input, with k_i denoting the corresponding convolution kernel and y_i denoting the label corresponding to this feature.
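A minimal sketch of how weights of the form w_i = f(k_i, y_i) could be produced and applied is given below. Everything beyond that functional form is an assumption: the one-hidden-layer network, the random untrained parameters, the one-hot label encoding, the scalar weight per kernel, and the softmax normalization are illustrative choices, and the patent's own weighting formula is not reproduced in this text.

```python
import numpy as np

def mlp_score(kernel, label_onehot, w1, b1, w2, b2):
    """One-hidden-layer feed-forward network f(k_i, y_i) returning a scalar score."""
    inp = np.concatenate([kernel.ravel(), label_onehot])
    hidden = np.tanh(w1 @ inp + b1)
    return float(w2 @ hidden + b2)

def attention_weighted(feature_maps, kernels, label_onehot, params):
    """Scale each kernel's feature map by a normalized attention weight."""
    scores = np.array([mlp_score(k, label_onehot, *params) for k in kernels])
    exp = np.exp(scores - scores.max())        # numerically stable softmax
    weights = exp / exp.sum()
    # Broadcasting keeps the (H, W) layout, so position information is retained.
    return weights[:, None, None] * feature_maps, weights

rng = np.random.default_rng(0)
m, n_classes, hidden = 4, 3, 5                 # hypothetical sizes
kernels = rng.normal(size=(m, 3, 3))
feature_maps = rng.normal(size=(m, 6, 6))
label = np.eye(n_classes)[1]
params = (rng.normal(size=(hidden, 9 + n_classes)), np.zeros(hidden),
          rng.normal(size=hidden), 0.0)
weighted, weights = attention_weighted(feature_maps, kernels, label, params)
```

Because the weighting is applied per feature map rather than after flattening, the spatial arrangement of each map survives until the classifier, in line with the goal of preserving feature position information.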

Claims

1. A convolutional neural network optimization method based on an attention model, characterized by:

(1) Segmentation of noise data

(2) Extraction of data features

Data feature extraction consists of extracting the MFCC of each noise segment, and the MFCC extraction process is given by the formula:

(3) Multichannel data splicing

After the underwater target noise data has been segmented and its features extracted, the features are combined with the array positions of the hydrophones to form the corresponding vector matrices, while data from different time steps form different batches; the MFCC obtained for each noise segment in the previous step is spliced according to the arrangement positions of the multichannel hydrophones and the temporal order of the noise segments, forming the input data set of the deep learning model;

(4) Accelerating the pooling operation based on the attention model

The underwater target noise data is processed with a convolutional model; in the convolutional processing method, attention is defined as the feature map obtained after the convolution operation in the corresponding convolutional layer, the dimensionality-reduction direction of the feature map is determined by principal component analysis, and pooling is performed on the basis of the features extracted by the convolution kernels;

(5) Attention-weighted connection

Using the convolution kernel features, the features extracted by the convolutional neural network, i.e. the results of the last layer, are weighted on the basis of the attention model.