CN107679462A - A wavelet-based deep multi-feature fusion classification method - Google Patents

A wavelet-based deep multi-feature fusion classification method

Info

Publication number
CN107679462A
CN107679462A (application CN201710823051.8A)
Authority
CN
China
Prior art keywords
multiple features
layer
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201710823051.8A
Other languages
Chinese (zh)
Other versions
CN107679462B (en)
Inventor
于刚 (Yu Gang)
李艇 (Li Ting)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen Graduate School Harbin Institute of Technology
Original Assignee
Shenzhen Graduate School Harbin Institute of Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shenzhen Graduate School Harbin Institute of Technology
Priority to CN201710823051.8A
Publication of CN107679462A
Application granted
Publication of CN107679462B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 20/00 Scenes; Scene-specific elements
    • G06V 20/40 Scenes; Scene-specific elements in video content
    • G06V 20/41 Higher-level, semantic clustering, classification or understanding of video scenes, e.g. detection, labelling or Markovian modelling of sport events or news items
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/25 Fusion techniques
    • G06F 18/253 Fusion techniques of extracted features
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/08 Learning methods
    • G06N 3/084 Backpropagation, e.g. using gradient descent


Abstract

The invention provides a wavelet-based deep multi-feature fusion classification method comprising an offline training stage and an online recognition stage. In the offline training stage, a convolutional neural network is built and trained on samples of n labelled classes; a discrete wavelet transform is applied at the final convolutional layer and the fully connected layer to decompose the deep multi-feature maps, and the resulting high- and low-frequency components are linearly fused to obtain the optimal weights. In the online recognition stage, the convolutional neural network, combined with a support vector machine, recognizes and classifies the actions in images and videos. The beneficial effect of the invention is improved classification and recognition accuracy for images and videos.

Description

A wavelet-based deep multi-feature fusion classification method
Technical field
The present invention relates to robot vision and image processing, and more particularly to a wavelet-based deep multi-feature fusion classification method.
Background technology
In recent years, deep learning has become one of the hottest topics in science and technology. It has gradually transformed algorithm design in fields such as speech recognition, image classification and text understanding, giving rise to a new end-to-end paradigm in which a model is trained directly from data and outputs the final result. With the arrival of the big-data era and the development of ever more powerful computing devices such as GPUs, deep learning has been further strengthened: it can fully exploit massive data and automatically distill raw data into abstract knowledge representations. Among deep learning architectures, the convolutional neural network is the most widely used.
As convolutional neural network architectures keep expanding, networks grow ever deeper and the feature maps extracted by each module multiply. Simply flattening the convolutional layers into a vector and then fully connecting it is not only computationally expensive but also blurs the features, which harms the classification and recognition accuracy for images and videos.
Summary of the invention
To solve these problems in the prior art, the invention provides a wavelet-based deep multi-feature fusion classification method that improves the classification and recognition accuracy for images and videos.
The invention provides a wavelet-based deep multi-feature fusion classification method comprising an offline training stage and an online recognition stage. In the offline training stage, a convolutional neural network is built and trained on samples of n labelled classes; a discrete wavelet transform is applied at the final convolutional layer and the fully connected layer to decompose the deep multi-feature maps, and the resulting high- and low-frequency components are linearly fused to obtain the optimal weights. In the online recognition stage, the convolutional neural network, combined with a support vector machine, recognizes and classifies the actions in images and videos.
As a further improvement of the present invention, the offline training stage comprises the following steps:
Step 1: build a convolutional neural network for training;
Step 2: set up 3 channels in the first layer: 1 grayscale channel and 2 optical-flow channels, where the grayscale channel contains the grayscale image group of the video clip and the optical-flow channels contain the motion information between two adjacent frames of the video clip;
Step 3: build the multi-module convolutional neural network;
Step 4: using the discrete wavelet transform, extract high- and low-frequency components from the feature maps of the fully connected layer of each module, and fuse the high- and low-frequency components across the three modules;
Step 5: concatenate the fused high- and low-frequency components through a merge layer and fully connect them to the next layer, obtaining one group of 128-dimensional feature maps;
Step 6: set n output nodes corresponding to the n behavior classes, each node fully connected to all feature maps of the previous layer;
Step 7: adjust the parameters between the layers by back-propagation so that the error between each sample's output and its label decreases; once the error meets the requirement, training is finished, and each output vector is then labelled with the behavior name of its corresponding sample video.
As a further improvement of the present invention, the online recognition stage comprises the following steps:
Step 8: input the video stream to be recognized, apply the preprocessing of step 1 to the video, load the weights of the optimal model obtained in offline training, pass the video stream through the network layers of steps 2 to 7, and extract the feature vector;
Step 9: classify the feature vector from step 8 with a support vector machine, find the best-matching label, and obtain the optimal accuracy.
As a further improvement of the present invention, the method comprises the following steps:
S1: obtain training sample images;
S2: preprocess the images;
S3: build the grayscale and optical-flow multichannel network channels;
S4: build the grayscale, optical-flow-x and optical-flow-y channel networks respectively;
S5: apply the discrete wavelet transform to the feature maps of the final fully connected layer of each channel;
S6: extract the high- and low-frequency components and fuse features across channels;
S7: concatenate the fused features through a merge layer;
S8: train and extract the optimal weights;
S9: feed the video into the trained optimal model for feature extraction;
S10: perform online recognition with a support vector machine.
As a further improvement of the present invention, in step S1 the training samples and sample labels are obtained from a data set; in step S2 the resolution of the video streams in the training set is unified using Lanczos interpolation, in which the eight neighboring points along the x and y directions are interpolated, i.e. a weighted sum is computed; the window function of the Lanczos method is:
L(x) = sinc(x) sinc(x/a), if -a < x < a; 0 otherwise
Its two-dimensional form is then: L(x, y) = L(x) L(y).
As a further improvement of the present invention, in step S3 a grayscale channel is established by converting the video stream to grayscale, the grayscale images retaining the most basic information of the original images; optical-flow channels in the x and y directions are established by extracting inter-frame motion information from the video stream; the inter-frame optical flow is extracted with an improved Lucas-Kanade (L-K) method in which convolution kernels replace pyramid down-sampling; first the partial derivatives fx, fy, ft are obtained from f(x, y, t), with Prewitt filters as the convolution kernels, i.e.:
Ix = I*Dx, Iy = I*Dy, It = I*Dt
The velocity is then estimated by least squares:
E(u1, u2) = Σ_{x,y} g(x, y) [u1 fx(x, y, t) + u2 fy(x, y, t) + ft(x, y, t)]^2
As a further improvement of the present invention, in step S4 each channel is down-sampled so that the picture size becomes 150*100; five convolutional layers and three pooling layers are built, followed by one fully connected layer; the first convolutional layer uses 5*5*5 kernels and the later convolutional layers use 3*3*3 kernels, with the stride set to 1; the pooling layers use 3D max-pooling with kernels of 2*2*2 and 2*2*1, and the activation function is ReLU.
As a further improvement of the present invention, in step S5 the high- and low-frequency components are extracted from the feature maps of the final fully connected layer of each channel with the discrete wavelet transform; the continuous wavelet function ψ_{a,b}(t) can be written as the discrete wavelet function:
ψ_{m,n}(t) = a0^{-m/2} ψ(a0^{-m} t - n b0)
from which the discrete wavelet transform is obtained:
Wf_{m,n} = a0^{-m/2} ∫_{-∞}^{+∞} f(t) ψ*(a0^{-m} t - n b0) dt = ⟨f(t), ψ_{m,n}(t)⟩, m, n ∈ Z
As a further improvement of the present invention, in step S6 the 512-dimensional feature maps of the fully connected layers of the grayscale, optical-flow-x and optical-flow-y channels are decomposed into 3 pairs of 128-dimensional feature maps containing the high- and low-frequency components, and the 128-dimensional feature maps of the channels are then combined by vector products, giving two groups of 128-dimensional feature maps; in step S7 a merge layer is set up with its mode set to concat, the fused high-frequency and low-frequency components are concatenated, and n output nodes corresponding to the n behavior classes are fully connected to all feature maps of the previous layer.
As a further improvement of the present invention, in step S8 the training sample set is fed into the network for training, a callback keeps the model with the minimum loss value, and the optimal weights are saved; in step S10 the 128-dimensional feature maps of the input video stream are extracted by the convolutional neural network, a linear kernel function is selected, and a support vector machine is built to perform classification and recognition.
The beneficial effects of the invention are as follows: with the above scheme, the classical convolutional neural network training process is improved by adding a discrete wavelet transform that decomposes the deep features during training and extracts multi-resolution features; the corresponding multi-resolution features within each deep feature are then fused, which enhances both low-level and high-level information, reduces the computational complexity of the network, strengthens the robustness of network training, and improves the classification and recognition accuracy for images and videos.
Brief description of the drawings
Fig. 1 is a flowchart of the wavelet-based deep multi-feature fusion classification method of the present invention.
Fig. 2 shows the single-channel network.
Fig. 3 shows the overall structure of the wavelet-improved convolutional neural network.
Embodiments
The invention will be further described below with reference to the accompanying drawings and embodiments.
A wavelet-based deep multi-feature fusion classification method proceeds in two stages: an offline training stage and an online recognition stage. A convolutional neural network is built and trained on samples of n labelled classes; a discrete wavelet transform is applied at the final convolutional layer and the fully connected layer to decompose the deep multi-feature maps, and the resulting high- and low-frequency components are linearly fused to obtain the optimal weights; the neural network, combined with a support vector machine, then recognizes and classifies the actions in images and videos.
(1) Offline training stage
Step 1: build a convolutional neural network for training; taking action recognition as an example, use the activity-recognition data set HMDB51 as the training set, preprocess the video clips, and unify the video resolution;
Step 2: set up 3 channels in the first layer: 1 grayscale channel and 2 optical-flow channels, where the grayscale channel contains the grayscale image group of the video clip and the optical-flow channels contain the motion information between two adjacent frames of the video clip;
Step 3: build the multi-module convolutional neural network;
Step 4: using the discrete wavelet transform, extract high- and low-frequency components from the feature maps of the fully connected layer of each module, and fuse the high- and low-frequency components across the three modules;
Step 5: concatenate the fused high- and low-frequency components through a merge layer and fully connect them to the next layer, obtaining one group of 128-dimensional feature maps;
Step 6: set n output nodes corresponding to the n behavior classes (labels), each node fully connected to all feature maps of the previous layer;
Step 7: adjust the parameters between the layers by back-propagation so that the error between each sample's output and its label decreases; once the error meets the requirement, training is finished, and each output vector is then labelled with the behavior name of its corresponding sample video;
(2) Online recognition
Step 8: input the video stream to be recognized, apply the preprocessing of step 1 to the video, load the weights of the optimal model obtained in offline training, pass the video stream through the network layers of steps 2 to 7, and extract the feature vector;
Step 9: classify the feature vector from step 8 with a support vector machine, find the best-matching label, and obtain the optimal accuracy.
In the wavelet-based deep multi-feature fusion classification method provided by the invention, the classical convolutional neural network training process is improved: a discrete wavelet transform is added to decompose the deep features during training and extract multi-resolution features, and the corresponding multi-resolution features within each deep feature are then fused. This enhances both low-level and high-level information, reduces the computational complexity of the network, and strengthens the robustness of network training.
As shown in Fig. 1, the wavelet-based deep multi-feature fusion classification method specifically includes the following steps:
S1: Obtain training sample images:
The training samples and sample labels are obtained from the HMDB51 data set.
S2: Image preprocessing:
The resolution of the video streams in the training set is unified. Unifying the resolution blurs the image edges and always loses some information, so Lanczos interpolation is used here: during interpolation, the eight neighboring points along the x and y directions are interpolated, i.e. a weighted sum is computed, giving an 8*8 kernel support. Although Lanczos interpolation is computationally heavier than other interpolation methods, it runs on the GPU, so the impact on overall performance is small, while its results are noticeably better. Its window function is:
L(x) = sinc(x) sinc(x/a), if -a < x < a; 0 otherwise
Its two-dimensional form is then: L(x, y) = L(x) L(y).
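As a rough illustration of the interpolation just described, the Lanczos window with a = 4 (matching the eight-point neighborhood per axis) can be sketched in Python; the function names and the choice a = 4 are our assumptions, not part of the patent:

```python
import math

def lanczos_window(x, a=4):
    """Lanczos window: L(x) = sinc(x) * sinc(x/a) for -a < x < a, else 0.

    a = 4 gives the 8-point support per axis (an 8x8 neighborhood in 2-D);
    sinc here is the normalized sinc, sin(pi*x) / (pi*x).
    """
    if x == 0.0:
        return 1.0
    if -a < x < a:
        return (math.sin(math.pi * x) / (math.pi * x)) * \
               (math.sin(math.pi * x / a) / (math.pi * x / a))
    return 0.0

def lanczos_2d(x, y, a=4):
    """Separable two-dimensional form: L(x, y) = L(x) * L(y)."""
    return lanczos_window(x, a) * lanczos_window(y, a)
```

Resampling a pixel would then weight its 8x8 neighborhood by `lanczos_2d` evaluated at the offsets from the target position and normalize the weighted sum.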
S3: Build the grayscale and optical-flow multichannel network channels:
A grayscale channel is established by converting the video stream to grayscale; since grayscale images retain the most basic information of the original images, the grayscale channel is indispensable. Optical-flow channels in the x and y directions are established by extracting inter-frame motion information from the video stream. Optical flow is the instantaneous velocity of the pixels of a moving object on the observed imaging plane; it uses the temporal changes of pixels in an image sequence and the correlation between adjacent frames to find the correspondence between the previous frame and the current frame, and thereby computes the motion of objects between adjacent frames. For action recognition, the optical-flow channels are likewise indispensable. Here the inter-frame optical flow is extracted with an improved Lucas-Kanade (L-K) method; replacing pyramid down-sampling with convolution kernels reduces the computation while giving better results. First the partial derivatives fx, fy, ft are obtained from f(x, y, t), with Prewitt filters as the convolution kernels, i.e.:
Ix = I*Dx, Iy = I*Dy, It = I*Dt
The velocity is then estimated by least squares:
E(u1, u2) = Σ_{x,y} g(x, y) [u1 fx(x, y, t) + u2 fy(x, y, t) + ft(x, y, t)]^2
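A minimal single-scale sketch of this least-squares estimate in Python (uniform weights g(x, y) = 1, a Prewitt-style derivative kernel, and a synthetic test pattern; all names and simplifications are our assumptions rather than the patent's implementation):

```python
import numpy as np

def conv2_valid(img, kernel):
    """2-D 'valid' cross-correlation with a small kernel (naive loop sketch)."""
    kh, kw = kernel.shape
    h, w = img.shape
    out = np.empty((h - kh + 1, w - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(img[i:i + kh, j:j + kw] * kernel)
    return out

def lk_velocity(f1, f2):
    """Estimate one global velocity (u1, u2) between frames f1 and f2 by
    minimizing sum [u1*fx + u2*fy + ft]^2 (i.e. g(x, y) = 1 everywhere)."""
    Dx = np.array([[-1, 0, 1]] * 3, dtype=float) / 6.0   # Prewitt-style d/dx
    Dy = Dx.T                                            # Prewitt-style d/dy
    fx = conv2_valid(f1, Dx)
    fy = conv2_valid(f1, Dy)
    ft = (f2 - f1)[1:-1, 1:-1]        # temporal difference, cropped to match
    A = np.stack([fx.ravel(), fy.ravel()], axis=1)
    u, *_ = np.linalg.lstsq(A, -ft.ravel(), rcond=None)
    return u
```

On a synthetic ramp pattern shifted by exactly one pixel in x, this recovers the velocity (1, 0).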
S4: Build the grayscale, optical-flow-x and optical-flow-y channel networks respectively:
Fig. 2 shows the single-channel network structure. Each channel is down-sampled so that the picture size becomes 150*100; five convolutional layers and three pooling layers are built, followed by one fully connected layer. The first convolutional layer uses 5*5*5 kernels and the later convolutional layers use 3*3*3 kernels, with the stride set to 1. The pooling layers use 3D max-pooling with kernels of 2*2*2 and 2*2*1, which keeps the dimensionality from shrinking too fast in the later stages. The activation function is ReLU, which models more accurately how a brain neuron is activated by incoming signals; compared with the sigmoid function it offers one-sided suppression, a relatively broad excitation boundary, and sparse activation.
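The paragraph above fixes the kernel sizes but not the exact interleaving of the five convolutional and three pooling layers, so the following Python helper only traces how 'valid' 3-D convolutions and non-overlapping poolings of the stated sizes shrink a hypothetical input; the clip length of 32 frames and the chosen layer order are our assumptions:

```python
def conv3d_shape(shape, kernel, stride=1):
    """Output shape of a 'valid' 3-D convolution over (height, width, frames)."""
    return tuple((s - k) // stride + 1 for s, k in zip(shape, kernel))

def maxpool3d_shape(shape, kernel):
    """Output shape of non-overlapping 3-D max-pooling."""
    return tuple(s // k for s, k in zip(shape, kernel))

def trace_single_channel(frames=32):
    """Trace one plausible ordering of the 5 conv and 3 pooling layers."""
    s = (150, 100, frames)             # down-sampled pictures of 150*100
    s = conv3d_shape(s, (5, 5, 5))     # first conv layer, 5*5*5 kernels
    s = maxpool3d_shape(s, (2, 2, 2))  # 3D max-pooling, 2*2*2
    s = conv3d_shape(s, (3, 3, 3))     # later conv layers, 3*3*3 kernels
    s = maxpool3d_shape(s, (2, 2, 1))  # 2*2*1 pooling: spatial axes only
    s = conv3d_shape(s, (3, 3, 3))
    s = conv3d_shape(s, (3, 3, 3))
    s = maxpool3d_shape(s, (2, 2, 1))
    s = conv3d_shape(s, (3, 3, 3))
    return s                           # fed into the fully connected layer
```

The 2*2*1 pools illustrate the remark about keeping the dimensionality from shrinking too fast: they halve only the spatial axes and leave the temporal axis intact.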
S5: Apply the discrete wavelet transform to the feature maps of the final fully connected layer of each channel:
The high- and low-frequency components are extracted from the feature maps of the final fully connected layer of each channel with the discrete wavelet transform; the continuous wavelet function ψ_{a,b}(t) can be written as the discrete wavelet function:
ψ_{m,n}(t) = a0^{-m/2} ψ(a0^{-m} t - n b0)
from which the discrete wavelet transform is obtained:
Wf_{m,n} = a0^{-m/2} ∫_{-∞}^{+∞} f(t) ψ*(a0^{-m} t - n b0) dt = ⟨f(t), ψ_{m,n}(t)⟩, m, n ∈ Z
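For intuition, a one-level discrete wavelet decomposition of a 1-D feature vector can be sketched with the Haar wavelet; the patent does not name its mother wavelet, so Haar and the two-level 512-to-128 split below are our assumptions:

```python
import numpy as np

def haar_dwt(v):
    """One-level Haar DWT: pairwise sums give the low-frequency band,
    pairwise differences the high-frequency band (each half the length)."""
    v = np.asarray(v, dtype=float)
    low = (v[0::2] + v[1::2]) / np.sqrt(2.0)
    high = (v[0::2] - v[1::2]) / np.sqrt(2.0)
    return low, high

def dwt_split(v, levels=2):
    """Repeatedly decompose the low band; two levels take 512 dims to 128."""
    high = np.zeros(0)
    for _ in range(levels):
        v, high = haar_dwt(v)
    return v, high     # low- and high-frequency components at the last level
```

On a constant vector the high band is identically zero, confirming that it captures only the variation (the detail) of the features.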
S6: Extract the high- and low-frequency components and fuse features across channels:
The Dwt operation in Fig. 3 decomposes the 512-dimensional feature maps of the fully connected layers of the grayscale, optical-flow-x and optical-flow-y channels into 3 pairs of 128-dimensional feature maps containing the high- and low-frequency components; the 128-dimensional feature maps of the channels are then combined by vector products, giving two groups of 128-dimensional feature maps.
S7: Concatenate the fused features through a merge layer:
A merge layer is set up with its mode set to concat; the fused high-frequency and low-frequency components are concatenated, and n output nodes corresponding to the n behavior classes (labels) are fully connected to all feature maps of the previous layer.
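One plausible reading of steps S6 and S7 in Python: fuse the low bands of the three channels by element-wise product, likewise the high bands, then concatenate the two fused groups. The element-wise interpretation of "vector product" and all names are our assumptions:

```python
import numpy as np

def fuse_and_merge(channel_bands):
    """channel_bands: one (low, high) pair of 128-dim vectors per channel.

    Returns the concatenation of the fused low-frequency and fused
    high-frequency components (the 'concat' merge mode)."""
    low_fused = np.prod(np.stack([low for low, _ in channel_bands]), axis=0)
    high_fused = np.prod(np.stack([high for _, high in channel_bands]), axis=0)
    return np.concatenate([low_fused, high_fused])
```

The merged vector would then be fully connected to the n output nodes.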
S8: Train and extract the optimal weights:
The training sample set is fed into the network for training; a callback keeps the model with the minimum loss value, and its optimal weights are saved.
S9: Feed the video into the trained optimal model for feature extraction.
S10: Perform online recognition with a support vector machine:
The 128-dimensional feature maps of the input video stream are extracted by the convolutional neural network; a linear kernel function is selected, and a support vector machine is built to perform classification and recognition.
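To keep the sketch free of external libraries, the linear-kernel SVM of S10 can be illustrated with a Pegasos-style sub-gradient trainer for the binary case; the training scheme, names, and toy data are our assumptions, not the patent's classifier:

```python
import numpy as np

def train_linear_svm(X, y, lam=0.01, epochs=500, seed=0):
    """Pegasos-style sub-gradient descent for a linear SVM; labels in {-1, +1}.

    A constant feature is appended so the bias is learned with the weights."""
    Xb = np.hstack([X, np.ones((len(X), 1))])
    rng = np.random.default_rng(seed)
    w = np.zeros(Xb.shape[1])
    t = 0
    for _ in range(epochs):
        for i in rng.permutation(len(Xb)):
            t += 1
            eta = 1.0 / (lam * t)              # decaying step size
            if y[i] * (Xb[i] @ w) < 1.0:       # hinge loss is active
                w = (1.0 - eta * lam) * w + eta * y[i] * Xb[i]
            else:
                w = (1.0 - eta * lam) * w
    return w

def svm_predict(w, X):
    Xb = np.hstack([X, np.ones((len(X), 1))])
    return np.sign(Xb @ w)
```

A multi-class version would train one such classifier per behavior label (one-vs-rest) on the 128-dimensional feature vectors and pick the best-matching label.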
Compared with a convolutional neural network model without wavelet-based deep feature fusion, the method of the invention achieves better results, and tests on common data sets also reach higher accuracy. Moreover, the invention is not limited to the action recognition of the specific embodiment; it can be widely applied to the classification and recognition of images and videos.
In the wavelet-based deep multi-feature fusion classification method provided by the invention, the discrete wavelet transform extracts the low- and high-frequency components from the feature maps, and the high- and low-frequency components are fused separately, enhancing both low-level and high-level information and thereby improving the accuracy and robustness of network recognition.
The wavelet-based deep multi-feature fusion classification method provided by the invention is suitable for the technical field of robot vision and image processing, and is particularly suitable for deep learning, feature extraction and computer vision.
The above further describes the present invention with reference to specific preferred embodiments, but the specific implementation of the present invention shall not be regarded as limited to these descriptions. Those of ordinary skill in the art may make simple deductions or substitutions without departing from the concept of the present invention, and all such variations shall be regarded as falling within the protection scope of the present invention.

Claims (10)

  1. A wavelet-based deep multi-feature fusion classification method, characterized by comprising an offline training stage and an online recognition stage, wherein in the offline training stage a convolutional neural network is built and trained on samples of n labelled classes, a discrete wavelet transform is applied at the final convolutional layer and the fully connected layer to decompose the deep multi-feature maps, and the resulting high- and low-frequency components are linearly fused to obtain the optimal weights; and in the online recognition stage the convolutional neural network, combined with a support vector machine, recognizes and classifies the actions in images and videos.
  2. The wavelet-based deep multi-feature fusion classification method according to claim 1, characterized in that the offline training stage comprises the following steps:
    Step 1: build a convolutional neural network for training;
    Step 2: set up 3 channels in the first layer: 1 grayscale channel and 2 optical-flow channels, where the grayscale channel contains the grayscale image group of the video clip and the optical-flow channels contain the motion information between two adjacent frames of the video clip;
    Step 3: build the multi-module convolutional neural network;
    Step 4: using the discrete wavelet transform, extract high- and low-frequency components from the feature maps of the fully connected layer of each module, and fuse the high- and low-frequency components across the three modules;
    Step 5: concatenate the fused high- and low-frequency components through a merge layer and fully connect them to the next layer, obtaining one group of 128-dimensional feature maps;
    Step 6: set n output nodes corresponding to the n behavior classes, each node fully connected to all feature maps of the previous layer;
    Step 7: adjust the parameters between the layers by back-propagation so that the error between each sample's output and its label decreases; once the error meets the requirement, training is finished, and each output vector is then labelled with the behavior name of its corresponding sample video.
  3. The wavelet-based deep multi-feature fusion classification method according to claim 2, characterized in that the online recognition stage comprises the following steps:
    Step 8: input the video stream to be recognized, apply the preprocessing of step 1 to the video, load the weights of the optimal model obtained in offline training, pass the video stream through the network layers of steps 2 to 7, and extract the feature vector;
    Step 9: classify the feature vector from step 8 with a support vector machine, find the best-matching label, and obtain the optimal accuracy.
  4. The wavelet-based deep multi-feature fusion classification method according to claim 1, characterized by comprising the following steps:
    S1: obtain training sample images;
    S2: preprocess the images;
    S3: build the grayscale and optical-flow multichannel network channels;
    S4: build the grayscale, optical-flow-x and optical-flow-y channel networks respectively;
    S5: apply the discrete wavelet transform to the feature maps of the final fully connected layer of each channel;
    S6: extract the high- and low-frequency components and fuse features across channels;
    S7: concatenate the fused features through a merge layer;
    S8: train and extract the optimal weights;
    S9: feed the video into the trained optimal model for feature extraction;
    S10: perform online recognition with a support vector machine.
  5. The wavelet-based deep multi-feature fusion classification method according to claim 4, characterized in that in step S1 the training samples and sample labels are obtained from a data set; and in step S2 the resolution of the video streams in the training set is unified using Lanczos interpolation, in which the eight neighboring points along the x and y directions are interpolated, i.e. a weighted sum is computed; the window function of the Lanczos method is:
    L(x) = sinc(x) sinc(x/a), if -a < x < a; 0 otherwise
    Its two-dimensional form is then: L(x, y) = L(x) L(y).
  6. The wavelet-based deep multi-feature fusion classification method according to claim 5, characterized in that in step S3 a grayscale channel is established by converting the video stream to grayscale, the grayscale images retaining the most basic information of the original images; optical-flow channels in the x and y directions are established by extracting inter-frame motion information from the video stream; the inter-frame optical flow is extracted with an improved Lucas-Kanade (L-K) method in which convolution kernels replace pyramid down-sampling; first the partial derivatives fx, fy, ft are obtained from f(x, y, t), with Prewitt filters as the convolution kernels, i.e.:
    Ix=I*Dx, Iy=I*Dy, It=I*Dt
    The velocity is then estimated by least squares:
    E(u1, u2) = Σ_{x,y} g(x, y) [u1 fx(x, y, t) + u2 fy(x, y, t) + ft(x, y, t)]^2.
  7. The wavelet-based deep multi-feature fusion classification method according to claim 6, characterized in that in step S4 each channel is down-sampled so that the picture size becomes 150*100; five convolutional layers and three pooling layers are built, followed by one fully connected layer; the first convolutional layer uses 5*5*5 kernels and the later convolutional layers use 3*3*3 kernels, with the stride set to 1; the pooling layers use 3D max-pooling with kernels of 2*2*2 and 2*2*1, and the activation function is ReLU.
  8. The wavelet-based deep multi-feature fusion classification method according to claim 7, characterized in that in step S5 the high- and low-frequency components are extracted from the feature maps of the final fully connected layer of each channel with the discrete wavelet transform; the continuous wavelet function ψ_{a,b}(t) can be written as the discrete wavelet function:
    $$\psi_{m,n}(t)=a_0^{-m/2}\,\psi\!\left(a_0^{-m}t-b_0 n\right)$$
    from which the discrete wavelet transform is obtained:
    $$Wf_{m,n}=a_0^{-m/2}\int_{-\infty}^{+\infty}f(t)\,\psi^{*}\!\left(a_0^{-m}t-nb_0\right)dt=\left\langle f(t),\psi_{m,n}(t)\right\rangle,\quad m,n\in\mathbb{Z}.$$
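With a0 = 2 and b0 = 1 the family above becomes the dyadic wavelet basis, of which the Haar wavelet is the simplest instance. A sketch of one decomposition level splitting a feature vector into a low-frequency (approximation) and a high-frequency (detail) half; the 512-dimensional input echoes the feature size in claim 9, everything else is illustrative:

```python
import numpy as np

def haar_dwt(x):
    """One level of the discrete Haar wavelet transform.

    For an even-length 1-D signal, neighbouring sample pairs are
    summed (low band) and differenced (high band), scaled by 1/sqrt(2)
    so the transform is orthonormal and preserves energy.
    """
    x = np.asarray(x, dtype=float)
    even, odd = x[0::2], x[1::2]
    low = (even + odd) / np.sqrt(2.0)   # approximation coefficients
    high = (even - odd) / np.sqrt(2.0)  # detail coefficients
    return low, high

# A 512-d feature vector splits into two 256-d bands; splitting again
# would reach the 128-d maps referred to in claim 9.
feat = np.arange(512, dtype=float)
low, high = haar_dwt(feat)
```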
  9. The depth multi-feature fusion classification method based on wavelets according to claim 8, wherein in step S6 the 512-dimensional feature maps of the fully connected layers of the grayscale channel and of the optical-flow x and y channels are each decomposed into three pairs of 128-dimensional feature maps carrying high- and low-frequency weights; the 128-dimensional feature maps of the channels are then combined by vector product, yielding two groups of 128-dimensional feature maps; and in step S7 a merge layer is set up with its mode set to concat, the fused high-frequency and low-frequency components are concatenated, and n output nodes, corresponding to n behavior classes, are fully connected to all feature maps of the preceding layer.
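Steps S6 and S7 can be sketched as below. The claim's "vector product" is read here as an element-wise (Hadamard) product, which keeps the 128-dimensional shape; that reading, and all array contents, are assumptions, and the merge-with-concat layer is emulated with np.concatenate:

```python
import numpy as np

def fuse_channels(gray, flow_x, flow_y):
    """Element-wise product of the per-channel 128-d feature maps,
    yielding one fused 128-d vector (one reading of "vector product")."""
    return gray * flow_x * flow_y

rng = np.random.default_rng(1)
# One 128-d high-frequency and one 128-d low-frequency map per channel
# (grayscale, optical-flow x, optical-flow y).
high = [rng.normal(size=128) for _ in range(3)]
low = [rng.normal(size=128) for _ in range(3)]
# merge(mode=concat): join the fused high- and low-frequency groups.
merged = np.concatenate([fuse_channels(*high), fuse_channels(*low)])
```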
  10. The depth multi-feature fusion classification method based on wavelets according to claim 9, wherein in step S8 the training sample set is fed into the network for training, the model with the minimum loss is retained by a callback, and the optimal weights are saved; and in step S10 the input video stream is passed through the convolutional neural network to extract 128-dimensional feature maps, a linear kernel function is selected, and a support vector machine is built to perform classification and recognition.
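Step S10's linear-kernel support vector machine can be sketched with a minimal hinge-loss subgradient solver; this stands in for whichever SVM implementation the patent relies on, and the toy 2-D data substitutes for the 128-dimensional CNN features:

```python
import numpy as np

def train_linear_svm(X, y, lam=0.01, lr=0.1, epochs=200):
    """Linear SVM trained by subgradient descent on the regularised
    hinge loss  lam/2 * ||w||^2 + mean(max(0, 1 - y*(Xw + b))).
    Labels y must be +1/-1."""
    w, b = np.zeros(X.shape[1]), 0.0
    for _ in range(epochs):
        margins = y * (X @ w + b)
        mask = margins < 1.0  # samples violating the margin
        if mask.any():
            w -= lr * (lam * w - (y[mask, None] * X[mask]).mean(axis=0))
            b -= lr * (-y[mask].mean())
        else:
            w -= lr * lam * w  # only the regulariser is active
    return w, b

def predict(w, b, X):
    return np.sign(X @ w + b)

# Linearly separable toy features with labels +1/-1.
X = np.array([[2.0, 2.0], [3.0, 1.5], [-2.0, -1.0], [-3.0, -2.5]])
y = np.array([1.0, 1.0, -1.0, -1.0])
w, b = train_linear_svm(X, y)
```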
CN201710823051.8A 2017-09-13 2017-09-13 Depth multi-feature fusion classification method based on wavelets Active CN107679462B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710823051.8A CN107679462B (en) 2017-09-13 2017-09-13 Depth multi-feature fusion classification method based on wavelets


Publications (2)

Publication Number Publication Date
CN107679462A true CN107679462A (en) 2018-02-09
CN107679462B CN107679462B (en) 2021-10-19

Family

ID=61136412

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710823051.8A Active CN107679462B (en) 2017-09-13 2017-09-13 Depth multi-feature fusion classification method based on wavelets

Country Status (1)

Country Link
CN (1) CN107679462B (en)


Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104281853A (en) * 2014-09-02 2015-01-14 电子科技大学 Behavior identification method based on 3D convolution neural network
CN104866831A (en) * 2015-05-29 2015-08-26 福建省智慧物联网研究院有限责任公司 Feature weighted face identification algorithm
CN106228137A (en) * 2016-07-26 2016-12-14 广州市维安科技股份有限公司 ATM abnormal face detection method based on key-point localization
CN106251375A (en) * 2016-08-03 2016-12-21 广东技术师范学院 Stacked deep-learning autoencoder method for universal steganalysis
CN106529467A (en) * 2016-11-07 2017-03-22 南京邮电大学 Group behavior identification method based on multi-feature fusion


Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
SHUIWANG JI et al.: "3D Convolutional Neural Networks for Human Action Recognition", IEEE Transactions on Pattern Analysis and Machine Intelligence *
杨丽召: "Research on Behavior Recognition Algorithms Based on Multi-Feature Fusion", China Master's Theses Full-text Database, Information Science and Technology *

Cited By (22)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108564326A (en) * 2018-04-19 2018-09-21 安吉汽车物流股份有限公司 Order prediction method and device, computer-readable medium, and logistics system
CN108564326B (en) * 2018-04-19 2021-12-21 安吉汽车物流股份有限公司 Order prediction method and device, computer readable medium and logistics system
CN108830296B (en) * 2018-05-18 2021-08-10 河海大学 Improved high-resolution remote sensing image classification method based on deep learning
CN108830296A (en) * 2018-05-18 2018-11-16 河海大学 Improved high-resolution remote sensing image classification method based on deep learning
CN108830308A (en) * 2018-05-31 2018-11-16 西安电子科技大学 Modulation identification method fusing traditional signal features with deep features
CN108957173A (en) * 2018-06-08 2018-12-07 山东超越数控电子股份有限公司 Detection method for avionics system state
CN109117711A (en) * 2018-06-26 2019-01-01 西安交通大学 Attention detection device and method based on hierarchical feature extraction and fusion of eye-movement data
CN109214440A (en) * 2018-08-23 2019-01-15 华北电力大学(保定) Multi-feature data classification and recognition method based on a clustering algorithm
CN109620244A (en) * 2018-12-07 2019-04-16 吉林大学 Infant abnormal behavior detection method based on conditional generative adversarial networks and SVM
CN109741348A (en) * 2019-01-07 2019-05-10 哈尔滨理工大学 Diabetic retina image segmentation method
CN110236518A (en) * 2019-04-02 2019-09-17 武汉大学 Neural-network-based method and device for classifying combined ECG and seismocardiogram signals
CN112288345A (en) * 2019-07-25 2021-01-29 顺丰科技有限公司 Method and device for detecting loading and unloading port state, server and storage medium
CN110633735B (en) * 2019-08-23 2021-07-30 深圳大学 Progressive depth convolution network image identification method and device based on wavelet transformation
CN110633735A (en) * 2019-08-23 2019-12-31 深圳大学 Progressive depth convolution network image identification method and device based on wavelet transformation
CN110852195A (en) * 2019-10-24 2020-02-28 杭州趣维科技有限公司 Video slice-based video type classification method
CN113658230A (en) * 2020-05-12 2021-11-16 武汉Tcl集团工业研究院有限公司 Optical flow estimation method, terminal and storage medium
CN113658230B (en) * 2020-05-12 2024-05-28 武汉Tcl集团工业研究院有限公司 Optical flow estimation method, terminal and storage medium
CN112330650A (en) * 2020-11-12 2021-02-05 李庆春 Retrieval video quality evaluation method
CN112330650B (en) * 2020-11-12 2024-06-28 李庆春 Retrieval video quality evaluation method
CN112418168A (en) * 2020-12-10 2021-02-26 深圳云天励飞技术股份有限公司 Vehicle identification method, device, system, electronic equipment and storage medium
CN112418168B (en) * 2020-12-10 2024-04-02 深圳云天励飞技术股份有限公司 Vehicle identification method, device, system, electronic equipment and storage medium
CN113408815A (en) * 2021-07-02 2021-09-17 湘潭大学 Deep learning-based traction load ultra-short-term prediction method

Also Published As

Publication number Publication date
CN107679462B (en) 2021-10-19

Similar Documents

Publication Publication Date Title
CN107679462A (en) 2018-02-09 Depth multi-feature fusion classification method based on wavelets
CN110210551B (en) Visual target tracking method based on adaptive subject sensitivity
Iglovikov et al. Ternausnet: U-net with vgg11 encoder pre-trained on imagenet for image segmentation
CN110163299B (en) Visual question-answering method based on bottom-up attention mechanism and memory network
CN104850845B (en) A kind of traffic sign recognition method based on asymmetric convolutional neural networks
CN105701508B Global-local optimization model and saliency detection algorithm based on multi-level convolutional neural networks
CN105956560B Vehicle model recognition method based on pooled multi-scale deep convolution features
CN109543502A Semantic segmentation method based on deep multi-scale neural networks
CN104217214B RGB-D person behavior recognition method based on configurable convolutional neural networks
CN110188817A (en) A kind of real-time high-performance street view image semantic segmentation method based on deep learning
CN113469094A (en) Multi-mode remote sensing data depth fusion-based earth surface coverage classification method
CN106920243A Sequence image segmentation method for ceramic material parts using improved fully convolutional neural networks
CN109543667A (en) A kind of text recognition method based on attention mechanism
CN107609638A Method for optimizing convolutional neural networks based on linear decoders and interpolation sampling
CN107229904A (en) A kind of object detection and recognition method based on deep learning
CN104281853A (en) Behavior identification method based on 3D convolution neural network
CN111681178B (en) Knowledge distillation-based image defogging method
CN110046671A Text classification method based on capsule networks
CN110263833A Image semantic segmentation method based on encoder-decoder structure
CN112926396A (en) Action identification method based on double-current convolution attention
CN107945153A (en) A kind of road surface crack detection method based on deep learning
CN110378208B (en) Behavior identification method based on deep residual error network
CN105701507A (en) Image classification method based on dynamic random pooling convolution neural network
CN106682569A (en) Fast traffic signboard recognition method based on convolution neural network
CN112488025B (en) Double-temporal remote sensing image semantic change detection method based on multi-modal feature fusion

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant