CN111524098A

CN111524098A - Neural network output layer cutting and template frame size determining method based on self-organizing clustering

Info

Publication number: CN111524098A
Application number: CN202010265447.7A
Authority: CN
Inventors: 郝梦茜; 张辉; 周斌; 杨柏胜; 倪少波; 靳松直; 丛龙剑; 刘严羊硕; 郑文娟; 韦海萍; 田爱国; 邵俊伟; 李建伟; 张孝赫; 张连杰; 张艺明
Original assignee: Beijing Aerospace Automatic Control Research Institute
Current assignee: Beijing Aerospace Automatic Control Research Institute
Priority date: 2020-04-07
Filing date: 2020-04-07
Publication date: 2020-08-11
Anticipated expiration: 2040-04-07
Also published as: CN111524098B

Abstract

The invention relates to a neural network output layer cutting and template frame size determining method based on self-organizing clustering, belongs to the technical field of target detection and identification of convolutional neural networks, and particularly provides a network output layer cutting and template frame size determining method aiming at an SSD algorithm. The self-organizing clustering is used for obtaining a better clustering result under the condition that the size distribution of the target is uncertain, the clustering result is used for calculating the upper limit area of the target, the number of layers of an output layer is determined, the output layers with overlarge receptive field and overlarge number of layers are deleted, the network depth and the number of parameters are reduced, the difficulty of model training is reduced, the convergence of the model is accelerated, the generalization capability of the model is improved, the time consumed by calculation is reduced, and the calculation efficiency is improved.

Description

Neural network output layer cutting and template frame size determining method based on self-organizing clustering

Technical Field

The invention relates to a neural network output layer cutting and template frame size determining method based on self-organizing clustering, belongs to the technical field of target detection and identification of convolutional neural networks, and particularly provides a network output layer cutting and template frame size determining method aiming at an SSD algorithm.

Background

In recent years, the convolutional neural network shows the performance far beyond that of the traditional image analysis method in the field of image target detection and identification, and has good use effect in the fields of civil use, national defense, industry and the like. At present, in academic circles, the main research direction of the convolutional neural network is mainly visible light image large target scenes, and in the problems, the size of a target is large, the characteristics are rich, training samples are rich, and a deeper network is required to provide better nonlinear characteristics for target detection and identification.

However, in some special application scenarios such as remote sensing and military, SAR and infrared images are mainly used, the imaging resolution is low, the target types are limited, the target pixel size is generally small, the number of training samples is limited, the use of a deeper network often causes the difficulty in convergence of the training process, the training result is easy to be over-fitted, the model generalization performance is poor, and the practical effect is poor.

In order to solve the problem, part of schemes reduce the difficulty of model training by reducing the network depth and reducing the number of parameters to be trained of the network, but the method for reducing the network depth basically depends on artificial experience adjustment, and the adjustment effect is difficult to guarantee.

Disclosure of Invention

The technical problem to be solved by the invention is as follows: the method is used for clustering the target sizes in training sample data by using a self-organizing clustering algorithm to obtain the number of clustering centers and the clustering centers, and aims at solving the problems that under the scenes of limited target types, generally small target pixel sizes and limited training sample number, the training convergence is difficult and the model generalization capability is poor due to an excessively deep SSD network; according to the corresponding criterion, the number of the SSD network output layers and the size of a template box (also called default box) are determined, unnecessary output layers are removed, the network depth is reduced, the network complexity is reduced, and the model training convergence difficulty is reduced. Aiming at the problems, the scheme provides a self-organizing clustering-based output layer cutting and template frame size determining method for an SSD algorithm, and by analyzing training sample data, sample size distribution is extracted, so that a proper network output layer number and a reasonable template frame size are determined, the original model output layer is cut and the template frame size is reasonably determined, the network depth is reduced, the network complexity is reduced, the model training convergence difficulty is reduced, and the calculation time is shortened.

The solution of the invention is:

a neural network output layer clipping and template frame size determining method based on self-organizing clustering comprises the following steps:

(1) each target on the training data is represented by a two-dimensional feature vector (w, h), wherein w is the width of a target pixel, h is the height of the target pixel, the number of the two-dimensional feature vectors (w, h) is represented by N, the two-dimensional feature vectors (w, h) are referred to as samples, and the samples are represented by x.

(2) Setting the initial clustering center number of the N samples obtained in the step (1) to be K and the minimum sample number in clustering to be theta_NAnd the standard deviation of the sample distance distribution in the clustering is theta_SThe minimum distance between the two cluster centers is theta_CThe maximum iterative operation number is I_max。

(3) Randomly selecting K samples from N samples as initial clustering centers, and enabling N samples to be selected as initial clustering centers_C＝K，N_CRepresenting the number of current cluster centers, each cluster center being represented by Z_jDenotes, j ═ 1,2, …, N_CS for the category corresponding to each cluster center_jDenotes, j ═ 1,2, …, N_CClass S_jThe number of samples in (1) is N_jDenotes, j ═ 1,2, …, N_CThe number of iterative operations is represented by I, and I is 1.

(4) Traversing all the samples x, calculating the samples x and each clustering center Z_jA distance D between_jSample x is classified into the category corresponding to the cluster center with the smallest distance to sample x.

(5) If a certain class S_jNumber of middle samples N_j<θ_NThen cancel the category and make the current cluster center number N_CReducing by 1, and classifying the samples in the category into other categories according to the minimum distance criterion in (4); otherwise to class S_jNo treatment is done.

(6) For each category S_jThe average value of the samples x is the corrected clustering center Z_j，j＝1,2,…,N_C。

(7) For each category S_jCalculate eachAverage distance of samples in a class to cluster center

j＝1,2,…,N_C。

(8) Calculating the total average distance between all the class samples and the corresponding cluster center

(9) Judging the class S_jSplit, merge and iterative operations.

1) If the iterative operation times I is more than or equal to I_maxI.e. the last iteration, by_CAnd (4) jumping to the step (13) when the value is 0.

2) If theta is greater than theta_NK/2, namely the number of the cluster centers is equal to or less than half of the specified value, the step (10) is entered, and the existing clusters are split.

3) If the number of iterations I is even, or N_CIf the K is more than or equal to 2K, the splitting treatment is not carried out, and the step (13) is skipped; if I is odd, and N_C<And 2K, entering the step (10) to perform splitting treatment.

(10) Calculate each class S_jFrom medium samples x to cluster center Z_jIs a standard deviation vector of_j，j＝1,2,…,N_C。

(11) For the standard deviation vector sigma calculated in (10)_jExtracting the maximum component by sigma_jmaxDenotes, j ═ 1,2, …, N_C。

(12) Set of maximum components σ_jmax}，j＝1,2,…,N_CIn, if there is σ_jmax>θ_SAnd either of the following two conditions is satisfied:

(1)

and N is_j>(θ_N+1)*2；

(2)N_C≤K/2；

Then Z will be_jSplitting into two new cluster centers, and counting the number N of the cluster centers_CAnd adding 1. After the splitting operation is finished, jumping back to the step (4), and adding 1 to the iterative operation times I; otherwise, the clustering center Z is not aligned_jThe operation proceeds to step (13).

(13) Calculating N_CDistance D between every two clustering centers_ij＝||Z_i-Z_j||，i＝1,2,…,N_C-1，j＝i+1,2,…,N_C。

(14) If the distance D between two nearest cluster centers_ij<θ_CCombining the two clustering centers into a new clustering center, combining the corresponding categories of the two clustering centers into a category, and counting the number N of the clustering centers_CSubtracting 1; otherwise, no processing is performed.

(15) If the iterative operation times I is more than or equal to I_maxAnd (5) ending the clustering operation, entering the step (16), and otherwise, returning to the step (4) and adding 1 to the iterative operation times I.

(16) To N_CIndividual clustering center Z_j＝(w_j,h_j) Calculating the upper limit area

To N_CAn

Sorting from small to large to obtain the maximum upper limit area

(17) According to

Judging the number L of output layers_out：

If it is

The number of output layers is L_out＝6；

If it is

Then outputNumber of layers L_out＝5；

If it is

The number of output layers is L_out＝4；

If it is

The number of output layers is L_out＝3；

If it is

The number of output layers is L_out＝2。

(18) There are 6 output layers in the SSD algorithm, conv4_3, fc7, conv8_2, conv9_2, conv10_2, conv11_ 2.

If L is_outOnly conv4_3, fc7 layers are reserved when the number is 2, and convolutional layers after fc7 are deleted;

if L is_outOnly conv4_3, fc7, conv8_2 are reserved when the value is 3, and the convolutional layer after conv8_2 is deleted;

if L is_outOnly conv4_3, fc7, conv8_2 and conv9_2 are reserved for 4, and the convolution layer behind conv9_2 is deleted;

if L is_outOnly conv4_3, fc7, conv8_2, conv9_2 and conv10_2 are reserved when the convolutional layer after conv10_2 is deleted when the convolutional layer is 5;

if L is_outNo pruning is done for the SSD network 6.

(19) Determining the output layer where the corresponding template frame is located for each clustering center:

to N_CIndividual clustering center Z_j＝(w_j,h_j) Calculating the area A_j＝w_j×h_j，

If A_j>(300/3)²Designing a corresponding template box at the conv11_2 layer;

if (300/5)²<A_j≤(300/3)²Designing a corresponding template box at the conv10_2 layer;

if (300/10)²<A_j≤(300/5)²Designing a corresponding template box at the conv9_2 layer;

if (300/19)²<A_j≤(300/10)²Designing a corresponding template box at the conv8_2 layer;

if (300/38)²<A_j≤(300/19)²Then the corresponding template box is designed at fc7 level;

if A_j≤(300/38)²The corresponding template box is designed at the conv4_3 layer.

(20) Determining the corresponding template frame size of each cluster center:

N_Cindividual clustering center Z_j＝(w_j,h_j) The corresponding template frame sizes are respectively:

max_size＝max(w_j,h_j)

where floor () is rounded down.

(21) If some output layer does not design corresponding template frames after the corresponding template frames are designed for all the cluster centers, designing according to the following criteria:

if a certain output layer does not design the corresponding template frame, the min _ size, max _ size, aspect _ ratio parameter of the layer closest to it is used. If two output layers are at the same distance from the Layer, the shallow Layer is Layer_BDeep Layer is Layer_TThen the output layer parameter

Layer adopted by aspect _ ratio_BAspect _ ratio of_B and Layer_TAspect _ ratio of_TThe union of (a).

(22) A template frame ratio of aspect _ ratio 1 is added to all output layers.

(23) And training the convolutional neural network which finishes the cutting of the output layer and the determination of the size of the template frame to obtain a neural network model with fewer layers, lower complexity and higher computational efficiency.

In the above scheme, in step (1), the specific method for extracting the target width w and height h in the labeling information comprises: reading the values < xmin >, < ymin >, < xmax >, < ymax > in each < bndbox > node in xml, and calculating the width w ═ xmax-xmin +1 and the height h ═ ymax-ymin +1 of the target.

In the scheme, in the step (3), the specific method for randomly selecting K samples is that K random numbers α are generated according to the uniform distribution of U (0,1) between 0 and 1₁,α₂,…,α_KTake the ceil (a)_iN) samples as the ith initial cluster center, with ceil () rounded up.

In step (4), the sample x and the clustering center Z_jThe distance between the two sensors is calculated by the following method: d_j＝||x-Z_j||。

In the scheme, in the step (4), the classification method of the sample x is as follows: if it is

Then the sample x is attributed to S_jAnd (4) class.

In the above scheme, in step (5), the specific method for canceling a certain category is: the cluster center is cancelled, so that the number of the cluster centers is N_CSubtracting 1, releasing the samples originally belonging to the category, calculating the distance between the released samples and other cluster centers, and classifying the released samples into which category the distance between the released samples and which cluster center is closest.

In the step (6), the clustering centers Z of all the categories are corrected_jThe specific method comprises the following steps:

the above scheme calculates each class S in step (7)_jAverage distance of the sample to the cluster center in (1)

The specific method comprises the following steps:

in step (8), the total average distance between all the category samples and the corresponding cluster center is calculated

The specific method comprises the following steps:

the above scheme calculates each class S in step (10)_jWhere each sample x ═ x_w,x_h) To the clustering center Z_j＝(w_j,h_j) Is a standard deviation vector of_j＝(σ_w,j,σ_h,j) The specific method comprises the following steps:

in step (11), the above scheme extracts sigma in each standard deviation vector_j＝(σ_w,j,σ_h,j) Maximum component σ of_jmaxThe specific method comprises the following steps:

σ_jmax＝max(σ_w,j,σ_h,j)

in step (12), Z is_jThe specific method for splitting into two new clustering centers is as follows: clustering S_jFrom medium samples x to cluster center Z_jHas a standard deviation of_j＝(σ_w,j,σ_h,j) If σ is_w,j≥σ_h,jLet γ be (σ)_w,j0); if σ_w,j<σ_h,jLet γ equal to (0, σ)_h,j)。Z_jIs split offThe two new cluster centers of (a) are: z_j+kγ and Z_j-k γ, wherein 0<k<1。

In the above scheme, in step (14), the specific method for merging the clustering centers is as follows: if the distance D between two nearest cluster centers_ij<θ_CCombining the corresponding categories of the two clustering centers into one category, canceling the clustering center status of the two clustering centers, and recalculating the clustering centers of the samples released from the two categories

And make the number of clustering centers N_CMinus 1.

The above scheme obtains the maximum area in step (16)

The specific method comprises the following steps:

wherein ,

drawings

FIG. 1 is a schematic flow diagram of the process of the present invention;

FIG. 2 is a schematic diagram of the dimensions of a template frame.

Detailed Description

The invention is further illustrated by the following figures and examples.

Examples

(7) For each category S_jCalculating the average distance of the samples in each category to the cluster center

j＝1,2,…,N_C。

(9) Judging the class S_jSplit, merge and iterative operations.

(1)

and N is_j>(θ_N+1)*2；

(2) N_C≤K/2；

(14) If the distance D between two nearest cluster centers_ij<θ_CThen cluster the twoCombining the centers into a new cluster center, combining the corresponding categories of the two cluster centers into a category, and counting the number N of the cluster centers_CSubtracting 1; otherwise, no processing is performed.

To N_CAn

Sorting from small to large to obtain the maximum upper limit area

(17) According to

Judging the number L of output layers_out：

If it is

The number of output layers is L_out＝6；

If it is

The number of output layers is L_out＝5；

If it is

The number of output layers is L_out＝4；

If it is

The number of output layers is L_out＝3；

If it is

The number of output layers is L_out＝2。

if L is_outNo pruning is done for the SSD network 6.

If A_j>(300/3)²Designing a corresponding template box at the conv11_2 layer;

(20) Determining the corresponding template frame size of each cluster center:

max_size＝max(w_j,h_j)

where floor () is rounded down.

(22) A template frame ratio of aspect _ ratio 1 is added to all output layers.

In the scheme, in the step (3), K samples are randomly selectedThe method comprises generating K random numbers α with uniform distribution between 0 and 1 according to U (0,1)₁,α₂,…,α_KTake the ceil (a)_iN) samples as the ith initial cluster center, with ceil () rounded up.

Then the sample x is attributed to S_jAnd (4) class.

The specific method comprises the following steps:

The specific method comprises the following steps:

σ_jmax＝max(σ_w,j,σ_h,j)

in step (12), Z is_jThe specific method for splitting into two new clustering centers is as follows: clustering S_jFrom medium samples x to cluster center Z_jHas a standard deviation of_j＝(σ_w,j,σ_h,j) If σ is_w,j≥σ_h,jLet γ be (σ)_w,j0); if σ_w,j<σ_h,jLet γ equal to (0, σ)_h,j)。Z_jThe two new cluster centers split are respectively: z_j+kγ and Z_j-k γ, wherein 0<k<1。

And make the number of clustering centers N_CMinus 1.

The above scheme obtains the maximum area in step (16)

The specific method comprises the following steps:

wherein ,

fig. 1 shows a specific implementation process of the SSD network output layer clipping and template frame size determining method based on self-organizing clustering according to the present invention.

In fig. 1, "extracting width and height of a target in training data as a feature vector sample" corresponds to step (1):

all the training data are 1000 pictures in total, all the targets in all the pictures are traversed, the values < xmin >, < ymin >, < xmax >, < ymax > in the target marking information < bndbnbdbox > node are read, the width w of the target is xmax-xmin +1, the height h is ymax-ymin +1, the (w, h) is used as a two-dimensional feature vector sample x to be recorded, the subsequent operation is carried out, and the number N of the feature vectors is recorded as 1858 in the embodiment.

"parameter initialization" in fig. 1 corresponds to step (2):

in this embodiment, the initial cluster center number K is set to 6, and the minimum number of samples θ in the cluster_N80, standard deviation θ of sample distance distribution in cluster_S5, minimum distance θ between two cluster centers_CMaximum number of iterations I, 5_max＝100。

The step (3) of "randomly selecting initial clustering centers" in fig. 1 corresponds to:

generating K random numbers α in a uniform distribution between 0 and 1₁,α₂,…,α_KTake the ceil (a)_iN) samples as the ith initial clustering center, wherein ceil () is rounded up, let N_CK, the number of iterations I is 1.

"samples are sorted by minimum distance criterion" in fig. 1 corresponding to step (4):

traversing all the samples x, calculating the sample x and the clustering center Z_jA distance D between_j＝||x-Z_jAnd | l, classifying the sample x into a category corresponding to the cluster center with the minimum distance from the sample x.

The "cancel category with too small number of samples" in fig. 1 corresponds to step (5):

if a certain class S_jNumber of middle samples N_j<θ_NThen cancel the category and make the current cluster center number N_CReducing 1, releasing samples originally belonging to the category, calculating the distance between the released samples and other cluster centers, and classifying the released samples into which category when the distance between the released samples and which cluster center is closest; otherwise to class S_jNo treatment is done.

"correcting cluster center" in fig. 1 corresponds to step (6):

for each category S_jAll samples in (1) are averaged to obtain an average value which is the corrected clustering center

j＝1,2,…,N_c。

In fig. 1, "calculating the average distance from the samples in each class to the cluster center" corresponds to step (7):

calculate each class S_jSample to cluster center Z in_jAverage distance of

"calculate the total average distance of all class samples from their respective cluster centers" in fig. 1 corresponds to step (8):

calculating the total average distance between all the class samples and the corresponding cluster centers

In fig. 1, "splitting, merging, and iterative operations for judgment category" corresponds to step (9):

judging the splitting, merging and iterative operation of the category, judging whether the current state needs to be split, if so, jumping to the step (10), and if not, jumping to the step (13), wherein the specific judgment method comprises the following steps:

In fig. 1, "calculating the standard deviation of each class sample to the cluster center" corresponds to step (10):

calculate each class S_jWhere each sample x ═ x_w,x_h) To the clustering center Z_jIs a standard deviation vector of_j＝(σ_w,j,σ_h,j) The specific method comprises the following steps:

the "obtaining the largest component in standard deviation" described in fig. 1 corresponds to step (11):

extracting sigma in each standard deviation vector_j＝(σ_w,j,σ_h,j) Maximum component σ of_jmax＝max(σ_w,j,σ_h,j)。

The "class splitting for classes satisfying the splitting condition" described in fig. 1 corresponds to step (12):

if each class S_jHas a in_jmax>θ_SAnd either of the following two conditions is satisfied:

(1)

and N is_j>(θ_N+1)*2；

(2)N_C≤K/2；

Then Z will be_jSplit into two new cluster centers, class S_jFrom medium samples x to cluster center Z_jHas a standard deviation of_j＝(σ_w,j,σ_h,j) If σ is_w,j≥σ_h,jLet γ be (σ)_w,j0); if σ_w,j<σ_h,jLet γ equal to (0, σ)_h,j)。Z_jTwo new clustering centers which are split are respectively Z_j+kγ and Z_j-k γ, wherein 0<k<1, in this embodiment, k is equal to 0.5, and let the number of clustering centers N_CAdding 1; otherwise, the clustering center Z is not aligned_jThe operation proceeds to step (13).

And (4) after the splitting operation is finished, adding 1 to the iterative operation times I, and returning to the step (4).

The step (13) of calculating the distance between every two clustering centers described in fig. 1 corresponds to:

calculating N_CDistance D between every two clustering centers_ij＝||Z_i-Z_j||，i＝1,2,…,N_C-1，j＝i+1,2,…,N_C。

The "merge classes satisfying the merge condition" described in fig. 1 corresponds to step (14):

if the distance D between two nearest cluster centers_ij<θ_CCombining the corresponding categories of the two clustering centers into a category, combining the two clustering centers into a new clustering center, canceling the clustering center positions of the two clustering centers, and recalculating the clustering centers of the two released samples

And make the number of clustering centers N_CMinus 1.

The "judge whether the iteration is finished" in fig. 1 corresponds to step (15):

if the iterative operation times I is more than or equal to I_maxAnd (5) ending the clustering operation, entering the step (16), and otherwise, returning to the step (4) and adding 1 to the iterative operation times I.

The "calculating the maximum upper limit area of the cluster center" described in fig. 1 corresponds to step (16):

to N_CIndividual cluster center vector (w)_j,h_j) Calculating the upper limit area

To N_CAn

Sorting from small to large to obtain the maximum upper limit area

In this embodiment, 7 cluster centers are obtained at the end of clustering, which are respectively (10.9,28.7), (27.2,12.1), (9.8,4.5), (6.8,11.4), (13.9,21.7), (19.7,15.4), (11.9,10.2), and the upper limit area of each cluster center is calculated

So that the maximum upper limit area is

The "judgment of the number of output layers" in fig. 1 corresponds to step (17):

according to

Judging the number L of output layers_out

If it is

The number of output layers is L_out＝6；

If it is

The number of output layers is L_out＝5；

If it is

The number of output layers is L_out＝4；

If it is

The number of output layers is L_out＝3；

If it is

The number of output layers is L_out＝2。

In this example

So that the number of output layers L_out＝3。

The "pruning SSD network" described in fig. 1 corresponds to step (18):

there are 6 output layers in the SSD algorithm, conv4_3, fc7, conv8_2, conv9_2, conv10_2, conv11_ 2.

if L is_outOnly conv4_3, fc7, conv8_2, conv9_2, conv10_2 are reserved when the conv10_2 is deleted when the conv is 5The convolutional layer of (1);

if L is_outNo pruning is done for the SSD network 6.

In this example L_outTherefore, only conv4_3, fc7, conv8_2 output layers are reserved, and convolutional layers after conv8_2 are deleted.

The step (19) corresponding to "determining the output layer where the corresponding template box is located for each cluster center" described in fig. 1:

for each clustering center Z_j＝(w_j,h_j) Calculating the area A_j＝w_j×h_j，

If A_j>(300/3)²Designing a corresponding template box at the conv11_2 layer;

The cluster centers in this example are (10.9,28.7), (27.2,12.1), (9.8,4.5), (6.8,11.4), (13.9,21.7), (19.7,15.4), (11.9, 10.2); the area of each cluster center is A₁＝312.83，A₂＝329.12，A₃＝44.10，A₄＝77.52，A₅＝301.63，A₆＝303.38，A₇121.38, the corresponding template frames are located in conv8_2 layer, conv8_2 layer, conv4_3 layer, fc7 layer, conv8_2 layer, conv8_2 layer and fc7 layer.

The "determine their respective template box size for each cluster center" correspondence step (20) described in fig. 1:

N_Cindividual clustering center Z_j＝(w_j,h_j) The corresponding template frame dimensions are:

max_size＝max(w_j,h_j)

where floor () is rounded down.

In this example, the cluster centers are (10.9,28.7), (27.2,12.1), (9.8,4.5), (6.8,11.4), (13.9,21.7), (19.7,15.4), (11.9,10.2), respectively. The sizes of the template frames corresponding to the clustering centers are calculated as follows:

clustering center 1: min _ size 17.7; max _ size 28.7; aspect _ ratio is 2

Cluster center 2: min _ size 18.1; max _ size 27.2; aspect _ ratio is 2

Clustering center 3: min _ size ═ 6.6; max _ size 9.8; aspect _ ratio is 2

Cluster center 4: min _ size ═ 8.8; max _ size 11.4; aspect _ ratio is 1

Cluster center 5: min _ size 17.3; max _ size ═ 21.7; aspect _ ratio is 1

Cluster center 6: min _ size 17.4; max _ size 19.7; aspect _ ratio is 1

The clustering center 7: min _ size ═ 11.0; max _ size 11.9; aspect _ ratio is 1

The "outputting layer for which the template box has not been designed as described in fig. 1 relates to the template box" corresponding step (21):

In the present practical example, the conv4_3 layer, fc7 layer and conv8_2 layer output layer all have responsive template frames, so the present step operation is not performed.

conv4_3 layer: min _ size ═ 6.6; max _ size 9.8; aspect _ ratio is 2

fc7 layer: min _ size ═ 8.8; max _ size 11.4; aspect _ ratio is 1

min_size＝11.0；max_size＝11.9；aspect_ratio＝1

conv8_2 layer: min _ size 17.7; max _ size 28.7; aspect _ ratio is 2

min_size＝18.1；max_size＝27.2；aspect_ratio＝2

min_size＝17.3；max_size＝21.7；aspect_ratio＝1

min_size＝17.4；max_size＝19.7；aspect_ratio＝1

The "template frame ratio with aspect _ ratio of 1 added to all output layers" described in fig. 1 corresponds to step (22):

in the present embodiment, the conv4_3 layer has no template frame proportion of "1" and thus "1" is added; the fc7 layer and conv8_2 layer both have a template frame ratio of 1 and therefore do not need to be added.

The network structure determined at the end of the scheme reserves conv4_3, fc7 and conv8_2 output layers for the SSD network, deletes the convolution layer after conv8_2, and the template frame design of each output layer is as follows:

conv4_3 layer:

min_size＝6.6

max_size＝9.8

aspect_ratio＝1,2

fc7 layer:

min_size＝8.8,11.0

max_size＝11.4,11.9

aspect_ratio＝1

conv8_2 layer:

min_size＝17.7,18.1,17.3,17.4

max_size＝28.7,27.2,21.7,19.7

aspect_ratio＝1,2

a schematic of the dimensions of the template frame is shown in fig. 2.

Before the SSD network is modified, the network converges to the MAP of 0.9 which needs to be iterated 35000 times, and after the SSD network is modified, the network converges to the MAP of 0.9 which only needs 23000 times, which shows that the scheme can remove unnecessary output layers, reduce the network depth, reduce the network complexity and reduce the model training convergence difficulty. Before the SSD network is modified, the time consumed by network calculation is 29ms, and after the SSD network is modified, the time consumed by network calculation is 20ms, so that the calculation efficiency is improved.

The invention uses the self-organizing clustering algorithm to perform clustering analysis on the sizes of the training samples, determines the number of SSD network output layers according to clustering results, and cuts the network.

The self-organizing clustering is used for obtaining a better clustering result under the condition that the size distribution of the target is uncertain, the clustering result is used for calculating the upper limit area of the target, the number of layers of an output layer is determined, the output layers with overlarge receptive field and overlarge number of layers are deleted, the network depth and the number of parameters are reduced, the difficulty of model training is reduced, the convergence of the model is accelerated, the generalization capability of the model is improved, the time consumed by calculation is reduced, and the calculation efficiency is improved.

The invention uses the self-organizing clustering algorithm to perform clustering analysis on the sizes of the training samples, and determines the size of the template frame according to the clustering result.

The size of the template frame is designed by using the self-organizing clustering result, so that the size of the template frame is closer to the real size of the target, the regression difficulty of the network on the position deviation of the target is reduced, and the accuracy of target detection is improved.

Claims

1. A neural network output layer cutting and template frame size determining method based on self-organizing clustering is characterized in that the method comprises the following steps:

(3) Randomly selecting K samples from N samples as initial clustering centers, and enabling N samples to be selected as initial clustering centers_C＝K，N_CRepresenting the number of current cluster centers, each cluster center being represented by Z_jDenotes, j ═ 1,2, …, N_CS for the category corresponding to each cluster center_jDenotes, j ═ 1,2, …, N_CClass S_jThe number of samples in (1) is N_jDenotes, j ═ 1,2, …, N_CThe iterative operation times are represented by I;

(4) traversing all the samples x, calculating the samples x and each clustering center Z_jA distance D between_jClassifying the sample x into a category corresponding to the clustering center with the minimum distance to the sample x;

(5) if a certain class S_jNumber of middle samples N_j<θ_NThen cancel the category and make the current cluster center number N_CReducing by 1, and classifying the samples in the category into other categories according to the minimum distance criterion in (4); otherwise not to class S_jProcessing;

(6) for each category S_jThe average value of the samples x is the corrected clustering center Z_j，j＝1,2,…,N_C；

j＝1,2,…,N_C；

(9) Judging the class S_jIs divided intoSplitting, merging and iterative operation;

1) if the iterative operation times I is more than or equal to I_maxI.e. the last iteration, by_CJumping to the step (13) when the value is 0;

2) if theta is greater than theta_NK/2 or less, namely the number of the clustering centers is equal to or less than half of the specified value, entering the step (10) and splitting the existing clusters;

3) if the number of iterations I is even, or N_CIf the K is more than or equal to 2K, the splitting treatment is not carried out, and the step (13) is skipped; if I is odd, and N_C<2K, entering the step (10) to perform splitting treatment;

(10) calculate each class S_jFrom medium samples x to cluster center Z_jIs a standard deviation vector of_j，j＝1,2,…,N_C；

(11) For the standard deviation vector sigma calculated in (10)_jExtracting the maximum component by sigma_jmaxDenotes, j ═ 1,2, …, N_C；

(12) Set of maximum components σ_jmaxWhere j is 1,2, …, N_CIf there is σ_jmax>θ_SAnd either of the following two conditions is satisfied:

(a)

and N is_j>(θ_N+1)*2；

(b)N_C≤K/2；

Then Z will be_jSplitting into two new cluster centers, and counting the number N of the cluster centers_CAdding 1, jumping back to the step (4) after the splitting operation is completed, and adding 1 to the iterative operation frequency I; otherwise, the clustering center Z is not aligned_jPerforming operation, and entering the step (13);

(13) calculating N_CDistance D between every two clustering centers_ij＝||Z_i-Z_j||，i＝1,2,…,N_C-1，j＝i+1,2,…,N_C；

(14) If the distance D between two nearest cluster centers_ij<θ_CThen the two cluster centers are merged intoA new cluster center, the corresponding categories of the two cluster centers are merged into a category, and the number N of the cluster centers is set_CSubtracting 1; otherwise, no processing is carried out;

(15) if the iterative operation times I is more than or equal to I_maxAfter the clustering operation is finished, entering the step (16), otherwise returning to the step (4) and adding 1 to the iterative operation times I;

To N_CAn

Sorting from small to large to obtain the maximum upper limit area

(17) According to

Judging the number L of output layers_out：

If it is

The number of output layers is L_out＝6；

If it is

The number of output layers is L_out＝5；

If it is

The number of output layers is L_out＝4；

If it is

The number of output layers isL_out＝3；

If it is

The number of output layers is L_out＝2。

if L is_outNo pruning is performed on the SSD network if 6;

If A_j>(300/3)²Designing a corresponding template box at the conv11_2 layer;

(20) Determining the corresponding template frame size of each cluster center:

max_size＝max(w_j,h_j)

wherein floor () is rounded down;

if a certain output Layer is not designed with a corresponding template frame, the min _ size, max _ size and aspect _ ratio parameters of the Layer closest to the output Layer are adopted, if two output layers are the same distance away from the Layer, the shallow Layer is Layer_BDeep Layer is Layer_TThen the output layer parameter

Layer adopted by aspect _ ratio_BAspect _ ratio of_B and Layer_TAspect _ ratio of_TA union of (1);

(22) adding a template frame proportion of aspect _ ratio equal to 1 to all output layers;

(23) and training the convolutional neural network which finishes the cutting of the output layer and the determination of the size of the template frame to obtain a neural network model.

2. The method for neural network output layer clipping and template box size determination based on self-organizing clustering as claimed in claim 1, wherein: in the step (1), the specific method for extracting the width w and the height h of the target in the labeling information comprises the following steps: reading the values < xmin >, < ymin >, < xmax >, < ymax > in each < bndbox > node in xml, and calculating the width w ═ xmax-xmin +1 and the height h ═ ymax-ymin +1 of the target.

3. The method for neural network output layer clipping and template box size determination based on self-organizing clustering as claimed in claim 1, wherein in the step (3), K samples are randomly selected by generating K random numbers α according to the uniform distribution of U (0,1) between 0 and 1₁,α₂,…,α_KTake the ceil (a)_iN) samples as the ith initial cluster center, with ceil () rounded up.

4. The method for neural network output layer clipping and template box size determination based on self-organizing clustering as claimed in claim 1, wherein: in the step (4), the sample x and the clustering center Z_jThe distance between the two sensors is calculated by the following method: d_j＝||x-Z_jThe classification method of the sample x is as follows: if it is

Then the sample x is attributed to S_jAnd (4) class.

5. The method for neural network output layer clipping and template box size determination based on self-organizing clustering as claimed in claim 1, wherein: in the step (5), the specific method for canceling a certain category is as follows: the cluster center is cancelled, so that the number of the cluster centers is N_CSubtracting 1, releasing the sample originally belonging to the category, calculating the distance between the released sample and other cluster centers, and classifying the released sample into the category when the distance between the released sample and which cluster center is the nearest.

6. The method of claim 1 for neural network output layer clipping and template box size determination based on self-organizing clusteringThe method is characterized in that: in the step (6), modifying the clustering center Z of each category_jThe specific method comprises the following steps:

in step (7), each class S is calculated_jAverage distance of the sample to the cluster center in (1)

The specific method comprises the following steps:

in the step (8), the total average distance between all the category samples and the corresponding cluster center is calculated

The specific method comprises the following steps:

7. the method for neural network output layer clipping and template box size determination based on self-organizing clustering as claimed in claim 1, wherein: in step (10), each class S is calculated_jWhere each sample x ═ x_w,x_h) To the clustering center Z_j＝(w_j,h_j) Is a standard deviation vector of_j＝(σ_w,j,σ_h,j) The specific method comprises the following steps:

8. the method of claim 1 for neural network output layer clipping and template box sizing based on self-organizing clustering, whereinIs characterized in that: in step (11), σ in each standard deviation vector is extracted_j＝(σ_w,j,σ_h,j) Maximum component σ of_jmaxThe specific method comprises the following steps:

σ_jmax＝max(σ_w,j,σ_h,j)。

9. the method for neural network output layer clipping and template box size determination based on self-organizing clustering as claimed in claim 1, wherein: in step (12), Z is_jThe specific method for splitting into two new clustering centers is as follows: clustering S_jFrom medium samples x to cluster center Z_jHas a standard deviation of_j＝(σ_w,j,σ_h,j) If σ is_w,j≥σ_h,jLet γ be (σ)_w,j0); if σ_w,j<σ_h,jLet γ equal to (0, σ)_h,j)。Z_jThe two new cluster centers split are respectively: z_j+kγ and Z_j-k γ, wherein 0<k<1。

10. The method for neural network output layer clipping and template box size determination based on self-organizing clustering as claimed in claim 1, wherein: in the step (14), the specific method for merging the clustering centers comprises the following steps: if the distance D between two nearest cluster centers_ij<θ_CCombining the corresponding categories of the two clustering centers into one category, canceling the clustering center status of the two clustering centers, and recalculating the clustering centers of the samples released from the two categories

And make the number of clustering centers N_CSubtracting 1;

in the step (16), the maximum area is obtained

The specific method comprises the following steps:

wherein ,