CN114999183B - Traffic intersection vehicle flow detection method - Google Patents

Traffic intersection vehicle flow detection method

Info

Publication number
CN114999183B
CN114999183B (application number CN202210598379.5A)
Authority
CN
China
Prior art keywords
loss
traffic flow
frame
detection
width
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202210598379.5A
Other languages
Chinese (zh)
Other versions
CN114999183A (en)
Inventor
朱金荣
曹海涛
张梦
张剑云
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Yangzhou University
Original Assignee
Yangzhou University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Yangzhou University filed Critical Yangzhou University
Priority to CN202210598379.5A priority Critical patent/CN114999183B/en
Publication of CN114999183A publication Critical patent/CN114999183A/en
Application granted granted Critical
Publication of CN114999183B publication Critical patent/CN114999183B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G08 SIGNALLING
    • G08G TRAFFIC CONTROL SYSTEMS
    • G08G1/00 Traffic control systems for road vehicles
    • G08G1/065 Traffic control systems for road vehicles by counting the vehicles in a section of the road or in a parking area, i.e. comparing incoming count with outgoing count
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G08 SIGNALLING
    • G08G TRAFFIC CONTROL SYSTEMS
    • G08G1/00 Traffic control systems for road vehicles
    • G08G1/01 Detecting movement of traffic to be counted or controlled
    • G08G1/04 Detecting movement of traffic to be counted or controlled using optical or ultrasonic detectors
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02T CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/00 Road transport of goods or passengers
    • Y02T10/10 Internal combustion engine [ICE] based vehicles
    • Y02T10/40 Engine management systems

Abstract

The application discloses a traffic intersection traffic flow detection method, which comprises: collecting traffic flow images at a traffic intersection, constructing a vehicle data set and dividing the data set; preprocessing the data and constructing a traffic flow detection network model, the model comprising an input end, a feature extraction network, a feature fusion network and an output end; and training the traffic flow detection network model to obtain a weight file and detect the traffic flow. By improving the preprocessing method at the input end, changing the network structure in the feature extraction network to reduce parameters, and modifying the loss function at the output end, the application improves detection precision and achieves better real-time performance.

Description

Traffic intersection vehicle flow detection method
Technical Field
The application relates to the technical field of traffic flow detection, in particular to a traffic intersection traffic flow detection method.
Under a traffic intersection monitoring view angle, vehicles must be detected across multiple scales, from large nearby targets to small distant ones, and vehicles frequently occlude one another. To address these problems, the application uses random erasing together with Mosaic data enhancement in the preprocessing stage to increase the robustness of the network model to occlusion in vehicle detection, improves the resnet residual structure in the feature extraction network by using a resnext structure with a CBAM attention mechanism to raise detection precision, and suppresses useless information around small distant targets through the attention mechanism so that their own features stand out. In the prediction stage, the rectangular-box loss is improved so that detection boxes are located more accurately and better match the actual vehicle size, avoiding the case where an oversized detection box causes an adjacent target box to be filtered out.
Background
With the continuous growth of cities, the number of urban motor vehicles increases year by year; according to statistics from the Ministry of Public Security, 32.14 million motor vehicles were newly registered nationwide in 2019 alone, and the total number of motor vehicles nationwide reached 348 million. Congestion caused by accidents, complex traffic scenes or abnormal weather is also becoming more common. At the same time, the massive data generated by vehicles and roads provides the basic information resources for building urban intelligent traffic systems and makes their construction more urgent, and real-time traffic flow statistics on urban roads is the most fundamental link in such a system. Various traffic control decisions rely on statistics of the number of vehicles at each key road node, the number of vehicles in a scene at a given moment, or the number of vehicles passing through a road within a given period. Only with such traffic flow statistics can the various traffic branches be managed and controlled and the handling of traffic accidents be arranged reasonably. Real-time road condition display in navigation software likewise depends on the information obtained from traffic flow statistics. Traffic flow is also one of the criteria reflecting the prosperity of an area, and per-scene traffic flow statistics can provide a reference for urban road planning.
Traditional traffic flow monitoring relies either on manual observation and analysis or on dedicated sensors; manual monitoring is time-consuming and inefficient, while sensor-based detection requires roads and vehicles to be fitted with various detection sensors, so it is difficult to popularize. With the development of machine vision technology, various methods of analysing surveillance video by machine vision have gradually emerged, such as detecting moving objects with the frame-difference method and counting traffic flow from the result. However, such methods have poor robustness and require the detection area to be defined manually for each scene, so their generality is weak. The present method is based on the machine vision algorithm yolov5; it offers high real-time performance and detection speed and can accomplish traffic flow recognition well.
Disclosure of Invention
This section is intended to summarize some aspects of embodiments of the application and to briefly introduce some preferred embodiments; some simplifications or omissions may be made in this section, in the abstract and in the title to avoid obscuring their purpose, and such simplifications or omissions are not intended to limit the scope of the application.
The present application has been made in view of the above problems occurring in the prior art.
Therefore, the technical problem to be solved by the application is to detect vehicles at multiple scales, from large nearby targets to small distant ones, under a traffic intersection monitoring view angle, including vehicles that occlude one another.
In order to solve the technical problems, the application provides the following technical scheme: a traffic intersection traffic flow detection method comprises,
collecting traffic flow images of traffic intersections, constructing a vehicle data set, and dividing the data set;
preprocessing data and constructing a traffic flow detection network model;
the traffic flow detection network model comprises an input end, a feature extraction network, a feature fusion network and an output end;
and training the traffic flow detection network model to obtain a weight file, and detecting the traffic flow.
As a preferable scheme of the traffic intersection traffic flow detection method, the application comprises the following steps: the constructing of the vehicle dataset includes,
capturing traffic flow images of the road at fixed intervals with the traffic intersection camera, and storing the images;
labeling the stored images to obtain the position of each vehicle in every image and generating an annotation file; dividing the images and the corresponding annotation files into a training set, a test set and a verification set at a ratio of 8:1:1 to obtain a complete vehicle detection data set.
As a preferable scheme of the traffic intersection traffic flow detection method, the application comprises the following steps: the method for preprocessing the input end of the network model during training comprises the following steps:
processing the picture by combining mosaic data enhancement with a random erasure method;
slicing the picture through Focus;
the number of channels is multiplied by 4.
As a preferable scheme of the traffic intersection traffic flow detection method, the application comprises the following steps: the random erase method comprises the following steps:
setting upper and lower thresholds of the erasing area and randomly selecting rectangular areas;
obtaining the width and the height of a rectangular area;
randomly selecting a point coordinate in the selected random area, and generating a rectangular frame according to the calculated width and height;
and judging whether the rectangular frame lies within the original image, repeating the operation if it does not.
As a preferable scheme of the traffic intersection traffic flow detection method, the application comprises the following steps: the setting the upper and lower thresholds of the erasing area includes:
randomly selecting a rectangular region P_e from a picture P of size W×H in the training set and replacing its interior with random values, and setting upper and lower thresholds for the rectangle's aspect ratio, namely:

P_e = P × random(p_l, p_h)

r_e = random(r_1, r_2)

obtaining the width W_e and the height H_e of the P_e region.
As a preferable scheme of the traffic intersection traffic flow detection method, the application comprises the following steps: the slicing operation may include the steps of,
taking a value at every other pixel in a picture, similar to adjacent downsampling, to obtain four pictures;
the four pictures are complementary, and W, H information is concentrated into a channel space;
the input channels are expanded by 4 times, namely, the spliced pictures are changed into 12 channels relative to the original RGB three-channel mode;
and carrying out convolution operation on the obtained new image to obtain a double downsampling characteristic image under the condition of no information loss.
As a preferable scheme of the traffic intersection traffic flow detection method, the application comprises the following steps: enhancing features in a feature extraction network using a resnext structure;
the CBAM attention mechanism is used for suppressing useless information and reinforcing useful information;
the resnext structure includes:
dividing the feature map output by convolving the input with 1×1 convolution kernels of half the channel number evenly into 32 groups, each part being x_i, i ∈ {1, 2, 3, ..., 32}, with each x_i corresponding to one 3×3 convolution kernel; aggregating the outputs and making a residual connection with the original input.
As a preferable scheme of the traffic intersection traffic flow detection method, the application comprises the following steps: suppressing useless information and enhancing useful information with the CBAM attention mechanism includes:
carrying out global max pooling and global average pooling on the input respectively to obtain compressed feature maps, adding them, and multiplying the result by a weight coefficient between 0 and 1 from the Sigmoid function;
multiplying the obtained feature map with corresponding elements input into corresponding tensors to obtain a new feature map;
carrying out maximum pooling and average pooling on the new feature images in the channel dimension respectively, splicing the two feature images in the channel dimension, and then reducing the dimension through a convolution layer;
multiplying the obtained product by a weight coefficient between 0 and 1 of the Sigmoid function;
multiplying the obtained feature map with corresponding elements input into corresponding tensors to obtain a final required feature map;
as a preferable scheme of the traffic intersection traffic flow detection method, the application comprises the following steps: usage basis IoU loss E-IoU of (E-IoU) loss As a loss function;
calculating a loss value of the output end by using the loss function;
in E-IoU loss In (3) it is necessary to calculate the overlap loss, center distance loss, and width-height loss.
As a preferable scheme of the traffic intersection traffic flow detection method, the application comprises the following steps:
the overlap loss, i.e., the intersection-over-union, refers to the ratio of the area of the intersection of the detection frame and the annotation frame in the data set to the area of their union:

IoU = (A ∩ B) / (A ∪ B)

wherein A is the target frame area and B is the detection frame area;

the center distance loss refers to the square of the ratio of the distance between the target frame center point b_A and the detection frame center point b_B to the diagonal length c of the minimum enclosing frame;

the width-height loss refers to the squares of the ratios of the differences between the detection frame width and height W_B, H_B and the annotation frame width and height W_A, H_A in the data set to the width and height of the minimum enclosing frame;

EIoU_loss = 1 - IoU + dis_loss + asp_loss
The application has the beneficial effects that, by improving the preprocessing method at the input end, changing the network structure in the feature extraction network to reduce parameters, and modifying the loss function at the output end, detection precision is improved and better real-time performance is achieved.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings that are needed in the description of the embodiments will be briefly described below, it being obvious that the drawings in the following description are only some embodiments of the present application, and that other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art. Wherein:
FIG. 1 is an overall flow chart of the present application;
fig. 2 is a schematic diagram of a random erase method at a preprocessing stage.
Fig. 3 is an overall structural diagram of the feature extraction network modification section.
Fig. 4 is a partial block diagram of the attention mechanism in the feature extraction network.
FIG. 5 is a schematic comparison of IoU_loss and EIoU_loss.
Fig. 6 is a comparison of detection results based on a monitoring camera scene.
FIG. 7 is a graph comparing the mAP values of the Yolov5 model with those of Yolov5-CD training.
Detailed Description
In order that the above-recited objects, features and advantages of the present application will become more readily apparent, a more particular description of the application will be rendered by reference to specific embodiments thereof which are illustrated in the appended drawings.
In the following description, numerous specific details are set forth in order to provide a thorough understanding of the present application, but the present application may be practiced in other ways other than those described herein, and persons skilled in the art will readily appreciate that the present application is not limited to the specific embodiments disclosed below.
In the following detailed description of the embodiments of the present application, reference is made to the accompanying drawings, which form a part hereof and which show, by way of illustration only, specific embodiments in which the application may be practiced; the description is simplified for ease of illustration and does not limit the scope of the application. In addition, the three dimensions of length, width and depth should be taken into account in actual fabrication.
Further still, reference herein to "one embodiment" or "an embodiment" means that a particular feature, structure, or characteristic may be included in at least one implementation of the application. The appearances of the phrase "in one embodiment" in various places in the specification are not necessarily all referring to the same embodiment, nor are separate or alternative embodiments mutually exclusive of other embodiments.
Example 1
Referring to fig. 1 and 2, the present embodiment provides a traffic intersection traffic flow detection method, including,
s1: collecting traffic flow images of traffic intersections, constructing a vehicle data set, and dividing the data set; the construction of the vehicle data set specifically includes:
capturing traffic flow images of the road at fixed intervals with the traffic intersection camera, and storing the images;
labeling the stored images to obtain the position of each vehicle in every image and generating an annotation file; dividing the images and the corresponding annotation files into a training set, a test set and a verification set at a ratio of 8:1:1 to obtain a complete vehicle detection data set.
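A minimal sketch of the 8:1:1 split described above is given here for illustration; the directory layout, file extensions and helper name are assumptions rather than part of the patent.

```python
# Illustrative sketch of the 8:1:1 train/test/validation split described above.
# Paths and file-naming conventions are assumptions, not taken from the patent.
import random
import shutil
from pathlib import Path

def split_dataset(image_dir, label_dir, out_dir, ratios=(0.8, 0.1, 0.1), seed=0):
    images = sorted(Path(image_dir).glob("*.jpg"))
    random.Random(seed).shuffle(images)

    n = len(images)
    n_train = int(n * ratios[0])
    n_test = int(n * ratios[1])
    subsets = {
        "train": images[:n_train],
        "test": images[n_train:n_train + n_test],
        "val": images[n_train + n_test:],
    }

    for name, files in subsets.items():
        img_out = Path(out_dir) / name / "images"
        lbl_out = Path(out_dir) / name / "labels"
        img_out.mkdir(parents=True, exist_ok=True)
        lbl_out.mkdir(parents=True, exist_ok=True)
        for img in files:
            shutil.copy(img, img_out / img.name)
            label = Path(label_dir) / (img.stem + ".txt")   # YOLO-style label file assumed
            if label.exists():
                shutil.copy(label, lbl_out / label.name)
    return {k: len(v) for k, v in subsets.items()}
```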
According to the application, a yolov5-CD (car detect) network model is constructed, so that the detection of traffic flow at the traffic intersection monitoring view angle is realized, and the multi-scale detection capability of the network to a large target at a near position and a small target at a far position at the monitoring view angle and the detection effect when vehicles are mutually shielded are enhanced.
S2: preprocessing the data and constructing a traffic flow detection network model; the input end of the network is preprocessed during training. Under a traffic intersection monitoring view angle, vehicles inevitably occlude one another and appear at multiple scales, from large nearby targets to small distant ones. The Mosaic data enhancement in the original yolov5 network model splices four pictures of the data set into one picture by scaling, which improves the network's ability to detect small targets; therefore, in the yolov5-CD network model, Mosaic data enhancement is combined with random erasing, which erases partial areas inside the target frames to simulate occlusion and improve the robustness of the network to occluded-vehicle detection. The preprocessing specifically includes:
processing the picture by combining mosaic data enhancement with a random erasure method;
slicing the picture through Focus;
the number of channels is multiplied by 4.
The random erase method comprises the following steps:
setting upper and lower thresholds for the erasing area and randomly selecting a rectangular area; obtaining the width and the height of the rectangular area; randomly selecting a point coordinate in the selected random area and generating a rectangular frame from the calculated width and height; and judging whether the rectangular frame lies within the original image, repeating the operation if it does not.
The mosaic data enhancement and random erasure method specifically comprises the following steps:
randomly selecting a rectangular region P_e from a picture P of size W×H in the training set and replacing its interior with random values.
Referring to fig. 2, the specific flow of the random erasing method is shown; the process mainly includes:
setting upper and lower thresholds for the random erasing area to obtain the random erasing rectangle, and setting upper and lower thresholds for the rectangle's aspect ratio, namely:

P_e = P × random(p_l, p_h)

r_e = random(r_1, r_2)

obtaining the width W_e and the height H_e of the P_e region, then:

randomly selecting a point coordinate (e_x, e_y) in P and generating at that point a rectangular frame of width W_e and height H_e; if the rectangular frame is still inside the picture P, the area is taken as the random erasing area P_e; if not, the above steps are repeated.
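The procedure above can be sketched as follows; the concrete default values for the area thresholds (p_l, p_h) and aspect-ratio thresholds (r_1, r_2) are assumptions for illustration, not values specified by the patent.

```python
# A minimal NumPy sketch of the random-erasing step described above; it assumes
# a uint8 image array of shape (H, W) or (H, W, C).
import numpy as np

def random_erase(img, p_l=0.02, p_h=0.2, r_1=0.3, r_2=3.3, max_tries=50, rng=None):
    rng = rng or np.random.default_rng()
    H, W = img.shape[:2]
    for _ in range(max_tries):
        area = W * H * rng.uniform(p_l, p_h)          # P_e = P * random(p_l, p_h)
        ratio = rng.uniform(r_1, r_2)                 # r_e = random(r_1, r_2)
        W_e = int(round(np.sqrt(area * ratio)))       # erased-region width
        H_e = int(round(np.sqrt(area / ratio)))       # erased-region height
        e_x = rng.integers(0, W)                      # random corner point (e_x, e_y)
        e_y = rng.integers(0, H)
        if e_x + W_e <= W and e_y + H_e <= H:         # keep the box inside the picture
            img[e_y:e_y + H_e, e_x:e_x + W_e] = rng.integers(
                0, 256, size=(H_e, W_e) + img.shape[2:], dtype=img.dtype)
            return img
    return img                                        # give up if no valid box was found
```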
Randomly cutting the four pictures;
and splicing the cropped pictures into one picture to serve as training data; a simplified stitching sketch follows.
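A simplified illustration of the Mosaic stitching step is sketched below; it uses fixed quadrants and omits bounding-box handling, unlike full YOLOv5-style implementations that use a random mosaic centre.

```python
# Simplified Mosaic sketch: four training pictures are resized and stitched
# into one canvas. Canvas size and the grey padding value are assumptions.
import cv2
import numpy as np

def simple_mosaic(imgs, out_size=640):
    assert len(imgs) == 4
    half = out_size // 2
    canvas = np.full((out_size, out_size, 3), 114, dtype=np.uint8)  # grey padding value
    slots = [(0, 0), (0, half), (half, 0), (half, half)]            # (y, x) of each quadrant
    for img, (y, x) in zip(imgs, slots):
        canvas[y:y + half, x:x + half] = cv2.resize(img, (half, half))
    return canvas
```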
The slicing operation includes the steps of,
taking a value at every other pixel in a picture, similar to adjacent downsampling, to obtain four pictures;
the four pictures are complementary, and W, H information is concentrated into a channel space;
the input channels are expanded by 4 times, namely, the spliced pictures are changed into 12 channels relative to the original RGB three-channel mode;
and carrying out convolution operation on the obtained new image to obtain a double downsampling characteristic image under the condition of no information loss.
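The Focus slicing described above can be sketched as a small PyTorch module; the output channel count and kernel size are illustrative assumptions.

```python
# Sketch of Focus slicing: every other pixel is taken so one picture becomes
# four half-resolution pictures stacked on the channel axis (3 -> 12 channels),
# then a convolution produces the 2x downsampled feature map without information loss.
import torch
import torch.nn as nn

class Focus(nn.Module):
    def __init__(self, c_in=3, c_out=32, k=3):
        super().__init__()
        self.conv = nn.Conv2d(c_in * 4, c_out, k, stride=1, padding=k // 2)

    def forward(self, x):                      # x: (B, 3, H, W)
        x = torch.cat([x[..., ::2, ::2],       # top-left pixels
                       x[..., 1::2, ::2],      # bottom-left pixels
                       x[..., ::2, 1::2],      # top-right pixels
                       x[..., 1::2, 1::2]],    # bottom-right pixels
                      dim=1)                   # (B, 12, H/2, W/2)
        return self.conv(x)
```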
S3: the traffic flow detection network model comprises an input end, a feature extraction network, a feature fusion network and an output end, specifically:
the feature extraction includes the steps of,
dividing the input into 32 low-dimensional tensors by a 1×1 convolution;
performing a 3×3 convolution on each branch followed by a 1×1 convolution, aggregating all branches, and restoring the original channel number;
adding the result to the original input.
The input feature map F (H×W×C) is subjected to global max pooling and global average pooling over width and height to obtain two 1×1×C feature maps, which are then fed into a shared two-layer neural network (MLP); the first layer has C/r neurons (r is the reduction rate) with a ReLU activation, and the second layer has C neurons. The features output by the MLP are added, a sigmoid activation generates the final channel attention map, and this map is multiplied element-wise with the input feature map to produce the input features required by the spatial attention module.
The spatial attention module applies global max pooling and global average pooling along the channel dimension to obtain two H×W×1 feature maps, concatenates them along the channel dimension, and reduces the result to H×W×1 with a 7×7 convolution. A sigmoid activation then generates the spatial attention map, which is finally multiplied with the module's input features to obtain the resulting features.
It should be noted that the feature is enhanced using a resnext structure in the feature extraction network; the CBAM attention mechanism is used for suppressing useless information and strengthening useful information.
The resnext structure replaces resnet in the feature extraction network, improving model accuracy while keeping the parameter count of the same order; a CBAM attention mechanism embedded after the resnext structure suppresses useless information around small distant target vehicles under the monitoring view angle and highlights their own features, preventing distant vehicles from being ignored because they are too small. The resnext structure includes the following steps:
dividing the feature map output by convolving the input with 1×1 convolution kernels of half the channel number evenly into 32 groups, each part being x_i, i ∈ {1, 2, 3, ..., 32}, with each x_i corresponding to one 3×3 convolution kernel; aggregating the outputs and making a residual connection with the original input.
Assume the first convolutional layer receives a 256-channel input and convolves it with 128 convolution kernels of size 1×1 and 256 channels to obtain a 128-channel output; these outputs are divided into 32 groups, each group contributing 4 channels to the second convolutional layer.
The second convolutional layer thus receives the 128 channels in groups and convolves them with 128 convolution kernels of size 3×3 as a grouped convolution, finally producing a 128-channel output.
The third convolution layer receives the 128-channel input and convolves with 256 1x 1-sized, 128-channel convolution kernels to ultimately yield a 256-channel output.
And performing aggregation operation on the output of the third layer of convolution layer, and adding the aggregated output with the original 256-channel input to obtain an output characteristic.
Referring to fig. 3, a comparison of the resnet residual module (left) with the modified resnext module (right) is shown.
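A hedged PyTorch sketch of the resnext-style residual block described above follows, implementing the 32 branches as a grouped 3×3 convolution; the channel widths follow the 256/128 example in the text, and the BatchNorm/ReLU placement is an assumption.

```python
# ResNeXt-style residual block sketch: 256 -> 128 via 1x1, a 3x3 grouped
# convolution with 32 groups (4 channels per branch), 1x1 back to 256,
# then the residual addition with the block input.
import torch
import torch.nn as nn

class ResNeXtBlock(nn.Module):
    def __init__(self, channels=256, bottleneck=128, cardinality=32):
        super().__init__()
        self.block = nn.Sequential(
            nn.Conv2d(channels, bottleneck, 1, bias=False),
            nn.BatchNorm2d(bottleneck), nn.ReLU(inplace=True),
            # grouped 3x3 convolution == 32 parallel branches of 4 channels each
            nn.Conv2d(bottleneck, bottleneck, 3, padding=1,
                      groups=cardinality, bias=False),
            nn.BatchNorm2d(bottleneck), nn.ReLU(inplace=True),
            nn.Conv2d(bottleneck, channels, 1, bias=False),
            nn.BatchNorm2d(channels),
        )
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        return self.relu(x + self.block(x))    # residual connection with the input
```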
Suppressing useless information and enhancing useful information with the CBAM attention mechanism includes:
carrying out global max pooling and global average pooling on the input respectively to obtain compressed feature maps, adding them, and multiplying the result by a weight coefficient between 0 and 1 from the Sigmoid function;
multiplying the obtained feature map with corresponding elements input into corresponding tensors to obtain a new feature map;
carrying out maximum pooling and average pooling on the new feature images in the channel dimension respectively, splicing the two feature images in the channel dimension, and then reducing the dimension through a convolution layer;
multiplying the obtained product by a weight coefficient between 0 and 1 of the Sigmoid function;
multiplying the obtained feature map with corresponding elements input into corresponding tensors to obtain a final required feature map;
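A compact PyTorch sketch of the CBAM module as described above follows; the reduction rate r and the use of 1×1 convolutions for the shared MLP are assumptions for illustration.

```python
# CBAM sketch: channel attention from global max/average pooling through a
# shared two-layer MLP, then spatial attention from a 7x7 convolution over the
# channel-wise max and mean maps; each attention map rescales the features.
import torch
import torch.nn as nn

class CBAM(nn.Module):
    def __init__(self, channels, r=16):
        super().__init__()
        self.mlp = nn.Sequential(                      # shared MLP for channel attention
            nn.Conv2d(channels, channels // r, 1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // r, channels, 1),
        )
        self.spatial = nn.Conv2d(2, 1, 7, padding=3)   # 7x7 conv for spatial attention

    def forward(self, x):
        # channel attention: max-pool and average-pool over H and W, add, sigmoid
        ca = torch.sigmoid(self.mlp(torch.amax(x, dim=(2, 3), keepdim=True)) +
                           self.mlp(torch.mean(x, dim=(2, 3), keepdim=True)))
        x = x * ca
        # spatial attention: max and mean over the channel dimension, concat, 7x7 conv
        sa = torch.sigmoid(self.spatial(torch.cat(
            [x.amax(dim=1, keepdim=True), x.mean(dim=1, keepdim=True)], dim=1)))
        return x * sa
```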
the extracted features are input into a Neck for reinforcement, and the steps need to be described as follows:
the FPN high dimension transmits semantic information to the low dimension, the PAN low dimension transmits the semantic information to the high dimension once again, the deep feature map carries stronger semantic features, weaker positioning information, and the shallow feature map carries stronger position information and weaker semantic features.
The FPN transmits deep semantic features to the shallow layer, enhances semantic expression on multiple scales, and the PAN transmits shallow positioning information to the deep layer, thereby enhancing positioning capability on multiple scales.
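The following is a schematic sketch of the FPN + PAN idea, not the exact yolov5 Neck: a top-down path passes deep semantic features to shallow levels, and a bottom-up path passes shallow positioning features back to deep levels. Channel counts and the plain convolution/upsampling blocks are illustrative assumptions.

```python
# Schematic FPN + PAN fusion sketch for three feature levels p3, p4, p5
# (strides 8, 16, 32). Top-down path first, then bottom-up path.
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyFPNPAN(nn.Module):
    def __init__(self, c3=128, c4=256, c5=512):
        super().__init__()
        self.lat4 = nn.Conv2d(c5 + c4, c4, 1)                     # top-down fusion at level 4
        self.lat3 = nn.Conv2d(c4 + c3, c3, 1)                     # top-down fusion at level 3
        self.down3 = nn.Conv2d(c3, c3, 3, stride=2, padding=1)    # bottom-up path
        self.down4 = nn.Conv2d(c4, c4, 3, stride=2, padding=1)
        self.out4 = nn.Conv2d(c3 + c4, c4, 1)
        self.out5 = nn.Conv2d(c4 + c5, c5, 1)

    def forward(self, p3, p4, p5):
        # FPN: deep -> shallow (semantic information flows down)
        p4 = self.lat4(torch.cat([F.interpolate(p5, scale_factor=2), p4], 1))
        p3 = self.lat3(torch.cat([F.interpolate(p4, scale_factor=2), p3], 1))
        # PAN: shallow -> deep (positioning information flows back up)
        n4 = self.out4(torch.cat([self.down3(p3), p4], 1))
        n5 = self.out5(torch.cat([self.down4(n4), p5], 1))
        return p3, n4, n5
```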
S4: training the traffic flow detection network model to obtain a weight file, and detecting the traffic flow, specifically:
usage basis IoU loss E-IoU of (E-IoU) loss As a loss function; calculating a loss value of the output end by using the loss function; in E-IoU loss In (3) it is necessary to calculate the overlap loss, center distance loss, and width-height loss.
Furthermore, during training, each image in the training set passes through the network model and a detection frame is obtained at the prediction end; the closer this detection frame is to the target frame given at labeling time, the better the network model performs. The loss function measures the difference between the detection frame and the target frame and, through back propagation, makes the target frame and the detection frame closer in the next training round. The original yolov5 model uses a rectangular-frame loss (IoU_loss), a confidence loss (BCE_loss) and a classification loss (BCE_loss) as its loss function. The yolov5-CD network model includes:
using an E-IoU loss (E-IoU_loss) based on the IoU loss as the rectangular-frame loss, which avoids the problem that the network cannot be trained and the gradient cannot descend when the intersection-over-union between a vehicle's detection frame and target frame is 0 during training;
in E-IoU, the overlap loss, the center distance loss and the width-height loss are all calculated, so that the difference between the detection frame and the target frame becomes smaller and the detection frame better matches the vehicle size; this improves positioning accuracy and, to a certain extent, prevents a detection frame that does not match the vehicle size from overlapping other vehicles so much that a neighbouring target frame is filtered out.
Further, the overlap loss, i.e., the intersection-over-union, refers to the ratio of the area of the intersection of the detection frame and the annotation frame in the data set to the area of their union;
the center distance loss refers to the square of the ratio of the Euclidean distance between the center points of the detection frame and the annotation frame in the data set to the diagonal length of the minimum enclosing frame;
the width-height loss refers to the squares of the ratios of the width and height differences between the detection frame and the annotation frame in the data set to the width and height of the minimum enclosing frame;

EIoU_loss = 1 - IoU + dis_loss + asp_loss
The prediction layer then predicts the target frames and outputs the detection results. EIoU_loss is used in the prediction stage as the loss function of the target frame, retaining the original properties of IoU_loss while weakening its drawbacks. When only IoU_loss is used as the loss function, only the overlap between the target frame and the detection frame is computed; when the two frames are completely disjoint the intersection-over-union is 0, the loss function has no gradient, and it cannot reflect how the frames are superposed. When EIoU_loss is used as the loss function, not only the intersection-over-union between the target frame and the detection frame but also the center distance loss and the width-height loss are considered, which avoids the loss degenerating into the IoU loss when the target frame covers the detection frame.
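A hedged sketch of the EIoU rectangular-frame loss described above, written for (x1, y1, x2, y2) boxes; the exact normalisation used by the patent is not spelled out in this text, so the standard EIoU formulation is assumed.

```python
# EIoU loss sketch: overlap (IoU) term, centre-distance term, and separate
# width/height terms, following EIoU_loss = 1 - IoU + dis_loss + asp_loss.
import torch

def eiou_loss(pred, target, eps=1e-7):
    # intersection and union areas
    xi1, yi1 = torch.max(pred[..., 0], target[..., 0]), torch.max(pred[..., 1], target[..., 1])
    xi2, yi2 = torch.min(pred[..., 2], target[..., 2]), torch.min(pred[..., 3], target[..., 3])
    inter = (xi2 - xi1).clamp(0) * (yi2 - yi1).clamp(0)
    area_p = (pred[..., 2] - pred[..., 0]) * (pred[..., 3] - pred[..., 1])
    area_t = (target[..., 2] - target[..., 0]) * (target[..., 3] - target[..., 1])
    iou = inter / (area_p + area_t - inter + eps)

    # smallest enclosing box of the two frames
    xc1, yc1 = torch.min(pred[..., 0], target[..., 0]), torch.min(pred[..., 1], target[..., 1])
    xc2, yc2 = torch.max(pred[..., 2], target[..., 2]), torch.max(pred[..., 3], target[..., 3])
    cw, ch = xc2 - xc1, yc2 - yc1

    # centre-distance loss: squared centre offset over the squared enclosing diagonal
    dx = (pred[..., 0] + pred[..., 2] - target[..., 0] - target[..., 2]) / 2
    dy = (pred[..., 1] + pred[..., 3] - target[..., 1] - target[..., 3]) / 2
    dis_loss = (dx ** 2 + dy ** 2) / (cw ** 2 + ch ** 2 + eps)

    # width/height loss: squared width and height differences over the enclosing box sides
    dw = (pred[..., 2] - pred[..., 0]) - (target[..., 2] - target[..., 0])
    dh = (pred[..., 3] - pred[..., 1]) - (target[..., 3] - target[..., 1])
    asp_loss = dw ** 2 / (cw ** 2 + eps) + dh ** 2 / (ch ** 2 + eps)

    return 1 - iou + dis_loss + asp_loss
```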
Example 2
The method is used for detecting traffic flow and performs well when the traffic flow is crowded and vehicles occlude one another.
Specifically, the method adopts the mean average precision (mAP) as the evaluation index of the algorithm. The detection results in the experiment fall into four types: TP (true positive), TN (true negative), FP (false positive) and FN (false negative). For the class P (traffic flow) detected by the algorithm, the precision on a single image is the ratio of the number of correctly detected targets to the number of detected targets;
recall is the ratio of the number of correctly detected targets to the total number of labeled targets;
the mean average precision (mAP) represents the average of the average precisions over all classes of the entire dataset.
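The rendered equations are not reproduced in this text extraction; standard formulas consistent with the definitions above are:

```latex
% Standard precision / recall / mAP definitions matching the verbal description above.
\[
\mathrm{Precision} = \frac{TP}{TP + FP}, \qquad
\mathrm{Recall} = \frac{TP}{TP + FN}, \qquad
\mathrm{mAP} = \frac{1}{N}\sum_{i=1}^{N} \mathrm{AP}_i,
\quad \text{where } \mathrm{AP}_i = \int_{0}^{1} P_i(R)\, dR .
\]
```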
yolov5 is a single-stage algorithm in target detection, and has strong performance, high detection speed and high precision, but under the conditions of shielding and crowding of targets, missed detection and false detection are easy to occur.
To verify that, compared with the original yolov5 model, the yolov5-CD algorithm detects traffic flow more accurately under crowded and occluded conditions, the following comparison was carried out.
In this embodiment, the original yolov5 model and yolov5-CD are each trained on a Wild Person data set and their mAP values compared, and images from the traffic monitoring view angle in which vehicles may occlude one another are measured and compared.
The comparison data are as follows:
Model       Precision (%)   Recall (%)   mAP (%)   mAP@0.5:0.95 (%)
Yolov5      92.3            93.8         96.9      80.4
Yolov5-CD   93.7            94.9         97.6      83.5
where the Precision column gives the precision, the Recall column gives the recall, the mAP column gives the mean average precision, and mAP@0.5:0.95 gives the mAP averaged over intersection-over-union thresholds from 0.5 to 0.95 in steps of 0.05 (0.5, 0.55, 0.6, 0.65, 0.7, 0.75, 0.8, 0.85, 0.9, 0.95).
The results are shown in FIG. 6, where the left side shows the detection results of the original yolov5 model and the right side those of the yolov5-CD model. In fig. 6(1), on the left the taxi is missed and an electric vehicle is misdetected as a car, while on the right the interference of the electric vehicle is eliminated and no detection is missed. In fig. 6(2) and 6(3), the right side successfully detects the small distant vehicles that are missed at the far end of the left side.
FIG. 7 shows the mAP curves of yolov5 and yolov5-CD during training; it can be seen that yolov5-CD converges faster than the original yolov5 model and its mAP value is somewhat higher.
It should be noted that the above embodiments are only for illustrating the technical solution of the present application and not for limiting the same, and although the present application has been described in detail with reference to the preferred embodiments, it should be understood by those skilled in the art that the technical solution of the present application may be modified or substituted without departing from the spirit and scope of the technical solution of the present application, which is intended to be covered in the scope of the claims of the present application.

Claims (1)

1. A traffic intersection traffic flow detection method is characterized in that: comprising the steps of (a) a step of,
collecting traffic flow images of traffic intersections, constructing a vehicle data set, and dividing the data set;
preprocessing data and constructing a traffic flow detection network model;
the traffic flow detection network model comprises an input end, a feature extraction network, a feature fusion network and an output end;
training the traffic flow detection network model to obtain a weight file, and detecting traffic flow;
the constructing of the vehicle dataset includes,
taking the traffic flow image on the road at fixed time by using the traffic intersection camera, and storing the traffic flow image;
labeling the stored images to obtain the positions of the vehicles in each image, and generating a labeling file; dividing the image and the corresponding annotation file into a training set, a test set and a verification set according to the ratio of 8:1:1, and obtaining a complete vehicle detection data set;
the method for preprocessing the input end of the network model during training comprises the following steps:
processing the picture by combining mosaic data enhancement with a random erasure method;
slicing the picture through Focus;
the number of channels is multiplied by 4;
the random erase method comprises the following steps:
setting upper and lower thresholds of the erasing area and randomly selecting rectangular areas;
obtaining the width and the height of a rectangular area;
randomly selecting a point coordinate in the selected random area, and generating a rectangular frame according to the calculated width and height;
judging whether the rectangular frame lies within the original image, and if not, repeating the operation;
the setting the upper and lower thresholds of the erasing area includes:
randomly selecting a rectangular region P_e from a picture P of size W×H in the training set and replacing its interior with random values, and setting upper and lower thresholds for the rectangle's aspect ratio, namely:

P_e = P × random(p_l, p_h)

r_e = random(r_1, r_2)

obtaining the width W_e and the height H_e of the P_e region;
The slicing operation may include the steps of,
taking a value at every other pixel in a picture, similar to adjacent downsampling, to obtain four pictures;
the four pictures are complementary, and W, H information is concentrated into a channel space;
the input channels are expanded by 4 times, namely, the spliced pictures are changed into 12 channels relative to the original RGB three-channel mode;
carrying out convolution operation on the obtained new image to obtain a double downsampling characteristic image under the condition of no information loss;
enhancing features in a feature extraction network using a resnext structure;
the CBAM attention mechanism is used for suppressing useless information and reinforcing useful information;
the resnext structure includes that,
the method comprises the steps of (1) uniformly dividing a characteristic diagram output after convolution operation is carried out on an input channel and 1×1 convolution kernels with half the number of channels into 32 groups, wherein each part is xi, i epsilon {1,2, 3..32 }, each xi corresponds to one 3×3 convolution kernel, carrying out aggregation operation on the output, and carrying out residual connection with the original output;
suppressing useless information and enhancing useful information with the CBAM attention mechanism includes: carrying out global max pooling and global average pooling on the input respectively to obtain compressed feature maps, adding them, and multiplying the result by a weight coefficient between 0 and 1 from the Sigmoid function;
multiplying the obtained feature map with corresponding elements input into corresponding tensors to obtain a new feature map;
carrying out maximum pooling and average pooling on the new feature images in the channel dimension respectively, splicing the two feature images in the channel dimension, and then reducing the dimension through a convolution layer;
multiplying the obtained product by a weight coefficient between 0 and 1 of the Sigmoid function;
multiplying the obtained feature map with corresponding elements input into corresponding tensors to obtain a final required feature map;
using an E-IoU loss (E-IoU_loss) based on the IoU loss (IoU_loss) as the loss function;
calculating the loss value at the output end with the loss function;
in E-IoU_loss, calculating the overlap loss, the center distance loss and the width-height loss;
comprising the following: the overlap loss, i.e., the intersection-over-union, refers to the ratio of the area of the intersection of the detection frame and the annotation frame in the data set to the area of their union:

IoU = (A ∩ B) / (A ∪ B)

wherein A is the target frame area and B is the detection frame area;

the center distance loss refers to the square of the ratio of the distance between the target frame center point b_A and the detection frame center point b_B to the diagonal length c of the minimum enclosing frame;

the width-height loss refers to the squares of the ratios of the differences between the detection frame width and height W_B, H_B and the annotation frame width and height W_A, H_A in the data set to the width and height of the minimum enclosing frame;

EIoU_loss = 1 - IoU + dis_loss + asp_loss
CN202210598379.5A 2022-05-30 2022-05-30 Traffic intersection vehicle flow detection method Active CN114999183B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210598379.5A CN114999183B (en) 2022-05-30 2022-05-30 Traffic intersection vehicle flow detection method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210598379.5A CN114999183B (en) 2022-05-30 2022-05-30 Traffic intersection vehicle flow detection method

Publications (2)

Publication Number Publication Date
CN114999183A CN114999183A (en) 2022-09-02
CN114999183B (en) 2023-10-31

Family

ID=83028714

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210598379.5A Active CN114999183B (en) 2022-05-30 2022-05-30 Traffic intersection vehicle flow detection method

Country Status (1)

Country Link
CN (1) CN114999183B (en)

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109919072A (en) * 2019-02-28 2019-06-21 桂林电子科技大学 Fine vehicle type recognition and flow statistics method based on deep learning and trajectory tracking
CN112598076A (en) * 2020-12-29 2021-04-02 北京易华录信息技术股份有限公司 Motor vehicle attribute identification method and system
WO2021062536A1 (en) * 2019-09-30 2021-04-08 Musashi Auto Parts Canada Inc. System and method for ai visual inspection
CN113902729A (en) * 2021-10-26 2022-01-07 重庆邮电大学 Road surface pothole detection method based on YOLO v5 model
CN114092764A (en) * 2021-11-19 2022-02-25 扬州大学 YOLOv5 neural network vehicle detection method added with attention mechanism
WO2022061726A1 (en) * 2020-09-25 2022-03-31 Intel Corporation Method and system of multiple facial attributes recognition using highly efficient neural networks
CN114267069A (en) * 2021-12-25 2022-04-01 福州大学 Human face detection method based on data generalization and feature enhancement
CN114267052A (en) * 2021-12-31 2022-04-01 扬州大学 Pedestrian detection method based on yolov5

Patent Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109919072A (en) * 2019-02-28 2019-06-21 桂林电子科技大学 Fine vehicle type recognition and flow statistics method based on deep learning and trajectory tracking
WO2021062536A1 (en) * 2019-09-30 2021-04-08 Musashi Auto Parts Canada Inc. System and method for ai visual inspection
WO2022061726A1 (en) * 2020-09-25 2022-03-31 Intel Corporation Method and system of multiple facial attributes recognition using highly efficient neural networks
CN112598076A (en) * 2020-12-29 2021-04-02 北京易华录信息技术股份有限公司 Motor vehicle attribute identification method and system
CN113902729A (en) * 2021-10-26 2022-01-07 重庆邮电大学 Road surface pothole detection method based on YOLO v5 model
CN114092764A (en) * 2021-11-19 2022-02-25 扬州大学 YOLOv5 neural network vehicle detection method added with attention mechanism
CN114267069A (en) * 2021-12-25 2022-04-01 福州大学 Human face detection method based on data generalization and feature enhancement
CN114267052A (en) * 2021-12-31 2022-04-01 扬州大学 Pedestrian detection method based on yolov5

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Research on road surface damage detection method based on SSD; Yang Jing; China Master's Theses Full-text Database, Engineering Science and Technology II, No. 08; pp. C034-62 *

Also Published As

Publication number Publication date
CN114999183A (en) 2022-09-02

Similar Documents

Publication Publication Date Title
CN111368687B (en) Sidewalk vehicle illegal parking detection method based on target detection and semantic segmentation
Wang et al. A vision-based video crash detection framework for mixed traffic flow environment considering low-visibility condition
Lin et al. Helmet use detection of tracked motorcycles using cnn-based multi-task learning
CN106599792B (en) Method for detecting hand driving violation behavior
CN110188807A (en) Tunnel pedestrian target detection method based on cascade super-resolution network and improvement Faster R-CNN
CN111554105B (en) Intelligent traffic identification and statistics method for complex traffic intersection
CN112507862A (en) Vehicle orientation detection method and system based on multitask convolutional neural network
CN111340026A (en) Training method of vehicle annual payment identification model and vehicle annual payment identification method
Kotha et al. Potsense: Pothole detection on indian roads using smartphone sensors
CN114926791A (en) Method and device for detecting abnormal lane change of vehicles at intersection, storage medium and electronic equipment
CN113516853B (en) Multi-lane traffic flow detection method for complex monitoring scene
CN112818935B (en) Multi-lane congestion detection and duration prediction method and system based on deep learning
CN110188607A (en) A kind of the traffic video object detection method and device of multithreads computing
CN113033363A (en) Vehicle dense target detection method based on deep learning
CN114999183B (en) Traffic intersection vehicle flow detection method
CN114898306B (en) Method and device for detecting target orientation and electronic equipment
Liu et al. Multi-lane detection by combining line anchor and feature shift for urban traffic management
CN115661786A (en) Small rail obstacle target detection method for area pre-search
CN114387310A (en) Urban trunk road traffic flow statistical method based on deep learning
CN114359859A (en) Method and device for processing target object with shielding and storage medium
CN112597917B (en) Vehicle parking detection method based on deep learning
Kuppusamy et al. Traffic Sign Recognition for Autonomous Vehicle Using Optimized YOLOv7 and Convolutional Block Attention Module.
CN113962249B (en) Line pressing identification method and device, electronic equipment and storage medium
Lalangui et al. Framework comparison of neural networks for automated counting of vehicles and pedestrians
CN116681955B (en) Method and computing device for identifying traffic guardrail anomalies

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant