CN110533103B - Lightweight small object target detection method and system - Google Patents


Info

Publication number
CN110533103B
CN110533103B
Authority
CN
China
Prior art keywords
true
image
module
target
neural network
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201910815228.9A
Other languages
Chinese (zh)
Other versions
CN110533103A (en)
Inventor
秦豪
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Dilu Technology Co Ltd
Original Assignee
Dilu Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Dilu Technology Co Ltd filed Critical Dilu Technology Co Ltd
Priority to CN201910815228.9A priority Critical patent/CN110533103B/en
Publication of CN110533103A publication Critical patent/CN110533103A/en
Application granted granted Critical
Publication of CN110533103B publication Critical patent/CN110533103B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G06N3/084 Backpropagation, e.g. using gradient descent
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V2201/00 Indexing scheme relating to image or video recognition or understanding
    • G06V2201/07 Target detection

Abstract

The invention discloses a lightweight small object target detection method and system, comprising the steps of: an acquisition module acquires images and transmits them to an image processing module; the image processing module performs image processing on the acquired images and outputs the processed images as training data; a neural network module is constructed, fully trained on the training data, and saved; and the trained neural network module is used to detect the target object. The invention has the beneficial effects that: by optimizing the detection network, the parameter quantity of the model is greatly reduced, which lowers the network computation cost and storage space; the detection speed is higher and can meet real-time detection requirements; and the detection precision for small target objects is improved.

Description

Lightweight small object target detection method and system
Technical Field
The invention relates to the technical field of target detection, in particular to a lightweight small object target detection method and system.
Background
In recent years, target detection based on deep learning has attracted wide attention as one of the important research directions in the field of computer vision, and small target detection has remained a difficult problem for deep learning convolutional neural network models. YOLOv3, an improved end-to-end target detection model, has the advantage of high image processing speed; for example, on a GPU it can process one image in about 20 milliseconds, and compared with YOLOv1 and YOLOv2 its detection of small targets is improved. However, its detection speed depends heavily on the hardware: processing one image on a CPU takes hundreds of milliseconds. Meanwhile, small targets are prone to missed and false detections, so the accuracy is not high enough.
Disclosure of Invention
This section is intended to outline some aspects of embodiments of the invention and to briefly introduce some preferred embodiments. Some simplifications or omissions may be made in this section, in the abstract, and in the title of the application to avoid obscuring their purpose; such simplifications or omissions should not be used to limit the scope of the invention.
The present invention has been made in view of the above-described problems occurring in the prior art.
Therefore, the technical problem solved by the invention is: providing a lightweight small object target detection method that improves detection speed and detection accuracy and optimizes the network to reduce the parameter quantity.
In order to solve the technical problem, the invention provides the following technical scheme: a lightweight small object target detection method comprising the steps that an acquisition module acquires images and transmits them to an image processing module; the image processing module performs image processing on the acquired images and outputs the processed images as training data; a neural network module is constructed, fully trained on the training data, and saved; and the trained neural network module is used to detect the target object.
As a preferable scheme of the lightweight small object target detection method, the invention comprises the following steps: the acquisition module is an image acquisition camera that acquires one image every 10 seconds, and acquisition needs to be performed under different scenes.
As a preferable scheme of the lightweight small object target detection method, the invention comprises the following steps: the image processing further comprises the steps of summarizing all collected images, screening and filtering them, and deleting repeated and invalid images; and marking the images with a marking tool and saving the annotated images obtained after marking.
As a preferable scheme of the lightweight small object target detection method, the invention comprises the following steps: the neural network module is a convolutional neural network constructed based on ShuffleNetV2 and comprises a backbone network and a target detection module, wherein the backbone network comprises a separated residual convolution layer and a downsampling convolution layer and can perform feature extraction on the training-data images to obtain deep features and shallow features; the target detection module processes the extracted deep and shallow features, calculates the accurate position and size of the center of the target object, and outputs a detection result, with the following calculation formulas:
(X,Y)=down*{(x,y)+(dx,dy)}
(W,H)=(bw,bh)*e^(dw,dh)
wherein down is a proportional value taking 4 or 8, dx and dy are offsets, x and y are positions in the feature layer, dw and dh are results of the convolution calculation layer, and (bw, bh) represents the length and width of the basic frame, whose selection is determined by the size of the target object.
As a preferable scheme of the lightweight small object target detection method, the invention comprises the following steps: the method also comprises an error detection module, comprising the following steps,
the true position offset (true_dx, true_dy) and the target normalized size (true_dw, true_dh) are calculated according to the annotation image, and the calculation formula is as follows:
wherein true_x and true_w are the center position and the size of the target on the original image respectively, down is a proportional value, 4 or 8 is taken, ox and oy are the positions on the corresponding feature layers, and ow and oh are the lengths of the corresponding basic frames.
Comparing (dx, dy, dw, dh) with the true values (true_dx, true_dy, true_dw, true_dh) and calculating an error value, the calculation formula is as follows:
Loss=loss1+loss2+loss3+loss4
wherein:
loss1=-true_dx*log(dx)-(1-true_dx)*log(1-dx)
loss2=-true_dy*log(dy)-(1-true_dy)*log(1-dy)
loss3=w*(true_dw-dw)^2
loss4=w*(true_dh-dh)^2
w=2-true_w*true_h/(image_w*image_h)
where w is the small object processing coefficient, an optimization strategy for small objects; true_w and true_h are the width and height of the target object, and image_w and image_h those of the original image.
As a preferable scheme of the lightweight small object target detection method, the invention comprises the following steps: the error detection module further comprises the steps that according to the calculated error value Loss, the parameters of the backbone network are updated by back propagation so that the output detection result comes closer to the true value; and when ten consecutive groups of data have been trained without the network outputting better detection results, training ends and the trained neural network module parameters are saved.
As a preferable scheme of the lightweight small object target detection method, the invention comprises the following steps: the detection of the target object further comprises the steps that the acquisition module acquires a target image; the acquired target image is screened for anomalies, incomplete and invalid images are removed, and normal target images are input into the trained neural network module; and the trained neural network module processes the normal target image to obtain the square frame at the corresponding position of the target object.
The invention also solves another technical problem: providing a lightweight small object target detection system comprising a simplified small object detection network, so that detection speed and detection precision are improved.
In order to solve this technical problem, the invention provides the following technical scheme: a lightweight small object target detection system comprising an acquisition module for acquiring image data; an image processing module capable of processing the collected images, screening the images meeting the requirements, and annotating them; a neural network module capable of processing the input image and marking a square frame at the corresponding position of the target object; and an error detection module for calculating the error of the detection result and judging whether the neural network module needs to be trained again.
As a preferred embodiment of the lightweight small object target detection system according to the present invention, wherein: the image processing module comprises a marking tool which is used for marking the target object on the image.
As a preferred embodiment of the lightweight small object target detection system according to the present invention, wherein: the neural network module comprises a backbone network, wherein the backbone network is used for extracting characteristics of an image; the target detection module can calculate the accurate position and the size of the center of the target object and mark the accurate position and the size on the image.
The invention has the beneficial effects that: by optimizing the detection network, the parameter quantity of the model is greatly reduced, which lowers the network computation cost and storage space; the detection speed is higher and can meet real-time detection requirements; and the detection precision for small targets is improved.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings that are needed in the description of the embodiments will be briefly described below, it being obvious that the drawings in the following description are only some embodiments of the present invention, and that other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art. Wherein:
FIG. 1 is a schematic overall flow chart of a light-weight small object target detection method according to a first embodiment of the invention;
FIG. 2 is a schematic diagram of a backbone network in a lightweight small object target detection method according to a first embodiment of the present invention;
FIG. 3 is a schematic diagram of the overall structure of a light-weight small object target detection system according to a second embodiment of the present invention during training;
fig. 4 is a schematic diagram of the overall structure of the light-weight small object target detection system according to the second embodiment of the present invention in actual use.
Detailed Description
So that the manner in which the above recited objects, features and advantages of the present invention can be understood in detail, a more particular description of the invention, briefly summarized above, may be had by reference to the embodiments, some of which are illustrated in the appended drawings. All other embodiments, which can be made by one of ordinary skill in the art based on the embodiments of the present invention without making any inventive effort, shall fall within the scope of the present invention.
In the following description, numerous specific details are set forth in order to provide a thorough understanding of the present invention, but the present invention may be practiced in other ways other than those described herein, and persons skilled in the art will readily appreciate that the present invention is not limited to the specific embodiments disclosed below.
Further, reference herein to "one embodiment" or "an embodiment" means that a particular feature, structure, or characteristic can be included in at least one implementation of the invention. The appearances of the phrase "in one embodiment" in various places in the specification are not necessarily all referring to the same embodiment, nor are separate or alternative embodiments mutually exclusive of other embodiments.
While the embodiments of the present invention have been illustrated and described in detail in the drawings, the cross-sectional view of the device structure is not to scale in the general sense for ease of illustration, and the drawings are merely exemplary and should not be construed as limiting the scope of the invention. In addition, the three-dimensional dimensions of length, width and depth should be included in actual fabrication.
Also in the description of the present invention, it should be noted that the orientation or positional relationship indicated by the terms "upper, lower, inner and outer", etc. are based on the orientation or positional relationship shown in the drawings, are merely for convenience of describing the present invention and simplifying the description, and do not indicate or imply that the apparatus or elements referred to must have a specific orientation, be constructed and operated in a specific orientation, and thus should not be construed as limiting the present invention. Furthermore, the terms "first, second, or third" are used for descriptive purposes only and are not to be construed as indicating or implying relative importance.
The terms "mounted, connected, and coupled" should be construed broadly in this disclosure unless otherwise specifically indicated and defined, such as: can be fixed connection, detachable connection or integral connection; it may also be a mechanical connection, an electrical connection, or a direct connection, or may be indirectly connected through an intermediate medium, or may be a communication between two elements. The specific meaning of the above terms in the present invention will be understood in specific cases by those of ordinary skill in the art.
Example 1
Target detection means recognizing the target object contained in an image with a target detection algorithm and outputting the position of the target object in the image. Target detection algorithms mainly take two approaches: first computing candidate regions and then performing CNN classification, as in the RCNN family of networks; or directly outputting localization and classification results simultaneously, as in the SSD and YOLO families. The former has higher accuracy but low computation speed, while the latter is more suitable for actual scenes and more real-time, and is therefore more common in practical use. The YOLO family has by now developed to YOLOv3, which improves on the detection speed and accuracy of the previous two generations and has a clear structure and good real-time performance, making YOLOv3 one of the most commonly used detection algorithms in engineering. Specifically, referring to fig. 1, in order to further optimize the network structure and improve detection accuracy and speed, this embodiment provides a lightweight small object target detection method comprising the steps that the acquisition module 100 acquires an image and transmits it to the image processing module 200; the image processing module 200 performs image processing on the acquired image and outputs the processed image as training data; a neural network module 300 is constructed, fully trained on the training data, and saved; and the trained neural network module 300 is used to detect the target object.
The acquisition module 100 is an image acquisition camera that acquires one image every 10 seconds and needs to acquire under different situations. In this embodiment, the camera is mounted on an automobile so that image acquisition proceeds as the automobile runs, collecting rich image content. Specifically, to make the acquired image data sufficiently rich, acquisition needs to be performed under different scenes, for example different weather conditions (sunny, rainy, cloudy, etc.) and different times (morning, noon, afternoon, evening, etc.).
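The capture cadence above is simple to script. Below is a minimal acquisition sketch using OpenCV (an assumption; the patent names no library), grabbing one frame every 10 seconds from a vehicle-mounted camera; the camera index and frame count are illustrative placeholders.

```python
import time
import cv2

camera = cv2.VideoCapture(0)          # hypothetical index for the on-vehicle camera
frames = []
while len(frames) < 100:              # gather frames across scenes, weather and times of day
    ok, frame = camera.read()
    if ok:
        frames.append(frame)          # keep the raw frame for later screening and annotation
    time.sleep(10)                    # one image every 10 seconds, as described above
camera.release()
```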
The image processing module 200 performs image processing on the collected images, which further includes the steps of summarizing all the collected images, screening and filtering them, and deleting repeated and invalid images; the images are marked using the marking tool 201 and the annotated images obtained after marking are saved.
Specifically, because the acquisition module 100 collects images at short time intervals, images collected within a short time may be repeated or similar in content; therefore, after all the collected images are summarized, screening and filtering are needed to delete repeated and invalid images, and the remaining valid images can be used as training data.
Preferably, in order for the trained neural network module 300 to detect reliably, the number of images used as training data should be no fewer than 10,000.
The screened and filtered images are marked using the marking tool 201. The marking tool 201 is the public annotation tool labelme, which can plot the target in detail: it marks the target object with a square frame, returns the relative position of the target object in the image, and saves the result in xml file format to obtain the annotated image.
The neural network module 300 is constructed based on ShuffleNetV2 and comprises a backbone network 301 and a target detection module 302; its construction comprises the following steps.
referring to fig. 2, the backbone network 301 is mainly formed by stacking two basic units, namely a separated residual convolution layer and a downsampled convolution layer, by taking reference to the thought of a lightweight network shufflelenet 2, and the backbone network 301 can perform feature extraction on images of training data to obtain a deep feature layer and a shallow feature layer. The separating residual error convolution layer mainly divides an input feature layer into two parts according to the number of features, one half of the features participate in the calculation of the feature re-extraction of the convolution layer, the other half of the features do not participate in the calculation, the two parts of the features are put together again after the calculation of one half of the features is finished, and are subjected to scrambling recombination according to feature numbers, wherein the scrambling purpose is to mutually interact each group of feature information, so that feature fusion can be better completed, for example, the features comprise 1, 2,3, 4, 5 and 6 in total, and the number of the features is 6The group is divided into two groups, 1, 2 and 3 are one group, 4, 5 and 6 are another group, wherein 4, 5 and 6 participate in feature re-extraction to obtainThe feature arrangement after the disorder recombination becomes 1, < >>2、/>3、/>At the next grouping, 1 again,/-is changed>2 group of (I)>3、/>A group of.
The purpose of the downsampling convolution layer is to halve the size of the input feature layer while doubling the number of features. The downsampling convolution layer first feeds the input feature layer into two groups of convolution feature extraction layers for calculation; the feature count of each group is unchanged after calculation, but because there are two groups the total number of features doubles, and the feature layer size is halved during the convolution feature extraction. Finally the new feature layers are recombined through scrambling. For example, for 10 feature layers of size 64×64, each convolution extraction group outputs 10 features of size 32×32, so after recombination there are 20 feature layers of size 32×32, achieving the purposes of reducing the feature layer size and increasing the number of feature channels.
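A matching sketch of the downsampling convolution layer, under the same PyTorch assumption and reusing channel_shuffle from the previous sketch: both branches convolve the full input with stride 2, so concatenation doubles the channel count while the spatial size halves (10 features of 64×64 become 20 of 32×32).

```python
import torch
import torch.nn as nn

class DownsampleUnit(nn.Module):
    def __init__(self, channels: int):
        super().__init__()
        def branch() -> nn.Sequential:
            # Stride-2 convolution halves the feature layer size; layer choice is an assumption
            return nn.Sequential(
                nn.Conv2d(channels, channels, 3, stride=2, padding=1, bias=False),
                nn.BatchNorm2d(channels),
                nn.ReLU(inplace=True),
            )
        self.branch_a, self.branch_b = branch(), branch()

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Two groups in parallel: channel count doubles, then scramble-recombine
        return channel_shuffle(torch.cat([self.branch_a(x), self.branch_b(x)], dim=1))
```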
The backbone network 301 is stacked from separated residual convolution layers and downsampling convolution layers, and repeatedly stacking the two achieves the purpose of enriching the features. It should be understood that the main function of the backbone network 301 is feature extraction to obtain deep features and shallow features. Specifically, for a picture of size 640×640, feature extraction by the backbone network 301 yields 40×40 deep features and 80×80 shallow features; an even deeper 20×20 feature layer is not needed, because this embodiment is mainly for detecting small objects rather than large ones.
The target detection module 302 mainly processes the feature layers, in the same manner for deep and shallow features. Specifically, the target detection module 302 comprises a plurality of convolution layers. For example, for a shallow feature, the target detection module 302 first performs a convolution calculation on it to obtain the information at each corresponding position point of the 80×80 shallow feature, including the target center position offset and the size of the target frame; from the offset and the feature layer position, the accurate position of the center point of the target object can be calculated with the following formula:
(X,Y)=down*{(x,y)+(dx,dy)}
wherein dx and dy are offsets, x and y are positions in the feature layer, and down is a proportional value: for the shallow features down is 8, obtained by dividing the original image size 640 by the feature map size 80, and for the deep features the corresponding value is 16.
The size calculation of the target object comprises the following steps. Each element of the feature layer corresponds to 3 basic frames, of sizes (0.5, 1), (1, 1.5) and (2, 3); from the result of the convolution calculation layer and the corresponding basic frame, the size of the target object can be determined with the following formula:
(W,H)=(bw,bh)*e^(dw,dh)
where dw and dh are the results of the convolution calculation layer and (bw, bh) represents the length and width of the basic frame; the basic frame is selected according to the size of the target object, choosing the basic frame closest in size.
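A worked sketch of the two formulas in plain Python; the values are illustrative. Here dx and dy are the predicted center offsets at feature-map cell (x, y), (bw, bh) is the chosen basic frame, and down is 8 for the 80×80 shallow layer.

```python
import math

def decode_box(x, y, dx, dy, dw, dh, bw, bh, down=8):
    X = down * (x + dx)        # (X, Y) = down * {(x, y) + (dx, dy)}
    Y = down * (y + dy)
    W = bw * math.exp(dw)      # (W, H) = (bw, bh) * e^(dw, dh)
    H = bh * math.exp(dh)
    return X, Y, W, H

# Illustrative use: cell (40, 40) of the shallow layer with basic frame (1, 1.5)
print(decode_box(40, 40, 0.5, 0.5, 0.2, 0.1, 1, 1.5))
# -> (324.0, 324.0, ~1.221, ~1.658)
```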
It can be understood that once the neural network module 300 is constructed, inputting an image of the training data into the neural network module 300 yields the accurate position and size of the center of the target object, and the detection result can be output.
The neural network module 300 further includes an error detection module 400. Before the constructed neural network module 300 is put into practical use, its detection accuracy needs to be improved by training; the error detection module 400 is used to determine the detection error of the neural network module 300 and whether the neural network module 300 has been sufficiently trained for practical work. Using the error detection module 400 comprises the following steps.
the image of the training data extracts features in the backbone network 301 of the neural network module 300, obtains a target position offset (dx, dy) and a target normalized size (dw, dh) on the feature layer, and obtains a real position offset (true_dx, true_dy) and a target normalized size (true_dw, true_dh) according to the corresponding labeling image obtained by the labeling tool 201, wherein the calculation formulas are as follows:
wherein true_x and true_w are the center position and the size of the target on the original image respectively, down is a proportional value, 4 or 8 is taken, ox and oy are the positions on the corresponding feature layers, and ow and oh are the lengths of the corresponding basic frames.
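The inverse mapping used to build the training targets, mirroring the formulas above (a sketch; the formula body is reconstructed from the surrounding definitions):

```python
import math

def encode_target(true_x, true_y, true_w, true_h, ox, oy, ow, oh, down=8):
    true_dx = true_x / down - ox        # offset of the true center from cell position (ox, oy)
    true_dy = true_y / down - oy
    true_dw = math.log(true_w / ow)     # log-ratio of the true size to basic frame (ow, oh)
    true_dh = math.log(true_h / oh)
    return true_dx, true_dy, true_dw, true_dh
```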
The (dx, dy, dw, dh) obtained by the neural network module 300 is compared with the true values (true_dx, true_dy, true_dw, true_dh) and an error value is calculated as follows:
Loss=loss1+loss2+loss3+loss4
where Loss is an error value, including object positioning offset Loss functions Loss1 and Loss2, and object size regression Loss functions Loss3 and Loss4, specifically:
loss1=-true_dx*log(dx)-(1-true_dx)*log(1-dx)
loss2=-true_dy*log(dy)-(1-true_dy)*log(1-dy)
loss3=w*(true_dw-dw)^2
loss4=w*(true_dh-dh)^2
w is the small object processing coefficient, an optimization strategy for small objects, calculated as follows:
w=2-true_w*true_h/(image_w*image_h)
wherein true_w and true_h are the width and height of the target object, and image_w and image_h are the width and height of the original image respectively.
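Assembled into code, the error value Loss looks like the following sketch; dx and dy are assumed to already be sigmoid outputs in (0, 1) so the logarithms are defined.

```python
import math

def detection_loss(dx, dy, dw, dh,
                   true_dx, true_dy, true_dw, true_dh,
                   true_w, true_h, image_w, image_h):
    w = 2 - true_w * true_h / (image_w * image_h)   # small object processing coefficient
    loss1 = -true_dx * math.log(dx) - (1 - true_dx) * math.log(1 - dx)
    loss2 = -true_dy * math.log(dy) - (1 - true_dy) * math.log(1 - dy)
    loss3 = w * (true_dw - dw) ** 2                 # size regression, weighted toward small objects
    loss4 = w * (true_dh - dh) ** 2
    return loss1 + loss2 + loss3 + loss4
```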
According to the calculated error value Loss, the parameters of the network in the neural network module 300 are updated by back propagation, so that the detection result output by the neural network module 300 comes closer to the true value; when ten consecutive groups of data have been trained and the network no longer outputs better detection results, training ends, the parameters of the trained neural network module 300 are saved, and the calculated parameters are supplied in actual use.
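The stopping rule reads like classic early stopping; below is a sketch under that interpretation, where model, data_groups and train_one_group are hypothetical placeholders rather than names from the patent.

```python
def train_until_stable(model, data_groups, train_one_group, patience=10):
    best, stale = float("inf"), 0
    for group in data_groups:
        loss = train_one_group(model, group)   # forward pass + back propagation update
        if loss < best:
            best, stale = loss, 0              # better detection result: reset the counter
        else:
            stale += 1
            if stale >= patience:              # ten consecutive groups with no improvement
                break
    return model                               # the caller then saves the trained parameters
```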
After training of the neural network module 300 is completed, the trained neural network module 300 can be used to detect the target object, which comprises the following steps.
The acquisition module 100 acquires a target image; the acquired target images are screened for anomalies, incomplete and invalid images are removed without further detection and acquisition is performed again by the acquisition module 100, while normal target images are input into the trained neural network module 300. Specifically, incomplete and invalid images include images whose quality is damaged by decoding problems caused by signal transmission problems, as well as images that contain no object to detect or repeat earlier images and thus need not be passed to the neural network module 300 for detection. After the trained neural network module 300 processes the input normal target image, the square frame at the corresponding position of the target object is obtained and the detection result is output for the user's reference.
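The in-use flow can be summarized in a short sketch, again assuming OpenCV for capture; is_valid is a hypothetical screening helper standing in for the anomaly checks described above.

```python
import cv2
import numpy as np

def is_valid(frame: np.ndarray) -> bool:
    # Hypothetical screening: reject empty or near-black frames
    # (e.g. decoding damage from signal transmission problems).
    return frame is not None and frame.size > 0 and float(frame.mean()) > 1.0

def detect_once(camera: cv2.VideoCapture, model):
    ok, frame = camera.read()
    if not ok or not is_valid(frame):
        return None          # incomplete or invalid image: skip and re-acquire next call
    return model(frame)      # the trained module returns the target's square frame
```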
Example 2
Referring to figs. 3 and 4, this embodiment provides a lightweight small object target detection system comprising an acquisition module 100, an image processing module 200, a neural network module 300 and an error detection module 400. The acquisition module 100 is configured to acquire image data and may be an image acquisition camera; the image processing module 200 can process the acquired images, screen out images meeting the requirements for detection, and, in the training stage, annotate the images to obtain annotated images; the neural network module 300 can process the input image and mark a square frame at the corresponding position of the target object; the error detection module 400 is configured to calculate the error of the detection result in the training stage and determine whether the neural network module 300 needs to be trained again.
Specifically, the image processing module 200 includes a marking tool 201, which is used to mark the target object on an image with its real position and size; the marking tool 201 may be the public annotation tool labelme.
The neural network module 300 includes a backbone network 301 and a target detection module 302. The backbone network 301 is used to extract features from an image and, borrowing the idea of the lightweight network ShuffleNetV2, is mainly formed by stacking two basic units, a separated residual convolution layer and a downsampling convolution layer. The target detection module 302, which comprises a plurality of convolution layers, can calculate the accurate position and size of the center of the target object and mark it on the image.
Because the neural network module 300 builds its backbone network 301 on the lightweight network ShuffleNetV2, its speed is greatly improved compared with a traditional neural network using darknet53 or mobilenet as the backbone; it can be deployed on both GPU and CPU and suits various use occasions. The detection speeds compare as follows:
Backbone network | Detection speed on GPU (ms) | Detection speed on CPU (ms)
darknet53 | 13 | 500 or more
mobilenet | 8 | 100 or more
shufflenetv2 | 4 | 30
It can be seen that the neural network module 300 of this embodiment, constructed with the lightweight network ShuffleNetV2 as the backbone network 301, greatly improves detection speed over traditional neural networks, particularly when deployed on a CPU; dependence on hardware devices is reduced, detection can be deployed and performed in environments with scarce computing resources, and the increased speed also improves the real-time performance of detection.
In addition, compared with the traditional yolov3 network for target detection, the neural network module 300 provided in this embodiment is optimized in both accuracy and parameter quantity, as follows:
network system Test accuracy (%) Parameter size (M)
volov3 85 127
Neural network module 300 92 9.5
Compared with the yolov3 network, the neural network module 300 provided in this embodiment removes the feature layer that yolov3 uses to detect large object targets, so that the module concentrates on detecting and identifying small object targets, and sets the parameter weight according to the object size so that very small objects are identified better and more accurately. This improves not only the detection speed but also the detection accuracy. In addition, the parameter quantity of the detection model is greatly reduced: against the more than 120M parameters of the yolov3 network, the parameter quantity in this embodiment does not exceed 10M, a very remarkable reduction.
It should be noted that the above embodiments are only for illustrating the technical solution of the present invention and not for limiting the same, and although the present invention has been described in detail with reference to the preferred embodiments, it should be understood by those skilled in the art that the technical solution of the present invention may be modified or substituted without departing from the spirit and scope of the technical solution of the present invention, which is intended to be covered in the scope of the claims of the present invention.

Claims (6)

1. The light-weight small object target detection method is characterized by comprising the following steps of: the acquisition module (100) acquires images and transmits the images to the image processing module (200); the acquisition module (100) is an image acquisition camera, acquires an image every 10 seconds, and needs to acquire under different scenes;
the image processing module (200) performs image processing on the acquired image and outputs the processed image as training data;
the image processing further comprises the following steps of summarizing all collected images, screening and filtering, and deleting repeated invalid images; marking the image by using a marking tool (201) and storing a marked image obtained after marking;
constructing a neural network module (300), fully training the neural network module (300) based on training data, and storing the trained neural network module (300);
detecting a target object using the trained neural network module (300);
the convolutional neural network is constructed based on the shufflelenet 2 of the neural network module (300), and comprises a main network (301) and a target detection module (302), wherein the main network (301) comprises a separated residual convolution layer and a downsampled convolution layer, and can perform feature extraction on images of training data to obtain deep features and shallow features;
the target detection module (302) processes the extracted deep layer features and shallow layer features, calculates the accurate position and the accurate size of the center of the target object, outputs a detection result, and has the following calculation formula:
(X,Y)=down*{(x,y)+(dx,dy)}
(W,H)=(bw,bh)*e^(dw,dh)
wherein down is a proportional value taking 4 or 8, dx and dy are offsets, x and y are positions in the feature layer, dw and dh are results of the convolution calculation layer, and (bw, bh) represents the length and width of the basic frame, whose selection is determined by the size of the target object;
an error detection module (400) comprising the steps of,
the true position offset (true_dx, true_dy) and the target normalized size (true_dw, true_dh) are calculated according to the annotation image, and the calculation formula is as follows:
wherein true_x and true_w are the center position and the size of the target on the original image respectively, down is a proportional value, 4 or 8 is taken, ox and oy are the positions on the corresponding feature layers, and ow and oh are the lengths of the corresponding basic frames;
comparing (dx, dy, dw, dh) with the true values (true_dx, true_dy, true_dw, true_dh) and calculating an error value, the calculation formula is as follows:
Loss=loss1+loss2+loss3+loss4
wherein:
loss1=-true_dx*log(dx)-(1-true_dx)*log(1-dx)
loss2=-true_dy*log(dy)-(1-true_dy)*log(1-dy)
loss3=w*(true_dw-dw)^2
loss4=w*(true_dh-dh)^2
w=2-true_w*true_h/(image_w*image_h)
where w is the small object processing coefficient, an optimization strategy for small objects; true_w and true_h are the width and height of the target object, and image_w and image_h those of the original image.
2. A lightweight small object target detection method as in claim 1, wherein: the error detection module (400) further comprises the steps of,
according to the calculated error value Loss, updating parameters of the backbone network in a back propagation mode, so that an output detection result is more approximate to a true value;
and when ten sets of data are continuously trained and the network no longer outputs better detection results, ending the training and saving the trained neural network module (300) parameters.
3. A lightweight small object target detection method as in claim 2, wherein: the detecting of the target object further comprises the steps of,
an acquisition module (100) acquires a target image;
performing anomaly screening on the acquired target image, removing incomplete images and invalid images, and inputting normal target images into a trained neural network module (300);
and the trained neural network module (300) processes the normal target image to obtain a square frame of the corresponding position of the target object.
4. A lightweight small object target detection system, characterized by: comprises an acquisition module (100), an image processing module (200), a neural network module (300) and an error detection module (400)
The acquisition module (100) is used for acquiring image data;
the image processing module (200) can process the acquired images, screen the images meeting the requirements and mark the images;
the neural network module (300) can process the input image and mark a square frame of the corresponding position of the target object;
the error detection module (400) is used for calculating the error of the detection result and judging whether the neural network module (300) needs to be trained again; comprises the steps of,
the true position offset (true_dx, true_dy) and the target normalized size (true_dw, true_dh) are calculated according to the annotation image, and the calculation formula is as follows:
wherein true_x and true_w are the center position and the size of the target on the original image respectively, down is a proportional value, 4 or 8 is taken, ox and oy are the positions on the corresponding feature layers, and ow and oh are the lengths of the corresponding basic frames;
comparing (dx, dy, dw, dh) with the true values (true_dx, true_dy, true_dw, true_dh) and calculating an error value, the calculation formula is as follows:
Loss=loss1+loss2+loss3+loss4
wherein:
loss1=-true_dx*log(dx)-(1-true_dx)*log(1-dx)
loss2=-true_dy*log(dy)-(1-true_dy)*log(1-dy)
loss3=w*(true_dw-dw)^2
loss4=w*(true_dh-dh)^2
w=2-true_w*true_h/(image_w*image_h)
where w is the small object processing coefficient, an optimization strategy for small objects; true_w and true_h are the width and height of the target object, and image_w and image_h those of the original image.
5. A lightweight small object target detection system as in claim 4, wherein: the image processing module (200) comprises a marking tool (201), wherein the marking tool (201) is used for marking a target object on an image.
6. A lightweight small object target detection system as in claim 5, wherein: the neural network module (300) comprises a backbone network (301) and a target detection module (302)
The backbone network (301) is used for extracting features of the image;
the target detection module (302) can calculate the accurate position and the size of the center of the target object and mark the accurate position and the size on the image;
processing the extracted deep features and shallow features, calculating the accurate position and size of the center of the target object, and outputting a detection result, with the following calculation formulas:
(X,Y)=down*{(x,y)+(dx,dy)}
(W,H)=(bw,bh)*e^(dw,dh)
wherein down is a proportional value taking 4 or 8, dx and dy are offsets, x and y are positions in the feature layer, dw and dh are results of the convolution calculation layer, and (bw, bh) represents the length and width of the basic frame, whose selection is determined by the size of the target object.
CN201910815228.9A 2019-08-30 2019-08-30 Lightweight small object target detection method and system Active CN110533103B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910815228.9A CN110533103B (en) 2019-08-30 2019-08-30 Lightweight small object target detection method and system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910815228.9A CN110533103B (en) 2019-08-30 2019-08-30 Lightweight small object target detection method and system

Publications (2)

Publication Number Publication Date
CN110533103A CN110533103A (en) 2019-12-03
CN110533103B (en) 2023-08-01

Family

ID=68665564

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910815228.9A Active CN110533103B (en) 2019-08-30 2019-08-30 Lightweight small object target detection method and system

Country Status (1)

Country Link
CN (1) CN110533103B (en)

Families Citing this family (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111160434A (en) * 2019-12-19 2020-05-15 中国平安人寿保险股份有限公司 Training method and device of target detection model and computer readable storage medium
CN111435437A (en) * 2019-12-26 2020-07-21 珠海大横琴科技发展有限公司 PCB pedestrian re-recognition model training method and PCB pedestrian re-recognition method
CN111144417B (en) * 2019-12-27 2023-08-01 创新奇智(重庆)科技有限公司 Intelligent container small target detection method and detection system based on teacher and student network
CN111652102A (en) * 2020-05-27 2020-09-11 国网山东省电力公司东营供电公司 Power transmission channel target object identification method and system
CN111798435A (en) * 2020-07-08 2020-10-20 国网山东省电力公司东营供电公司 Image processing method, and method and system for monitoring invasion of engineering vehicle into power transmission line
CN111898497A (en) * 2020-07-16 2020-11-06 济南博观智能科技有限公司 License plate detection method, system, equipment and readable storage medium

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108647742A (en) * 2018-05-19 2018-10-12 南京理工大学 Fast target detection method based on lightweight neural network

Patent Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108647742A (en) * 2018-05-19 2018-10-12 南京理工大学 Fast target detection method based on lightweight neural network

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Real-time face detection based on YOLOv3 and shufflenet; laygin; 《墨天轮》; 2018-10-27; pp. 1-3 *

Also Published As

Publication number Publication date
CN110533103A (en) 2019-12-03

Similar Documents

Publication Publication Date Title
CN110533103B (en) Lightweight small object target detection method and system
US20210232813A1 (en) Person re-identification method combining reverse attention and multi-scale deep supervision
CN106599051B (en) Automatic image annotation method based on generated image annotation library
CN107346420A (en) Text detection localization method under a kind of natural scene based on deep learning
CN108416985B (en) Geological disaster monitoring and early warning system and method based on image recognition
CN110322453A (en) 3D point cloud semantic segmentation method based on position attention and auxiliary network
CN105608454A (en) Text structure part detection neural network based text detection method and system
CN110298227B (en) Vehicle detection method in unmanned aerial vehicle aerial image based on deep learning
CN111832398B (en) Unmanned aerial vehicle image distribution line pole tower ground wire broken strand image detection method
CN111368825B (en) Pointer positioning method based on semantic segmentation
CN111414954B (en) Rock image retrieval method and system
CN108648169A (en) The method and device of high voltage power transmission tower defects of insulator automatic identification
CN112233067A (en) Hot rolled steel coil end face quality detection method and system
CN113888550A (en) Remote sensing image road segmentation method combining super-resolution and attention mechanism
CN112507861A (en) Pedestrian detection method based on multilayer convolution feature fusion
CN114140665A (en) Dense small target detection method based on improved YOLOv5
CN109800756A (en) A kind of text detection recognition methods for the intensive text of Chinese historical document
CN111507353B (en) Chinese field detection method and system based on character recognition
CN116012653A (en) Method and system for classifying hyperspectral images of attention residual unit neural network
CN115620178A (en) Real-time detection method for abnormal and dangerous behaviors of power grid of unmanned aerial vehicle
CN115830533A (en) Helmet wearing detection method based on K-means clustering improved YOLOv5 algorithm
CN111241905A (en) Power transmission line nest detection method based on improved SSD algorithm
CN112883964B (en) Method for detecting characters in natural scene
CN109117841B (en) Scene text detection method based on stroke width transformation and convolutional neural network
CN116129327A (en) Infrared vehicle detection method based on improved YOLOv7 algorithm

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant