WO2021098831A1 - Target detection system suitable for embedded devices - Google Patents

Target detection system suitable for embedded devices

Info

Publication number
WO2021098831A1
Authority
WO
WIPO (PCT)
Prior art keywords
sample
branch
model
network
module
Prior art date
Application number
PCT/CN2020/130499
Other languages
English (en)
French (fr)
Inventor
叶杭杨
Original Assignee
乐鑫信息科技(上海)股份有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 乐鑫信息科技(上海)股份有限公司
Priority to US 17/778,788 (published as US20220398835A1)
Publication of WO2021098831A1


Classifications

    • G06V 10/255 - Detecting or recognising potential candidate objects based on visual cues, e.g. shapes
    • G06V 10/82 - Arrangements for image or video recognition or understanding using pattern recognition or machine learning, using neural networks
    • G06F 18/241 - Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06N 3/045 - Combinations of networks
    • G06N 3/08 - Learning methods
    • G06V 10/454 - Integrating the filters into a hierarchical structure, e.g. convolutional neural networks [CNN]
    • G06V 10/7715 - Feature extraction, e.g. by transforming the feature space, e.g. multi-dimensional scaling [MDS]; Mappings, e.g. subspace methods
    • G06V 10/774 - Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting
    • G06V 10/778 - Active pattern-learning, e.g. online learning of image or video features
    • G06V 2201/07 - Target detection

Definitions

  • the invention relates to the field of target detection and online correction for embedded devices, and in particular to a target detection system suitable for embedded devices.
  • Deep learning methods show better results than traditional methods, but there are some shortcomings in practical applications:
  • the model parameters are large and occupy a lot of storage space, which is extremely disadvantageous for resource-scarce embedded devices.
  • as a result, such a network can only be deployed on a server, and the terminal device calls the server's interface over the network to perform target detection. Once the network is unavailable, none of these functions can be realized.
  • to achieve offline target detection on the terminal device, the simplest method is to simplify the model into a small network model for target detection.
  • a small network model shrinks the detection model while reducing the number of parameters and the amount of computation, making offline target detection on embedded devices feasible, but such a network structure has limited expressive power and cannot adapt to all background conditions. For example, experiments showed that the detection rate of the small network model dropped significantly when detecting targets in a darker environment.
  • the purpose of the present invention is to provide embedded devices with a target detection system that has good expressive power and can use an actual training set for effective model training and correction, mainly solving the problems in the above-mentioned prior art.
  • the technical solution adopted by the present invention is to provide a target detection system suitable for embedded devices, which is characterized in that it includes an embedded device; the embedded device runs local business logic and target detection logic;
  • the target detection logic is composed of a multi-layer structure containing multiple branch modules and a result merging module; each branch module is composed of a shared basic network, a private basic network, and a detection module;
  • the shared basic network of the first-layer branch module accepts the target detection input image; except for the first-layer branch module, the parameters of the shared basic network of each remaining branch module come directly from the output of the shared basic network of the layer above;
  • the output of the shared basic network is used as the input of the private basic network;
  • the private basic network outputs a feature map, which is used as the input of the detection module;
  • the output of the detection module is the output of the branch module of a single layer;
  • the result merging module merges the outputs of the branch modules of each layer, and outputs the target detection result;
  • the local business logic takes the target detection result as an input, and uses the target detection result to further complete the service.
  • the shared basic network is formed by stacking a plurality of basic network blocks; in the shared basic network of the first-layer branch module, the first basic network block is a CNN block and the remaining basic network blocks are MobileNet blocks; in the shared basic networks of the branch modules of the other layers, all basic network blocks are MobileNet blocks; within the shared basic network, the number of MobileNet blocks is dynamically increased or decreased with the difficulty of the target.
  • the private basic network is formed by stacking a plurality of MobileNet network blocks, and the number of the MobileNet network blocks dynamically increases or decreases with the expressive power; the parameters of the private basic network are only valid for the current branch module.
  • the detection module divides the feature map into a first branch, a second branch, and a third branch; the first branch is composed of one MobileNet network block, the second branch is composed of two MobileNet network blocks, and the third branch is composed of three MobileNet network blocks;
  • after passing through the first branch and the third branch, the number of feature dimensions of the feature map stays the same; after passing through the second branch, it doubles; the detection module merges the feature maps of the first, second, and third branches and, after convolution, obtains the score, the detection frame, and the key points as the output of the branch module of the current layer.
  • the model online self-calibration system includes sample collection logic running on the embedded device, and a sample labeling module and a model correction module running on the server;
  • the sample collection logic collects samples, saves them in a sample library, and uploads the sample library to the server from time to time;
  • the sample labeling module labels the images in the sample library to form a labeled sample library; the labeled sample library is then used by the model correction module to complete the calibration of the model network parameters, and the calibrated model network parameters are delivered to and updated on the embedded device.
  • the sample collection function of the sample collection logic is started by a timer trigger or a business trigger; once triggered, the sample collection logic performs the following steps:
  • Step 1.1 Set the detection result queue to be empty;
  • Step 1.2 Obtain a new frame of image, perform target detection, and send the image together with its detection result into the detection result queue;
  • Step 1.3 In the detection result queue, take the image whose detection result was most recently "object detected" as the starting point and scan toward the end of the queue; if the next image whose detection result is "object detected" is encountered, take that image as the end point and go to step 1.4, otherwise go to step 1.2;
  • Step 1.4 Count the number Z of images in the interval from the starting point to the end point of step 1.3 whose detection result is "no object detected";
  • Step 1.5 If Z is greater than Z_threshold, go back to step 1.1; if Z is less than or equal to Z_threshold, extract one frame from the Z frames of images, store it in the sample library, and terminate this round of sample collection.
  • the sample library of the sample collection logic has a limited capacity N; when the number of existing samples in the sample library is greater than or equal to the limited capacity N, a new sample replaces the oldest sample in the sample library;
  • after receiving the sample library uploaded by the embedded device, the server deletes duplicate images in the sample library by calculating the similarity between the images in it.
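As an illustration of the duplicate-removal step above, the following is a minimal sketch that measures image similarity with a tiny average-hash. The patent does not specify the similarity measure, so the hash and the Hamming-distance threshold here are assumptions.

```python
from PIL import Image

def average_hash(path, hash_size=8):
    """Tiny perceptual hash: resize to an 8x8 grayscale image and threshold at the mean.
    (An assumed example; the patent only says duplicates are removed by similarity.)"""
    img = Image.open(path).convert("L").resize((hash_size, hash_size))
    pixels = list(img.getdata())
    mean = sum(pixels) / len(pixels)
    return tuple(p > mean for p in pixels)

def deduplicate(paths, max_hamming=5):
    """Keep only images whose hash differs from every kept image by more than
    max_hamming bits; near-duplicates are dropped."""
    kept, hashes = [], []
    for p in paths:
        h = average_hash(p)
        if all(sum(a != b for a, b in zip(h, kh)) > max_hamming for kh in hashes):
            kept.append(p)
            hashes.append(h)
    return kept
```

Any similarity measure with the same "drop near-duplicates, keep the rest" behavior would serve equally well here.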
  • the sample labeling work performed by the sample labeling module includes the following steps:
  • Step 2.1 Extract an image from the sample library, send it to multiple super-large networks at the same time for target recognition, and obtain the target recognition results;
  • Step 2.2 Calculate the difficulty coefficient λ of the image from the target recognition results;
  • Step 2.3 If the difficulty coefficient λ of the image is less than or equal to the difficulty threshold λ_threshold, classify the image as a second-level difficult sample; for a second-level difficult sample, remove the image from the sample library, integrate the target recognition results of the multiple super-large networks to complete automatic labeling, and put it into the labeled sample library;
  • Step 2.4 If the difficulty coefficient λ of the image is greater than the difficulty threshold λ_threshold, classify the image as a first-level difficult sample; for a first-level difficult sample, remove the image from the sample library, save it separately, and complete the labeling manually; after manual labeling, put the image into the labeled sample library;
  • Step 2.5 If there are still unprocessed images in the sample library, go back to step 2.1; otherwise the sample labeling work is complete.
  • step 2.2 specifically includes the following sub-steps:
  • Step 2.2.1 Select the target recognition result of one of the super-large networks as the benchmark result;
  • Step 2.2.2 Calculate the IoU between the detection frames in the target recognition results of the other super-large networks and the detection frames in the benchmark result;
  • Step 2.2.3 For each of the super-large networks, from its multiple output target recognition results, select the target recognition result whose IoU is the largest and greater than the threshold C_threshold, and group it with the corresponding benchmark result; target recognition results that cannot be grouped form independent groups;
  • Step 2.2.4 Calculate the difficulty coefficient λ from the number of detection frames in each group.
  • step 2.3 is expanded into the following steps:
  • Step 2.3.1 If the difficulty coefficient λ of the image is less than or equal to the difficulty threshold λ_threshold, classify the image as a second-level difficult sample;
  • Step 2.3.2 Remove the image from the sample library;
  • Step 2.3.3 For the second-level difficult sample, discard the target recognition results that form independent groups, and calculate the average of the detection frames in each non-independent group as the final label of the sample, completing the automatic labeling.
  • the work of the model correction module includes the following steps:
  • Step 3.1 Divide the labeled sample library into an actual training set and an actual verification set; use the publicly obtained general samples as the public verification set;
  • Step 3.2 Calculate the LOSS values of the original model in the public verification set and the actual verification set respectively;
  • Step 3.3 Divide the actual training set into multiple groups, and use the original model as a pre-training model
  • Step 3.4 Select a set of data in the actual training set
  • Step 3.5 Perform model training on the pre-training model to obtain a post-training model
  • Step 3.6 Calculate the LOSS values of the trained model in the public verification set and the actual verification set respectively;
  • Step 3.7 If the difference between the LOSS values of the original model and the trained model on the public verification set is greater than the threshold L_threshold, and the difference between their LOSS values on the actual verification set is greater than the threshold I_threshold, go to step 3.8, otherwise go to step 3.9;
  • Step 3.8 If there is still data that has not participated in training in the actual training set, set the post-training model as the new pre-training model, and skip to step 3.4, otherwise, go to step 3.9;
  • Step 3.9 stop training; after stopping training, use the network parameters of the trained model as the output of the model correction module.
  • by sharing parameters between the shared basic networks and dynamically adjusting the number of layers of the shared basic networks and the private basic networks, the invention reduces the overall network parameters and the amount of computation.
  • the model correction system included in the present invention collects the difficult samples encountered by the embedded device in its current environment and submits them to the server from time to time; the server's large target detection models automatically label the samples, and the labeled samples are then used to train and update the network model of the embedded device.
  • in view of the above technical features, the present invention has the following advantages:
  • it is not limited by the scarce resources and limited computing speed of embedded devices, and still achieves good performance on embedded devices;
  • the sample library does not need to be uploaded in real time, which greatly reduces the network dependence of embedded devices;
  • automatic labeling by the large target detection models on the server reduces the workload of manual labeling;
  • the embedded device can use the results of the large target detection models on the server to update its own model network parameters and complete model upgrades more efficiently.
  • Figure 1 is a system structure diagram of a preferred embodiment of the present invention
  • Figure 2 is a network structure diagram of a deep learning network in a preferred embodiment of the present invention.
  • Figure 3 is a schematic diagram of the structure of a shared basic network in a preferred embodiment of the present invention.
  • FIG. 4 is a schematic diagram of the structure of a detection module in a preferred embodiment of the present invention.
  • Figure 5 is a flow chart of sample collection logic in a preferred embodiment of the present invention.
  • Figure 6 is a flowchart of a sample labeling module in a preferred embodiment of the present invention.
  • FIG. 7 is an example diagram of groupings for calculating difficulty coefficients in a preferred embodiment of the present invention.
  • Fig. 8 is a flowchart of a model correction module in a preferred embodiment of the present invention.
  • 1 - branch module, 1.1 - shared basic network, 1.2 - private basic network, 1.3 - detection module, 2 - result merging module, 3.1 - network block, 3.2 - optional network block, 4.1 - first branch, 4.2 - second branch, 4.3 - third branch, 5 - embedded device, 5.1 - target detection logic, 5.2 - local business logic, 5.3 - sample collection logic, 6 - server, 6.1 - sample labeling module, 6.2 - model correction module, 7 - sample library, 8 - network model parameters, 9 - Faster-RCNN network, 10 - SSD network.
  • referring to Figure 1, a target detection system suitable for embedded devices of the present invention includes an embedded device 5 and a server 6. Remote business logic runs on the server 6; target detection logic 5.1 and local business logic 5.2 run on the embedded device 5.
  • the target detection logic 5.1 contains a deep learning network model.
  • a target detection system suitable for embedded devices of the present invention also includes a model online self-calibration system, which is used to solve the problem of reduced learning ability caused by the small model's reduced parameter count (adopted to reduce the amount of computation).
  • the online self-calibration system includes sample collection logic 5.3 running on the embedded device 5, sample labeling module 6.1 and model correction module 6.2 running on the server 6;
  • on the embedded device 5, all actually collected images enter the target detection logic 5.1, and the detection results of the target detection logic 5.1 are sent to the local business logic 5.2 and the sample collection logic 5.3, respectively.
  • the local business logic 5.2 completes the business-related logic, and the sample collection logic 5.3 is used as a part of the online self-calibration system.
  • samples collected under its control are placed in the sample library 7 in preparation for the subsequent correction.
  • the samples in the sample library 7 can be transmitted to the server 6 through various methods such as Bluetooth and Wi-Fi.
  • after the sample library 7 is uploaded to the server 6, duplicate pictures are deleted by calculating the similarity between pictures, and the samples enter the sample labeling module 6.1.
  • the labeled samples are used as the training set and the test set in the model correction module 6.2 to train new target detection network model parameters 8, and the updated network model parameters 8 are then deployed to the embedded device 5.
  • the deep learning network model in the target detection logic consists of a multi-layer structure containing multiple branch modules 1 and a result merging module 2.
  • the network consists of several branch modules 1: M1, M2...Mx.
  • Each branch module 1 corresponds to one or more anchors.
  • for example, the design may be: (1) the number of branch modules is 2, namely M1 and M2; (2) M1 corresponds to one anchor size, 16×16; (3) M2 corresponds to two anchor sizes, 32×32 and 64×56;
  • with this configuration, the model can detect targets near the set anchor sizes.
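To make the anchor configuration concrete, here is a minimal sketch of how anchors of the configured sizes could be laid out over a branch module's feature-map grid. The function name, the anchor-generation details, and the stride value used for M2 are illustrative assumptions; the patent only specifies the branch-to-anchor-size mapping and the 1/8 stride of backbone_1.

```python
import numpy as np

def make_anchors(feat_h, feat_w, stride, sizes):
    """Build (cx, cy, w, h) anchors: one anchor per configured size at the
    centre of every feature-map cell of a branch with the given stride."""
    ys, xs = np.meshgrid(np.arange(feat_h), np.arange(feat_w), indexing="ij")
    centres = np.stack([(xs + 0.5) * stride, (ys + 0.5) * stride], axis=-1)
    anchors = []
    for w, h in sizes:
        wh = np.zeros_like(centres)
        wh[..., 0], wh[..., 1] = w, h
        anchors.append(np.concatenate([centres, wh], axis=-1).reshape(-1, 4))
    return np.concatenate(anchors, axis=0)

# Branch M1 at stride 8 with one anchor size, M2 (assumed stride 16) with two.
m1_anchors = make_anchors(40, 40, stride=8, sizes=[(16, 16)])
m2_anchors = make_anchors(20, 20, stride=16, sizes=[(32, 32), (64, 56)])
```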
  • Each branch module 1 is composed of three major components: shared basic network 1.1, private basic network 1.2 and detection module 1.3.
  • Shared basic network 1.1 is formed by stacking MobileNet network blocks.
  • MobileNet is a network structure suitable for mobile devices. Compared with CNN, it greatly reduces the amount of calculation and parameters, and at the same time has the "scaling" feature of CNN.
  • the design of the shared basic network 1.1 (backbone_1) of the first layer is different from that of the shared basic network 1.1 of other layers: In order to prevent MobileNet from losing too many features, the first layer uses CNN.
  • the function of the shared basic network 1.1 is mainly to determine the scaling ratio of the branch module through the stride.
  • taking the design of backbone_1 as an example, the cumulative product of its strides is 8; that is, the feature map obtained by this branch module is 1/8 of the original image in size.
  • when the detected objects are relatively large, a large stride can be used, which quickly reduces the size of the feature map and reduces the number of parameters and the amount of computation.
  • the shallow shared basic network 1.1 shares parameters with the deep shared basic network 1.1, reducing overall network parameters and calculations. For example, the output of backbone_1 becomes the input of backbone_2, the output of backbone_2 becomes the input of backbone_3, and so on.
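A minimal PyTorch-style sketch of this arrangement is given below; the depthwise-separable block internals, the channel counts, and the strides (multiplying to 8 for backbone_1) are illustrative assumptions, since the patent fixes only the overall structure.

```python
import torch
import torch.nn as nn

class MobileNetBlock(nn.Module):
    """Depthwise-separable convolution block in the MobileNet style
    (illustrative internals; the patent does not fix them)."""
    def __init__(self, c_in, c_out, stride=1):
        super().__init__()
        self.block = nn.Sequential(
            nn.Conv2d(c_in, c_in, 3, stride, 1, groups=c_in, bias=False),
            nn.BatchNorm2d(c_in), nn.ReLU(inplace=True),
            nn.Conv2d(c_in, c_out, 1, bias=False),
            nn.BatchNorm2d(c_out), nn.ReLU(inplace=True),
        )

    def forward(self, x):
        return self.block(x)

# backbone_1 starts with an ordinary CNN layer (to avoid losing early features),
# followed by MobileNet blocks; its strides multiply to 8, so its feature map
# is 1/8 of the input resolution.
backbone_1 = nn.Sequential(
    nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(inplace=True),
    MobileNetBlock(16, 32, stride=2),
    MobileNetBlock(32, 64, stride=2),
)
# A deeper shared backbone takes the previous shared backbone's output as input.
backbone_2 = nn.Sequential(MobileNetBlock(64, 128, stride=2))

x = torch.randn(1, 3, 320, 320)
f1 = backbone_1(x)    # 40x40 feature map, feeds branch M1 and backbone_2
f2 = backbone_2(f1)   # 20x20 feature map, shared with branch M2
```

Because backbone_2 consumes f1 instead of the raw image, the two branches share the computation and the parameters of backbone_1.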
  • the private basic network 1.2 is also stacked by MobileNet. Different from the shared basic network 1.1, the parameters of the private basic network 1.2 are only valid for the current module and are not affected by other modules.
  • the private basic network 1.2 can also be increased or decreased based on actual detection results.
  • when the expressive power is too poor, network layers can be added to improve it; when the expressive power is acceptable, the network can be reduced to increase speed.
  • the detection module 1.3 improves the detection effect of the model by fusing the feature maps of different receptive fields.
  • the result merging module 2 of the target detection logic gathers the detection frames predicted by all the branch modules, and after NMS removes the redundant detection frames, the final prediction result is obtained.
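For the result merging just described, a small sketch using torchvision's NMS might look like this; the IoU threshold is an assumed value.

```python
import torch
from torchvision.ops import nms

def merge_branch_outputs(branch_boxes, branch_scores, iou_thresh=0.5):
    """Gather the boxes predicted by every branch module and drop redundant
    ones with NMS, as the result-merging module does.
    branch_boxes: list of (N_i, 4) tensors in (x1, y1, x2, y2) format;
    branch_scores: list of (N_i,) tensors."""
    boxes = torch.cat(branch_boxes, dim=0)
    scores = torch.cat(branch_scores, dim=0)
    keep = nms(boxes, scores, iou_thresh)
    return boxes[keep], scores[keep]
```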
  • referring to Figure 3, the shared basic network is formed by stacking multiple network blocks 3.1, and the convolutions corresponding to the dashed boxes are optional network blocks 3.2.
  • the optional network blocks 3.2 can be added or removed according to the difficulty of the detected object: if the detected object is difficult to detect, or there are many false detections, these optional network blocks 3.2 can be added; otherwise, they can be removed.
  • referring to Figure 4, the input feature map enters at the input of the detection module, carrying C dimensions of information.
  • after entering the module, the feature map is divided into the first branch 4.1, the second branch 4.2, and the third branch 4.3.
  • on the second branch 4.2, after passing through two MobileNet blocks, the number of dimensions of the feature map is increased from C to 2C.
  • the receptive field of the second branch 4.2 lies between those of the other two branches, and its number of dimensions is increased so that it carries the main feature information.
  • the features of the first branch 4.1 and the third branch 4.3 serve as auxiliary information.
  • the information of the three branches is connected together to form a new feature map.
  • the new feature map undergoes different 1 ⁇ 1 convolutions to obtain the score and the detection frame. If there is a need for key points, add a 1 ⁇ 1 convolution to get the key points.
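Putting the description of Figure 4 together, the following is a hedged PyTorch sketch of the three-branch detection module. The block internals and the output channel counts of the score, box, and key-point heads are assumptions not fixed by the text.

```python
import torch
import torch.nn as nn

def dw_block(c_in, c_out):
    """Depthwise-separable (MobileNet-style) block, stride 1 (sketch)."""
    return nn.Sequential(
        nn.Conv2d(c_in, c_in, 3, 1, 1, groups=c_in, bias=False),
        nn.BatchNorm2d(c_in), nn.ReLU(inplace=True),
        nn.Conv2d(c_in, c_out, 1, bias=False),
        nn.BatchNorm2d(c_out), nn.ReLU(inplace=True),
    )

class DetectionModule(nn.Module):
    """Three-branch head of Figure 4: branch 2 doubles the channels (C -> 2C)
    and carries the main features, branches 1 and 3 keep C channels as auxiliary
    information, and 1x1 convolutions on the concatenated map produce the score,
    the detection box, and (optionally) key points."""
    def __init__(self, c, num_keypoints=0):
        super().__init__()
        self.branch1 = dw_block(c, c)                                          # 1 block
        self.branch2 = nn.Sequential(dw_block(c, 2 * c), dw_block(2 * c, 2 * c))  # 2 blocks
        self.branch3 = nn.Sequential(dw_block(c, c), dw_block(c, c), dw_block(c, c))  # 3 blocks
        fused = c + 2 * c + c
        self.score = nn.Conv2d(fused, 2, 1)   # object/background score (assumed 2 channels)
        self.bbox = nn.Conv2d(fused, 4, 1)    # detection-box regression
        self.kpts = nn.Conv2d(fused, 2 * num_keypoints, 1) if num_keypoints else None

    def forward(self, x):
        f = torch.cat([self.branch1(x), self.branch2(x), self.branch3(x)], dim=1)
        out = {"score": self.score(f), "bbox": self.bbox(f)}
        if self.kpts is not None:
            out["kpts"] = self.kpts(f)
        return out
```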
  • the sample collection logic running on the embedded device is triggered by a user-defined condition. For example, it can be triggered periodically, starting the sample collection logic once every hour, or it can be triggered by the business: for example, when the device is enrolling a face, a picture with the result "no object detected" at that moment is very likely a missed detection, so the sample collection logic is started.
  • the workflow of the sample collection logic includes the following steps:
  • Step 501 The sample collection logic is triggered.
  • Step 502 Send the detection result of each frame to the "detection result queue", and calculate the number of consecutive failed frames Z, which specifically includes:
  • Step 502.1 Start from the last frame in which an object was detected;
  • Step 502.2 Record the number of frames in which no object is detected;
  • Step 502.3 End at the next frame in which an object is detected, and count the total number of frames in which no object was detected.
  • Step 503 Set the threshold Z_threshold. When Z is greater than Z_threshold, it is determined that there really is no object in these Z frames, and the sample collection logic ends; when Z is less than Z_threshold, it is determined that these Z frames missed an object, and the flow enters step 504.
  • Step 504 Extract 1 frame from the missed Z frames.
  • Step 505 Save this frame of picture into the sample library, and the sample collection logic ends.
  • the size of the sample library is limited; when the limit is exceeded, a new sample replaces the oldest sample. This ensures that the library does not occupy too many storage resources and keeps the sample data fresh (better reflecting recent environmental conditions).
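The sample-collection flow of steps 501-505, together with the capacity-limited sample library, can be sketched as follows. Z_THRESHOLD, the library capacity, and the choice of which frame to extract are assumed values, and detect() stands in for the device's target detection logic.

```python
from collections import deque

Z_THRESHOLD = 10      # assumed value; the patent leaves the threshold open
LIBRARY_CAP = 100     # limited capacity N of the sample library

# deque(maxlen=...) automatically evicts the oldest sample when full.
sample_library = deque(maxlen=LIBRARY_CAP)

def collect_samples(frames, detect):
    """One triggered round of sample collection: count the run Z of
    "no object detected" frames between two "object detected" frames; if
    Z <= Z_THRESHOLD the run is treated as missed detections and one of its
    frames is stored in the sample library."""
    missed_run = []
    seen_object = False
    for frame in frames:
        if detect(frame):                       # target detection on the new frame
            if seen_object and 0 < len(missed_run) <= Z_THRESHOLD:
                sample_library.append(missed_run[len(missed_run) // 2])
                return                          # this collection round ends
            seen_object = True
            missed_run = []                     # run too long: genuinely no object
        elif seen_object:
            missed_run.append(frame)
```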
  • the sample labeling module running on the server automatically labels or manually labels each frame of images in the collected sample library.
  • the specific steps are:
  • Step 601 Each frame of image in the sample library enters the sample labeling module
  • Step 602 The image samples are sent to multiple super large networks, such as YOLO, SSD, Faster-RCNN, etc.
  • Step 603 Obtain the results L_1, L_2, ..., L_X respectively.
  • Step 604 Integrate the results (L_1, L_2, ..., L_X) of the multiple super-large networks, and calculate the image difficulty coefficient λ.
  • Step 605 If the difficulty coefficient λ is less than or equal to the difficulty threshold λ_threshold, go to step 606; if the difficulty coefficient λ is greater than the difficulty threshold λ_threshold, go to step 608.
  • Step 606 Integrate the target recognition results of the multiple super-large networks to complete the automatic labeling of the image.
  • Step 607 Classify the image as a second-level difficult sample, put it into the labeled sample library, and go to step 610.
  • Step 608 Submit manual processing to complete the manual annotation of the image.
  • Step 609 Classify the image as a first-level difficult sample and put it into the labeled sample library.
  • Step 610 Form a data set.
  • in this way, a data set of difficult samples can be collected quickly while the correctness of the sample labeling is guaranteed; the final data set contains both automatically and manually labeled image samples.
  • in step 604, the specific process of calculating the sample difficulty coefficient is to group first, and then obtain the result from the grouping information.
  • the grouping steps include:
  • Step 701 Obtain the target recognition results of each super-large network.
  • Step 702 Select the target recognition result of one of the super-large networks as the reference group (that is, each of its detection frames is used as the reference detection frame of a group), and mark the target recognition results of the remaining super-large networks as to be classified.
  • Step 703 Select one super-large network to be classified, take its target recognition result, and calculate the IoU values between its detection frames and the reference detection frames.
  • Step 704 Among the detection frames to be classified, select the detection frame with the largest IoU value; if its IoU value is greater than the threshold C_threshold, the current detection frame is included in the group of the reference detection frame; detection frames that cannot be grouped each form their own group.
  • Step 705 If there is still an unprocessed super-large network, go back to step 703; otherwise the grouping ends.
  • a concrete grouping example is shown in Figure 7; in this example, the result of the Faster-RCNN network 9 is used as the reference group.
  • the IoU between detection frame 1 of the SSD network 10 and each of detection frames 1 to 5 of the Faster-RCNN network 9 is calculated; the IoU with detection frame 2 of the Faster-RCNN network 9 is the largest and greater than C_threshold, so detection frame 1 of the SSD network 10 and detection frame 2 of the Faster-RCNN network 9 are grouped together, and so on.
  • the detection frame 5 of the SSD network 10 fails to be grouped, so it is grouped independently.
  • after grouping, the number of detection frames in each group is counted and recorded as N_1 to N_k; the difficulty coefficient λ is calculated from these counts and the number of super-large networks by the formula given (as an image) in the original publication; for the example of Figure 7, λ = 0.1.
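The grouping of steps 701-705 can be sketched as below. The per-box greedy matching is a simplification of the step description, the IoU threshold is an assumed value, and the λ formula itself (published as an image) is not reproduced; the sketch only produces the groups whose sizes N_1 to N_k feed that formula.

```python
def iou(a, b):
    """IoU of two boxes given as (x, y, w, h) with (x, y) the top-left corner."""
    ax2, ay2 = a[0] + a[2], a[1] + a[3]
    bx2, by2 = b[0] + b[2], b[1] + b[3]
    iw = max(0.0, min(ax2, bx2) - max(a[0], b[0]))
    ih = max(0.0, min(ay2, by2) - max(a[1], b[1]))
    inter = iw * ih
    union = a[2] * a[3] + b[2] * b[3] - inter
    return inter / union if union > 0 else 0.0

def group_detections(reference_boxes, other_networks, c_threshold=0.5):
    """Group each remaining network's boxes against the reference network's
    boxes; boxes that cannot be matched form their own independent group.
    Returns the per-group box lists (group sizes give N_1..N_k)."""
    groups = [[box] for box in reference_boxes]          # one group per reference box
    for boxes in other_networks:                          # the other super-large networks
        for box in boxes:
            ious = [iou(box, g[0]) for g in groups[:len(reference_boxes)]]
            best = max(range(len(ious)), key=ious.__getitem__) if ious else None
            if best is not None and ious[best] > c_threshold:
                groups[best].append(box)                  # join the reference box's group
            else:
                groups.append([box])                      # independent group
    return groups
```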
  • in step 606, the specific method of automatically labeling the image is to first discard the detection frames that form independent groups, and then use the average of the detection frames in each non-independent group as the final label of the image sample.
  • the averaging is expressed by a formula (given as an image in the original publication) over x, y, w and h, where x and y are the horizontal and vertical coordinates of the upper-left corner of the detection frame, and w and h are its width and height.
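Given such groups, the automatic labeling of step 606 reduces to dropping single-member groups and averaging the remaining boxes; the following is a sketch assuming a plain per-group mean of (x, y, w, h).

```python
def auto_label(groups):
    """Automatic labeling: drop independent (single-box) groups, then average
    (x, y, w, h) over the boxes in each remaining group to obtain one final
    label per group."""
    labels = []
    for g in groups:
        if len(g) < 2:          # independent group -> discarded
            continue
        labels.append(tuple(sum(box[i] for box in g) / len(g) for i in range(4)))
    return labels
```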
  • referring to Figure 8, the labeled samples are used to fine-tune the original model so that it adapts to the current environment; the data set generated from the labeled samples is divided into an actual training set and an actual verification set, and a public data set serves as the public verification set.
  • the minimum unit of training data is a batch.
  • the correction process includes the following steps:
  • Step 801 Prepare the original model (the model from the previous correction, or the initial model if this is the first correction), and calculate the Loss values of the original model on the public verification set and the actual verification set, L_0 and I_0.
  • Step 802 Prepare one batch of the actual training set and proceed to step 803; if all samples in the actual training set have already been traversed, stop training and go to step 806.
  • Step 803 Start training.
  • Step 804 After each batch training, calculate the Loss value, L and I, of the trained model on the public verification set and the actual verification set.
  • Step 805 If L_0 - L > L_threshold and I_0 - I > I_threshold, this is regarded as an effective training pass, the network parameters of the model are updated, and the flow jumps back to step 801; otherwise, the iteration stops and the flow enters step 806.
  • Step 806 The correction is completed, and new model network data is generated.
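The correction loop of steps 801-806 can be sketched as follows; evaluate_loss and train_one_batch are hypothetical stand-ins for the project's own training utilities, and the two thresholds are assumed values.

```python
def correct_model(model, train_batches, public_val, actual_val,
                  evaluate_loss, train_one_batch,
                  l_threshold=0.01, i_threshold=0.01):
    """Sketch of the model-correction loop: keep training batch by batch while
    the loss on both the public and the actual verification sets keeps dropping
    by more than the thresholds; stop at the first ineffective pass or when the
    actual training set is exhausted."""
    l0 = evaluate_loss(model, public_val)     # baseline losses of the pre-training model
    i0 = evaluate_loss(model, actual_val)
    for batch in train_batches:               # one group of the actual training set
        candidate = train_one_batch(model, batch)
        l = evaluate_loss(candidate, public_val)
        i = evaluate_loss(candidate, actual_val)
        if (l0 - l) > l_threshold and (i0 - i) > i_threshold:
            model = candidate                 # effective training: keep the update
            l0, i0 = l, i                     # updated model becomes the new baseline
        else:
            break                             # no longer improving enough: stop iterating
    return model                              # network parameters sent back to the device
```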
  • on the embedded device, the first initial model is established using open-source data sets. Open-source data sets usually cover a wide variety of scenes and are highly rich. A model trained with such data adapts relatively evenly to all kinds of scenes. This initial model is deployed to the device first.
  • during business operation, the embedded device uses the model online self-calibration system to upload image samples to the server from time to time.
  • the model network parameters corrected by the online self-calibration system are sent back to the embedded device by the server via Bluetooth, Wi-Fi or other means, to update the network parameters in the device.

Abstract

A target detection system suitable for embedded devices comprises an embedded device (5) and a server (6). The target detection logic (5.1) running on the embedded device (5) is composed of multiple layers of shared basic networks, private basic networks and detection modules; the parameters of a shared basic network come directly from the output of the layer above. An image is processed by the shared basic network and the private basic network to obtain a feature map; after the feature map is processed by the detection module, a result merging module merges and outputs the target detection result. The target detection system further comprises a model online self-calibration system: the embedded device (5) collects samples and uploads them to the server (6) from time to time, and the server (6) labels the samples automatically and manually, trains the model, and updates it on the embedded device (5). The target detection system achieves good performance on the embedded device (5), and uses the large target detection models on the server (6) to complete automatic labeling, which reduces the workload and completes model correction more efficiently.

Description

Target detection system suitable for embedded devices
Technical field
The present invention relates to the field of target detection and online correction for embedded devices, and in particular to a target detection system suitable for embedded devices.
Background art
At present, the mainstream methods for target detection are based on deep learning. Deep learning methods show better results than traditional methods, but have some shortcomings in practical applications:
1. The amount of computation is enormous and requires dedicated accelerator chips (GPUs), which is particularly unfavorable for mobile devices, and especially for embedded devices.
2. The model has a large number of parameters and occupies a lot of storage space, which is extremely disadvantageous for resource-scarce embedded devices.
As a result, such networks can only be deployed on a server, and the terminal device calls the server's interface over the network to perform target detection. Once the network is unavailable, none of these functions can be realized.
To achieve offline target detection on the terminal device and break free of the network, the simplest method is to simplify the model into a small network model for target detection. Although a small network model shrinks the detection model while reducing the number of parameters and the amount of computation, making offline target detection on embedded devices feasible, such a network structure has limited expressive power and cannot adapt to all background conditions. For example, experiments showed that the detection rate of the small network model dropped significantly when detecting targets in a darker environment.
In addition, when training a small network model, missed detections easily occur when the pictures captured by the camera differ from the training set (in color saturation, exposure, sharpness, etc.). The solution is to learn from pictures actually captured by the camera. However, building a training set of actual data consumes a great deal of manpower and material resources and takes a long time, and if the data set is too small, the trained network will not generalize.
Summary of the invention
The purpose of the present invention is to provide embedded devices with a target detection system that has good expressive power and can use an actual training set for effective model training and correction, mainly solving the problems in the above-mentioned prior art. To achieve this purpose, the technical solution adopted by the present invention is to provide a target detection system suitable for embedded devices, characterized in that it includes an embedded device; local business logic and target detection logic run on the embedded device;
the target detection logic is composed of a multi-layer structure containing multiple branch modules and a result merging module; each branch module is composed of a shared basic network, a private basic network and a detection module; the shared basic network of the first-layer branch module accepts the target detection input image; except for the first-layer branch module, the parameters of the shared basic network of each remaining branch module come directly from the output of the shared basic network of the layer above; the output of the shared basic network serves as the input of the private basic network; the private basic network outputs a feature map, which serves as the input of the detection module; the output of the detection module is the output of the single-layer branch module; the result merging module merges the outputs of the branch modules of every layer and outputs the target detection result;
the local business logic takes the target detection result as input and uses it to further complete the business.
Further, the shared basic network is formed by stacking a plurality of basic network blocks; in the shared basic network of the first-layer branch module, the first basic network block is a CNN block and the remaining basic network blocks are MobileNet blocks; in the shared basic networks of the branch modules of the other layers, all basic network blocks are MobileNet blocks; within the shared basic network, the number of MobileNet blocks is dynamically increased or decreased with the difficulty of the target.
Further, the private basic network is formed by stacking a plurality of MobileNet blocks, and the number of MobileNet blocks is dynamically increased or decreased with the required expressive power; the parameters of the private basic network are valid only for the current branch module.
Further, the detection module divides the feature map into a first branch, a second branch and a third branch; the first branch is composed of one MobileNet block, the second branch is composed of two MobileNet blocks, and the third branch is composed of three MobileNet blocks;
after passing through the first branch and the third branch, the number of feature dimensions of the feature map stays the same; after passing through the second branch, it doubles; the detection module merges the feature maps of the first, second and third branches and, after convolution, obtains the score, the detection frame and the key points as the output of the branch module of the current layer.
Further, the system also includes a server and a model online self-calibration system; the model online self-calibration system includes sample collection logic running on the embedded device, and a sample labeling module and a model correction module running on the server;
after collecting samples, the sample collection logic saves them in a sample library and uploads the sample library to the server from time to time;
the sample labeling module labels the images in the sample library to form a labeled sample library; the labeled sample library is then used by the model correction module to complete the calibration of the model network parameters, and the calibrated model network parameters are delivered to and updated on the embedded device.
Further, the sample collection function of the sample collection logic is started by a timer trigger or a business trigger; once triggered, the sample collection logic performs the following steps:
Step 1.1: set the detection result queue to be empty;
Step 1.2: obtain a new frame of image, perform target detection, and send the image together with its detection result into the detection result queue;
Step 1.3: in the detection result queue, take the image whose detection result was most recently "object detected" as the starting point and scan toward the end of the queue; if the next image whose detection result is "object detected" is encountered, take that image as the end point and go to step 1.4, otherwise go to step 1.2;
Step 1.4: count the number Z of images in the interval from the starting point to the end point of step 1.3 whose detection result is "no object detected";
Step 1.5: if Z is greater than Z_threshold, go back to step 1.1; if Z is less than or equal to Z_threshold, extract one frame from the Z frames of images, store it in the sample library, and terminate this round of sample collection.
Further, the sample library of the sample collection logic has a limited capacity N; when the number of existing samples in the sample library is greater than or equal to the limited capacity N, a new sample replaces the oldest sample in the sample library;
after receiving the sample library uploaded by the embedded device, the server deletes duplicate images in the sample library by calculating the similarity between the images in it.
Further, the sample labeling work performed by the sample labeling module includes the following steps:
Step 2.1: extract an image from the sample library, send it to multiple super-large networks at the same time for target recognition, and obtain the target recognition results;
Step 2.2: calculate the difficulty coefficient λ of the image from the target recognition results;
Step 2.3: if the difficulty coefficient λ of the image is less than or equal to the difficulty threshold λ_threshold, classify the image as a second-level difficult sample; for a second-level difficult sample, remove the image from the sample library, integrate the target recognition results of the multiple super-large networks to complete automatic labeling, and put it into the labeled sample library;
Step 2.4: if the difficulty coefficient λ of the image is greater than the difficulty threshold λ_threshold, classify the image as a first-level difficult sample; for a first-level difficult sample, remove the image from the sample library, save it separately, and complete the labeling manually; after manual labeling, put the image into the labeled sample library;
Step 2.5: if there are still unprocessed images in the sample library, go back to step 2.1; otherwise the sample labeling work is complete.
Further, step 2.2 specifically includes the following sub-steps:
Step 2.2.1: select the target recognition result of one of the super-large networks as the benchmark result;
Step 2.2.2: calculate the IoU between the detection frames in the target recognition results of the other super-large networks and the detection frames in the benchmark result;
Step 2.2.3: for each of the super-large networks, from its multiple output target recognition results, select the target recognition result whose IoU is the largest and greater than the threshold C_threshold, and group it with the corresponding benchmark result; target recognition results that cannot be grouped form independent groups;
Step 2.2.4: calculate the difficulty coefficient λ, where:
[the formula for λ is given as an image in the original publication and is not reproduced here]
Step 2.3 is expanded into the following steps:
Step 2.3.1: if the difficulty coefficient λ of the image is less than or equal to the difficulty threshold λ_threshold, classify the image as a second-level difficult sample;
Step 2.3.2: remove the image from the sample library;
Step 2.3.3: for the second-level difficult sample, discard the target recognition results that form independent groups, and calculate the average of the detection frames in each non-independent group as the final label of the sample, completing the automatic labeling.
Further, the work of the model correction module includes the following steps:
Step 3.1: divide the labeled sample library into an actual training set and an actual verification set; use publicly available general samples as the public verification set;
Step 3.2: calculate the LOSS values of the original model on the public verification set and the actual verification set respectively;
Step 3.3: divide the actual training set into multiple groups, and use the original model as the pre-training model;
Step 3.4: select one group of data from the actual training set;
Step 3.5: perform model training on the pre-training model to obtain a post-training model;
Step 3.6: calculate the LOSS values of the post-training model on the public verification set and the actual verification set respectively;
Step 3.7: if the difference between the LOSS values of the original model and the post-training model on the public verification set is greater than the threshold L_threshold, and the difference between their LOSS values on the actual verification set is greater than the threshold I_threshold, go to step 3.8, otherwise go to step 3.9;
Step 3.8: if there is still data in the actual training set that has not participated in training, set the post-training model as the new pre-training model and go to step 3.4; otherwise go to step 3.9;
Step 3.9: stop training; after training stops, use the network parameters of the post-training model as the output of the model correction module.
By sharing parameters between the shared basic networks and dynamically adjusting the number of layers of the shared basic networks and the private basic networks, the present invention reduces the overall network parameters and the amount of computation.
The model correction system included in the present invention collects the difficult samples encountered by the embedded device in its current environment and submits them to the server from time to time; the server's large target detection models automatically label the samples, and the labeled samples are then used to train and update the network model of the embedded device.
In view of the above technical features, the present invention has the following advantages:
1. It is not limited by the scarce resources and limited computing speed of embedded devices, and still achieves good performance on embedded devices.
2. The sample library does not need to be uploaded in real time, which greatly reduces the network dependence of embedded devices.
3. Automatic labeling by the large target detection models on the server reduces the workload of manual labeling.
4. The embedded device can use the results of the large target detection models on the server to update its own model network parameters and complete model upgrades more efficiently.
Description of the drawings
Figure 1 is a system structure diagram of a preferred embodiment of the present invention;
Figure 2 is a network structure diagram of the deep learning network in a preferred embodiment of the present invention;
Figure 3 is a schematic structural diagram of the shared basic network in a preferred embodiment of the present invention;
Figure 4 is a schematic structural diagram of the detection module in a preferred embodiment of the present invention;
Figure 5 is a flowchart of the sample collection logic in a preferred embodiment of the present invention;
Figure 6 is a flowchart of the sample labeling module in a preferred embodiment of the present invention;
Figure 7 is an example diagram of grouping for calculating the difficulty coefficient in a preferred embodiment of the present invention;
Figure 8 is a flowchart of the model correction module in a preferred embodiment of the present invention.
In the figures: 1 - branch module, 1.1 - shared basic network, 1.2 - private basic network, 1.3 - detection module, 2 - result merging module, 3.1 - network block, 3.2 - optional network block, 4.1 - first branch, 4.2 - second branch, 4.3 - third branch, 5 - embedded device, 5.1 - target detection logic, 5.2 - local business logic, 5.3 - sample collection logic, 6 - server, 6.1 - sample labeling module, 6.2 - model correction module, 7 - sample library, 8 - network model parameters, 9 - Faster-RCNN network, 10 - SSD network.
Detailed description of the embodiments
The present invention is further described below in combination with specific embodiments. It should be understood that these embodiments are only used to illustrate the present invention and not to limit its scope. In addition, it should be understood that, after reading the content taught by the present invention, those skilled in the art can make various changes or modifications to it, and these equivalent forms likewise fall within the scope defined by the claims attached to this application.
Referring to Figure 1, a target detection system suitable for embedded devices of the present invention includes an embedded device 5 and a server 6. Remote business logic runs on the server 6; target detection logic 5.1 and local business logic 5.2 run on the embedded device 5. The target detection logic 5.1 contains a deep learning network model.
The target detection system suitable for embedded devices of the present invention further includes a model online self-calibration system, which is used to solve the problem of reduced learning ability caused by the small model's reduced parameter count (adopted to reduce the amount of computation). The online self-calibration system includes sample collection logic 5.3 running on the embedded device 5, and a sample labeling module 6.1 and a model correction module 6.2 running on the server 6.
On the embedded device 5, all actually collected images enter the target detection logic 5.1, and the detection results of the target detection logic 5.1 are sent to the local business logic 5.2 and the sample collection logic 5.3, respectively. The local business logic 5.2 completes the business-related logic, while the sample collection logic 5.3, as part of the online self-calibration system, collects samples under control and places them in the sample library 7 in preparation for the subsequent correction.
The samples in the sample library 7 can be transmitted to the server 6 in various ways such as Bluetooth and Wi-Fi.
After the sample library 7 is uploaded to the server 6, duplicate pictures are deleted by calculating the similarity between pictures, and the samples enter the sample labeling module 6.1. The labeled samples are used as the training set and the test set in the model correction module 6.2 to train new target detection network model parameters 8, and the updated network model parameters 8 are then deployed to the embedded device 5.
Referring to Figure 2, the deep learning network model in the target detection logic is composed of a multi-layer structure containing multiple branch modules 1 and a result merging module 2. The network consists of several branch modules 1: M1, M2, ..., Mx. Each branch module 1 corresponds to one or more anchors. For example, the design may be: (1) the number of branch modules is 2, namely M1 and M2; (2) M1 corresponds to one anchor size, 16×16; (3) M2 corresponds to two anchor sizes, 32×32 and 64×56. With this configuration, the model can detect targets near the set anchor sizes.
Each branch module 1 is composed of three major components: a shared basic network 1.1, a private basic network 1.2 and a detection module 1.3.
1. The shared basic network 1.1 is formed by stacking MobileNet blocks. MobileNet is a network structure suitable for mobile devices; compared with a CNN, it greatly reduces the amount of computation and the number of parameters while retaining the "scaling" property of a CNN. The design of the first-layer shared basic network 1.1 (backbone_1) differs from that of the shared basic networks 1.1 of the other layers: to prevent MobileNet from losing too many features, the first layer uses a CNN.
The main function of the shared basic network 1.1 is to determine the scaling ratio of the branch module through the stride. Taking the design of backbone_1 as an example, the cumulative product of its strides is 8; that is, the feature map obtained by this branch module is 1/8 of the original image in size. When the detected objects are relatively large, a large stride can be used, which quickly reduces the size of the feature map and reduces the number of parameters and the amount of computation.
The shallow shared basic networks 1.1 share parameters with the deep shared basic networks 1.1, reducing the overall network parameters and computation; for example, the output of backbone_1 becomes the input of backbone_2, the output of backbone_2 becomes the input of backbone_3, and so on.
2. The private basic network 1.2 is likewise stacked from MobileNet blocks. Unlike the shared basic network 1.1, the parameters of the private basic network 1.2 are valid only for the current module and are not affected by other modules.
The private basic network 1.2 can also be enlarged or reduced based on the actual detection results: when the expressive power is too poor, network layers can be added to improve it; when the expressive power is acceptable, the network can be reduced to increase speed.
3. The detection module 1.3 improves the detection effect of the model by fusing feature maps with different receptive fields.
The result merging module 2 of the target detection logic gathers the detection frames predicted by all branch modules, and after NMS removes the redundant detection frames, the final prediction result is obtained.
Referring to Figure 3, the shared basic network is formed by stacking multiple network blocks 3.1, and the convolutions corresponding to the dashed boxes are optional network blocks 3.2. The optional network blocks 3.2 can be added or removed according to the difficulty of the detected object: if the detected object is difficult to detect, or there are many false detections, these optional network blocks 3.2 can be added; otherwise, they can be removed.
Referring to Figure 4, the input feature map enters at the input of the detection module, carrying C dimensions of information. After entering the module, the feature map is divided into the first branch 4.1, the second branch 4.2 and the third branch 4.3. On the second branch 4.2, after passing through two MobileNet blocks, the number of dimensions of the feature map is increased from C to 2C. The receptive field of the second branch 4.2 lies between those of the other two branches, and its number of dimensions is increased so that it carries the main feature information. The features of the first branch 4.1 and the third branch 4.3 serve as auxiliary information. Finally the information of the three branches is concatenated to form a new feature map. The new feature map passes through different 1×1 convolutions to obtain the score and the detection frame; if key points are required, another 1×1 convolution is added to obtain the key points.
Referring to Figure 5, the sample collection logic running on the embedded device is started by a user-defined trigger condition. For example, it can be triggered periodically, starting the sample collection logic once every hour, or it can be triggered by the business: for example, when the device is enrolling a face, a picture with the result "no object detected" at that moment is very likely a missed detection, so the sample collection logic is started. The workflow of the sample collection logic includes the following steps:
Step 501: the sample collection logic is triggered.
Step 502: send the detection result of each frame into the "detection result queue" and calculate the number Z of consecutively failed frames, which specifically includes:
Step 502.1: start from the last frame in which an object was detected;
Step 502.2: record the number of frames in which no object is detected;
Step 502.3: end at the next frame in which an object is detected, and count the total number of frames in which no object was detected.
Step 503: set the threshold Z_threshold; when Z is greater than Z_threshold, it is determined that there really is no object in these Z frames, and the sample collection logic ends; when Z is less than Z_threshold, it is determined that these Z frames missed an object, and the flow enters step 504.
Step 504: extract one frame from the Z missed frames.
Step 505: store this frame in the sample library, and the sample collection logic ends.
The size of the sample library is limited; when the limit is exceeded, a new sample replaces the oldest sample. This ensures that the library does not occupy too many storage resources and keeps the sample data fresh (better reflecting recent environmental conditions).
Referring to Figure 6, the sample labeling module running on the server performs automatic or manual labeling on every frame of image in the collected sample library. The specific steps are:
Step 601: each frame of image in the sample library enters the sample labeling module;
Step 602: the image sample is sent to multiple super-large networks, such as YOLO, SSD and Faster-RCNN;
Step 603: obtain the results L_1, L_2, ..., L_X respectively;
Step 604: integrate the results (L_1, L_2, ..., L_X) of the multiple super-large networks and calculate the image difficulty coefficient λ;
Step 605: if the difficulty coefficient λ is less than or equal to the difficulty threshold λ_threshold, go to step 606; if the difficulty coefficient λ is greater than the difficulty threshold λ_threshold, go to step 608;
Step 606: integrate the target recognition results of the multiple super-large networks to complete the automatic labeling of the image;
Step 607: classify the image as a second-level difficult sample, put it into the labeled sample library, and go to step 610;
Step 608: submit the image for manual processing to complete its manual labeling;
Step 609: classify the image as a first-level difficult sample and put it into the labeled sample library;
Step 610: form a data set.
In this way, a data set of difficult samples can be collected quickly while the correctness of the sample labeling is guaranteed. The final data set contains both automatically and manually labeled image samples.
In step 604, the specific process of calculating the sample difficulty coefficient is to group first and then obtain the result from the grouping information. The grouping steps include:
Step 701: obtain the target recognition results of each super-large network.
Step 702: select the target recognition result of one of the super-large networks as the reference group (that is, each of its detection frames is used as the reference detection frame of a group), and mark the target recognition results of the remaining super-large networks as to be classified.
Step 703: select one super-large network to be classified, take its target recognition result, and calculate the IoU values between its detection frames and the reference detection frames.
Step 704: among the detection frames to be classified, select the detection frame with the largest IoU value; if its IoU value is greater than the threshold C_threshold, the current detection frame is included in the group of the reference detection frame; detection frames that cannot be grouped each form their own group.
Step 705: if there is still an unprocessed super-large network, go back to step 703; otherwise the grouping ends.
For a concrete grouping example, see Figure 7. In this example, the result of the Faster-RCNN network 9 is used as the reference group. The IoU between detection frame 1 of the SSD network 10 and each of detection frames 1 to 5 of the Faster-RCNN network 9 is calculated; the IoU with detection frame 2 of the Faster-RCNN network 9 is found to be the largest and greater than C_threshold, so detection frame 1 of the SSD network 10 and detection frame 2 of the Faster-RCNN network 9 are grouped together, and so on. Detection frame 5 of the SSD network 10 cannot be grouped, so it forms an independent group.
After grouping, the number of detection frames in each group is counted and recorded as N_1 to N_k. The difficulty coefficient λ is calculated from these counts and the number of super-large networks by a formula that is given as an image in the original publication. Taking Figure 7 as an example, λ = 0.1 is obtained.
In step 606, the specific method of automatically labeling the image is to first discard the detection frames that form independent groups, and then use the average of the detection frames in each non-independent group as the final label of the image sample. The averaging expression is given as an image in the original publication; in it, a symbol denotes the number of super-large networks, and x, y, w and h denote respectively the horizontal and vertical coordinates of the upper-left corner of a detection frame and the width and height of the detection frame.
Referring to Figure 8, the labeled samples are used to fine-tune the original model so that it adapts to the current environment. The data set generated from the labeled samples is divided into an actual training set and an actual verification set, and a public data set serves as the public verification set. The minimum unit of training data is a batch.
The correction process includes the following steps:
Step 801: prepare the original model (the model from the previous correction, or the initial model if this is the first correction), and calculate the Loss values of the original model on the public verification set and the actual verification set, L_0 and I_0.
Step 802: prepare one batch of the actual training set and go to step 803; if all samples in the actual training set have already been traversed, stop training and go to step 806.
Step 803: start training.
Step 804: after each batch of training, calculate the Loss values of the trained model on the public verification set and the actual verification set, L and I.
Step 805: if L_0 - L > L_threshold and I_0 - I > I_threshold, this is regarded as an effective training pass, the network parameters of the model are updated, and the flow jumps back to step 801; otherwise the iteration stops and the flow enters step 806.
Step 806: the correction is completed and new model network data is generated.
On the embedded device, the first initial model is established using open-source data sets. Open-source data sets usually cover a wide variety of scenes and are highly rich; a model trained with such data adapts relatively evenly to all kinds of scenes. This initial model is deployed to the device first. During business operation, the embedded device uses the model online self-calibration system to upload image samples to the server from time to time; the model network parameters corrected by the online self-calibration system are sent back to the embedded device by the server via Bluetooth, Wi-Fi or other means, updating the network parameters in the device.
The above is only a preferred embodiment of the present invention and does not thereby limit the patent scope of the present invention. Any equivalent structural or process transformation made using the contents of the specification and drawings of the present invention, whether applied directly or indirectly in other related technical fields, is likewise included within the scope of patent protection of the present invention.

Claims (10)

  1. A target detection system suitable for embedded devices, characterized in that it comprises an embedded device; local business logic and target detection logic run on the embedded device;
    the target detection logic is composed of a multi-layer structure containing multiple branch modules and a result merging module; each branch module is composed of a shared basic network, a private basic network and a detection module; the shared basic network of the first-layer branch module accepts the target detection input image; except for the first-layer branch module, the parameters of the shared basic network of each remaining branch module come directly from the output of the shared basic network of the layer above; the output of the shared basic network serves as the input of the private basic network; the private basic network outputs a feature map, which serves as the input of the detection module; the output of the detection module is the output of the single-layer branch module; the result merging module merges the outputs of the branch modules of every layer and outputs the target detection result;
    the local business logic takes the target detection result as input and uses it to further complete the business.
  2. The target detection system according to claim 1, characterized in that the shared basic network is formed by stacking a plurality of basic network blocks; in the shared basic network of the first-layer branch module, the first basic network block is a CNN block and the remaining basic network blocks are MobileNet blocks; in the shared basic networks of the branch modules of the other layers, all basic network blocks are MobileNet blocks; within the shared basic network, the number of MobileNet blocks is dynamically increased or decreased with the difficulty of the target.
  3. The target detection system according to claim 1, characterized in that the private basic network is formed by stacking a plurality of MobileNet blocks, and the number of MobileNet blocks is dynamically increased or decreased with the required expressive power; the parameters of the private basic network are valid only for the current branch module.
  4. The target detection system according to claim 1, characterized in that the detection module divides the feature map into a first branch, a second branch and a third branch; the first branch is composed of one MobileNet block, the second branch is composed of two MobileNet blocks, and the third branch is composed of three MobileNet blocks;
    after passing through the first branch and the third branch, the number of feature dimensions of the feature map stays the same; after passing through the second branch, it doubles; the detection module merges the feature maps of the first, second and third branches and, after convolution, obtains the score, the detection frame and the key points as the output of the branch module of the current layer.
  5. The target detection system according to claim 1, characterized in that it further comprises a server and a model online self-calibration system; the model online self-calibration system comprises sample collection logic running on the embedded device, and a sample labeling module and a model correction module running on the server;
    after collecting samples, the sample collection logic saves them in a sample library and uploads the sample library to the server from time to time;
    the sample labeling module labels the images in the sample library to form a labeled sample library; the labeled sample library is then used by the model correction module to complete the calibration of the model network parameters, and the calibrated model network parameters are delivered to and updated on the embedded device.
  6. The target detection system according to claim 5, characterized in that the sample collection function of the sample collection logic is started by a timer trigger or a business trigger; once triggered, the sample collection logic performs the following steps:
    Step 1.1: set the detection result queue to be empty;
    Step 1.2: obtain a new frame of image, perform target detection, and send the image together with its detection result into the detection result queue;
    Step 1.3: in the detection result queue, take the image whose detection result was most recently "object detected" as the starting point and scan toward the end of the queue; if the next image whose detection result is "object detected" is encountered, take that image as the end point and go to step 1.4, otherwise go to step 1.2;
    Step 1.4: count the number Z of images in the interval from the starting point to the end point of step 1.3 whose detection result is "no object detected";
    Step 1.5: if Z is greater than Z_threshold, go back to step 1.1; if Z is less than or equal to Z_threshold, extract one frame from the Z frames of images, store it in the sample library, and terminate this round of sample collection.
  7. The target detection system according to claim 5, characterized in that the sample library of the sample collection logic has a limited capacity N; when the number of existing samples in the sample library is greater than or equal to the limited capacity N, a new sample replaces the oldest sample in the sample library;
    after receiving the sample library uploaded by the embedded device, the server deletes duplicate images in the sample library by calculating the similarity between the images in it.
  8. The target detection system according to claim 5, characterized in that the sample labeling work performed by the sample labeling module comprises the following steps:
    Step 2.1: extract an image from the sample library, send it to multiple super-large networks at the same time for target recognition, and obtain the target recognition results;
    Step 2.2: calculate the difficulty coefficient λ of the image from the target recognition results;
    Step 2.3: if the difficulty coefficient λ of the image is less than or equal to the difficulty threshold λ_threshold, classify the image as a second-level difficult sample; for a second-level difficult sample, remove the image from the sample library, integrate the target recognition results of the multiple super-large networks to complete automatic labeling, and put it into the labeled sample library;
    Step 2.4: if the difficulty coefficient λ of the image is greater than the difficulty threshold λ_threshold, classify the image as a first-level difficult sample; for a first-level difficult sample, remove the image from the sample library, save it separately, and complete the labeling manually; after manual labeling, put the image into the labeled sample library;
    Step 2.5: if there are still unprocessed images in the sample library, go back to step 2.1; otherwise the sample labeling work is complete.
  9. The target detection system according to claim 8, characterized in that
    step 2.2 specifically comprises the sub-steps:
    Step 2.2.1: select the target recognition result of one of the super-large networks as the benchmark result;
    Step 2.2.2: calculate the IoU between the detection frames in the target recognition results of the other super-large networks and the detection frames in the benchmark result;
    Step 2.2.3: for each of the super-large networks, from its multiple output target recognition results, select the target recognition result whose IoU is the largest and greater than the threshold C_threshold, and group it with the corresponding benchmark result; target recognition results that cannot be grouped form independent groups;
    Step 2.2.4: calculate the difficulty coefficient λ, where:
    [the formula for λ is given as an image in the original publication and is not reproduced here]
    step 2.3 is expanded into the steps:
    Step 2.3.1: if the difficulty coefficient λ of the image is less than or equal to the difficulty threshold λ_threshold, classify the image as a second-level difficult sample;
    Step 2.3.2: remove the image from the sample library;
    Step 2.3.3: for the second-level difficult sample, discard the target recognition results that form independent groups, and calculate the average of the detection frames in each non-independent group as the final label of the sample, completing the automatic labeling.
  10. The target detection system according to claim 5, characterized in that the work of the model correction module comprises the following steps:
    Step 3.1: divide the labeled sample library into an actual training set and an actual verification set; use publicly available general samples as the public verification set;
    Step 3.2: calculate the LOSS values of the original model on the public verification set and the actual verification set respectively;
    Step 3.3: divide the actual training set into multiple groups, and use the original model as the pre-training model;
    Step 3.4: select one group of data from the actual training set;
    Step 3.5: perform model training on the pre-training model to obtain a post-training model;
    Step 3.6: calculate the LOSS values of the post-training model on the public verification set and the actual verification set respectively;
    Step 3.7: if the difference between the LOSS values of the original model and the post-training model on the public verification set is greater than the threshold L_threshold, and the difference between their LOSS values on the actual verification set is greater than the threshold I_threshold, go to step 3.8, otherwise go to step 3.9;
    Step 3.8: if there is still data in the actual training set that has not participated in training, set the post-training model as the new pre-training model and go to step 3.4; otherwise go to step 3.9;
    Step 3.9: stop training; after training stops, use the network parameters of the post-training model as the output of the model correction module.
PCT/CN2020/130499 2019-11-22 2020-11-20 Target detection system suitable for embedded devices WO2021098831A1 (zh)

Priority Applications (1)

Application Number Priority Date Filing Date Title
US17/778,788 US20220398835A1 (en) 2019-11-22 2020-11-20 Target detection system suitable for embedded device

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201911153078.6A CN110909794B (zh) 2019-11-22 2019-11-22 Target detection system suitable for embedded devices
CN201911153078.6 2019-11-22

Publications (1)

Publication Number Publication Date
WO2021098831A1 true WO2021098831A1 (zh) 2021-05-27

Family

ID=69818851

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2020/130499 WO2021098831A1 (zh) Target detection system suitable for embedded devices

Country Status (3)

Country Link
US (1) US20220398835A1 (zh)
CN (1) CN110909794B (zh)
WO (1) WO2021098831A1 (zh)


Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110909794B (zh) * 2019-11-22 2022-09-13 乐鑫信息科技(上海)股份有限公司 Target detection system suitable for embedded devices
CN112118366A (zh) * 2020-07-31 2020-12-22 中标慧安信息技术股份有限公司 Method and device for transmitting face picture data
CN112183558A (zh) * 2020-09-30 2021-01-05 北京理工大学 Integrated target detection and feature extraction network based on YOLOv3
CN114913419B (zh) * 2022-05-10 2023-07-18 西南石油大学 Smart parking target detection method and system
CN116188767B (zh) * 2023-01-13 2023-09-08 湖北普罗格科技股份有限公司 Neural-network-based method and system for counting stacked boards


Family Cites Families (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108073869A (zh) * 2016-11-18 2018-05-25 法乐第(北京)网络科技有限公司 System for scene segmentation and obstacle detection
CN108573238A (zh) * 2018-04-23 2018-09-25 济南浪潮高新科技投资发展有限公司 Vehicle detection method based on a dual-network structure
CN109145798B (zh) * 2018-08-13 2021-10-22 浙江零跑科技股份有限公司 Integrated method for driving-scene target recognition and drivable-area segmentation
US10423860B1 (en) * 2019-01-22 2019-09-24 StradVision, Inc. Learning method and learning device for object detector based on CNN to be used for multi-camera or surround view monitoring using image concatenation and target object merging network, and testing method and testing device using the same
CN109919108B (zh) * 2019-03-11 2022-12-06 西安电子科技大学 Fast target detection method for remote sensing images based on a deep-hash auxiliary network
CN110047069B (zh) * 2019-04-22 2021-06-04 北京青燕祥云科技有限公司 Image detection device

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2019011249A1 (zh) * 2017-07-14 2019-01-17 腾讯科技(深圳)有限公司 Method, apparatus, device and storage medium for determining the pose of an object in an image
CN108549852A (zh) * 2018-03-28 2018-09-18 中山大学 Automatic learning method for pedestrian detectors in specific scenes based on deep network enhancement
CN108710897A (zh) * 2018-04-24 2018-10-26 江苏科海智能系统有限公司 Remote online general target detection system based on SSD-T
CN109801265A (zh) * 2018-12-25 2019-05-24 国网河北省电力有限公司电力科学研究院 Real-time foreign-object detection system for power transmission equipment based on a convolutional neural network
CN110909794A (zh) * 2019-11-22 2020-03-24 乐鑫信息科技(上海)股份有限公司 Target detection system suitable for embedded devices

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113780358A (zh) * 2021-08-16 2021-12-10 华北电力大学(保定) Real-time fitting detection method based on an anchor-free network

Also Published As

Publication number Publication date
CN110909794A (zh) 2020-03-24
CN110909794B (zh) 2022-09-13
US20220398835A1 (en) 2022-12-15


Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 20891202

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 20891202

Country of ref document: EP

Kind code of ref document: A1