WO2022252565A1 - Target detection system, method and apparatus, and device and medium - Google Patents

Target detection system, method and apparatus, and device and medium

Info

Publication number
WO2022252565A1
Authority
WO
WIPO (PCT)
Prior art keywords
detector
candidate
module
target
detection
Prior art date
Application number
PCT/CN2021/139062
Other languages
French (fr)
Chinese (zh)
Inventor
廖丹萍
Original Assignee
浙江智慧视频安防创新中心有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 浙江智慧视频安防创新中心有限公司
Publication of WO2022252565A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/20 Image preprocessing
    • G06V10/28 Quantising the image, e.g. histogram thresholding for discrimination between background and foreground patterns
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/40 Extraction of image or video features
    • G06V10/44 Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components

Definitions

  • the present disclosure relates to the technical field of deep learning, and more specifically, the present disclosure relates to a target detection system, method, device, equipment and medium.
  • Object detection is an important research direction of computer vision and digital image processing, and it is widely used in robot navigation, intelligent video surveillance, industrial inspection, aerospace and many other fields.
  • the goal of object detection is to find the object of interest in the image, including two subtasks of object location and object classification, that is, to determine the category and location of the object at the same time.
  • Algorithms based on neural networks can basically be classified into two categories: two-stage algorithms represented by Faster R-CNN and one-stage algorithms represented by YOLO and SSD.
  • The two-stage model represented by Faster R-CNN roughly includes five modules:
  • Input module: receives an input image.
  • Feature extraction module: extracts feature maps from the input image through a series of convolutional neural networks.
  • Region Proposal Network (RPN): receives the feature map and outputs the rough bounding-box position of the foreground area containing the target and the bounding-box position of the background area.
  • Candidate region extraction module: uses the bounding-box positions output by the RPN to crop the candidate background and foreground regions from the feature map, and resizes the candidate regions to the same size.
  • Detection module: classifies the obtained candidate regions and uses a bounding-box regression algorithm to further correct the bounding-box positions, obtaining the final position of the detection region.
  • The detection module needs to classify each obtained candidate region to determine which class of foreground object it belongs to, or whether it is background.
  • The prerequisite step for classification is to construct a training set of candidate-region feature maps, including the feature map and the label corresponding to each candidate region.
  • The label of a candidate region is generally determined by the intersection over union (IoU) of the candidate region and the ground-truth bounding box.
  • Usually, the detection module sets a fixed IoU threshold. When the IoU between the candidate region and a ground-truth bounding box is greater than the IoU threshold, its label is the object category contained in that ground-truth box (positive sample). If the IoU between the candidate region and every ground-truth bounding box is less than the IoU threshold, its label is the background class (negative sample).
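The fixed-threshold labelling rule described above can be sketched as follows. This is a minimal illustration rather than the patent's implementation: the corner format `(x1, y1, x2, y2)`, the names `iou` and `assign_label`, the default threshold of 0.5, and the use of `0` as the background class are all assumptions made for the example.

```python
from typing import Sequence, Tuple

# Assumed box format: (x1, y1, x2, y2) with x1 < x2 and y1 < y2.
Box = Tuple[float, float, float, float]

BACKGROUND = 0  # hypothetical id for the background (negative) class


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def assign_label(candidate: Box,
                 gt_boxes: Sequence[Box],
                 gt_classes: Sequence[int],
                 iou_threshold: float = 0.5) -> int:
    """Label a candidate with the class of its best-matching ground-truth
    box if that IoU exceeds the threshold, otherwise as background."""
    best_iou, best_class = 0.0, BACKGROUND
    for box, cls in zip(gt_boxes, gt_classes):
        value = iou(candidate, box)
        if value > best_iou:
            best_iou, best_class = value, cls
    return best_class if best_iou > iou_threshold else BACKGROUND
```

Raising `iou_threshold` in this sketch shrinks the set of candidates that receive a foreground label, which is the trade-off between candidate quality and positive-sample count that the multi-detector design addresses.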
  • the present disclosure provides a target detection system, including:
  • an input module, configured to receive the output image data;
  • a feature extraction module, configured to pass the image data through a convolutional neural network to extract a feature map;
  • the candidate area suggestion module is used to receive the feature map, and output the rough frame position of the foreground area containing the target and the frame position of the background area;
  • the candidate area extraction module is used to use the frame position output by the candidate area suggestion module to cut out the candidate background area and the foreground area from the feature map, and adjust the areas to the same size to obtain the candidate area;
  • the detection module is used to classify the obtained candidate areas, and use the frame regression algorithm to further correct the frame position of the foreground candidate area to obtain the final position of the detection target.
  • the detection module specifically includes: no less than one detector, wherein each detector is preset with a corresponding IoU threshold for classifying candidate regions into positive samples and negative samples, wherein a candidate area whose IoU with the real frame is greater than the IoU threshold is a positive sample, and a candidate area whose IoU with the real frame is smaller than the IoU threshold is a negative sample;
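  • the detection module is specifically used for: screening the candidate areas extracted by the candidate area extraction module, calculating the IoU between each candidate area and the real frame, looking up the detector whose IoU threshold corresponds to that value, and inputting the candidate area to that detector;
  • the detection module is also used for: after a candidate area is input to a detector, classifying the candidate area and adjusting its position, recalculating the IoU between the adjusted candidate area and the real label, and inputting the adjusted candidate area to the detector corresponding to its IoU value range;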
  • the number of the detectors is three, respectively the first detector, the second detector and the third detector;
  • the IoU threshold of the first detector is preset to 0.45-0.55;
  • the IoU threshold of the second detector is preset to 0.56-0.65;
  • the IoU threshold of the third detector is preset to 0.66-0.75.
  • the present disclosure can also provide a target detection method, which is applied to the above-mentioned system, and the method includes:
  • a loss function is used to compare the detection result with the ground truth label to get the loss of each detector.
  • in the step of comparing the detection result with the real label using the loss function to obtain the loss of each detector, the following also applies:
  • the loss function is a cross-entropy loss function
  • the loss function is a Smooth L1 loss function or a GIoU loss function.
  • the present disclosure can also provide a target detection device, including:
  • An image data collection module configured to collect image data and target tags corresponding to the image data, wherein the target tags include object categories and frame positions in the image;
  • a target detection module configured to input the image data to the target detection system to obtain the detection result of each detector
  • the loss calculation module is used to compare the detection result with the real label by using the loss function to obtain the loss of each detector.
  • the present disclosure can also provide a computer storage medium, on which a computer program is stored, and when the computer program is executed by a processor, it is used to realize the steps of the above-mentioned object detection method.
  • the present disclosure also provides an electronic device, including a memory, a processor, and a computer program stored on the memory and operable on the processor; when the processor executes the computer program, the steps of the above-mentioned target detection method are implemented.
  • This disclosure designs multiple detectors with different IoU thresholds and selects, for each detector, the candidate areas suited to that detector. This benefits the training of each individual detector and therefore improves detection performance.
  • FIG. 1 shows a schematic structural view of Embodiment 1 of the present disclosure
  • Figure 2 shows a schematic structural view of a preferred implementation of Example 1 of the present disclosure
  • Figure 3 shows a schematic diagram of the testing phase of Example 1 of the present disclosure
  • FIG. 4 shows a schematic flow diagram of Embodiment 2 of the present disclosure
  • FIG. 5 shows a schematic structural diagram of Embodiment 3 of the present disclosure
  • FIG. 6 shows a schematic structural diagram of Embodiment 5 of the present disclosure.
  • the present disclosure provides a target detection system, including:
  • an input module, configured to receive the output image data;
  • a feature extraction module, configured to pass the image data through a convolutional neural network to extract a feature map;
  • the candidate area suggestion module is used to receive the feature map, and output the rough frame position of the foreground area containing the target and the frame position of the background area;
  • the candidate area extraction module is used to use the frame position output by the candidate area suggestion module to cut out the candidate background area and the foreground area from the feature map, and adjust the areas to the same size to obtain the candidate area;
  • the detection module is used to classify the obtained candidate areas, and use the frame regression algorithm to further correct the frame position of the foreground candidate area to obtain the final position of the detection target.
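  • the detection module specifically includes: no less than one detector, wherein each detector is preset with a corresponding IoU threshold for classifying candidate regions into positive samples and negative samples, wherein a candidate area whose IoU with the real frame is greater than the IoU threshold is a positive sample, and a candidate area whose IoU with the real frame is smaller than the IoU threshold is a negative sample;
  • the detection module is specifically used for: screening the candidate areas extracted by the candidate area extraction module, calculating the IoU between each candidate area and the real frame, looking up the detector whose IoU threshold corresponds to that value, and inputting the candidate area to that detector;
  • the detection module is also used for: after a candidate area is input to a detector, classifying the candidate area and adjusting its position, recalculating the IoU between the adjusted candidate area and the real label, and inputting the adjusted candidate area to the detector corresponding to its IoU value range;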
  • the number of the detectors is three, respectively the first detector, the second detector and the third detector;
  • the IoU threshold of the first detector is preset to 0.45-0.55;
  • the IoU threshold of the second detector is preset to 0.56-0.65;
  • the IoU threshold of the third detector is preset to 0.66-0.75.
  • the detection module of this preferred embodiment has a total of three detectors, namely the first detector H1, the second detector H2 and the third detector H3;
  • the IoU threshold of the first detector H1 is preset to 0.5;
  • the IoU threshold of the second detector H2 is preset to 0.6;
  • the IoU threshold of the third detector H3 is preset to 0.7.
  • the candidate area is input to the first detector H1, which outputs the adjusted candidate area B1 and the classification information C1;
  • the candidate area is input to the second detector H2, which outputs the adjusted candidate area B2 and the classification information C2;
  • the candidate area is input to the third detector H3, which outputs the adjusted candidate area B3 and the classification information C3;
  • the candidate area B1 adjusted by the first detector H1 is screened, and if its IoU is between 0.6 and 0.7, the candidate area is input to the second detector H2. If the IoU between the candidate area and the real frame is higher than 0.7, the candidate area is input to the third detector H3.
  • the candidate area B2 adjusted by the second detector H2 is screened, and if the IoU between the candidate area and the real border is higher than 0.7, the candidate area is input to the third detector H3.
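The routing of candidate areas to detectors during training can be sketched as below, using the thresholds of the preferred embodiment (0.5, 0.6 and 0.7). This is a hedged illustration: the names `route_candidate` and `next_detector` are hypothetical, the treatment of values exactly on a boundary (`>=`) is an assumption, and returning `None` for candidates below 0.5 is a placeholder because this passage does not say how such candidates are handled.

```python
from typing import List, Optional, Tuple

# IoU thresholds of the preferred embodiment: H1 = 0.5, H2 = 0.6, H3 = 0.7.
DETECTOR_THRESHOLDS: List[Tuple[str, float]] = [
    ("H1", 0.5),
    ("H2", 0.6),
    ("H3", 0.7),
]


def route_candidate(iou_value: float) -> Optional[str]:
    """Return the detector with the highest threshold not exceeding
    iou_value, or None if the value is below every threshold."""
    chosen = None
    for name, threshold in DETECTOR_THRESHOLDS:
        if iou_value >= threshold:  # boundary handling is an assumption
            chosen = name
    return chosen


def next_detector(current: str, adjusted_iou: float) -> Optional[str]:
    """After `current` has adjusted a candidate, return the later detector
    that should receive it, or None if it is not passed forward."""
    order = [name for name, _ in DETECTOR_THRESHOLDS]
    target = route_candidate(adjusted_iou)
    if target is None:
        return None
    return target if order.index(target) > order.index(current) else None
```

For example, a candidate adjusted by H1 whose recomputed IoU is 0.65 would be forwarded to H2, and one at 0.75 would go straight to H3, matching the screening described for B1 and B2.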
  • the image to be detected is input to the neural network.
  • All the candidate regions B0 obtained by the candidate region extraction module are input to the detector H1 to obtain the adjusted candidate region B1.
  • Input all B1 to the detector H2 to obtain the candidate area B2.
  • Input B2 to H3 to get detection area B3 and corresponding classification information C3.
  • Use the NMS algorithm to deduplicate B3 to obtain the final detection area.
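The test-phase flow above (B0 to H1, B1 to H2, B2 to H3, then NMS on B3) can be sketched as follows. The detector interface is hypothetical: each detector is assumed to be a callable that takes a list of boxes and returns the adjusted boxes plus one `(class_id, score)` pair per box. The greedy, class-agnostic NMS and its 0.5 overlap threshold are also assumptions; the text only states that NMS is used for deduplication.

```python
from typing import Callable, List, Tuple

Box = Tuple[float, float, float, float]  # assumed (x1, y1, x2, y2)
ClassInfo = Tuple[int, float]            # assumed (class_id, score)
Detector = Callable[[List[Box]], Tuple[List[Box], List[ClassInfo]]]


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union > 0 else 0.0


def nms(boxes: List[Box], scores: List[float],
        iou_threshold: float = 0.5) -> List[int]:
    """Greedy non-maximum suppression; returns indices of kept boxes."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep: List[int] = []
    for i in order:
        # Keep a box only if it does not overlap a higher-scoring kept box.
        if all(iou(boxes[i], boxes[j]) <= iou_threshold for j in keep):
            keep.append(i)
    return keep


def cascade_inference(b0: List[Box], h1: Detector, h2: Detector,
                      h3: Detector,
                      nms_threshold: float = 0.5
                      ) -> List[Tuple[Box, ClassInfo]]:
    """Pass all candidates through H1, H2 and H3 in turn, then apply NMS."""
    b1, _ = h1(b0)
    b2, _ = h2(b1)
    b3, c3 = h3(b2)
    keep = nms(b3, [score for _, score in c3], nms_threshold)
    return [(b3[i], c3[i]) for i in keep]
```

Unlike training, no IoU-based screening happens here, since ground-truth boxes are unavailable at test time; every candidate passes through all three detectors.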
  • the present disclosure can also provide a target detection method, which is applied to the target detection system according to Embodiment 1, and the method includes:
  • S201 Collect image data and a target label corresponding to the image data, wherein the target label includes object category and frame position in the image;
  • S202 Input the image data to the target detection system to obtain the detection result of each detector
  • in the step of comparing the detection result with the real label using the loss function to obtain the loss of each detector, the following also applies:
  • the loss function is a cross-entropy loss function
  • the loss function is a Smooth L1 loss function or a GIoU loss function.
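The three loss functions named above can be written out as a short sketch. The text here does not state which loss applies to which output; the example assumes the pairing conventional in two-stage detectors, with cross-entropy on class scores and Smooth L1 or GIoU on box regression. The function names, the `beta` parameter of Smooth L1, and the `(x1, y1, x2, y2)` box format are assumptions for illustration.

```python
import math
from typing import Sequence, Tuple

Box = Tuple[float, float, float, float]  # assumed (x1, y1, x2, y2)


def cross_entropy(logits: Sequence[float], label: int) -> float:
    """Softmax cross-entropy for one sample, computed stably."""
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum - logits[label]


def smooth_l1(pred: Sequence[float], target: Sequence[float],
              beta: float = 1.0) -> float:
    """Smooth L1 loss summed over coordinates: quadratic for small
    errors (|d| < beta), linear otherwise."""
    total = 0.0
    for p, t in zip(pred, target):
        d = abs(p - t)
        total += 0.5 * d * d / beta if d < beta else d - 0.5 * beta
    return total


def giou_loss(a: Box, b: Box) -> float:
    """1 - GIoU, where GIoU = IoU - (C - union) / C and C is the area
    of the smallest box enclosing both a and b."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    enclose = ((max(a[2], b[2]) - min(a[0], b[0]))
               * (max(a[3], b[3]) - min(a[1], b[1])))
    if union <= 0 or enclose <= 0:
        return 1.0  # degenerate boxes: treat as no overlap (assumption)
    giou = inter / union - (enclose - union) / enclose
    return 1.0 - giou
```

Under these assumptions, the loss of each detector would be the sum of its classification term and its regression term, computed only on the candidates routed to that detector.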
  • the present disclosure can also provide a target detection device, including:
  • An image data collection module 301 configured to collect image data and target tags corresponding to the image data, wherein the target tags include object categories and frame positions in the image;
  • a target detection module 302 configured to input the image data to the target detection system to obtain a detection result of each detector
  • the loss calculation module 303 is configured to use a loss function to compare the detection result with the real label to obtain the loss of each detector.
  • the image data collection module 301 described in this disclosure is sequentially connected with the target detection module 302 and the loss calculation module 303.
  • the present disclosure can also provide a computer storage medium, on which a computer program is stored, and when the computer program is executed by a processor, it is used to realize the steps of the above object detection method.
  • the computer storage medium of the present disclosure may be implemented using semiconductor memory, magnetic core memory, magnetic drum memory, or magnetic disk memory.
  • Semiconductor memory, mainly used in computers, uses two main types of memory elements: MOS and bipolar.
  • MOS components are highly integrated and simple to manufacture, but slow.
  • Bipolar components are complex to manufacture, consume more power and are less integrated, but are fast.
  • NMOS is fast; for example, the access time of Intel's 1K-bit SRAM is 45 ns.
  • CMOS consumes less power; the access time of a 4K-bit CMOS static memory is 300 ns.
  • The above-mentioned semiconductor memories are all random access memories (RAM), that is, new content can be read and written at random during operation.
  • Semiconductor read-only memory (ROM) can be read at random but cannot be written during operation; it is used to store fixed programs and data.
  • ROM is divided into two types: non-rewritable fuse-type read-only memory (PROM) and rewritable read-only memory (EPROM).
  • Magnetic core memory has the characteristics of low cost and high reliability, and has more than 20 years of actual use experience. Before the mid-1970s, magnetic core memory was widely used as the main memory. Its storage capacity can reach more than 10 bits, and the fastest access time is 300 ns. The typical magnetic core memory capacity in the world is 4 MB to 8 MB, and the access cycle is 1.0 to 1.5 μs. After the rapid development of semiconductor storage replaced the magnetic core memory as the main memory, the magnetic core memory can still be used as a large-capacity expansion memory.
  • Drum memory: a magnetically recorded external memory. Its capacity is small and it is gradually being replaced by disk storage, but because of its fast access and stable, reliable operation it is still used as external memory for real-time process control computers and for medium and large computers. To meet the needs of small computers and microcomputers, ultra-small magnetic drums have appeared, which are small, light, highly reliable and easy to use.
  • Disk storage: a type of magnetically recorded external storage. It combines the advantages of magnetic drum and magnetic tape storage: its capacity is larger than that of a magnetic drum, its access speed is faster than that of magnetic tape, and it can be stored offline. Disks are therefore widely used as large-capacity external memory in various computer systems. Disks are generally divided into two categories: hard disks and floppy disks.
  • Hard disk storage: there are many types of hard disk storage. Structurally, it can be divided into interchangeable and fixed types: the platters of an interchangeable disk can be exchanged, while those of a fixed disk cannot. Both interchangeable and fixed disks come in multi-platter and single-platter structures, and both can be further divided into fixed-head and moving-head types.
  • the capacity of the fixed head type disk is small, the recording density is low and the access speed is high, but the cost is high.
  • the moving head type disk has a high recording density (up to 1000-6250 bits/inch), so it has a large capacity, but its access speed is lower than that of a fixed head disk.
  • The storage capacity of disk products can reach hundreds of megabytes, with a bit density of 6,250 bits per inch and a track density of 475 tracks per inch.
  • Multi-platter interchangeable disk storage has a large offline capacity because the disk pack can be replaced; it offers large capacity and high speed and can store large volumes of information. It is widely used in online information retrieval systems and database management systems.
  • the present disclosure also provides an electronic device, including a memory, a processor, and a computer program stored on the memory and operable on the processor.
  • When the processor executes the computer program, the steps of the above target detection method are implemented.
  • Fig. 6 is a schematic diagram of the internal structure of an electronic device in one embodiment.
  • the electronic device includes a processor, a storage medium, a memory, and a network interface connected through a system bus.
  • The storage medium of the computer device stores an operating system, a database and computer-readable instructions. The database may store control information sequences, and when the computer-readable instructions are executed by the processor, the processor may implement a target detection method.
  • The processor of the computer device provides computing and control capabilities and supports the operation of the entire computer device.
  • Computer-readable instructions may be stored in the memory of the computer device, and when executed by the processor, the computer-readable instruction may cause the processor to execute an object detection method.
  • the network interface of the computer device is used for connecting and communicating with the terminal.
  • FIG. 6 is only a block diagram of the part of the structure related to the solution of this application, and does not limit the computer equipment to which the solution is applied. A specific computer device may include more or fewer components than shown in the figure, combine some components, or have a different arrangement of components.
  • the electronic devices include, but are not limited to, smart phones, computers, tablet computers, wearable smart devices, artificial intelligence devices, power banks, etc.
  • The processor may be composed of integrated circuits, for example a single packaged integrated circuit, or multiple packaged integrated circuits with the same or different functions, including one or more central processing units (CPUs), microprocessors, digital processing chips, graphics processors, and combinations of various control chips.
  • the processor is the control core (Control Unit) of the electronic device, and uses various interfaces and lines to connect the various components of the entire electronic device, by running or executing programs or modules stored in the memory (such as executing remote data read and write programs, etc.), and call the data stored in the memory to execute various functions of the electronic device and process data.
  • the bus may be a peripheral component interconnect standard (PCI for short) bus or an extended industry standard architecture (EISA for short) bus or the like.
  • the bus can be divided into address bus, data bus, control bus and so on.
  • the bus is configured to implement communication between the memory and at least one processor.
  • Figure 6 only shows an electronic device with components; those skilled in the art can understand that the structure shown in Figure 6 does not limit the electronic device, which may include fewer or more components than shown, combine certain components, or arrange the components differently.
  • the electronic device may also include a power supply (such as a battery) for supplying power to each component.
  • The power supply may be logically connected to the at least one processor through a power management device, thereby implementing charge management, discharge management, and power management functions.
  • the power supply may also include one or more DC or AC power sources, recharging devices, power failure detection circuits, power converters or inverters, power status indicators and other arbitrary components.
  • the electronic device may also include various sensors, bluetooth modules, Wi-Fi modules, etc., which will not be repeated here.
  • the electronic device may also include a network interface.
  • The network interface may include a wired interface and/or a wireless interface (such as a Wi-Fi interface, a Bluetooth interface, etc.), and is usually used to establish a communication connection between the electronic device and other electronic devices.
  • the electronic device may further include a user interface.
  • the user interface may be a display (Display) or an input unit (such as a keyboard (Keyboard)).
  • the user interface may also be a standard wired interface or a wireless interface.
  • the display may be an LED display, a liquid crystal display, a touch-sensitive liquid crystal display, an OLED (Organic Light-Emitting Diode, organic light-emitting diode) touch device, and the like.
  • the display may also be properly referred to as a display screen or a display unit, and is used for displaying information processed in the electronic device and for displaying a visualized user interface.
  • The computer-usable storage medium may mainly include a program storage area and a data storage area, wherein the program storage area may store an operating system, an application program required by at least one function, etc., and the data storage area may store data created during use, etc.
  • The disclosed devices, apparatuses and methods can be implemented in other ways.
  • the device embodiments described above are only illustrative.
  • the division of the modules is only a logical function division, and there may be other division methods in actual implementation.
  • modules described as separate components may or may not be physically separated, and the components shown as modules may or may not be physical units, that is, they may be located in one place, or may be distributed to multiple network units. Part or all of the modules can be selected according to actual needs to achieve the purpose of the solution of this embodiment.
  • each functional module in each embodiment of the present invention may be integrated into one processing unit, or each unit may physically exist separately, or two or more units may be integrated into one unit.
  • the above-mentioned integrated units can be implemented in the form of hardware, or in the form of hardware plus software function modules.

Abstract

A target detection system, method and apparatus, and a medium and a device. The system comprises: an input module, which is used for receiving output image data; a feature extraction module, which is used for performing feature extraction on the image data by means of a convolutional neural network, so as to obtain an extracted feature map; a candidate region proposal module, which is used for receiving the feature map, and outputting a coarse bounding-box position of a foreground region that includes a target, and a bounding-box position of a background region; a candidate region extraction module, which is used for cropping the feature map to obtain a candidate background region and a candidate foreground region by using the bounding-box positions which are output by the candidate region proposal module, and adjusting the regions to the same size, so as to obtain candidate regions; and a detection module, which is used for classifying the obtained candidate regions, and further modifying a bounding-box position of a foreground candidate region by using a bounding-box regression algorithm, so as to obtain a final position of a detected target.

Description

A target detection system, method, device, equipment and medium
Technical field
The present disclosure relates to the technical field of deep learning, and more specifically, to a target detection system, method, device, equipment and medium.
Background
Object detection is an important research direction of computer vision and digital image processing, and it is widely used in robot navigation, intelligent video surveillance, industrial inspection, aerospace and many other fields. The goal of object detection is to find the objects of interest in an image. It comprises two subtasks, object localization and object classification, that is, determining the category and the location of an object at the same time.
At present, object detection that trains convolutional neural networks on large amounts of image data has become the mainstream approach in the industry. Algorithms based on neural networks can basically be classified into two categories: two-stage algorithms represented by Faster R-CNN and one-stage algorithms represented by YOLO and SSD.
The two-stage model represented by Faster R-CNN roughly includes five modules:
Input module: receives an input image.
Feature extraction module: extracts feature maps from the input image through a series of convolutional neural networks.
Region Proposal Network (RPN): receives the feature map and outputs the rough bounding-box position of the foreground area containing the target and the bounding-box position of the background area.
Candidate region extraction module: uses the bounding-box positions output by the RPN to crop the candidate background and foreground regions from the feature map, and resizes the candidate regions to the same size.
Detection module: classifies the obtained candidate regions and uses a bounding-box regression algorithm to further correct the bounding-box positions, obtaining the final position of the detection region.
The detection module needs to classify each obtained candidate region to determine which class of foreground object it belongs to, or whether it is background. The prerequisite step for classification is to construct a training set of candidate-region feature maps, including the feature map and the label corresponding to each candidate region. The label of a candidate region is generally determined by the intersection over union (IoU) of the candidate region and the ground-truth bounding box. Usually, the detection module sets a fixed IoU threshold. When the IoU between the candidate region and a ground-truth bounding box is greater than the IoU threshold, its label is the object category contained in that ground-truth box (positive sample). If the IoU between the candidate region and every ground-truth bounding box is less than the IoU threshold, its label is the background class (negative sample). Experimental observations show that when the IoU threshold is set relatively low, a large number of low-quality candidate regions are labeled as positive samples, and the detector then produces more inaccurate bounding boxes. When the IoU threshold is set relatively high, the quality of the candidate regions improves, but the number of positive samples drops sharply and the model is prone to overfitting.
Summary of the invention
In order to solve the technical problem that the accuracy of existing deep-learning-based target detection algorithms is not high enough, the present disclosure provides a target detection system, including:
an input module, configured to receive the output image data;
a feature extraction module, configured to pass the image data through a convolutional neural network to extract a feature map;
a candidate region proposal module, configured to receive the feature map and output the rough bounding-box position of the foreground area containing the target and the bounding-box position of the background area;
a candidate region extraction module, configured to use the bounding-box positions output by the candidate region proposal module to crop the candidate background and foreground regions from the feature map, and to resize the regions to the same size to obtain candidate regions;
a detection module, configured to classify the obtained candidate regions and to use a bounding-box regression algorithm to further correct the bounding-box position of each foreground candidate region, obtaining the final position of the detected target.
Further,
the detection module specifically comprises at least one detector, wherein each detector is preset with a corresponding intersection-over-union (IoU) threshold used to classify candidate regions into positive samples and negative samples, a candidate region whose IoU with a ground-truth box is greater than the IoU threshold being a positive sample and a candidate region whose IoU with the ground-truth box is less than the IoU threshold being a negative sample;
the detection module is specifically configured to:
screen the candidate regions extracted by the candidate region extraction module, compute the IoU between each candidate region and the ground-truth box, find the detector whose IoU threshold corresponds to that IoU value, and input the candidate region to that detector.
Further, the detection module is also configured to:
after a candidate region is input to a detector, classify the candidate region and adjust its position, recompute the IoU between the adjusted candidate region and the ground-truth label, and input the adjusted candidate region to the detector whose IoU value range contains the recomputed IoU.
Further, there are three detectors: a first detector, a second detector and a third detector;
the IoU threshold of the first detector is preset to 0.45 to 0.55;
the IoU threshold of the second detector is preset to 0.56 to 0.65;
the IoU threshold of the third detector is preset to 0.66 to 0.75.
To achieve the above technical purpose, the present disclosure further provides a target detection method applied to the above system, the method comprising:
collecting image data and target labels corresponding to the image data, wherein the target labels include the object classes and bounding-box positions in the image;
inputting the image data to the target detection system to obtain the detection result of each detector;
comparing the detection results with the ground-truth labels using a loss function to obtain the loss of each detector.
Further, after the step of comparing the detection results with the ground-truth labels using a loss function to obtain the loss of each detector, the method further comprises:
summing the losses of all the detectors to obtain the overall loss of the target detection system.
Further, when the system is used for target classification, the loss function is a cross-entropy loss function;
when the system is used for position regression, the loss function is a Smooth L1 loss function or a GIoU loss function.
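For reference, the loss functions named above are commonly written as follows. The notation is an assumption introduced here, not taken from the source: $K$ classes, one-hot label $y$, predicted probabilities $p$, coordinate residual $x$, predicted box $A$, ground-truth box $B$, smallest enclosing box $C$, and $T$ detectors. The unweighted sum over detectors is likewise an assumption, since the source only states that the losses are added.

```latex
% Cross-entropy classification loss
L_{\mathrm{cls}} = -\sum_{k=1}^{K} y_k \log p_k

% Smooth L1 on a coordinate residual x (transition point 1 is the common default)
\mathrm{smooth}_{L_1}(x) =
\begin{cases}
0.5\,x^2   & \text{if } |x| < 1 \\
|x| - 0.5  & \text{otherwise}
\end{cases}

% GIoU loss
L_{\mathrm{GIoU}} = 1 - \left( \mathrm{IoU}(A, B) - \frac{|C \setminus (A \cup B)|}{|C|} \right)

% Overall loss of the system as the sum of the per-detector losses
L = \sum_{t=1}^{T} \left( L_{\mathrm{cls}}^{(t)} + L_{\mathrm{loc}}^{(t)} \right)
```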
To achieve the above technical purpose, the present disclosure further provides a target detection apparatus, comprising:
an image data collection module, configured to collect image data and target labels corresponding to the image data, wherein the target labels include the object classes and bounding-box positions in the image;
a target detection module, configured to input the image data to the target detection system to obtain the detection result of each detector;
a loss calculation module, configured to compare the detection results with the ground-truth labels using a loss function to obtain the loss of each detector.
To achieve the above technical purpose, the present disclosure further provides a computer storage medium on which a computer program is stored; when executed by a processor, the computer program implements the steps of the above target detection method.
To achieve the above technical purpose, the present disclosure further provides an electronic device, comprising a memory, a processor, and a computer program stored in the memory and executable on the processor; the processor implements the steps of the above target detection method when executing the computer program.
The beneficial effects of the present disclosure are:
Compared with conventional target detection systems and algorithm models, the present disclosure provides multiple detectors with different IoU thresholds and selects, for each detector, the candidate regions suited to that detector. This benefits the training of each individual detector and can therefore improve detection performance.
Brief Description of the Drawings
FIG. 1 is a schematic structural diagram of Embodiment 1 of the present disclosure;
FIG. 2 is a schematic structural diagram of a preferred implementation of Embodiment 1 of the present disclosure;
FIG. 3 is a schematic diagram of the test phase of Embodiment 1 of the present disclosure;
FIG. 4 is a schematic flowchart of Embodiment 2 of the present disclosure;
FIG. 5 is a schematic structural diagram of Embodiment 3 of the present disclosure;
FIG. 6 is a schematic structural diagram of Embodiment 5 of the present disclosure.
Detailed Description
Embodiments of the present disclosure are described below with reference to the accompanying drawings. It should be understood that these descriptions are exemplary only and are not intended to limit the scope of the present disclosure. In addition, descriptions of well-known structures and techniques are omitted below to avoid unnecessarily obscuring the concepts of the present disclosure.
The accompanying drawings show various schematic structural diagrams according to embodiments of the present disclosure. The drawings are not drawn to scale; some details are enlarged for clarity and some details may be omitted. The shapes of the regions and layers shown in the drawings, and their relative sizes and positions, are exemplary only and may deviate in practice because of manufacturing tolerances or technical limitations. Those skilled in the art may design regions or layers with different shapes, sizes and relative positions as needed.
Embodiment 1:
As shown in FIG. 1:
The present disclosure provides a target detection system, comprising:
an input module, configured to receive input image data;
a feature extraction module, configured to pass the image data through a convolutional neural network to extract a feature map;
a candidate region proposal module, configured to receive the feature map and output rough bounding-box positions of foreground regions containing targets and bounding-box positions of background regions;
a candidate region extraction module, configured to use the bounding-box positions output by the candidate region proposal module to crop candidate background regions and foreground regions from the feature map and resize them to the same size, obtaining candidate regions;
a detection module, configured to classify the obtained candidate regions and to further refine the bounding-box positions of the foreground candidate regions with a bounding-box regression algorithm, obtaining the final position of each detected target.
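The crop-and-resize step performed by the candidate region extraction module can be sketched as follows. The source does not name the resampling method, so this sketch assumes nearest-neighbour sampling on a single-channel feature map as a simple stand-in for operators such as RoI pooling; the function name and the integer box format with exclusive `x2`, `y2` are hypothetical.

```python
from typing import List, Tuple

FeatureMap = List[List[float]]  # single-channel H x W map, kept 2-D for brevity


def crop_and_resize(fmap: FeatureMap, box: Tuple[int, int, int, int],
                    out_size: int) -> FeatureMap:
    """Crop box = (x1, y1, x2, y2) in feature-map cells (x2, y2 exclusive) and
    resample it to out_size x out_size with nearest-neighbour sampling."""
    x1, y1, x2, y2 = box
    h, w = y2 - y1, x2 - x1
    if h <= 0 or w <= 0:
        raise ValueError("box must have positive width and height")
    out = []
    for i in range(out_size):
        src_y = y1 + (i * h) // out_size  # source row for output row i
        row = []
        for j in range(out_size):
            src_x = x1 + (j * w) // out_size  # source column for output column j
            row.append(fmap[src_y][src_x])
        out.append(row)
    return out
```

Because every crop is resampled to the same `out_size`, regions of different shapes can be fed to the same detector head, which is the purpose of the resizing step.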
Further, the detection module specifically comprises at least one detector, wherein each detector is preset with a corresponding intersection-over-union (IoU) threshold used to classify candidate regions into positive samples and negative samples, a candidate region whose IoU with a ground-truth box is greater than the IoU threshold being a positive sample and a candidate region whose IoU with the ground-truth box is less than the IoU threshold being a negative sample;
the detection module is specifically configured to:
screen the candidate regions extracted by the candidate region extraction module, compute the IoU between each candidate region and the ground-truth box, find the detector whose IoU threshold corresponds to that IoU value, and input the candidate region to that detector.
Further, the detection module is also configured to:
after a candidate region is input to a detector, classify the candidate region and adjust its position, recompute the IoU between the adjusted candidate region and the ground-truth label, and input the adjusted candidate region to the detector whose IoU value range contains the recomputed IoU.
Further, there are three detectors: a first detector, a second detector and a third detector;
the IoU threshold of the first detector is preset to 0.45 to 0.55;
the IoU threshold of the second detector is preset to 0.56 to 0.65;
the IoU threshold of the third detector is preset to 0.66 to 0.75.
The target detection system of the present disclosure is described in detail below with reference to a preferred implementation of Embodiment 1:
As shown in FIG. 2:
The detection module of this preferred implementation has three detectors: a first detector H1, a second detector H2 and a third detector H3;
the IoU threshold of the first detector H1 is preset to 0.5;
the IoU threshold of the second detector H2 is preset to 0.6;
the IoU threshold of the third detector H3 is preset to 0.7.
During detection by the detection module, if the IoU between a candidate region and the ground-truth box is between 0.5 and 0.6, the candidate region is input to the first detector H1. The candidate regions B1 input to the first detector H1 yield classification information C1;
if the IoU between a candidate region and the ground-truth box is between 0.6 and 0.7, the candidate region is input to the second detector H2. The candidate regions B2 input to the second detector H2 yield classification information C2;
if the IoU between a candidate region and the ground-truth box is higher than 0.7, the candidate region is input to the third detector H3. The candidate regions B3 input to the third detector H3 yield classification information C3;
At the same time, the candidate regions B1 adjusted by the first detector H1 are screened: if the IoU of an adjusted region is between 0.6 and 0.7, the region is input to the second detector H2, and if its IoU with the ground-truth box is higher than 0.7, the region is input to the third detector H3.
The candidate regions B2 adjusted by the second detector H2 are likewise screened: if the IoU between an adjusted region and the ground-truth box is higher than 0.7, the region is input to the third detector H3.
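The training-time routing rule of this preferred implementation can be sketched in Python as below. The function names are hypothetical. The source does not say which detector receives a candidate whose IoU falls exactly on a boundary, so half-open bands `[0.5, 0.6)`, `[0.6, 0.7)` and `[0.7, 1]` are assumed here; it also does not say where candidates below 0.5 are sent, so the sketch simply returns `None` for them.

```python
from typing import Optional, Sequence

THRESHOLDS = (0.5, 0.6, 0.7)  # IoU thresholds of H1, H2, H3


def route(iou_value: float,
          thresholds: Sequence[float] = THRESHOLDS) -> Optional[int]:
    """Return the index (0 for H1, 1 for H2, 2 for H3) of the detector whose
    IoU band contains iou_value, or None if it is below the first threshold."""
    chosen = None
    for idx, threshold in enumerate(thresholds):
        if iou_value >= threshold:
            chosen = idx
    return chosen


def reroute(current: int, refined_iou: float,
            thresholds: Sequence[float] = THRESHOLDS) -> Optional[int]:
    """After detector `current` has adjusted a box, forward the box only to a
    later detector whose band contains the recomputed IoU."""
    target = route(refined_iou, thresholds)
    if target is not None and target > current:
        return target
    return None
```

For example, a box that enters H1 at an IoU of 0.55 and is refined to 0.72 is forwarded to H3, while a box refined by H2 to 0.65 is not forwarded.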
As shown in FIG. 3, in the test phase the image to be detected is input to the neural network. All candidate regions B0 produced by the candidate region extraction module are input to detector H1, yielding adjusted candidate regions B1. All of B1 is input to detector H2, yielding candidate regions B2. B2 is input to H3, yielding detection regions B3 and the corresponding classification information C3. The non-maximum suppression (NMS) algorithm is then applied to B3 to remove duplicates, giving the final detection regions.
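The test-phase flow can be sketched as a chain of detectors followed by greedy NMS. This is an illustrative sketch only: each detector is modelled as a hypothetical callable that maps a list of boxes to adjusted boxes and confidence scores, the NMS overlap threshold of 0.5 is an assumed value, and per-class handling is omitted.

```python
from typing import Callable, List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # assumed corner format (x1, y1, x2, y2)
Detector = Callable[[List[Box]], Tuple[List[Box], List[float]]]


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union > 0 else 0.0


def nms(boxes: Sequence[Box], scores: Sequence[float],
        iou_threshold: float = 0.5) -> List[int]:
    """Greedy NMS: visit boxes by descending score and keep a box only if it
    does not overlap an already kept box by more than iou_threshold."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep: List[int] = []
    for i in order:
        if all(iou(boxes[i], boxes[k]) <= iou_threshold for k in keep):
            keep.append(i)
    return keep


def cascade_inference(b0: Sequence[Box], detectors: Sequence[Detector],
                      nms_iou: float = 0.5) -> List[Tuple[Box, float]]:
    """Pass all candidates through H1, H2, H3 in order, then deduplicate."""
    boxes: List[Box] = list(b0)
    scores: List[float] = [0.0] * len(boxes)
    for detector in detectors:
        boxes, scores = detector(boxes)
    return [(boxes[i], scores[i]) for i in nms(boxes, scores, nms_iou)]
```

Unlike training, no IoU-based routing is possible here because ground-truth boxes are unavailable, which is why every candidate passes through all three detectors.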
Embodiment 2:
As shown in FIG. 4,
the present disclosure further provides a target detection method applied to the target detection system of Embodiment 1, the method comprising:
S201: collecting image data and target labels corresponding to the image data, wherein the target labels include the object classes and bounding-box positions in the image;
S202: inputting the image data to the target detection system to obtain the detection result of each detector;
S203: comparing the detection results with the ground-truth labels using a loss function to obtain the loss of each detector.
Further, after the step of comparing the detection results with the ground-truth labels using a loss function to obtain the loss of each detector, the method further comprises:
summing the losses of all the detectors to obtain the overall loss of the target detection system.
Further, when the system is used for target classification, the loss function is a cross-entropy loss function;
when the system is used for position regression, the loss function is a Smooth L1 loss function or a GIoU loss function.
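A minimal Python sketch of the loss computation in S203 and the subsequent summation follows. It assumes that each detector contributes one classification loss and one regression loss, added without weights; the source only states that the detector losses are summed. The function names and the `beta` parameter are illustrative, and the GIoU alternative is not shown.

```python
import math
from typing import Sequence, Tuple


def cross_entropy(probs: Sequence[float], target: int) -> float:
    """Cross-entropy for one sample given predicted class probabilities."""
    return -math.log(max(probs[target], 1e-12))  # clamp to avoid log(0)


def smooth_l1(pred: Sequence[float], target: Sequence[float],
              beta: float = 1.0) -> float:
    """Smooth L1 summed over box coordinates: quadratic for residuals
    smaller than beta, linear otherwise."""
    total = 0.0
    for p, t in zip(pred, target):
        d = abs(p - t)
        total += 0.5 * d * d / beta if d < beta else d - 0.5 * beta
    return total


def total_loss(per_detector: Sequence[Tuple[float, float]]) -> float:
    """Sum (classification loss, regression loss) pairs over all detectors."""
    return sum(cls_loss + loc_loss for cls_loss, loc_loss in per_detector)
```

In a training loop, `total_loss` would be called once per batch with one pair of losses for each of H1, H2 and H3, and the result would be the quantity that is back-propagated.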
Embodiment 3:
As shown in FIG. 5,
the present disclosure further provides a target detection apparatus, comprising:
an image data collection module 301, configured to collect image data and target labels corresponding to the image data, wherein the target labels include the object classes and bounding-box positions in the image;
a target detection module 302, configured to input the image data to the target detection system to obtain the detection result of each detector;
a loss calculation module 303, configured to compare the detection results with the ground-truth labels using a loss function to obtain the loss of each detector.
The image data collection module 301 is connected in sequence to the target detection module 302 and the loss calculation module 303.
Embodiment 4:
The present disclosure further provides a computer storage medium on which a computer program is stored; when executed by a processor, the computer program implements the steps of the above target detection method.
The computer storage medium of the present disclosure may be implemented with semiconductor memory, magnetic core memory, magnetic drum memory or magnetic disk memory.
Semiconductor memory: the semiconductor storage elements mainly used in computers are of two types, MOS and bipolar. MOS elements offer high integration and a simple process but are relatively slow. Bipolar elements involve a complex process, high power consumption and low integration but are fast. After NMOS and CMOS appeared, MOS memory came to dominate semiconductor memory. NMOS is fast; for example, Intel's 1K-bit static random access memory has an access time of 45 ns. CMOS consumes little power; a 4K-bit CMOS static memory has an access time of 300 ns. The semiconductor memories above are all random access memories (RAM), which can be read and written with new content at random during operation. Semiconductor read-only memory (ROM), by contrast, can be read at random during operation but cannot be written, and is used to store fixed programs and data. ROM is further divided into non-rewritable fuse-type read-only memory (PROM) and rewritable read-only memory (EPROM).
Magnetic core memory: it is low in cost and highly reliable, and has more than 20 years of practical use behind it. Before the mid-1970s, magnetic core memory was widely used as main memory. Its storage capacity can reach more than 10 bits, and its fastest access time is 300 ns. Typical magnetic core memories internationally have a capacity of 4 MB to 8 MB and an access cycle of 1.0 to 1.5 μs. After rapidly developing semiconductor memory replaced magnetic core memory as main memory, magnetic core memory could still be used as large-capacity expansion memory.
Magnetic drum memory: an external memory based on magnetic recording. Because its access speed is fast and its operation is stable and reliable, it is still used as external memory for real-time process control computers and for medium and large computers, although its capacity is small and it is gradually being replaced by magnetic disk memory. To meet the needs of minicomputers and microcomputers, ultra-small magnetic drums have appeared, which are small, light, highly reliable and easy to use.
Magnetic disk memory: an external memory based on magnetic recording. It combines the advantages of magnetic drum and magnetic tape memory: its storage capacity is larger than that of a drum, its access speed is faster than that of tape, and it can be stored offline. Magnetic disks are therefore widely used as large-capacity external memory in all kinds of computer systems. Magnetic disks are generally divided into hard disks and floppy disks.
There are many kinds of hard disk memory. Structurally, they are either exchangeable or fixed: the platters of an exchangeable disk can be swapped, while those of a fixed disk cannot. Both exchangeable and fixed disks come in multi-platter and single-platter designs, and both can be further divided into fixed-head and moving-head types. Fixed-head disks have a small capacity, a low recording density and a high access speed, but are expensive. Moving-head disks have a high recording density (up to 1000 to 6250 bits per inch) and therefore a large capacity, but a lower access speed than fixed-head disks. Disk products can reach a storage capacity of several hundred megabytes, with a bit density of 6250 bits per inch and a track density of 475 tracks per inch. Multi-platter exchangeable disk memory, whose disk packs can be replaced, offers a very large offline capacity together with large capacity and high speed; it can store large volumes of information and is widely used in online information retrieval systems and database management systems.
Embodiment 5:
The present disclosure further provides an electronic device, comprising a memory, a processor, and a computer program stored in the memory and executable on the processor; the processor implements the steps of the above target detection method when executing the computer program.
FIG. 6 is a schematic diagram of the internal structure of the electronic device in one embodiment. As shown in FIG. 6, the electronic device includes a processor, a storage medium, a memory and a network interface connected through a system bus. The storage medium of the computer device stores an operating system, a database and computer-readable instructions; the database may store control information sequences, and the computer-readable instructions, when executed by the processor, cause the processor to implement a target detection method. The processor of the electronic device provides computing and control capabilities and supports the operation of the entire computer device. The memory of the computer device may store computer-readable instructions which, when executed by the processor, cause the processor to perform a target detection method. The network interface of the computer device is used to connect to and communicate with a terminal. Those skilled in the art will understand that the structure shown in FIG. 6 is only a block diagram of the part of the structure relevant to the solution of the present application and does not limit the computer device to which the solution is applied; a specific computer device may include more or fewer components than shown, combine certain components, or have a different arrangement of components.
The electronic device includes, but is not limited to, a smartphone, a computer, a tablet computer, a wearable smart device, an artificial intelligence device, a power bank and the like.
In some embodiments the processor may consist of integrated circuits, for example a single packaged integrated circuit or multiple packaged integrated circuits with the same or different functions, including one or more central processing units (CPUs), microprocessors, digital processing chips, graphics processors and combinations of various control chips. The processor is the control core (control unit) of the electronic device. It connects the components of the electronic device through various interfaces and lines, and performs the functions of the electronic device and processes data by running or executing programs or modules stored in the memory (for example, a remote data read/write program) and by calling data stored in the memory.
The bus may be a peripheral component interconnect (PCI) bus, an extended industry standard architecture (EISA) bus or the like. The bus may be divided into an address bus, a data bus, a control bus and so on. The bus is configured to provide connection and communication between the memory and at least one processor, among other components.
FIG. 6 shows only an electronic device with certain components. Those skilled in the art will understand that the structure shown in FIG. 6 does not limit the electronic device, which may include fewer or more components than shown, combine certain components, or have a different arrangement of components.
For example, although not shown, the electronic device may further include a power supply (such as a battery) that powers the components. Preferably, the power supply is logically connected to the at least one processor through a power management apparatus, so that charging management, discharging management and power consumption management are carried out through the power management apparatus. The power supply may also include one or more DC or AC power sources, a recharging apparatus, a power failure detection circuit, a power converter or inverter, a power status indicator and any other such components. The electronic device may further include various sensors, a Bluetooth module, a Wi-Fi module and the like, which are not described further here.
Further, the electronic device may include a network interface. Optionally, the network interface may include a wired interface and/or a wireless interface (such as a Wi-Fi interface or a Bluetooth interface), and is typically used to establish a communication connection between the electronic device and other electronic devices.
Optionally, the electronic device may further include a user interface, which may be a display or an input unit (such as a keyboard); optionally, the user interface may also be a standard wired or wireless interface. Optionally, in some embodiments, the display may be an LED display, a liquid crystal display, a touch-sensitive liquid crystal display, an OLED (organic light-emitting diode) touch display or the like. The display may also be referred to as a display screen or display unit, and is used to display the information processed in the electronic device and to display a visual user interface.
Further, the computer-usable storage medium may mainly include a program storage area and a data storage area, wherein the program storage area may store an operating system, an application program required by at least one function, and the like, and the data storage area may store data created through the use of blockchain nodes, and the like.
In the several embodiments provided by the present invention, it should be understood that the disclosed device, apparatus and method may be implemented in other ways. For example, the apparatus embodiments described above are merely illustrative; the division into modules is only a division by logical function, and other divisions are possible in an actual implementation.
The modules described as separate components may or may not be physically separate, and the components shown as modules may or may not be physical units; that is, they may be located in one place or distributed over multiple network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, the functional modules in the embodiments of the present invention may be integrated into one processing unit, may each exist physically on their own, or two or more of them may be integrated into one unit. The integrated unit may be implemented in hardware, or in hardware combined with software functional modules.
The embodiments of the present disclosure have been described above. These embodiments are for illustrative purposes only and are not intended to limit the scope of the present disclosure, which is defined by the appended claims and their equivalents. Those skilled in the art may make various substitutions and modifications without departing from the scope of the present disclosure, and all such substitutions and modifications fall within that scope.

Claims (10)

  1. A target detection system, characterized in that it comprises:
    an input module, configured to receive output image data;
    a feature extraction module, configured to perform feature extraction on the image data through a convolutional neural network to obtain a feature map;
    a candidate region proposal module, configured to receive the feature map and to output rough bounding-box positions of foreground regions containing targets and bounding-box positions of background regions;
    a candidate region extraction module, configured to use the bounding-box positions output by the candidate region proposal module to crop candidate background regions and foreground regions from the feature map, and to resize the regions to the same size to obtain candidate regions;
    a detection module, configured to classify the obtained candidate regions and to further refine the bounding-box positions of the foreground candidate regions using a bounding-box regression algorithm, so as to obtain the final position of the detected target.
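The cropping and resizing step of the candidate region extraction module can be pictured with a minimal sketch. The claim does not say how regions are resized, so the nearest-neighbour sampling used here is an assumption, and the function name `crop_and_resize`, the box convention `(x1, y1, x2, y2)` with exclusive upper bounds, and the toy feature map are all hypothetical.

```python
def crop_and_resize(feature_map, box, out_size):
    """Crop box=(x1, y1, x2, y2) from a 2D feature map and resize the crop
    to out_size x out_size using nearest-neighbour sampling (an assumption;
    the claim only requires that all regions end up the same size)."""
    x1, y1, x2, y2 = box
    height, width = y2 - y1, x2 - x1
    resized = []
    for row in range(out_size):
        # Map each output row back to a source row inside the box.
        src_y = y1 + min(height - 1, row * height // out_size)
        out_row = []
        for col in range(out_size):
            src_x = x1 + min(width - 1, col * width // out_size)
            out_row.append(feature_map[src_y][src_x])
        resized.append(out_row)
    return resized


# Toy 6x8 "feature map" whose value encodes its own position (10*row + col).
feature_map = [[10 * r + c for c in range(8)] for r in range(6)]

# One foreground box and one background box, both reduced to 2x2.
foreground = crop_and_resize(feature_map, (2, 1, 6, 5), 2)
background = crop_and_resize(feature_map, (0, 0, 8, 6), 2)
print(foreground)  # [[12, 14], [32, 34]]
print(background)  # [[0, 4], [30, 34]]
```

Both regions come out with the same shape regardless of the size of the original box, which is what lets a fixed-size detection head consume them.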
  2. The system according to claim 1, characterized in that the detection module specifically comprises: at least one detector, wherein each detector is preset with a corresponding intersection-over-union (IoU) threshold used to classify candidate regions into positive samples and negative samples, wherein a candidate region whose IoU with the ground-truth bounding box is greater than the IoU threshold is a positive sample, and a candidate region whose IoU with the ground-truth bounding box is less than the IoU threshold is a negative sample;
    the detection module is specifically configured to:
    screen the candidate regions extracted by the candidate region extraction module, calculate the IoU between each candidate region and the ground-truth bounding box, find, according to that IoU value, the detector whose IoU threshold corresponds to it, and input the candidate region to the corresponding detector.
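The IoU computation and the positive/negative split described in claim 2 can be sketched as follows. The function names `iou` and `label_sample` are illustrative only. The claim does not say how a candidate whose IoU exactly equals the threshold is treated; this sketch labels it negative, which is an assumption.

```python
def iou(box_a, box_b):
    """Intersection-over-union of two axis-aligned boxes (x1, y1, x2, y2)."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0


def label_sample(candidate, ground_truth, iou_threshold):
    """Positive if IoU exceeds the detector's threshold, negative otherwise
    (equality treated as negative, an assumption not fixed by the claim)."""
    if iou(candidate, ground_truth) > iou_threshold:
        return "positive"
    return "negative"


ground_truth = (0, 0, 10, 10)
good = (0, 0, 10, 8)    # overlaps 80 of 100 units of union area, IoU = 0.8
poor = (5, 5, 15, 15)   # overlaps 25 of 175 units of union area

print(label_sample(good, ground_truth, 0.5))  # positive
print(label_sample(poor, ground_truth, 0.5))  # negative
```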
  3. The system according to claim 2, characterized in that the detection module is further configured to:
    after the candidate region is input to a detector, classify the candidate region and adjust its position, recalculate the IoU between the adjusted candidate region and the ground-truth label, and input the adjusted candidate region to the detector corresponding to the range in which its IoU value falls.
  4. The system according to claim 2 or 3, characterized in that there are three detectors, namely a first detector, a second detector and a third detector;
    the IoU threshold of the first detector is preset to 0.45 to 0.55;
    the IoU threshold of the second detector is preset to 0.56 to 0.65;
    the IoU threshold of the third detector is preset to 0.66 to 0.75.
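One way to read claims 2 to 4 together is as a routing rule: a candidate goes to the detector whose threshold matches its current IoU, and after that detector adjusts the box the IoU is recomputed and the candidate is routed again. The sketch below assumes thresholds of 0.5, 0.6 and 0.7 (single values picked from the claimed ranges), assumes that "corresponding" means the highest threshold the IoU reaches, and uses a hypothetical `refine` stub that moves the box halfway towards the ground truth in place of a learned bounding-box regressor.

```python
THRESHOLDS = (0.5, 0.6, 0.7)  # assumed values inside the claimed ranges


def iou(box_a, box_b):
    """Intersection-over-union of two axis-aligned boxes (x1, y1, x2, y2)."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0


def route(iou_value, thresholds=THRESHOLDS):
    """Index of the detector with the highest threshold that iou_value
    reaches, or None if it is below every threshold (assumed rule)."""
    chosen = None
    for index, threshold in enumerate(thresholds):
        if iou_value >= threshold:
            chosen = index
    return chosen


def refine(box, ground_truth):
    """Hypothetical stand-in for a detector's box regression: move every
    coordinate halfway towards the ground truth."""
    return tuple((b + g) / 2 for b, g in zip(box, ground_truth))


ground_truth = (0, 0, 10, 10)
candidate = (2, 0, 12, 10)                          # IoU = 80 / 120

first_stage = route(iou(candidate, ground_truth))   # 1, the second detector
refined = refine(candidate, ground_truth)           # (1.0, 0.0, 11.0, 10.0)
second_stage = route(iou(refined, ground_truth))    # 2, the third detector
print(first_stage, second_stage)
```

In this toy run the candidate starts at the second detector and, once its position improves, is handed to the third detector, which has the strictest threshold.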
  5. A target detection method, applied to the system according to any one of claims 1 to 4, characterized in that the method comprises:
    collecting image data and target labels corresponding to the image data, wherein the target labels include the object categories and bounding-box positions in the image;
    inputting the image data to the target detection system to obtain the detection result of each detector;
    comparing the detection results with the ground-truth labels using a loss function to obtain the loss of each detector.
  6. The method according to claim 5, characterized in that, after the step of comparing the detection results with the ground-truth labels using a loss function to obtain the loss of each detector, the method further comprises:
    summing the losses of all the detectors to obtain the overall loss of the target detection system.
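As a compact restatement of claims 5 and 6, the summation can be written as a formula. The symbols $K$ (number of detectors) and $L_k$ (loss of the $k$-th detector) are notation introduced here for illustration and do not appear in the claims.

```latex
L_{\text{total}} = \sum_{k=1}^{K} L_k
```

With the three detectors of claim 4, $K = 3$.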
  7. The method according to claim 5 or 6, characterized in that, when the system is used for target classification, the loss function is a cross-entropy loss function;
    when the system is used for position regression, the loss function is a Smooth L1 loss function or a GIoU loss function.
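The three loss functions named in claim 7 can be sketched in plain Python using their standard textbook definitions. The claims give no formulas or hyperparameters, so the `beta` parameter of Smooth L1, the function names, and the example values are assumptions made for illustration.

```python
import math


def cross_entropy(probabilities, target_index):
    """Cross-entropy for one sample: negative log-probability of the true class."""
    return -math.log(probabilities[target_index])


def smooth_l1(predicted, target, beta=1.0):
    """Smooth L1 summed over box coordinates: quadratic for small errors,
    linear for large ones. beta marks the switch point (assumed to be 1.0)."""
    total = 0.0
    for p, t in zip(predicted, target):
        diff = abs(p - t)
        total += 0.5 * diff * diff / beta if diff < beta else diff - 0.5 * beta
    return total


def giou_loss(box_a, box_b):
    """1 - GIoU for boxes (x1, y1, x2, y2), where
    GIoU = IoU - (enclosing area - union area) / enclosing area."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    # Smallest axis-aligned box enclosing both inputs.
    enclosing = (max(ax2, bx2) - min(ax1, bx1)) * (max(ay2, by2) - min(ay1, by1))
    if union <= 0 or enclosing <= 0:
        return 1.0  # degenerate boxes: treat as no overlap
    giou = inter / union - (enclosing - union) / enclosing
    return 1.0 - giou


classification_loss = cross_entropy([0.25, 0.75], 1)
regression_loss = smooth_l1((0.5, 0.0, 12.0, 10.0), (0.0, 0.0, 10.0, 10.0))
disjoint_loss = giou_loss((0, 0, 1, 1), (2, 0, 3, 1))
print(classification_loss, regression_loss, disjoint_loss)
```

Unlike a plain IoU-based loss, the GIoU loss still varies when the boxes do not overlap at all (here it is 4/3 rather than a flat 1), so it continues to provide a training signal for badly placed boxes.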
  8. A target detection apparatus, characterized in that it comprises:
    an image data collection module, configured to collect image data and target labels corresponding to the image data, wherein the target labels include the object categories and bounding-box positions in the image;
    a target detection module, configured to input the image data to the target detection system to obtain the detection result of each detector;
    a loss calculation module, configured to compare the detection results with the ground-truth labels using a loss function to obtain the loss of each detector.
  9. An electronic device, comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor, when executing the computer program, implements the steps of the target detection method according to any one of claims 5 to 7.
  10. A computer storage medium having computer program instructions stored thereon, characterized in that the program instructions, when executed by a processor, implement the steps of the target detection method according to any one of claims 5 to 7.
PCT/CN2021/139062 2021-06-04 2021-12-17 Target detection system, method and apparatus, and device and medium WO2022252565A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202110622240.5A CN113255682B (en) 2021-06-04 2021-06-04 Target detection system, method, device, equipment and medium
CN202110622240.5 2021-06-04

Publications (1)

Publication Number Publication Date
WO2022252565A1 (en) 2022-12-08

Family

ID=77186397

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/139062 WO2022252565A1 (en) 2021-06-04 2021-12-17 Target detection system, method and apparatus, and device and medium

Country Status (2)

Country Link
CN (1) CN113255682B (en)
WO (1) WO2022252565A1 (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113255682B (en) * 2021-06-04 2021-11-16 浙江智慧视频安防创新中心有限公司 Target detection system, method, device, equipment and medium
CN117237697A (en) * 2023-08-01 2023-12-15 北京邮电大学 Small sample image detection method, system, medium and equipment

Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108109160A (en) * 2017-11-16 2018-06-01 浙江工业大学 It is a kind of that interactive GrabCut tongue bodies dividing method is exempted from based on deep learning
CN108830188A (en) * 2018-05-30 2018-11-16 西安理工大学 Vehicle checking method based on deep learning
CN109800631A (en) * 2018-12-07 2019-05-24 天津大学 Fluorescence-encoded micro-beads image detecting method based on masked areas convolutional neural networks
CN109977945A (en) * 2019-02-26 2019-07-05 博众精工科技股份有限公司 Localization method and system based on deep learning
CN110210391A (en) * 2019-05-31 2019-09-06 合肥云诊信息科技有限公司 Tongue picture grain quantitative analysis method based on multiple dimensioned convolutional neural networks
CN111091105A (en) * 2019-12-23 2020-05-01 郑州轻工业大学 Remote sensing image target detection method based on new frame regression loss function
CN111539469A (en) * 2020-04-20 2020-08-14 东南大学 Weak supervision fine-grained image identification method based on vision self-attention mechanism
CN111611947A (en) * 2020-05-25 2020-09-01 济南博观智能科技有限公司 License plate detection method, device, equipment and medium
WO2020181685A1 (en) * 2019-03-12 2020-09-17 南京邮电大学 Vehicle-mounted video target detection method based on deep learning
CN113255682A (en) * 2021-06-04 2021-08-13 浙江智慧视频安防创新中心有限公司 Target detection system, method, device, equipment and medium

Family Cites Families (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101877058B (en) * 2010-02-10 2012-07-25 杭州海康威视软件有限公司 People flow rate statistical method and system
CN109858481A (en) * 2019-01-09 2019-06-07 杭州电子科技大学 A kind of Ship Target Detection method based on the detection of cascade position sensitivity
CN111160407B (en) * 2019-12-10 2023-02-07 重庆特斯联智慧科技股份有限公司 Deep learning target detection method and system
CN111401410B (en) * 2020-02-27 2023-06-13 江苏大学 Traffic sign detection method based on improved cascade neural network
CN111861978B (en) * 2020-05-29 2023-10-31 陕西师范大学 Bridge crack example segmentation method based on Faster R-CNN
CN112598683B (en) * 2020-12-27 2024-04-02 北京化工大学 Sweep OCT human eye image segmentation method based on sweep frequency optical coherence tomography

Patent Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108109160A (en) * 2017-11-16 2018-06-01 浙江工业大学 It is a kind of that interactive GrabCut tongue bodies dividing method is exempted from based on deep learning
CN108830188A (en) * 2018-05-30 2018-11-16 西安理工大学 Vehicle checking method based on deep learning
CN109800631A (en) * 2018-12-07 2019-05-24 天津大学 Fluorescence-encoded micro-beads image detecting method based on masked areas convolutional neural networks
CN109977945A (en) * 2019-02-26 2019-07-05 博众精工科技股份有限公司 Localization method and system based on deep learning
WO2020181685A1 (en) * 2019-03-12 2020-09-17 南京邮电大学 Vehicle-mounted video target detection method based on deep learning
CN110210391A (en) * 2019-05-31 2019-09-06 合肥云诊信息科技有限公司 Tongue picture grain quantitative analysis method based on multiple dimensioned convolutional neural networks
CN111091105A (en) * 2019-12-23 2020-05-01 郑州轻工业大学 Remote sensing image target detection method based on new frame regression loss function
CN111539469A (en) * 2020-04-20 2020-08-14 东南大学 Weak supervision fine-grained image identification method based on vision self-attention mechanism
CN111611947A (en) * 2020-05-25 2020-09-01 济南博观智能科技有限公司 License plate detection method, device, equipment and medium
CN113255682A (en) * 2021-06-04 2021-08-13 浙江智慧视频安防创新中心有限公司 Target detection system, method, device, equipment and medium

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
SHENG YUAN: "Research on Object Detection Algorithm of Remote Sensing Images Based on Deep Learning", MASTER THESIS, TIANJIN POLYTECHNIC UNIVERSITY, CN, 15 July 2020 (2020-07-15), CN , XP055857596, ISSN: 1674-0246 *

Also Published As

Publication number Publication date
CN113255682A (en) 2021-08-13
CN113255682B (en) 2021-11-16

Similar Documents

Publication Publication Date Title
WO2022252565A1 (en) Target detection system, method and apparatus, and device and medium
CN102663015B (en) Video semantic labeling method based on characteristics bag models and supervised learning
Su et al. RCAG-Net: Residual channelwise attention gate network for hot spot defect detection of photovoltaic farms
CN111723786A (en) Method and device for detecting wearing of safety helmet based on single model prediction
Chen et al. An improved Yolov3 based on dual path network for cherry tomatoes detection
Yang et al. Multi-scale bidirectional fcn for object skeleton extraction
Shinde et al. Wafer defect localization and classification using deep learning techniques
CN103150470A (en) Visualization method for concept drift of data stream in dynamic data environment
CN106529455A (en) Fast human posture recognition method based on SoC FPGA
CN112487161A (en) Enterprise demand oriented expert recommendation method, device, medium and equipment
Li et al. MobileNetV3-CenterNet: A target recognition method for avoiding missed detection effectively based on a lightweight network
CN111797175B (en) Data storage method and device, storage medium and electronic equipment
CN116630753A (en) Multi-scale small sample target detection method based on contrast learning
WO2022098092A1 (en) Method of video search in an electronic device
US20220147565A1 (en) Method of video search in an electronic device
US11605230B2 (en) Systems and methods for compliance monitoring
Zhihao et al. Object detection algorithm based on dense connection
Zheng Multiple-level alignment for cross-domain scene text detection
CN113052133A (en) Yolov 3-based safety helmet identification method, apparatus, medium and equipment
CN112995063B (en) Flow monitoring method, device, equipment and medium
Kuo et al. Applications of deep learning to road sign detection in dvr images
CN114882489B (en) Method, device, equipment and medium for horizontally correcting rotating license plate
CN112989938A (en) Real-time tracking and identifying method, device, medium and equipment for pedestrians
Lu et al. Improved YOLO algorithm for object detection in traffic video
CN113806539B (en) Text data enhancement system, method, equipment and medium

Legal Events

Date Code Title Description
121 EP: the EPO has been informed by WIPO that EP was designated in this application

Ref document number: 21943917

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE