CN113255682B - Target detection system, method, device, equipment and medium - Google Patents

Target detection system, method, device, equipment and medium

Info

Publication number
CN113255682B
Authority
CN
China
Prior art keywords
detector
candidate region
candidate
module
intersection ratio
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202110622240.5A
Other languages
Chinese (zh)
Other versions
CN113255682A (en)
Inventor
Liao Danping (廖丹萍)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hangzhou Eda Precision Electromechanical Science & Technology Co ltd
Original Assignee
Zhejiang Smart Video Security Innovation Center Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhejiang Smart Video Security Innovation Center Co Ltd filed Critical Zhejiang Smart Video Security Innovation Center Co Ltd
Priority to CN202110622240.5A priority Critical patent/CN113255682B/en
Publication of CN113255682A publication Critical patent/CN113255682A/en
Application granted granted Critical
Publication of CN113255682B publication Critical patent/CN113255682B/en
Priority to PCT/CN2021/139062 priority patent/WO2022252565A1/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/20 Image preprocessing
    • G06V10/28 Quantising the image, e.g. histogram thresholding for discrimination between background and foreground patterns
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/40 Extraction of image or video features
    • G06V10/44 Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Computational Linguistics (AREA)
  • Multimedia (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Mathematical Physics (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Image Analysis (AREA)

Abstract

The present disclosure provides a target detection system, method, apparatus, device, and medium, wherein the system comprises: an input module for receiving input image data; a feature extraction module for performing feature extraction on the image data through a convolutional neural network to obtain a feature map; a candidate region suggestion module for receiving the feature map and outputting the rough frame positions of foreground regions containing targets and the frame positions of background regions; a candidate region extraction module for cutting candidate background regions and candidate foreground regions out of the feature map using the frame positions output by the candidate region suggestion module, and adjusting the regions to the same size to obtain candidate regions; and a detection module for classifying the obtained candidate regions and further correcting the frame positions of foreground candidate regions with a frame regression algorithm to obtain the final position of the detection target.

Description

Target detection system, method, device, equipment and medium
Technical Field
The present disclosure relates to the field of deep learning technologies, and more particularly, to a system, method, apparatus, device, and medium for target detection.
Background
Target detection is an important research direction in computer vision and digital image processing, and is widely applied in fields such as robot navigation, intelligent video surveillance, industrial inspection, and aerospace. Target detection aims to find objects of interest in an image and comprises two subtasks, object localization and object classification, i.e., determining the category and the position of an object at the same time.
At present, training a convolutional neural network on a large amount of image data has become the mainstream industrial approach to target detection. Neural-network-based algorithms can be broadly divided into two categories: two-stage algorithms represented by Faster R-CNN, and one-stage algorithms represented by YOLO, SSD, and the like.
The two-stage model, represented by Faster R-CNN, roughly consists of five modules:
an input module: the module receives an input image.
A feature extraction module: the module extracts a feature map from an input image through a series of convolutional neural networks.
Candidate region suggestion module (Region Proposal Network, RPN): the module receives the feature map and outputs the rough frame positions of foreground regions containing targets and the frame positions of background regions.
A candidate region extraction module: the module cuts out a candidate background area and a candidate foreground area from the feature map by using the frame position output by the RPN, and adjusts the candidate area to be the same size.
A detection module: the module classifies the obtained candidate regions, and further corrects the positions of the frames by using a frame regression algorithm to obtain the final positions of the detection regions.
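For orientation, the pipeline above can be written down as a minimal skeleton. The sketch below is illustrative only: the backbone, RPN, and head are placeholders to be supplied, and the use of torchvision's roi_align with a 7×7 output size is an assumption of this example, not part of this disclosure.

```python
import torch.nn as nn
from torchvision.ops import roi_align


class TwoStageDetector(nn.Module):
    """Skeleton of the five-module two-stage pipeline described above."""

    def __init__(self, backbone, rpn, head, roi_size=7):
        super().__init__()
        self.backbone = backbone  # feature extraction module
        self.rpn = rpn            # candidate region suggestion module (RPN)
        self.head = head          # detection module: classification + frame regression
        self.roi_size = roi_size

    def forward(self, images):
        feats = self.backbone(images)   # extracted feature map
        proposals = self.rpn(feats)     # rough foreground/background frames,
                                        # as a list of per-image box tensors
        # candidate region extraction: crop each frame from the feature map
        # and resize all crops to the same fixed size
        rois = roi_align(feats, proposals,
                         output_size=(self.roi_size, self.roi_size))
        return self.head(rois)          # class scores and corrected frames
```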
The detection module needs to classify the obtained candidate regions and determine which type of foreground object each candidate region belongs to, or whether it is background. Classification presupposes a candidate-region feature-map training set, comprising the feature maps and labels corresponding to the candidate regions. The label of a candidate region is generally determined by its intersection-over-union (IoU) with the real frames. Typically, the detection module sets a fixed IoU threshold: when the IoU of a candidate region with some real frame is greater than the threshold, the label is the object class contained in that real frame (a positive sample); if the IoU of the candidate region with all real frames is less than the threshold, it is labeled as the background class (a negative sample). Experiments show that when the IoU threshold is set relatively low, a large number of low-quality candidate regions are labeled as positive samples, and the detector tends to produce more inaccurate frames. When the IoU threshold is set relatively high, the quality of the candidate regions improves, but the number of positive samples drops sharply and the model easily overfits.
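As a concrete illustration of this labeling rule, the following sketch computes the IoU of a candidate frame with the real frames and assigns a label against one fixed threshold; the (x1, y1, x2, y2) box format and the 0.5 default are assumptions of the example.

```python
def iou(box_a, box_b):
    """Intersection-over-union of two (x1, y1, x2, y2) frames."""
    x1, y1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    x2, y2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def label_candidate(candidate, gt_boxes, gt_classes, iou_threshold=0.5):
    """Positive sample (object class) if IoU with some real frame exceeds the
    threshold; negative sample (background) if it is below it for all frames."""
    best_iou, best_cls = 0.0, None
    for box, cls in zip(gt_boxes, gt_classes):
        v = iou(candidate, box)
        if v > best_iou:
            best_iou, best_cls = v, cls
    return best_cls if best_iou > iou_threshold else "background"
```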
Disclosure of Invention
To solve the technical problem that the accuracy of existing deep-learning-based target detection algorithms is not high enough, the present disclosure provides a target detection system, comprising:
the input module is used for receiving input image data;
the feature extraction module is used for extracting a feature map from the image data through a convolutional neural network;
the candidate region suggestion module is used for receiving the feature map and outputting the rough frame positions of foreground regions containing targets and the frame positions of background regions;
the candidate region extraction module is used for cutting candidate background regions and candidate foreground regions out of the feature map using the frame positions output by the candidate region suggestion module, and adjusting the regions to the same size to obtain candidate regions;
and the detection module is used for classifying the obtained candidate regions and further correcting the frame positions of foreground candidate regions by using a frame regression algorithm to obtain the final position of the detection target.
Further, in the present invention,
the detection module specifically comprises: at least one detector, wherein each detector is preset with a corresponding intersection-over-union (IoU) threshold and is used for dividing candidate regions into positive samples and negative samples, a candidate region whose IoU with a real frame is greater than the IoU threshold being a positive sample and a candidate region whose IoU with the real frames is less than the IoU threshold being a negative sample;
the detection module is specifically configured to:
screen the candidate regions extracted by the candidate region extraction module, calculate the IoU of each candidate region with the real frames, find the detector whose preset IoU threshold corresponds to that IoU, and input the candidate region to the corresponding detector.
Further, the detection module is further configured to:
after a candidate region is input to a detector, classify the candidate region and adjust its position, recalculate the IoU of the adjusted candidate region with the real label, and input it to the detector corresponding to its IoU value range.
Further, the number of detectors is three: a first detector, a second detector, and a third detector;
the IoU threshold of the first detector is preset to 0.45-0.55;
the IoU threshold of the second detector is preset to 0.56-0.65;
the IoU threshold of the third detector is preset to 0.66-0.75.
To achieve the above technical object, the present disclosure also provides a target detection method applied to the above system, the method comprising:
collecting image data and a target label corresponding to the image data, wherein the target label comprises an object type and a frame position in an image;
inputting the image data to the target detection system to obtain the detection result of each detector;
and comparing the detection result with the real label by using a loss function to obtain the loss of each detector.
Further, after the step of comparing the detection result with the real tag by using the loss function to obtain the loss of each detector, the method further includes:
and adding the losses of all the detectors to obtain the total loss of the target detection system.
Further, when the system is used for target classification, the loss function is a cross entropy loss function;
when the system is used for position regression, the loss function is the Smooth L1 loss function or the GIoU loss function.
To achieve the above technical object, the present disclosure also provides a target detection device, comprising:
the image data collection module is used for collecting image data and a target label corresponding to the image data, wherein the target label comprises an object type and a frame position in an image;
the target detection module is used for inputting the image data to the target detection system to obtain the detection result of each detector;
and the loss calculation module is used for comparing the detection result with the real label by using a loss function to obtain the loss of each detector.
To achieve the above technical objects, the present disclosure also provides a computer storage medium having stored thereon a computer program which, when executed by a processor, implements the steps of the above target detection method.
To achieve the above technical objects, the present disclosure further provides an electronic device comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor implements the steps of the target detection method when executing the computer program.
Beneficial effects of the present disclosure:
Compared with conventional target detection systems and algorithm models, the present disclosure designs multiple detectors with different IoU thresholds and specifically selects, for each detector, the candidate regions suited to it, which facilitates the training of each individual detector and thus clearly improves performance.
Drawings
Fig. 1 shows a schematic structural diagram of a first embodiment of the present disclosure;
Fig. 2 shows a schematic structural diagram of a preferred implementation of the first embodiment of the present disclosure;
Fig. 3 shows a flow diagram of a second embodiment of the present disclosure;
Fig. 4 shows a schematic structural diagram of a third embodiment of the present disclosure;
Fig. 5 shows a schematic structural diagram of a fifth embodiment of the present disclosure.
Detailed Description
Hereinafter, embodiments of the present disclosure will be described with reference to the accompanying drawings. It should be understood that the description is illustrative only and is not intended to limit the scope of the present disclosure. Moreover, in the following description, descriptions of well-known structures and techniques are omitted so as to not unnecessarily obscure the concepts of the present disclosure.
Various structural schematics according to embodiments of the present disclosure are shown in the figures. The figures are not drawn to scale, wherein certain details are exaggerated and possibly omitted for clarity of presentation. The shapes of various regions, layers, and relative sizes and positional relationships therebetween shown in the drawings are merely exemplary, and deviations may occur in practice due to manufacturing tolerances or technical limitations, and a person skilled in the art may additionally design regions/layers having different shapes, sizes, relative positions, as actually required.
The first embodiment is as follows:
as shown in fig. 1:
the present disclosure provides a target detection system, comprising:
the input module is used for receiving input image data;
the feature extraction module is used for performing feature extraction on the image data through a convolutional neural network to obtain a feature map;
the candidate region suggestion module is used for receiving the feature map and outputting the rough frame positions of foreground regions containing targets and the frame positions of background regions;
the candidate region extraction module is used for cutting candidate background regions and candidate foreground regions out of the feature map using the frame positions output by the candidate region suggestion module, and adjusting the regions to the same size to obtain candidate regions;
and the detection module is used for classifying the obtained candidate regions and further correcting the frame positions of foreground candidate regions by using a frame regression algorithm to obtain the final position of the detection target.
Further, the detection module specifically includes: at least one detector, wherein each detector is preset with a corresponding intersection-over-union (IoU) threshold and is used for dividing candidate regions into positive samples and negative samples, a candidate region whose IoU with a real frame is greater than the IoU threshold being a positive sample and a candidate region whose IoU with the real frames is less than the IoU threshold being a negative sample;
the detection module is specifically configured to:
screen the candidate regions extracted by the candidate region extraction module, calculate the IoU of each candidate region with the real frames, find the detector whose preset IoU threshold corresponds to that IoU, and input the candidate region to the corresponding detector.
Further, the detection module is further configured to:
after a candidate region is input to a detector, classify the candidate region and adjust its position, recalculate the IoU of the adjusted candidate region with the real label, and input it to the detector corresponding to its IoU value range.
Further, the number of detectors is three: a first detector, a second detector, and a third detector;
the IoU threshold of the first detector is preset to 0.45-0.55;
the IoU threshold of the second detector is preset to 0.56-0.65;
the IoU threshold of the third detector is preset to 0.66-0.75.
The target detection system of the present disclosure is described in detail below with reference to a preferred implementation of the first embodiment:
as shown in fig. 2:
the detection module of this preferred implementation has three detectors in total: a first detector H1, a second detector H2, and a third detector H3;
the IoU threshold of the first detector H1 is preset to 0.5;
the IoU threshold of the second detector H2 is preset to 0.6;
the IoU threshold of the third detector H3 is preset to 0.7.
During detection by the detection module, if the IoU of a candidate region with the real frame is between 0.5 and 0.6, the candidate region is input to the first detector H1. The candidate region B1 input to the first detector H1 yields classification information C1;
if the IoU of a candidate region with the real frame is between 0.6 and 0.7, the candidate region is input to the second detector H2. The candidate region B2 input to the second detector H2 yields classification information C2;
if the IoU of a candidate region with the real frame is above 0.7, the candidate region is input to the third detector H3. The candidate region B3 input to the third detector H3 yields classification information C3;
meanwhile, the candidate region B1 adjusted by the first detector H1 is screened again: if its IoU is now between 0.6 and 0.7, it is input to the second detector H2, and if its IoU with the real frame is above 0.7, it is input to the third detector H3.
The candidate region B2 adjusted by the second detector H2 is likewise screened, and if its IoU with the real frame is above 0.7, it is input to the third detector H3.
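Putting the routing and re-screening rules of this preferred implementation together, a hedged sketch follows. The detectors list and its classify_and_refine interface are hypothetical stand-ins for H1-H3, and iou is the helper from the earlier sketch.

```python
THRESHOLDS = (0.5, 0.6, 0.7)  # preset IoU thresholds of H1, H2, H3


def best_iou(box, gt_boxes):
    # highest IoU of the frame with any real frame; iou() as defined above
    return max((iou(box, g) for g in gt_boxes), default=0.0)


def route(v):
    """Index of the detector whose IoU range contains v, or None if below all."""
    if v >= THRESHOLDS[2]:
        return 2  # third detector H3: IoU of 0.7 and above
    if v >= THRESHOLDS[1]:
        return 1  # second detector H2: IoU between 0.6 and 0.7
    if v >= THRESHOLDS[0]:
        return 0  # first detector H1: IoU between 0.5 and 0.6
    return None


def run_detection(candidates, gt_boxes, detectors):
    results = []
    queue = [(c, best_iou(c, gt_boxes)) for c in candidates]
    while queue:
        region, v = queue.pop()
        idx = route(v)
        if idx is None:
            continue
        # hypothetical interface: returns classification info and the adjusted frame
        cls_info, refined = detectors[idx].classify_and_refine(region)
        results.append((cls_info, refined))
        # re-screening: recompute IoU of the adjusted frame and forward it to a
        # higher-threshold detector if it now falls in that detector's range
        new_v = best_iou(refined, gt_boxes)
        new_idx = route(new_v)
        if new_idx is not None and new_idx > idx:
            queue.append((refined, new_v))
    return results
```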
Example two:
as shown in figure 3 of the drawings,
the present disclosure also provides a target detection method, applied to the target detection system according to the first embodiment, the method comprising:
S201: collecting image data and target labels corresponding to the image data, wherein a target label comprises the object category and the frame position in the image;
S202: inputting the image data to the target detection system to obtain the detection result of each detector;
S203: comparing the detection results with the real labels using a loss function to obtain the loss of each detector.
Further, after the step of comparing the detection result with the real tag by using the loss function to obtain the loss of each detector, the method further includes:
and adding the losses of all the detectors to obtain the total loss of the target detection system.
Further, when the system is used for target classification, the loss function is a cross entropy loss function;
when the system is used for position regression, the loss function is the Smooth L1 loss function or the GIoU loss function.
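A minimal sketch of S203 and the total-loss step, assuming PyTorch tensors: cross entropy for classification and Smooth L1 for frame regression, with the per-detector losses summed into the system loss. The tensor shapes and the choice of Smooth L1 over GIoU here are assumptions of the example.

```python
import torch.nn.functional as F


def detector_loss(cls_logits, cls_targets, pred_boxes, gt_boxes):
    """Loss of one detector: cross entropy for classification plus Smooth L1
    for frame regression (a GIoU loss could replace the Smooth L1 term)."""
    cls_loss = F.cross_entropy(cls_logits, cls_targets)  # logits: (N, C), targets: (N,)
    reg_loss = F.smooth_l1_loss(pred_boxes, gt_boxes)    # boxes: (N, 4)
    return cls_loss + reg_loss


def total_loss(outputs_per_detector):
    """Total loss of the target detection system: sum over all detectors."""
    return sum(detector_loss(*o) for o in outputs_per_detector)
```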
Example three:
as shown in figure 4 of the drawings,
the present disclosure also provides a target detection device, comprising:
an image data collection module 301, configured to collect image data and a target label corresponding to the image data, where the target label includes an object category and a frame position in an image;
a target detection module 302, configured to input the image data to the target detection system, so as to obtain a detection result of each detector;
and a loss calculating module 303, configured to compare the detection result with the real tag by using a loss function, so as to obtain a loss of each detector.
The image data collection module 301 of the present disclosure is connected to the target detection module 302 and the loss calculation module 303 in sequence.
Example four:
the present disclosure can also provide a computer storage medium having stored thereon a computer program for implementing the steps of the object detection method described above when executed by a processor.
The computer storage medium of the present disclosure may be implemented with a semiconductor memory, a magnetic core memory, a magnetic drum memory, or a magnetic disk memory.
Semiconductor memory is the main form of memory element in computers, and comes in two types: MOS and bipolar memory elements. MOS devices have high integration and a simple process, but are slow. Bipolar elements have a more complex process, higher power consumption, and lower integration, but are fast. With the introduction of NMOS and CMOS, MOS memory came to dominate semiconductor memory. NMOS is fast; for example, the access time of Intel's 1 Kbit static RAM is 45 ns. CMOS has low power consumption; the access time of a 4 Kbit CMOS static memory is 300 ns. The semiconductor memories described above are all random access memories (RAM), i.e., new contents can be read and written randomly during operation. Semiconductor read-only memory (ROM), by contrast, can be read randomly but not written during operation, and is used to store fixed programs and data. ROM is further divided into non-rewritable fuse-type ROM (PROM) and rewritable EPROM.
Magnetic core memory is low-cost and highly reliable, with more than 20 years of practical use. Magnetic core memories were widely used as main memory before the mid-1970s. Their storage capacity can reach more than 10 bits, with access times as fast as 300 ns. A typical international magnetic core memory has a capacity of 4 MS-8 MB and an access cycle of 1.0-1.5 μs. After semiconductor memory rapidly developed and replaced magnetic core memory as main memory, magnetic core memory could still be used as large-capacity expansion memory.
Magnetic drum memory is an external memory for magnetic recording. Although its information access speed is fast and its operation stable and reliable, it is gradually being replaced by disk memory; it is still used, however, as external memory for real-time process control computers and medium and large computers. To meet the needs of small and micro computers, subminiature magnetic drums have emerged, which are small, lightweight, highly reliable, and convenient to use.
Magnetic disk memory is an external memory for magnetic recording. It combines the advantages of drum and tape storage: its storage capacity is larger than that of a drum, its access speed is faster than that of tape storage, and it can be stored offline, so magnetic disks are widely used as large-capacity external storage in various computer systems. Magnetic disks are generally divided into two main categories: hard disk and floppy disk memories.
Hard disk memories come in a wide variety. Structurally, they are divided into replaceable and fixed types: the disks of the replaceable type can be exchanged, while those of the fixed type cannot. Both replaceable and fixed magnetic disks come in multi-disk combinations and single-platter structures, and both are divided into fixed-head and movable-head types. A fixed-head magnetic disk has a small capacity and a low recording density, but a high access speed and a high cost. A movable-head magnetic disk has a high recording density (up to 1000 to 6250 bits per inch) and thus a large capacity, but a lower access speed than a fixed-head disk. The storage capacity of a magnetic disk product can reach several hundred megabytes, with a bit density of 6250 bits per inch and a track density of 475 tracks per inch. The disk packs of a multi-disk replaceable memory can be exchanged, giving large off-line capacity as well as high speed; such disks can store large volumes of information and are widely used in online information retrieval systems and database management systems.
Example five:
the present disclosure also provides an electronic device, which includes a memory, a processor, and a computer program stored in the memory and executable on the processor, and when the processor executes the computer program, the steps of the object detection method are implemented.
Fig. 5 is a schematic diagram of the internal structure of the electronic device in one embodiment. As shown in fig. 5, the electronic device includes a processor, a storage medium, a memory, and a network interface connected through a system bus. The storage medium of the computer device stores an operating system, a database, and computer-readable instructions; the database may store control information sequences, and the computer-readable instructions, when executed by the processor, cause the processor to implement a target detection method. The processor of the electronic device provides the computing and control capabilities that support the operation of the entire computer device. The memory of the computer device may store computer-readable instructions that, when executed by the processor, cause the processor to perform a target detection method. The network interface of the computer device is used for connecting and communicating with a terminal. Those skilled in the art will appreciate that the architecture shown in fig. 5 is merely a block diagram of some of the structures associated with the disclosed aspects and does not limit the computing devices to which the disclosed aspects apply; a particular computing device may include more or fewer components than shown, combine certain components, or have a different arrangement of components.
The electronic device includes, but is not limited to, a smartphone, a computer, a tablet, a wearable smart device, an artificial intelligence device, a mobile power source, and the like.
In some embodiments, the processor may consist of a single packaged integrated circuit, or of multiple integrated circuits packaged with the same or different functions, including one or more central processing units (CPUs), microprocessors, digital processing chips, graphics processors, and combinations of various control chips. The processor is the control unit of the electronic device: it connects the various components of the electronic device through various interfaces and lines, and executes the functions of the electronic device and processes its data by running or executing the programs or modules stored in the memory (for example, remote data read/write programs) and calling the data stored in the memory.
The bus may be a Peripheral Component Interconnect (PCI) bus, an Extended Industry Standard Architecture (EISA) bus, or the like. The bus may be divided into an address bus, a data bus, a control bus, and so on. The bus is arranged to enable communication between the memory, the at least one processor, and the other components.
Fig. 5 shows only an electronic device having components, and those skilled in the art will appreciate that the structure shown in fig. 5 does not constitute a limitation of the electronic device, and may include fewer or more components than those shown, or some components may be combined, or a different arrangement of components.
For example, although not shown, the electronic device may further include a power supply (such as a battery) for supplying power to each component, and preferably, the power supply may be logically connected to the at least one processor through a power management device, so that functions such as charge management, discharge management, and power consumption management are implemented through the power management device. The power supply may also include any component of one or more dc or ac power sources, recharging devices, power failure detection circuitry, power converters or inverters, power status indicators, and the like. The electronic device may further include various sensors, a bluetooth module, a Wi-Fi module, and the like, which are not described herein again.
Further, the electronic device may further include a network interface, and optionally, the network interface may include a wired interface and/or a wireless interface (such as a WI-FI interface, a bluetooth interface, etc.), which are generally used to establish a communication connection between the electronic device and other electronic devices.
Optionally, the electronic device may further comprise a user interface, which may include a display (Display) and an input unit (such as a keyboard), and optionally a standard wired interface or wireless interface. Alternatively, in some embodiments, the display may be an LED display, a liquid crystal display, a touch-sensitive liquid crystal display, an OLED (Organic Light-Emitting Diode) touch device, or the like. The display, which may also be referred to as a display screen or display unit, is used to display the information processed in the electronic device and to present a visualized user interface.
Further, the computer usable storage medium may mainly include a storage program area and a storage data area, wherein the storage program area may store an operating system, an application program required for at least one function, and the like; the storage data area may store data created according to the use of the blockchain node, and the like.
In the embodiments provided in the present invention, it should be understood that the disclosed apparatus, device and method can be implemented in other ways. For example, the above-described apparatus embodiments are merely illustrative, and for example, the division of the modules is only one logical functional division, and other divisions may be realized in practice.
The modules described as separate parts may or may not be physically separate, and parts displayed as modules may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the present embodiment.
In addition, functional modules in the embodiments of the present invention may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, or in a form of hardware plus a software functional module.
The embodiments of the present disclosure have been described above. However, these examples are for illustrative purposes only and are not intended to limit the scope of the present disclosure. The scope of the disclosure is defined by the appended claims and equivalents thereof. Various alternatives and modifications can be devised by those skilled in the art without departing from the scope of the present disclosure, and such alternatives and modifications are intended to be within the scope of the present disclosure.

Claims (9)

1. An object detection system, comprising:
the input module is used for receiving input image data;
the feature extraction module is used for performing feature extraction on the image data through a convolutional neural network to obtain a feature map;
the candidate region suggestion module is used for receiving the feature map and outputting the rough frame positions of foreground regions containing targets and the frame positions of background regions;
the candidate region extraction module is used for cutting candidate background regions and candidate foreground regions out of the feature map using the frame positions output by the candidate region suggestion module, and adjusting the regions to the same size to obtain candidate regions;
the detection module is used for classifying the obtained candidate regions and correcting the frame positions of foreground candidate regions by using a frame regression algorithm to obtain the final position of the detection target;
the detection module specifically comprises three detectors, wherein each detector is preset with a corresponding intersection-over-union (IoU) threshold and is used for dividing candidate regions into positive samples and negative samples, a candidate region whose IoU with a real frame is greater than the IoU threshold being a positive sample and a candidate region whose IoU with the real frames is less than the IoU threshold being a negative sample;
the detection module is specifically configured to:
screen the candidate regions extracted by the candidate region extraction module, calculate the IoU of each candidate region with the real frames, find the detector whose preset IoU threshold corresponds to that IoU, and input the candidate region to the corresponding detector;
wherein,
during detection by the detection module,
if the IoU of the candidate region with the real frame is between the IoU threshold of the first detector and the IoU threshold of the second detector, the candidate region is input to the first detector H1;
the first candidate region B1 input to the first detector H1 yields first classification information C1;
if the IoU of the candidate region with the real frame is between the IoU threshold of the second detector and the IoU threshold of the third detector, the candidate region is input to the second detector H2;
the second candidate region B2 input to the second detector H2 yields second classification information C2;
if the IoU of the candidate region with the real frame is above the IoU threshold of the third detector, the candidate region is input to the third detector H3;
the third candidate region B3 input to the third detector H3 yields third classification information C3;
meanwhile, the first candidate region B1 adjusted by the first detector H1 is screened; if its IoU is between the preset IoU threshold of the second detector and the IoU threshold of the third detector, it is input to the second detector H2, and if its IoU with the real frame is above the IoU threshold of the third detector, it is input to the third detector H3;
the second candidate region B2 adjusted by the second detector H2 is screened, and if its IoU with the real frame is above the IoU threshold of the third detector, it is input to the third detector H3.
2. The system of claim 1, wherein the detection module is further configured to:
after the candidate region is input to the detector, the candidate region is classified and its position adjusted, the IoU of the adjusted candidate region with the real label is recalculated, and the candidate region is input to the detector corresponding to its IoU value range.
3. The system according to claim 1 or 2, wherein
the IoU threshold of the first detector is preset to 0.45-0.55;
the IoU threshold of the second detector is preset to 0.56-0.65;
the IoU threshold of the third detector is preset to 0.66-0.75.
4. A target detection method applied to the system of any one of claims 1 to 3, wherein the method comprises:
collecting image data and a target label corresponding to the image data, wherein the target label comprises an object type and a frame position in an image;
inputting the image data to the target detection system to obtain the detection result of each detector;
and comparing the detection result with the real label by using a loss function to obtain the loss of each detector.
5. The method of claim 4, wherein after the step of comparing the detection results with the real labels using a loss function to obtain the loss of each detector, the method further comprises:
adding the losses of all the detectors together to obtain the total loss of the target detection system.
6. The method according to claim 4 or 5, wherein, when the system is used for target classification, the loss function is a cross-entropy loss function;
when the system is used for position regression, the loss function is the Smooth L1 loss function or the GIoU loss function.
7. A target detection device for use in the system of any one of claims 1 to 3, comprising:
the image data collection module is used for collecting image data and a target label corresponding to the image data, wherein the target label comprises an object type and a frame position in an image;
the target detection module is used for inputting the image data to the target detection system to obtain the detection result of each detector;
and the loss calculation module is used for comparing the detection result with the real label by using a loss function to obtain the loss of each detector.
8. An electronic device comprising a memory, a processor, and a computer program stored on the memory and operable on the processor, wherein the processor implements the steps of the target detection method of any one of claims 4 to 6 when executing the computer program.
9. A computer storage medium having computer program instructions stored thereon, wherein the program instructions, when executed by a processor, perform the steps of the target detection method of any one of claims 4 to 6.
CN202110622240.5A 2021-06-04 2021-06-04 Target detection system, method, device, equipment and medium Active CN113255682B (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN202110622240.5A CN113255682B (en) 2021-06-04 2021-06-04 Target detection system, method, device, equipment and medium
PCT/CN2021/139062 WO2022252565A1 (en) 2021-06-04 2021-12-17 Target detection system, method and apparatus, and device and medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110622240.5A CN113255682B (en) 2021-06-04 2021-06-04 Target detection system, method, device, equipment and medium

Publications (2)

Publication Number Publication Date
CN113255682A CN113255682A (en) 2021-08-13
CN113255682B true CN113255682B (en) 2021-11-16

Family

ID=77186397

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110622240.5A Active CN113255682B (en) 2021-06-04 2021-06-04 Target detection system, method, device, equipment and medium

Country Status (2)

Country Link
CN (1) CN113255682B (en)
WO (1) WO2022252565A1 (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113255682B (en) * 2021-06-04 2021-11-16 浙江智慧视频安防创新中心有限公司 Target detection system, method, device, equipment and medium
CN117237697B (en) * 2023-08-01 2024-05-17 北京邮电大学 Small sample image detection method, system, medium and equipment

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101877058A (en) * 2010-02-10 2010-11-03 杭州海康威视软件有限公司 People flow rate statistical method and system
CN111160407A (en) * 2019-12-10 2020-05-15 重庆特斯联智慧科技股份有限公司 Deep learning target detection method and system

Family Cites Families (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108109160A (en) * 2017-11-16 2018-06-01 浙江工业大学 It is a kind of that interactive GrabCut tongue bodies dividing method is exempted from based on deep learning
CN108830188B (en) * 2018-05-30 2022-03-04 西安理工大学 Vehicle detection method based on deep learning
CN109800631B (en) * 2018-12-07 2023-10-24 天津大学 Fluorescence coding microsphere image detection method based on mask region convolution neural network
CN109858481A (en) * 2019-01-09 2019-06-07 杭州电子科技大学 A kind of Ship Target Detection method based on the detection of cascade position sensitivity
CN109977945A (en) * 2019-02-26 2019-07-05 博众精工科技股份有限公司 Localization method and system based on deep learning
CN109977812B (en) * 2019-03-12 2023-02-24 南京邮电大学 Vehicle-mounted video target detection method based on deep learning
CN110210391A (en) * 2019-05-31 2019-09-06 合肥云诊信息科技有限公司 Tongue picture grain quantitative analysis method based on multiple dimensioned convolutional neural networks
CN111091105B (en) * 2019-12-23 2020-10-20 郑州轻工业大学 Remote sensing image target detection method based on new frame regression loss function
CN111401410B (en) * 2020-02-27 2023-06-13 江苏大学 Traffic sign detection method based on improved cascade neural network
CN111539469B (en) * 2020-04-20 2022-04-08 东南大学 Weak supervision fine-grained image identification method based on vision self-attention mechanism
CN111611947B (en) * 2020-05-25 2024-04-09 济南博观智能科技有限公司 License plate detection method, device, equipment and medium
CN111861978B (en) * 2020-05-29 2023-10-31 陕西师范大学 Bridge crack example segmentation method based on Faster R-CNN
CN112598683B (en) * 2020-12-27 2024-04-02 北京化工大学 Sweep OCT human eye image segmentation method based on sweep frequency optical coherence tomography
CN113255682B (en) * 2021-06-04 2021-11-16 浙江智慧视频安防创新中心有限公司 Target detection system, method, device, equipment and medium

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101877058A (en) * 2010-02-10 2010-11-03 杭州海康威视软件有限公司 People flow rate statistical method and system
CN111160407A (en) * 2019-12-10 2020-05-15 重庆特斯联智慧科技股份有限公司 Deep learning target detection method and system

Also Published As

Publication number Publication date
WO2022252565A1 (en) 2022-12-08
CN113255682A (en) 2021-08-13

Similar Documents

Publication Publication Date Title
CN111723786B (en) Method and device for detecting wearing of safety helmet based on single model prediction
CN113255682B (en) Target detection system, method, device, equipment and medium
CN109508879B (en) Risk identification method, device and equipment
CN114754786B (en) Truck navigation path-finding method, device, equipment and medium
CN115115825B (en) Method, device, computer equipment and storage medium for detecting object in image
US12008037B2 (en) Method of video search in an electronic device
CN115062200A (en) User behavior mining method and system based on artificial intelligence
CN110598042A (en) Incremental update-based video structured real-time updating method and system
CN112860851B (en) Course recommendation method, device, equipment and medium based on root cause analysis
CN112528903A (en) Face image acquisition method and device, electronic equipment and medium
CN111797175B (en) Data storage method and device, storage medium and electronic equipment
CN117313141A (en) Abnormality detection method, abnormality detection device, abnormality detection equipment and readable storage medium
CN113806539B (en) Text data enhancement system, method, equipment and medium
CN115525761A (en) Method, device, equipment and storage medium for article keyword screening category
CN112328630B (en) Data query method, device, equipment and storage medium
CN111178455B (en) Image clustering method, system, device and medium
CN112989938A (en) Real-time tracking and identifying method, device, medium and equipment for pedestrians
CN114882489B (en) Method, device, equipment and medium for horizontally correcting rotating license plate
CN112995063B (en) Flow monitoring method, device, equipment and medium
CN113537286B (en) Image classification method, device, equipment and medium
CN117237697B (en) Small sample image detection method, system, medium and equipment
CN115424042A (en) Network sparsification method, device, medium and equipment based on interlayer feature similarity
CN111832436B (en) Multi-task and weak supervision-based beauty prediction method and device and storage medium
CN112232115A (en) Calculation factor implantation method, medium and equipment
CN112995222B (en) Network detection method, device, equipment and medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
EE01 Entry into force of recordation of patent licensing contract

Application publication date: 20210813

Assignee: Zhejiang Fenghuang Yunrui Technology Co.,Ltd.

Assignor: Zhejiang smart video security Innovation Center Co.,Ltd.

Contract record no.: X2022330000060

Denomination of invention: A target detection system, method, device, equipment and medium

Granted publication date: 20211116

License type: Common License

Record date: 20220325

Application publication date: 20210813

Assignee: HANGZHOU SHIHUI TECHNOLOGY Co.,Ltd.

Assignor: Zhejiang smart video security Innovation Center Co.,Ltd.

Contract record no.: X2022330000061

Denomination of invention: A target detection system, method, device, equipment and medium

Granted publication date: 20211116

License type: Common License

Record date: 20220325

TR01 Transfer of patent right

Effective date of registration: 20220608

Address after: 311261 No. 60, Hengda Road, daicun Town, Xiaoshan District, Hangzhou City, Zhejiang Province

Patentee after: HANGZHOU EDA PRECISION ELECTROMECHANICAL SCIENCE & TECHNOLOGY CO.,LTD.

Address before: 311215 unit 1, building 1, area C, Qianjiang Century Park, ningwei street, Xiaoshan District, Hangzhou City, Zhejiang Province

Patentee before: Zhejiang smart video security Innovation Center Co.,Ltd.

EC01 Cancellation of recordation of patent licensing contract

Assignee: Zhejiang Fenghuang Yunrui Technology Co.,Ltd.

Assignor: Zhejiang smart video security Innovation Center Co.,Ltd.

Contract record no.: X2022330000060

Date of cancellation: 20220706

Assignee: HANGZHOU SHIHUI TECHNOLOGY Co.,Ltd.

Assignor: Zhejiang smart video security Innovation Center Co.,Ltd.

Contract record no.: X2022330000061

Date of cancellation: 20220707
