CN113255682B - Target detection system, method, device, equipment and medium - Google Patents

Target detection system, method, device, equipment and medium

Info

Publication number
CN113255682B
Authority
CN
China
Prior art keywords
detector
candidate region
candidate
module
intersection ratio
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202110622240.5A
Other languages
Chinese (zh)
Other versions
CN113255682A (en)
Inventor
Liao Danping (廖丹萍)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hangzhou Eda Precision Electromechanical Science & Technology Co ltd
Original Assignee
Zhejiang Smart Video Security Innovation Center Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhejiang Smart Video Security Innovation Center Co Ltd filed Critical Zhejiang Smart Video Security Innovation Center Co Ltd
Priority to CN202110622240.5A priority Critical patent/CN113255682B/en
Publication of CN113255682A publication Critical patent/CN113255682A/en
Application granted granted Critical
Publication of CN113255682B publication Critical patent/CN113255682B/en
Priority to PCT/CN2021/139062 priority patent/WO2022252565A1/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/20 Image preprocessing
    • G06V10/28 Quantising the image, e.g. histogram thresholding for discrimination between background and foreground patterns
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/40 Extraction of image or video features
    • G06V10/44 Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Computational Linguistics (AREA)
  • Multimedia (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Mathematical Physics (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Image Analysis (AREA)

Abstract

The present disclosure provides a target detection system, method, apparatus, device, and medium, wherein the system comprises: an input module for receiving input image data; a feature extraction module for performing feature extraction on the image data through a convolutional neural network to obtain a feature map; a candidate region suggestion module for receiving the feature map and outputting the rough frame positions of foreground regions containing targets and the frame positions of background regions; a candidate region extraction module for cutting candidate background regions and candidate foreground regions out of the feature map using the frame positions output by the candidate region suggestion module, and adjusting the regions to the same size to obtain candidate regions; and a detection module for classifying the obtained candidate regions and further correcting the frame positions of foreground candidate regions with a frame regression algorithm to obtain the final position of the detection target.

Description

Target detection system, method, device, equipment and medium
Technical Field
The present disclosure relates to the field of deep learning technologies, and more particularly, to a system, method, apparatus, device, and medium for target detection.
Background
Target detection is an important research direction in computer vision and digital image processing, and is widely applied in fields such as robot navigation, intelligent video surveillance, industrial inspection, and aerospace. Target detection aims to find objects of interest in an image and comprises two subtasks, object localization and object classification, i.e., determining the category and the position of an object at the same time.
At present, training a convolutional neural network on a large amount of image data has become the mainstream industrial approach to target detection. Neural-network-based algorithms can be broadly divided into two categories: two-stage algorithms represented by Faster R-CNN, and one-stage algorithms represented by YOLO, SSD, and the like.
The two-stage model, represented by Faster R-CNN, roughly consists of five modules:
an input module: the module receives an input image.
A feature extraction module: the module extracts a feature map from an input image through a series of convolutional neural networks.
Candidate region suggestion module (Region Proposal Network, RPN): the module receives the feature map and outputs the rough frame positions of foreground regions containing targets and the frame positions of background regions.
A candidate region extraction module: the module cuts out a candidate background area and a candidate foreground area from the feature map by using the frame position output by the RPN, and adjusts the candidate area to be the same size.
A detection module: the module classifies the obtained candidate regions, and further corrects the positions of the frames by using a frame regression algorithm to obtain the final positions of the detection regions.
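For orientation, the pipeline above can be written down as a minimal skeleton. The sketch below is illustrative only: the backbone, RPN, and head are placeholders to be supplied, and the use of torchvision's roi_align with a 7×7 output size is an assumption of this example, not part of this disclosure.

```python
import torch.nn as nn
from torchvision.ops import roi_align


class TwoStageDetector(nn.Module):
    """Skeleton of the five-module two-stage pipeline described above."""

    def __init__(self, backbone, rpn, head, roi_size=7):
        super().__init__()
        self.backbone = backbone  # feature extraction module
        self.rpn = rpn            # candidate region suggestion module (RPN)
        self.head = head          # detection module: classification + frame regression
        self.roi_size = roi_size

    def forward(self, images):
        feats = self.backbone(images)   # extracted feature map
        proposals = self.rpn(feats)     # rough foreground/background frames,
                                        # as a list of per-image box tensors
        # candidate region extraction: crop each frame from the feature map
        # and resize all crops to the same fixed size
        rois = roi_align(feats, proposals,
                         output_size=(self.roi_size, self.roi_size))
        return self.head(rois)          # class scores and corrected frames
```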
The detection module needs to classify the obtained candidate regions and determine which type of foreground object each candidate region belongs to, or whether it is background. Classification presupposes a candidate-region feature-map training set, comprising the feature maps and labels corresponding to the candidate regions. The label of a candidate region is generally determined by its intersection-over-union (IoU) with the real frames. Typically, the detection module sets a fixed IoU threshold: when the IoU of a candidate region with some real frame is greater than the threshold, the label is the object class contained in that real frame (a positive sample); if the IoU of the candidate region with all real frames is less than the threshold, it is labeled as the background class (a negative sample). Experiments show that when the IoU threshold is set relatively low, a large number of low-quality candidate regions are labeled as positive samples, and the detector tends to produce more inaccurate frames. When the IoU threshold is set relatively high, the quality of the candidate regions improves, but the number of positive samples drops sharply and the model easily overfits.
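As a concrete illustration of this labeling rule, the following sketch computes the IoU of a candidate frame with the real frames and assigns a label against one fixed threshold; the (x1, y1, x2, y2) box format and the 0.5 default are assumptions of the example.

```python
def iou(box_a, box_b):
    """Intersection-over-union of two (x1, y1, x2, y2) frames."""
    x1, y1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    x2, y2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def label_candidate(candidate, gt_boxes, gt_classes, iou_threshold=0.5):
    """Positive sample (object class) if IoU with some real frame exceeds the
    threshold; negative sample (background) if it is below it for all frames."""
    best_iou, best_cls = 0.0, None
    for box, cls in zip(gt_boxes, gt_classes):
        v = iou(candidate, box)
        if v > best_iou:
            best_iou, best_cls = v, cls
    return best_cls if best_iou > iou_threshold else "background"
```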
Disclosure of Invention
To solve the technical problem that the accuracy of existing deep-learning-based target detection algorithms is not high enough, the present disclosure provides a target detection system, comprising:
the input module is used for receiving input image data;
the feature extraction module is used for extracting a feature map from the image data through a convolutional neural network;
the candidate region suggestion module is used for receiving the feature map and outputting the rough frame positions of foreground regions containing targets and the frame positions of background regions;
the candidate region extraction module is used for cutting candidate background regions and candidate foreground regions out of the feature map using the frame positions output by the candidate region suggestion module, and adjusting the regions to the same size to obtain candidate regions;
and the detection module is used for classifying the obtained candidate regions and further correcting the frame positions of foreground candidate regions by using a frame regression algorithm to obtain the final position of the detection target.
Further, in the present invention,
the detection module specifically comprises: at least one detector, wherein each detector is preset with a corresponding intersection-over-union (IoU) threshold and is used for dividing candidate regions into positive samples and negative samples, a candidate region whose IoU with a real frame is greater than the IoU threshold being a positive sample and a candidate region whose IoU with the real frames is less than the IoU threshold being a negative sample;
the detection module is specifically configured to:
screen the candidate regions extracted by the candidate region extraction module, calculate the IoU of each candidate region with the real frames, find the detector whose preset IoU threshold corresponds to that IoU, and input the candidate region to the corresponding detector.
Further, the detection module is further configured to:
after a candidate region is input to a detector, classify the candidate region and adjust its position, recalculate the IoU of the adjusted candidate region with the real label, and input it to the detector corresponding to its IoU value range.
Further, the number of detectors is three: a first detector, a second detector, and a third detector;
the IoU threshold of the first detector is preset to 0.45-0.55;
the IoU threshold of the second detector is preset to 0.56-0.65;
the IoU threshold of the third detector is preset to 0.66-0.75.
To achieve the above technical object, the present disclosure also provides a target detection method applied to the above system, the method comprising:
collecting image data and a target label corresponding to the image data, wherein the target label comprises an object type and a frame position in an image;
inputting the image data to the target detection system to obtain the detection result of each detector;
and comparing the detection result with the real label by using a loss function to obtain the loss of each detector.
Further, after the step of comparing the detection result with the real tag by using the loss function to obtain the loss of each detector, the method further includes:
and adding the losses of all the detectors to obtain the total loss of the target detection system.
Further, when the system is used for target classification, the loss function is a cross entropy loss function;
when the system is used for position regression, the loss function is the Smooth L1 loss function or the GIoU loss function.
To achieve the above technical object, the present disclosure also provides a target detection device, comprising:
the image data collection module is used for collecting image data and a target label corresponding to the image data, wherein the target label comprises an object type and a frame position in an image;
the target detection module is used for inputting the image data to the target detection system to obtain the detection result of each detector;
and the loss calculation module is used for comparing the detection result with the real label by using a loss function to obtain the loss of each detector.
To achieve the above technical objects, the present disclosure also provides a computer storage medium having stored thereon a computer program which, when executed by a processor, implements the steps of the above target detection method.
To achieve the above technical objects, the present disclosure further provides an electronic device comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor implements the steps of the target detection method when executing the computer program.
Beneficial effects of the present disclosure:
Compared with conventional target detection systems and algorithm models, the present disclosure designs multiple detectors with different IoU thresholds and specifically selects, for each detector, the candidate regions suited to it, which facilitates the training of each individual detector and thus clearly improves performance.
Drawings
Fig. 1 shows a schematic structural diagram of a first embodiment of the present disclosure;
Fig. 2 shows a schematic structural diagram of a preferred implementation of the first embodiment of the present disclosure;
Fig. 3 shows a flow diagram of a second embodiment of the present disclosure;
Fig. 4 shows a schematic structural diagram of a third embodiment of the present disclosure;
Fig. 5 shows a schematic structural diagram of a fifth embodiment of the present disclosure.
Detailed Description
Hereinafter, embodiments of the present disclosure will be described with reference to the accompanying drawings. It should be understood that the description is illustrative only and is not intended to limit the scope of the present disclosure. Moreover, in the following description, descriptions of well-known structures and techniques are omitted so as to not unnecessarily obscure the concepts of the present disclosure.
Various structural schematics according to embodiments of the present disclosure are shown in the figures. The figures are not drawn to scale, wherein certain details are exaggerated and possibly omitted for clarity of presentation. The shapes of various regions, layers, and relative sizes and positional relationships therebetween shown in the drawings are merely exemplary, and deviations may occur in practice due to manufacturing tolerances or technical limitations, and a person skilled in the art may additionally design regions/layers having different shapes, sizes, relative positions, as actually required.
The first embodiment is as follows:
as shown in fig. 1:
the present disclosure provides a target detection system, comprising:
the input module is used for receiving input image data;
the feature extraction module is used for performing feature extraction on the image data through a convolutional neural network to obtain a feature map;
the candidate region suggestion module is used for receiving the feature map and outputting the rough frame positions of foreground regions containing targets and the frame positions of background regions;
the candidate region extraction module is used for cutting candidate background regions and candidate foreground regions out of the feature map using the frame positions output by the candidate region suggestion module, and adjusting the regions to the same size to obtain candidate regions;
and the detection module is used for classifying the obtained candidate regions and further correcting the frame positions of foreground candidate regions by using a frame regression algorithm to obtain the final position of the detection target.
Further, the detection module specifically includes: at least one detector, wherein each detector is preset with a corresponding intersection-over-union (IoU) threshold and is used for dividing candidate regions into positive samples and negative samples, a candidate region whose IoU with a real frame is greater than the IoU threshold being a positive sample and a candidate region whose IoU with the real frames is less than the IoU threshold being a negative sample;
the detection module is specifically configured to:
screen the candidate regions extracted by the candidate region extraction module, calculate the IoU of each candidate region with the real frames, find the detector whose preset IoU threshold corresponds to that IoU, and input the candidate region to the corresponding detector.
Further, the detection module is further configured to:
after a candidate region is input to a detector, classify the candidate region and adjust its position, recalculate the IoU of the adjusted candidate region with the real label, and input it to the detector corresponding to its IoU value range.
Further, the number of detectors is three: a first detector, a second detector, and a third detector;
the IoU threshold of the first detector is preset to 0.45-0.55;
the IoU threshold of the second detector is preset to 0.56-0.65;
the IoU threshold of the third detector is preset to 0.66-0.75.
The target detection system of the present disclosure is described in detail below with reference to a preferred implementation of the first embodiment:
as shown in fig. 2:
the detection module of this preferred implementation has three detectors in total: a first detector H1, a second detector H2, and a third detector H3;
the IoU threshold of the first detector H1 is preset to 0.5;
the IoU threshold of the second detector H2 is preset to 0.6;
the IoU threshold of the third detector H3 is preset to 0.7.
During detection by the detection module, if the IoU of a candidate region with the real frame is between 0.5 and 0.6, the candidate region is input to the first detector H1. The candidate region B1 input to the first detector H1 yields classification information C1;
if the IoU of a candidate region with the real frame is between 0.6 and 0.7, the candidate region is input to the second detector H2. The candidate region B2 input to the second detector H2 yields classification information C2;
if the IoU of a candidate region with the real frame is above 0.7, the candidate region is input to the third detector H3. The candidate region B3 input to the third detector H3 yields classification information C3;
meanwhile, the candidate region B1 adjusted by the first detector H1 is screened again: if its IoU is now between 0.6 and 0.7, it is input to the second detector H2, and if its IoU with the real frame is above 0.7, it is input to the third detector H3.
The candidate region B2 adjusted by the second detector H2 is likewise screened, and if its IoU with the real frame is above 0.7, it is input to the third detector H3.
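Putting the routing and re-screening rules of this preferred implementation together, a hedged sketch follows. The detectors list and its classify_and_refine interface are hypothetical stand-ins for H1-H3, and iou is the helper from the earlier sketch.

```python
THRESHOLDS = (0.5, 0.6, 0.7)  # preset IoU thresholds of H1, H2, H3


def best_iou(box, gt_boxes):
    # highest IoU of the frame with any real frame; iou() as defined above
    return max((iou(box, g) for g in gt_boxes), default=0.0)


def route(v):
    """Index of the detector whose IoU range contains v, or None if below all."""
    if v >= THRESHOLDS[2]:
        return 2  # third detector H3: IoU of 0.7 and above
    if v >= THRESHOLDS[1]:
        return 1  # second detector H2: IoU between 0.6 and 0.7
    if v >= THRESHOLDS[0]:
        return 0  # first detector H1: IoU between 0.5 and 0.6
    return None


def run_detection(candidates, gt_boxes, detectors):
    results = []
    queue = [(c, best_iou(c, gt_boxes)) for c in candidates]
    while queue:
        region, v = queue.pop()
        idx = route(v)
        if idx is None:
            continue
        # hypothetical interface: returns classification info and the adjusted frame
        cls_info, refined = detectors[idx].classify_and_refine(region)
        results.append((cls_info, refined))
        # re-screening: recompute IoU of the adjusted frame and forward it to a
        # higher-threshold detector if it now falls in that detector's range
        new_v = best_iou(refined, gt_boxes)
        new_idx = route(new_v)
        if new_idx is not None and new_idx > idx:
            queue.append((refined, new_v))
    return results
```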
Example two:
as shown in figure 3 of the drawings,
the present disclosure also provides a target detection method, applied to the target detection system according to the first embodiment, the method comprising:
S201: collecting image data and target labels corresponding to the image data, wherein a target label comprises the object category and the frame position in the image;
S202: inputting the image data to the target detection system to obtain the detection result of each detector;
S203: comparing the detection results with the real labels using a loss function to obtain the loss of each detector.
Further, after the step of comparing the detection result with the real tag by using the loss function to obtain the loss of each detector, the method further includes:
and adding the losses of all the detectors to obtain the total loss of the target detection system.
Further, when the system is used for target classification, the loss function is a cross entropy loss function;
when the system is used for position regression, the loss function is the Smooth L1 loss function or the GIoU loss function.
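A minimal sketch of S203 and the total-loss step, assuming PyTorch tensors: cross entropy for classification and Smooth L1 for frame regression, with the per-detector losses summed into the system loss. The tensor shapes and the choice of Smooth L1 over GIoU here are assumptions of the example.

```python
import torch.nn.functional as F


def detector_loss(cls_logits, cls_targets, pred_boxes, gt_boxes):
    """Loss of one detector: cross entropy for classification plus Smooth L1
    for frame regression (a GIoU loss could replace the Smooth L1 term)."""
    cls_loss = F.cross_entropy(cls_logits, cls_targets)  # logits: (N, C), targets: (N,)
    reg_loss = F.smooth_l1_loss(pred_boxes, gt_boxes)    # boxes: (N, 4)
    return cls_loss + reg_loss


def total_loss(outputs_per_detector):
    """Total loss of the target detection system: sum over all detectors."""
    return sum(detector_loss(*o) for o in outputs_per_detector)
```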
Example three:
as shown in figure 4 of the drawings,
the present disclosure also provides a target detection device, comprising:
an image data collection module 301, configured to collect image data and a target label corresponding to the image data, where the target label includes an object category and a frame position in an image;
a target detection module 302, configured to input the image data to the target detection system, so as to obtain a detection result of each detector;
and a loss calculating module 303, configured to compare the detection result with the real tag by using a loss function, so as to obtain a loss of each detector.
The image data collection module 301 of the present disclosure is connected to the target detection module 302 and the loss calculation module 303 in sequence.
Example four:
the present disclosure can also provide a computer storage medium having stored thereon a computer program for implementing the steps of the object detection method described above when executed by a processor.
The computer storage medium of the present disclosure may be implemented with a semiconductor memory, a magnetic core memory, a magnetic drum memory, or a magnetic disk memory.
Semiconductor memory is the main form of memory element in computers, and comes in two types: MOS and bipolar memory elements. MOS devices have high integration and a simple process, but are slow. Bipolar elements have a more complex process, higher power consumption, and lower integration, but are fast. With the introduction of NMOS and CMOS, MOS memory came to dominate semiconductor memory. NMOS is fast; for example, the access time of Intel's 1 Kbit static RAM is 45 ns. CMOS has low power consumption; the access time of a 4 Kbit CMOS static memory is 300 ns. The semiconductor memories described above are all random access memories (RAM), i.e., new contents can be read and written randomly during operation. Semiconductor read-only memory (ROM), by contrast, can be read randomly but not written during operation, and is used to store fixed programs and data. ROM is further divided into non-rewritable fuse-type ROM (PROM) and rewritable EPROM.
Magnetic core memory is low-cost and highly reliable, with more than 20 years of practical use. Magnetic core memories were widely used as main memory before the mid-1970s. Their storage capacity can reach more than 10 bits, with access times as fast as 300 ns. A typical international magnetic core memory has a capacity of 4 MS-8 MB and an access cycle of 1.0-1.5 μs. After semiconductor memory rapidly developed and replaced magnetic core memory as main memory, magnetic core memory could still be used as large-capacity expansion memory.
Magnetic drum memory is an external memory for magnetic recording. Although its information access speed is fast and its operation stable and reliable, it is gradually being replaced by disk memory; it is still used, however, as external memory for real-time process control computers and medium and large computers. To meet the needs of small and micro computers, subminiature magnetic drums have emerged, which are small, lightweight, highly reliable, and convenient to use.
Magnetic disk memory is an external memory for magnetic recording. It combines the advantages of drum and tape storage: its storage capacity is larger than that of a drum, its access speed is faster than that of tape storage, and it can be stored offline, so magnetic disks are widely used as large-capacity external storage in various computer systems. Magnetic disks are generally divided into two main categories: hard disk and floppy disk memories.
Hard disk memories come in a wide variety. Structurally, they are divided into replaceable and fixed types: the disks of the replaceable type can be exchanged, while those of the fixed type cannot. Both replaceable and fixed magnetic disks come in multi-disk combinations and single-platter structures, and both are divided into fixed-head and movable-head types. A fixed-head magnetic disk has a small capacity and a low recording density, but a high access speed and a high cost. A movable-head magnetic disk has a high recording density (up to 1000 to 6250 bits per inch) and thus a large capacity, but a lower access speed than a fixed-head disk. The storage capacity of a magnetic disk product can reach several hundred megabytes, with a bit density of 6250 bits per inch and a track density of 475 tracks per inch. The disk packs of a multi-disk replaceable memory can be exchanged, giving large off-line capacity as well as high speed; such disks can store large volumes of information and are widely used in online information retrieval systems and database management systems.
Example five:
the present disclosure also provides an electronic device, which includes a memory, a processor, and a computer program stored in the memory and executable on the processor, and when the processor executes the computer program, the steps of the object detection method are implemented.
Fig. 5 is a schematic diagram of the internal structure of the electronic device in one embodiment. As shown in fig. 5, the electronic device includes a processor, a storage medium, a memory, and a network interface connected through a system bus. The storage medium of the computer device stores an operating system, a database, and computer-readable instructions; the database may store control information sequences, and the computer-readable instructions, when executed by the processor, cause the processor to implement a target detection method. The processor of the electronic device provides the computing and control capabilities that support the operation of the entire computer device. The memory of the computer device may store computer-readable instructions that, when executed by the processor, cause the processor to perform a target detection method. The network interface of the computer device is used for connecting and communicating with a terminal. Those skilled in the art will appreciate that the architecture shown in fig. 5 is merely a block diagram of some of the structures associated with the disclosed aspects and does not limit the computing devices to which the disclosed aspects apply; a particular computing device may include more or fewer components than shown, combine certain components, or have a different arrangement of components.
The electronic device includes, but is not limited to, a smartphone, a computer, a tablet, a wearable smart device, an artificial intelligence device, a mobile power source, and the like.
In some embodiments, the processor may consist of a single packaged integrated circuit, or of multiple integrated circuits packaged with the same or different functions, including one or more central processing units (CPUs), microprocessors, digital processing chips, graphics processors, and combinations of various control chips. The processor is the control unit of the electronic device: it connects the various components of the electronic device through various interfaces and lines, and executes the functions of the electronic device and processes its data by running or executing the programs or modules stored in the memory (for example, remote data read/write programs) and calling the data stored in the memory.
The bus may be a Peripheral Component Interconnect (PCI) bus, an Extended Industry Standard Architecture (EISA) bus, or the like. The bus may be divided into an address bus, a data bus, a control bus, and so on. The bus is arranged to enable communication between the memory, the at least one processor, and the other components.
Fig. 5 shows only an electronic device having components, and those skilled in the art will appreciate that the structure shown in fig. 5 does not constitute a limitation of the electronic device, and may include fewer or more components than those shown, or some components may be combined, or a different arrangement of components.
For example, although not shown, the electronic device may further include a power supply (such as a battery) for supplying power to each component, and preferably, the power supply may be logically connected to the at least one processor through a power management device, so that functions such as charge management, discharge management, and power consumption management are implemented through the power management device. The power supply may also include any component of one or more dc or ac power sources, recharging devices, power failure detection circuitry, power converters or inverters, power status indicators, and the like. The electronic device may further include various sensors, a bluetooth module, a Wi-Fi module, and the like, which are not described herein again.
Further, the electronic device may further include a network interface, and optionally, the network interface may include a wired interface and/or a wireless interface (such as a WI-FI interface, a bluetooth interface, etc.), which are generally used to establish a communication connection between the electronic device and other electronic devices.
Optionally, the electronic device may further comprise a user interface, which may include a display (Display) and an input unit (such as a keyboard), and optionally a standard wired interface or wireless interface. Alternatively, in some embodiments, the display may be an LED display, a liquid crystal display, a touch-sensitive liquid crystal display, an OLED (Organic Light-Emitting Diode) touch device, or the like. The display, which may also be referred to as a display screen or display unit, is used to display the information processed in the electronic device and to present a visualized user interface.
Further, the computer usable storage medium may mainly include a storage program area and a storage data area, wherein the storage program area may store an operating system, an application program required for at least one function, and the like; the storage data area may store data created according to the use of the blockchain node, and the like.
In the embodiments provided in the present invention, it should be understood that the disclosed apparatus, device and method can be implemented in other ways. For example, the above-described apparatus embodiments are merely illustrative, and for example, the division of the modules is only one logical functional division, and other divisions may be realized in practice.
The modules described as separate parts may or may not be physically separate, and parts displayed as modules may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the present embodiment.
In addition, functional modules in the embodiments of the present invention may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, or in a form of hardware plus a software functional module.
The embodiments of the present disclosure have been described above. However, these examples are for illustrative purposes only and are not intended to limit the scope of the present disclosure. The scope of the disclosure is defined by the appended claims and equivalents thereof. Various alternatives and modifications can be devised by those skilled in the art without departing from the scope of the present disclosure, and such alternatives and modifications are intended to be within the scope of the present disclosure.

Claims (9)

1. An object detection system, comprising:
the input module is used for receiving input image data;
the feature extraction module is used for performing feature extraction on the image data through a convolutional neural network to obtain a feature map;
the candidate region suggestion module is used for receiving the feature map and outputting the rough frame positions of foreground regions containing targets and the frame positions of background regions;
the candidate region extraction module is used for cutting candidate background regions and candidate foreground regions out of the feature map using the frame positions output by the candidate region suggestion module, and adjusting the regions to the same size to obtain candidate regions;
the detection module is used for classifying the obtained candidate regions and correcting the frame positions of foreground candidate regions by using a frame regression algorithm to obtain the final position of the detection target;
the detection module specifically comprises three detectors, wherein each detector is preset with a corresponding intersection-over-union (IoU) threshold and is used for dividing candidate regions into positive samples and negative samples, a candidate region whose IoU with a real frame is greater than the IoU threshold being a positive sample and a candidate region whose IoU with the real frames is less than the IoU threshold being a negative sample;
the detection module is specifically configured to:
screen the candidate regions extracted by the candidate region extraction module, calculate the IoU of each candidate region with the real frames, find the detector whose preset IoU threshold corresponds to that IoU, and input the candidate region to the corresponding detector;
wherein,
during detection by the detection module,
if the IoU of the candidate region with the real frame is between the IoU threshold of the first detector and the IoU threshold of the second detector, the candidate region is input to the first detector H1;
the first candidate region B1 input to the first detector H1 yields first classification information C1;
if the IoU of the candidate region with the real frame is between the IoU threshold of the second detector and the IoU threshold of the third detector, the candidate region is input to the second detector H2;
the second candidate region B2 input to the second detector H2 yields second classification information C2;
if the IoU of the candidate region with the real frame is above the IoU threshold of the third detector, the candidate region is input to the third detector H3;
the third candidate region B3 input to the third detector H3 yields third classification information C3;
meanwhile, the first candidate region B1 adjusted by the first detector H1 is screened; if its IoU is between the preset IoU threshold of the second detector and the IoU threshold of the third detector, it is input to the second detector H2, and if its IoU with the real frame is above the IoU threshold of the third detector, it is input to the third detector H3;
the second candidate region B2 adjusted by the second detector H2 is screened, and if its IoU with the real frame is above the IoU threshold of the third detector, it is input to the third detector H3.
2. The system of claim 1, wherein the detection module is further configured to:
after the candidate region is input to the detector, the candidate region is classified and its position adjusted, the IoU of the adjusted candidate region with the real label is recalculated, and the candidate region is input to the detector corresponding to its IoU value range.
3. The system according to claim 1 or 2, wherein
the IoU threshold of the first detector is preset to 0.45-0.55;
the IoU threshold of the second detector is preset to 0.56-0.65;
the IoU threshold of the third detector is preset to 0.66-0.75.
4. A target detection method applied to the system of any one of claims 1 to 3, wherein the method comprises:
collecting image data and a target label corresponding to the image data, wherein the target label comprises an object type and a frame position in an image;
inputting the image data to the target detection system to obtain the detection result of each detector;
and comparing the detection result with the real label by using a loss function to obtain the loss of each detector.
5. The method of claim 4, wherein after the step of comparing the detection results with the real labels using a loss function to obtain the loss of each detector, the method further comprises:
adding the losses of all the detectors together to obtain the total loss of the target detection system.
6. The method according to claim 4 or 5, wherein, when the system is used for target classification, the loss function is a cross-entropy loss function;
when the system is used for position regression, the loss function is the Smooth L1 loss function or the GIoU loss function.
7. A target detection device for use in the system of any one of claims 1 to 3, comprising:
the image data collection module is used for collecting image data and a target label corresponding to the image data, wherein the target label comprises an object type and a frame position in an image;
the target detection module is used for inputting the image data to the target detection system to obtain the detection result of each detector;
and the loss calculation module is used for comparing the detection result with the real label by using a loss function to obtain the loss of each detector.
8. An electronic device comprising a memory, a processor, and a computer program stored on the memory and operable on the processor, wherein the processor implements the steps of the target detection method of any one of claims 4 to 6 when executing the computer program.
9. A computer storage medium having computer program instructions stored thereon, wherein the program instructions, when executed by a processor, perform the steps of the target detection method of any one of claims 4 to 6.
CN202110622240.5A 2021-06-04 2021-06-04 Target detection system, method, device, equipment and medium Active CN113255682B (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN202110622240.5A CN113255682B (en) 2021-06-04 2021-06-04 Target detection system, method, device, equipment and medium
PCT/CN2021/139062 WO2022252565A1 (en) 2021-06-04 2021-12-17 Target detection system, method and apparatus, and device and medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110622240.5A CN113255682B (en) 2021-06-04 2021-06-04 Target detection system, method, device, equipment and medium

Publications (2)

Publication Number Publication Date
CN113255682A CN113255682A (en) 2021-08-13
CN113255682B true CN113255682B (en) 2021-11-16

Family

ID=77186397

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110622240.5A Active CN113255682B (en) 2021-06-04 2021-06-04 Target detection system, method, device, equipment and medium

Country Status (2)

Country Link
CN (1) CN113255682B (en)
WO (1) WO2022252565A1 (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113255682B (en) * 2021-06-04 2021-11-16 浙江智慧视频安防创新中心有限公司 Target detection system, method, device, equipment and medium
CN117237697B (en) * 2023-08-01 2024-05-17 北京邮电大学 Small sample image detection method, system, medium and equipment

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101877058A (en) * 2010-02-10 2010-11-03 杭州海康威视软件有限公司 People flow rate statistical method and system
CN111160407A (en) * 2019-12-10 2020-05-15 重庆特斯联智慧科技股份有限公司 Deep learning target detection method and system

Family Cites Families (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108109160A (en) * 2017-11-16 2018-06-01 浙江工业大学 It is a kind of that interactive GrabCut tongue bodies dividing method is exempted from based on deep learning
CN108830188B (en) * 2018-05-30 2022-03-04 西安理工大学 Vehicle detection method based on deep learning
CN109800631B (en) * 2018-12-07 2023-10-24 天津大学 Fluorescence coding microsphere image detection method based on mask region convolution neural network
CN109858481A (en) * 2019-01-09 2019-06-07 杭州电子科技大学 A kind of Ship Target Detection method based on the detection of cascade position sensitivity
CN109977945A (en) * 2019-02-26 2019-07-05 博众精工科技股份有限公司 Localization method and system based on deep learning
CN109977812B (en) * 2019-03-12 2023-02-24 南京邮电大学 Vehicle-mounted video target detection method based on deep learning
CN110210391A (en) * 2019-05-31 2019-09-06 合肥云诊信息科技有限公司 Tongue picture grain quantitative analysis method based on multiple dimensioned convolutional neural networks
CN111091105B (en) * 2019-12-23 2020-10-20 郑州轻工业大学 Remote sensing image target detection method based on new frame regression loss function
CN111401410B (en) * 2020-02-27 2023-06-13 江苏大学 Traffic sign detection method based on improved cascade neural network
CN111539469B (en) * 2020-04-20 2022-04-08 东南大学 Weak supervision fine-grained image identification method based on vision self-attention mechanism
CN111611947B (en) * 2020-05-25 2024-04-09 济南博观智能科技有限公司 License plate detection method, device, equipment and medium
CN111861978B (en) * 2020-05-29 2023-10-31 陕西师范大学 Bridge crack example segmentation method based on Faster R-CNN
CN112598683B (en) * 2020-12-27 2024-04-02 北京化工大学 Sweep OCT human eye image segmentation method based on sweep frequency optical coherence tomography
CN113255682B (en) * 2021-06-04 2021-11-16 浙江智慧视频安防创新中心有限公司 Target detection system, method, device, equipment and medium

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101877058A (en) * 2010-02-10 2010-11-03 杭州海康威视软件有限公司 People flow rate statistical method and system
CN111160407A (en) * 2019-12-10 2020-05-15 重庆特斯联智慧科技股份有限公司 Deep learning target detection method and system

Also Published As

Publication number Publication date
WO2022252565A1 (en) 2022-12-08
CN113255682A (en) 2021-08-13

Similar Documents

Publication Publication Date Title
CN111723786B (en) Method and device for detecting wearing of safety helmet based on single model prediction
CN113255682B (en) Target detection system, method, device, equipment and medium
CN109508879B (en) Risk identification method, device and equipment
CN114754786B (en) Truck navigation path-finding method, device, equipment and medium
CN115115825B (en) Method, device, computer equipment and storage medium for detecting object in image
US12008037B2 (en) Method of video search in an electronic device
CN115062200A (en) User behavior mining method and system based on artificial intelligence
CN110598042A (en) Incremental update-based video structured real-time updating method and system
CN112860851B (en) Course recommendation method, device, equipment and medium based on root cause analysis
CN112528903A (en) Face image acquisition method and device, electronic equipment and medium
CN111797175B (en) Data storage method and device, storage medium and electronic equipment
CN117313141A (en) Abnormality detection method, abnormality detection device, abnormality detection equipment and readable storage medium
CN113806539B (en) Text data enhancement system, method, equipment and medium
CN115525761A (en) Method, device, equipment and storage medium for article keyword screening category
CN112328630B (en) Data query method, device, equipment and storage medium
CN111178455B (en) Image clustering method, system, device and medium
CN112989938A (en) Real-time tracking and identifying method, device, medium and equipment for pedestrians
CN114882489B (en) Method, device, equipment and medium for horizontally correcting rotating license plate
CN112995063B (en) Flow monitoring method, device, equipment and medium
CN113537286B (en) Image classification method, device, equipment and medium
CN117237697B (en) Small sample image detection method, system, medium and equipment
CN115424042A (en) Network sparsification method, device, medium and equipment based on interlayer feature similarity
CN111832436B (en) Multi-task and weak supervision-based beauty prediction method and device and storage medium
CN112232115A (en) Calculation factor implantation method, medium and equipment
CN112995222B (en) Network detection method, device, equipment and medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
EE01 Entry into force of recordation of patent licensing contract

Application publication date: 20210813

Assignee: Zhejiang Fenghuang Yunrui Technology Co.,Ltd.

Assignor: Zhejiang smart video security Innovation Center Co.,Ltd.

Contract record no.: X2022330000060

Denomination of invention: A target detection system, method, device, equipment and medium

Granted publication date: 20211116

License type: Common License

Record date: 20220325

Application publication date: 20210813

Assignee: HANGZHOU SHIHUI TECHNOLOGY Co.,Ltd.

Assignor: Zhejiang smart video security Innovation Center Co.,Ltd.

Contract record no.: X2022330000061

Denomination of invention: A target detection system, method, device, equipment and medium

Granted publication date: 20211116

License type: Common License

Record date: 20220325

TR01 Transfer of patent right

Effective date of registration: 20220608

Address after: 311261 No. 60, Hengda Road, daicun Town, Xiaoshan District, Hangzhou City, Zhejiang Province

Patentee after: HANGZHOU EDA PRECISION ELECTROMECHANICAL SCIENCE & TECHNOLOGY CO.,LTD.

Address before: 311215 unit 1, building 1, area C, Qianjiang Century Park, ningwei street, Xiaoshan District, Hangzhou City, Zhejiang Province

Patentee before: Zhejiang smart video security Innovation Center Co.,Ltd.

EC01 Cancellation of recordation of patent licensing contract

Assignee: Zhejiang Fenghuang Yunrui Technology Co.,Ltd.

Assignor: Zhejiang smart video security Innovation Center Co.,Ltd.

Contract record no.: X2022330000060

Date of cancellation: 20220706

Assignee: HANGZHOU SHIHUI TECHNOLOGY Co.,Ltd.

Assignor: Zhejiang smart video security Innovation Center Co.,Ltd.

Contract record no.: X2022330000061

Date of cancellation: 20220707
